Abstract
We present a Kullback–Leibler (KL) control treatment of the fundamental problem of erasing a bit. We introduce notions of reliability of information storage via a reliability timescale $t_1$, and speed of erasing via an erasing timescale $t_2$. Our problem formulation captures the tradeoff between speed, reliability, and the KL cost required to erase a bit. We show that rapid erasing of a reliable bit costs at least $\frac{1}{2}\log(t_1/t_2)$, which goes to $\infty$ when $t_2/t_1 \to 0$.
Keywords: erasing; information; thermodynamics; Kullback–Leibler; optimal control; reliability; speed; tradeoff

1. Introduction
Biological systems are remarkably ordered at multiple scales and dimensions, from the spatial order witnessed in the packing of DNA inside the nucleus and the arrangement of cells into tissues, organs, and whole organisms, to the temporal order witnessed in the execution of various cellular processes. Superficially, such order might appear to violate the second law of thermodynamics, which requires an increase in disorder with overwhelmingly high probability. In fact, there is no violation, since biological systems expend energy to bring about and maintain this order.
We would like to understand this “energy to order” conversion quantitatively. What are the fundamental limits to this conversion? Order can be measured in terms of information by counting the number of bits required to describe that order. From this point of view, understanding how much energy is required to create order becomes an instance of the investigation of the connection between information processing and thermodynamics. The basic information processing operation that increases order is the operation of “erasing” or resetting a bit to state 0. To fix ideas, imagine erasing random chalk marks from a blackboard, to leave it in a neat and ordered state.
Szilard [1] and later Landauer [2] have argued from the second law of thermodynamics that erasing at temperature T requires at least $k_B T \log 2$ units of energy, where $k_B$ is Boltzmann’s constant. The Szilard engine is a simple illustration of this result. Imagine a single molecule of ideal gas in a cylindrical vessel. If this molecule is in the left half of the vessel, think of that as encoding the bit “0”, and the bit “1” otherwise. Erasing this Brownian bit corresponds to ensuring that the molecule lies in the left half, for example by compressing the ideal gas to half its volume. For a heuristic analysis, we may use the ideal gas law $PV = k_B T$ for a single molecule, integrating the expression for work $\int P\, dV$ from limits V to $V/2$ to obtain $k_B T \log 2$. More rigorous and general versions of this calculation are known, which also clarify why this is a lower bound [3,4,5].
In practice, one finds that both man-made and biological instrumentation often require energy substantially more than $k_B T \log 2$ to perform erasing [6,7]. John von Neumann remarked on this large gap in his 1949 lectures at the University of Illinois [8]. Bennett [9] has remarked that DNA polymerases come close to the bound. To copy a single base, a DNA polymerase hydrolyzes a triphosphate molecule to a monophosphate, which provides energy on the order of tens of $k_B T$ at temperature $T \approx 300$ K. Note that this is still almost two orders of magnitude away from $k_B T \log 2$. Furthermore, it is not clear whether the comparison is valid at all, since copying and erasing are different operations.
How does one explain this large gap? Note that the result of $k_B T \log 2$ holds only in the isothermal limit, which takes infinite time. In practice, we want erasing to be performed rapidly, say in time $t_2$, which requires extra entropy production. For intuition, suppose one wants to compress a gas in finite time $t_2$. The gas heats up and pushes back, increasing the work required.
Several groups [10,11,12] have recognized that rapid erasing requires entropy production which pushes up the cost of erasing beyond $k_B T \log 2$, and have obtained bounds for this problem. A grossly oversimplified, yet qualitatively accurate, sketch of these various results is obtained by considering the energy cost of compressing the Szilard engine rapidly. Specializing a result from finite-time thermodynamics [13] to the case of the Szilard engine, one obtains an energy cost of $k_B T \log 2$ plus an excess term that scales like $1/(\sigma t_2)$, where σ is the coefficient of heat conductivity of the vessel.
The bounds obtained by such considerations depend on technological parameters like the heat conductivity σ, and not just on fundamental constants of physics and the requirement specifications of the problem. If one varies over the technological parameters as well, e.g., allowing $\sigma \to \infty$, the energy cost tends to $k_B T \log 2$. Does there exist a more fundamental analysis for the cost of erasing that is independent of technological parameters, and improves on $k_B T \log 2$? This is the open question we address in this paper.
Our contribution: We follow up on von Neumann’s suggestion [8] that the gap was “due to something like a desire for reliability of operation”. Swanson [14] and Alicki [15] have also looked into issues of reliability. We introduce the notion of a “reliability timescale” $t_1$, and explicitly consider the three-way trade-off between speed, reliability, and cost.
The other novelty of our approach is in bringing the tools of Kullback–Leibler (KL) control [16,17] to bear on the problem of erasing a bit. The intuitive idea is that the control can reshape the dynamics as it pleases but pays for the deviation from the uncontrolled dynamics. The cost of reshaping the dynamics is a relative entropy or KL divergence between the controlled and the uncontrolled dynamics, expressed as measures on path space.
We find the optimal control for rapid erasing of a reliable bit, and argue that it requires cost of at least $\frac{1}{2}\log(t_1/t_2)$, which goes to $\infty$ when $t_2/t_1 \to 0$. Importantly, our answer does not depend on any technological parameters, but only on the requirement specifications $t_1$ and $t_2$ of the problem.
2. The Erasing Problem
As a model of a bit, consider a two-state continuous-time Markov chain with states 0 and 1 and the passive or uncontrolled dynamics given by transition rates $r_{01}$ from state 0 to state 1 and $r_{10}$ from state 1 to state 0.
The transition rates $r_{01}$ and $r_{10}$ model spontaneous transitions between the states when no one is looking at the bit or trying to erase it. The time independence of these rates represents the physical fact that the system is not being driven.
Such finite Markov chain models often arise in physics by “coarse-graining”. For example, for the case of the Szilard engine, the transition rate $r_{10}$ models the rate at which the molecule enters the left side, conditioned on it currently being on the right side.
Apart from their importance in approximating the behavior of real physical systems, finite Markov chains are also important to thermodynamics from a logical point of view. They may be viewed as finite models of a mathematical theory of thermodynamics. The terms “theory” and “model” are to be understood in their technical sense as used in mathematical logic. We develop this remark no further here since doing so would take us far afield.
Suppose the distribution at time t is $p(t) = (p_0(t), p_1(t))$ with $p_0(t) + p_1(t) = 1$. Then, the time evolution of the bit is described by the ordinary differential equation (ODE):

$\dot{p}_0(t) = -r_{01}\, p_0(t) + r_{10}\, p_1(t), \qquad \dot{p}_1(t) = r_{01}\, p_0(t) - r_{10}\, p_1(t) \quad (1)$
Setting $\pi = (\pi_0, \pi_1)$ with $\pi_0 := \frac{r_{10}}{r_{01} + r_{10}}$, and the reliability timescale $t_1 := \frac{1}{r_{01} + r_{10}}$, this admits the solution

$p_0(t) = \pi_0 + \left(p_0(0) - \pi_0\right) e^{-t/t_1} \quad (2)$
Here, $t_1$ represents the time scale on which memory is preserved. The smaller the rates $r_{01}$ and $r_{10}$, the larger the value of $t_1$, and the slower the decay to equilibrium, so that the system remembers information for longer.
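As a sanity check, the exponential relaxation toward equilibrium on the timescale $t_1$ can be compared against direct numerical integration of the two-state master equation. A minimal sketch; the rates `r01`, `r10` below are illustrative choices, not values from the paper:

```python
import math

# Illustrative passive rates: r01 = rate 0 -> 1, r10 = rate 1 -> 0.
r01, r10 = 0.3, 0.7
pi0 = r10 / (r01 + r10)      # equilibrium probability of state 0
t1 = 1.0 / (r01 + r10)       # reliability timescale

def p0_closed_form(p0_init, t):
    """Closed-form relaxation toward equilibrium on timescale t1."""
    return pi0 + (p0_init - pi0) * math.exp(-t / t1)

def p0_euler(p0_init, t, steps=200_000):
    """Forward-Euler integration of dp0/dt = -r01*p0 + r10*(1 - p0)."""
    p0, dt = p0_init, t / steps
    for _ in range(steps):
        p0 += dt * (-r01 * p0 + r10 * (1.0 - p0))
    return p0

assert abs(p0_closed_form(0.1, 2.5) - p0_euler(0.1, 2.5)) < 1e-4
```

The smaller both rates are (with their ratio fixed), the larger `t1` becomes, while `pi0` is unchanged — the separation of equilibrium from memory timescale used throughout the paper.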
Fix a required erasing time $t_2$. Fix the initial condition $p(0) = \pi$. We want to control the dynamics with transition rates $u_{01}(t)$ and $u_{10}(t)$ to achieve $p(t_2) = (1, 0)$, where

$\dot{p}_0(t) = -u_{01}(t)\, p_0(t) + u_{10}(t)\, p_1(t) \quad (3)$
We want to find the cost of the optimal protocol $(u_{01}(t), u_{10}(t))$ to achieve this objective, according to a cost function which we introduce next. In particular, when $r_{01} = r_{10} = \frac{1}{2 t_1}$, the equilibrium distribution takes the value $\pi = (1/2, 1/2)$, and we can interpret this task as erasing a bit of reliability $t_1$ in time $t_2$.
Kullback–Leibler Cost
Define the path space Ω of the two-state Markov chain. This is the set of all paths in the time interval $[0, t_2]$ that jump between states 0 and 1 of the Markov chain. Each path can also be succinctly described by its initial state, and the times at which jumps occur. We can also effectively think of the path space Ω as the limit as $h \to 0$ of the space $\Omega_h$ corresponding to the discrete-time Markov chain that can only jump at clock ticks of h units (Figure 1).
Figure 1.
The discrete-time path space $\Omega_h$. A specific path is labeled in red.
Once the rates $u = (u_{01}, u_{10})$ and the initial distribution p for the Markov chain are fixed, there is a unique probability measure $\mu_u^p$ on path space which intuitively assigns to every path the probability of occurrence of that path according to the Markov chain evolution (Equation (3)) with initial conditions p.
For pedagogic reasons, we first describe the discrete-time measure $\mu_h$ for a single path $i = (i_0, i_1, \dots, i_{t_2/h})$ in $\Omega_h$. First, we describe the transition probabilities of the discrete-time Markov chain. For $a, b \in \{0, 1\}$ with $a \neq b$, for all times t, define $P^u_{aa}(t) := 1 - h\, u_{ab}(t)$ and $P^u_{ab}(t) := h\, u_{ab}(t)$ as the probability of staying at a and of jumping to b, respectively, in the time step at t, conditioned on being in state a. Then, the probability of the path i under control u is given by:

$\mu_h(i) = p_{i_0} \prod_{k=0}^{t_2/h - 1} P^u_{i_k i_{k+1}}(k h)$
We describe the continuous-time case now. We could obtain the measure $\mu_u^p$ from $\mu_h$ by sending $h \to 0$, but it can also be described more directly. Fix $n \geq 0$, and consider the set of paths starting at state $i_0$ with n jumps occurring around times $\tau_1 < \tau_2 < \cdots < \tau_n$ within infinitesimal intervals $d\tau_1, \dots, d\tau_n$ and leading to the trajectory $i_0 \to i_1 \to \cdots \to i_n$. Setting $\tau_0 = 0$:

$\mu_u^p(i_0; d\tau_1, \dots, d\tau_n) = p_{i_0} \prod_{k=1}^{n} e^{-\int_{\tau_{k-1}}^{\tau_k} u_{i_{k-1} i_k}(s)\, ds}\; u_{i_{k-1} i_k}(\tau_k)\, d\tau_k \;\times\; e^{-\int_{\tau_n}^{t_2} u_{i_n, 1 - i_n}(s)\, ds}$

where $p_{i_0}$ is the probability of starting at $i_0$, each factor $e^{-\int_{\tau_{k-1}}^{\tau_k} u_{i_{k-1} i_k}(s)\, ds}$ is the probability of not jumping in the time interval $(\tau_{k-1}, \tau_k)$, each factor $u_{i_{k-1} i_k}(\tau_k)\, d\tau_k$ is the probability of jumping from $i_{k-1}$ to $i_k$ in the interval $d\tau_k$, and so on, with the final factor being the probability of not jumping in $(\tau_n, t_2)$. This is the well-known Feynman–Kac formula for this Markov chain.
Specializing to $u = r$ and the initial distribution p, we obtain the probability measure $\nu^p$ induced on Ω by the passive dynamics (Equation (1)) with initial conditions p.
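The Radon–Nikodym derivative of a controlled path measure with respect to the passive one can be accumulated along a simulated trajectory: a log-rate-ratio term at each jump, plus a survival term between jumps. The following sketch, with illustrative constant rates that are not from the paper, estimates the relative entropy between path measures by Monte Carlo as the mean log-likelihood ratio under the controlled dynamics; being a relative entropy, it is nonnegative:

```python
import math, random

random.seed(0)

# Illustrative passive rates r and a constant control u (not values from the paper).
r = {(0, 1): 0.2, (1, 0): 0.2}
u = {(0, 1): 0.05, (1, 0): 2.0}   # control biased toward holding state 0
T = 5.0

def loglik_ratio_one_path():
    """Simulate one controlled path on [0, T] starting in state 1 and return
    log(dmu/dnu): a survival term between jumps plus a term at each jump."""
    s, t, llr = 1, 0.0, 0.0
    while True:
        rate_u, rate_r = u[(s, 1 - s)], r[(s, 1 - s)]
        wait = random.expovariate(rate_u)   # exponential holding time
        dt = min(wait, T - t)
        llr -= (rate_u - rate_r) * dt       # ratio of no-jump probabilities
        t += dt
        if t >= T - 1e-12:
            return llr
        llr += math.log(rate_u / rate_r)    # ratio of jump intensities
        s = 1 - s

n = 20_000
kl_estimate = sum(loglik_ratio_one_path() for _ in range(n)) / n
assert kl_estimate > 0.0
```

This is exactly the quantity that the KL cost charges the controller for: the expected log-likelihood ratio of the controlled path law against the passive one.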
We declare the Kullback–Leibler (KL) cost $D(\mu_u^p \,\|\, \nu^p)$ as the cost for implementing the control u. More generally, for a physical system with path space Ω, passive dynamics corresponding to a measure ν on Ω, and a controlled dynamics with a control u corresponding to a measure μ on Ω, we declare the relative entropy $D(\mu \,\|\, \nu)$ as the cost for implementing the control. This cost function has been widely used in control theory [16,17,18,19,20,21,22,23,24,25,26,27]. In Section 4, we will explore some other interpretations of this cost function.
3. Solution to the Erasing Problem
Out of all controls $u = (u_{01}(t), u_{10}(t))$, we want to find a control that starts from the equilibrium distribution $p(0) = \pi$ and achieves $p(t_2) = (1, 0)$, while minimizing the relative entropy $D(\mu_u^{\pi} \,\|\, \nu^{\pi})$.
This question can be described within the framework of a well-studied problem in optimal control theory that has a closed-form solution [16,17,28]. Following Todorov [16], we introduce the optimal cost-to-go function $v_i(t)$. We intend $v_i(t)$ to denote the expected cumulative cost for starting at state i at time t, and reaching a distribution close to $(1, 0)$ at time $t_2$.
To discourage the system from being in state 1 at time $t_2$, define $q_0 := 0$ and $q_1 := Q$ for a large constant Q, eventually taking $Q \to \infty$. Suppose the control performs actions $u_{01}(t)$ and $u_{10}(t)$ at time t. Fix a small time $h > 0$. Define the transition probability $P^u_{ij}(t)$ as the probability that a trajectory starting in state i at time t will be found in state j at time $t + h$. When $j \neq i$, $P^u_{ij}(t) = u_{ij}(t)\, h$, whereas $P^u_{ii}(t) = 1 - u_{ij}(t)\, h$, ignoring terms of size $O(h^2)$. We define $P^r_{ij}(t)$ similarly, using the passive rates.
Let “log” denote the natural logarithm. To derive the law satisfied by the optimal cost-to-go $v_i(t)$, we approximate it by the backward recursion relations, with boundary condition $v_i(t_2) = q_i$:

$v_0(t) = \min_{u}\left\{ \mathbb{E}\left[v_j(t+h)\right] + \sum_{j} P^u_{0j}(t)\, \log\frac{P^u_{0j}(t)}{P^r_{0j}(t)} \right\}, \qquad v_1(t) = \min_{u}\left\{ \mathbb{E}\left[v_j(t+h)\right] + \sum_{j} P^u_{1j}(t)\, \log\frac{P^u_{1j}(t)}{P^r_{1j}(t)} \right\} \quad (4)$
where the first expectation is over $j \sim P^u_{0\,\cdot}(t)$, and the second is over $j \sim P^u_{1\,\cdot}(t)$, and the approximation ignores terms of size $O(h^2)$. As $h \to 0$, the second terms approach the relative entropy cost in path space over the time interval $(t, t+h)$.
Equation (4) says that the cost-to-go from state 0 at time t equals the cost of the control plus the expected cost-to-go in the new state j reached at time $t + h$. The cost of the control is measured by the relative entropy of the control dynamics relative to the passive dynamics, over the time interval $(t, t+h)$.
Define the desirability $z_0(t) := e^{-v_0(t)}$ and $z_1(t) := e^{-v_1(t)}$. Define

$\mathcal{N}_i(t) := \sum_j P^r_{ij}(t)\, z_j(t+h),$

the expected desirability at time $t + h$ under the passive dynamics.
We can rewrite Equation (4) as:

$v_i(t) = -\log\Big(\sum_j P^r_{ij}(t)\, z_j(t+h)\Big) + \min_{u}\, \sum_j P^u_{ij}(t)\, \log\frac{P^u_{ij}(t)}{P^r_{ij}(t)\, z_j(t+h)\big/\sum_k P^r_{ik}(t)\, z_k(t+h)} \quad (5)$
Since the last term is the relative entropy of $P^u_{i\,\cdot}(t)$ relative to the probability distribution proportional to $P^r_{i\,\cdot}(t)\, z_{\cdot}(t+h)$, its minimum value is 0, and is achieved by the protocol $u^*$ given by:

$P^{u^*}_{ij}(t) = \frac{P^r_{ij}(t)\, z_j(t+h)}{\sum_k P^r_{ik}(t)\, z_k(t+h)}$

when $0 \leq t < t_2$.
It remains to solve for the desirability and the optimal cost. From Equation (5), at the optimal control, the minimum is attained and $v_i(t) = -\log \sum_j P^r_{ij}(t)\, z_j(t+h)$, so that:

$z_i(t) = \sum_j P^r_{ij}(t)\, z_j(t+h)$
which simplifies to $\dot{z}(t) = -G\, z(t)$ in the limit $h \to 0$, where G is the infinitesimal generator of the Markov chain. This equation has the formal solution $z(t) = e^{(t_2 - t) G}\, z(t_2)$, where $z_i(t_2) = e^{-q_i}$. In the symmetric case, $r_{01} = r_{10} = \frac{1}{2 t_1}$,

$e^{t_2 G} = \frac{1}{2}\begin{pmatrix} 1 + e^{-t_2/t_1} & 1 - e^{-t_2/t_1} \\ 1 - e^{-t_2/t_1} & 1 + e^{-t_2/t_1} \end{pmatrix}$
where the rows and columns are indexed by the states 0 and 1. Substituting $z_i(t_2) = e^{-q_i}$ and taking logarithms, in the limit of an infinite penalty $q_1 \to \infty$ (with $q_0 = 0$), we find the cost-to-go function at time 0:

$v_0(0) = -\log\frac{1 + e^{-t_2/t_1}}{2}, \qquad v_1(0) = -\log\frac{1 - e^{-t_2/t_1}}{2}$

Averaging over the initial distribution $\pi = (1/2, 1/2)$, the cost required for erasing a bit of reliability $t_1$ in time $t_2$ is at least:

$\frac{1}{2}\left(v_0(0) + v_1(0)\right) = \frac{1}{2}\log\frac{4}{1 - e^{-2 t_2/t_1}} \;\geq\; \frac{1}{2}\log\frac{2\, t_1}{t_2} \quad (7)$

Note that $1 - e^{-x} \leq x$ with equality only when $x = 0$, since $e^{-x} \geq 1 - x$. From Equation (7), the cost goes to $\infty$ when $t_2/t_1 \to 0$.
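The logarithmic growth of the cost in $t_1/t_2$ can be checked numerically. A minimal sketch, under the symmetric-rates assumption, with the penalty on state 1 taken to infinity analytically so that the desirability of state 1 at time 0 reduces to the $(1,0)$ entry of the matrix exponential:

```python
import math

def cost_to_go_state1(t1, t2):
    """Symmetric passive rates r01 = r10 = 1/(2*t1). With an infinite terminal
    penalty on state 1, the desirability of state 1 at time 0 is
    (1 - exp(-t2/t1))/2, and the cost-to-go is its negative logarithm."""
    z1 = (1.0 - math.exp(-t2 / t1)) / 2.0
    return -math.log(z1)

# Rapid erasing of a reliable bit: cost-to-go grows like log(t1/t2).
c = cost_to_go_state1(t1=1e6, t2=1.0)
assert abs(c - math.log(2e6)) < 1e-3
```

Doubling the reliability timescale at fixed erasing time adds roughly $\log 2$ nats to the cost-to-go from state 1, matching the bound above.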
4. Interpreting the KL Cost
One motivation for our cost function comes from the field of KL control theory. We now consider other possible interpretations of this cost function.
4.1. Path Space Szilard–Landauer Correspondence
The correspondence between information and thermodynamics was revealed in the work of Szilard, and clarified by Landauer. More rigorous and general treatments of this correspondence have been worked out recently [3,4,5]. We first recall this result, and then show how our cost function is a formal extension of this result.
Consider a physical system with finite state space S and energy function $E : S \to \mathbb{R}$. (More general state spaces S can be handled by replacing the sum by an appropriate integral. For our present purposes, it suffices to assume S is finite.) Define the Gibbs distribution π at temperature T by

$\pi_i := \frac{e^{-E_i / k_B T}}{Z}, \qquad Z := \sum_{j \in S} e^{-E_j / k_B T}$

for all $i \in S$. Define the free energy:

$F(p) := \sum_{i \in S} p_i\, E_i + k_B T \sum_{i \in S} p_i \log p_i$

where p is a probability distribution on S.
Define the relative entropy $D(p \,\|\, \pi) := \sum_{i \in S} p_i \log \frac{p_i}{\pi_i}$, with Euler’s constant e for the base of the logarithm. Following Jaynes [29], assume that equilibrium π corresponds to a maximally uninformative state of the system, so that we have zero information about the system when it is at equilibrium. Recall that a nat is the unit of information when logarithms are taken to the base of Euler’s constant; 1 bit $= \log 2 \approx 0.693$ nats. Then, the relative entropy $D(p \,\|\, \pi)$ has an axiomatic identification with the amount of information in nats that we know about the system when it is in a nonequilibrium state p [4].
The following identity is easily verified:

$F(p) - F(\pi) = k_B T\, D(p \,\|\, \pi) \quad (8)$

The conceptual significance of this simple identity is that it supplies a dictionary between thermodynamics and information theory [4]. In particular, erasing a bit corresponds to increasing relative entropy by $\log 2$ nats, which, in turn, corresponds—via the identity—to increasing available free energy by $k_B T \log 2$, recovering the classical result of Szilard as an alternative statement of the second law of thermodynamics. In the other direction, charging a battery corresponds to increasing available free energy, which in turn corresponds—via Identity Equation (8)—to erasing of information. This relates the energy efficiency of charging a battery to the energy required to erase a bit.
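Identity Equation (8) holds exactly on any finite state space and can be checked numerically. A small sketch; the energy levels are arbitrary illustrative values, and units are chosen so that $k_B T = 1$ (free energies in nats):

```python
import math

kT = 1.0                       # units where k_B * T = 1
E = [0.0, 1.3, 2.1]            # arbitrary illustrative energy levels
Z = sum(math.exp(-e / kT) for e in E)
pi = [math.exp(-e / kT) / Z for e in E]   # Gibbs distribution
F_eq = -kT * math.log(Z)                  # equilibrium free energy F(pi)

def free_energy(p):
    """F(p) = expected energy minus kT times entropy (entropy in nats)."""
    U = sum(w * e for w, e in zip(p, E))
    H = -sum(w * math.log(w) for w in p if w > 0)
    return U - kT * H

def rel_entropy(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

p = [0.7, 0.2, 0.1]            # an arbitrary nonequilibrium state
assert abs((free_energy(p) - F_eq) - kT * rel_entropy(p, pi)) < 1e-9
```

The check is exact up to floating-point error: expanding $F(p) - F(\pi)$ term by term reproduces $k_B T \sum_i p_i \log(p_i/\pi_i)$.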
Now, consider our cost function $k_B T\, D(\mu \,\|\, \nu)$. The relative entropy $D(\mu \,\|\, \nu)$ counts the number of nats erased by the control in path space, relative to the passive dynamics. Since the Szilard–Landauer principle asserts that erasing one bit requires at least $k_B T \log 2$ units of energy, our cost function may be viewed as a Path Space Szilard–Landauer Principle, formally extending Identity Equation (8) to path space.
4.2. Thermodynamic Interpretation
We wish to compare the cost $k_B T\, D(\mu \,\|\, \nu)$ with the usual thermodynamic expected work W. We will quickly outline how thermodynamic quantities can be defined for a two-state Markov chain.
4.2.1. Thermodynamics on a Two-State Markov Chain
The ideas we present here are well-known in the nonequilibrium thermodynamics community, for example see Propp’s thesis [30]. The construction can be carried out more generally, but the generalization is not necessary for our present purposes.
- Consider again the two-state continuous-time Markov chain with passive dynamics given by transition rates $r_{01}$ and $r_{10}$.
Let $E_0$ and $E_1$ denote the internal energy of states “0” and “1”, respectively. Then, the equilibrium distribution is given by $\pi_0 = e^{-E_0/k_B T}/Z$ and $\pi_1 = e^{-E_1/k_B T}/Z$. We also have $\pi_0\, r_{01} = \pi_1\, r_{10}$ from detailed balance. Together, this yields

$\frac{r_{01}}{r_{10}} = e^{-(E_1 - E_0)/k_B T} \quad (9)$

- Now consider the same two-state system with a control applied to it by means of a field of potential $\phi(t) = (\phi_0(t), \phi_1(t))$ so that the potential energy in state i becomes $E_i + \phi_i(t)$. The transition rates due to the control become $u_{01}(t)$ and $u_{10}(t)$. By a reasoning similar to how we derived Equation (9), we get

$\frac{u_{01}(t)}{u_{10}(t)} = e^{-(E_1 + \phi_1(t) - E_0 - \phi_0(t))/k_B T}$

Combining with Equation (9), this yields

$\frac{u_{01}(t)}{u_{10}(t)} = \frac{r_{01}}{r_{10}}\, e^{-(\phi_1(t) - \phi_0(t))/k_B T}$
- Given a distribution p on the states, we can define the following thermodynamic quantities:
- Expected internal energy $U(p) := p_0\, E_0 + p_1\, E_1$ (with the control field, $E_i$ is replaced by $E_i + \phi_i(t)$).
- Entropy $S(p) := -k_B\,(p_0 \log p_0 + p_1 \log p_1)$.
- Nonequilibrium free energy $F(p) := U(p) - T\, S(p)$.
- Given a transition from state i to state j in the presence of the control field, we can define the following thermodynamic quantities:
- Heat dissipated $Q_{i \to j} := (E_i + \phi_i) - (E_j + \phi_j)$, the energy released to the bath in the transition.
- Work done by the control $W := \int \dot{\phi}_{x(t)}(t)\, dt$, the energy injected by switching the field along a trajectory $x(\cdot)$. This expression for work can be traced back to Sekimoto [31], and is commonly employed in the field of Stochastic Thermodynamics to describe the work done by switching on a control field [32].
- Entropy increase of the system $\Delta S_{i \to j} := -k_B \log p_j + k_B \log p_i$.
- Suppose the system is described at time t by a distribution $p(t)$. Define the Current $J(t) := p_0(t)\, u_{01}(t) - p_1(t)\, u_{10}(t)$ so that $\dot{p}_1(t) = J(t) = -\dot{p}_0(t)$.
- We can further compute the rate of heat dissipation $\dot{Q}(t) = J(t)\left[(E_0 + \phi_0(t)) - (E_1 + \phi_1(t))\right]$.
- Define Total Entropy Production $\Sigma(t)$ to be the total entropy produced from time 0 to time t, in the system and in the bath together. In other words, $\Sigma(0) = 0$ and

$\dot{\Sigma}(t) = \frac{d S(p(t))}{dt} + \frac{\dot{Q}(t)}{T}$

After simplification,

$\dot{\Sigma}(t) = k_B\, J(t)\, \log \frac{p_0(t)\, u_{01}(t)}{p_1(t)\, u_{10}(t)} \geq 0 \quad (11)$

which is a statement of the second law of thermodynamics.
- The following identity is immediate:

$W(t) = \Delta F(t) + T\, \Sigma(t)$

and is another form of the first law.
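For the passive relaxation of a detailed-balanced two-state chain, the total entropy production of Equation (11) can be integrated numerically; it is nonnegative and, in this passive case, equals the drop in relative entropy $D(p \,\|\, \pi)$. A sketch with illustrative energies and rates, in units where $k_B = T = 1$:

```python
import math

# Illustrative energies and detailed-balanced rates (units: k_B = T = 1).
E0, E1 = 0.0, 1.0
r01, r10 = 0.5, 0.5 * math.exp(E1 - E0)   # r01/r10 = exp(-(E1 - E0))
Z = math.exp(-E0) + math.exp(-E1)
pi = [math.exp(-E0) / Z, math.exp(-E1) / Z]

def rel_entropy(p, q):
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

p0_init = 0.05
p = [p0_init, 1.0 - p0_init]              # start far from equilibrium
steps, dt = 400_000, 4.0 / 400_000
sigma = 0.0                               # total entropy production
for _ in range(steps):
    J = p[0] * r01 - p[1] * r10           # net probability current 0 -> 1
    sigma += dt * J * math.log((p[0] * r01) / (p[1] * r10))
    p = [p[0] - dt * J, p[1] + dt * J]

assert sigma >= 0.0
# Passive relaxation: entropy produced equals the decrease of D(p || pi).
drop = rel_entropy([p0_init, 1.0 - p0_init], pi) - rel_entropy(p, pi)
assert abs(sigma - drop) < 1e-3
```

The sign of each increment follows from $(a - b)\log(a/b) \geq 0$, which is exactly the second-law statement of Equation (11).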
4.2.2. Thermodynamic Cost for Rapid Erasing of a Reliable Bit
How much does it cost for rapid erasing of a reliable bit, with the cost function equal to the expected thermodynamic work W? We claim that it costs $k_B T \log 2$. In particular, neither the reliability timescale $t_1$ nor the erasing timescale $t_2$ appears in this answer.
Suppose we can erase a bit of reliability $t_1$ in time $t_2$ for work W. First, note that π is a function of the ratio $r_{01}/r_{10}$ alone, and the work depends on the rates only through this ratio, as in Equation (6). In particular, simultaneously sending the rates $r_{01}$ and $r_{10}$ as low as possible while keeping their ratio the same has no effect on the work. Thus, if we can erase a bit of reliability $t_1$ in time $t_2$ for work W, then we can erase a bit of reliability $A\, t_1$ in time $t_2$ for work W, for an arbitrarily large constant A. In particular, it is enough for us to demonstrate a protocol when $t_1 \gg t_2$.
Now, note that the work depends only on the ratio of the rates and not on their actual values. We can also erase a bit of reliability $t_1/A$ in time $t_2/A$ for work W by taking the $(t_1, t_2)$-protocol and defining new rates $u'_{ij}(t) := A\, u_{ij}(A t)$. Since the rescaled process visits the same sequence of distributions and fields, it follows from a simple calculation that the work required does not change.
By taking a limit of this time-scaling argument, we only need to erase a bit quasi-statically, with no constraint on the erasing time. Here, the infinite-time isothermal protocol, which proceeds by raising the ‘1’ well infinitesimally, waiting for the system to equilibrate, and repeating, erases for a total work of $k_B T \log 2$, since that is the free energy difference between the initial and final state, and there is no extra dissipation. This establishes our claim.
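The quasi-static work of $k_B T \log 2$ can be recovered numerically by discretizing this staged protocol: raise the energy of the ‘1’ state by small increments, equilibrating after each. A sketch in units of $k_B T$, with an illustrative energy cap standing in for “infinity”:

```python
import math

def erasing_work(n_steps, E_max=40.0):
    """Staged quasi-static erasing: raise the energy of state '1' from 0 to
    E_max in n_steps increments, letting the system fully equilibrate after
    each. Work per increment = (equilibrium occupancy of state 1) * dE."""
    dE, E1, work = E_max / n_steps, 0.0, 0.0
    for _ in range(n_steps):
        p1 = math.exp(-E1) / (1.0 + math.exp(-E1))  # equilibrium occupancy
        work += p1 * dE
        E1 += dE
    return work

# As the increments become finer, the work tends to log 2 (in k_B*T units).
assert abs(erasing_work(1_000_000) - math.log(2.0)) < 1e-3
```

The sum is a Riemann approximation of $\int_0^\infty e^{-E}/(1 + e^{-E})\, dE = \log 2$, so the excess work vanishes as the protocol slows down, consistent with the claim that W carries no dependence on $t_1$ or $t_2$.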
A more detailed version of this calculation can be found in [33]. This work assumes that there is a maximum energy limit to which a state can be raised, so that there will be some small error to erasing. It also makes another assumption about thermalization timescale, which translates in our setting to assuming that there is a maximum value to the rates $u_{01}(t)$ and $u_{10}(t)$. With these assumptions, they show that the cost of rapid erasing is slightly more than $k_B T \log 2$, and goes to $k_B T \log 2$ very quickly as the timescale of thermal relaxation becomes smaller and the maximum energy goes to infinity.
4.2.3. Link between KL-Cost and Thermodynamic Work
We will now characterize entropy production in terms of time reversal. This will allow us to make a link between KL-cost and the thermodynamic work W.
We first recall the notion of time reversal of a Markov chain. Usually time-reversal is defined for time-homogeneous Markov chains. However, for the purposes of characterizing entropy production in terms of time reversal, we will work with a definition that applies to time-inhomogeneous Markov chains also. Instead of giving this definition in full generality, we work with a Markov chain with a finite state space. This is sufficient for our purposes and allows us to avoid dealing with certain technical issues.
Note that given a matrix U with positive non-diagonal entries and rows summing to zero, there is a nonnegative vector v such that $v\, U = 0$. This can be shown by applying the Perron–Frobenius theorem to the matrix exponential $e^{U}$.
Definition 1
(Time-reversal). Consider a continuous-time, time-inhomogeneous Markov chain with state space S described by a time-dependent transition rate matrix $U(t)$ (so that at time t, $U_{ij}(t)$ denotes the rate of jumping to state j given that the system is in state i). Let $\pi(t)$ be a family of stationary probability distributions on S, i.e., $\pi(t)\, U(t) = 0$ and $\sum_{i \in S} \pi_i(t) = 1$ for all t. Then, the time-reversal Markov chain is described by the time-dependent transition matrix $\tilde{U}(t)$ where

$\tilde{U}_{ij}(t) := \frac{\pi_j(t)\, U_{ji}(t)}{\pi_i(t)} \quad \text{for } i \neq j$
A Markov chain is reversible if $\tilde{U}(t) = U(t)$ for all t.
The justification for considering time-reversal comes from Bayes’ rule. Reversible Markov chains are well-known to be characterized by the conditions of existence of a detailed balanced equilibrium, as well as by the Kolmogorov chain conditions. In particular, two-state Markov chains are always reversible.
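Definition 1 can be exercised numerically: for an arbitrary (not necessarily reversible) three-state generator, compute a stationary distribution, build the reversed rates, and confirm that the same distribution is stationary for the reversed chain. All rates below are illustrative choices:

```python
# Illustrative 3-state generator (rows sum to zero); not a reversible chain.
w = [[-3.0, 1.0, 2.0],
     [0.5, -2.0, 1.5],
     [2.5, 0.5, -3.0]]
n, h = 3, 1e-3

# Stationary distribution by iterating the discretized kernel P = I + h*W.
pi = [1.0 / n] * n
for _ in range(200_000):
    pi = [sum(pi[i] * ((1.0 if i == j else 0.0) + h * w[i][j]) for i in range(n))
          for j in range(n)]

# Time-reversed rates per Definition 1: wt[i][j] = pi[j] * w[j][i] / pi[i].
wt = [[pi[j] * w[j][i] / pi[i] if i != j else 0.0 for j in range(n)]
      for i in range(n)]
for i in range(n):
    wt[i][i] = -sum(wt[i][j] for j in range(n) if j != i)

# pi remains stationary under the time reversal: pi * Wt = 0.
for j in range(n):
    assert abs(sum(pi[i] * wt[i][j] for i in range(n))) < 1e-6
```

For a two-state chain the reversed rates coincide with the original ones, which is the numerical face of the remark that two-state Markov chains are always reversible.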
For the special case of Equation (3) in particular, given a distribution q at time $t_2$, the time reversal Markov chain evolves in time according to the ODE:

$\dot{p}_0(t) = -\tilde{u}_{01}(t)\, p_0(t) + \tilde{u}_{10}(t)\, p_1(t), \qquad p(t_2) = q \quad (12)$

where $\tilde{u}$ denotes the time-reversed rates from Definition 1.
We see that the difference is that, in Equation (3), the boundary condition was specified at time 0, whereas here the boundary condition is specified at time $t_2$.
We define the time-reversed measure $\tilde{\mu}_u^{q}$ as the measure on path space corresponding to the process described by Equation (12). Strictly speaking, we should also record the time at which the boundary condition is provided to the differential equation, but we will avoid this by using the convention that we are always going to set the boundary condition at time $t_2$ when considering the time-reversal.
The following result is key to our comparison.
Theorem 1.
Run the control dynamics Equation (3) forward from the initial condition $p(0) = \pi$ up to time $t_2$ to obtain the distribution $p(t_2)$. Consider the measure $\tilde{\mu}_u^{p(t_2)}$, the time reversal of the controlled dynamics with boundary condition $p(t_2)$. Then, the total entropy production from time 0 to time $t_2$ equals

$k_B\, D\left(\mu_u^{\pi} \,\big\|\, \tilde{\mu}_u^{p(t_2)}\right)$
Proof.
We will show that the time derivative of the right-hand side (RHS) equals the rate of entropy production. This will prove the theorem.
Fix a time $t \in (0, t_2)$. Let the probability distribution at time t be represented by $p(t) = (p_0(t), p_1(t))$. Let $J_{ij}(t) := p_i(t)\, u_{ij}(t)$ denote the flow rate from state i to state $j \neq i$ at time t. Then, $p_i(t+h) = p_i(t) + h\left(J_{ji}(t) - J_{ij}(t)\right) + o(h)$ where $j \neq i$, and $o(h)$ denotes terms such that $\lim_{h \to 0} o(h)/h = 0$.
We will consider the probabilities of the four Markov chain transitions $0 \to 0$, $0 \to 1$, $1 \to 0$ and $1 \to 1$ in the interval $(t, t+h)$ in the limit $h \to 0$, according to $\mu_u^{\pi}$ and according to $\tilde{\mu}_u^{p(t_2)}$. Up to terms of size $o(h)$, we have for $\mu_u^{\pi}$ the probabilities $p_i(t)\, u_{ij}(t)\, h$ for the jumps $i \to j$ with $j \neq i$, and $p_i(t)\,(1 - u_{ij}(t)\, h)$ for staying at i; the corresponding expressions for $\tilde{\mu}_u^{p(t_2)}$ use the time-reversed rates.
The increment in the relative entropy in the time interval $(t, t+h)$ equals, up to $o(h)$ terms:
The off-diagonal terms contribute:
Divide by h, and take the limit . We can ignore the off-diagonal terms. The diagonal terms sum to the rate of entropy production as in Equation (11), and we are done. ☐
By the First Law of Thermodynamics and Theorem 1,

$W = \Delta F + k_B T\, D\left(\mu_u^{\pi} \,\big\|\, \tilde{\mu}_u^{p(t_2)}\right) \quad (13)$

where the increase in free energy of the system is

$\Delta F = k_B T\, D\left(p(t_2) \,\big\|\, \pi\right)$

by Equation (8). Now, to compare our cost function $k_B T\, D(\mu_u^{\pi} \,\|\, \nu^{\pi})$ with W:
Theorem 2.

$k_B T\, D\left(\mu_u^{\pi} \,\big\|\, \nu^{\pi}\right) = \Delta F + k_B T\, D\left(\mu_u^{\pi} \,\big\|\, \tilde{\nu}^{p(t_2)}\right) \quad (14)$

Proof.

Using Equation (8), we can rewrite the claim as

$D\left(p(t_2) \,\big\|\, \pi\right) = D\left(\mu_u^{\pi} \,\big\|\, \nu^{\pi}\right) - D\left(\mu_u^{\pi} \,\big\|\, \tilde{\nu}^{p(t_2)}\right)$

Both left-hand side (LHS) and RHS equal $\mathbb{E}_{\mu}\left[\log \frac{p_{\omega(t_2)}(t_2)}{\pi_{\omega(t_2)}}\right]$, where ω denotes a path sampled from $\mu_u^{\pi}$. The assertion for the LHS is straightforward. The assertion for the RHS is true because the time-reversal dynamics was defined to keep the stationary distribution π remaining stationary under time reversal. ☐
Comparing Equations (13) and (14), a KL control treatment replaces the total entropy production $D(\mu_u^{\pi} \,\|\, \tilde{\mu}_u^{p(t_2)})$ in Equation (13) by the new term $D(\mu_u^{\pi} \,\|\, \tilde{\nu}^{p(t_2)})$, which compares the control dynamics with the time reversal of the passive dynamics. This suggests the following interpretation. If we applied the control during some time interval, and remembered what control we applied, then the entropy production is correctly given by $k_B\, D(\mu_u^{\pi} \,\|\, \tilde{\mu}_u^{p(t_2)})$. However, the information that a control was applied also needs to be stored somewhere. If we forget that a control was applied, and if application of the control is very rare, then our default model for the dynamics should be much closer to the passive dynamics. In this case, entropy production may be closer to the value $k_B\, D(\mu_u^{\pi} \,\|\, \tilde{\nu}^{p(t_2)})$.
4.3. Large Deviations Interpretation
Our cost function also admits a large deviation interpretation which was, remarkably, already noted by Schrödinger in 1931 [34,35,36,37]. Motivated by quantum mechanics, Schrödinger asked: conditioned on a more or less astonishing observation of a system at two extremes of a time interval, what is the least astonishing way in which the dynamics in the interval could have proceeded? Specializing to our problem of erasing, suppose an ensemble of two-state Markov chains with passive dynamics given by Equation (1) was observed at time 0 and at time $t_2$. Suppose the empirical state distribution over the ensemble was found to be the equilibrium distribution π at time 0, and (1, 0) at time $t_2$, respectively. This would be astonishing because no control has been applied, yet the ensemble has arrived at a state of higher free energy. Conditioned on this rare event having taken place, what is the least unlikely measure on path space via which the process took place?
By a statistical treatment of multiple single particle trajectories, Schrödinger found that the likelihood of an empirical measure μ on path space falls exponentially fast with the relative entropy $D(\mu \,\|\, \nu)$, where ν is the measure induced by the passive dynamics. In particular, the least unlikely measure is that measure which—among all μ whose marginals at time 0 and time $t_2$ respect the observations—minimizes $D(\mu \,\|\, \nu)$. Thus, for the problem of erasing, the measure μ varies over all measures that have marginal π at time 0 and marginal (1, 0) at time $t_2$, and the least unlikely measure is that measure among all such μ that minimizes $D(\mu \,\|\, \nu)$. Thus, our optimal control produces in expectation the least surprising trajectory among all controls that perform rapid erasing.
4.4. Gibbs Measure
Equation (6) is not accidental for this example, but is in fact a general feature when the cost function is relative entropy [28]. More abstractly, the Radon–Nikodym derivative (i.e., “probability density”) of the measure induced on path space by the optimal control is a Gibbs measure with respect to the measure ν induced by the passive dynamics, with the cost-to-go function playing the role of an energy function. In other words, mathematically our problem is precisely the free energy minimization problem so familiar from statistical mechanics. There is also a possible physical interpretation: we are choosing paths in Ω as microstates, instead of points in phase space. The idea of paths as microstates has occurred before [38].
5. Conclusions
Since charging a battery can also be thought of as erasing a bit [4], our result may also offer insights into the limits of efficiencies of rapidly charging batteries that must simultaneously hold their energy for a long time.
So long as the noise is Markovian, we conjecture that the KL cost for erasing the two-state Markov chain is a lower bound for more general models of a bit—for example, for bits with Langevin dynamics [39], a stochastic differential equation expressing Newton’s laws of motion with Brownian noise perturbations.
Acknowledgments
I thank Sanjoy Mitter, Vivek Borkar, Nick S. Jones, Mukul Agarwal, and Krishnamurthy Dvijotham for helpful discussions. I thank Abhishek Behera for drawing Figure 1.
Conflicts of Interest
The author declares no conflict of interest.
References
- Szilard, L. Über die Entropieverminderung in einem thermodynamischen System bei Eingriffen intelligenter Wesen. Z. Phys. 1929, 53, 840–856. (In German) [Google Scholar] [CrossRef]
- Landauer, R. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 1961, 5, 183–191. [Google Scholar] [CrossRef]
- Esposito, M.; van den Broeck, C. Second law and Landauer principle far from equilibrium. Europhys. Lett. 2011, 95, 40004. [Google Scholar] [CrossRef]
- Gopalkrishnan, M. The Hot Bit I: The Szilard–Landauer correspondence. 2013; arXiv:1311.3533. [Google Scholar]
- Reeb, D.; Wolf, M.M. An improved Landauer principle with finite-size corrections. New J. Phys. 2014, 16, 103011. [Google Scholar] [CrossRef]
- Laughlin, S.B.; de Ruyter van Steveninck, R.R.; Anderson, J.C. The metabolic cost of neural information. Nat. Neurosci. 1998, 1, 36–41. [Google Scholar] [CrossRef] [PubMed]
- Mudge, T. Power: A first-class architectural design constraint. Computer 2001, 34, 52–58. [Google Scholar] [CrossRef]
- Von Neumann, J. Theory of Self-Reproducing Automata; University of Illinois Press: Urbana, IL, USA, 1966; p. 66. [Google Scholar]
- Bennett, C.H. The thermodynamics of computation—A review. Int. J. Theor. Phys. 1982, 21, 905–940. [Google Scholar] [CrossRef]
- Aurell, E.; Gawȩdzki, K.; Mejía-Monasterio, C.; Mohayaee, R.; Muratore-Ginanneschi, P. Refined second law of thermodynamics for fast random processes. J. Stat. Phys. 2012, 147, 487–505. [Google Scholar] [CrossRef]
- Diana, G.; Bagci, G.B.; Esposito, M. Finite-time erasing of information stored in fermionic bits. Phys. Rev. E 2013, 87, 012111. [Google Scholar] [CrossRef] [PubMed]
- Zulkowski, P.R.; DeWeese, M.R. Optimal finite-time erasure of a classical bit. Phys. Rev. E 2014, 89, 052140. [Google Scholar] [CrossRef] [PubMed]
- Salamon, P.; Nitzan, A. Finite time optimizations of a Newton’s law Carnot cycle. J. Chem. Phys. 1981, 74, 441482. [Google Scholar] [CrossRef]
- Swanson, J.A. Physical versus logical coupling in memory systems. IBM J. Res. Dev. 1960, 4, 305–310. [Google Scholar] [CrossRef]
- Alicki, R. Information is not physical. 2014; arXiv:1402.2414. [Google Scholar]
- Todorov, E. Efficient computation of optimal actions. Proc. Natl. Acad. Sci. USA 2009, 106, 11478–11483. [Google Scholar] [CrossRef] [PubMed]
- Fleming, W.H.; Mitter, S.K. Optimal control and nonlinear filtering for nondegenerate diffusion processes. Stochastics 1982, 8, 63–77. [Google Scholar] [CrossRef]
- Kappen, H.J. Path integrals and symmetry breaking for optimal control theory. J. Stat. Mech. 2005, 2005, P11011. [Google Scholar] [CrossRef]
- Kappen, H.J. Linear theory for control of nonlinear stochastic systems. Phys. Rev. Lett. 2005, 95, 200201. [Google Scholar] [CrossRef] [PubMed]
- Theodorou, E.A. Iterative Path Integral Stochastic Optimal Control: Theory and Applications to Motor Control. Ph.D. Thesis, University of Southern California, Los Angeles, CA, USA, 2011. [Google Scholar]
- Theodorou, E.; Todorov, E. Relative entropy and free energy dualities: Connections to path integral and KL control. In Proceedings of the 51st IEEE Conference on Decision and Control (CDC), Maui, HI, USA, 10–13 December 2012; pp. 1466–1473.
- Stulp, F.; Theodorou, E.A.; Schaal, S. Reinforcement learning with sequences of motion primitives for robust manipulation. IEEE Trans. Robot. 2012, 28, 1360–1370. [Google Scholar] [CrossRef]
- Dvijotham, K.; Todorov, E. A unified theory of linearly solvable optimal control. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011), Barcelona, Spain, 14–17 July 2011.
- Kappen, H.J.; Gómez, V.; Opper, M. Optimal control as a graphical model inference problem. Mach. Learn. 2012, 87, 159–182. [Google Scholar] [CrossRef]
- Van den Broek, B.; Wiegerinck, W.; Kappen, B. Graphical model inference in optimal control of stochastic multi-agent systems. J. Artif. Intell. Res. 2008, 32, 95–122. [Google Scholar]
- Wiegerinck, W.; van den Broek, B.; Kappen, H. Stochastic optimal control in continuous space-time multi-agent systems. 2012; arXiv:1206.6866. [Google Scholar]
- Horowitz, M.B. Efficient Methods for Stochastic Optimal Control. Ph.D. Thesis, California Institute of Technology, Pasadena, CA, USA, 2014. [Google Scholar]
- Dupuis, P.; Ellis, R.S. A Weak Convergence Approach to the Theory of Large Deviations; Wiley: New York, NY, USA, 2011. [Google Scholar]
- Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. 1957, 106, 620–630. [Google Scholar] [CrossRef]
- Propp, M.B. The Thermodynamic Properties of Markov Processes. Ph.D. Thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, 1985. [Google Scholar]
- Sekimoto, K. Kinetic characterization of heat bath and the energetics of thermal ratchet models. J. Phys. Soc. Jpn. 1997, 66, 1234–1237. [Google Scholar] [CrossRef]
- Seifert, U. Stochastic thermodynamics, fluctuation theorems and molecular machines. Rep. Prog. Phys. 2012, 75, 126001. [Google Scholar] [CrossRef] [PubMed]
- Browne, C.; Garner, A.J.P.; Dahlsten, O.C.O.; Vedral, V. Guaranteed energy-efficient bit reset in finite time. Phys. Rev. Lett. 2014, 113, 100603. [Google Scholar] [CrossRef] [PubMed]
- Schrödinger, E. Über die Umkehrung der Naturgesetze. Sitzungsber. Preuss. Akad. Wiss. Berlin Phys. Math. Kl. 1931, 2, 144–153. (In German) [Google Scholar]
- Beurling, A. An automorphism of product measures. Ann. Math. 1960, 72, 189–200. [Google Scholar] [CrossRef]
- Föllmer, H. Random fields and diffusion processes. In École d’Été de Probabilités de Saint-Flour XV–XVII, 1985–87; Springer: Berlin/Heidelberg, Germany, 1988; pp. 101–203. [Google Scholar]
- Aebi, R. Schrödinger Diffusion Processes; Springer: Berlin/Heidelberg, Germany, 1996. [Google Scholar]
- Wissner-Gross, A.D.; Freer, C.E. Causal entropic forces. Phys. Rev. Lett. 2013, 110, 168702. [Google Scholar] [CrossRef] [PubMed]
- Zwanzig, R. Nonequilibrium Statistical Mechanics; Oxford University Press: New York, NY, USA, 2001. [Google Scholar]
© 2016 by the author; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).