Jarzynski’s Equality and Crooks’ Fluctuation Theorem for General Markov Chains with Application to Decision-Making Systems

We define common thermodynamic concepts purely within the framework of general Markov chains and derive Jarzynski's equality and Crooks' fluctuation theorem in this setup. In particular, we consider the discrete-time case, which leads to an asymmetry in the definition of work that appears in the usual formulation of Crooks' fluctuation theorem. We show how this asymmetry can be avoided with an additional condition regarding the energy protocol. The general formulation in terms of Markov chains allows transferring the results to application areas outside of physics. Here, we discuss how this framework can be applied in the context of decision-making. This involves the definition of the relevant quantities, the assumptions that need to be made for the different fluctuation theorems to hold, and the consideration of discrete trajectories instead of the continuous trajectories that are relevant in physics.


Introduction
Over the last 20 years, several advances in thermodynamics have led to the development of results relating equilibrium quantities to nonequilibrium trajectories. Those advances have crystallized in a new area of research, nonequilibrium thermodynamics, where these relations play a major role [1][2][3]. Among them, two of the most remarkable are Jarzynski's equality [4][5][6] and Crooks' fluctuation theorem [7,8], for which experimental evidence has been reported in several contexts: unfolding and refolding processes involving RNA [9,10], electronic transitions between electrodes manipulating a charge parameter [11], the rotation of a macroscopic object inside a fluid surrounded by magnets where the current of a wire attached to the macroscopic object is manipulated [12], and a trapped-ion system [13,14].
These two results have been derived under several assumptions in the context of nonequilibrium thermodynamics, including both deterministic [5,6] and stochastic dynamics [4,7,8,15]. Moreover, it has been argued that both results can be obtained as a consequence of Bayesian retrodiction in a physical context [16]. Here, we derive both of them using only concepts from the theory of Markov chains. This allows us to distinguish the mathematical from the physical assumptions underlying them and, thus, to make them available for application in other areas where the framework of thermodynamics may be useful. The distinction between mathematical and physical assumptions will be of particular importance for the definition of work, as we will see, since the usual definition based on physical considerations leads to an asymmetry of the definition in processes that run either forward or backward in time (see Figure 1 for a simple example). This is relevant, for instance, when analysing trajectories in terms of their work value, if we do not know whether they were recorded in the forward direction or whether they have been generated by playing them backwards. Ideally, we would like to be able to ascribe work values directly to trajectories without any additional information.
Figure 1. Relation between a forward process, its corresponding backward process, and the definition of work. We consider a trajectory x = (x_0, x_1, x_2) and three energy functions E_0, E_1, E_2. The upper (bottom) line of arrows represents the forward (backward) process. A work step W_i = E_i(x_{i−1}) − E_{i−1}(x_{i−1}) is typically defined as the change in energy due to the external change of the energy function, whereas a heat step Q_i = E_i(x_i) − E_i(x_{i−1}) is defined as the change in energy due to internal state changes. (a) Typical relation between the forward and backward processes in physics [8]. Work in the forward process would be W_F = E_1(x_0) − E_0(x_0) + E_2(x_1) − E_1(x_1), whereas the backward work under the same definition would be W_B = E_1(x_1) − E_2(x_1). Instead, backward work is usually defined as W_B = E_1(x_1) − E_2(x_1) + E_0(x_0) − E_1(x_0) to fulfil the physical time-reversal symmetry W_F = −W_B. (b) Another typical protocol in physics [15,17]. In this case, the asymmetry is the other way round: E_2 does not influence the forward process. (c) Symmetric protocol where both forward and backward work follow the same definition, with W_F = E_1(x_1) − E_0(x_0) and W_B = E_0(x_1) − E_1(x_1) = −W_F. This is the protocol we propose in Section 4.
One of the application areas where the framework of thermodynamics has recently been investigated outside the realm of physics is the analysis of simple learning systems [1,[18][19][20][21][22][23][24][25][26] and, in particular, the problem of decision-making under uncertainty and resource constraints [22,[27][28][29][30][31]. The basic analogy follows from the idea that decision-making involves two opposing forces: (i) the tendency of the decision-maker toward better options (equivalently, to maximize a function called utility) and (ii) the restrictions on this tendency given by the limited information-processing capabilities of the decision-maker, which prevent them from always picking the best option and are usually modelled by a bound on the entropy of the probability distribution that describes the decision-maker's behaviour. Thermodynamic systems are also explained in terms of two opposing forces, the first being the energy, which the system tries to minimize, and the second being the entropy, which prevents the minimization of the energy to its full extent. Thus, in both cases, we formally deal with optimization problems under information constraints, and thus, we can conceptualize both decision-making and thermodynamics in terms of information theory. In particular, we can consider the environment in which a decision is being made or a thermodynamic system is immersed as a source of information (in the form of either utility or energy), which, due to the noise (modelled by entropy), reaches the decision-making or thermodynamic system with some error. This results in an imperfect response by the system.
The analogy between thermodynamics and decision-making is not restricted to the equilibrium case [22,[27][28][29][30][31], but can be taken further from equilibrium to nonequilibrium systems. In particular, the aforementioned fluctuation theorems of Jarzynski and Crooks have been previously suggested to apply to decision-makers that adapt to changing environments [32]. In that previous work, hysteresis and adaptation were investigated in decision-makers, however, based on the physical convention of defining work differently for forward and backward processes. Here, we improve on that work by replacing this convention with a different energy protocol that naturally entails a symmetric definition of work and by weakening the assumptions that are actually needed in order for the fluctuation theorems to hold in the context of general Markov chains. Given that the literature on this topic belongs for the most part to thermodynamics, we adopt the thermodynamic notation here. In particular, we consider energy functions instead of utilities and take Markov chains as the starting point.
Our manuscript is organized as follows. In Section 2, we introduce the notions of work and other thermodynamic concepts that are inherent to Markov chains; that is, in contrast to the formalism in physics [7,8,15], we start with the assumption of a Markov chain and deduce all other concepts from that, without the need to presuppose the existence of an external energy function. We discuss under what conditions these concepts are uniquely specified by the Markov chain. In Section 3, we use this framework to weaken the assumptions in the derivation of Jarzynski's equality (Theorem 1) in the context of decision-making that was presented in [32]. In Section 4, we prove Crooks' fluctuation theorem (Theorem 2 and Corollary 2) within the same setup. In particular, we use an additional assumption that is not needed in Crooks' work [7,8,15], but is required here given the inherent nature of our definition of work. In fact, we provide an example in which the new requirement is violated and, as a consequence, Crooks' theorem is false for every work value (Proposition 3). In Section 5, we discuss how the concepts we have developed can be applied to decision-making systems.
Notice that, for simplicity, we develop the results for discrete-time Markov chains with finite state spaces. However, the ideas can be translated, for example, to continuous state spaces by assuming for the densities of Markov kernels the properties we use here for transition matrices. We include a few more details regarding this scenario in the discussion (Section 5), where we also briefly address the case of continuous-time Markov chains.

Definitions of Energy, Heat, and Work
In this section, we assume there is some stochastic process that can be modelled as a Markov chain and discuss the definition of the energy, partition function, free energy, work, heat, and dissipated work in such a context. We follow the terminology in [33].
We call a finite sequence of random variables X = (X_n)_{n=0}^N over a finite state space S a Markov chain if we have, for any 0 < n ≤ N and all (x_0, x_1, …, x_n) ∈ S^{n+1},

P(X_n = x_n | X_{n−1} = x_{n−1}, …, X_0 = x_0) = P(X_n = x_n | X_{n−1} = x_{n−1}).

Notice that we can characterize X by a probability distribution p_0, where p_0(x) = P(X_0 = x), and N transition matrices (M_n)_{n=1}^N given by (M_n)_{xy} := P(X_n = x | X_{n−1} = y) for all x, y ∈ S and 1 ≤ n ≤ N. Notice that any transition matrix M_n is a stochastic matrix; that is, we have ∑_{x∈S} (M_n)_{xy} = 1 for all y ∈ S. If p is a distribution, M_n a transition matrix of X for some fixed n, and we have M_n p = p, where (M_n p)(x) := ∑_{y∈S} (M_n)_{xy} p(y), then we say p is a stationary distribution of M_n. Note that this terminology applies to a single M_n; that is, a stationary distribution p of M_n is stationary with respect to the (homogeneous) Markovian dynamics of that fixed transition matrix, and generally not with respect to the (inhomogeneous) Markovian dynamics of X. If p fulfils

(M_n)_{xy} p(y) = (M_n)_{yx} p(x) for all x, y ∈ S,    (1)

then we say p satisfies detailed balance with respect to M_n. Note that such a p is a stationary distribution of M_n. In case M_n has a unique stationary distribution p that satisfies detailed balance with respect to M_n, we may simply say M_n satisfies detailed balance. We say a transition matrix M_n is irreducible if, for any pair x, y ∈ S, there exists an integer m ≥ 1 such that (M_n^m)_{xy} > 0. Irreducible transition matrices have a useful property, which we present in Lemma 1 (see ([33], Corollary 1.17 and Proposition 1.19) for a proof).
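Since stationarity and detailed balance carry all the later results, the following minimal sketch (our own illustration, not part of the original text, using a hypothetical 2-state matrix) checks both properties numerically, with the column-stochastic convention (M)_{xy} = P(X_n = x | X_{n−1} = y) from above.

```python
import numpy as np

# Hypothetical column-stochastic transition matrix: columns sum to 1,
# since (M)_{xy} = P(X_n = x | X_{n-1} = y).
M = np.array([[0.9, 0.2],
              [0.1, 0.8]])

# Stationary distribution: solve M p = p, sum(p) = 1, via the eigenvector
# of eigenvalue 1.
eigvals, eigvecs = np.linalg.eig(M)
p = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
p = p / p.sum()

assert np.allclose(M @ p, p)  # stationarity: M p = p

# Detailed balance: (M)_{xy} p(y) = (M)_{yx} p(x) for all x, y, i.e., the
# flux matrix M * p (columnwise) is symmetric.
flux = M * p[None, :]
db = bool(np.allclose(flux, flux.T))
print(p, db)
```

For two states, detailed balance holds automatically once p is stationary; with more states, the symmetry check above can genuinely fail.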

Lemma 1.
If a transition matrix is irreducible, then it has a unique stationary distribution. Furthermore, the stationary distribution has non-zero entries.
If X has initial distribution p_0, irreducible transition matrices (M_n)_{n=1}^N, and p_N is the unique stationary distribution of M_N, then we say the Markov chain Y := (Y_n)_{n=0}^N with initial distribution p_N and transition matrices (M_{N−(n−1)})_{n=1}^N is the time reversal of X. Notice that there are different notions of time reversal; a discussion can be found in ([34], Section III).
Given a distribution p on S such that p(x) > 0 for all x ∈ S and some β > 0, we can associate with p an energy function E, that is, a function E : S → R such that

p(x) = e^{−βE(x)} / Z for all x ∈ S,

where Z := ∑_{x∈S} e^{−βE(x)} is called a partition function and F := −(1/β) log(Z) the corresponding free energy. Notice that, given a distribution p with two energy functions E and E′, there is a constant c ∈ R such that we have

E′ = E + c,    (2)

and, accordingly,

F′ = F + c,    (3)

where F and F′ are the free energies for p using E and E′, respectively. Thus, each distribution where all entries are strictly positive has, up to a constant, a unique energy function associated with it. For the remainder of this section, let X := (X_n)_{n=0}^N be a Markov chain such that p_0 has non-zero entries and each transition matrix M_n has a unique stationary distribution p_n with non-zero entries. In this context, we call a family E = (E_n)_{n=0}^N of functions E_n : S → R a family of energies of X if E_n is an energy function of p_n for all 0 ≤ n ≤ N. We define the work of a realization x = (x_0, x_1, …, x_N) ∈ S^{N+1} of X, with respect to a family of energies E of X, as

W_{X,E}(x) := ∑_{n=1}^N [E_n(x_{n−1}) − E_{n−1}(x_{n−1})].    (4)

Given another family of energies E′ = (E′_n)_{n=0}^N of X, we have

W_{X,E′}(x) = W_{X,E}(x) + c_N − c_0,    (5)

where c_n := E′_n − E_n are constants by (2). Hence, without fixing a family of energies of X, the work defined in (4) is unique up to a constant. Whenever X and E are clear from the context, we may simply use W instead of W_{X,E} for brevity.
Similarly, the heat of a realization x of X with respect to a family of energies E is given by

Q_{X,E}(x) := ∑_{n=1}^N [E_n(x_n) − E_n(x_{n−1})].    (6)

Given another family of energies E′ := (E′_n)_{n=0}^N of X, we have

Q_{X,E′}(x) = Q_{X,E}(x)    (7)

by (2). We may, thus, use Q_X instead of Q_{X,E}, or even Q in case X is clear. Moreover, if F_n is the free energy associated with p_n for any 0 ≤ n ≤ N, we call ∆F_{X,E} := F_N − F_0 the free energy difference associated with E, satisfying

∆F_{X,E′} = ∆F_{X,E} + c_N − c_0    (8)

by (3). While both W_{X,E} and ∆F_{X,E} depend on the difference between the constants c_N and c_0, the so-called dissipated work,

W^d_X := W_{X,E} − ∆F_{X,E},    (9)

does not; that is, for any realization x of X, we have

W_{X,E′}(x) − ∆F_{X,E′} = W_{X,E}(x) − ∆F_{X,E}

as a consequence of (5) and (8).
Notice that, in the context of the Markov chain framework we have adopted here, the first law of thermodynamics is a direct consequence of the definitions of work and heat (see (12) below):

E_N(x_N) − E_0(x_0) = W(x) + Q(x)

for any realization x of X. The second law of thermodynamics can be obtained as well, as a direct consequence of Jarzynski's equality, as we will see in Section 5.
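To make the bookkeeping concrete, here is a small sketch (our own illustration; the energy functions are hypothetical) that computes the work and heat of a realization, checks the first-law decomposition W(x) + Q(x) = E_N(x_N) − E_0(x_0), and verifies that the dissipated work W − ∆F is invariant under constant shifts of the energy functions.

```python
import numpy as np

beta = 1.0

# Hypothetical energies on S = {0, 1} for a chain with N = 2 steps.
E = [np.array([0.0, 1.0]),
     np.array([0.5, 0.5]),
     np.array([1.0, 0.0])]

def work(x, E):
    # W(x) = sum_n E_n(x_{n-1}) - E_{n-1}(x_{n-1}): change of the energy
    # function while the state stays put.
    return sum(E[n][x[n - 1]] - E[n - 1][x[n - 1]] for n in range(1, len(E)))

def heat(x, E):
    # Q(x) = sum_n E_n(x_n) - E_n(x_{n-1}): energy change due to state jumps.
    return sum(E[n][x[n]] - E[n][x[n - 1]] for n in range(1, len(E)))

def free_energy(En):
    return -np.log(np.exp(-beta * En).sum()) / beta

x = (0, 1, 1)
W, Q = work(x, E), heat(x, E)

# First law: total energy change splits into work plus heat.
assert np.isclose(W + Q, E[-1][x[-1]] - E[0][x[0]])

# Dissipated work W - dF does not depend on the constants in the energies.
dF = free_energy(E[-1]) - free_energy(E[0])
E_shift = [En + c for En, c in zip(E, [0.3, -0.2, 0.7])]
dF_shift = free_energy(E_shift[-1]) - free_energy(E_shift[0])
assert np.isclose(W - dF, work(x, E_shift) - dF_shift)
print(W, Q, W - dF)
```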

Main Result: Fluctuation Theorems for Markov Chains
Our main results are versions of Jarzynski's equality [4][5][6] and Crooks' fluctuation theorem [7,8,15] for Markov chains, the derivations of which can be found in Sections 3 and 4 below.
Consider a Markov chain X = (X_n)_{n=0}^N on a finite state space whose initial distribution p_0 has non-zero entries and whose transition matrices (M_n)_{n=1}^N are irreducible. Then, for any family of energies E = (E_n)_{n=0}^N of X,

⟨e^{−βW}⟩ = e^{−β∆F},

where ⟨·⟩ denotes the expectation operator and W = W_{X,E}, ∆F = ∆F_{X,E}. In physics, this result is known as Jarzynski's equality, which has been shown in the past to hold under various conditions. In Section 3, we give a simple proof in the context of Markov chains as a direct consequence of the definitions of work and free energy (Theorem 1). Moreover, if all transition matrices of X satisfy detailed balance and p_0 is the stationary distribution of M_1, then for any possible work value w, we have

P(W = w) / P(W_{Y,Ẽ} = −w) = e^{β(w−∆F)},

where Y is the time reversal of X with energies Ẽ = (Ẽ_n)_{n=0}^N. The analogous result in physics is known as Crooks' fluctuation theorem. In Section 4, we prove a slightly more general version in the context of Markov chains without detailed balance (Theorem 2), which is then applied in Corollary 2 to obtain Crooks' theorem.

Jarzynski's Equality for Markov Chains
Jarzynski's equality was originally derived for deterministic dynamics [5,6] (see also [35]) and later extended to stochastic dynamics [4] using a master equation approach. Shortly after that, it was shown in the non-deterministic Markov chain context, relying on assumptions about the time reversal of the dynamics [8]. In Theorem 1 below, we see that, in the context of Markov chains, Jarzynski's equality is a straightforward consequence of the definitions of work and free energy. Importantly, it does not require any assumptions regarding time reversal, in contrast to the requirements of a previous decision-theoretic approach to fluctuation theorems in [32].
For the proof of Jarzynski's equality for Markov chains, the basic observation is that we start with the expected value of a quantity closely related to the equilibrium distributions of our initial Markov chain X, namely e^{−βW(X)}. With this in mind, we define a new Markov chain Y using the transition matrices of X and the equilibrium distributions of the individual steps. In particular, we define it in a way such that we cancel the dependency of e^{−βW(X)} on X and end up with a constant, whose expected value over Y is the constant itself. We include the details in the following theorem.
Theorem 1 (Jarzynski's equality for Markov chains). If X = (X_n)_{n=0}^N is a Markov chain on a finite state space S whose initial distribution p_0 has non-zero entries and whose transition matrices (M_n)_{n=1}^N are irreducible, then we have, for any family of energies E = (E_n)_{n=0}^N of X,

⟨e^{−βW}⟩ = e^{−β∆F},

where W = W_{X,E} and ∆F = ∆F_{X,E}.
Proof. Notice that we have p_n(x) > 0 for all x ∈ S and 0 ≤ n ≤ N (for n = 0 by assumption and for n ≥ 1 by Lemma 1). We first define a new Markov chain Y := (Y_n)_{n=0}^N with initial distribution p_N and with transition matrices (M̃_n)_{n=1}^N, where

(M̃_n)_{xy} := (M_{N−(n−1)})_{yx} p_{N−(n−1)}(x) / p_{N−(n−1)}(y)    (11)

for all x, y ∈ S and 1 ≤ n ≤ N. Notice that M̃_n is a stochastic matrix for 1 ≤ n ≤ N, as we have, for all y ∈ S,

∑_{x∈S} (M̃_n)_{xy} = ∑_{x∈S} (M_{N−(n−1)})_{yx} p_{N−(n−1)}(x) / p_{N−(n−1)}(y) = 1,

where we applied (11) in the first equality and the fact that p_n is a stationary distribution of M_n for 1 ≤ n ≤ N by the assumption in the last equality. Thus, Y is well defined. Note that, by definition, we have

W(x) + Q(x) = E_N(x_N) − E_0(x_0)    (12)

for any realization x of X. We can now use Y to show the result:

⟨e^{−βW}⟩ = ∑_{x∈S^{N+1}} P(X = x) e^{−βW(x)}
 =(i) ∑_{x∈S^{N+1}} p_0(x_0) ∏_{n=1}^N (M_n)_{x_n x_{n−1}} e^{−βW(x)}
 =(ii) ∑_{x∈S^{N+1}} p_0(x_0) ∏_{n=1}^N (M̃_{N−(n−1)})_{x_{n−1} x_n} [p_n(x_n)/p_n(x_{n−1})] e^{−βW(x)}
 =(iii) e^{−β∆F} ∑_{x∈S^{N+1}} p_N(x_N) ∏_{n=1}^N (M̃_{N−(n−1)})_{x_{n−1} x_n}
 =(iv) e^{−β∆F},

where we use the Markov property in (i) and apply (11) in (ii). In (iii), we cancel the repeated terms coming from the definition of (M̃_n)_{n=1}^N and from e^{−βW(x)}, since we have, using p_n(x) = e^{−βE_n(x)}/Z_n and the definition of W,

e^{−βW(x)} = (Z_N/Z_0) ∏_{n=1}^N p_n(x_{n−1})/p_{n−1}(x_{n−1}) = e^{−β∆F} ∏_{n=1}^N p_n(x_{n−1})/p_{n−1}(x_{n−1}),

which leads to

p_0(x_0) e^{−βW(x)} ∏_{n=1}^N [p_n(x_n)/p_n(x_{n−1})] = e^{−β∆F} p_N(x_N).

Lastly, we apply the normalization of Y in (iv), since p_N(x_N) ∏_{n=1}^N (M̃_{N−(n−1)})_{x_{n−1} x_n} = P(Y = x^R) sums to one over all x ∈ S^{N+1}.
The main purpose of proving Theorem 1 is to show that Jarzynski's equality can be obtained under milder conditions than the ones that were considered before in decision-making. It should be noted that, in contrast to the usual approaches such as [8], where Y is assumed to be the time reversal of X, here, it is just a convenient mathematical object for the proof. A similar approach to Jarzynski's equality in the context of continuous-time Markov chains can be found in [36]. Our approach to the discrete-time case in Theorem 1 is much simpler, since we do not need measure-theoretic concepts nor smoothness assumptions.
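Theorem 1 can also be verified numerically by exact enumeration. The sketch below (our own illustration, with two hypothetical irreducible 2×2 matrices) picks the energies E_n = −(1/β) log p_n, so that every Z_n = 1 and hence ∆F = 0, and checks ⟨e^{−βW}⟩ = e^{−β∆F} by summing over all trajectories.

```python
import itertools
import numpy as np

beta = 1.0

# Hypothetical irreducible, column-stochastic matrices (M_n)_{xy} = P(x | y).
Ms = [np.array([[0.7, 0.4], [0.3, 0.6]]),
      np.array([[0.5, 0.2], [0.5, 0.8]])]

def stationary(M):
    w, v = np.linalg.eig(M)
    p = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return p / p.sum()

p_init = np.array([0.5, 0.5])  # any initial distribution with non-zero entries
ps = [p_init] + [stationary(M) for M in Ms]

# Energies E_n := -(1/beta) log p_n, so Z_n = 1 and F_n = 0 for all n.
Es = [-np.log(p) / beta for p in ps]
dF = 0.0

# Exact expectation of e^{-beta W} over all trajectories.
avg = 0.0
for x in itertools.product(range(2), repeat=len(Ms) + 1):
    prob = p_init[x[0]] * np.prod([Ms[n][x[n + 1], x[n]] for n in range(len(Ms))])
    W = sum(Es[n][x[n - 1]] - Es[n - 1][x[n - 1]] for n in range(1, len(Es)))
    avg += prob * np.exp(-beta * W)

# Jarzynski: <e^{-beta W}> = e^{-beta dF}
assert np.isclose(avg, np.exp(-beta * dF))
print(avg)
```

Note that, as in Theorem 1, no detailed balance and no relation between p_init and the stationary distributions is needed.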

Crooks' Fluctuation Theorem for Markov Chains
The original derivation of Crooks' fluctuation theorem for Markovian dynamics [7,8,15] was carried out using a definition of work different from the one in (4). In this section, we derive the theorem for Markov chains using (4) and comment on the difference between these approaches in the discussion. As discussed in the Introduction, an additional hypothesis is needed for the result to hold in our setup. We derive Crooks' fluctuation theorem using this additional assumption in Theorem 2 and Corollary 2 below, and then, in Proposition 3, we provide an example where this requirement is violated and the theorem fails for every work value.
Before proving the intermediate results and, finally, Crooks' fluctuation theorem, let us briefly sketch the procedure we follow throughout this section. We start with Proposition 1, where we use the same Markov chain Y that we defined in the proof of Theorem 1 and obtain, along similar lines, a more precise relation between X and Y, in particular between the probability of some realization of X and that of the same realization (with the events taking place in reversed order) of Y. As a matter of fact, we show that, for a given X, Y is (roughly) the only Markov chain fulfilling such a relation. Then, in Proposition 2, we show how the work values of X and Y are related. In order to do so, we exploit the relation between the equilibrium distributions of X and Y, which comes from the fact that both Markov chains share the same equilibrium distributions. By combining these two propositions, we reach a relation between the work distributions of X and Y in Theorem 2. Lastly, in Corollary 2, we impose an extra condition on the equilibrium distributions of X in order to obtain Crooks' fluctuation theorem in its usual form. We now proceed to show the details of this argument, starting with Proposition 1.
Proposition 1. If X = (X_n)_{n=0}^N is a Markov chain on a finite state space S whose initial distribution p_0 has non-zero entries and whose transition matrices (M_n)_{n=1}^N are irreducible, then there exists a Markov chain Y = (Y_n)_{n=0}^N with initial distribution p_N satisfying

P(X = x | X_0 = x_0) = e^{−βQ(x)} P(Y = x^R | Y_0 = x_N)    (13)

for all x = (x_0, …, x_N) ∈ S^{N+1} and any family of energies E = (E_n)_{n=0}^N of X, and Y is unique among Markov chains whose transition matrices (M̃_n)_{n=1}^N satisfy (M̃_n)_{xx} = (M_{N−(n−1)})_{xx} for all x ∈ S and 1 ≤ n ≤ N. Moreover, this unique Y satisfies

P(X = x) = e^{β(W(x)−∆F)} P(Y = x^R),    (14)

where W = W_{X,E}, ∆F = ∆F_{X,E}, and x^R := (x_N, …, x_0) denotes the reversal of x. In particular, the probability of X following x is the same as the probability of Y following x^R if and only if ∆F = W(x).
Proof. Consider the Markov chain Y = (Y_n)_{n=0}^N defined in the proof of Theorem 1, which is well defined, since we have the same hypotheses. We can proceed in the same way as in Theorem 1 to obtain (13):

P(X = x | X_0 = x_0) =(i) ∏_{n=1}^N (M_n)_{x_n x_{n−1}} =(ii) ∏_{n=1}^N (M̃_{N−(n−1)})_{x_{n−1} x_n} p_n(x_n)/p_n(x_{n−1}) =(iii) e^{−βQ(x)} P(Y = x^R | Y_0 = x_N),

where we applied the Markov property of X in (i), (11) in (ii), and the definition of Q plus the Markov property of Y in (iii). This proves the existence of a Markov chain with the desired properties. Moreover, we have

P(X = x) =(i) p_0(x_0) P(X = x | X_0 = x_0) =(ii) [p_0(x_0)/p_N(x_N)] e^{−βQ(x)} P(Y = x^R) =(iii) e^{β(W(x)−∆F)} P(Y = x^R),

where we applied the definition of conditional probability and the fact that X_0 follows p_0 in (i); (13), the definition of conditional probability, and the fact that Y_0 follows p_N in (ii); and the definitions (4) of W and of ∆F plus (12) in (iii).
It remains to show the uniqueness of Y. Assume Z = (Z_n)_{n=0}^N is a Markov chain with transition matrices (M′_n)_{n=1}^N such that Z_0 ∼ p_N, (13) holds with Z in place of Y, and (M′_n)_{xx} = (M_{N−(n−1)})_{xx} for all x ∈ S and 1 ≤ n ≤ N. Consider some n such that 1 < n < N and some a, b ∈ S with a ≠ b. By (13), applied to realizations that perform the transition from b to a at the corresponding step and otherwise remain constant, the conditional probabilities of Z are determined by those of X. Applying the Markov property, the diagonal condition (M′_n)_{xx} = (M_{N−(n−1)})_{xx}, and the definition of E_{N−(n−1)}, we obtain (M′_n)_{ab} = (M̃_n)_{ab}. Since the argument also works for n = 1 and n = N, and the case a = b holds by the diagonal condition, we conclude Z = Y. We say X is microscopically reversible [15] if (13) is satisfied for Y being the time reversal of X, that is, if the unique Markov chain Y that exists by Proposition 1 has initial distribution p_N and transition matrices (M_{N−(n−1)})_{n=1}^N. Notice, if this property holds, then (14) relates the probability of observing a realization x of a Markov chain X with that of observing the reversed realization x^R in the time reversal Y of X, that is, when starting with the equilibrium distribution of the last environment and choosing according to the same conditional probabilities, but in reversed order. This is the case if and only if the transition matrices of X satisfy detailed balance, as we show in the following lemma.
Lemma 2. If X = (X_n)_{n=0}^N is a Markov chain on a finite state space S with irreducible transition matrices (M_n)_{n=1}^N and an initial distribution with non-zero entries, then M_n satisfies detailed balance for 1 ≤ n ≤ N if and only if X is microscopically reversible.
Proof. If X satisfies detailed balance, that is, if each p_n satisfies (1) with respect to M_n, then by the definition of M̃_n in (11), we have M̃_n = M_{N−(n−1)} for each 1 ≤ n ≤ N. Hence, in this case, the Markov chain Y constructed in Theorem 1 and Proposition 1 is the time reversal of X, and so, X is microscopically reversible by definition.
It remains to show that, if X is microscopically reversible, then its transition matrices satisfy detailed balance. Let Y be the time reversal of X. Since, by assumption, Y_0 ∼ p_N and M̃_n = M_{N−(n−1)}, where (M̃_n)_{n=1}^N are the transition matrices of Y, we can follow the proof of Proposition 1 to obtain, for 1 ≤ n ≤ N,

(M_n)_{xy} p_n(y) = (M_n)_{yx} p_n(x) for all x, y ∈ S.

Thus, for each 1 ≤ n ≤ N, p_n satisfies detailed balance with respect to M_n.
In particular, this means that, if X satisfies detailed balance, then the time reversal Y of X satisfies (14), that is,

P(X = x) = e^{βW^d(x)} P(Y = x^R)    (15)

for any family of energies E = (E_n)_{n=0}^N of X, where W^d := W^d_X is the dissipated work of X. Thus, in this case, dissipated work is an unambiguous measure of the discrepancy between the probability of observing a realization of X and the probability of observing the same trajectory in reversed order in the time reversal of X. We have, hence, an unambiguous measure of hysteresis (see Section 5).
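Under detailed balance, the relation P(X = x) = e^{β(W(x)−∆F)} P(Y = x^R), with Y the time reversal of X, can be checked trajectory by trajectory. The following sketch (our own illustration; the two Metropolis matrices and the choice p_0 = p_1 are hypothetical) does so exhaustively for a two-step, two-state chain.

```python
import itertools
import numpy as np

beta = 1.0

def metropolis(p):
    # 2-state Metropolis chain: column-stochastic, detailed balance w.r.t. p.
    M = np.zeros((2, 2))
    for y in range(2):
        x = 1 - y
        M[x, y] = 0.5 * min(1.0, p[x] / p[y])
        M[y, y] = 1.0 - M[x, y]
    return M

p1 = np.array([0.8, 0.2])
p2 = np.array([0.3, 0.7])
M1, M2 = metropolis(p1), metropolis(p2)

# Forward chain X: starts in p1 (= stationary of M1), applies M1 then M2.
# Time reversal Y: starts in p2 (= stationary of M2), applies M2 then M1.
ps = [p1, p1, p2]                     # p_0 = p_1, as assumed later in Theorem 2
Es = [-np.log(p) / beta for p in ps]  # energies with Z_n = 1, hence dF = 0

def Wd(x):
    # Dissipated work; equals the work here, since dF = 0.
    return sum(Es[n][x[n - 1]] - Es[n - 1][x[n - 1]] for n in range(1, 3))

errs = []
for x in itertools.product(range(2), repeat=3):
    PF = ps[0][x[0]] * M1[x[1], x[0]] * M2[x[2], x[1]]       # P(X = x)
    PB = p2[x[2]] * M2[x[1], x[2]] * M1[x[0], x[1]]          # P(Y = x^R)
    errs.append(abs(PF / PB - np.exp(beta * Wd(x))))
max_err = max(errs)
assert max_err < 1e-12
print("max deviation:", max_err)
```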
Before showing Theorem 2 and Crooks' fluctuation theorem, we relate the work of X with that of its time reversal.

Proposition 2. If X = (X_n)_{n=0}^N is a Markov chain on a finite state space S with an initial distribution with non-zero entries and irreducible transition matrices, then there exists a Markov chain Y = (Y_n)_{n=0}^N and a constant k ∈ R such that

W_{Y,Ẽ}(x^R) = −W_{X,E}(x) + E_1(x_0) − E_0(x_0) + k    (16)

for all x ∈ S^{N+1}, where E and Ẽ are families of energies of X and Y, respectively. Moreover, if the stationary distribution p_1 of M_1 coincides with the initial distribution p_0 of X, then there exists a constant k ∈ R such that

W_{Y,Ẽ}(x^R) = −W_{X,E}(x) + k    (17)

for all x ∈ S^{N+1}.

Proof. Let Y = (Y_n)_{n=0}^N be the Markov chain defined in the proof of Theorem 1. Since work is well defined (up to a constant) for both X and Y, by Lemma A2 (see Appendix A), we can use the relation between the energy functions of both chains in (A2) to show (16). We have

W_{Y,Ẽ}(x^R) = ∑_{n=1}^N [Ẽ_n(x_{N−(n−1)}) − Ẽ_{n−1}(x_{N−(n−1)})]
 =(i) c + ∑_{n=2}^N [Ẽ_n(x_{N−(n−1)}) − Ẽ_{n−1}(x_{N−(n−1)})]
 =(ii) c + k_N − k_1 + ∑_{n=2}^N [E_{N−(n−1)}(x_{N−(n−1)}) − E_{N−(n−2)}(x_{N−(n−1)})]
 =(iii) −W_{X,E}(x) + E_1(x_0) − E_0(x_0) + k,

where in (i) we defined c := Ẽ_1 − Ẽ_0, which is a constant, since Ẽ_0 = E_N + k_0 and Ẽ_1 = E_N + k_1 by Lemma A2 (see Appendix A). In (ii), we applied the definition of x^R and (A2) and cancelled the repeated k_n, defined as in Lemma A2, for all 1 < n < N. In (iii), we rewrote the sum in terms of m := N − (n − 1), which yields ∑_{m=1}^{N−1} [E_m(x_m) − E_{m+1}(x_m)] = −W_{X,E}(x) + E_1(x_0) − E_0(x_0), and introduced k := c + k_N − k_1 = k_N − k_0.
For the second statement, notice that, if p_0 is the stationary distribution of M_1, then there exists a constant c′ such that E_1 = E_0 + c′ by (2). Thus, the term E_1(x_0) − E_0(x_0) = c′ in (16) is itself a constant and can be absorbed into k, which yields (17). If detailed balance holds, then (16) and (17) relate the work along a realization x of X with the work along the reversed realization x^R of its time reversal Y. More precisely, we obtain the following corollary.

Corollary 1 (When work is odd under time reversal).
If all transition matrices of X in Proposition 2 satisfy detailed balance and Y is the time reversal of X, then the constants k in (16) and (17) can be taken to be zero.
Proof. For the constant in (16), we simply choose Ẽ_N = E_1 and Ẽ_0 = E_N, and for the constant in (17), we choose Ẽ_N = E_0 and Ẽ_0 = E_N, which we can do since p_0 is the stationary distribution of M_1 and, by Lemma A2 (see Appendix A), also that of M̃_N.

Remark 1.
Note that choosing the energy functions as in Corollary 1 is unnecessary whenever both X and Y are thermodynamic processes. Although energy is defined only up to a constant in thermodynamics, it would make no sense to pick the constants differently when dealing with a system where the same dynamics occur more than once. Thus, there, we have Ẽ_n = E_{N−(n−1)} for 1 ≤ n ≤ N, Ẽ_0 = E_N, and, in case p_0 is a stationary distribution of M_1, E_1 = E_0 = Ẽ_N. In particular, when taking Y to be the time reversal of X in thermodynamics, we always have k = 0 in (16), and, in case p_0 is a stationary distribution of M_1, k = 0 in (17).
The non-constant term E_1(x_0) − E_0(x_0) in (16), which remains even when X satisfies detailed balance, follows from an asymmetry between X and Y. In particular, X goes from E_0 to E_N, whereas Y goes from E_N to E_1: while X_0 ∼ p_0 ∝ e^{−βE_0} and the stationary distribution of M_N is p_N ∝ e^{−βE_N}, which is also the initial distribution of Y, the stationary distribution of the final transition matrix M̃_N of Y is p_1 ∝ e^{−βE_1} (and not p_0). Furthermore, while X may begin with a change in the energy function, since p_1 ≠ p_0 is allowed, Y does not, as p_N is both the initial distribution of Y and the stationary distribution of M̃_1. An example can be found in Figure 2. This asymmetry is erased if we assume that p_0 is the stationary distribution of M_1, in which case, for Markov chains X that satisfy detailed balance, the work along any realization of X has the opposite sign of the work along the reversed realization of Y. That is, thermodynamic work becomes odd under time reversal.
The following theorem contains Crooks' fluctuation theorem as the special case when X satisfies detailed balance (see Corollary 2 below).
Figure 2. Simple example showing how the asymmetry between X and its time reversal Y manifests in thermodynamics; cf. (16) and Corollary 1. Thus, thermodynamic work is not odd under time reversal in general.

Theorem 2.
If X = (X_n)_{n=0}^N is a Markov chain on a finite state space S whose initial distribution p_0 has non-zero entries, whose transition matrices (M_n)_{n=1}^N are irreducible, and where p_0 is the stationary distribution of M_1, that is, p_1 = p_0, then there exists a Markov chain Y = (Y_n)_{n=0}^N and a constant k ∈ R such that

P(W_{X,E} = w) / P(W_{Y,Ẽ} = −w + k) = e^{β(w−∆F)} for all w ∈ supp(P(W_{X,E})),    (18)

where E = (E_n)_{n=0}^N and Ẽ = (Ẽ_n)_{n=0}^N are families of energies of X and Y, respectively, and supp(P(W_{X,E})) denotes the support of the probability distribution of W_{X,E}, that is, the values that can be taken by W_{X,E} with non-zero probability.
Proof. Let Y = (Y_n)_{n=0}^N be the Markov chain defined in the proof of Theorem 1. Note that, for any family of energies E of X, there exists a constant c such that E_1 = E_0 + c by (2), since p_0 is the stationary distribution of M_1. Given some w ∈ W_{X,E}(S^{N+1}), we have

P(W_{X,E} = w) = ∑_{x : W_{X,E}(x)=w} P(X = x) =(i) ∑_{x : W_{X,E}(x)=w} e^{β(w−∆F)} P(Y = x^R) =(ii) e^{β(w−∆F)} P(W_{Y,Ẽ} = −w + k),

where we applied (14) in (i) and Proposition 2 plus the fact that E_1 = E_0 + c in (ii). To obtain (18), it remains to show that P(W_{Y,Ẽ} = −w + k) > 0 for all w ∈ supp(P(W_{X,E})). By definition, there exists some x ∈ S^{N+1} such that W_{X,E}(x) = w and P(X = x) > 0. By Proposition 2, W_{Y,Ẽ}(x^R) = −w + k. Since P(X = x) > 0 and the entries of the (unique) stationary distributions of X are also non-zero by Lemma 1, we can use (11) plus the Markov property for both X and Y to show P(Y = x^R) > 0, implying P(W_{Y,Ẽ} = −w + k) > 0.
As the special case when each transition matrix of X in Theorem 2 satisfies detailed balance, we obtain Crooks' fluctuation theorem, modified by the additional assumption p_0 = p_1.

Corollary 2 (Crooks' fluctuation theorem for Markov chains). If, in addition to the assumptions of Theorem 2, every transition matrix of X satisfies detailed balance, then (18) holds with k = 0 and with Y the time reversal of X.

Proof. As can be seen from the proof of Theorem 2 and Corollary 1, in the case of detailed balance, we can choose Y to be the time reversal of X. Moreover, the origin of the constant k in Theorem 2 is Equation (17). By Corollary 1, this constant can be set to zero if the transition matrices of X satisfy detailed balance.
Notice that, in most of the literature on Crooks' fluctuation theorem, one writes P_F(W) for the probability P(W_{X,E}) of the work along the so-called forward process X and P_B(W) for the probability P(W_{Y,Ẽ}) of the work along the so-called backward process Y (the time reversal of X), so that, by Corollary 2, under detailed balance, Equation (18) reads

P_F(W = w) / P_B(W = −w) = e^{β(w−∆F)}.    (19)

The condition that p_0 is the stationary distribution of M_1 is not necessary in Crooks' original work [7,8,15,17]. Nonetheless, it is fundamental in our approach: Crooks' fluctuation theorem can even be false for every work value if p_0 is not the stationary distribution of M_1, as we show in Proposition 3.

Proposition 3. There exists a Markov chain X satisfying all assumptions of Corollary 2 except p_1 = p_0 such that (19) is, for every work value in the support of P_F, either not defined or defined and false.

Proof. We consider a state space with three elements S = {A, B, C}, a Markov chain with two steps X = (X_0, X_1), and a, b, c > 0, where b < c and a + b + c = 1. We take E_0 as the energy function associated with X_0, where E_0(A) = log(1/a), E_0(B) = log(1/b), and E_0(C) = log(1/c), and E_1 as the energy function associated with X_1, where E_1(A) = E_0(A), E_1(B) = E_0(C), and E_1(C) = E_0(B). Taking β = 1, both partition functions equal one, so both free energies F_0 and F_1 vanish and ∆F = 0. Notice that we have p_0 = (a, b, c), with non-zero entries, and p_1 = (a, c, b). Notice, also, that the work W_X(x) = E_1(x_0) − E_0(x_0) of a realization x depends only on x_0 and takes the values w_A = 0, w_B = log(b/c), and w_C = log(c/b). We can easily see that (19) is not defined for w = w_B, w_C and that it is false, although defined, for w = w_A. We have W_Y(x) = 0 for all x ∈ S^2, since p_1 is both the starting distribution of the time reversal of X and the stationary distribution of its only transition matrix. Thus, we have P_F(W = w_C) = P(X_0 = C) = c > 0 and P_B(W = −w_C) = 0, which means (19) is not defined at w = w_C. We obtain, analogously, that it is not defined for w = w_B. For w = w_A, we have P_B(W = −w_A) = 1 and P_F(W = w_A) = P(X_0 = A) = a < 1 = e^{βw_A}, which means (19) is defined and false there.
Although the argument is independent of the particular transition matrix of X, one may, for completeness, fix any transition matrix with non-zero entries that fulfils detailed balance with respect to p_1 (for instance, the matrix whose columns all equal p_1).
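The counterexample of Proposition 3 is easy to reproduce numerically. The sketch below (our own illustration, with the hypothetical choice a = 0.5, b = 0.2, c = 0.3) computes the three work values and checks that the ratio in (19) is undefined at w_C and defined but false at w_A.

```python
import numpy as np

beta = 1.0
a, b, c = 0.5, 0.2, 0.3           # b < c and a + b + c = 1
p0 = np.array([a, b, c])          # initial distribution of X = (X0, X1)
p1 = np.array([a, c, b])          # stationary distribution of M1 (B, C swapped)

E0, E1 = -np.log(p0), -np.log(p1)  # energies with Z = 1, so dF = 0

# A length-2 realization has work W(x) = E1(x0) - E0(x0), depending only on x0.
w = E1 - E0                        # w_A = 0, w_B = log(b/c) < 0, w_C = log(c/b) > 0

# Forward work distribution: P_F(W = w[x0]) = p0[x0]. The time reversal starts
# in p1, which is stationary for its only transition matrix, so its work is
# identically zero: P_B(W = 0) = 1 and P_B(W != 0) = 0.
PF_wC, PB_minus_wC = c, 0.0        # ratio in (19) undefined at w_C
PF_wA, PB_minus_wA = a, 1.0        # -w_A = 0 is realized with probability one
crooks_rhs_wA = np.exp(beta * w[0])  # e^{beta(w_A - dF)} = 1

assert PF_wC > 0 and PB_minus_wC == 0.0            # undefined ratio
assert not np.isclose(PF_wA / PB_minus_wA, crooks_rhs_wA)  # defined but false
print(w)
```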

Discussion: Application to Decision-Making
The bridge connecting thermodynamics and decision-making is an optimization principle directly inspired by the maximum entropy principle [27,30,37,38]. In particular, given a finite set of possible choices S, the optimal behaviour is given by the distribution p ∈ P_S that optimally trades off utility and uncertainty according to the following optimization principle:

p = argmax_{q ∈ P_S} ( E_q[U] + (1/β) H(q) ),    (20)

where P_S is the set of probability distributions over S, H is the Shannon entropy, E_q[·] denotes the expected value over q ∈ P_S, and U : S → R is a utility function, that is, a function that assigns larger values to options in S that are preferred by the decision-maker. Note that the main difference between (20) and the maximum entropy principle is the substitution of an energy function E : S → R by a utility U, which behaves as a negative energy function U = −E (in the sense that it is the force that opposes uncertainty). As a result of their similarity, both principles yield the same result, namely the Boltzmann distribution:

p(x) = e^{βU(x)} / Z,

where β is a trade-off parameter between uncertainty and utility/energy and Z is a normalization constant. The analogy between thermodynamics and decision-making can be taken further by considering not only their optimal distribution, but how they transition between different distributions on the path towards the optimal one. Here, the notion of uncertainty is useful again, although this time, it is relative to the optimal distribution p. More specifically, we can think of the transitions the decision-maker undergoes as being driven by the reduction of uncertainty with respect to the optimal distribution p, which can be modelled by the dual ⪯_p of p-majorization [39]. In this approach, given q, q′ ∈ P_S, the decision-maker transitions from q to q′, which we denote by q ⪯_p q′, if q′ is closer to p than q (see [39] for a rigorous definition of ⪯_p and, hence, of closer).
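The optimization principle above has the closed-form Boltzmann solution p ∝ e^{βU}. The following sketch (our own illustration, with hypothetical utilities) computes it and checks its optimality against randomly sampled alternative distributions.

```python
import numpy as np

beta = 2.0
U = np.array([1.0, 0.5, 0.0])     # hypothetical utilities over three options

# Closed-form optimum of max_q E_q[U] + (1/beta) H(q): the Boltzmann distribution.
p = np.exp(beta * U)
p = p / p.sum()

def objective(q):
    # Free-utility functional: expected utility plus scaled Shannon entropy.
    q = np.asarray(q)
    return q @ U - (q * np.log(q)).sum() / beta

# The functional is strictly concave, so p should beat any alternative.
rng = np.random.default_rng(0)
for _ in range(100):
    q = rng.dirichlet(np.ones(3))
    assert objective(q) <= objective(p) + 1e-12
print(p)
```

As β grows, p concentrates on the highest-utility option (a perfectly rational decision-maker); as β → 0, p approaches the uniform distribution.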
For us, the important fact about →_p is that, as it turns out ([40], Theorem 2), the transitions allowed by →_p are precisely the ones that result from applying a transition matrix that has p as a stationary distribution. More precisely,

q →_p q′  if and only if  q′ = M_p q,

where M_p is a matrix whose rows are normalized and which fulfils M_p p = p, that is, a stochastic matrix for which p ∈ P_S is a stationary distribution [33]. Importantly, the transition matrix assumption is common in the study of both thermodynamic and decision-making systems [8,15]. In our current study, the situation we have in mind is that of a decision-maker that has to make a sequence of decisions under varying environmental conditions. We model this as a stochastic process that behaves like a Markov chain and introduce, starting from a Markov chain, the thermodynamic tools we use to describe it: energy, partition function, free energy, work, heat, and dissipated work. The behaviour of the decision-maker then corresponds to a decision vector (x_0, x_1, ..., x_n) collected over n potentially different environments. In this most general decision-making scenario, where the environment is changing over time, the optimal distribution p is changing as well. Thus, we can regard the decision-making process as a sequence of transition matrices M_p, where each M_p corresponds to a particular environment with optimal response p. We could imagine, for example, a gradient descent learner that would converge to p for any given environment, provided we allow for sufficiently many gradient update steps. Otherwise, the gradient learner, or any other optimization-based decision-making agent (e.g., one following a Metropolis-Hastings optimization scheme), would lag behind the environmental changes and the environment would outpace the learner.
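One standard way to construct such a matrix M_p for a given p is the Metropolis rule with uniform proposals. The sketch below is our own illustration (the distribution p is arbitrary), written in the row-stochastic convention, where stationarity reads p M = p:

```python
def metropolis_matrix(p):
    """Row-stochastic transition matrix whose stationary distribution is p,
    built with the Metropolis rule and uniform proposals over the other states."""
    n = len(p)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # propose j with prob 1/(n-1), accept with prob min(1, p_j / p_i)
                M[i][j] = min(1.0, p[j] / p[i]) / (n - 1)
        M[i][i] = 1.0 - sum(M[i])       # remaining mass stays at i
    return M

p = [0.5, 0.3, 0.2]                     # illustrative optimal distribution
M = metropolis_matrix(p)
# stationarity check: (p M)_j = sum_i p_i M[i][j] should equal p_j
pM = [sum(p[i] * M[i][j] for i in range(len(p))) for j in range(len(p))]
```

By construction, the off-diagonal flows satisfy p_i M_ij = p_j M_ji, so this p is not only stationary but also in detailed balance with M.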
In this general decision-making scenario, we can then study the relation between the optimal behaviour and the non-optimal one by fluctuation theorems [1][2][3] like Jarzynski's equality [4][5][6] and Crooks' fluctuation theorem [7,8]. While we focus on these two fluctuation theorems in our current study, similar arguments may be suitable to transfer other fluctuation theorems that have been considered in the thermodynamic literature (see, for example, [1]) to a decision-making scenario.
Jarzynski's equality in decision-making: Although the strong requirements in Lemma 2 were used in [8] to derive Jarzynski's equality through (15), and were assumed in the only approach to it in decision-making that we are aware of [32], weaker assumptions that do not involve the time reversal of X are sufficient (cf. Theorem 1). The same properties have been used to derive Jarzynski's equality following a different method in [4].
Example application: Jarzynski's theorem: In a decision scenario, the energy becomes a loss function that a decision-maker is trying to minimize. If this loss function changes over time, we can conceptually distinguish changes in loss that are induced externally by changes in the environment (e.g., given data) from changes in loss due to internal adaptation, when a learning system changes its parameter settings. The externally induced changes in loss correspond to the physical concept of work and drive the adaptation process. Hence, we can consider the decision-theoretic equivalent of physical work as a driving signal: the (negative) surprise experienced by the decision-maker, since it accumulates the (negative) surprise experienced at each step (which can be quantified by the difference in energy/utility evaluated at the decision-maker's state when the environment changes) [32,41]. With this in mind, we can use Jarzynski's equality to obtain a bound on the decision-maker's expected surprise. In particular, by applying Jensen's inequality to Jarzynski's equality, one obtains

⟨W⟩ ≥ ΔF.    (22)

(Note that this is a version of the second law of thermodynamics [2].) Hence, (22) provides a bound on the expected surprise. While a similar bound has been previously pointed out for decision-making systems [32], here, we re-derive it under a novel energy protocol and with weaker assumptions regarding time reversibility.
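As a sanity check of this interpretation, Jarzynski's equality can be verified numerically for a toy chain. The sketch below is our own illustrative setup (a two-state system with energy protocol E_λ(s) = λ·s, β = 1, and one Metropolis relaxation step per environment change); it estimates ⟨e^{−W}⟩ and compares it with Z_N/Z_0 = e^{−ΔF}:

```python
import math, random

def jarzynski_estimate(lambdas, beta=1.0, samples=100_000, seed=0):
    """Estimate <exp(-beta*W)> for a two-state system with energies
    E_lambda(s) = lambda * s, s in {0, 1}, driven through `lambdas`."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(samples):
        lam = lambdas[0]
        # start in the Boltzmann distribution for the initial energies
        p1 = math.exp(-beta * lam) / (1.0 + math.exp(-beta * lam))
        x = 1 if rng.random() < p1 else 0
        W = 0.0
        for nxt in lambdas[1:]:
            W += (nxt - lam) * x          # work: energy change at a fixed state
            y = 1 - x                     # Metropolis relaxation under new energies
            if rng.random() < min(1.0, math.exp(-beta * nxt * (y - x))):
                x = y
            lam = nxt
        acc += math.exp(-beta * W)
    return acc / samples

est = jarzynski_estimate([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
exact = (1.0 + math.exp(-1.0)) / 2.0      # Z_N / Z_0: at lambda=0 both states have E=0
print(est, exact)                          # both ≈ 0.684
```

Jensen's inequality applied to this estimate then recovers the bound ⟨W⟩ ≥ ΔF = −ln(Z_N/Z_0).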
Crooks' theorem in decision-making: Even though the assumption that p_0 must be the stationary distribution of M_1 seems to restrict the applicability of Crooks' fluctuation theorem in our decision-theoretic setup compared to the usual thermodynamic one (see Corollary 2 and Proposition 3), it is actually not an issue from an experimental point of view when the Markov chains correspond to thermodynamic processes. This is because of the way one samples from a Boltzmann distribution given a thermodynamic system. One of the assumptions in Corollary 2 is that X_0 should follow such a distribution for E_0. For this to be fulfilled, one needs to wait until the system relaxes to such a state. Because of that, one can think of any trajectory as having an additional initial point that was also sampled from the Boltzmann distribution for E_0. Thus, the assumption that p_0 is the equilibrium distribution of M_1 is always fulfilled, and the experimental range of validity of Crooks' fluctuation theorem in our setup remains equal to the one in nonequilibrium thermodynamics [8,15].
In particular, the new constraint is fulfilled in previous experimental setups supporting the theorem (see, for example, [9] or [11]).

Example application: Crooks' theorem:
Hysteresis is a well-known effect that takes place in some physical systems and refers to the difference in the system's behaviour when interacting with a series of environments compared to its response when facing the same conditions in reversed order [2]. The same idea also applies to decision-making systems. In fact, hysteresis has been reported in both simulations of decision-making systems [32], as well as in biological decision-makers recorded experimentally [41,42]. Given that it refers to the difference between decisions when the order in which the environments are presented is reversed, (15) and (19) constitute quantitative measures of hysteresis. In particular, Reference [41] used this measure successfully to quantify hysteresis in human sensorimotor adaptation, where human learners had to solve a simple motor coordination task in a dynamic environment with changing visuomotor mappings. While a simple Markov model proved adequate to model sensorimotor adaptation, it should be noted that more complex learning scenarios involving long-term memory and abstraction would not be captured by such a simple model.
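This hysteresis measure can be illustrated with a toy model of our own choosing: driving a two-state system with energies E_λ(s) = λ·s forward (λ: 0 → 1) and backward (λ: 1 → 0), each run starting in equilibrium, yields a mean-work gap ⟨W⟩_F + ⟨W⟩_R > 0 away from the quasistatic limit. All names and parameters below are illustrative:

```python
import math, random

def mean_work(lambdas, beta=1.0, samples=100_000, seed=1):
    """Mean work for a two-state system E_lambda(s) = lambda * s driven
    through `lambdas`, starting in equilibrium at lambdas[0]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        lam = lambdas[0]
        p1 = math.exp(-beta * lam) / (1.0 + math.exp(-beta * lam))
        x = 1 if rng.random() < p1 else 0
        W = 0.0
        for nxt in lambdas[1:]:
            W += (nxt - lam) * x          # external driving at a fixed state
            y = 1 - x                     # one Metropolis relaxation step
            if rng.random() < min(1.0, math.exp(-beta * nxt * (y - x))):
                x = y
            lam = nxt
        total += W
    return total / samples

forward = [i / 5 for i in range(6)]       # lambda: 0 -> 1
w_f = mean_work(forward)
w_r = mean_work(forward[::-1])            # lambda: 1 -> 0
hysteresis = w_f + w_r                    # positive unless driving is quasistatic
print(w_f, w_r, hysteresis)
```

The gap shrinks as more relaxation steps are inserted between changes of λ, mirroring the disappearance of hysteresis under slow environmental change.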

Detailed balance:
Detailed balance is required neither for Jarzynski's equality (Theorem 1) nor for the more general form of Crooks' fluctuation theorem we presented in Theorem 2. It is, however, required in order to choose Y to be the time reversal of X, which leads to Crooks' fluctuation theorem in Corollary 2.
While the definition of detailed balance (1) we adopted here is standard in the Markov chain literature, there is some ambiguity regarding its use in thermodynamics, where it has at least two more meanings. It is used both for the weaker condition that the Boltzmann distribution p_n is a stationary distribution of M_n for 1 ≤ n ≤ N [4] and as a synonym of microscopic reversibility [8,15]. Although we have shown that microscopic reversibility and detailed balance are indeed equivalent under some conditions (see Lemma 2), we followed the definition in [15], which is not the only one in the literature (see [35,43,44]).
Notice that what is called a stationary distribution in the literature on Markov chains is referred to as a nonequilibrium steady state in thermodynamics [45]. In order for it to be an equilibrium state, it needs to fulfil detailed balance (1) with respect to the transition matrix in question. Notice also that detailed balance is not fulfilled in several applications of nonequilibrium thermodynamics throughout physics [46] and biology [47].
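The distinction between a stationary distribution and an equilibrium state is easy to check numerically. The sketch below uses two illustrative examples of our own (with condition (1) written in the row-stochastic convention p_i M_ij = p_j M_ji): a two-state Metropolis chain, which is in equilibrium, and a deterministic three-cycle, whose uniform stationary distribution is a nonequilibrium steady state:

```python
import math

def satisfies_detailed_balance(M, p, tol=1e-12):
    """Condition (1): p_i * M[i][j] == p_j * M[j][i] for all i, j."""
    n = len(p)
    return all(abs(p[i] * M[i][j] - p[j] * M[j][i]) <= tol
               for i in range(n) for j in range(n))

a = math.exp(-1.0)                     # acceptance probability for dE = 1, beta = 1
M = [[1.0 - a, a],                     # two-state Metropolis chain, energies (0, 1)
     [1.0, 0.0]]
p = [1.0 / (1.0 + a), a / (1.0 + a)]   # Boltzmann distribution: an equilibrium state

C = [[0.0, 1.0, 0.0],                  # deterministic cycle 0 -> 1 -> 2 -> 0
     [0.0, 0.0, 1.0],
     [1.0, 0.0, 0.0]]
u = [1.0 / 3.0] * 3                    # stationary for C, yet not in equilibrium
```

Both p and u are stationary for their respective chains, but only the first pair satisfies (1): the cycle carries a persistent probability current.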

Continuous-time Markov chains:
Notice that, for a continuous-time Markov chain, work becomes an integral, where the integrand for X at x and the one for Y at x^R differ, aside from the sign, in a single point. Thus, work is odd under time reversal, and the assumption that p_0 = p_1 can be dropped in both Theorem 2 and Corollary 2. However, the tools required to show Crooks' fluctuation theorem or Jarzynski's equality are technically more involved in the continuous-time case, as one can see in [36].

Continuous state space:
If the state space is continuous, the results can be derived in a similar fashion. In this scenario, the role of the transition matrices is played by the densities of the Markov kernels (see, for example, [48]). These densities allow us to write conditions such as detailed balance analogously to the discrete case. For Jarzynski's equality on a continuous state space X, the result follows as in the discrete case: one only needs to replace the sum in the expected value by an integral and the probability distribution by the density of the Markov kernel. Then, following the proof of Theorem 1, we can define a stochastic process Y whose Markov kernel densities are defined through the stationary distributions and kernel densities of X, in analogy to Theorem 1. The rest follows in exactly the same fashion. Crooks' fluctuation theorem requires a longer explanation, but essentially follows from the same considerations.
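To illustrate the continuous-state case, Jarzynski's equality can be checked with Gaussian Markov kernels. The sketch below is our own toy setup: energies E_λ(x) = (x − λ)²/2 with β = 1, and an AR(1) kernel that leaves the Boltzmann density N(λ, 1/β) invariant. Since the partition function is independent of λ here, ΔF = 0 and ⟨e^{−βW}⟩ should equal 1:

```python
import math, random

def jarzynski_continuous(n_steps=5, beta=1.0, a=0.5, samples=100_000, seed=2):
    """Estimate <exp(-beta*W)> for E_lambda(x) = (x - lambda)^2 / 2 as lambda
    moves from 0 to 1; the AR(1) kernel below preserves N(lambda, 1/beta)."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / beta)
    noise = sigma * math.sqrt(1.0 - a * a)   # keeps the stationary variance at 1/beta
    acc = 0.0
    for _ in range(samples):
        lam = 0.0
        x = rng.gauss(lam, sigma)            # equilibrium start for E_0
        W = 0.0
        for k in range(1, n_steps + 1):
            new = k / n_steps
            W += ((x - new) ** 2 - (x - lam) ** 2) / 2.0  # work at a fixed state
            lam = new
            x = lam + a * (x - lam) + rng.gauss(0.0, noise)  # relaxation step
        acc += math.exp(-beta * W)
    return acc / samples

est = jarzynski_continuous()
print(est)   # ≈ 1, since Delta F = 0 for this protocol
```

The AR(1) step plays the role of the Markov kernel density: it satisfies detailed balance with respect to N(λ, 1/β), which is all the continuous-state derivation requires.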

Conclusions
In this paper, we investigated the potential of thermodynamic fluctuation theorems to serve as probabilistic laws of decision-making. In particular, we derived two thermodynamic fluctuation theorems, Jarzynski's equality and Crooks' theorem, in the context of general Markov chains X. We started by defining several thermodynamic concepts for Markov chains and discussing how these definitions do not, in general, correspond to the ones used in thermodynamics. We then derived Jarzynski's equality in Theorem 1 without any assumption involving the time reversal of X, thereby improving on the previous attempt to derive it in the context of decision-making [32], which was based on the physical conventions pioneered in [8]. Regarding Crooks' fluctuation theorem, we showed in Theorem 2, Corollary 2, and Proposition 3 that, in our decision-theoretic setup, it requires the additional assumption that the initial distribution of X must be the stationary distribution of its first transition matrix. This results from the fact that our notion of work is inherent to Markov chains, which contrasts with the definition used in previous derivations [8,15,17], where the work along the forward and backward paths was calculated in a different way for physical reasons. Instead, we calculated the work along the two paths in the same way, in order for the quantities involved in the final result to have an interpretation that is relevant for decision-making.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of the data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

Appendix A
Here, we show two results relating a Markov chain X to the Markov chain Y defined by X as in Theorem 1. First, in Lemma A1, we relate the irreducibility of transition matrices for X with that of Y.
Lemma A1. If X = (X_n)_{n=0}^{N} is a Markov chain on a finite state space S whose transition matrices are irreducible, then the transition matrices of the Markov chain Y = (Y_n)_{n=0}^{N} defined in Theorem 1 are irreducible.