*Entropy* **2009**, *11*(4), 675–687; https://doi.org/10.3390/e11040675

Article

The Maximum Entropy Rate Description of a Thermodynamic System in a Stationary Non-Equilibrium State

Department of Pure and Applied Mathematics, University of Padova, via Trieste 63 – 35121, Padova, Italy

Received: 14 September 2009 / Accepted: 27 October 2009 / Published: 29 October 2009

## Abstract


In this paper we present a simple model to describe a rather general system in a stationary non-equilibrium state, which is an open system traversed by a stationary flux. The probabilistic description is provided by a non-homogeneous Markov chain, which is not assumed on the basis of a model of the microscopic interactions but rather derived from the knowledge of the macroscopic fluxes traversing the system through a maximum entropy rate principle.

**Keywords:** Markov chain; asymptotic equipartition property; entropy rate; entropy production; information divergence

## 1. Introduction

In a recent survey on the Maximum Entropy Production principle (2006, [1]), reference is made to an early attempt (1967, [2,3]) to extend the celebrated Jaynes Maximum Entropy principle to thermodynamic systems in a stationary non-equilibrium state (a pioneering paper dealing with this extension problem was written by E.T. Jaynes himself, see [4]). Informally stated, assuming that the dynamics are described by a Markov chain, the authors seek a stochastic transition matrix such that: (i) it has a prescribed probability distribution as a stationary distribution, (ii) the associated chain evolution satisfies given constraints on the macroscopic scale (admissible microscopic evolution) and (iii) the selected transition matrix generates the maximum number of equally probable microscopic evolution paths. Motivated by their derivation, we proceed here to a self-contained, independent approach to the same problem with, we think, more far-reaching consequences.

To state the main ideas, let us begin by recalling a restricted version of the ergodic theorem for Markov chains ([5]). Let us suppose that the system has a finite state space $\chi =\{1,\dots ,n\}$ and that its statistical description is given by a stationary, time-homogeneous Markov chain.

**Theorem 1.** Let $P\in Mat\left(n\right)$ be a stochastic matrix with positive entries and denote by ${P}^{N}$ the ${N}^{th}$ power of P. Then there exists a unique distribution π which is stationary for P, i.e., ${P}^{*}\pi =\pi $, where ${P}^{*}$ denotes the transpose of P. Moreover,

$$\underset{N\to +\infty}{lim}{P}_{ij}^{N}={\pi}_{j}>0\phantom{\rule{2.em}{0ex}}\forall i,j=1,\dots ,n$$

and, for every probability distribution ν on χ,

$$\underset{N\to +\infty}{lim}\|{\left({P}^{N}\right)}^{*}\nu -\pi \|=0$$
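As a numerical sketch of the theorem (the positive 3-state matrix below is illustrative, not taken from the paper), one can check that every row of ${P}^{N}$ converges to the same stationary distribution π:

```python
import numpy as np

# Illustrative positive stochastic matrix (rows sum to 1).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution: left Perron eigenvector of P, i.e. P* pi = pi.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

PN = np.linalg.matrix_power(P, 50)  # P^N for large N
# Every row of P^N approaches pi, whatever the starting state i.
assert np.allclose(PN, np.tile(pi, (3, 1)), atol=1e-10)
```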

It is immediate to realize from the above Theorem that the stochastic matrix P determines its stationary distribution, while the converse is false. In this paper we put forth a selection criterion to choose among all stochastic matrices admitting a fixed distribution π as stationary. In a sense, we want to select a preferred dynamics for the approach to equilibrium π. To this end, recall that if we take the stationary distribution π as the initial distribution, then the Markov chain $(P,\pi )$ defines a discrete-time, finite-state stationary stochastic process. Let ${X}_{i},i\ge 0$ be the χ-valued random variables which describe the state of the system at time i. For a stationary process, we can define its entropy rate (see [6]) as

$$\mathcal{H}=\underset{N\to +\infty}{lim}\frac{1}{N}H({X}_{0},\dots ,{X}_{N-1})$$

where $\Omega ={\chi}^{N}$ and

$$H({X}_{0},\dots ,{X}_{N-1})=-\sum _{\omega \in \Omega}p\left(\omega \right)lnp\left(\omega \right)$$

In a sense, the entropy rate $\mathcal{H}$ is the thermodynamic N-limit of the entropy of the system formed by all realizations of length N of the process. Therefore, we will call it the system entropy rate in the sequel. Moreover, for the stationary process associated to $(P,\pi )$ it holds that (see [5],[6])

$$\mathcal{H}=\mathcal{H}\left(P\right)=-\sum _{i,j}{\pi}_{i}{P}_{ij}ln{P}_{ij}$$
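For a small chain this last formula can be verified by brute force: the block entropy of a stationary Markov chain satisfies the exact identity $H({X}_{0},\dots ,{X}_{N-1})=H(\pi )+(N-1)\mathcal{H}(P)$, whose $\frac{1}{N}$-limit is $\mathcal{H}(P)$. A sketch with an illustrative 2-state matrix (not from the paper):

```python
import numpy as np
from itertools import product

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2/3, 1/3])            # stationary: pi P = pi for this P
assert np.allclose(pi @ P, pi)

H_rate = -np.sum(pi[:, None] * P * np.log(P))   # entropy rate H(P)
H_pi = -np.sum(pi * np.log(pi))                  # H(pi)

# Block entropy H(X_0, ..., X_{N-1}) by enumerating all 2^N paths.
N = 12
H_N = 0.0
for path in product(range(2), repeat=N):
    p = pi[path[0]]
    for a, b in zip(path, path[1:]):
        p *= P[a, b]
    H_N -= p * np.log(p)

# For a stationary Markov chain the identity is exact.
assert abs(H_N - (H_pi + (N - 1) * H_rate)) < 1e-9
```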

Furthermore, in agreement with (ii) above, we assume that the knowledge of the macroscopic fluxes acting on the system in a stationary non-equilibrium state will force constraints on the two-dimensional probability distribution

$${\mathcal{P}}_{ij}=Prob({X}_{t+1}=j,{X}_{t}=i)={\pi}_{i}{P}_{ij}$$

in the form of constraints $P\in \Lambda $ on the associated stochastic matrix (see Section 2 below). We can now state our selection criterion, which we call the Maximum Entropy Rate principle and which amounts to the following constrained extremum problem:

**M.E.R.P.** Given a positive probability distribution π, find the stochastic matrix $\widehat{P}\in \Lambda $ which admits π as a stationary distribution and has maximum entropy rate $\mathcal{H}\left(\widehat{P}\right)$.

This principle can be seen as an instance of a Maximum Entropy principle for the case of constraints on two-dimensional distributions. Its justification is provided in [7] (see also [8],[9]) using a large deviation theory type estimate on the empirical second order distributions $\widehat{\mathcal{P}}$, in the same spirit as Sanov's Theorem for first order empirical distributions.

**Remark.** In the case that the only constraints on P are the stationarity and normalization ones, the solution is ${\widehat{P}}_{ij}={\pi}_{j},i,j=1,\dots ,n$. This can easily be verified by using the elementary inequality (see [6])

$$\mathcal{H}\left(P\right)=-\sum _{i,j}{\pi}_{i}{P}_{ij}ln{P}_{ij}\le H\left(\pi \right)=-\sum _{i}{\pi}_{i}ln{\pi}_{i}$$
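A numerical sketch of this bound (the distribution is illustrative): any stochastic matrix of the form $(1-t)I+t\widehat{P}$ keeps π stationary, its entropy rate stays below $H\left(\pi \right)$, and equality is attained by ${\widehat{P}}_{ij}={\pi}_{j}$:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])
P_hat = np.tile(pi, (3, 1))           # P_hat[i, j] = pi_j, the maximizer
t = 0.5
P = (1 - t) * np.eye(3) + t * P_hat   # another stochastic matrix fixing pi
assert np.allclose(pi @ P, pi)

def entropy_rate(pi, P):
    # H(P) = -sum_ij pi_i P_ij ln P_ij (all entries of P positive here)
    return -np.sum(pi[:, None] * P * np.log(P))

H_pi = -np.sum(pi * np.log(pi))
assert entropy_rate(pi, P) < H_pi                   # strict for this P
assert abs(entropy_rate(pi, P_hat) - H_pi) < 1e-12  # equality at P_hat
```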

The starting point of the above M.E.R. principle is the probability distribution π, which, apart from being positive, is completely arbitrary. It affords the statistical description of our system at equilibrium. In case we have some macroscopic information on the system, for example its average energy, we represent it as

$$e={\mathbb{E}}_{\pi}\left(E\right)=\sum _{i=1}^{n}{\pi}_{i}{E}_{i}$$

where ${E}_{i}$ is the energy of the system in state i; this information can then be used to select a probability distribution $\widehat{\pi}$ among all those satisfying the above constraint. By applying the Maximum Entropy principle one gets

$${\widehat{\pi}}_{i}=\frac{{e}^{-\beta {E}_{i}}}{Z\left(\beta \right)}$$

the Gibbs probability distribution. Here the inverse temperature $\beta =\frac{1}{kT}$ is uniquely determined by the value e.
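The multiplier $\beta \left(e\right)$ can be computed numerically, e.g., by bisection, since ${\mathbb{E}}_{\widehat{\pi}}\left(E\right)$ is strictly decreasing in β. A sketch with illustrative energy levels (not from the paper):

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])   # illustrative energy levels
e_target = 0.8                  # prescribed average energy

def mean_energy(beta):
    w = np.exp(-beta * E)       # Gibbs weights e^{-beta E_i}
    return float(E @ w / w.sum())

# mean_energy(beta) decreases from max(E) to min(E): bisect for beta(e).
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if mean_energy(mid) > e_target:
        lo = mid                # mean too high -> increase beta
    else:
        hi = mid
beta = 0.5 * (lo + hi)

pi_hat = np.exp(-beta * E) / np.exp(-beta * E).sum()  # Gibbs distribution
assert abs(pi_hat @ E - e_target) < 1e-8
```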

## 2. Model of the System in a Stationary Non-Equilibrium State

We begin with the model of the energy exchange between our system χ and another system $\mathcal{A}$. Let us suppose that the system is described by our stationary Markov chain $(P,\pi )$. We suppose also that the whole system $\mathcal{A}\cup \chi $ is energetically insulated, that is, if ${X}_{t}=i$ and ${X}_{t+1}=j$, then the microscopic energy conservation law holds

$$\Delta {E}^{\chi}={E}_{j}-{E}_{i}=-\Delta {E}^{\mathcal{A}}\phantom{\rule{2.em}{0ex}}\forall \phantom{\rule{4pt}{0ex}}i,j=1,\dots ,n$$

Let us denote with

$${\mathcal{E}}_{ij}={E}_{j}-{E}_{i}$$

the skew-symmetric matrix of the differences of energy between states and with

$${\mathcal{P}}_{ij}=Prob({X}_{t+1}=j,{X}_{t}=i)={\pi}_{i}{P}_{ij}$$

the joint probability at one time step. The average energy transfer between $\mathcal{A}$ and χ is

$${\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}]=\mathcal{E}\cdot \mathcal{P}=\sum _{i,j}{\mathcal{P}}_{ij}{\mathcal{E}}_{ij}=\sum _{i,j}{\pi}_{i}{P}_{ij}{E}_{j}-\sum _{i,j}{\pi}_{i}{P}_{ij}{E}_{i}={\mathbb{E}}_{{P}^{*}\pi}\left(E\right)-{\mathbb{E}}_{\pi}\left(E\right)$$

and it follows easily that if π is stationary for P, and if our microscopic energy conservation law holds, then

$${\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}]=-{\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{A}}]=0$$

hence the stationary distribution π describes the system χ at macroscopic equilibrium with $\mathcal{A}$.
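The zero-balance identity is easy to confirm numerically (illustrative data; ${P}_{ij}={\pi}_{j}$ is the simplest matrix with π stationary):

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])            # illustrative energies
pi = np.array([0.5, 0.3, 0.2])
P = np.tile(pi, (3, 1))                  # pi is stationary for this P
Eps = E[None, :] - E[:, None]            # Eps[i, j] = E_j - E_i, skew-symmetric
joint = pi[:, None] * P                  # joint distribution P_ij = pi_i P_ij

avg_transfer = np.sum(joint * Eps)       # E_P[Delta E^chi]
assert abs(avg_transfer) < 1e-12         # zero average energy transfer
```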

#### 2.1. Coupling of the system with two environments

We now want to model the coupling of the system with two environments $\mathcal{A}$ and $\mathcal{B}$. As before, the statistical description of the system is given by a Markov chain. As a simplifying assumption, we suppose that at every time step along the evolution of the chain, the system χ is in contact (i.e., the microscopic energy conservation holds) with only one of the environments, alternately $\mathcal{A}$ and $\mathcal{B}$. Therefore, the system is now described by the non-homogeneous Markov chain ($m\in \mathbb{N}$)

$${P}_{t}=\left\{\begin{array}{c}A\phantom{\rule{1.em}{0ex}}t=2m,\hfill \\ B\phantom{\rule{1.em}{0ex}}t=2m+1\hfill \end{array}\right.$$

We will see that it is not restrictive to suppose that the stochastic matrices $A,B$ have positive entries. Also, notice that the chain “observed” at even times $t=2m$ is a time-homogeneous chain described by the positive stochastic matrix $AB$, hence the ergodic theorem applies. Let us denote with π the unique stationary distribution for $AB$. If π is the distribution describing the system χ at $t=0$, then we have

$$\pi \left(0\right)=\pi ,\phantom{\rule{1.em}{0ex}}\pi \left(1\right)={A}^{*}\pi ,\phantom{\rule{1.em}{0ex}}\pi \left(2\right)={B}^{*}{A}^{*}\pi ={\left(AB\right)}^{*}\pi =\pi ,\phantom{\rule{1.em}{0ex}}\pi \left(3\right)={A}^{*}\pi ,\phantom{\rule{1.em}{0ex}}\dots $$
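A numerical sketch of this alternation (the matrices A, B are illustrative): computing the unique stationary distribution of $AB$ and propagating it one step at a time shows the one-time marginals cycling between π and ${A}^{*}\pi $:

```python
import numpy as np

A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.5],
              [0.2, 0.8]])
AB = A @ B

# Unique stationary distribution of AB (left Perron eigenvector).
w, v = np.linalg.eig(AB.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

pi_odd = A.T @ pi            # distribution at odd times: A* pi
pi_even = B.T @ pi_odd       # back at even times: (AB)* pi
assert np.allclose(pi_even, pi)      # pi is stationary for AB ...
assert not np.allclose(pi_odd, pi)   # ... but not for A alone
```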

Therefore, the probability distribution describing the chain switches between π (at even times) and ${A}^{*}\pi $ (at odd times). Let us compute, as before, the joint probability at one and two time steps starting from an even time $t=2m$. We have

$$\begin{array}{c}{\mathcal{P}}_{ij}(2m,2m+1)=Prob({X}_{2m+1}=j,{X}_{2m}=i)={\pi}_{i}{A}_{ij}\hfill \\ \\ {\mathcal{P}}_{jk}(2m+1,2m+2)=Prob({X}_{2m+2}=k,{X}_{2m+1}=j)=\sum _{i}{\pi}_{i}{A}_{ij}{B}_{jk}\hfill \\ \\ {\mathcal{P}}_{ijk}(2m,2m+2)=Prob({X}_{2m+2}=k,{X}_{2m+1}=j,{X}_{2m}=i)={\pi}_{i}{A}_{ij}{B}_{jk}\hfill \end{array}$$

Reasoning as before, we can compute the averaged energy differences

$$\begin{array}{c}{\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}](2m,2m+1)=-{\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{A}}]={\mathbb{E}}_{{A}^{*}\pi}\left(E\right)-{\mathbb{E}}_{\pi}\left(E\right)\hfill \\ \\ {\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}](2m+1,2m+2)=-{\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{B}}]={\mathbb{E}}_{{\left(AB\right)}^{*}\pi}\left(E\right)-{\mathbb{E}}_{{A}^{*}\pi}\left(E\right)\hfill \\ \\ {\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}](2m,2m+2)={\mathbb{E}}_{{\left(AB\right)}^{*}\pi}\left(E\right)-{\mathbb{E}}_{\pi}\left(E\right)\hfill \end{array}$$

Modifications of the above formulae for the case that we start the observation of the chain at an odd time are straightforward. By a simple inspection of the above equalities we can draw the following conclusions:

**Proposition 1.** If π is a stationary distribution for $AB$, then

$${\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{A}}]=-{\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}](2m,2m+1)={\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}](2m+1,2m+2)=-{\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{B}}]$$

$${\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}](2m,2m+2)=0.$$

If, moreover, π is stationary also for A, then

$${\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{A}}]=-{\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{B}}]={\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}](2m,2m+2)=0$$

Therefore, with a distribution π stationary for both $AB$ and A we model a system at equilibrium with the two environments, while with a probability distribution π stationary for $AB$ but not for A we model a system in a stationary non-equilibrium state. Moreover, the condition of stationarity of π with respect to $AB$, which is sufficient (but not necessary in the general case) for the energy balance to hold, is the one that will allow us to compute the entropy rate. The value of the macroscopic energy flux can be specified by the macroscopic constraint introduced by the first equality in (2.2)

$${\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\chi}](2m,2m+1)=-{\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{A}}]={\mathbb{E}}_{\mathcal{P}}[\Delta {E}^{\mathcal{B}}]={\mathbb{E}}_{{A}^{*}\pi}\left(E\right)-{\mathbb{E}}_{\pi}\left(E\right)=q\ge 0$$

Note that, as before, we made no assumption on the distribution π. We have to choose matrices A and B satisfying the following constraints (here $\mathbf{1}={(1,\dots ,1)}^{*}$)

$$\left\{\begin{array}{c}A\cdot \mathbf{1}=B\cdot \mathbf{1}=\mathbf{1}\phantom{\rule{1.em}{0ex}}i.e.\phantom{\rule{1.em}{0ex}}\sum _{j}{A}_{ij}=\sum _{j}{B}_{ij}=1\phantom{\rule{1.em}{0ex}}\forall i\phantom{\rule{1.em}{0ex}}(\text{normalization of }A\text{ and }B)\hfill \\ \\ {\mathbb{E}}_{{A}^{*}\pi}\left(E\right)-{\mathbb{E}}_{\pi}\left(E\right)=q,\phantom{\rule{1.em}{0ex}}i.e.\phantom{\rule{1.em}{0ex}}\sum _{ij}{\pi}_{i}{A}_{ij}{\mathcal{E}}_{ij}=q,\phantom{\rule{1.em}{0ex}}(\text{specify inflow})\hfill \\ \\ {\mathbb{E}}_{{\left(AB\right)}^{*}\pi}\left(E\right)-{\mathbb{E}}_{\pi}\left(E\right)=0,\phantom{\rule{1.em}{0ex}}i.e.\phantom{\rule{1.em}{0ex}}\sum _{ij}{\pi}_{i}{A}_{ij}{B}_{jk}={\pi}_{k}\phantom{\rule{1.em}{0ex}}\forall k\phantom{\rule{1.em}{0ex}}(\pi \text{ is stationary for }AB)\hfill \end{array}\right.$$

The above constraints do not, in the general case, specify the matrices A and B. Therefore, we need to invoke a selection criterion, which will be our M.E.R. principle. We are led to investigate the existence of an entropy rate for the stochastic process described by the non-homogeneous Markov chain (2.1) with initial distribution π stationary for $AB$. Before turning to this, we look at the solution of our problem in the case of zero flux, $q=0$. It is immediate to see that if we limit ourselves to the case $A=B$, then the chain is ergodic and

$$A=B=\widehat{P},\phantom{\rule{1.em}{0ex}}where\phantom{\rule{1.em}{0ex}}{\widehat{P}}_{ij}={\pi}_{j}$$

is the maximum entropy rate solution.

#### 2.2. Computation of the entropy rate

It is easy to see that the chain described by (2.1) with initial distribution π stationary for $AB$ but not for A describes a non-stationary stochastic process. Moreover, the chain is not strongly ergodic but, under the non-restrictive assumption that A and B have positive entries, it is weakly ergodic (see e.g., [10] for the notions of strong and weak ergodicity and [11],[12] and the bibliography therein for the study of convergence of Markov processes using information-theoretic tools). We now show that the entropy rate is well defined and finite for the process at hand. This is not surprising, since the inhomogeneity of the chain is very mild and the chain becomes homogeneous under a suitable time reparameterization. However, we will proceed with the chain as it is for simplicity's sake. Recall that the probability of a typical sequence $\omega \in \Omega ={\chi}^{N}$ of length N of the chain with initial distribution π is

$$p\left(\omega \right)=p({i}_{0},{i}_{1},\dots ,{i}_{N-1})={\pi}_{{i}_{0}}{A}_{{i}_{0}{i}_{1}}{B}_{{i}_{1}{i}_{2}}{A}_{{i}_{2}{i}_{3}}\dots \phantom{\rule{1.em}{0ex}}$$

To compute $H({X}_{0},\dots ,{X}_{N-1})$ we use the well known chain rule (see [6]):

$$H({X}_{0},\dots ,{X}_{N-1})=\sum _{k=0}^{N-1}H\left({X}_{k}\right|{X}_{0},\dots ,{X}_{k-1})=H\left({X}_{0}\right)+H\left({X}_{1}\right|{X}_{0})+H\left({X}_{2}\right|{X}_{1},{X}_{0})+\dots .$$

Therefore, we have

$$\begin{array}{c}H\left({X}_{0}\right)=H\left(\pi \right)=-\sum _{{i}_{0}}{\pi}_{{i}_{0}}ln{\pi}_{{i}_{0}}\hfill \\ \\ H\left({X}_{1}\right|{X}_{0})=-\sum _{{i}_{0},{i}_{1}}p({i}_{0},{i}_{1})lnp\left({i}_{1}\right|{i}_{0})=-\sum _{{i}_{0},{i}_{1}}{\pi}_{{i}_{0}}{A}_{{i}_{0}{i}_{1}}ln{A}_{{i}_{0}{i}_{1}}\hfill \\ \\ H\left({X}_{2}\right|{X}_{1},{X}_{0})=-\sum _{{i}_{0},{i}_{1},{i}_{2}}p({i}_{0},{i}_{1},{i}_{2})lnp\left({i}_{2}\right|{i}_{0},{i}_{1})=-\sum _{{i}_{0},{i}_{1},{i}_{2}}{\pi}_{{i}_{0}}{A}_{{i}_{0}{i}_{1}}{B}_{{i}_{1}{i}_{2}}ln{B}_{{i}_{1}{i}_{2}}\hfill \end{array}$$

Hence, if the stationarity condition ${\left(AB\right)}^{*}\pi =\pi $ holds, the terms following $H\left({X}_{0}\right)$ alternate between the form of

$$H\left({X}_{1}\right|{X}_{0})=H(\pi ,A)=-\sum _{{i}_{0},{i}_{1}}{\pi}_{{i}_{0}}{A}_{{i}_{0}{i}_{1}}ln{A}_{{i}_{0}{i}_{1}}$$

and

$$H\left({X}_{2}\right|{X}_{1},{X}_{0})=H({A}^{*}\pi ,B)=-\sum _{{i}_{0},{i}_{1},{i}_{2}}{\pi}_{{i}_{0}}{A}_{{i}_{0}{i}_{1}}{B}_{{i}_{1}{i}_{2}}ln{B}_{{i}_{1}{i}_{2}}$$

Hence,

$$\mathcal{H}=\underset{N\to +\infty}{lim}\frac{1}{N}H({X}_{0},\dots ,{X}_{N-1})=\frac{1}{2}(H(\pi ,A)+H({A}^{*}\pi ,B))$$

or

$$\mathcal{H}=\mathcal{H}(A,B)=-\frac{1}{2}(\sum _{i,j=1}^{n}{\pi}_{i}{A}_{ij}ln{A}_{ij}+\sum _{i,j,k=1}^{n}{\pi}_{i}{A}_{ij}{B}_{jk}ln{B}_{jk})$$
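This limit can be cross-checked by brute force on a small example (the matrices are illustrative): for the alternating chain the chain rule gives the exact identity $H({X}_{0},\dots ,{X}_{2m})=H(\pi )+m\,(H(\pi ,A)+H({A}^{*}\pi ,B))$, whose $\frac{1}{N}$-limit is $\mathcal{H}(A,B)$:

```python
import numpy as np
from itertools import product

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.2, 0.8]])
w, v = np.linalg.eig((A @ B).T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()                 # stationary distribution of AB
piA = A.T @ pi                     # A* pi, the odd-time distribution

def H2(p, M):                      # H(p, M) = -sum_ij p_i M_ij ln M_ij
    return -np.sum(p[:, None] * M * np.log(M))

H_rate = 0.5 * (H2(pi, A) + H2(piA, B))

# Block entropy of length-5 realizations (transition matrices A, B, A, B).
mats = [A, B, A, B]
H5 = 0.0
for path in product(range(2), repeat=5):
    p = pi[path[0]]
    for M, (a, b) in zip(mats, zip(path, path[1:])):
        p *= M[a, b]
    H5 -= p * np.log(p)

H_pi = -np.sum(pi * np.log(pi))
# H(X_0,...,X_4) = H(pi) + 2 (H(pi,A) + H(A* pi,B)) = H(pi) + 4 H_rate
assert abs(H5 - (H_pi + 4 * H_rate)) < 1e-10
```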

#### 2.3. Application of the M.E.R. principle and solution

In this section we solve the constrained extremum problem for the objective function $\mathcal{H}(A,B)$ subject to the constraints (2.4) using the Lagrange multipliers method. The Lagrange function is

$$G(A,B)=\mathcal{H}(A,B)-\sum _{ij}{\gamma}_{i}{A}_{ij}-\sum _{jk}{\lambda}_{j}{B}_{jk}-\sum _{ijk}{\mu}_{k}{\pi}_{i}{A}_{ij}{B}_{jk}-\beta \sum _{ij}{\pi}_{i}{A}_{ij}{\mathcal{E}}_{ij}$$

and the necessary conditions for the extremum are

$$\frac{\partial G}{\partial {A}_{ij}}=0,\phantom{\rule{2.em}{0ex}}\frac{\partial G}{\partial {B}_{jk}}=0,\phantom{\rule{1.em}{0ex}}\forall i,j,k.$$

By simple computations we get the expression for the solution A and B in terms of the unknown multipliers as

$$\begin{array}{c}{\widehat{A}}_{ij}={e}^{\frac{{\lambda}_{j}}{{\left({A}^{*}\pi \right)}_{j}}}{e}^{-\frac{{\gamma}_{i}}{{\pi}_{i}}}{e}^{-\beta {\mathcal{E}}_{ij}}=:{x}_{j}\left(\lambda \right){y}_{i}\left(\gamma \right){Q}_{ij}\left(\beta \right)\hfill \\ \\ {\widehat{B}}_{jk}={e}^{-\frac{{\lambda}_{j}}{{\left({A}^{*}\pi \right)}_{j}}}{e}^{{\mu}_{k}+1}=:{x}_{j}^{-1}\left(\lambda \right){z}_{k}\left(\mu \right)\hfill \end{array}$$

In the following we use $x,y,z,Q$ as unknowns in place of the Lagrange multipliers. By using the normalization constraints on A and B respectively, we get

$${y}_{i}=\frac{1}{{\sum}_{j}{Q}_{ij}{x}_{j}},\phantom{\rule{2.em}{0ex}}{x}_{j}=\sum _{k}{z}_{k}:=z$$

By using the stationarity constraint of π with respect to $AB$ we get the following equations for ${z}_{k}$, $k=1,\dots ,n$,

$${\pi}_{k}=\sum _{ij}{\pi}_{i}\frac{{Q}_{ij}}{{\sum}_{j}{Q}_{ij}}\frac{{z}_{k}}{z}:=\sum _{ij}{\pi}_{i}{\tilde{Q}}_{ij}\frac{{z}_{k}}{z}\phantom{\rule{2.em}{0ex}}\text{where}\phantom{\rule{1.em}{0ex}}{\tilde{Q}}_{ij}=\frac{{Q}_{ij}}{{\sum}_{j}{Q}_{ij}}$$

Since the matrix $\tilde{Q}$ introduced above is a stochastic one with positive entries, ${\sum}_{j}{\tilde{Q}}_{ij}=1$, it is easy to see that the solution of the above equation for ${z}_{k}$ is ${z}_{k}={\pi}_{k}$. Hence, ${x}_{j}=1$ and from (2.7)

$${\widehat{A}}_{ij}={\tilde{Q}}_{ij},\phantom{\rule{1.em}{0ex}}{\widehat{B}}_{jk}={\pi}_{k}.$$

Before turning to the inflow constraint to determine the multiplier β, we note that the matrix $\tilde{Q}$ admits a simpler form using the definition ${\mathcal{E}}_{ij}={E}_{j}-{E}_{i}$

$${\tilde{Q}}_{ij}=\frac{{Q}_{ij}}{{\sum}_{j}{Q}_{ij}}=\frac{{e}^{-\beta {\mathcal{E}}_{ij}}}{{\sum}_{j}{e}^{-\beta {\mathcal{E}}_{ij}}}=\frac{{e}^{-\beta {E}_{j}}}{{\sum}_{j}{e}^{-\beta {E}_{j}}}=:{\tilde{\pi}}_{j}$$

Therefore ${\widehat{A}}^{*}\pi =\tilde{\pi}$, since

$${\left({\widehat{A}}^{*}\pi \right)}_{i}=\sum _{j}{\tilde{Q}}_{ji}{\pi}_{j}=\sum _{j}{\tilde{\pi}}_{i}{\pi}_{j}={\tilde{\pi}}_{i}$$

Now the inflow constraint can be rewritten as

$$q=\sum _{ij}{\pi}_{i}{\widehat{A}}_{ij}{\mathcal{E}}_{ij}=\sum _{ij}{\pi}_{i}{\tilde{Q}}_{ij}{\mathcal{E}}_{ij}=\sum _{ij}{\pi}_{i}{\tilde{\pi}}_{j}{\mathcal{E}}_{ij}={\mathbb{E}}_{\tilde{\pi}}\left(E\right)-{\mathbb{E}}_{\pi}\left(E\right)$$

Hence, for any given π and q, setting $e={\mathbb{E}}_{\pi}\left(E\right)$, the Lagrange multiplier $\beta =\beta (e+q)$ is uniquely determined by the equation

$${\mathbb{E}}_{\tilde{\pi}}\left(E\right)=-\frac{\partial lnZ\left(\beta \right)}{\partial \beta}=\sum _{j}{E}_{j}\frac{{e}^{-\beta {E}_{j}}}{Z\left(\beta \right)}=q+e$$
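A numerical sketch of the full solution (illustrative energies and flux): take π Gibbsian at some $\beta \left(e\right)$, solve the above equation for $\beta (q+e)$ by bisection, and verify that ${\widehat{A}}_{ij}={\tilde{\pi}}_{j}$, ${\widehat{B}}_{jk}={\pi}_{k}$ satisfy the stationarity and inflow constraints (2.4):

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])             # illustrative energy levels
beta_e = 1.0                              # assumed equilibrium multiplier
pi = np.exp(-beta_e * E); pi /= pi.sum()  # Gibbs distribution pi at beta(e)
e = float(pi @ E)
q = 0.3                                   # prescribed inflow

# Bisection for beta_tilde with E_{pi_tilde}(E) = e + q (mean decreasing in beta).
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    w = np.exp(-mid * E)
    if float(E @ w / w.sum()) > e + q:
        lo = mid
    else:
        hi = mid
pi_t = np.exp(-0.5 * (lo + hi) * E); pi_t /= pi_t.sum()   # pi_tilde

A_hat = np.tile(pi_t, (3, 1))             # A_hat[i, j] = pi_tilde_j
B_hat = np.tile(pi, (3, 1))               # B_hat[j, k] = pi_k
Eps = E[None, :] - E[:, None]             # Eps[i, j] = E_j - E_i

assert np.allclose((A_hat @ B_hat).T @ pi, pi)            # pi stationary for AB
assert abs(np.sum(pi[:, None] * A_hat * Eps) - q) < 1e-8  # inflow equals q
```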

We conclude by noting that, if the relation $e={\mathbb{E}}_{\pi}\left(E\right)$ is seen as a constraint for the unknown probability distribution π, then the maximum entropy assignation for π is the Gibbs distribution

$${\pi}_{i}=\frac{{e}^{-\beta {E}_{i}}}{Z\left(\beta \right)}\phantom{\rule{2.em}{0ex}}\phantom{\rule{4pt}{0ex}}where\phantom{\rule{1.em}{0ex}}\beta =\beta \left(e\right).$$

We have found that the maximum entropy rate assignation for the non-stationary stochastic process described by the non-homogeneous Markov chain (2.1), defined by the stochastic matrices A and B and by the probability distribution π which is stationary for $AB$, is

$${A}_{ij}={\tilde{\pi}}_{j},\phantom{\rule{1.em}{0ex}}{B}_{jk}={\pi}_{k}$$

where $\tilde{\pi}$ is the Gibbs distribution for $\beta =\beta (q+e)$ and π is the Gibbs distribution for $\beta =\beta \left(e\right)$. As expected, the solution depends only on the macroscopic information supplied: the equilibrium energy e and the flow q.

#### 2.4. Entropy rate and entropy production

By a direct computation from (1.1), (2.8), (2.10), the entropy rate of the process is the sum of two terms of the type

$$H\left(\pi \right)=lnZ\left(\beta \right)+e\beta =S\left(e\right)$$

Hence, from (2.5)

$$\mathcal{H}(A,B)=\mathcal{H}(q,e)=\frac{1}{2}(H\left(\tilde{\pi}\right)+H\left(\pi \right))=\frac{1}{2}(S(e+q)+S\left(e\right))$$
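The identity $H\left(\pi \right)=lnZ\left(\beta \right)+e\beta $ used above follows by substituting the Gibbs weights into the Shannon entropy; a short numerical confirmation with illustrative energies:

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])     # illustrative energy levels
beta = 1.0
Z = np.exp(-beta * E).sum()       # partition function Z(beta)
pi = np.exp(-beta * E) / Z        # Gibbs distribution

H_pi = -np.sum(pi * np.log(pi))   # Shannon entropy of pi
e = float(pi @ E)                 # average energy
assert abs(H_pi - (np.log(Z) + beta * e)) < 1e-12
```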

Since the chain spends “half of its time” in a state with average energy e and the remaining half in a state with average energy $e+q$, the above formula says that the entropy rate is the time average of the “instantaneous” entropies.

What is the relation between the entropy rate of the stochastic process and the thermodynamic entropy of the system in a stationary non-equilibrium state? If q is small, we can consider the Taylor expansion of $\mathcal{H}(q,e)$ with respect to q and get

$$\mathcal{H}(q,e)=\mathcal{H}(0,e)+\frac{\partial \mathcal{H}}{\partial q}(0,e)q+\mathcal{O}\left({q}^{2}\right)=S\left(e\right)+\frac{1}{2}\frac{\partial S}{\partial e}\left(e\right)q+\mathcal{O}\left({q}^{2}\right)$$

Hence, by the well known identification $\beta =1/kT$,

$$\mathcal{H}(q,e)=S\left(e\right)+\frac{1}{2}\beta \left(e\right)q+\mathcal{O}\left({q}^{2}\right)=S\left(e\right)+\frac{1}{2}\frac{q}{kT\left(e\right)}+\mathcal{O}\left({q}^{2}\right)$$

In the above formula, the entropy rate is the sum of two terms, one of which, $S\left(e\right)$, is non-negative while the other has the sign of q. It is appealing to interpret the first as the source term and the other as the flux term.

Moreover, from the relation between the average energy e and the related multiplier $\beta \left(e\right)$, we have that, if q is small,

$$e+q=e\left(\beta \right)+\frac{\partial e}{\partial \beta}\left(\beta \right)d\beta +\mathcal{O}\left(2\right)$$

Hence, up to $\mathcal{O}\left(2\right)$ order terms,

$$q=\frac{\partial e}{\partial \beta}\left(\beta \right)d\beta =-\frac{{\partial}^{2}lnZ\left(\beta \right)}{\partial {\beta}^{2}}d\beta =-k{T}^{2}{C}_{v}d\left(\frac{1}{kT}\right)={C}_{v}dT$$

#### 2.5. Entropy production of the $\mathcal{A}\cup \chi \cup \mathcal{B}$ system

Let us consider the insulated system $\mathcal{A}\cup \chi \cup \mathcal{B}$ and let us suppose that the two environments $\mathcal{A}$ and $\mathcal{B}$ are two thermostats, respectively at temperatures $\tilde{T}=1/k\beta (e+q)$ and $T=1/k\beta \left(e\right)$. The net effect of putting the system χ alternately in contact with $\mathcal{A}$ and with $\mathcal{B}$ is the flow, in two time steps of the chain, of an average energy amount $q>0$ from a reservoir at higher temperature $\tilde{T}$ to a reservoir at lower temperature T, leaving the system χ unchanged, since the equilibrium distribution π is stationary for $AB$. By a standard non-equilibrium thermodynamics formula (see e.g., [13]), the entropy production in the $\pi \stackrel{A}{\to}\tilde{\pi}\stackrel{B}{\to}\pi $ cycle is

$${S}_{pr}=d{S}_{A}+d{S}_{B}=\frac{-q}{\tilde{T}}+\frac{q}{T}=kq(\beta -\tilde{\beta})\ge 0$$

If we now compute the information divergence (also called relative entropy, see [6]) of the Gibbs distribution $\tilde{\pi}$ with respect to Gibbs distribution π we find

$$D(\tilde{\pi}\parallel \pi )=\sum _{i}{\tilde{\pi}}_{i}ln\frac{{\tilde{\pi}}_{i}}{{\pi}_{i}}=ln\frac{Z\left(\beta \right)}{Z\left(\tilde{\beta}\right)}+(\beta -\tilde{\beta}){\mathbb{E}}_{\tilde{\pi}}\left(E\right)\ge 0$$

The information divergence is not a symmetric function of the two probability distributions $p,q$, while the symmetrized information divergence (see [14]) of p and q is symmetric and non-negative:

$$\Delta (p,q):=D(p\parallel q)+D(q\parallel p)=\sum _{i}({p}_{i}-{q}_{i})ln\frac{{p}_{i}}{{q}_{i}}$$

The standard interpretation (see again [14]) of the symmetrized information divergence is as a measure of the difficulty of assessing which is the statistical description (p or q) of the system on the basis of observations of the system state. Since

$$\Delta (\pi ,\tilde{\pi}):=D(\tilde{\pi}\parallel \pi )+D(\pi \parallel \tilde{\pi})=(\beta -\tilde{\beta})({\mathbb{E}}_{\tilde{\pi}}\left(E\right)-{\mathbb{E}}_{\pi}\left(E\right))=(\beta -\tilde{\beta})q\ge 0$$

we have the following

**Proposition 2.** The entropy production of the closed system $\mathcal{A}\cup \chi \cup \mathcal{B}$ is equal to the symmetrized information divergence between the probability distributions π and $\tilde{\pi}$:

$${S}_{pr}=k\Delta (\pi ,\tilde{\pi})\ge 0$$
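Proposition 2 rests on the identity $\Delta (\pi ,\tilde{\pi})=(\beta -\tilde{\beta})q$, which can be verified directly for two Gibbs distributions (illustrative energies and multipliers):

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])                  # illustrative energy levels
beta, beta_t = 1.0, 0.6                        # beta_t < beta: hotter reservoir
pi = np.exp(-beta * E); pi /= pi.sum()         # Gibbs at beta
pi_t = np.exp(-beta_t * E); pi_t /= pi_t.sum() # Gibbs at beta_tilde

def D(p, r):                                   # information divergence D(p || r)
    return float(np.sum(p * np.log(p / r)))

q = float(pi_t @ E - pi @ E)                   # energy flux, here q >= 0
Delta = D(pi_t, pi) + D(pi, pi_t)              # symmetrized divergence
assert abs(Delta - (beta - beta_t) * q) < 1e-12
assert Delta >= 0 and q >= 0
```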

**Remark.** In the literature (see e.g., the books [15],[16] or the papers [17],[18]) there is a well established notion of entropy production rate for a stationary Markov chain with countable state space. Let P and π be the transition matrix and its unique stationary distribution, respectively. Then the entropy production rate, also called the information gain of the stationary chain with respect to its time reversal, is defined as

$${e}_{P}=\sum _{i,j}({\pi}_{i}{P}_{ij}-{\pi}_{j}{P}_{ji})ln\frac{{\pi}_{i}{P}_{ij}}{{\pi}_{j}{P}_{ji}}.$$
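This quantity vanishes exactly when the chain satisfies detailed balance. A small illustrative check (matrices not from the paper): a symmetric transition matrix gives ${e}_{P}=0$, while a doubly stochastic matrix carrying a net probability cycle gives ${e}_{P}>0$:

```python
import numpy as np

def entropy_production_rate(P, pi):
    F = pi[:, None] * P                        # fluxes pi_i P_ij
    # e_P = sum_ij (F_ij - F_ji) ln(F_ij / F_ji), all entries positive here
    return float(np.sum((F - F.T) * np.log(F / F.T)))

pi = np.array([1/3, 1/3, 1/3])
P_rev = np.array([[0.5, 0.25, 0.25],
                  [0.25, 0.5, 0.25],
                  [0.25, 0.25, 0.5]])          # symmetric: detailed balance
P_cyc = np.array([[0.2, 0.5, 0.3],
                  [0.3, 0.2, 0.5],
                  [0.5, 0.3, 0.2]])            # circulant: carries a net cycle

assert np.allclose(pi @ P_rev, pi) and np.allclose(pi @ P_cyc, pi)
assert abs(entropy_production_rate(P_rev, pi)) < 1e-12
assert entropy_production_rate(P_cyc, pi) > 0
```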

## 3. Generalizations

In this section, we generalize the previous results to the case of an open system with k macroscopic observables ${E}^{\alpha}$, with possibly different speeds of relaxation to equilibrium, and k stationary fluxes ${q}^{\alpha}$.

#### 3.1. The case of k isochronous fluxes and Onsager's reciprocity relations

The considerations in Section 2 and Section 2.1 developed for E apply without changes for every macroscopic observable ${E}^{\alpha}$. We consider therefore our M.E.R. problem with k constraints of the form (2.4). Since the constraints specifying the values ${q}^{\alpha}$ of the k inflows are linear and independent, the last term in the r.h.s. of the Lagrange function (2.6) becomes a sum over the index α

$$\sum _{ij\alpha}{\beta}_{\alpha}{\pi}_{i}{A}_{ij}{\mathcal{E}}_{ij}^{\alpha}$$

By a simple inspection of the previous computations, one finds that the matrix $\tilde{Q}$ has the form

$${\tilde{Q}}_{ij}=\frac{{e}^{-{\sum}_{\alpha}{\beta}_{\alpha}{E}_{j}^{\alpha}}}{{\sum}_{j}{e}^{-{\sum}_{\alpha}{\beta}_{\alpha}{E}_{j}^{\alpha}}}=:{\tilde{\pi}}_{j}$$

where the k Lagrange multipliers ${\beta}_{\alpha}$ are uniquely determined by the k equations

$${\mathbb{E}}_{\tilde{\pi}}\left({E}^{\alpha}\right)=-\frac{\partial lnZ\left(\beta \right)}{\partial {\beta}_{\alpha}}=\sum _{r}{E}_{r}^{\alpha}\frac{{e}^{-{\sum}_{\sigma}{\beta}_{\sigma}{E}_{r}^{\sigma}}}{Z\left(\beta \right)}={q}^{\alpha}+{\mathbb{E}}_{\pi}\left({E}^{\alpha}\right)={q}^{\alpha}+{e}^{\alpha}$$

The considerations and the results of Section 2.4. can be reformulated without changes for the vector of fluxes $q=({q}^{1},\dots ,{q}^{k})$ and equilibrium values $e=({e}^{1},\dots ,{e}^{k})$. In particular, the small-fluxes linear approximation formula (2.12) becomes

$$\mathcal{H}\left(q\right)=S\left(e\right)+\frac{1}{2}\sum _{\alpha}{\beta}_{\alpha}\left(e\right){q}^{\alpha}+\mathcal{O}\left({q}^{2}\right)$$

and, by a computation like the one done before in Section 2.4., we get

$$\frac{\partial {e}_{r}}{\partial {\beta}_{l}}=-\frac{{\partial}^{2}lnZ\left(\beta \right)}{\partial {\beta}_{r}\partial {\beta}_{l}}=\frac{\partial {e}_{l}}{\partial {\beta}_{r}}$$

which are Onsager reciprocity relations.
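The symmetry of the susceptibility matrix can be checked by finite differences on a two-observable Gibbs family (the observables and state space are illustrative): the two mixed derivatives coincide because both equal the same mixed second derivative of $lnZ\left(\beta \right)$ up to sign:

```python
import numpy as np

E1 = np.array([0.0, 1.0, 2.0, 1.5])   # two illustrative observables
E2 = np.array([1.0, 0.5, 0.0, 2.0])   # on a 4-state space

def mean(b1, b2, obs):
    w = np.exp(-b1 * E1 - b2 * E2)    # two-parameter Gibbs weights
    return float(obs @ w / w.sum())

b1, b2, h = 0.7, 0.4, 1e-5
# d e_1 / d beta_2 and d e_2 / d beta_1 by central differences.
d12 = (mean(b1, b2 + h, E1) - mean(b1, b2 - h, E1)) / (2 * h)
d21 = (mean(b1 + h, b2, E2) - mean(b1 - h, b2, E2)) / (2 * h)
assert abs(d12 - d21) < 1e-6          # Onsager-type reciprocity
```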

#### 3.2. The case of different speed of relaxation to equilibrium

In this section we sketch how to model a system with two observables $X,Y$ having different speeds of relaxation to equilibrium. As before, let the system χ be alternately in contact with the two environments $\mathcal{A}$ and $\mathcal{B}$, but now we suppose that the average value of the observable X switches from e to $e+q$ and back to e upon contact with $\mathcal{A}$ and $\mathcal{B}$, while the average value of the observable Y switches from u to $u+r$ and back to u upon contact with $\mathcal{A}$, $\mathcal{B}$, $\mathcal{A}$, $\mathcal{B}$ (two cycles). Therefore, the system returns (relaxes) to equilibrium with respect to the observable X in a two-step cycle $A\to B$ or $C\to D$ while, with respect to the observable Y, it relaxes to equilibrium in a full (four-step) cycle $A\to B\to C\to D$. Can we infer the average values of the slow quantity Y at the intermediate states B, C from the knowledge of the average values of the fast observable X?

To this end, we introduce a periodic Markov chain with four stochastic matrices A, B, C, D generalizing (2.1), and the following set of constraints generalizing (2.4)

$$\left\{\begin{array}{c}A\cdot \mathbf{1}=B\cdot \mathbf{1}=C\cdot \mathbf{1}=D\cdot \mathbf{1}=\mathbf{1}\phantom{\rule{1.em}{0ex}}\left(\text{normalization}\right)\hfill \\ \\ {\mathbb{E}}_{{A}^{*}\pi}\left(X\right)-{\mathbb{E}}_{\pi}\left(X\right)=q,\phantom{\rule{1.em}{0ex}}(\text{specify inflow of }X\text{ in }A)\hfill \\ \\ {\mathbb{E}}_{{\left(AB\right)}^{*}\pi}\left(X\right)-{\mathbb{E}}_{\pi}\left(X\right)=0,\phantom{\rule{1.em}{0ex}}(\text{specify outflow of }X\text{ in }AB)\hfill \\ \\ {\mathbb{E}}_{{\left(ABC\right)}^{*}\pi}\left(X\right)-{\mathbb{E}}_{\pi}\left(X\right)=q,\phantom{\rule{1.em}{0ex}}(\text{specify inflow of }X\text{ in }ABC)\hfill \\ \\ {\mathbb{E}}_{{\left(AB\right)}^{*}\pi}\left(Y\right)-{\mathbb{E}}_{\pi}\left(Y\right)=r,\phantom{\rule{1.em}{0ex}}(\text{specify inflow of }Y\text{ in }AB)\hfill \\ \\ {\left(ABCD\right)}^{*}\pi =\pi ,\phantom{\rule{1.em}{0ex}}(\pi \text{ is stationary for }ABCD)\hfill \end{array}\right.$$

By a computation entirely analogous to the previous one, we find the expression of the entropy rate

$$\mathcal{H}=\frac{1}{4}[H(\pi ,A)+H({A}^{*}\pi ,B)+H({\left(AB\right)}^{*}\pi ,C)+H({\left(ABC\right)}^{*}\pi ,D)]$$

Moreover, if the equilibrium state is defined by the macroscopic values $e={\mathbb{E}}_{\pi}\left(X\right)$ and $u={\mathbb{E}}_{\pi}\left(Y\right)$, one can show that the system switches between three Gibbsian states

$$\pi (e,u)\stackrel{A}{\to}\pi (e+q)\stackrel{B}{\to}\pi (e+q,u+r)\stackrel{C}{\to}\pi (e+q)\stackrel{D}{\to}\pi (e,u)$$

Then, the inferred average value of Y at the intermediate states is

$${\left(Y\right)}_{inf}={\mathbb{E}}_{\pi (e+q)}\left[Y\right]$$

and also

$$\frac{d{\left(Y\right)}_{inf}}{dq}dq=-\frac{Co{v}_{\pi (e+q)}(X,Y)}{Co{v}_{\pi (e+q)}(X,X)}dq$$

## 4. Conclusions

In this paper we have dealt with a simple model describing a system in a stationary non-equilibrium state. The probabilistic description is provided by a non-homogeneous Markov chain, which is not assumed on the basis of a model of the microscopic interactions but rather derived, through a maximum entropy rate principle, from the knowledge of the macroscopic fluxes traversing the system. With respect to existing applications of the M.E.R. principle, here we have introduced a Markov chain defined by two or more stochastic matrices. In this way, we are able to describe a system in a stationary non-equilibrium state that may exhibit different speeds of relaxation to equilibrium. We made reference to physical quantities such as the energy for ease of interpretation of the results, but this is not necessary, and the same model can be applied in other domains, such as economics or biology, since it is sufficiently general to take into account macroscopic constraints of a different nature.

## Acknowledgements

The author gratefully thanks the referees for useful remarks and comments that have substantially improved the content and presentation of the manuscript.

## References

- Martyushev, L.M.; Seleznev, V.D. Maximum entropy production principle in physics, chemistry and biology. Phys. Rep.
**2006**, 426, 1–45. [Google Scholar] [CrossRef] - Filyukov, A.A.; Karpov, V.Y. Description of steady transport processes by the method of the most probable path of evolution. Inzhenerno-Fizicheskii Zhurnal
**1967**, 13, 624–630. [Google Scholar] [CrossRef] - Filyukov, A.A.; Karpov, V.Y. Method of the most probable path of evolution in the theory of stationary irreversible processes. Inzhenerno-Fizicheskii Zhurnal
**1967**, 13, 798–804. [Google Scholar] - Jaynes, E.T. The minimum entropy production principle. Ann. Rev. Phys. Chem.
**1980**, 31, 579–601. [Google Scholar] [CrossRef] - Koralov, L.B.; Sinai, Y.G. Theory of Probability and Random Processes, 2nd ed.; Springer-Verlag: Berlin, Germany, 2007. [Google Scholar]
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley & Sons, Inc.: New York, NY, USA, 1991. [Google Scholar]
- Csiszar, I.; Cover, T.M.; Choi, B. Conditional limit theorems under Markov conditioning. IEEE Trans. Inf. Theory
**1987**, 33, 788–801. [Google Scholar] [CrossRef] - Dembo, A.; Zeitouni, O. Large Deviations Techniques and Applications, 2nd ed.; Springer-Verlag: New York, NY, USA, 1998. [Google Scholar]
- Justesen, J.; Hoholdt, T. Maxentropic Markov chains. IEEE Trans. Inf. Theory
**1984**, 30, 665–667. [Google Scholar] [CrossRef] - Bremaud, P. Markov Chains: Gibbs fields, Monte Carlo simulations and queues, Texts in Applied Mathematics 31; Springer-Verlag: New York, NY, USA, 1999. [Google Scholar]
- Barron, A.R. Limits of information, Markov chains and projections. In Proceedings 2000 IEEE International Symposium on Information Theory, Sorrento, Italy, June 2000; p. 25.
- Harremoës, P.; Holst, K.K. Convergence of Markov chains in information divergence. J. Theor. Probability
**2009**, 22, 186–202. [Google Scholar] [CrossRef] - De Groot, S.R.; Mazur, P. Non-equilibrium Thermodynamics; North Holland: Amsterdam, The Netherlands, 1962. [Google Scholar]
- Kullback, S. Information Theory and Statistics; Wiley: New York, NY, USA, 1959. [Google Scholar]
- Jiang, D.-Q.; Qian, M.; Qian, M. Mathematical Theory of Nonequilibrium Steady States, Lecture Notes in Mathematics 1833; Springer-Verlag: Berlin, Germany, 2004. [Google Scholar]
- Kalpazidou, S.L. Cycle Representations of Markov Processes, Applications of Mathematics 28; Springer-Verlag: New York, NY, USA, 1995. [Google Scholar]
- Gaspard, P. Time-reversed dynamical entropy and irreversibility in Markovian random processes. J. Stat. Phys.
**2004**, 117, 599–615. [Google Scholar] [CrossRef] - Jiang, D.-Q.; Qian, M.; Qian, M. Entropy production, information gain and Lyapunov exponents of random hyperbolic dynamical systems. Forum Math.
**2004**, 16, 281–315. [Google Scholar] [CrossRef]

© 2009 by the author; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/3.0/.