Dissipation in Non-Steady State Regulatory Circuits

In order to respond to environmental signals, cells often use small molecular circuits to transmit information about their surroundings. Recently, motivated by specific examples in signaling and gene regulation, a body of work has focused on the properties of circuits that function out of equilibrium and dissipate energy. We briefly review the probabilistic measures of information and dissipation and use simple models to discuss and illustrate trade-offs between information and dissipation in biological circuits. We find that circuits with non-steady state initial conditions can transmit more information at small readout delays than steady state circuits. The dissipative cost of this additional information proves marginal compared to the steady state dissipation. Feedback does not significantly increase the transmitted information for out of steady state circuits but does decrease dissipative costs. Lastly, we discuss the case of bursty gene regulatory circuits that, even in the fast switching limit, function out of equilibrium.


Introduction
Cells rely on molecular signals to inform themselves about their surroundings and their own internal state [1]. These signals can describe the type and concentration of the surrounding sugars, as is the case for many bacterial operons, such as those used for lactose or galactose breakdown [2]. Signaling and activation of phosphorylated receptors provide a means of informing bacterial cells on faster timescales about a wide range of conditions, including crowding, growth signals, and stress [3]. Triggered by these signals, cells activate regulatory networks and cascades that allow them to respond in an appropriate way.
A response is usually triggered by a change in the environment, which perturbs the previous state of the cell and the regulatory system. Specifically, if the regulatory circuit was functioning in steady state, a change in the concentration of the signaling molecule, or the appearance of a new molecule, will kick it out of steady state. Here we investigate the response to such perturbations.
In this paper we study abstract mathematical models whose goal is to capture the main regulatory features of biochemical circuits. Our models do not capture many of the details of the biochemical complexity of regulatory units in real cells. By "circuit" or "network" throughout the paper we mean a set of stochastic processes that transform an input signal through a regulatory function to produce an output response. This use of the words "circuit" and "network" is standard in the biophysics literature [1][2][3]. Abstract models of biochemical circuits have proven useful in understanding molecular regulation in many biological systems, from development to immunology. We review some of our previous results from work that studied information transmission [65] and the trade-offs between information transmission and dissipation for regulatory circuits in steady state [66] (Sections 6.1 and 6.2). A signal often perturbs the system out of steady state, to which it then relaxes back. In this paper we calculate the non-equilibrium dissipation for circuits that function out of steady state and maximally transmit information between the input and a potentially delayed output given constraints on dissipation. While the setup of the optimization problem (Equation (13)) is the same as in our previous work [66], considering average dissipation (Equation (14)) is new (Sections 6.3 and 6.4).
Lastly, we include some comments on dissipation in simple gene regulatory circuits with bursty transcription (Section 7) [67][68][69][70][71][72][73]. We show how even a fast switching gene promoter need not be in equilibrium. Our goal is not to provide an exhaustive review of the field but to illustrate with simple examples some trade-offs that appear in these molecular circuits.

Model
We consider a system at time t consisting of two discrete random variables z_t and x_t, which describe the input state and output state of the system, respectively. We previously used these abstract stochastic processes to study regulation in biochemical circuits in Mancini et al. [66], and such binary models of biochemical circuits have been studied by others [40,43,[48][49][50][51]. For simplicity we assume that x and z can take only two values: + (active state) and − (inactive state). The input state corresponds to the presence or absence of a signaling molecule (or a high or low concentration of a signaling molecule), whereas the output state is the activation or not of a response pathway or regulator. The specific regulatory interactions between them will be defined later within the specific studied model(s). At every time t, the system is in one of four possible states (z_t, x_t): (−, −), (−, +), (+, −), or (+, +). The master equation for the temporal evolution of the conditional probability distribution p(z_t, x_t | z_0, x_0) of the system is:

∂_t p(z_t, x_t | z_0, x_0) = L p(z_t, x_t | z_0, x_0), (1)

where L is a 4 × 4 matrix with transition rates between the four states. We will be interested in the joint probability p(x_t, z_0), that is, we will look at the output variable x at time t and the initial state of the input variable z:

p(x_t, z_0) = ∑_{x_0, z_t = ±1} p(z_t, x_t | z_0, x_0) · p(x_0, z_0). (2)

This probability is needed in the computation of the central quantity we optimize: the time-delayed mutual information between the initial state of the input and the state of the output at t (defined in Section 3). After marginalization over the possible states of z_0 we will obtain p(x_t) = ∑_{z_0} p(x_t, z_0), which in turn is indispensable for calculating the dissipation of the system defined in Section 4.
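As a concrete illustration, the propagator of the master equation and the joint distribution of Equation (2) can be computed numerically by exponentiating a 4 × 4 rate matrix. The sketch below uses the no-feedback rates of the next section with illustrative values for u and s and our own state ordering; these choices are assumptions, not taken from the text:

```python
import numpy as np

# State ordering (z, x): 0=(-,-), 1=(-,+), 2=(+,-), 3=(+,+).
# Illustrative generator for the no-feedback model (u, s are example values; r = 1).
u, s, r = 0.4, 0.2, 1.0
L = np.zeros((4, 4))
L[2, 0] = L[0, 2] = u                 # z flips: (-,x) <-> (+,x)
L[3, 1] = L[1, 3] = u
L[0, 1] = L[3, 2] = r                 # x aligns to z with rate r
L[1, 0] = L[2, 3] = s                 # x anti-aligns with rate s
np.fill_diagonal(L, -L.sum(axis=0))   # columns sum to zero

# Propagator exp(L t) via eigendecomposition (L is diagonalizable for generic rates)
t = 1.5
w, V = np.linalg.eig(L)
propagator = (V @ np.diag(np.exp(w * t)) @ np.linalg.inv(V)).real  # p(state_t | state_0)

p0 = np.full(4, 0.25)                 # an example initial distribution p(x_0, z_0)
joint_states = propagator * p0        # p(state at t, state at 0)

# Marginalize over x_0 and z_t to get p(x_t, z_0), as in Equation (2)
z_of = np.array([0, 0, 1, 1])         # z index of each joint state
x_of = np.array([0, 1, 0, 1])         # x index of each joint state
p_xt_z0 = np.zeros((2, 2))
for i in range(4):                    # joint state at time t
    for j in range(4):                # joint state at time 0
        p_xt_z0[x_of[i], z_of[j]] += joint_states[i, j]
```

The same propagator-and-marginalize pattern works for the feedback model; only the entries of L change.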
We restrict our analysis to symmetric models, in which we do not distinguish between the (−, −) and (+, +) states, and, analogously, between the (−, +) and (+, −) states. This is a simplification that is not motivated by a biological observation. The symmetry of the model allows us to write the probability distribution at any time t as p(x_t, z_0) = (1 + µ_t)/4 for the aligned states (x_t = z_0) and p(x_t, z_0) = (1 − µ_t)/4 for the anti-aligned states (x_t ≠ z_0), assuming the initial probability distribution obeys the same symmetry, with µ_0 parametrizing p(x_0, z_0). For the models in which the initial distribution is the steady state one, p_init = p(x_0, z_0) = p_ss, which imposes a condition on µ_0.

Model without Feedback: S and S̃
The first, simplest model we analyze is a symmetric model in which only the input affects the output and there is no feedback from the output to the input. The input variable z flips between its active and inactive states with rate u, regardless of the state of the output (see Figure 1A). The output variable x aligns with the input with rate r and anti-aligns with rate s (see Figure 1). The dynamics is given by a transition rate matrix given in Appendix A.
Figure 1. A cartoon of the possible states and transitions for both models: without feedback (A), and with feedback (B). Since there are two binary variables there are four states; transition rates are marked next to the respective arrows. Note the symmetry between the "pure" ((−, −) and (+, +)) states and the "mixed" states ((−, +) and (+, −)) in both models. (C,D) Representation of a possible time evolution of the system. The two variables flip between active (+) and inactive (−) states with the respective rates. In the model without feedback (C) the output variable depends on the input variable (the output aligns to the input with rate r or anti-aligns with rate s), while the input variable z flips freely between its active and inactive states, regardless of the state of the output. In the model with feedback (D), the rates of flipping of the input depend on the state of the output.
We calculate analytically the joint probability distribution p(x_t, z_0) (a four-dimensional vector) and the marginal probability distributions p(x_t) and p(z_0) (two-dimensional vectors), needed to find the mutual information that we will define in Equation (3), as functions of the transition rates u, s, r, and a parameter µ_0 that parametrizes the initial state of the system (see Appendix B). We set, without loss of generality, one rate equal to 1, specifically r = 1. The specific expressions for the probability distributions for the occupancy of the four states for the model without feedback are given in Appendix A. In steady state the probability distribution for the occupancy of the four states simplifies to

p_∞ = ( (u+1)/(2s+4u+2), (s+u)/(2s+4u+2), (s+u)/(2s+4u+2), (u+1)/(2s+4u+2) ).

We will consider this model in steady state, and we will call it model S. We will also allow for the initial conditions to be out of steady state, and then we will call it model S̃.
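The quoted steady state can be checked numerically against the null eigenvector of the transition matrix. A small sketch (the explicit matrix is our own encoding of Figure 1A, with example values for u and s and r = 1):

```python
import numpy as np

u, s = 0.4, 0.2       # example rates; r = 1 as in the text
# Generator with columns as source states, order (-,-), (-,+), (+,-), (+,+):
L = np.array([
    [-(u + s),  1.0,        u,         0.0     ],
    [  s,      -(1.0 + u),  0.0,       u       ],
    [  u,       0.0,       -(1.0 + u), s       ],
    [  0.0,     u,          1.0,      -(u + s) ],
])

# Steady state = eigenvector of the zero eigenvalue, normalized to sum to one
w, v = np.linalg.eig(L)
p_ss = np.real(v[:, np.argmin(np.abs(w))])
p_ss = p_ss / p_ss.sum()

# Analytic steady state quoted in the text
Z = 2*s + 4*u + 2
p_analytic = np.array([u + 1, s + u, s + u, u + 1]) / Z
```

For these rates both routes give the same occupancies, with the aligned "pure" states more likely than the "mixed" ones whenever s < 1.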

Models with Feedback: F and F̃
In the second analyzed model we allow the input variable to depend on the output, i.e., we allow for feedback from x to z. We keep as much symmetry as possible, while still not distinguishing between the states (−, −) and (+, +), and between (−, +) and (+, −). The scheme is given in Figure 1B. In terms of the rates, we allow the switching rates of the input z_t to differ depending on the state of the output x_t, introducing the rate α for anti-aligning the two variables and y for aligning them. The notion of input and output is no longer meaningful since both variables influence each other. We note that this scheme is not the most general model possible since we impose the symmetry between the 'pure' states, i.e., (−, −) and (+, +), and the 'mixed' states, i.e., (−, +) and (+, −), which reduces the number of parameters from 8 (as was studied in Mancini et al. [65]) to 4 (as was considered in Mancini et al. [66]). The transition matrix for this model, and the steady state probabilities, are given in Appendix B.
Similarly to the case of the model without feedback, we consider the model with feedback in steady state and call it model F, or let the initial conditions be out of steady state by considering all values of µ_0 (model F̃). To summarize, we use the following notation:
• S - no feedback, stationary initial condition;
• S̃ - no feedback, optimal initial condition;
• F - with feedback, stationary initial condition;
• F̃ - with feedback, optimal initial condition.

Information
The mutual information measured between the input z at time 0 and the output x at time t is defined as [53,74]:

I[x_t, z_0] = ∑_{x_t, z_0} p(x_t, z_0) log_2 [ p(x_t, z_0) / ( p(x_t) p(z_0) ) ]. (3)

In order to analyse the system in its natural timescale, we set t = τ/λ, where λ is the inverse of the relaxation time (smallest non-zero eigenvalue of the matrix L), and calculate I[x_τ, z_0]. The term under the logarithm, which has been called the thermodynamic coupling function for systems with many degrees of freedom [75,76], describes the degree of correlation of the two variables, and its logarithm is zero if the joint probability distribution factorizes. The thermodynamic coupling function has been shown to be useful to quantify the contributions of specific energy terms in binary models of allosteric systems [75,76].
Again exploiting the symmetry of the problem, the mutual information can be written as

I[x_t, z_0] = (1/2) [ (1 + µ_t) log_2(1 + µ_t) + (1 − µ_t) log_2(1 − µ_t) ], (4)

where |µ| ≤ 1. Since we have fixed r = 1, the symmetry of clockwise and counter-clockwise rotations is broken and µ ∈ [0, 1]. Information is an increasing function of µ and is maximized at I[x_t, z_0] = 1 bit for µ = 1. The specific values of µ are given in Appendixes A and B for the models with and without feedback.
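This closed form can be cross-checked against the definition in Equation (3) applied directly to the symmetric 2 × 2 joint distribution. A small sketch (the function names are ours):

```python
import numpy as np

def info_bits(mu):
    """I[x_t, z_0] for the symmetric joint p = (1 +/- mu)/4, in bits."""
    total = 0.0
    for a in (1.0 + mu, 1.0 - mu):
        if a > 0.0:               # guard implements 0 log 0 = 0
            total += a * np.log2(a)
    return 0.5 * total

def info_from_joint(mu):
    """Direct evaluation of Equation (3) on the 2x2 joint distribution."""
    p = np.array([[1 + mu, 1 - mu],
                  [1 - mu, 1 + mu]]) / 4.0
    px, pz = p.sum(axis=1), p.sum(axis=0)
    mask = p > 0
    return (p[mask] * np.log2(p[mask] / np.outer(px, pz)[mask])).sum()
```

At µ = 1 only the aligned states carry weight and exactly 1 bit is transmitted; at µ = 0 the joint distribution factorizes and the information vanishes.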

Non-Equilibrium Dissipation
We consider the limitations on regulatory function that come from having a fixed amount of energy to dissipate during the signaling process that transmits information. Large amounts of dissipated energy allow systems to function far out of equilibrium, whereas no dissipated energy corresponds to equilibrium circuits. We quantify the degree to which the system functions out of equilibrium by comparing the probability of a forward trajectory, P_→(x), and of a backward trajectory, P_←(x̄), along the same path [34,77]:

Σ = ∑_x P_→(x) log_2 [ P_→(x) / P_←(x̄) ], (5)

where the paths are defined as x = (x_1, x_2, ..., x_N) and x̄ = (x_N, x_{N−1}, ..., x_1), and each x_i is one of the four joint states of the input and output at time i. Using the Markov nature of the transitions P(x_{t+1}|x_t), we write the probability of the forward path starting from the initial state x_1 as

P_→(x) = P(x_1) ∏_{t=1}^{N−1} P(x_{t+1}|x_t), (6)

and analogously for the backward path. Equation (5) now becomes

Σ = ∑_x P_→(x) ∑_{t=1}^{N−1} log_2 [ P(x_{t+1}|x_t) P(x_t) / ( P(x_t|x_{t+1}) P(x_{t+1}) ) ], (7)

where we multiplied both the numerator and the denominator by the same product of probabilities P(x_2) · ... · P(x_N). Simplifying further and marginalizing over the elements of x not equal to x_t or x_{t+1}:

Σ = ∑_{t=1}^{N−1} ∑_{x_t, x_{t+1}} P(x_{t+1}|x_t) P(x_t) log_2 [ P(x_{t+1}|x_t) P(x_t) / ( P(x_t|x_{t+1}) P(x_{t+1}) ) ] ≡ ∑_{t=1}^{N−1} σ(t), (8)

which defines the time dependent dissipation production rate, σ(t).
Noting that the forward and backward transition probabilities are the same, P_{t→t+1}(x_{t+1}|x_t) = P_{t+1→t}(x_t|x_{t+1}) = P(x_{t+1} = i | x_t = j), explicitly defining the transition rates,

P(x_{t+1} = i | x_t = j) = L_{ij} Δt for i ≠ j, (9)

and renaming P_t(x_{t+1} = j) = p_j(t) and P_t(x_t = i) = p_i(t), we obtain [34,77,78]:

σ(t) = ∑_{i,j} L_{ij} p_j(t) log_2 [ L_{ij} p_j(t) / ( L_{ji} p_i(t) ) ], (10)

which in the limit of t → ∞ results in the steady state entropy dissipation rate:

σ_ss = ∑_{i,j} L_{ij} p_j^ss log_2 [ L_{ij} p_j^ss / ( L_{ji} p_i^ss ) ], (11)

where p_j^ss is the steady state probability distribution. We describe an alternative derivation of dissipation in Appendix C.
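The entropy production rate of Equation (10) translates directly into code. A sketch with our own helper name and two standard sanity cases: a detailed-balanced system, where the rate must vanish, and a driven cycle, where it must be positive (both example systems are our own, not from the text):

```python
import numpy as np

def entropy_production_rate(L, p):
    """sigma(t): sum over ordered pairs i != j of
    L_ij p_j log2(L_ij p_j / (L_ji p_i)), with L[i, j] the rate j -> i."""
    sigma = 0.0
    n = len(p)
    for i in range(n):
        for j in range(n):
            if i != j and L[i, j] > 0 and L[j, i] > 0 and p[j] > 0:
                jf, jb = L[i, j] * p[j], L[j, i] * p[i]
                sigma += jf * np.log2(jf / jb)
    return sigma

# Detailed balance => zero dissipation: a two-state system in equilibrium
a, b = 0.5, 1.5
L2 = np.array([[-a, b], [a, -b]])
p2 = np.array([b, a]) / (a + b)

# A driven three-state cycle: uniform stationary state, positive dissipation
L3 = np.array([[-1.1,  0.1,  1.0],
               [ 1.0, -1.1,  0.1],
               [ 0.1,  1.0, -1.1]])
p3 = np.ones(3) / 3.0
```

Each forward/backward pair contributes (J_f − J_b) log_2(J_f/J_b) ≥ 0, so the rate is non-negative and vanishes exactly at detailed balance.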
Again, we rescale the time in the above quantities by setting t = τ/λ (λ being the inverse of the relaxation time):

σ̂(τ) = σ(τ/λ)/λ. (12)

Setup of the Optimization
With these definitions, following Mancini et al. [66], we can ask which circuits optimally transmit information given a limited, constrained amount of steady state dissipation σ̂_ss:

I*(τ) = max_{L : σ̂_ss fixed} I[x_τ, z_0], (13)

where the optimization is over the circuit's reaction rates, L. The energy expense of a circuit that remains in steady state is well defined by this quantity. However, the total expense of circuits that function out of steady state must be calculated as the integral of the entropy dissipation rate in Equation (10) over the entire time the circuit is active, τ_p, such as the duration of the cell cycle or the interval between new inputs that kick the system into the initial non-equilibrium state. After some time the circuit will relax to steady state (see the diagram in Figure 2) and its energetic expense is well described by the steady state dissipation. But the initial non-equilibrium state costs the system some energy. We can compare the performance of circuits with different regulatory designs by considering the average energy expenditure until a given time τ_p:

Σ_avg(τ_p) = (1/τ_p) ∫_0^{τ_p} σ̂(τ) dτ. (14)

Figure 2. Schematic representation of the system's relaxation. The entropy dissipation rate, σ̂(τ), relaxes with time to its steady state value, σ̂_ss. At τ_p the system is "kicked out" or reset, thus the pink area represents the total energy dissipated until that time. The information is collected at an earlier readout time τ.
We can foresee that circuits that spend most of their time in steady state will have their expenditure dominated by σ̂_ss, whereas circuits that spend a lot of time relaxing to steady state will be dominated by the additional out of steady state dissipation cost ΔΣ = Σ_avg − σ̂_ss. When τ_p → ∞, all circuits spend most of their time in steady state: since σ̂(τ) → σ̂_ss as τ → ∞, the average in Equation (14) converges to σ̂_ss and the cost is dominated by the steady state dissipation.
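The average expenditure of Equation (14) and the relaxation cost ΔΣ can be sketched numerically. The exponentially relaxing dissipation curve below is purely illustrative (it stands in for σ̂(τ) of some circuit; the numbers are our own assumptions):

```python
import numpy as np

sigma_ss = 0.3                       # steady state dissipation rate (example)

def sigma_hat(tau):
    # illustrative relaxation of the dissipation rate to its steady value
    return sigma_ss + 0.8 * np.exp(-2.0 * tau)

tau_p = 5.0                          # reset time
taus = np.linspace(0.0, tau_p, 2001)
vals = sigma_hat(taus)
# trapezoidal rule for (1 / tau_p) * integral_0^tau_p sigma_hat(tau) dtau
Sigma_avg = (0.5 * (vals[:-1] + vals[1:]) * np.diff(taus)).sum() / tau_p
relax_cost = tau_p * (Sigma_avg - sigma_ss)   # extra cost of the transient
```

For this curve the transient contributes a finite relaxation cost, while Σ_avg approaches σ̂_ss as τ_p grows, matching the limiting behaviour described above.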
Using the steady state distribution for model S and Equation (11), we can evaluate the non-rescaled steady state dissipation for the model without feedback [66]:

σ_ss(u, s) = (s − 1) u log_2(s) / (1 + s + 2u). (15)

If we impose a non-equilibrium state by setting s → 0, the dissipation rescaled by the characteristic decay time,

σ̂(τ) = σ(τ/λ)/λ, with λ = min(1 + s, 2u) the lowest non-zero eigenvalue, (16)

tends to infinity, as expected. We also verify numerically that even in a non-steady state system that is kept out of equilibrium (Equation (10)) the rescaled dissipation (Equation (16)) tends to infinity, σ̂ = ∞ as s → 0, for all τ, µ_0, and u. The steady state dissipation rescaled by the smallest eigenvalue for models F and F̃ is [66]:

σ̂_ss(α, y, s) = (α − ys) log_2( α/(ys) ) / [ λ (1 + s + y + α) ], (17)

where λ is the smallest non-zero eigenvalue of the corresponding transition matrix.
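The closed form for the model without feedback can be verified against the general definition of Equation (11) applied to the four-state chain. A compact check (example rates and our own edge bookkeeping; states ordered (−,−), (−,+), (+,−), (+,+) with r = 1):

```python
import numpy as np

u, s = 0.7, 0.3                 # example rates, r = 1
Z = 2*s + 4*u + 2
p = np.array([u + 1, s + u, s + u, u + 1]) / Z   # steady state of model S

# directed transitions (to, from, rate)
edges = [(2, 0, u), (0, 2, u), (3, 1, u), (1, 3, u),
         (0, 1, 1.0), (1, 0, s), (3, 2, 1.0), (2, 3, s)]
rate = {(i, j): r_ for (i, j, r_) in edges}

# Equation (11): sigma_ss = sum_{i != j} w_ij p_j log2(w_ij p_j / (w_ji p_i))
sigma_num = sum(r_ * p[j] * np.log2(r_ * p[j] / (rate[(j, i)] * p[i]))
                for (i, j, r_) in edges)

sigma_analytic = (s - 1) * u * np.log2(s) / (1 + s + 2*u)
```

Both routes agree, and the dissipation is strictly positive whenever s ≠ 1, i.e., whenever detailed balance is broken.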

Results
The task is to find the maximal mutual information between the input and the output, with or without constraints, for all model variants (regulation with and without feedback; starting at steady state, or starting out of steady state) and compare their performance: the amount of information transmitted and the energy dissipated. To build intuition, we first summarize the results of the unconstrained optimization obtained by Mancini et al. [65]. Then, a constraint will be set on the steady state dissipation rate σ̂_ss, as in Mancini et al. [66]. We extend the latter results to models S̃ and F̃ by performing the optimization also with respect to the initial distribution. Finally, to compare not only the information transmitted in the models, but also its cost, we will calculate the average dissipation of the models.
In all cases we are looking for the maximum mutual information between the input at time 0 and the output at time τ, in the space of parameters (u, s, and r for the model without feedback, and α, y, s, and r for the model with feedback). We can also treat the initial distribution (parametrized by a single parameter, µ_0) as an additional degree of freedom, or set µ_0 equal to µ_0^ss, i.e., fix the initial distribution to be the steady state one. Optimizing with a constraint means looking for the maximum of the function not in the whole parameter space (R_+^N), but on the manifold given by σ̂_ss(parameters) = constraint.

Unconstrained Optimization
The results of the unconstrained optimization are summarized in Figure 3. As expected, the maximum amount of information that can be transmitted decays with the readout time for all models. Feedback allows for better information transmission only in the case when the initial distribution is fixed to its steady state value. Optimizing over the initial distribution renders the models considered here without (S̃) and with feedback (F̃) equivalent. In this case the system relies on its initial condition and information loss is due to the system decorrelating and losing information about its initial state. For a fixed initial distribution the model with feedback performs better than the model without feedback. We note that the feedback model considered here is a simplified model compared to the one studied in Mancini et al. [65], with fewer parameters. A full asymmetric model with feedback can transmit more information than a model without feedback if the initial conditions are not in steady state. However, these architectures correspond to infinite dissipation solutions, since all backward rates are forbidden and the circuit can never regain its initial state: one of the states i becomes absorbing, p_∞(y) = δ_{y,i}, and attracts the whole probability weight. We therefore restrict our exploration of models with feedback to the subclass without an absorbing steady state.

Figure 3. Results of the unconstrained optimization: mutual information for the models without feedback (S and S̃) and with feedback (F and F̃) with respect to the readout time τ. Optimization done both when the initial distribution is fixed to its steady state value (no tilde) and when this parameter is subjected to optimization as well (with tilde).
The modes of regulation of the circuits corresponding to the optimal solutions were discussed in previous work [65,66]. In short, the information-optimal steady state system uses rates that break detailed balance and induce an order in visiting the four states i. Feedback increases the transmitted information for long time delays by implementing these cycling solutions using a mixture of fast and slow rates. When out of steady state initial conditions are allowed, circuits relax to absorbing final states that need to be externally reset. In this case the optimal solutions with and without feedback both result in the stochastic process cycling through the four states and simply rely on the decorrelation of the initial state.

Constraining σ̂_ss
We next looked for rates that maximize the transmitted information I[x_τ, z_0] at a fixed time τ given a fixed steady state dissipation rate σ̂_ss. We first plot the maximal mutual information as a function of the readout time, τ, for the models without feedback, S (dashed lines) and S̃ (solid lines) (Figure 4). Not surprisingly, maximum information is a decreasing function of τ for both models, larger values of steady state dissipation, σ̂_ss, allow for more information transmitted, and model S̃ with optimized initial conditions transmits more information than model S, which remains in steady state. However, comparing all four models, the conclusion about the equivalence of the out of steady state models with (F̃) and without (S̃) feedback no longer holds when we constrain σ̂_ss (Figure 5). The difference between the optimal mutual information transmitted in models S̃ and F̃ is higher for systems that have smaller dissipation budgets σ̂_ss, and, as shown previously (Figure 3), the difference vanishes as σ̂_ss → ∞. The remaining conclusions from Figure 4 hold: models with feedback transmit more information than models without feedback, and models with free initial distributions transmit more information than the steady state models, as in the unconstrained optimization case (Figure 3).
Phase diagrams describing the optimal modes of regulation for steady state circuits are reported in Mancini et al. [66]. At large dissipation rates, the optimal out-of-equilibrium circuits exploit the increased decorrelation time of the system since cycling solutions are permitted. Close to equilibrium, circuits with no feedback cannot transmit a lot of information. Circuits with feedback use a combination of slow and fast rates to transmit information. The optimal close to equilibrium regulatory functions rapidly align the two variables z_t and x_t (y > α, s small), and slowly anti-align them, increasing the probability to be in the aligned (+, +) and (−, −) states. This results in a positive feedback loop. The same strategy of adjusting rates is used far from equilibrium, but this time it results in a cycling solution, which translates into a negative feedback loop (α > y, s ≈ 0).

Figure 5. Results of the optimization problem with constrained steady state dissipation for all four models. Optimal mutual information as a function of the readout time, τ, for two different constrained steady state dissipation rates, σ̂_ss, for the models S and F (dashed lines), and the models S̃ and F̃ (solid lines).
Allowing the circuit to function out of steady state optimizes the initial condition µ_0 to be as far as possible from the steady state. The optimal initial condition is µ_0 = 1, where only the aligned states are occupied (the initial distribution is p_0 = (0.5, 0, 0, 0.5)). This initial condition, combined with u < r and s < r (Figure A1), decreases the decorrelation time, and even a circuit with no feedback can transmit non-zero information. The rates of the circuits without feedback are simply set by the dissipation constraint, with s → 0 for large dissipation and s taking the value that balances u close to equilibrium (Figure A1). Optimal circuits far from equilibrium were reported in Mancini et al. [66]; those close to equilibrium are shown in Figure 6. Circuits with feedback also mostly rely on the decorrelation of the initial state. Since the majority of the initial probability weight is in the aligned states, y and α are always roughly equal (Figure A2). Only at intermediate dissipation rates, y slightly smaller than α and small s stabilize the initial aligned states and further decrease the decorrelation time (Figure 6), encoding a small negative feedback in the circuit.

Figure 6. Optimal circuits close to equilibrium, with parameters shown in Figure A1 (model S̃) and Figure A2 (model F̃). The gray arrow indicates a smaller rate than the black arrow. Optimal non-steady state initial states that have the highest probability are shown in red.
To summarize, for all σ̂_ss < ∞, as well as for circuits that have no constraints on σ̂_ss, we found that circuits optimized over their initial conditions transmit more information than their steady state counterparts.

Cost of Optimal Information
The maximum information is obtained for the maximum allowed steady state dissipation. Interestingly, the steady state dissipation σ̂_ss combined with the circuit topology imposes a constraint on the maximum allowed Σ_avg(τ_p). This result follows from the fact that the system strongly relies on the initial condition to increase the information transmitted at small times. Larger µ_0 values allow the system to transmit more information, since the equilibration time is longer. However, fixing the value of σ̂_ss constrains the allowed value of µ_0 that determines the initial condition. To gain intuition, in addition to fixing σ̂_ss, we fix the mean dissipation Σ_avg(τ_p) until a reset time τ_p > τ and find the transition rates returning the optimal mutual information for a chosen readout time τ ≤ τ_p. The results of this optimization, presented in Figure 7, show that as Σ_avg increases, µ_0 tends towards 1, which corresponds to a probability distribution where only the aligned states (p_0 = (0.5, 0, 0, 0.5)) are occupied, and the transmitted information increases. Further increasing dissipation shows that the σ̂_ss constraint can be satisfied in two ways: either by a positive or a negative µ_0. Not only does the positive µ_0 transmit more information, but the negative µ_0 is forbidden by our choice of r = 1. Above a certain value of σ̂_ss only the forbidden negative µ_0 = −1 branch, corresponding to an initial distribution with all the weight in the anti-aligned states, p_0 = (0, 0.5, 0.5, 0), remains (if we chose the counter-clockwise solutions by fixing s = 1, this probability vector would have been the maximally informative initial state). The system cannot fulfill the constraint of such high dissipation. If we do not constrain σ̂_ss we find that the maximum information corresponds to µ_0 = 1 [65], which we report in our analysis below.
We have seen that for both models, if we can choose the initial distribution instead of starting from the steady state, we can significantly increase the transmitted information. What is the "cost" of this choice of initial distribution? To estimate this total cost we calculate the average dissipation during time τ_p > τ, τ_p Σ_avg(τ_p), for the circuit with the highest mutual information attainable for a given steady state dissipation rate σ̂_ss if we allow the initial condition to be out of steady state (Figure 2). We also introduce the relaxation cost, τ_p(Σ_avg − σ̂_ss) (Figure 8A), as the additional energy dissipated above the steady state value. As argued already, a system that starts at steady state, i.e., for which µ_0 = µ_0^ss, will not pay an additional cost (see Figure 2; for µ_0 = µ_0^ss the function σ̂(τ_p) is constant, equal to σ̂_ss). In this case the mean total dissipation, Σ_avg(τ_p), is equal to σ̂_ss and the relaxation cost goes to zero.

As shown in Figure 8B, the total cost (z-axis, in colour) generated was only slightly larger for S̃ than for S, and the difference is more pronounced only for relatively small σ̂_ss, where the cost in the steady state circuits goes to zero. This result holds for different combinations of delay readout times τ and reset times τ_p, although the value of the total cost naturally increases with τ_p. As discussed above, more information can be transmitted at shorter times and by optimizing over the initial condition.
In order to quantify the intuition that S̃ transmits more information than S at a small price, we plotted in Figure 8C the information gain, I* − I_ss, and the relaxation cost with respect to τ_p(σ̂_ss). I* − I_ss is the difference between the optimal information when the initial distribution is free to be optimized over (S̃) and the optimal information for the system with a steady state initial distribution (S); together with the relaxation cost it quantifies what the gain in information transmission from optimizing the initial condition costs. The relaxation cost is almost the same regardless of the reset time, τ_p. The relaxation cost and the information gain decrease with increasing steady state dissipation, σ̂_ss, as in this regime even the steady state system is able to achieve slow decorrelation by tuning the switching rates.
This analysis shows that the higher optimal mutual information obtained by optimizing over the initial distribution does not generate significantly higher costs. The same result holds when comparing the models with feedback, F and F̃ (Figure 8D). The information increase in the F̃ model with optimized initial conditions compared to the F steady state model is minimal at large σ̂_ss (as expected from Figure 5). While the F̃ model with feedback always transmits more information than the S̃ model without feedback, the total average cost for all σ̂_ss is smaller for the F̃ model with feedback than for the S̃ model without feedback. This result means that even when feedback does not increase the transmitted information compared to models without feedback, it decreases the total cost.
The information gain of circuits with optimized initial conditions compared to steady state circuits is larger for the S̃ model without feedback than for the F̃ model with feedback (Figure 8E), and the relaxation cost decreases monotonically with increasing σ̂_ss. In both the cases with and without feedback there is a non-zero and non-infinite value of the steady state dissipation where the information gain from optimizing the initial condition is largest. In summary, optimizing the initial condition nearly always incurs a cost; however, it always results in a significant information gain. Table 1 summarizes the comparison of the optimal transmitted information I(M) and the total cost C(M) for all four models M ∈ {S, S̃, F, F̃}.

Table 1. Comparison between the four models, S, F, S̃, and F̃, in terms of optimal mutual information, I_opt, and the cost (value of Σ_avg calculated with the optimal rates), C.

Figure 8. (B) Total cost for the models without feedback that start with the steady state distribution, S, and that optimize the initial distribution, S̃. Results shown for two choices of reset τ_p and readout τ timescales. For the steady state models τ_p Σ_avg = τ_p σ̂_ss. (C) The information gain, I* − I_ss, of the optimized initial condition model (S̃) compared to the steady state initial condition model (S) and the relaxation cost, τ_p(Σ_avg − σ̂_ss), as a function of the steady state entropy dissipation rate for the same choices of τ_p and τ as in panel (B). (D) Comparison of the optimal delayed information and total dissipative cost as a function of the steady state entropy dissipation rate for all four models: without feedback (S, S̃) and with feedback (F, F̃), with the initial distribution equal to the steady state one (S, F) or optimized over (S̃, F̃). τ = τ_p = 0.5. (E) The information gain and relaxation cost of circuits with optimized initial conditions compared to steady state ones for the models with (F) and without feedback (S). τ = τ_p = 0.5.

Suboptimal Circuits
We found the parameters of the stochastic processes, including the initial conditions, that optimally transmit delayed information between the two variables given a constraint on σ̂_ss. However, the real initial stimulus may deviate from the optimal one, due to random fluctuations of the environment. To see how much information an optimal circuit can transmit for different initial conditions, we took the optimal parameters for different fixed σ̂_ss and readout delay τ, varied the initial condition µ_0, and evaluated the transmitted information and the mean dissipation Σ_avg(τ_p) for both models, S̃ and F̃ (Figure 9). We find that while information always decreases (Figure 9A,C,E for model S̃ and Figure 9G,I,K for model F̃), as expected, the mean dissipation can be smaller for unexpected values of the initial condition (Figure 9B,D,F for model S̃ and Figure 9H,J,L for model F̃). The transmitted information of the suboptimal circuits is larger than that of the optimal steady state circuit for many values of µ_0, especially those close to the optimum of the non-steady state circuit (µ_0 = 1). The same conclusions hold for suboptimal circuits with and without feedback. The range of µ_0 values where suboptimal circuits provide an information gain is smaller for circuits with feedback than without feedback, due to the already large information transmission capacity of steady state circuits with feedback.

Figure 9. Information for model S̃ (panels (A,C,E)) and model F̃ (panels (G,I,K)) and Σ_avg(τ_p) for model S̃ (panels (B,D,F)) and model F̃ (panels (H,J,L)) of information-optimal circuits with µ_0 = 1, evaluated for different values of the initial condition µ_0. The circuit parameters are obtained by optimizing information transmission for τ = 0.5 (A,B), τ = 1 (C,D), and τ = 2 (E,F), and fixed σ̂_ss = 0.15 (blue lines), σ̂_ss = 0.35 (magenta lines), σ̂_ss = 0.75 (green lines). τ_p = τ in all plots.
For comparison we plot the optimal information of the steady state circuits S and F, respectively, optimized for the same steady state dissipation σ̃_ss and readout delay τ (solid lines). The information always decreases for non-optimal values of µ_0, but the mean dissipation can be smaller for unexpected initial conditions.
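Throughout, the transmitted information is the mutual information of the joint distribution p(x_τ, z_0). As a minimal illustration of how this quantity is evaluated from a discrete joint distribution (the probability tables below are hypothetical examples, not the distributions of an optimal circuit):

```python
import numpy as np

def mutual_information(p_joint):
    """Mutual information (in bits) of a discrete 2D joint distribution."""
    p_joint = np.asarray(p_joint, dtype=float)
    px = p_joint.sum(axis=1, keepdims=True)  # marginal of the first variable
    pz = p_joint.sum(axis=0, keepdims=True)  # marginal of the second variable
    mask = p_joint > 0                       # skip zero-probability entries
    return float(np.sum(p_joint[mask] * np.log2((p_joint / (px * pz))[mask])))

# Perfectly correlated binary variables carry 1 bit; independent ones carry none.
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # → 1.0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # → 0.0
```

For the circuits in the text, p(x_τ, z_0) follows from propagating the initial distribution with the transition matrix, after which the same formula applies.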

Gene Regulatory Circuits
The coupled two-state system considered above can be thought of as a simplified model of receptor-ligand binding. It can also be considered an overly simplified model of gene regulation, where the input variable describes the presence or absence of a transcription factor and the output the activation state of the regulated gene. However, the continuous nature of transcription factor concentrations has proven important when considering information transmission in these systems [53,54]. We will not repeat the whole optimization problem for continuous variables, but we calculate and discuss the form of dissipation in the simplest gene regulatory module that can function out of equilibrium.

Bursty Gene Regulation
The simplest gene regulatory system that can function out of equilibrium is a model that accounts for transcriptional bursts [67-73]. The promoter has two possible states: a basal expression state, in which the gene is read out at a basal rate R_0, and an activated expression state, in which the gene is read out at rate R_1. The promoter switches from the basal to the activated state by binding a transcription factor present at concentration c, with rate k_+, and unbinds at a constant rate k_−. The probability that there are g product proteins of this gene in the cell (we integrate out the mRNA due to a separation of timescales) is P(g) = P_0(g) + P_1(g), where P_0(g) is the probability that the promoter is in the basal state with g proteins present and P_1(g) the analogous probability for the activated state. The probability distribution evolves both due to binding and unbinding of the transcription factor and due to protein production and degradation (with rate τ^{−1}):

∂_t P_0(g) = k_− P_1(g) − k_+ c P_0(g) + R_0 [P_0(g−1) − P_0(g)] + τ^{−1} [(g+1) P_0(g+1) − g P_0(g)],
∂_t P_1(g) = k_+ c P_0(g) − k_− P_1(g) + R_1 [P_1(g−1) − P_1(g)] + τ^{−1} [(g+1) P_1(g+1) − g P_1(g)].

These equations can be solved analytically in steady state in terms of special functions [79,80]. In the limit of fast promoter switching (k_+ and k_− go to infinity with their ratio K ≡ k_+/k_− held constant) the system is well described by a Poisson distribution with mean R_eff τ, where R_eff = (R_0 k_− + R_1 k_+ c)/(k_− + k_+ c) is an effective production rate. The total steady state dissipation calculated from Equation (11) can be split into three parts, σ_ss = σ_0 + σ_1 + σ_2. The first two contributions can be simplified using the normalization relations ∑_g [P*_0(g) + P*_1(g)] = 1 and ∑_g P*_1(g) = k_+ c/(k_− + k_+ c). We now use these results to examine the steady state dissipation in the equilibrium limit and in the limit of a fast switching promoter. Similar results, in slightly different limits, were obtained in reference [30].

Equilibrium Limit. Equilibrium is surely achieved if there is only one promoter state.
In terms of our model this corresponds to a vanishing binding rate, k_+ → 0. In this limit the activated state is never occupied and the steady state probability goes to P*_1(g) ≡ 0. Equations (21) and (22) then reduce to a Poisson distribution with mean R_0 τ, and we can verify that detailed balance is satisfied, as confirmed by σ_2 = −σ_1 in Equations (25)-(27).

Fast promoter switching limit. In the fast promoter switching limit the dissipation of the system, σ_FS, is always positive, but the equilibrium regime is reached only if k_− or k_+ asymptotically vanishes. For finite binding and unbinding rates the system is not in equilibrium, despite being well described by an equilibrium-like steady state probability distribution. Since this example is mainly presented as a pedagogical application of dissipation, for completeness we derive similar results in the Langevin description in Appendix D, discussing the differences in dissipation that arise from model coarse graining [81-83].
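The fast-switching Poisson limit can be checked numerically by solving the master equations above on a truncated protein-number range and comparing the steady state to a Poisson distribution with mean R_eff τ. A minimal sketch (the rate values and the truncation g_max are illustrative assumptions, not parameters from the text):

```python
import numpy as np
from scipy.linalg import null_space
from scipy.stats import poisson

def bursty_steady_state(R0, R1, kp_c, km, tau, gmax=60):
    """Steady state of the two-state bursty promoter master equation.
    States are (promoter s in {0,1}, protein number g <= gmax)."""
    idx = lambda s, g: s * (gmax + 1) + g
    n = 2 * (gmax + 1)
    W = np.zeros((n, n))                      # W[j, i]: rate from state i to j
    for g in range(gmax + 1):
        W[idx(1, g), idx(0, g)] = kp_c        # TF binding (rate k+ c)
        W[idx(0, g), idx(1, g)] = km          # TF unbinding (rate k-)
        for s, R in ((0, R0), (1, R1)):
            if g < gmax:
                W[idx(s, g + 1), idx(s, g)] = R        # protein production
            if g > 0:
                W[idx(s, g - 1), idx(s, g)] = g / tau  # protein degradation
    L = W - np.diag(W.sum(axis=0))            # generator: dp/dt = L p
    p = null_space(L)[:, 0]
    p /= p.sum()
    return p.reshape(2, gmax + 1)

# Fast switching (k+- much larger than all other rates): P(g) approaches
# a Poisson distribution with mean R_eff * tau.
R0, R1, kp_c, km, tau = 1.0, 8.0, 200.0, 100.0, 1.0
Pg = bursty_steady_state(R0, R1, kp_c, km, tau).sum(axis=0)
Reff = (R0 * km + R1 * kp_c) / (km + kp_c)
print(np.max(np.abs(Pg - poisson.pmf(np.arange(61), Reff * tau))))  # small
```

The deviation from the Poisson limit shrinks as k_± grow at fixed K, while for slower switching the distribution develops the broader, bursty shape.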

Discussion
All living organisms, even the simplest ones, must read and process information in order to adapt to their environment. In the case of cells, transmitting information means sensing chemical stimuli via receptors and activating biochemical pathways in response to these signals. Reading and transmitting signals comes at a price: it consumes energy. There are plenty of possible designs for these regulatory circuits, yet not all of them are found in nature [2]. The question arises of why some regulatory network functions are frequent and others non-existent. One way to approach this question is to optimize a specific function by choosing the circuit's regulatory function. The choices of optimized function that have been considered include noise (minimization) [11], time delay of the response (minimization) [2], and information transmitted between the input and output (maximization) [56].
Two different circuits can produce and use the same number of proteins, yet dissipate different amounts of energy. In other words, we assume that while ATP is certainly needed in a molecular circuit, it is part of the hardware of the network and cannot be modified much. Instead, we asked about the best regulatory functions (software) that can be implemented on a given set of hardware. For this reason we worked with a simplified binary representation of the circuits to concentrate on the regulatory computation, and turned the problem of finding the optimal regulatory function into finding the optimal parameters of stochastic processes.
Our main previous findings about steady state circuits can be related to tasks performed by the circuits [66]. Circuits that function close to equilibrium transmit information optimally using positive feedback loops, which are characteristic of long-term readouts responsible for cell fate commitment [84,85]. Circuits that function far from equilibrium transmit information using negative feedback loops, which are representative of shock responses that are transient but need to be fast [86,87]. Therefore, cells may implement non-equilibrium solutions when fast responses are needed and rely on equilibrium responses when averaging is possible and there is no rush. This result agrees with the general finding of Lan et al. [29] for continuous biochemical kinetics that negative feedback circuits always break detailed balance and thus function out of equilibrium.
In general, in steady state we find that models with feedback significantly outperform models without feedback in terms of optimal information transmission between the two variables, but the respective costs of optimal information transmission are the same. Circuits both close to and far from equilibrium rely on a mixture of slow and fast timescales to delay relaxation and transmit information. The only other solution available in our simple setting is using the initial condition, which is efficient in terms of information transmission but costly.
Here we identified two properties linked to feedback: a circuit with feedback does not necessarily transmit more information than one without feedback if we are allowed to pick an optimal initial condition; yet in this case implementing feedback can reduce the non-equilibrium costs. In general, introducing an optimized initial condition incurs a cost, but this cost is often minimal, especially taking into account the information gained. This cost can be interpreted biologically as the external energetic cost needed to place the system in a specific initial condition; it must be provided by the work of another regulatory element or circuit, or by an external agent or force. The optimal initial condition requires poising the system at a specific point. Yet it does not seem biologically implausible, let alone impossible, to "prepare" the initial state after cell division or mitosis, or upon entering a new phase of the cell cycle [88]. For example, a specific gene expression state or receptor state (e.g., (+, +) or (−, −)) seems easily attainable. Modifying the initial conditions away from the optimal µ_0 in circuits that function out of steady state decreases the transmitted information but can also decrease the mean dissipation. Therefore preparing the system out of steady state may still be a useful strategy for transmitting information.
One could look at these results from two perspectives: on the one hand, circuits with feedback transmit more information in the steady state setting; on the other hand, feedback is frugal in its expenses in the case of optimized initial distributions. One could also defend the models without feedback, which are only slightly worse in terms of information transmission (optimized initial distribution case) and can dissipate the same amount of energy (steady state initial distribution). All circuits will eventually reach steady state; however, especially during fast processes such as development [89] or stress response [87], the information transmitted at short times may be what matters for downstream processes. In general, regardless of the timescale, circuits with feedback perform better than (or equally well as) regulatory systems without feedback, both in terms of information transmission and in the cost of transmitting this optimal information.
The learning rate is another quantity that has been useful in studying bipartite systems in stochastic thermodynamics [43-45]. The learning rate, defined as l_x = ∂_τ I[z_t, x_{t+τ}]|_{τ=0}, gives the instantaneous increase in the information that the output variable has about the input variable by continuing to learn about it. We calculated the learning rate for our informationally optimal models in steady state (Figure A3). For models without feedback the learning rate is bounded by σ_x (as defined in Appendix E), such that η = l_x/σ_x ≤ 1. In this case the learning rate allows us to estimate how closely the output variable follows the input variable, and positive learning rates are indicative of adaptation and learning. Not surprisingly, we find that the model with steady state initial conditions has a larger learning rate than the model with optimized initial conditions, since model S̃ relies less than model S on the parameters of the network to transmit information and more on the initial conditions (which are forgotten in the steady state calculation). Calculating a time-delay-dependent learning rate would be more informative. The learning rate also increases with σ̃_ss, in agreement with previous statements that learning is easier far from equilibrium [29,30,43]. We also performed the same calculation for models with feedback, but as was pointed out previously [44,90,91], the interpretation of the learning rate becomes less clear in these systems since input and output are no longer clearly defined. Instead, the above one-sided definition should be replaced by a time integral over the trajectory to distinguish whether the learning is of the other variable (z) or of a previous instance of the same variable (x_{t−τ}). The calculated quantity instead tells us about the ability of x to respond to z, assuming z was fluctuating freely.
In that sense, a positive value of l_x tells us that the dynamics of the two variables of the circuit are not completely decoupled in steady state, except in the case of model F close to equilibrium. Our results tell us that equilibrium imposes a symmetry between input and output, which is broken either by the initial conditions (model F̃ at small σ̃_ss) or by large dissipation.
Lastly, for pedagogical purposes we discussed the link between the dissipation calculations that are often performed on binary regulatory systems and continuous variables, showing that the simplest model of bursty transcription can result in non-zero dissipation, even in the fast switching limit where the equilibrium-like steady state Poisson distribution is recovered. Bursty gene expression is widespread, from bacteria [71,72] and yeast [92] to invertebrates [73,89] and mammals [68]. Bursty self-activating genes in intermediately fast switching regimes have also been shown to have different stability properties than pure equilibrium systems, due to non-equilibrium cycling through the coupled promoter and protein states [93]. While cells are not energy limited, the discussion recounted in this paper suggests that different modes of regulation (including burstiness) may be better suited to slow or fast responses.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. Model without Feedback
The transition matrix for the model without feedback, with the rates defined in Figure 1A, can be diagonalized exactly. From its eigenvalues and eigenvectors we calculate the probability distribution p(x_τ, z_0) at time τ for the four states. The steady state distribution is given by the eigenvector corresponding to the zero eigenvalue. These results allow us to calculate the transmitted information and the dissipation.
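The diagonalization route sketched above can be mirrored numerically: the steady state is the null vector of the generator, and propagating any initial condition for long times recovers it. A minimal sketch with placeholder rates (the actual rates are those defined in Figure 1A):

```python
import numpy as np
from scipy.linalg import expm

# Generator of a generic 4-state circuit with placeholder random rates.
rng = np.random.default_rng(0)
W = rng.uniform(0.1, 1.0, (4, 4))      # W[i, j]: rate from state i to j
np.fill_diagonal(W, 0.0)
L = W.T - np.diag(W.sum(axis=1))       # column convention: dp/dt = L p

# The steady state is the eigenvector of the zero eigenvalue.
vals, vecs = np.linalg.eig(L)
p_ss = np.real(vecs[:, np.argmin(np.abs(vals))])
p_ss /= p_ss.sum()

# Propagating any initial condition for long times recovers it.
p_t = expm(L * 500.0) @ np.array([1.0, 0.0, 0.0, 0.0])
print(np.allclose(p_t, p_ss))  # → True
```

The nonzero eigenvalues of L set the relaxation timescales that enter the delayed information p(x_τ, z_0).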

Appendix B. Model with Feedback
The transition matrix for the model with feedback is defined by the rates in Figure 1B.

Figure A1. The optimal parameters as a function of the readout delay, τ, for the models without feedback, S and S̃, at different constrained steady state dissipation rates σ̃_ss.
The detailed derivation of the steady state quantities and eigenvalues is given in Mancini et al. [66]; here we just summarize the main results. The steady state probability distribution is written in terms of A and ρ, defined in Equations (18) and (19).
with q = (1 + y − s − α)/A and the rescaled time τ = tλ.

Figure A2. The optimal parameters as a function of the readout delay τ for the models with feedback, F and F̃, at different constrained steady state dissipation rates σ̃_ss.

Appendix C. Entropy Production Rate
In this Appendix we present an alternative derivation of the dissipation. Denoting the probability of state i by p_i, the entropy of the distribution is defined as S(t) = −∑_i p_i(t) log p_i(t). The entropy production rate is obtained by differentiating this entropy with respect to time. Denoting by w_ij the transition rate from state i to state j, the master equation reads ṗ_i(t) = ∑_{j≠i} [w_ji p_j(t) − w_ij p_i(t)].
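For a master equation of this form, the entropy production rate can be written in the standard Schnakenberg form, σ = (1/2) ∑_{i≠j} (w_ij p_i − w_ji p_j) log(w_ij p_i / w_ji p_j), which is straightforward to evaluate numerically. A minimal sketch (this is the standard form, consistent with but not copied from the appendix's derivation):

```python
import numpy as np

def entropy_production_rate(W, p):
    """Schnakenberg entropy production rate (nats per unit time).
    W[i, j] is the transition rate from state i to state j."""
    sigma = 0.0
    for i in range(len(p)):
        for j in range(len(p)):
            if i != j and W[i, j] > 0 and W[j, i] > 0:
                flux = W[i, j] * p[i] - W[j, i] * p[j]
                sigma += 0.5 * flux * np.log((W[i, j] * p[i]) / (W[j, i] * p[j]))
    return sigma

# A two-state system in its steady state obeys detailed balance,
# so its entropy production vanishes.
W2 = np.array([[0.0, 2.0], [1.0, 0.0]])
p2 = np.array([1.0, 2.0]) / 3.0         # w12 p1 = w21 p2
print(entropy_production_rate(W2, p2))  # → 0.0
```

A driven cycle, e.g., a three-state loop with unequal clockwise and counterclockwise rates, gives a strictly positive rate, signalling broken detailed balance.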
We define w_ii ≡ −∑_{j≠i} w_ij, so that we can write compactly ṗ_i(t) = ∑_j w_ji p_j(t) (Equation (A11)).

Appendix D. Langevin Description

We use (n, g) instead of the standard form (δn, δg) to describe the fluctuations. Equations (A15) and (A16) can be recast into the matrix form Ẋ = −AX + ξ, with the noise correlation matrix ⟨ξ(t)ξ(t′)⟩ = 2Dδ(t − t′). The correlation matrix Σ can be computed with standard methods [56] by inverting the relation D = AΣ + ΣA^T.
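The relation between D, A and Σ is a continuous Lyapunov equation and can be inverted with standard linear-algebra routines. A sketch with illustrative matrices (not the paper's (A15)-(A16)); with the noise convention ⟨ξξ^T⟩ = 2Dδ, the stationary covariance obeys AΣ + ΣA^T = 2D, though factor-of-two conventions for D vary:

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Linear Langevin system  Xdot = -A X + xi,  <xi(t) xi(t')^T> = 2 D delta(t-t').
# Its stationary covariance S solves the Lyapunov equation  A S + S A^T = 2 D.
A = np.array([[2.0, 0.0],
              [-1.0, 1.0]])   # e.g. one variable driving a second, slower one
D = np.diag([2.0, 1.0])       # independent noise sources
S = solve_continuous_lyapunov(A, 2 * D)

print(np.allclose(A @ S + S @ A.T, 2 * D))  # → True
```

The off-diagonal entry of S is the stationary cross-correlation ⟨ng⟩ that enters the dissipation below.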

Entropy Production
The probability of a trajectory of a multivariate Langevin process can be calculated via the Onsager-Machlup formalism. Using this probability as a starting point, the dissipation can be derived exactly (see Ref. [94], where the computation is done in detail and in a self-contained fashion). For variables that are symmetric under time reversal the entropy production can be written in a compact form, where the index k runs over all the variables.

Figure A3. The learning rate for the output variable x as a function of the rescaled steady state dissipation, σ̃_ss, calculated at steady state for the models with (F and F̃) and without feedback (S and S̃). Models S̃ and F̃ have optimized initial conditions (which do not enter these calculations except through the optimal parameters) and models S and F are constrained to have initial conditions in steady state.

Equation (A23) can be simplified by noting that all terms which are exact derivatives are not extensive in time (terms like ∫_0^t dt′ n(t′)ṅ(t′) = ½[n²(t) − n²(0)], or the equivalent in g, can be neglected in the large-t limit). All the steady state correlations are also time-translation invariant, i.e., ∫_0^t dt′ ⟨n(t′)ġ(t′)⟩ ≡ t⟨nġ⟩. The resulting dissipation is given in Equation (A24). The correlation ⟨nġ⟩ in Equation (A24) can be computed by replacing ġ with Equation (A16), yielding ⟨nġ⟩ = R⟨nn⟩ − (1/τ)⟨ng⟩. Substituting this expression into Equation (A21) we obtain the dissipation with τ_s = (ck_+ + k_−)^{−1}. Note that in the limit τ_s → 0 the dissipation remains finite and equal to σ_0^{LE} = R/(1 + cK), where K = k_+/k_−. Additionally, for K → ∞ (which corresponds to no flux to the inactive state, k_− → 0) the dissipation vanishes, as in the master equation formulation (Equation (30)).
As a final remark, we note that the Langevin formulation is a coarse-grained description of the master equation approach described in Section 7.1. This kind of coarse-graining procedure integrates out degrees of freedom which can carry non-equilibrium currents, and can therefore lead to lower values of the dissipation [81-83]. For instance, in the limit of small but finite R_0, Equation (30) becomes σ_ME = cK(1 + R_1)/(1 + cK)² log R_1, and one finds σ_ME > σ_LE.
For the models with feedback (F and F̃) the learning rate is harder to interpret, since the input no longer changes independently of the output. Formally we can still calculate the quantity in Equation (A27):

l_x = [y(s − α)/(α + s + y + 1)] log(y + 1)   and   σ_x = [y(s − α)/(α + s + y + 1)] log s.
The informational efficiency is η = l_x/σ_x, which is bounded by 1 only if sy ≤ α (see Figure A3).