General H-theorem and entropies that violate the second law

$H$-theorem states that the entropy production is nonnegative and, therefore, the entropy of a closed system should monotonically change in time. In information processing, the entropy production is positive for random transformation of signals (the information processing lemma). Originally, the $H$-theorem and the information processing lemma were proved for the classical Boltzmann-Gibbs-Shannon entropy and for the correspondent divergence (the relative entropy). Many new entropies and divergences have been proposed during last decades and for all of them the $H$-theorem is needed. This note proposes a simple and general criterion to check whether the $H$-theorem is valid for a convex divergence $H$ and demonstrates that some of the popular divergences obey no $H$-theorem. We consider systems with $n$ states $A_i$ that obey first order kinetics (master equation). A convex function $H$ is a Lyapunov function for all master equations with given equilibrium if and only if its conditional minima properly describe the equilibria of pair transitions $A_i \rightleftharpoons A_j$. This theorem does not depend on the principle of detailed balance and is valid for general Markov kinetics. Elementary analysis of pair equilibria demonstrates that the popular Bregman divergences like Euclidean distance or Itakura-Saito distance in the space of distribution cannot be the universal Lyapunov functions for the first-order kinetics and can increase in Markov processes. Therefore, they violate the second law and the information processing lemma. In particular, for these measures of information (divergences) random manipulation with data may add information to data. The main results are extended to nonlinear generalized mass action law kinetic equations. In Appendix, a new family of the universal Lyapunov functions for the generalized mass action law kinetics is described.


The Problem
The first non-classical entropy was proposed by Rényi in 1960 [1]. In the same paper he discovered the very general class of divergences, the so-called f -divergences (or Csiszár-Morimoto divergences because of the works of Csiszár [2] and Morimoto [3] published simultaneously in 1963): where P = (p i ) is a probability distribution, P * is an equilibrium distribution, h(x) is a convex function defined on the open (x > 0) or closed x ≥ 0 semi-axis. We use here the notation H h (P P * ) to stress the dependence of H h on both p i and p * i . These divergences have the form of the relative entropy or, in the thermodynamic terminology, the (negative) free entropy, the Massieu-Planck functions [4], or F/RT where F is the free energy. They measure the deviation of the current distribution P from the equilibrium P * .
After 1961, many new entropies and divergences were invented and applied to real problems, including Burg entropy [5], Cressie-Red family of power divergences [6], Tsallis entropy [7,8], families of α-, βand γ-divergences [9] and many others (see the review papers [10,11]). Many of them have the f -divergence form, but some of them do not. For example, the squared Euclidean distance from P to P * is not, in general, a f -divergence unless all p * i are equal (equidistribution). Another example gives the Itakura-Saito distance: The idea of Bregman divergences [12] provides a new general source of divergences different from the f -divergences. Any strictly convex function F in an closed convex set V satisfies the Jensen inequality if p = q, p, q ∈ V . This positive quantity D F (p, q) is the Bregman divergence associated with F . For example, for a positive quadratic form F (x) the Bregman distance is just D F (p, q) = F (p − q). In particular, if F is the squared Euclidean length of x then D F (p, q) is the squared Euclidean distance. If F is the Burg entropy, F (x) = − i ln p i , then D F (p, q) is the Itakura-Saito distance. The Bregman divergences have many attractive properties. For example, the mean vector minimizes the expected Bregman divergence from the random vector [13]. The Bregman divergences are convenient for numerical optimization because generalized Pythagorean identity [14]. Nevertheless, for information processing and for many physical applications one more property is crucially important.
The divergence between the current distribution and equilibrium should monotonically decrease in Markov processes. It is the ultimate requirement for use of the divergence in information processing and in non-equilibrium thermodynamics and kinetics. In physics, the first result of this type was Boltzmann's H-theorem proven for nonlinear kinetic equation. In information theory, Shannon [15] proved this theorem for the entropy ("the data processing lemma") and Markov chains.
In his well-known paper [1], Rényi also proved that H h (P P * ) monotonically decreases in Markov processes (he gave the detailed proof for the classical relative entropy and then mentioned that for the f -divergences it is the same). This result, elaborated further by Csiszár [2] and Morimoto [3], embraces many later particular H-theorems for various entropies including the Tsallis entropy and the Rényi entropy (because it can be transformed into the form (1) by a monotonic function, see for example [11]). The generalized data processing lemma was proven [16,17]: for every two positive probability distributions P, Q the divergence H h (P Q) decreases under action of a stochastic matrix A = (a ij ) where is the ergodicity contraction coefficient, 0 ≤ α(A) ≤ 1. Here, neither Q nor P must be the equilibrium distribution: divergence between any two distributions decreases in Markov processes. Under some additional conditions, the property to decrease in Markov processes characterizes the f -divergences [18,19]. For example, if a divergence decreases in all Markov processes, does not change under permutation of states and can be represented as a sum over states (has the trace form), then it is the f -divergence [11,18].
The dynamics of distributions in the continuous time Markov processes is described by the master equation. Thus, the f -divergences are the Lyapunov functions for the master equation. The important property of the divergences H h (P P * ) is that they are universal Lyapunov functions. That is, they depend on the current distribution P and on the equilibrium P * but do not depend on the transition probabilities directly.
For each new divergence we have to analyze its behavior in Markov processes and to prove or refute the H-theorem. For this purpose, we need a simple and general criterion. It is desirable to avoid any additional requirements like the trace form or symmetry. In this paper we develop this criterion.
It is obvious that the equilibrium P * is a global minimum of any universal Lyapunov function H(P ) in the simplex of distributions (see the model equation below). In brief, the general H-theorem states that a convex function H(P ) is a universal Lyapunov function for the master equation if and only if its conditional minima correctly describe the partial equilibria for pairs of transitions A i A j . These partial equilibria are given by proportions p i /p * i = p j /p * j . They should be solutions to the problem H(P ) → min subject to p k ≥ 0 (k = 1, . . . , n), n k=1 p k = 1, and given values of p l (l = i, j) These solutions are minima of H(P ) on segments p i + p j = 1 − l =i,j p l , p i,j ≥ 0. They depend on n − 2 parameters p l ≥ 0 (l = i, j, l =i,j p l < 1).
Using this general H-theorem we analyze several Bregman divergences that are not f -divergences and demonstrate that they do not allow the H-theorem even for systems with three states. We present also the generalizations of the main results for Generalized Mass Action Law (GMAL) kinetics.

Three Forms of Master Equation and the Decomposition Theorem
We consider continuous time Markov chains with n states A 1 , . . . , A n . The Kolmogorov equation or master equation for the probability distribution P with the coordinates p i (we can consider P as a vector-column P = [p 1 , . . . , p n ] T ) is dp i dt = j, j =i (q ij p j − q ji p i ) (i = 1, . . . , n) where q ij (i, j = 1, . . . , n, i = j) are nonnegative. In this notation, q ij is the rate constant for the transition A j → A i . Any set of nonnegative coefficients q ij (i = j) corresponds to a master equation. Therefore, the class of the master equations can be represented as a nonnegative orthant in R n(n−1) with coordinates q ij (i = j). Equations of the same class describe any first order kinetics in perfect mixtures. The only difference between the general first order kinetics and master equation for the probability distribution is in the balance conditions: the sum of probabilities should be 1, whereas the sum of variables (concentrations) for the general first order kinetics may be any positive number. It is useful to mention that the model equation with equilibrium P * and relaxation time τ is a particular case of master equation for normalized variables p i (p i ≥ 0, i p i = 1). Indeed, let us take in Equation (7) q ij = 1 τ p * i . The graph of transitions for a Markov chain is a directed graph. Its vertices correspond to the states A i and the edges correspond to the transitions A j → A i with the positive transition coefficients, q ij > 0. The digraph of transitions is strongly connected if there exists an oriented path from any vertex A i to every other vertex A j (i = j). The continuous-time Markov chain is ergodic if there exists a unique strictly positive equilibrium distribution P * (p * i > 0, i p * i = 1) for master equation (7) [20,21]. Strong connectivity of the graph of transitions is necessary and sufficient for ergodicity of the corresponding Markov chain.
A digraph is weakly connected if the underlying undirected graph obtained by replacing directed edges by undirected ones is connected. The maximal weakly connected components of a digraph are called connected (or weakly connected) components. The maximal strongly connected subgraphs are called strong components. The necessary and sufficient condition for the existence of a strongly positive equilibrium for master equation (7) is: the weakly connected components of the transition graph are its strong components. An equivalent form of this condition is: if there exists a directed path from A i to A j , then there exists a directed path from A j to A i . In chemical kinetics this condition is sometimes called the "weak reversibility" condition [22,23]. This implies that the digraph is the union of disjoint strongly connected digraphs. For each strong component of the transition digraph the normalized equilibrium is unique and the equilibrium for the whole graph is a convex combination of positive normalized equilibria for its strong components. If m is the number of these components then the set of normalized positive equilibria of master equation (P * : p * i > 0, i p * i = 1) is a relative interior of a m − 1-dimensional polyhedron in the unit simplex ∆ n . The set of non-normalized positive equilibria (P * : p * i > 0) is a relative interior of a m-dimensional cone in the positive orthant R n + .
We reserve notation R n + for the positive orthant and for the nonnegative orthant we use R n + (the closure of R n + ) The Markov chain in Equation (7) is weakly ergodic if it allows the only conservation law: the sum of coordinates, i p i ≡ const. Such a system forgets its initial condition: the distance between any two trajectories with the same value of the conservation law tends to zero when time goes to infinity. Among all possible norms, the l 1 distance ( P − Q l 1 = i |p i − q i |) plays a special role: it does not increase in time for any first order kinetic system in master equation (7) and strongly monotonically decreases to zero for normalized probability distributions ( i p i = i q i = 1) and weakly ergodic chains. The difference between weakly ergodic and ergodic systems is in the obligatory existence of a strictly positive equilibrium for an ergodic system. A Markov chain is weakly ergodic if and only if for each two vertices A i , A j (i = j) we can find such a vertex A k that is reachable by oriented paths both from A i and from A j . This means that the following structure exists [24]: One of the paths can be degenerated: it may be i = k or j = k. Now, let us restrict our consideration to the set of the Markov chains with the given positive equilibrium distribution P * (p * i > 0). We do not assume that this distribution is compulsory unique. The transition graph should be the union of disjoint strongly connected digraphs (in particular, it may be strongly connected). Using the known positive equilibrium P * we can rewrite master equation (7) in the following form dp i dt = j, j =i where p * i and q ij are connected by the balance equation For the next transformation of master equation we join the mutually reverse transitions in pairs A i A j in pairs (say, i > j) and introduce the stoichiometric vectors γ ji with coordinates: Let us rewrite the master equation (7) in the quasichemical form: where The reversible systems with detailed balance form an important class of first order kinetics. The detailed balance condition reads [25]: at equilibrium, w + ij = w − ij , i.e., Here, w * ij is the equilibrium flux from A i to A j and back. For the systems with detailed balance the quasichemical form of the master equation is especially simple: It is important that any set of nonnegative equilibrium fluxes w * ij (i > j) defines by Equation (15) a system with detailed balance with a given positive equilibrium P * . Therefore, the set of all systems with detailed balance presented by Equation (15) and a given equilibrium may be represented as a nonnegative orthant in R n(n−1) 2 with coordinates w * ij (i > j). The decomposition theorem [26,27] states that for any given positive equilibrium P * and any positive distribution P the set of possible values dP /dt for Equations (13) under the balance condition (11) coincides with the set of possible values dP /dt for Equations (15) under detailed balance condition (14).
In other words, for every general system of the form (13) with positive equilibrium P * and any given non-equilibrium distribution P there exists a system with detailed balance of the form (15) with the same equilibrium and the same value of the velocity vector dP /dt at point P . Therefore, the sets of the universal Lyapunov function for the general master equations and for the master equations with detailed balance coincide.

General H-Theorem
Let H(P ) be a convex function on the space of distributions. It is a Lyapunov function for a master equations with the positive equilibrium P * if dH(P (t))/dt ≤ 0 for any positive normalized solution P (t). For a system with detailed balance given by Equation (15) The inequality dH(P (t))/dt ≤ 0 is true for all nonnegative values of w * ij if and only is it holds for any term in Equation (16) separately. That is, for any pair i, j (i > j) the convex function H(P ) is a Lyapunov function for the system (15) where one and only one w * ij is not zero. A convex function on a straight line is a Lyapunov function for a one-dimensional system with single equilibrium if and only if the equilibrium is a minimizer of this function. This elementary fact together with the previous observation gives us the criterion for universal Lyapunov functions for systems with detailed balance. Let us introduce the partial equilibria criterion: Definition 1 (Partial equilibria criterion). A convex function H(P ) on the simplex ∆ n of probability distributions satisfies the partial equilibria criterion with a positive equilibrium P * if the proportion p i /p * i = p j /p * j give the minimizers in the problem (6).

Proposition 1.
A convex function H(P ) on the simplex ∆ n of probability distributions is a Lyapunov function for all master equations with the given equilibrium P * that obey the principle of detailed balance if and only if it satisfies the partial equilibria criterion with the equilibrium P * . Figure 1. The triangle of distributions for the system with three states A 1 , A 2 , A 3 and the equilibrium p * 1 = 4 7 , p * 2 = 2 7 , p * 3 = 1 7 . The lines of partial equilibria A i A j given by the proportions p i /p * i = p j /p * j are shown, for A 1 A 2 by solid straight lines (with one end at the vertex A 3 ), for A 2 A 3 and for A 1 A 3 by dashed lines. The lines of conditional minima of H(P ) in problem (6) are presented for the partial equilibrium A 1 A 2 (a) for the squared Euclidean distance (a circle here is an example of the H(P ) level set) and (b) for the Itakura-Saito distance. Between these lines and the line of partial equilibria the "no H-theorem zone" is situated. In this zone, H(P ) increases in time for some master equations with equilibrium P * . Similar zones (not shown) exist near other partial equilibrium lines too. Outside these zones, H(P ) monotonically decreases in time for any master equation with equilibrium P * .
Combination of this Proposition with the decomposition theorem [26] gives the same criterion for general master equations without hypothesis about detailed balance Proposition 2. A convex function H(P ) on the simplex ∆ n of probability distributions is a Lyapunov function for all master equations with the given equilibrium P * if and only if it satisfies the partial equilibria criterion with the equilibrium P * .
These two propositions together form the general H-theorem. Theorem 1. The partial equilibria criterion with a positive equilibrium P * is a necessary condition for a convex function to be the universal Lyapunov function for all master equations with detailed balance and equilibrium P * and a sufficient condition for this function to be the universal Lyapunov function for all master equations with equilibrium P * .
Let us stress that here the partial equilibria criterion provides a necessary condition for systems with detailed balance (and, therefore, for the general systems without detailed balance assumption) and a sufficient condition for the general systems (and, therefore, for the systems with detailed balance too).

Examples
The simplest Bregman divergence is the squared Euclidean distance between P and P * , i (p i −p * i ) 2 . The solution to the problem (6) is: Obviously, it differs from the proportion required by the partial equilibria criterion p i Figure 1a).
For the Itakura-Saito distance (2) the solution to the problem (6) is: 1 It also differs from the proportion required (Figure 1b).
If the single equilibrium in 1D system is not a minimizer of a convex function H then dH/dt > 0 on the interval between the equilibrium and minimizer of H (or minimizers if it is not unique). Therefore, if H(P ) does not satisfy the partial equilibria criterion then in the simplex of distributions there exists an area bordered by the partial equilibria surface for A i A j and by the minimizers for the problem (6), where for some master equations dH/dt > 0 ( Figure 1). In particular, in such an area dH/dt > 0 for the simple system with two mutually reverse transitions, A i A j , and the same equilibrium. If H satisfies the partial equilibria criterion, then the minimizers for the problem (6) coincide with the partial equilibria surface for A i A j , and the "no H-theorem zone" vanishes. The partial equilibria criterion allows a simple geometric interpretation. Let us consider a sublevel set of H(P ) in the simplex ∆ n : U h = {P ∈ ∆ n | H(P ) ≤ h}. Let the level set be L h = {P ∈ ∆ n | H(P ) = h}. For the partial equilibrium A i A j we use the notation E ij . It is given by the equation The geometric equivalent of the partial equilibrium condition is: for all i, j (i = j) and every P ∈ L h ∩ E ij the straight line P + λγ ij (λ ∈ R) is a supporting line of U h . This means that this line does not intersect the interior of U h .
We illustrate this condition on the plane for three states in Figure 2. The level set of H is represented by the dot-dash line. It intersects the lines of partial equilibria (dashed lines) at points B 1,2,3 and C 1,2,3 . For each point P from these six intersections (P = B i and P = C i ) the line P + λγ jk (λ ∈ R) should be a supporting line of the sublevel set (the region bounded by the dot-dash line). Here, i, j, k should all be different numbers. Segments of these lines form a hexagon circumscribed around the level set ( Figure 2b).
The points of intersection B 1,2,3 and C 1,2,3 cannot be selected arbitrarily on the lines of partial equilibria. First of all, they should be the vertices of a convex hexagon with the equilibrium P * inside. Secondly, due to the partial equilibria criterion, the intersections of the straight line P + λγ ij with the partial equilibria E ij are the conditional minimizers of H on this line, and therefore should belong to the sublevel set U H(P ) . If we apply this statement to P = B i and P = C i , then we will get two projections of this point onto partial equilibria E ij parallel to γ ij (Figure 2a). These projections should belong to the hexagon with the vertices B 1,2,3 and C 1,2,3 . They produce a six-ray star that should be inscribed into the level set.
In Figure 2 we present the following characterization of the level set of a Lyapunov function for the Markov chains with three states. This convex set should be circumscribed around the six-ray star ( Figure 2a) and inscribed in the hexagon of the supporting lines ( Figure 2b).
All the f -divergences given by Equation (1) satisfy the partial equilibria criterion and are the universal Lyapunov functions but the reverse is not true: the class of universal Lyapunov functions is much wider than the set of the f -divergences. Let us consider the set "P EC" of convex functions H(P P * ), which satisfy the partial equilibria criterion. It is closed with respect to the following operations Figure 2. Geometry of the Lyapunov function level set. The triangle of distributions for the system with three states A 1 , A 2 , A 3 and the equilibrium p * 1 = 4 7 , p * 2 = 2 7 , p * 3 = 1 7 . The lines of partial equilibria A i A j given by the proportions p i /p * i = p j /p * j are shown by dashed lines. The dash-dot line is the level set of a Lyapunov function H. It intersects the lines of partial equilibria at points B 1,2,3 and C 1,2,3 . (The points B i are close to the vertices A i , the points C i belong to the same partial equilibrium but on another side of the equilibrium P * .) For each point B i , C i the corresponding partial equilibria of two transitions A i A j (j = i) are presented (a). These partial equilibria should belong to the sublevel set of H. They are the projections of B i , C i onto the lines of partial equilibria A i A j (j = i) with projecting rays parallel to the sides [A i , A j ] of the triangle (i.e., to the stoichiometric vectors γ ji (12)). The six-ray star with vertices B i , C i should be inside the dash-dot contour (a). Therefore, the projection of B i onto the partial equilibrium A i A j should belong to the segment [C k , P * ] and the projection of C i onto the partial equilibrium A i A j should belong to the segment [B k , P * ] (a). The lines parallel to the sides A j , A k of the triangle should be supporting lines of the level set of H at points B i , C i (i, j, k are different numbers) (b). Segments of these lines form a circumscribed hexagon around the level set (b).
• Conic combination: if H j (P P * ) ∈ P EC then j α j H j (P P * ) ∈ P EC for nonnegative coefficients α j ≥ 0.
• Convex monotonic transformation of scale: if H(P P * ) ∈ P EC then F (H(P P * )) ∈ P EC for any convex monotonically increasing function of one variable F .
Using these operations we can construct new universal Lyapunov functions from a given set. For example, is a universal Lyapunov function that does not have the f -divergence form because the first sum is an f -divergence given by Equation (1) with h(x) = 1 2 (x − 1) 2 and the product is the exponent of this f -divergence (exp is convex and monotonically increasing function).
The following function satisfies the partial equilibria criterion for every ε > 0.
It is convex for 0 < ε < 1. (Just apply the Gershgorin theorem [28] to the Hessian and use that all p i , p * i ≤ 1.) Therefore, it is a universal Lyapunov function for master equation in ∆ n if 0 < ε < 1. The partial equilibria criterion together with the convexity condition allows us to construct many such examples.

Generalized Mass Action Law
Several formalisms are developed in chemical kinetics and non-equilibrium thermodynamics for the construction of general kinetic equations with a given "thermodynamic Lyapunov functional". The motivation of this approach "from thermodynamics to kinetics" is simple [29,30]: (i) the thermodynamic data are usually more reliable than data about kinetics and we know the thermodynamic functions better than the details of kinetic equations, and (ii) positivity of entropy production is a fundamental law and we prefer to respect it "from scratch", by the structure of kinetic equations.
GMAL is a method for the construction of dissipative kinetic equations for a given thermodynamic potential H. Other general thermodynamic approaches [31][32][33] give similar results for a given stoichiometric algebra. Below we introduce GMAL following [29,30,34].
The list of components is a finite set of symbols A 1 , . . . , A n . A reaction mechanism is a finite set of the stoichiometric equations of elementary reactions: where ρ = 1, . . . , m is the reaction number and the stoichiometric coefficients α ρi , β ρi are nonnegative numbers. Usually, these numbers are assumed to be integer but in some applications the construction should be more flexible and admit real nonnegative values. Let α ρ , β ρ be the vectors with coordinates α ρi , β ρi correspondingly. A stoichiometric vector γ ρ of the reaction in Equation (18) is a n-dimensional vector that is, "gain minus loss" in the ρth elementary reaction. We assume α ρ = β ρ to avoid trivial reactions with zero γ ρ . One of the standard assumptions is existence of a strictly positive stoichiometric conservation law, a vector b = (b i ), b i > 0 such that i b i γ ρi = 0 for all ρ. This may be the conservation of mass, of the total probability, or of the total number of atoms, for example. A nonnegative extensive variable N i , the amount of A i , corresponds to each component. We call the vector N with coordinates N i "the composition vector". The concentration of A i is an intensive variable Let us consider a domain U in n-dimensional real vector space R n with coordinates N 1 , . . . , N n ≥ 0 (U ⊂ R n + ). For each N i , a dimensionless entropy (or free entropy, for example, Massieu, Planck, or Massieu-Planck potential that corresponds to the selected conditions [4]) S(N ) is defined in U . "Dimensionless" means that we use S/R instead of physical S. This choice of units corresponds to the informational entropy (p log p instead of k B p ln p).
The dual variables, potentials, are defined as the partial derivatives of H = −S: This definition differs from the chemical potentials [4] by the factor 1/RT . We keep the same sign as for the chemical potentials, and this differs from the standard Legendre transform for S. (It is the Legendre transform for the function H = −S.) The standard condition for the reversibility of the Legendre transform is strong positive definiteness of the Hessian of H.
For each reaction, a nonnegative quantity, reaction rate r ρ is defined. We assume that this quantity has the following structure (compare with Equations (4), (7), and (14) in [32] and Equation (4.10) in [33]): where (α ρ ,μ) = i α ρiμi is the standard inner product. Here and below, exp( , ) is the exponent of the standard inner product. The kinetic factor ϕ ρ ≥ is an intensive quantity and the expression exp(α ρ ,μ) is the Boltzmann factor of the ρth elementary reaction.
In the standard formalism of chemical kinetics the reaction rates are intensive variables and in kinetic equations for N an additional factor-the volume-appears. For heterogeneous systems, there may be several "volumes" (including interphase surfaces).
A nonnegative extensive variable N i , the amount of A i , corresponds to each component. We call the vector N with coordinates N i "the composition vector". N ∈ R n + . The concentration of A i is an intensive variable c i = N i /V , where V > 0 is the volume. If the system is heterogeneous then there are several "volumes" (volumes, surfaces, etc.), and in each volume there are the composition vector and the vector of concentrations [30,34]. Here we will consider homogeneous systems.
The kinetic equations for a homogeneous system in the absence of external fluxes are If the volume is not constant then the equations for concentrations includeV and have different form (this is typical for combustion reactions, for example). The classical Mass Action Law gives us an important particular case of GMAL given by Equation (21). Let us take the perfect free entropy where c i = N i /V ≥ 0 are concentrations and c * i > 0 are the standard equilibrium concentrations. For the perfect entropy function presented in Equation (23) and for the GMAL reaction rate function given by (21) we get The standard assumption for the Mass Action Law in physics and chemistry is that ϕ and c * are functions of temperature: ϕ ρ = ϕ ρ (T ) and c * i = c * i (T ). To return to the kinetic constants notation and in particular to first order kinetics in the quasichemical form presented in Equation (13), we should write:

General Entropy Production Formula
Thus, the following entities are given: the set of components A i (i = 1, . . . , n), the set of m elementary reactions presented by stoichiometric equations (18), the thermodynamic Lyapunov function H(N, V, . . .) [4,30,35], where dots (marks of omission) stand for the quantities that do not change in time under given conditions, for example, temperature for isothermal processes or energy for isolated systems. The GMAL presents the reaction rate r ρ in Equation (21) as a product of two factors: the Boltzmann factor and the kinetic factor. Simple algebra gives for the time derivative of H: An auxiliary function θ(λ) of one variable λ ∈ [0, 1] is convenient for analysis of dS/dt (see [29,34,36]): With this function,Ḣ defined by Equation (27) has a very simple form: The auxiliary function θ(λ) allows the following interpretation. Let us introduce the deformed stoichiometric mechanism with the stoichiometric vectors, which is the initial mechanism when λ = 1, the reverted mechanism with interchange of α and β when λ = 0, and the trivial mechanism (the left and right hand sides of the stoichiometric equations coincide) when λ = 1/2. Let the deformed reaction rate be r ρ (λ) = ϕ ρ exp(α ρ (λ),μ) (the genuine kinetic factor is combined with the deformed Boltzmann factor). Then θ(λ) = ρ r ρ (λ).
The inequality is necessary and sufficient for accordance between kinetics and thermodynamics (decrease of free energy or positivity of entropy production). This inequality is a condition on the kinetic factors. Together with the positivity condition ϕ ρ ≥ 0, it defines a convex cone in the space of vectors of kinetic factors ϕ ρ (ρ = 1, . . . , m). There exist two less general and more restrictive sufficient conditions: detailed balance and complex balance (known also as semidetailed or cyclic balance).

Detailed Balance
The detailed balance condition consists of two assumptions: (i) for each elementary reaction After this joining, the total number of stoichiometric equations decreases. We distinguish the reaction rates and kinetic factors for direct and inverse reactions by the upper plus or minus: The kinetic equations take the form The condition of detailed balance in GMAL is simple and elegant: For the systems with detailed balance we can take ϕ ρ = ϕ + ρ = ϕ − ρ and write for the reaction rate: M. Feinberg called this kinetic law the "Marselin-De Donder" kinetics [37]. Under the detailed balance conditions, the auxiliary function θ(λ) is symmetric with respect to change λ → (1 − λ). Therefore, θ(1) = θ(0) and, because of convexity of θ(λ), the inequality holds: θ (1) ≥ 0. Therefore,Ḣ ≤ 0 and kinetic equations obey the second law of thermodynamics.
The explicit formula forḢ ≤ 0 has the well known form since Boltzmann proved his H-theorem in 1872: A convenient equivalent form ofḢ ≤ 0 is proposed in [38]: is a normalized affinity. In this formula, the kinetic information is collected in the nonnegative factors, the sums of reaction rates (r + ρ + r − ρ ). The purely thermodynamic multiplier A tanh(A/2) ≥ 0 is positive for non-zero A. For small |A|, the expression A tanh(A/2) behaves like A 2 /2 and for large |A| it behaves like the absolute value, |A|.
The detailed balance condition reflects "microreversibility", that is, time-reversibility of the dynamic microscopic description and was first introduced by Boltzmann in 1872 as a consequence of the reversibility of collisions in Newtonian mechanics.

Complex Balance
The complex balance condition was invented by Boltzmann in 1887 for the Boltzmann equation [39] as an answer to the Lorentz objections [40] against Boltzmann's proof of the H-theorem. Stueckelberg demonstrated in 1952 that this condition follows from the Markovian microkinetics of fast intermediates if their concentrations are small [41]. Under this asymptotic assumption this condition is just the probability balance condition for the underlying Markov process. (Stueckelberg considered this property as a consequence of "unitarity" in the S-matrix terminology.) It was known as the semidetailed or cyclic balance condition. This condition was rediscovered in the framework of chemical kinetics by Horn and Jackson in 1972 [42] and called the complex balance condition. Now it is used for chemical reaction networks in chemical engineering [43]. Detailed analysis of the backgrounds of the complex balance condition is given in [34].
Formally, the complex balance condition means that θ(1) ≡ θ(0) for all values ofμ. We start from the initial stoichiometric equations (18) without joining the direct and reverse reactions. The equality Let us consider the family of vectors {α ρ , β ρ } (ρ = 1, . . . , m). Usually, some of these vectors coincide. Assume that there are q different vectors among them. Let y 1 , . . . , y q be these vectors. For each j = 1, . . . , q we take We can rewrite Equation (39) in the form The Boltzmann factors exp(μ, y j ) are linearly independent functions. Therefore, the natural way to meet these conditions is: for any j = 1, . . . , q This is the general complex balance condition. This condition is sufficient for the inequalityḢ = θ (1) ≤ 0, because it provides the equality θ(1) = θ(0) and θ(λ) is a convex function. It is easy to check that for the first order kinetics given by Equation (10) (or Equation (13)) with positive equilibrium, the complex balance condition is just the balance equation (11) and always holds.

Cyclic Decomposition of the Systems with Complex Balance
The complex balance conditions defined by Equation (42) allow a simple geometric interpretation. Let us introduce the digraph of transformation of complexes. The vertices of this digraph correspond to the formal sums (y, A) ("complexes"), where A is the vector of components, and y ∈ {y 1 , . . . , y q } are vectors α ρ or β ρ from the stoichiometric equations of the elementary reactions (18). The edges of the digraph correspond to the elementary reactions with non-zero kinetic factor.
Let us assign to each edge (α ρ , A) → (β ρ , A) the auxiliary current-the kinetic factor ϕ ρ . For these currents, the complex balance condition presented by Equation (42) is just Kirchhoff's first rule: the sum of the input currents is equal to the sum of the output currents for each vertex. (We have to stress that these auxiliary currents are not the actual rates of transformations.) Let us use for the vertices the notation Θ j : Θ j = (y j , A), (j = 1, . . . , q) and denote ϕ lj the fluxes for We say that the simple cycle is normalized if all the corresponding auxiliary fluxes are unit: The graph of the transformation of complexes cannot be arbitrary if the system satisfies the complex balance condition [22]. Proof. First of all, let us formulate Kirchhoff's first rule (42) for subsets: if the digraph of transformation of complexes satisfies Equation (42), then for any set of complexes Ω where ϕ ΦΘ is the positive kinetic factor for the reaction Θ → Φ if it belongs to the reaction mechanism (i.e., the edge Θ → Φ belongs to the digraph of transformations) and ϕ ΦΘ = 0 if it does not. Equation (43) is just the result of summation of Equations (42) for all (y j , A) = Θ ∈ Ω. We say that a state Θ j is reachable from a state Θ k if k = i or there exists a non-empty chain of transitions with non-zero coefficients that starts at Θ k and ends at Θ j : Θ k → . . . → Θ j . Let Θ i↓ be the set of states reachable from Θ i . The set Θ i↓ has no output edges.
Assume that the edge Θ j → Θ i is not included in a simple cycle, which means Θ j / ∈ Θ i↓ . Therefore, the set Ω = Θ i↓ has the input edge (Θ j → Θ i ) but no output edges and cannot satisfy Equation (43). This contradiction proves the proposition. This property (every edge is included in a simple cycle) is equivalent to the so-called "weak reversibility" or to the property that every weakly connected component of the digraph is its strong component.
For every graph with the system of fluxes, which obey Kirchhoff's first rule, the cycle decomposition theorem holds. It can be extracted from many books and papers [20,26,44]. Let us recall the notion of extreme ray. A ray with direction vector x = 0 is a set {λx} (λ ≥ 0). A ray l is an extreme ray of a cone Q if for any u ∈ l and any x, y ∈ Q, whenever u = (x + y)/2, we must have x, y ∈ l. If a closed convex cone does not include a whole straight line then it is the convex hull of its extreme rays [45].
Let us consider a digraph Q with vertices Θ i , the set of edges E and the system of auxiliary fluxes along the edges ϕ ij ≥ 0 ((j, i) ∈ E). The set of all nonnegative functions on E, ϕ : (j, i) → ϕ ij , is a nonnegative orthant R |E| + . Kirchhoff's first rule (Equation (42)) together with nonnegativity of the kinetic factors define a cone of the systems with complex balance Q ⊂ R |E| + .
Proposition 4 (Cycle decomposition of systems with complex balance). Every extreme ray of Q has a direction vector that corresponds to a simple normalized cycle Proof. Let a function φ : E → R + be an extreme ray of Q and suppφ = {(j, i) ∈ E | φ ij > 0}. Due to Proposition 3 each edge from suppφ is included in a simple cycle formed by edges from suppφ. Let us take one this cycle Denote the fluxes of the corresponding simple normalized cycle by ψ. It is a function on E: ψ i j+1 i j = ψ i 1 i k = 1 and ψ ij = 0 if (i, j) ∈ E but (i, j) = (i j+1 , i j ) and (i, j) = (i 1 , i k ) (i, j = 1, . . . , k, i = j).
Assume that suppφ includes at least one edge that does not belong to the cycle Θ i 1 → Θ i 2 → . . . → Θ i k → Θ i 1 . Then, for sufficiently small κ > 0, φ ± κψ ∈ Q and the vector φ ± κψ is not proportional to φ. This contradiction proves the proposition.
This decomposition theorem explains why the complex balance condition was often called the "cyclic balance condition".

Local Equivalence of Systems with Detailed and Complex Balance
The class of systems with detailed balance is the proper subset of the class of systems with complex balance. A simple (irreversible) cycle of the length k > 2 gives a simplest and famous example of the complex balance system without detailed balance condition.
For Markov chains, the complex balance systems are all the systems that have a positive equilibrium distribution presented by Equation (11), whereas the systems with detailed balance form the proper subclass of the Markov chains, the so-called reversible chains.
In nonlinear kinetics, the systems with complex balance provide the natural generalization of the Markov processes. They deserve the term "nonlinear Markov processes", though it is occupied by a much wider notion [46]. The systems with detailed balance form the proper subset of this class.
Nevertheless, in some special sense the classes of systems with detailed balance and with the complex balance are equivalent. Let us consider a thermodynamic state given by the vector of potentialsμ defined by Equation (20). Let all the reactions in the reaction mechanism be reversible (i.e., for every transition Θ i → Θ j the reverse transition Θ i ← Θ j is allowed and the corresponding edge belongs to the digraph of complex transformations). Calculate the right hand side of the kinetic equations (34) with the detailed balance condition given by Equation (35) for a given value ofμ and all possible values of ϕ + ρ = ϕ − ρ . The set of these values ofṄ is a convex cone. Denote this cone Q DB (μ). For the same transition graph, calculate the right hand side of the kinetic equation (22) under the complex balance condition (42). The set of these values ofṄ is also a convex cone. Denote it Q CB (μ). It is obvious that Q DB (μ) ⊆ Q CB (μ). Surprisingly, these cones coincide. In [26] we proved this fact on the basis of the Michaelis-Menten-Stueckelberg theorem [34] about connection of the macroscopic GMAL kinetics and the complex balance condition with the Markov microscopic description and under some asymptotic assumptions. Below a direct proof is presented.
Theorem 2 (Local equivalence of detailed and complex balance).
Proof. Because of the cycle decomposition (Proposition 4) it is sufficient to prove this theorem for simple normalized cycles. Let us use induction on the cycle length k. For k = 2 the transition graph is Θ 1 Θ 2 and the detailed balance condition (35) coincides with the complex balance condition (42). Assume that for the cycles of the length below k the theorem is proved. Consider a normalized simple cycle A). The corresponding kinetic equations are dN dt =(y 2 − y 1 ) exp(μ, y 1 ) + (y 3 − y 2 ) exp(μ, y 2 ) + . . .
At the equilibrium, all systems with detailed balance or with complex balance giveṄ = 0. Assume that the stateμ is non-equilibrium and therefore not all the Boltzmann factors exp(μ, y i ) are equal. Select i such that the Boltzmann factor exp(μ, y i ) has minimal value, while for the next position in the cycle this factor becomes bigger. We can use a cyclic permutation and assume that the factor exp(μ, y 1 ) is the minimal one and exp(μ, y 2 ) > exp(μ, y 1 ). Let us find a kinetic factor ϕ such that the reaction system consisting of two cycles, a cycle of the length 2 with detailed balance Θ 1 ϕ ϕ Θ 2 (here the kinetic factors are shown above and below the arrows) and a simple normalized cycle of the length k − 1, Θ 2 → . . . Θ k → Θ 2 , gives the sameṄ at the stateμ as the initial scheme. We obtain from Equation (45) the following necessary and sufficient condition (y 1 − y k ) exp(μ, y k ) + (y 2 − y 1 ) exp(μ, y 1 ) = (y 2 − y k ) exp(μ, y k ) + ϕ(y 2 − y 1 )(exp(μ, y 1 ) − exp(μ, y 2 )) . It is sufficient to equate here the coefficients at every y i (i = 1, 2, k). The result is ϕ = exp(μ, y k ) − exp(μ, y 1 ) exp(μ, y 2 ) − exp(μ, y 1 ) By the induction assumption we proved that theorem for the cycles of arbitrary length and, therefore, it is valid for all reaction schemes with complex balance.
The cone Q DB (μ) of the possible values ofṄ in Equation (34) is a polyhedral cone with finite set of extreme rays at any non-equilibrium stateμ for the systems with detailed balance. Each of its extreme rays has the direction vector of the form This follows from the form of the reaction rate presented by Equation (36) for the kinetic equations (34). Following Theorem 2, the cone of the possible values ofṄ for systems with complex balance has the same set of extreme rays. Each extreme ray corresponds to a single reversible elementary reaction with the detailed balance condition (35).

General H-Theorem for GMAL
Consider GMAL kinetics with the given reaction mechanism presented by stoichiometric equations (32) and the detailed balance condition (35). The reaction rates of the elementary reaction for the kinetic equations (34) are proportional to the nonnegative parameter ϕ ρ in Equation (36). These m nonnegative numbers ϕ ρ (ρ = 1, . . . , m) are independent in the following sense: for any set of values ϕ ρ ≥ 0 the kinetic equations (34) satisfy the H-theorem in the form of Equation (38): Therefore, nonnegativity is the only a priori restriction on the values of ϕ ρ (ρ = 1, . . . , m). One Lyapunov function for the GMAL kinetics with the given reaction mechanism and the detailed balance condition obviously exists. This is the thermodynamic Lyapunov function H used in GMAL construction. For ideal systems (in particular, for master equation) H has the standard form i N i (ln(c i /c * i )−1) given by Equation (23). Usually, H is assumed to be convex and some singularities (like c ln c) near zeros of c may be required for positivity preservation in kinetics (Ṅ i ≥ 0 if c i = 0). The choice of the thermodynamic Lyapunov function for GMAL construction is wide. We consider kinetic equations in a compact convex set U and assume H to be convex and continuous in U and differentiable in the relative interior of U with derivatives continued by continuity to U .
Assume that we select the thermodynamic Lyapunov function H and the reaction mechanism in the form (32). Are there other universal Lyapunov functions for GMAL kinetics with detailed balance and given mechanism? "Universal" here means "independent of the choice of the nonnegative kinetic factors".
For a given reaction mechanism we introduce the partial equilibria criterion by analogy to Definition 1. Roughly speaking, a convex function F satisfies this criterion if its conditional minima correctly describe the partial equilibria of elementary reactions.
For each elementary reaction i α ρi A i i β ρi A i from the reaction mechanism given by the stoichiometric equations (32) and any X ∈ U we define an interval of a straight line Definition 2 (Partial equilibria criterion for GMAL). A convex function F (N ) on U satisfies the partial equilibria criterion with a given thermodynamic Lyapunov function H and reversible reaction mechanism given by stoichiometric equations (32) if for all X ∈ U , ρ = 1, . . . , m. Proof. The partial equilibria criterion is necessary because F (N ) should be a Lyapunov function for a reaction mechanism that consists of any single reversible reaction from the reaction mechanism (32). It is also sufficient because for the whole reaction mechanism the kinetic equations (34) are the conic combinations of the kinetic equations for single reversible reactions from the reaction mechanism (32).
For the general reaction systems with complex balance we can use the theorem about local equivalence (Theorem 2). Consider a GMAL reaction system with the mechanism (18) and the complex balance condition. Proof. The theorem follows immediately from Theorem 3 about Lyapunov functions for systems with detailed balance and the theorem about local equivalence between systems with local and complex balance (Theorem 2).
The general H-theorems for GMAL is similar to Theorem 1 for Markov chains. Nevertheless, many non-classical universal Lyapunov functions are known for master equations, for example, the f -divergences given by Equation (1), while for a nonlinear reaction mechanism it is difficult to present a single example different from the thermodynamic Lyapunov function or its monotonic transformations. The following family of example generalizes Equation (17).
where f (N ) is a non-negative differentiable function and ε > 0 is a sufficiently small number. This function satisfies the conditional equilibria criterion. For continuous H(N ) on compact U with the spectrum of the Hessian uniformly separated from zero, this F (N ) is convex for sufficiently small ε > 0.

Generalization: Weakened Convexity Condition, Directional Convexity and Quasiconvexity
In all versions of the general H-theorems we use convexity of the Lyapunov functions. Strong convexity of the thermodynamic Lyapunov functions H (or even positive definiteness of its Hessian) is needed, indeed, to provide reversibility of the Legendre transform N ↔ ∇H.
For the kinetic Lyapunov functions that satisfy the partial equilibria criterion, we use, actually, a rather weak consequence of convexity in restrictions on the straight lines X + λγ (where λ ∈ R is a coordinate on the real line, γ is a stoichiometric vector of an elementary reaction): if λ * = argmin λ∈R F (X + λγ) then on the half-lines (rays) λ ≥ λ * and λ ≤ λ * function F (X + λγ) is monotonic. It does not decrease for λ ≥ λ * and does not increase for λ ≤ λ * . Of course, convexity is sufficient (Figure 3a) but a much weaker property is needed (Figure 3b).
A function F on a convex set U is quasiconvex [47] if all its sublevel sets are convex. It means that for every X, Y ∈ U In particular, a function F on a segment is quasiconvex if all its sublevel sets are segments. Among many other types of convexity and quasiconvexity (see, for example [48]) two are important for the general H-theorem. We do not need convexity of functions along all straight lines in U . It is sufficient that the function is convex on the straight lines X + Rγ ρ , where γ ρ are the stoichiometric (direction) vectors of the elementary reactions.
Let D be a set of vectors. A function F is D-convex if its restriction to each line parallel to a nonzero v ∈ D is convex [49]. In our case, D is the set of stoichiometric vectors of the transitions, D = {γ ρ | ρ = 1, . . . , m}. We can use this directional convexity instead of convexity in Propositions 1, 2 and Theorems 1, 3, 4.
Finally, we can relax the convexity conditions even more and postulate directional quasiconvexity [50] for the set of directions D = {γ ρ | ρ = 1, . . . , m}. Propositions 1, 2 and Theorems 1, 3, 4 will be still true if the functions are continuous, quasiconvex in restrictions on all lines X + Rγ ρ and satisfy the partial equilibria criterion.

Discussion
Many non-classical entropies are invented and applied to various problems in physics and data analysis. In this paper, the general necessary and sufficient criterion for the existence of H-theorem is proved. It has a simple and physically transparent form: the convex divergence (relative entropy) should properly describe the partial equilibria for transitions A i A j . It is straightforward to check this partial equilibria criterion. The applicability of this criterion does not depend on the detailed balance condition and it is valid both for the class of the systems with detailed balance and for the general first order kinetics without this assumption.
If an entropy has no H-theorem (that is, it violates the second law and the data processing lemma) then there should be unprecedentedly strong reasons for its use. Without such strong reasons we cannot employ it. Now, I cannot find an example of sufficiently strong reasons but people use these entropies in data analysis and we have to presume that they may have some hidden reasons and that these reasons may be sufficiently strong. We demonstrate that this problem arises even for such popular divergences like Euclidean distance or Itakura-Saito distance.
The general H-theorem is simply a reduction of a dynamical question (Lyapunov functionals) to a static one (partial equilibria). It is not surprising that it can be also proved for nonlinear Generalized Mass Action Law kinetics. Here kinetic systems with complex balance play the role of the general Markov chains, whereas the systems with detailed balance correspond to the reversible Markov chains. The requirement of convexity of Lyapunov functions can be relaxed to the directional convexity (in the directions of reactions) or even directional quasiconvexity.
For the reversible Markov chains presented by Equations (15) with the classical entropy production formula (16), every universal Lyapunov function H should satisfy inequalities These inequalities are closely related to another generalization of convexity, the Schur convexity [51]. They turn into the definition of the Schur convexity when equilibrium is the equidistribution with p * i = 1/n for all i. Universal Lyapunov functions for nonlinear kinetics give one more generalization of the Schur convexity. Introduction of many non-classical entropies leads to the "uncertainty of uncertainty" phenomenon: we measure uncertainty by entropy but we have uncertainty in the entropy choice [27]. The selection of the appropriate entropy and introduction of new entropies are essentially connected with the class of kinetics. H-theorems in physics are formalizations of the second law of thermodynamics: entropy of isolated systems should increase in relaxation to equilibrium. If we know the class of kinetic equations (for example, the Markov kinetics given by master equations) then the H theorem states that it is possible to use this entropy with the given kinetics. If we know the entropy and are looking for kinetic equations then such a statement turns into the thermodynamic restriction on the thermodynamically admissible kinetic equations. For information processing, the class of kinetic equations describes possible manipulations with data. In this case, the H-theorems mean that under given class of manipulation the information does not increase. It is not possible to compare different entropies without any relation to kinetics. It is useful to specify the class of kinetic equations, for which they are the Lyapunov functionals. For the GMAL equations, we can introduce the dynamic equivalence between divergences (free entropies or conditional entropies). Two functionals H(N ) and For the Markov kinetics, the partial equilibria criterion is sufficient for a convex function H(P ) to be dynamically consistent with the relative entropy i p i (ln(p i /p * i )−1) in the unit simplex ∆ n . For GMAL, any convex function H(N ) defines a class of kinetic equations. Every reaction mechanism defines a family of kinetic equations from this class and a class of Lyapunov functions F , which are dynamically consistent with H. The main message of this paper is that it is necessary to discuss the choice of the non-classical entropies in the context of kinetic equations.

A1. Maximum of quasiequilibrium entropies -a new family of universal Lyapunov functions for generalized mass action law
The general H theorems for the Generalized Mass Action Law (GMAL) and for its linear version, master equation, look very similar. For the linear systems many Lyapunov functionals are known in the explicit form: for every convex function h on the positive ray R + we have such a functional (1). On the contrary, for the nonlinear systems we, typically, know the only Lyapunov function H, it is the thermodynamic potential which is used for the system construction. The situation looks rather intriguing and challenging: for every finite reaction mechanism there should be many Lyapunov functionals, but we cannot construct them. (There is no chance to find many Lyapunov functions for all nonlinear mechanisms together under given thermodynamics because in this case the cone of the possible velocitieṡ N is a half-space and locally there is the only divergence with a given tangent hyperplane. Globally, such a divergence can be given by an arbitrary monotonic function on the thermodynamic tree [53,55]).
In this Appendix, we present a general procedure for the construction of a family of new Lyapunov functionals from H for nonlinear GMAL kinetics and a given reaction mechanism. We will use two auxiliary construction, the quasiequilibrium entropies (or divergences) and the forward-invariant peeling.
Let us consider isochoric systems (constant volume V ). For them, concentrations c i (intensive variables) and amounts N i (extensive variables are proportional with a constant extensive factor V and we take N i = c i in a standard unit volume without loss of generality.
We assume that H is strongly convex in the second approximation in R n + . This means that it is twice differentiable and the Hessian ∂ 2 H/∂N i ∂N j is positively definite in R n + . In addition, we assume logarithmic singularities of the partial derivatives of H near zeros of concentrations: where the functions µ 0i (c) are bounded continuously differentiable functions in a vicinity of the non-negative orthant. This assumption corresponds to the physical hypothesis about the logarithmic singularity of the chemical potentials, µ i = RT ln c i + . . . where . . . stands for a continuous function of c, T , and to the supposition about the classical mass action law for small concentrations. Assume also that all the described properties of H hold for its restrictions on the faces of R n + : these restrictions are strictly convex, differentiable in the relative interior, etc.
For every linear subspace E ⊂ R n and a given composition vector N 0 ∈ R n + the quasiequilibrium composition is the partial equilibrium The quasiequilibrium divergence is the value of H at the partial equilibrium: Due to the assumption about strong convexity of H and logarithmic singularity (53), for a positive vector N 0 ∈ R n + and a subspace E ⊂ R n the quasiequilibrium composition N * E (N 0 ) is also positive. Such quasiequilibrium "entropies" are discussed by Jaynes [56]. He considered the quasiequilibrium H-function as the Boltzmann H-function H B in contrast to the original Gibbs H-function, H G . The Gibbs H-function is defined for the distributions on the phase space of the mechanical systems. The Boltzmann function is a conditional minimum of the Gibbs function, therefore the inequality holds H B ≤ H G [56]. Analogously, and this inequality turns into the equality if and only if N 0 is the quasiequilibrium state for the subspace E: N 0 = N * E (N 0 ). After Jaynes, these functions are intensively used in the discussion of time arrow [57][58][59]. In the theory of information, quasiequilibrium was studied in detail under the name information projection (or I-projection) [60].
Analysis of partial equilibria is useful in chemical engineering in the presence of uncertainty: when the reaction rate constants are unknown then the chains of partial equilibria together with information about the thermodynamically preferable directions of reactions may give some important information about the process [61].
Let us prove several elementary properties of H * E (N ). Let E and L be subspaces of R n . Proof. 1. Convexity of H * E (N ) means that for every positive N 1 and N 2 and a number λ ∈ [0, 1] the inequality holds: Let us prove this inequality. First, because convexity H. Secondly, H(N * E (N 1,2 )) = H * E (N 1,2 ) by definition and the last inequality reads because the last value is the minimum of H on the linear manifold λN 1 + (1 − λ)N 2 + E. Inequality is proven.
2. Indeed, the function H * E is constant on the set (N + E) ∩ R n + , by construction.
3. In the proof of item 1 the inequality H(λN * is strong for λ = 0, 1 and N 1 − N 2 / ∈ E. Therefore, under these conditions the convexity inequality is strong. 5. This follows directly from the definitions of H * E (N ) as a conditional minimum and N * E (N ) as the corresponding minimizer of H on N + E ∩ R n + Consider the reversible reaction mechanism (32) with the set of the stoichiometric vectors Υ. For each Γ ⊂ Υ we can take E = Span(Γ) and define the quasiequilibrium. The subspace Span(Γ) may coincide for different Γ and the quasiequilibrium depends on the subspace E only, therefore, it is useful to introduce the set of these subspaces for a given reaction mechanism (32). Let E Υ be the set of all subspaces of the form E = Span(Γ) (Γ ⊂ Υ). For each dimension k we denote E k Υ the set of k-dimensional subspaces from E Υ .
For each dimension k = 0, . . . , rank(Υ) we define the function H k,max Υ : H 0,max Υ = H, and for 0 < k ≤ rank(Υ) H k,max Immediate consequence of the definition of the quasiequilibrium divergence and Theorem 3 is: is a Lyapunov function in R n + for all kinetic equations (34) with the given thermodynamic Lyapunov function H, reaction rates presented by Equation (36) (detailed balance) and the reversible reaction mechanism with the set of stoichiometric vectors Γ ⊆ Υ. Let us assume now that the partial equilibrium N * * = N * {Rγ} (N 0 ) is, at the same time, the partial equilibrium for several other E ∈ E 1 Υ . Let B be the set of subspaces E ∈ E 1 Υ for which N * * is a partial equilibrium, i.e. H(N * * ) = H * E (N * * ). In this case, for all E / ∈ B (E ∈ E 1 Υ ) H * E (N * * ) < H(N * * ). Therefore, in a sufficiently small vicinity of N * * the function H 1,max Υ (N ) can be defined as where ∆ = N −N * * , (DH) * * = (DH) | N * * is the differential of H at N * * , and •, • * * = (•, (D 2 H) * * •) is the entropic inner product, with the positive symmetric operator (D 2 H) * * = (D 2 H) | N * * (the second differential of H at N * * ). The entropic inner product is widely used in kinetics and nonequilibrium thermodynamics, see, for example [53,57,[62][63][64][65]. Let us split R n into the orthogonal sum: R n = E ⊕ E ⊥ , E ⊥ is the orthogonal supplement to E in the entropic inner product •, • * * . Each vector ∆ ∈ R n is represented in the form ∆ = ∆ ⊕ ∆ ⊥ , where ∆ ∈ E and ∆ ⊥ ∈ E ⊥ . By the definition of the partial equilibrium as a conditional minimizer of H, (DH) * * (∆ ) = 0, and we have the following representation of H In particular, when ∆ ∈ E (∆ ⊥ = 0), this formula gives From these formulas, we easily get the approximations of N * E (N ) and H * E (N ) in a vicinity of N * * . Let N − N * * = ∆ = ∆ ⊕ ∆ ⊥ . Then in particular, (N * E (N ) − N * * ) ⊥ = ∆ ⊥ (exactly) and (N * E (N ) − N * * ) = o( ∆ ). Therefore, If E is a 1D subspace with the directional vector γ (E = Rγ) then here, γ γ| γ,γ is the orthogonal projector onto E and 1 − γ γ| γ,γ is the orthogonal projector onto E ⊥ , the orthogonal complement to E.
Let us use the normalized vectors γ. In this case, Let us assume now that N * * is the partial equilibrium for several (two or more) different 1D subspaces E and B is a finite set of these subspaces. The set B includes two or more different subspaces E. Select a normalized directional vector γ E for each E ∈ B. Let E B = Span{γ E | E ∈ B}. N * * is a critical point of H on (N * * + E B ) ∩ R n because γ E ∈ ker(DH) * * for all E ∈ B and, therefore, E B ⊂ ker(DH) * * .
Consider a function H 1,max (N ) = max E∈B H * E (N ). This function is strictly convex on (N * * + E B ) ∩ R n and N * * is its minimizer on this set. Indeed, in a vicinity of N * * in (N * * + E B ) ∩ R n for every E ∈ B Equation (57) holds. Consider a direct sum of k = |B| copies of E B , E 1 ⊕ E 2 ⊕ . . . ⊕ E k , where all E i are the copies of E B equipped by the entropic inner product •, • * * and the corresponding Euclidean norm, and the norm of the sum is the maximum of the norm in the summands: x 1 ⊕ . . . ⊕ x k = max{ x 1 , . . . , x k }. The following linear map ψ is a surjection ψ : and if all the summands are zero for ∆ ∈ E B then ∆ = 0: and, therefore, N * * is a unique minimizer of H 1,max (N ) on (N * * + E B ) ∩ R n Because of the local equivalence of the systems with detailed and complex balance and Theorem 4 we also get the following proposition.
The concentration triangle c 1 + c 2 + c 3 = b is split by the partial equilibria lines into six compartments. In each compartment, the cone of possible directions (the angle) Q DB is presented. The positively invariant set which includes the A 1 vertex is outlined by bold. (b) The set U is outlined by the dashed line, the level set H = h − ε is shown by the dotted line. The levels of H * γ are the straight lines γ. The boundary of the peeled set U ε {γ 1 ,γ 2 ,γ 3 } is shown by red. (c) The sets U , U ε {γ 1 ,γ 2 ,γ 3 } and the level set H = h − ε without auxiliary lines are presented.

A2. Forward-invariant peeling
We use the quasiequilibrium functions and their various combinations for construction of new Lyapunov functions from the known thermodynamic Lyapunov functions, H, and an arbitrary convex function F . In this procedure, we delete some parts from the sublevel sets of F to make the rest positively-invariant with respect to GMAL kinetics with the given reaction mechanism and detailed or complex balance. We call these procedures the forward-invariant peeling.
Let U ⊂ R * + be a convex compact set of non-negative n-dimensional vectors N and for some η > min H the η-sublevel set of H belongs to U : {N | H(N ) ≤ η} ⊂ U . Let h > min H be the maximal value of such η. Select a thickness of peel ε > 0.
We define the peeled set U as For sufficiently small ε > 0 (ε < h − min H) this set is non-empty and forward-invariant.
Proposition 8. For sufficiently small ε > 0 (ε < h − min H) the peeled set U ε Υ is non-empty. If it is non-empty then it is forward-invariant with respect to kinetic equations (34) with the thermodynamic Lyapunov function H, reaction rates presented by Equation (36) (detailed balance) and the reversible reaction mechanism (32) with the set of stoichiometric vectors Γ ⊆ Υ.
Proposition 9. For sufficiently small ε > 0 (ε < h − min H) the peeled set U ε Υ is non-empty. If it is non-empty then it is forward-invariant with respect to all kinetic equations (34) with the given thermodynamic Lyapunov function H, the complex balance condition (42) and the reaction mechanism (18) with any set of stoichiometric vectors Γ ⊆ Υ ∪ −Υ.
The forward-invariant peeling for a 2D nonlinear kinetic scheme is demonstrated in Figure 5. A 3D example is presented in Figure 6. It is worth to mention that the peeled froward-invariant sets have 1D Let F (N ) be a continuous strictly convex function with bounded level sets on the non-negative orthant. For each level of H, h ∈ imH, we define the level . Let f * (h) ≥ f (h) be any strictly increasing function. In particular, we can take f * (h) = f (h) + ε (ε > 0). Introduce the peeled function The level sets of F f * (h) Υ (N ) have 1D faces parallel to the stoichiometric vectors γ ∈ Υ near the corresponding partial equilibria outside a vicinity of the intersections of these partial equilibria hypersurfaces. At the intersections of two partial equilibria there is a singularity and the size of both faces tends to zero (Figure 6 a).
Let us consider kinetic systems with perturbed thermodynamic potentials H = H + ∆H, where ∆H is uniformly small with it second derivatives. For such the perturbed systems in a bounded set all the partial equilibria are close to the partial equilibria of the original system. (An important case is the perturbation of H by a linear functional ∆H.) We will modify the peeling procedure to create forward-invariant sets for sufficiently small perturbations.
Let us look on the forward-invariant peeled set on Figure 6 a. If we slightly deform the partial equilibria surfaces for each reaction (Figure 6 b, red dashed lines) but keep their intersection unchanged then the peeled set may remain forward-invariant. It is sufficient that the intersections of the partial equilibria with the border of U in R n do not leave the corresponding 1D faces (Figure 6 b). If we perform additional peeling near the intersections of the partial equilibria (Figure 6 c), then the peeled set may be positively invariant with respect to the kinetic equations with perturbed thermodynamic potentials (or, even a bit stronger, with respect to kinetic equations with interval coefficients for sufficiently small intervals).
We consider the reversible reaction mechanism (32) with the set of the stoichiometric vectors Υ. U ⊂ R n + is a convex compact set, and for some η > min H the η-sublevel set of H belongs to U : Let h > min H be the maximal value of such η. Select a sequence of thicknesses of peels ε 1 , ε 2 , . . . , ε k > 0, where k = rankΥ. Let us use for the peeling the functions H i,max For each i we consider the sublevel set for sufficiently small numbers ε j > 0 all these sets are non-empty. The peeled U for this sequence of thicknesses is defined as where we take U 0 = U . Definition of U k (k = rankΥ) requires some comments. If rankΥ = n then Span(Υ) = R n and the corresponding quasiequilibrium N * R n is the global equilibrium, i.e.
In this case, H k,max . Therefore, in this case the term U k is not needed in Equation (58).
If k = rankΥ < n then the term U k is necessary. In this case, H k,max Υ (N ) = H * Span(Υ) and U k defines non-trivial peeling.
2. For these thicknesses ε 1 , . . . , ε k > 0 the peeled set U ε 1 ,...,ε k Υ is forward-invariant with respect to kinetic equations (34) with the perturbed thermodynamic Lyapunov function H + ∆H, reaction rates presented by Equation (36) (detailed balance) and the reversible reaction mechanism with the set of stoichiometric vectors Γ ⊆ Υ if the perturbation ∆H is sufficiently uniformly small with its second derivatives.
The similar proposition is valid for the systems with complex balance (because of the local equivalence theorem). Peeling of the sublevel sets of a convex function will produce a Lyapunov function similarly to Proposition 10.
Essential difference of Proposition 12 from Proposition 8 is in the ultimate positive-invariance of U ε Υ if it is non-empty. To provide the forward-invariance of U ε 1 ,...,ε k Υ we need an additional property. Let E ∈ E Υ be a subspace of the form E = Span(Γ), Γ ⊂ Υ. The quasiequilibrium surface Φ E ⊂ R n + is a set of all quasiequilibria N * E (N ) (N ∈ R n + ). The Legendre transform of Φ E (its image in the space of potentialsμ) is the orthogonal supplement to E, for everyμ from this image (μ, γ) = 0, and this is an equivalent definition of Φ E .
For every E ∈ E Υ we define the E-faces of U ε 1 ,...,ε k Υ as follows. Let dim E = i. Consider the generalized cylindrical surface with the given value H * The E-faces of U ε 1 ,...,ε k Υ belong to the intersection Let h − k j=1 ε j > min H and for every E, L ∈ E Υ the following property holds: Then the peeled set U ε 1 ,...,ε k Υ is non-empty and forward-invariant with respect to kinetic equations (34) with the perturbed thermodynamic Lyapunov function H + ∆H, reaction rates presented by Equation (36) (detailed balance) and the reversible reaction mechanism with the set of stoichiometric vectors Γ ⊆ Υ if the perturbation ∆H is sufficiently uniformly small with its second derivatives.
The proof is an application of the general H-theorem based on the partial equilibrium criterion (for illustration see Figure 6 c). The condition of Proposition 13 means that the result of peeling in higher dimensions does not destroy the main property of the 1D peeling (Proposition 8): if a positive boundary point N of the bodily peeled set is a partial equilibrium in direction γ ∈ Υ then the stoichiometric vector γ belongs to a supporting hyperplane of this peeled set at N .

A3. Greedy peeling
The goal of the forward-invariant peeling is to create a forward-invariant convex set from an initial convex set U by deletion (peeling) of its non-necessary parts. The resulting (peeled) set should be forward-invariant with respect to any GMAL kinetics with a given reaction mechanism and the thermodynamic Lyapunov function H ("free energy"). In more general but practically useful settings, we consider not a single system but a family of systems with the given reaction mechanism but for a set of Lyapunov functions H(N ) = H 0 (N ) + (l, N ) where l ∈ Q and Q ⊂ R n is a convex compact set. We would like to produce a set that is forward-invariant with respect to all these systems (and, therefore, with respect to the differential inclusion (compare to (22)) whereμ − ∇H 0 (N ) = l ∈ Q and ϕ ρ ∈ R + . Further on, we consider systems with fixed volume, therefore we omit the factor V and make no difference between the amounts N i and the concentrations c i .
The peeling procedure proposed in the previous subsection works but it is often too extensive and produces not the maximal possible forward-invariant set. We would like to produce the maximal forward-invariant subset of U and, therefore, have to minimize peeling. Here we meet a slightly unexpected obstacle. The union (and the closure) of forward-invariant sets is also forward-invariant, whereas the union of convex sets may be non-convex. Therefore, there exists the unique maximal forward-invariant subset of U but it may be non-convex and the maximal convex forward-invariant subset may be non-unique. If we would like to find the maximal forward-invariant subset then we have to relax the requirement of convexity.
A set U is directionally convex with respect to a set of vectors Γ if for every x ∈ U and γ ∈ Γ the intersection (x + Rγ) ∩ U is a segment of a straight line: The minimal forward-invariant non-convex (but directionally convex) sets were introduced in [54] and studied for chemical kinetics in [53] and for Markov chains (master equation) in [67].
Let U ⊂ R n + be a compact subset. The greedy peeling of U is constructed for an inclusion (59) as a sequence of peeling operations Π γ , where γ is a stoichiometric vector of an elementary reaction. A point x ∈ U belongs to Π γ (U ) if and only if there exists such a segment [a, b] ⊂ R that We call the set S γ = {N ∈ R n + | (∇H 0 | N + l, γ) = 0 for some l ∈ Q} the equilibrium strip for the elementary reaction with the stoichiometric vector γ.
Another equivalent description of the operation Π γ may be useful. Find the orthogonal projection of U ∩ S γ onto the orthogonal complement to γ, the hyperplane γ ⊥ ⊂ R n . Let π ⊥ γ be the orthogonal projector onto this hyperplane. Find all such z ∈ π ⊥ γ (U ∩ S γ ) that This set is the base of Π γ (U ), i.e. it is For each z ∈ B γ (U ) consider the straight line (π ⊥ γ ) −1 z. This line is parallel to γ and its orthogonal projection onto γ ⊥ is one point z. The intersection S γ ∩ ((π ⊥ γ ) −1 z) is a segment. Find the maximal connected part of U ∩ ((π ⊥ γ ) −1 z) that includes this segment. This is also a segment (a fiber). Let us call it F z,γ (U ). We define Π γ (U ) = z∈Bγ (U ) F z,γ (U ) The set Π γ (U ) is forward-invariant with respect to the differential inclusion (59) if the reaction mechanism consists of one reaction with the stoichiometric vector γ. It is directionally convex in the direction γ. Of course, if we apply the operation Π γ with a different stoichiometric vector γ to Π γ (U ) then the forward-invariance with respect to the differential inclusion (59) for one reaction with the previous stoichiometric vector γ may be destroyed. Nevertheless, if we apply an infinite sequence of operations Π γρ (ρ = 1, . . . , m) where all the stoichiometric vectors γ ρ from the reaction mechanism appear infinitely many times then the sequence converges to the maximal forward-invariant subset of U because of monotonicity (in particular, this limit may be empty if there is no positively invariant subset in U ). The limit set is directionally convex in directions γ ρ (ρ = 1, . . . , m) and is the same for all such sequences.

A4. A toy example
Let us consider a reaction mechanism with the classical mass action law and interval constants 0 < k i min ≤ k i ≤ k i max < ∞. Consider the kinetic equations with such interval constants and classical mass action law. The stoichiometric vectors of the reactions are We will demonstrate how to use peeling for solving of the following problem for the system (60): is it possible that the solution of the differential inclusion with these interval constants starting from a positive vector will go to zero when t → ∞? (This question for this system was considered recently as an unsolved problem [68].) Let us use the local equivalence of systems with complex and detailed balance and represent this system as a particular case of differential inclusion (59) (with possible extension of the interval of constants).
The equilibrium concentrations c * i in the irreversible cycle satisfy the following identities: Instead of the irreversible cycle of linear reactions we will take the reversible cycle with the interval restrictions on the equilibrium constants (the ratios of the reaction rate constants κ j /κ −j ) The detailed balance condition should also hold for the constants κ ±j : The equilibria for this cycle satisfy the conditions Figure 7. Partial equilibria of the reversible cycle (62) with the interval restrictions on the equilibrium constants. The triangle is split by the lines of partial equilibria A i A j into several compartments. The borders of these compartments are combined from the segments of the dashed lines. These dashed lines correspond to the minima and maxima of the equilibrium constants κ j /κ −j . In each compartment, the cone (the angle) of possible directions ofċ is given. This is a proper cone (an angle that is less than π) outside the equilibrium strips, a halfplane in an equilibrium strip of a single reaction, and a whole plane in the intersection of two such strips. The area of the possible equilibria (where the angle of possible directions ofċ is the whole plane) is outlined by bold line and colored in green. .
These conditions provide the same range of equilibrium concentrations for the reversible and irreversible cycles. Therefore, the possible value ofċ for the irreversible cycle in the given interval of reaction rate constants always belongs to the cone of possible values ofċ of the reversible cycle under the given restrictions (63) and the detailed balance condition (64).
For the reversible cycle the reaction rates are The reaction rate of the reaction 2A 1 3A 2 is r 4 = k 4 c 2 1 − k −4 c 3 3 . The time derivatives of the concentrations arė c 1 = −r 1 + r 3 − 2r 4 ,ċ 2 = r 1 − r 2 + 3r 4 ,ċ 3 = r 2 − r 3 The differential inclusion for the reversible linear cycle (62) is represented in Fig. 7. There are three types of areas: (i) area where the equilibria may be located and the direction ofċ may coincide with any vector of the linear subspace iċ i = 0, (ii) areas where direction of one reaction is indefinite but the signs of two other reactions rates are fixed, and (iii) areas where the signs of all reaction rates are fixed. The cones (angles) of possible vectorsċ are drawn in Fig. 7   V0234: • On F2 r 1 < 0, and r 3 > 0; • On F3 r 2 < 0, r 3 > 0, and r 4 > 0; • On F4 r 1 < 0, r 2 < 0, and r 3 > 0.
Thus, the peeled set is positively invariant if 0 < l < 2. This means It is sufficient to take 0 < v ≤ 2 3 (for example, v = 2 3 ) because of the obvious inequality, c • 1 2(c • 1 +c 2 ) < 1 2 . We see that the peeled faces are located between the planes i c i = ε and i c i = 3 2 ε (for v = 2/3). Therefore, it is sufficient to take in the rescaling (67) the constants max a = 3 2 , min a = 1 which do not depend on the equilibrium constant.
We have demonstrated that for any given range of positive kinetic constants any positive solution of the kinetic inclusion for the system (60) cannot approach the origin when t → ∞.
We have started from a system (60) with interval rate constants and have embedded the corresponding differential inclusion into a differential inclusion for a reversible system with detailed balance (64) and interval restrictions onto equilibrium constants (63).
We have constructed a piecewise-linear surface that isolated the ε-vicinity of the origin from the outside for sufficiently small ε > 0. This surface cannot be intersected by the solutions of the kinetic inclusion in the motion from the outside to the origin.
The peeling procedure used in this toy-example differs from the universal greedy peeling. (It is the simplified version of the greedy peeling.) We have guessed the structure of the corner near A 3 and build two plain faces, F3 and F4, instead of a sequence of the curvilinear "cylindric" faces. This piecewise peeling is not minimal but is simpler for drawing.