Axiomatic Information Thermodynamics

We present an axiomatic framework for thermodynamics that incorporates information as a fundamental concept. The axioms describe both ordinary thermodynamic processes and those in which information is acquired, used and erased, as in the operation of Maxwell’s demon. This system, similar to previous axiomatic systems for thermodynamics, supports the construction of conserved quantities and an entropy function governing state changes. Here, however, the entropy exhibits both information and thermodynamic aspects. Although our axioms are not based upon probabilistic concepts, a natural and highly useful concept of probability emerges from the entropy function itself. Our abstract system has many models, including both classical and quantum examples.


Introduction
Axiomatic approaches to physics are useful for exploring the conceptual basis and logical structure of a theory. One classic example was presented by Robin Giles over fifty years ago in his monograph Mathematical Foundations of Thermodynamics [1]. His theory is constructed upon three phenomenological concepts: thermodynamic states, an operation (+) that combines states into composites, and a relation (→) describing possible state transformations. From a small number of basic axioms, Giles derives a remarkable amount of thermodynamic machinery, including conserved quantities ("components of content"), the existence of an entropy function that characterizes irreversibility for possible processes, and so on.
Alternative axiomatic developments of thermodynamics have been constructed by others along different lines. One notable example is the framework of Lieb and Yngvason [2,3] (which has recently been used by Thess as the basis for a textbook [4]). Giles's abstract system, meanwhile, has found application beyond the realm of classical thermodynamics, e.g., in the theory of quantum entanglement [5].
Other work on the foundations of thermodynamics has focused on the concept of information. Much of this has been inspired by Maxwell's famous thought-experiment of the demon [6]. The demon is an entity that can acquire and use information about the microscopic state of a thermodynamic system, producing apparent violations of the Second Law of Thermodynamics. These violations are only "apparent" because the demon is itself a physical system, and its information processes are also governed by the underlying dynamical laws.
Let us examine a highly simplified example of the demon at work. Our thermodynamic system is a one-particle gas enclosed in a container (a simple system also used in [7]). The gas may be allowed to freely expand into a larger volume, but this process is irreversible. A "free compression" process that took the gas from a larger to a smaller volume with no other net change would decrease the entropy of the system and contradict the Second Law. See Figure 1. A one-particle gas may freely expand to occupy a larger volume, but the reverse process would violate the Second Law. Now, we introduce the demon, which is a machine that can interact with the gas particle and acquire information about its location. The demon contains a one-bit memory register, initially in the state 0. First, a partition is introduced in the container, so that the gas particle is confined to the upper or lower half (U or L). The demon measures the gas particle and records its location in memory, with 0 standing for U and 1 for L. On the basis of this memory bit value, the demon moves the volume containing the particle. In the end, the gas particle is confined to one particular smaller volume, apparently reducing the entropy of the system. This is illustrated in Figure 2.  Figure 2. A Maxwell's demon device acquires one bit of information about the location of a gas particle, allowing it to contract the gas to a smaller volume.
As Bennett pointed out [8], every demon operation we have described can be carried out reversibly, with no increase in global entropy. Even the "measurement" process can be described by a reversible interaction between the particle location and the demon's memory bit b, changing the states according to whereb is the binary negation of b. However, the demon process as described leaves the demon in a new situation, since an initially blank memory register now stores a bit of information. To operate in a cycle (and thus unambiguously violate the Second Law), the demon must erase this bit and recover the blank memory. However, as Landauer showed [9], the erasure of a bit of information is always accompanied by an entropy increase of at least k B ln 2 in the surroundings, just enough to ensure that there is no overall entropy decrease in the demon's operation. The Second Law remains valid. Information erasure is a physical process. The demon we have described can also erase a stored bit simply by reversing the steps outlined. Depending on the value of the bit register, the demon moves the gas particle to one of two corresponding regions separated by a partition. The same interaction that accomplished the "measurement" process (which takes (U, 0) (U, 0) and (L, 0) (L, 1)) can now be used to reset the bit value to 0 (taking (U, 0) (U, 0) and (L, 1) (L, 0)). In other words, the information stored redundantly in both the gas and the register is "uncopied", so that it remains only in the gas. Finally, the partition is removed and the gas expands to the larger volume. The net effect is to erase the memory register while increasing the entropy of the gas by an amount k B ln 2, in conventional units.
Another link between thermodynamics and information comes from statistical mechanics. As Jaynes has shown [10,11], the concepts and measures of information devised by Shannon for communication theory [12] can be used in the statistical derivation of macroscopic thermodynamic properties. In a macroscopic system, we typically possess only a small amount of information about a few large-scale parameters (total energy, volume, etc.). According to Jaynes, we should therefore choose a probability distribution over microstates that maximizes the Shannon entropy, consistent with our data. That is to say, the rational probability assignment includes no information (in the Shannon sense) except that found in the macroscopic state of the system. This prescription yields the usual distributions used in statistical mechanics, from which thermodynamic properties may be derived.
Axiomatic theories and information analyses each provide important insights into the meaning of thermodynamics. The purpose of this paper is to synthesize these two approaches. We will present an axiomatic basis for thermodynamics that uses information ideas from the very outset. In such a theory, Maxwell's demon, which accomplishes state changes by acquiring information, is neither a paradox nor a sideshow curiosity. Instead, it is a central conceptual tool for understanding the transformations between thermodynamic states. In our view, thermodynamics is essentially a theory of the descriptions of systems possessed by agents that are themselves (like the demon) physical systems. These descriptions may change as the external systems undergo various processes; however, they also may change when the agent acquires, uses or discards information.
Our work is thus similar in spirit to that of Weilenmann et al. [13], though our approach is very different. They essentially take the Lieb-Yngvason axiomatic framework and apply it to various quantum resource theories. In particular, for the resource theory of non-uniformity [14], the Lieb-Yngvason entropy function coincides with the von Neumann entropy of the quantum density operator, which is a measure of quantum information. We, on the other hand, seek to modify axiomatic thermodynamics itself to describe processes involving "information engines" such as Maxwell's demon. We do not rely on any particular microphysics and have both classical and quantum models for our axioms. The connections we find between information and thermodynamic entropy are thus as general as the axioms themselves. Furthermore, to make our concept of information as clear as possible, we will seek to base our development on the most elementary ideas of state and process.
We therefore take as our prototype the axiomatic theory of Giles [1]. In fact, Giles's monograph contains two different axiomatic developments. (The first system is presented in Chapters 1-6 of Giles's book, and the second is introduced beginning in Chapter 7.) The first, which we might designate Giles I, is based on straightforward postulates about the properties of states and processes. The second, Giles II, is more sophisticated and powerful. The axioms are less transparent in meaning (e.g., the assumed existence of "anti-equilibrium" and "internal" thermodynamic states), but they support stronger theorems. This difference can be illustrated by the status of entropy functions in the two developments. In Giles I, it is shown that an entropy function exists; in fact, there may be many such functions. In Giles II, it is shown that an absolute entropy function, one that is zero for every anti-equilibrium state, exists and is unique. Giles himself regarded the second system as "The Formal Theory", which he summarizes in an Appendix of that title.
The information-based system we present here, on the other hand, is closer to the more elementary framework of Giles I. We have taken some care to use notation and concepts that are as analogous as possible to that theory. We derive many of Giles's propositions, including some that he uses as axioms. Despite the similarities, however, even the most elementary ideas (such as the combination of states represented by the + operation) will require some modifications. These changes highlight the new ideas embodied in axiomatic information thermodynamics, and so we will take some care to discuss them as they arise.
Section 2 introduces the fundamental idea of an eidostate, including how two or more eidostates may be combined. In Section 3, we introduce the → relation between eidostates: A → B means that eidostate A can be transformed into eidostate B, with no other net change in the apparatus that accomplishes the transformation. The collection of processes has an algebraic structure, which we present in Section 4. We introduce our concept of information in Section 5 and show that the structure of information processes imposes a unique entropy measure on pure information states. Section 6 introduces axioms inspired by Maxwell's demon and shows their implications for processes involving thermodynamic states. Sections 7 and 8 outline how our information axioms also yield Giles's central results about thermodynamic processes, including the existence of an entropy function, conserved components of content, and mechanical states. In Section 10, we see that an entropy function can be extended in a unique way to a wider class of "uniform" eidostates. This, rather remarkably, gives rise to a unique probability measure within uniform eidostates, as we describe in Section 11. Sections 12 and 13 present two detailed models of our axioms, each one highlighting a different aspect of the theory. We conclude in Section 14 with some remarks on the connection between information and thermodynamic entropy, a connection that emerges necessarily from the structure of processes in our theory.

Eidostates
Giles [1] bases his development on a set S of states. The term "state" is undefined in the formal theory, but heuristically it represents an equilibrium macrostate of some thermodynamic system. (Giles, in fact, identifies a ∈ S with some method of preparation; the concept of a "system" does not appear in his theory, or in ours.) States can be combined using the + operation, so that if a, b ∈ S then a + b ∈ S also. The new state a + b is understood as the state of affairs arising from simultaneous and independent preparations of states a and b. For Giles, this operation is commutative and associative; for example, a + b is exactly the same state as b + a.
We also have a set S . However, our theory differs in two major respects. First, the combination of states is not assumed to be commutative and associative. On the contrary, we regard a + b and b + a as entirely distinct states for any a = b. The motivation for this is that the way that a composite state is assembled encodes information about its preparation, and we want to be able to keep track of this information. On the states in S , the operation + is simply that of forming a Cartesian pair of states: a + b = (a, b), and nothing more.
A second and more far-reaching difference is that we cannot confine our theory to individual elements of S . A theory that keeps track of information in a physical system must admit non-deterministic processes. For instance, if a measurement is made, the single state a before the measurement may result in any one of a collection of possible states {a 0 , a 1 , . . . , a n } corresponding to the various results. Therefore, our basic notion is the eidostate, which is a finite nonempty set of states. The term "eidostate" derives from the Greek word eidos, meaning "to see". The eidostate is a collection of states that may be regarded as possible from the point of view of some agent. The set of eidostates is designated by E . When we combine two eidostates A, B ∈ E into the composite A + B, we mean the set of all combinations a + b of elements of these sets. That is, A + B is exactly the Cartesian product (commonly denoted A × B) for the sets. Thus, our state combination operation + is very different from that envisioned by Giles. His operation is an additional structure on S that requires an axiom to fix its properties, whereas we simply use the Cartesian product provided by standard set theory, which we of course assume.
Some eidostates in E are Cartesian products of other sets (which are also eidostates); some eidostates are not, and are thus "prime". We assume that each eidostate has a "prime factorization" into a finite number of components. Here, is our first axiom: Axiom 1. Eidostates: E is a collection of sets called eidostates such that: (a) Every A ∈ E is a finite nonempty set with a finite prime Cartesian factorization.
(c) Every nonempty subset of an eidostate is also an eidostate.
Part (c) of this axiom ensures, among other things, that every element a of an eidostate can be associated with a singleton eidostate {a}. Without too much confusion, we can simply denote any singleton eidostate {a} by the element it contains, writing a instead. We can therefore regard the set of states S in two ways. Either we may think of S as the collection of singleton eidostates in E , or we may say that S is the collection of all elements of all eidostates: S = A∈E A. Either way, the set E characterized by Axiom 1 is the more fundamental object.
Our "eidostates" are very similar to the "specifications" introduced by del Rio et al. [15] in their general framework for resource theories of knowledge. The two ideas, however, are not quite identical. To begin with, a specification V may be any subset of a state space Ω, whereas the eidostates in E are required to be finite nonempty subsets of S -and indeed, not every such subset need be an eidostate. For example, the union A ∪ B of two eidostates is not necessarily an eidostate. The set E therefore does not form a Boolean lattice. Specifications are a general concept applicable to state spaces of many different kinds, and are used in [15] to analyze different resource theories and to express general notions of approximation and locality. We, however, will be restricting our attention to an E (and S ) with very particular properties, expressed by Axiom 1 and our later axioms, that are designed to model thermodynamics in the presence of information engines such as Maxwell's demon.
Composite eidostates are formed by combining eidostates with the + operation (Cartesian product). The same pieces may be combined in different ways. We say that A, B ∈ E are similar (written A ∼ B) if they are made up of the same components. That is, A ∼ B provided there are eidostates E 1 , . . . , E n such that A = F A (E 1 , . . . , E n ) and B = F B (E 1 , . . . , E n ) for two Cartesian product formulas F A and F B . Thus, and so on. The similarity relation ∼ is an equivalence relation on E . We sometimes wish to combine an eidostate with itself several times. For the integer n ≥ 1, we denote by nA the eidostate A + (A + . . .), where A appears n times in the nested Cartesian product. This is one particular way to combine the n instances of A, though of course all such ways are similar. Thus, we may assert equality in A + nA = (n + 1)A, but only similarity in nA + A ∼ (n + 1)A.
Finally, we note that we have introduced as yet no probabilistic ideas. An eidostate is a simple enumeration of possible states, without any indication that some are more or less likely than others. However, as we will see in Section 11, in certain contexts, a natural probability measure for states does emerge from our axioms.

Processes
In the axiomatic thermodynamics of Giles, the → relation describes state transformations. The relation a → b means that there exists another state z and a definite time interval τ ≥ 0 so that where τ indicates time evolution over the period τ. The pair (z, τ) is the "apparatus" that accomplishes the transformation from a to b. This dynamical evolution is a deterministic process; that is, in the presence of the apparatus (z, τ) the initial state a is guaranteed to evolve to the final state b. This rule of interpretation for → motivates the properties assumed for the relation in the axioms. Our version of the arrow relation → is slightly different, in that it encompasses non-deterministic processes. Again, we envision an apparatus (z, τ) and we write to mean that the initial state a + z may possibly evolve to b + z over the stated interval. Then, for eidostates A and B, the relation A → B means that, if a ∈ A and b ∈ S , then there exists an apparatus (z, τ) such that a + z τ b + z only if b ∈ B. Each possible initial state a in A might evolve to one or more final states, but all of the possible final states are contained in B. For singleton eidostates a and b, the relation a → b represents a deterministic process, as in Giles's theory. Again, we use our heuristic interpretation of → to motivate the essential properties specified in an axiom: Axiom 2. Processes: Let eidostates A, B, C ∈ E , and s ∈ S .
Part (a) of the axiom asserts that it is always possible to "rearrange the pieces" of a composite state. Thus, A + B → B + A and so on. Of course, since ∼ is a symmetric relation, A ∼ B implies both A → B and B → A, which we might write as A ↔ B.
Part (b) says that a process from A to C may proceed via an intermediate eidostate B. Parts (c) and (d) establish the relationship between → and +. We can always append a "bystander" state C to any process A → B, and a bystander singleton state s in A + s → B + s can be viewed as part of the apparatus that accomplishes the transformation A → B.
We use the → relation to characterize various conceivable processes. A formal process is simply a pair of eidostates A, B . Following Giles, we may say that A, B is: irreversible if it is possible but not reversible.
Thus, any formal process must be one of four types: reversible, natural irreversible, antinatural irreversible, or impossible.
Any nonempty subset of an eidostate is an eidostate. If the eidostate A is an enumeration of possible states, the proper subset eidostate B A may be regarded as an enumeration with some additional condition present that eliminates one or more of the possibilities. What can we say about processes involving these "conditional" eidostates? To answer this question, we introduce two further axioms. The first is this: This expresses the idea that no natural process can simply eliminate a state from a list of possibilities. This is a deeper principle than it first appears. Indeed, as we will find, in our theory, it is the essential ingredient in the Second Law of Thermodynamics.
To express the second new axiom, we must introduce the notion of a uniform eidostate. The eidostate A is said to be uniform provided, for every a, b ∈ A, we have a b. That is, every pair of states in A is connected by a possible process. This means that all of the A states are "comparable" in some way involving the → relation.
All singleton eidostates are uniform. Are there any non-uniform eidostates? The axioms in fact do not tell us. We will find models of the axioms that contain non-uniform eidostates and others that contain none. Even if there are states a, b ∈ S for which a b and b a, nothing in our axioms guarantees the existence of an eidostate that contains both a and b as elements. We denote the set of uniform eidostates by U ⊆ E . Now, we may state the second axiom about conditional processes.

Axiom 4. Conditional processes:
(a) Suppose A, A ∈ E and b ∈ S . If A → b and A ⊆ A then A → b. (b) Suppose A and B are uniform eidostates that are each disjoint unions of eidostates: The first part makes sense given our interpretation of the → relation in terms of an apparatus (z, τ). If every a ∈ A satisfies a + z τ b + z for only one state b, then the same will be true for every a ∈ A ⊆ A. Part (b) of the axiom posits that, if we can find an apparatus whose dynamics transforms A 1 states into B 1 states, and another whose dynamics transforms A 2 states into B 2 states, then we can devise a apparatus with "conditional dynamics" that does both tasks, taking A to B. This is a rather strong proposition, and we limit its scope by restricting it to the special class of uniform eidostates. We can as a corollary extend Part (b) of Axiom 4 to more than two subsets. That is, suppose A, B ∈ U and the sets A and B are each partitioned into n mutually disjoint, nonempty subsets: A = A 1 ∪ · · · ∪ A n and B = B 1 ∪ · · · B n . In addition, suppose A k → B k for all k = 1, . . . , n. Then, A → B.

Process Algebra and Irreversibility
Following Giles, we now explore the algebraic structure of formal processes and describe how the type of a possible process may be characterized by a single real-valued function. Although the broad outlines of our development will follow that of Giles [1], there are significant differences. (For example, in our theory, the unrestricted set P of eidostate processes does not form a group.) There is an equivalence relation among formal processes, based on the similarity relation ∼ among eidostates. We say that A, B . = C, D if there exist singletons x, y ∈ S such that A + x ∼ C + y and B + x ∼ D + y. That is, if we augment the eidostates in A, B by x and those in C, D by y, the corresponding eidostates in the two processes are mere rearrangements of each other. It is not hard to establish that this is an equivalence relation. Furthermore, equivalent processes (under . =) are always of the same type. To see this, suppose that if A, B . = C, D and A → B. Then, A + x ∼ C + y, etc., and Hence, C → D. It is a straightforward corollary that A → B if and only if C → D. Processes (or more strictly, equivalence classes of processes under . =) have an algebraic structure. We define the sum of two processes as and the negation of a process as − A, B = B, A . We call the − operation "negation" even though in general − A, B is not the additive inverse of B, A . Such an inverse may not exist. As we will see, however, the negation does yield an additive inverse process in some important special contexts. The sum and negation operations on processes are both compatible with the equivalence . =.
This means that the sum and negation operations are well-defined on equivalence classes of processes. If [ A, B ] and [ C, D ] denote two equivalence classes represented by A, B and C, D , then which is the same regardless of the particular representatives chosen to indicate the two classes being added.
We let P denote the collection of all equivalence classes of formal processes. Then, the sum operation both is associative and commutative on P. Furthermore, it contains a zero element 0 = [ s, s ] for some singleton s. That is, if Γ = [ A, B ], The set P is thus a monoid (a semigroup with identity) under the operation +. Moreover, the subset P S of singleton processes-equivalence classes of formal processes with singleton initial and final states-is actually an Abelian group, since In P S , the negation operation does yield the additive inverse of an element. The → relation on states induces a corresponding relation on processes. If Γ, ∆ ∈ P, we say that Γ → ∆ provided the process Γ − ∆ is natural (a condition we might formally write as Γ − ∆ → 0). Intuitively, this means that process Γ can "drive process ∆ backward", so that Γ and the opposite of ∆ together form a natural process. Now, suppose thatP is a collection of equivalence classes of possible processes that is closed under addition and negation. An irreversibility function I is a real-valued function onP such that The value of I determines the type of the processes inP: An irreversibility function, if it exists, has a number of elementary properties. For instance, since the process Γ + (−Γ) is always reversible for any Γ ∈P, it follows that I(−Γ) = −I(Γ). In fact, we can show that all irreversibility functions onP are essentially the same: Theorem 1. An irreversibility function onP is unique up to an overall positive factor.
Proof. Suppose I 1 and I 2 are irreversibility functions on the setP. If Γ is reversible, then I 1 (Γ) = I 2 (Γ) = 0. If there are no irreversible processes inP, then I 1 = I 2 .
Now, suppose thatP contains at least one irreversible process Γ, which we may suppose is natural irreversible. Thus, both I 1 (Γ) > 0 and I 2 (Γ) > 0. Consider some other process ∆ ∈P. We must show that We proceed by contradiction, imagining that the two ratios are not equal and (without loss of generality) that the second one is larger. Then, there exists a rational number m/n (with n > 0) such that The first inequality yields mI 1 (Γ) − nI 1 (∆) > 0, so that the process mΓ − n∆ (that is, mΓ + n(−∆)) must be natural irreversible. The second inequality yields mI 2 (Γ) − nI 2 (∆) < 0, so that the process mΓ − n∆ must be antinatural irreversible. These cannot both be true, so the original ratios must be equal.
The additive irreversibility function I is only defined on a setP of possible processes. However, we can under some circumstances extend an additive function to a wider domain. It is convenient to state here the general mathematical result we will use later for this purpose: Theorem 2 ("Hahn-Banach theorem" for Abelian groups). Let G be an Abelian group. Let φ be a real-valued function defined and additive on a subgroup G 0 of G. Then, there exists an additive function φ defined on G such that φ (x) = φ(x) for all x in G 0 .
Note that the extension φ is not necessarily unique-that is, there may be many different extensions of a single additive function φ.

Information and Entropy
In our theory, information resides in the distinction among possible states. Thus, the eidostate A = {a 1 , a 2 , a 3 } represents information in the distinction among it elements. However, thermodynamic states such as the a k s may also have other properties such as energy, particle content, and so on.
To disentangle the concept of information from the other properties of these states, we introduce a notion of a "pure" information state.
The intuitive idea is this. We imagine that the world contains freely available memory devices. Different configurations of these memories-different memory records-are distinct states that are degenerate in energy and every other conserved quantity. Any particular memory record can thus be freely created from or reset to some null value.
We therefore define a record state to be an element r ∈ S such that there exists a ∈ S so that a ↔ a + r. The state a can be thought of as part of the apparatus that reversibly exchanges the particular record r with a null value. In fact, if A is any eidostate at all, we find that and so by cancellation of the singleton state a, A ↔ A + r. We denote the set of record states by R.
If r, s ∈ R, then r + s ∈ R, and furthermore r ↔ s. Any particular record state can be reversibly transformed into any other, a fact that expresses the arbitrariness of the "code" used to represent information in a memory device. An information state is an eidostate whose elements are all record states, and the set of such eidostates is denoted I . All information states are uniform eidostates. An information process is one that is equivalent to a process I, J , where I, J ∈ I . (Of course, information processes also include processes of the form I + x, J + x for a non-record state x ∈ S , as well as more complex combinations of record and non-record states.) Roughly speaking, an information process is a kind of computation performed on information states.
It is convenient at this point to define a bit state (denoted I b ) as an information state containing exactly two record states. That is, I b = {r 0 , r 1 }. A bit process is an information process of the form Θ b = r, I b for some r ∈ R-that is, a process by which a bit state is created from a single record state.
We can illustrate a bit process in a thermodynamics context by considering a thought-experiment involving Maxwell's demon. We imagine that the demon operates on a one-particle gas confined to a volume, a situation described by gas state v (see Figure 3). The demon, whose memory initially has a record state r, inserts a partition into the container, dividing the volume in two halves labeled 0 and 1. The gas molecule is certainly in one sub-volume or the other. The demon then records in memory which half the particle occupies. Finally, the partition is removed and the particle wanders freely around the whole volume of the container. In thermodynamic terms, the gas state relaxes to the original state v. The overall process establishes the following relations: and so r → {r 0 , r 1 }. In our example, the bit process Maxwell's demon interacting with a one-particle gas, illustrating a bit process r, {r 0 , r 1 } . However, we do not yet know that there actually are information states and information processes in our theory. We address this by a new axiom.

Axiom 5.
Information: There exist a bit state and a possible bit process.
Axiom 5 has a wealth of consequences. Since an information state exists, record states necessarily also exist. There are infinitely many record states, since for any r ∈ R we also have distinct states r + r, r + (r + r), . . . , nr, . . . , all in R. We may have information states in I that contain arbitrarily many record states, because nI b contains 2 n elements. Furthermore, since every nonempty subset of an information state is also an information state, for any integer k ≥ 1 there exists I ∈ I so that #(I) = k.
Any two bit states can be reversibly transformed into one another. Consider I b = {r 0 , r 1 } and If any bit process is possible, then every bit process is possible. Furthermore, every such process is of the natural irreversible type. To see why, consider the bit state I b = {r 0 , r 1 } and suppose I b → r for a record state r. Since r → r 0 , this implies that {r 0 , r 1 } → r 0 , which is a violation of Axiom 3. Therefore, it must be that r → I b but I b r; and this is true for any choice of r and I b . Now, we can prove that every information process is possible, and that the → relation is determined solely by the relative sizes of the initial and final information states. Proof. To begin with, we can see that #(I) = #(J) implies I → J. This is because we can write I = {r 1 , . . . , r n } and J = {s 1 , . . . , s n }. Since r k → s k for every k = 1, . . . , n, the finite extension of Axiom 4 tells us that I → J. Now, imagine that #(I) > #(J). There exists a proper subset I of I so that #(I ) = #(J), and thus J → I . If it happened that I → J, it would follow that I → I , a contradiction of Axiom 3. Hence, #(I) > #(J) implies I J. It remains to show that if #(I) < #(J), it must be that I → J. As a first case, suppose that #(I) = n and #(J) = n + 1. Letting I = {r 1 , . . . , r n } and J = {s 1 , . . . , s n , s n+1 }, we note that r 1 → s 1 , r 2 → s 2 , . . . , r n → {s n , s n+1 } (the last being a bit process. It follows that I → J. We now proceed inductively. Given #(J) = #(I) + n, we can imagine a sequence of information states K m with successive numbers of elements between #(I) and #(J), so that #(K m ) = #(I) + m. From what we have already proved, and by transitivity of the → relation we may conclude that I → J. Therefore, I → J if and only if #(I) ≤ #(J).
Intuitively, the greater the number of distinct record states in an information state, the more information it represents. Thus, under our axioms, a natural information process may maintain or increase the amount of information, but never decrease it.
We can sharpen this intuition considerably. Let P I be the set of information processes, which is closed under addition and negation, and in which every process is possible. We can define a function I on P I as follows: For Γ = [ I, J ] ∈ P I , The logarithm function guarantees the additivity of I when two processes are combined, and the sign of I exactly determines whether I → J. Thus, I is an irreversibility function on P I , as the notation suggests. This function is unique up to a positive constant factor-i.e., the choice of logarithm base. If we choose to use base-2 logarithms, so that a bit process has I(Θ b ) = 1, then I is uniquely determined.
The irreversibility function on information processes can be expressed in terms of a function on information states. Suppose we have a collection of eidostates K ⊆ E that is closed under the + operation. With Giles, we define a quasi-entropy on K to be a real-valued function S such that, for A, B ∈ K : (A full "entropy" function satisfies one additional requirement, which we will address in Section 8 below.) Given a quasi-entropy S, we can derive an irreversibility function I on possible K -processes by Obviously, S(I) = log #(I) is a quasi-entropy function on I that yields the irreversibility function on P I . We recognize it as the Hartley-Shannon entropy of an information source with #(I) possible outputs [16]. In fact, this is the only possible quasi-entropy on I . Since every information process is possible, two different quasi-entropy functions S and S can only differ by an additive constant. However, since I + r ↔ I for I ∈ I and r ∈ R, we know that S(r) = 0 for any quasi-entropy. Therefore, S(I) = log #(I) is the unique quasi-entropy function for information states. The quasi-entropy of a bit state is S(I b ) = 1.

Demons
Maxwell's demon accomplishes changes in thermodynamic states by acquiring and using information. For example, a demon that operates a trapdoor between two containers of gas can arrange for all of the gas molecules to end up in one of the containers, "compressing" the gas without work. As we have seen, it is also possible to imagine a reversible demon, which acquires and manipulates information in a completely reversible way. If such a demon produces a transformation from state x to state y by acquiring k bits of information in its memory, it can accomplish the reverse transformation (from y to x) while erasing k bits from its memory.
Maxwell's demon is a key concept in axiomatic information thermodynamics, and we introduce a new axiom to describe "demonic" processes. Axiom 6. Demons: Suppose a, b ∈ S and J ∈ I such that a → b + J.
(a) There exists I ∈ I such that b → a + I. (b) For any I ∈ I , either a → b + I or b + I → a.
In Part (a), we assert that what one demon can do (transforming a to b by acquiring information in J), another demon can undo (transforming b to a by acquiring information in I). Part (b) envisions a reversible demon. Any amount of information in I is either large enough that we can turn a to b by acquiring I, or small enough that we can erase the information by turning b to a.
A process A, B is said to be demonically possible if one of two equivalent conditions hold: • There exists an information state J ∈ I such that either There exists an information process I, J ∈ P I such that A, B + I, J is possible; that is, either It is not hard to see that these are equivalent. Suppose we have J ∈ I such that A → B + J. Then, for any I ∈ I , A + I → B + (I + J). Conversely, we note that A → A + I for any I ∈ I . Thus, if A + I → B + J then A → B + J as well.
If a process is possible, then it is also demonically possible, since if A → B it is also true that A + I → B + I. For singleton processes in P S , moreover, the converse is also true. Suppose a, b ∈ S and a, b is demonically possible. Then, there exists I ∈ I such that either a → b + I or b → a + I. Either way, either trivially or by an application of Axiom 6, there must be J ∈ I so that a → b + J. A single record state r ∈ R is a singleton information state in I . Thus, by Axiom 6 it must be that either a → b + r or b + r → a. Since b + r ↔ b, we find that a b, and so a, b is possible. A singleton process is demonically possible if and only if it is possible. This means that we can use processes involving demons to understand processes that do not. In fact, we can use Axiom 6 to prove a highly significant fact about the → relation on singleton eidostates. Proof. First, we note the general fact that, if x, y is a possible singleton process, then there exist I, J ∈ I so that x → y + I and y → x + J.
Given our hypothesis, therefore, there must be I, J ∈ I such that b → a + I and a → c + J. Then b → a + I → c + (I + J).
That is, b, c is demonically possible, and hence possible.
This fact is so fundamental that Giles made it an axiom in his theory. For us, it is a straightforward consequence of the axiom about processes involving demons. It tells us that the set of singleton states S is partitioned into equivalence classes, within each of which all states are related by .
This statement is more primitive than, but closely related to, a well-known principle called the Comparison Hypothesis. The Comparison Hypothesis deals with a state relation called adiabatic accessibility (denoted ≺) which is definable in Giles's theory (and ours) but is taken as an undefined relation in some other axiomatic developments. According to the Comparison Hypothesis, if X and Y are states in a given thermodynamic space, either X ≺ Y or Y ≺ X. Lieb and Yngvason, for instance, show that the Comparison Hypothesis can emerge as a consequence of certain axioms for thermodynamic states, spaces, and equilibrium [2,3].
Theorem 4 also sheds light on our axiom about conditional processes, Axiom 4. In Part (a) of this axiom, we suppose that A → b for some A ∈ E and b ∈ S . The axiom itself allows us to infer that a → b for every a ∈ A. However, Theorem 4 now implies that, for every a, a ∈ A, either a → a or a → a. In other words, Part (a) of Axiom 4, similar to Part (b) of the same axiom, only applies to uniform eidostates.
Finally, we introduce one further "demonic" axiom.
Axiom 7. Stability: Suppose A, B ∈ E and J ∈ I . If nA → nB + J for arbitrarily large values of n, then A → B.
According to the Stability Axiom, if a demon can transform arbitrarily many copies of eidostate A into arbitrarily many copies of B while acquiring a bounded amount of information, then we may say that A → B. This can be viewed as a kind of "asymptotic regularization" of the → relation. The form of the Stability Axiom that we have chosen is a particularly simple one, and it suffices for our purposes in this paper. However, more sophisticated axiomatic developments might require a refinement of the axiom. Compare, for instance, Axiom 2.1.3 in Giles to its refinement in Axiom 7.2.1 [1].
To illustrate the use of the Stability Axiom, suppose that A, B ∈ E and I ∈ I such that A + I → B + I. Our axioms do not provide a "cancellation law" for information states, so we cannot immediately conclude that A → B. However, we can show that nA → nB + I for all positive integers n. The case n = 1 holds since A → A + I → B + I. Now, we proceed inductively, assuming that nA → nB + I for some n. Then, Thus, nA → nB + I for arbitrarily large (and indeed all) values of n. By the Stability Axiom, we see that A → B. Thus, there is after all a general cancellation law information states that appear on both sides of the → relation.

Irreversibility for Singleton Processes
From our two "demonic" axioms (Axioms 6 and 7), we can use the properties of information states to derive an irreversibility function on singleton processes. LetP S denote the set of possible singleton processes. This is a subgroup of the Abelian group P S . Thus, if we can find an irreversibility function I onP S , we will be able to extend it to all of P S .
We begin by proving a useful fact about possible singleton processes: Theorem 5. Suppose a, b ∈ S so that α = a, b is possible. Then, for any integers n, m ≥ 0, either Proof. If m = n, the result is easy. Suppose that n > m, so that n = m + k for positive integer k. By Axiom 6, either a → b + kI b or b + kI b → a. We can then append the information state mI b to both sides and rearrange the components. The argument for m > n is exactly similar.
This fact has a corollary that we may state using the → relation on processes. Suppose α = a, b is a possible singleton process, and let q, p be integers with q > 0. Then, either qα → pΘ b or qα ← pΘ b (so that qα − pΘ b is either natural or antinatural) for the bit process Θ b = r, I b .
Given α, therefore, we can define two sets of rational numbers: where q > 0. Both sets are nonempty and every rational number is in at least one of these sets. Furthermore, if p/q ∈ U α and p /q ∈ L α , we have that qα ← pΘ b and q α → p Θ b , and so Hence, (pq − p q)Θ b → 0. Since Θ b is itself a natural irreversible process, it follows that pq − p q ≥ 0, and so p q ≥ p q .
Every element of U α is an upper bound for L α . It follows that U α and L α form a Dedekind cut of the rationals, which leads us to the following important result. Theorem 6. For α ∈P S , define I(α) = inf U α = sup L α . Then, I is an irreversibility function onP S .
Notice that we have arrived at an irreversibility function for possible singleton processes-those most analogous to the ordinary processes in Giles or any text on classical thermodynamics-from axioms about information and processes involving demons (Axioms 5-7). In our view, such ideas are not "extras" to be appended onto a thermodynamic theory, but are instead central concepts throughout. In ordinary thermodynamics, the possibility of a reversible heat engine can have implications for processes that do not involve any heat engines at all. In the information thermodynamics whose axiomatic foundations we are exploring, the possibility of a Maxwell's demon has implications even for situations in which no demon acts.
We now have irreversibility functions for both information processes and singleton processes. These are closely related. In fact, it is possible to prove the following general result: Theorem 7. If α ∈P S and Γ ∈ P I , then the combined process α + Γ is natural if and only if I(α) + I(Γ) ≥ 0.
Since the setP S of possible singleton processes is a subgroup of the Abelian group P S of all singleton processes, we can extend the additive irreversiblity function I to all of P S . Though I is unique on the possible setP S , its extension to P S is generally not unique.

Components of Content and Entropy
Our axiomatic theory of information thermodynamics is fundamentally about the set of eidostates E . However, the part of that theory dealing with the set S of singleton eidostates includes many of the concepts and results of ordinary axiomatic thermodynamics [1]. We have a group of singleton processes P S containing a subgroupP S of possible processes, and we have constructed an irreversibility function I onP S that may be extended to all of P S . From these we can establish several facts. Because the extension of the irreversibility function I fromP S to all of P S is not unique, the quasi-entropy S is not unique either. How could various quasi-entropies differ? Suppose I 1 and I 2 are two different extensions of the same original I, leading to two quasi-entropy functions S 1 and S 2 on S . Then, the difference Q = S 1 − S 2 is a component of content. That is, if a, b is possible, since I 1 and I 2 agree onP S , which contains a, b . Another idea that we can inherit without alteration is the concept of a mechanical state. A mechanical state is a singleton state that reversibly stores one or more components of content, in much the same way that we can store energy reversibly as the work done to lift or lower a weight. The mechanical state in this example is the height of the weight. In Giles's theory [1], mechanical states are the subject of an axiom, which we also adopt: Nothing in this axiom asserts the actual existence of any mechanical state. It might be that M = ∅. Furthermore, the choice of the designated set M is not determined solely by the → relations among the states. For instance, the set R of record states might be included in M , or not. This explains why the introduction of mechanical states must be phrased as an axiom, rather than a definition: a complete specification of the system must include the choice of which set is to be designated as M . Whatever choice is made for M , the set P M of mechanical processes (i.e., those equivalent to l, m for l, m ∈ M ) will form a subgroup of P S .
A mechanical state may "reversibly store" a component of content Q, but it need not be true that every Q can be stored like this. We say that a component of content Q is non-mechanical if Q(m) = 0 for all m ∈ M . For example, we might store energy by lifting or lowering a weight, but we cannot store particle number in this way.
Once we have mechanical states and processes, we can give a new classification of processes. A process Γ ∈ P is said to be adiabatically natural (possible, reversible, antinatural) if there exists a mechanical process µ ∈ P M such that Γ + µ is natural (possible, reversible, antinatural). We can also define the "adiabatic accessibility" relation for states in S , as mentioned in Section 6: a ≺ b whenever the process a, b is adiabatically natural.
The set M of mechanical states allows us to refine the idea of a quasi-entropy into an entropy, which is a quasi-entropy S that takes the value S(m) = 0 for any mechanical state m. Such a function is guaranteed to exist. We end up with a characterization theorem, identical to a result of Giles [1], that summarizes the general thermodynamics of singleton eidostates in our axiomatic theory. An entropy function is not unique. Two entropy functions may differ by a non-mechanical component of content.
The entropy function on S is related to the information entropy function we found for information states in I . Suppose we have a, b ∈ S and I, J ∈ I . Then, Theorem 7 tells us that a + I → b + J only if S(a) + log #(I) ≤ S(b) + log #(J).
Let E SI represent the set of eidostates that are similar to a singleton state combined with an information state. Then, S(a + I) = S(a) + log #(I) is an entropy function on E SI . In the next section, we will extend the domain of the entropy function even further, to the set U of all uniform eidostates.

State Equivalence
Consider a thought-experiment (illustrated in Figure 4) in which a one-particle gas starts out in a volume v 0 and a second thermodynamic system starts out in one of three states e 1 , e 2 or e 3 . We assume that all conserved quantities are the same for these three states, but they may differ in entropy. We can formally describe the overall situation by the eidostate E + v 0 , where E = {e 1 , e 2 , e 3 } is uniform.  Figure 4. A state equivalence thought-experiment involving states of an arbitrary thermodynamic system and a one-particle gas.
We can reversibly transform each of the e k states to the same state e, compensating for the various changes in entropy by expanding or contracting the volume occupied by the gas. That is, we can have v 0 + e k ↔ v k + e for adjacent but non-overlapping volumes v k . Axiom 4 indicates that we can write Now, we note that V can itself be reversibly transformed into a singleton eidostate v. A gas molecule in one of the sub-volumes (eidostate V) can be turned into a molecule in the whole volume (eidostate e) by removing internal partitions between the sub-volumes; and when we re-insert these partitions the particle is once again in just one of sub-volumes. Thus, V ↔ v. To summarize, we have The uniform eidostate E, taken together with the gas state v 0 , can be reversibly transformed into the singleton state v + e. We call this a state equivalence for the uniform eidostate E. By choosing the state e properly (say, by letting e = e k for some k), we can also guarantee that the volume v is larger than v 0 , so that v 0 → v by free expansion.
This discussion motivates our final axiom, which states that this type of reversible transformation is always possible for a uniform eidostate. Axiom 9. State equivalence: If E is a uniform eidostate then there exist states e, x, y ∈ S such that x → y and E + x ↔ e + y.
Axiom 9 closes a number of gaps in our theory. For example, our previous axioms (Axioms 1-8) do not by themselves guarantee than any state in S has a nonzero entropy. With the new axiom, however, we can prove that such states exist. The bit state I b , which is uniform, has some state equivalence given by I b + x ↔ e + y. Since the eidostates on each side of this relation are in E SI , we can determine the entropies on each side. We find that 1 + S(x) = S(e) + S(y). (25) It follows that at least one of the states e, x, y must have S = 0. State equivalence also allows us to define the entropy of any uniform eidostate E ∈ U . If E + x ↔ e + y then we let S(E) = S(e) + S(y) − S(x).
We must first establish that this expression is well-defined. If we have two state equivalences for the same E, so that E + x ↔ e + y and E + x ↔ e + y , then Thus, our definition for S(E) does not depend on our choice of state equivalence for E. Is S an entropy function on U ? It is straightforward to show that S is additive on U and that S(m) = 0 for any mechanical state m. It remains to show that, for any E, F ∈ U with E, F possible, E → F if and only if S(E) ≤ S(F). We will use the state equivalences E + x ↔ e + y and F + w ↔ f + z.
Suppose first that E → F. Then, E + (x + w) → F + (x + w) and so From this, it follows that and hence S(E) ≤ S(F). We can actually extract one more fact from this argument. If we assume that E, F is possible, it must also be true that (e + y) + w, ( f + z) + x is a possible singleton process. If we now suppose that S(E) ≤ S(F), we know that S((e + y) + w) ≤ S(( f + z) + x) and thus Therefore, E → F, as desired.
We have extended the entropy S to uniform eidostates. It is even easier to extend any component of content function Q to these states. If E ∈ U , then any e 1 , e 2 ∈ E must have Q(e 1 ) = Q(e 2 ), since e 1 e 2 . Thus, we can define Q(E) = Q(e k ) for any e k ∈ E. This is additive because the elements of E + F are combinations e k + f j of states in E and F. Furthermore, suppose we have a state equivalence E + x ↔ e + y. Since we assume x → y in a state equivalence, Q(x) = Q(y). By the conditional process axiom (Axiom 4) we know that e k + x → e + y for any e k ∈ E. It follows that Q(e) = Q(e k ) = Q(E). Now, let E 1 and E 2 be uniform eidostates with state equivalences E k + x k ↔ e k + y k . Suppose further that Q(E 1 ) = Q(E 2 ) for every component of content Q. We know that Q(x 1 ) = Q(y 1 ), Q(x 2 ) = Q(y 2 ) and Q(e 1 ) = Q(e 2 ) for every component of content. Thus It follows that E 1 E 2 , i.e., that E 1 , E 2 is a possible eidostate process. We have therefore extended Theorem 8 to all uniform eidostates. We state the new result here. The set U of uniform eidostates includes the singleton states in S , the information states in I , all combinations of these, and perhaps many other states as well. (Non-uniform eidostates in E might exist, as we will see in the model we discuss in Section 12, but their existence cannot be proved from our axioms.) The type of every process involving uniform eidostates can be determined by a single entropy function (which must not decrease) and a set of components of content (which must be conserved).

Entropy for Uniform Eidostates
We have extended the entropy function from singleton states and information states to all uniform eidostates. It turns out that this extension is unique. The following theorem and its corollaries actually allow us to compute the entropy of any E ∈ U from the entropies of the states contained in E. Theorem 10. Suppose E ∈ U is a disjoint union of uniform eidostates E 1 and E 2 . Then Proof. E and E k (k = 1, 2) all have equal components of content. Define Note that, if we replace E k by E k = E k + I for I ∈ I , then these new eidostates are still disjoint and These eidostates have the same components of content as the original E. Furthermore, We can find a uniform eidostate E 0 with the same components of content such that S 0 = S(E 0 ) is less than or equal to S(E 1 ), S(E 2 ) and S(E). (It suffices to pick E 0 to be the state of smallest entropy among E 1 , E 2 and E.) Then, there exist integers m k ≥ 1 such that

•
There are disjoint information states J k containing m k record states.
That is, we choose m k so that S(E k ) − S 0 ≥ 0 is between log(m k ) and log(m k + 1). To put it more simply, m k = 2 S(E k )−S 0 . Once we have m k , we can write that Adding these inequalities for k = 1, 2 and taking the logarithm yields How far apart are the two ends of this chain of inequalities? Here, is a useful fact about base-2 logarithms: If n ≥ 1, then log(n + 2) < log(n) + 2/n. This implies The two ends of the inequality differ by less than 2/(m 1 + m 2 ). We can get another chain of inequalities by applying Axiom 4 about conditional processes. Since all of our uniform eidostates have the same components of content, we know that From the axiom, we can therefore say which implies that We have two quantities that lie in the same interval. Their separation is therefore bounded by the interval width-i.e., less than 2/(m 1 + m 2 ). Therefore, How big are the numbers m k ? We can make such numbers as large as we like by considering instead the eidostates E k = E k + I, where I is an information state. As we have seen, ∆(E 1 , E 2 ) = ∆(E 1 , E 2 ). Given any > 0, we can choose I so that Then, 2/(m 1 + m 2 ) < , and so Since this is true for any > 0, we must have ∆(E 1 , E 2 ) = 0, and so as desired.
Theorem 10 has a corollary, which we obtain by applying the theorem inductively: Theorem 11. If E is a uniform eidostate, The entropy of any uniform eidostate is a straightforward function of the entropies of the states contained therein. If E contains more than one state, we notice that S(E) > S(e k ) for any e k ∈ E. It follows that e k , E is a natural irreversible process.
To take a simple example of Theorem 11, consider the entropy of an information state. Every record state r has S(r) = 0. Thus, for I ∈ I , as we have already seen.

Probability
An eidostate in E represents a state of knowledge of a thermodynamic agent. It is, as we have said, a simple list of possible states, without any assignment of probabilities to them. If the agent is to use probabilistic reasoning, then it needs to assign conditional probabilities of the form P(A|B) where A, B ∈ E .
The entropy formula in Theorem 11 allows us to make such an assignment based on the entropy itself, provided the eidostate conditioned upon is uniform. Suppose E ∈ U and let a ∈ S . Then, the entropic probability of a conditioned on E is Clearly, 0 ≤ P(a|E) ≤ 1 and ∑ a P(a|E) = 1. It is worth noting that, although E is "uniform" (in the sense that all a ∈ E have exactly the same conserved components of content), the probability distribution P(a|E) is not uniform, but assigns a higher probability to states of higher entropy. We can generalize entropic probabilities a bit further. Let A be any subset of S (be it an eidostate or not) and E ∈ U . Then, A ∩ E is either a uniform eidostate or the empty set ∅. If we formally assign S(∅) = −∞, then both E and A ∩ E have well-defined entropies. Then, we define Obviously, P(E|E) = 1. Now, consider two disjoint sets A and B along with E ∈ U . The set (A ∪ B) ∩ E is a disjoint union of uniform eidostates (or empty sets) A ∩ E and B ∩ E. Thus, in accordance with the rules of probability. We also have the usual rule for conditional probabilities. Suppose A, B ⊆ S and E ∈ U such that A ∩ E = ∅. Then The entropy formula in Theorem 11 and the probability assignment in Equation (50) call to mind familiar ideas from statistical mechanics. According to Boltzmann's formula, the entropy is S = log Ω, where Ω is (depending on the context) the number (or phase space volume or Hilbert space dimension) of the microstates consistent with macroscopic data about a system. In the microcanonical ensemble, a uniform probability distribution is assigned to these microstates. In an eidostate E comprising non-overlapping macrostates states {e 1 , e 2 , . . .}, we would therefore expect Ω(E) = Ω(e 1 ) + Ω(e 2 ) + . . ., and the probability of state e k should be proportional to Ω(e k ). However, in our axiomatic system Theorem 11 and Equation (50) do not arise from any assumptions about microstates, but solely from the "phenomenological" → relation among eidostates in E .
Even though the entropy function S is not unique, the entropic probability assignment is unique. Suppose S 1 and S 2 are two entropy functions for the same states and processes. Then, as we have seen, the difference S 1 − S 2 is a component of content. All of the states within a uniform eidostate E (as well as E itself) have the same values for all components of content. That is, S 1 (a) − S 2 (a) = S 1 (E) − S 2 (E) for any a ∈ E. Thus, P 1 (a|E) = 2 S 1 (a)−S 1 (E) = 2 S 2 (a)−S 2 (E) = P 2 (a|E).
(Both probabilities are zero for a / ∈ E, of course.) The entropic probability assignment is not the only possible probability assignment, but it does have a number of remarkable properties. For instance, suppose E and F are two uniform eidostates. If we prepare them independently, we have the combined eidostate E + F and the probability of some particular state x + y is as we would expect for independent events. The entropic probability also yields an elegant expression for the entropy S(E) of a uniform eidostate E. Theorem 12. Suppose E is a uniform eidostate, and P(a|E) is the entropic probability for state a ∈ E. Then where S(a) is the average state entropy in E and H( P) is the Shannon entropy of the P(a|E) distribution.
Proof. We first note that, for any a ∈ E, log P(a|E) = S(a) − S(E). We rewrite this as S(E) = S(a) − log P(a|E) and take the mean value with respect to the P(a|E) probabilities: Therefore, S(E) = S(a) + H( P), as desired.
We previously said that the list of possible states in an eidostate represents a kind of information. Theorem 12 puts this intuition on a quantitative footing. The entropy of a uniform eidostate E can be decomposed into two parts: the average entropy of the states, and an additional term representing the information contained in the distinction among the possible states. For a singleton state, the entropy is all of the first sort. For a pure information state I ∈ I , it is all of the second.
In fact, the decomposition itself uniquely picks out the entropic probability assignment. Suppose P(a|E) is the entropic probability of a given E, and P (a|E) is some other probability distribution over states in E. By Gibbs's inequality [17], with equality if and only if P (a|E) = P(a|E) for all a ∈ E. We find that and so S(E) ≥ S(a) with equality if and only if the P distribution is the entropic one. An agent that employs entropic probabilities will regard the entropy of a uniform eidostate E as the sum of two parts, one the average entropy of the possible states and the other the Shannon entropy of the distribution. For an agent that employs some other probability distribution, things will not be so simple. Besides the average state entropy and the Shannon entropy of the distribution, S(E) will include an extra, otherwise unexplained term. Thus, for a given collection of eidostates connected by the → relation, the entropic probability provides a uniquely simple account of the entropy of any uniform eidostate.
It is in this sense we say that the entropic probability "emerges" from the entropy function S, just as that function itself emerges from the → relation among eidostates.
Some further remarks about probability are in order. Every formal basis for probability emphasizes a distinct idea about it. In Kolmogorov's axioms [18], probability is simply a measure on a sample space. High-measure subsets are more probable. In the Bayesian approach of Cox [19], probability is a rational measure of confidence in a proposition. Propositions in which a rational agent is more confident are also more probable. Laplace's early discussion [20] is based on symmetry. Symmetrically equivalent events-two different orderings of a shuffled deck of cards, for instance-are equally probable. (Zurek [21] has used a similar principle of "envariance" to discuss the origin of quantum probabilities.) In algorithmic information theory [16], the algorithmic probability of a bit string is related to its complexity. Simpler bit strings are more probable.
In a similar way, entropic probabilities express facts about state transformations. In a uniform eidostate E, any two states a, b ∈ E are related by a possible process. If a → b, then the output state is at least as probable as the input state: P(a|E) ≤ P(b|E).

A Model for the Axioms
A model for an axiomatic system may serve several purposes. The existence of a model establishes that the axioms are self-consistent. A model may also demonstrate that the system can describe an actual realistic physical situation. If the axioms have a variety of interesting models, then the axiomatic theory is widely applicable. We may almost say that the entire significance of an axiomatic system lies in the range of models for that system.
Terms that are undefined in an abstract axiomatic system are defined within a model as particular mathematical structures. The axioms of the system are provable properties of those structures. Therefore, a model for axiomatic information thermodynamics must include several elements: • A set S of states and a collection E of finite nonempty subsets of S to be designated as eidostates.

•
A rule for interpreting the combination of states (+) in S .
A designated set M ⊆ S of mechanical states (which might be empty).

•
Proofs of Axioms 1-9 within the model, including the general properties of →, the existence of record states and information states, etc.
The model will therefore involve specific meanings for S , E , +, → and so forth. It will also yield interpretations of derived concepts and results, such as entropy functions and conserved components of content. In the abstract theory, the combination a + b is simply the Cartesian pair (a, b). This definition may suffice for the model, or the model may have a different interpretation of +. In any case, it must be true in the model that a + b = a + b implies a = a and b = b .
Our first model for axiomatic thermodynamics is based on a set A of "atomic" states, from which all states in S and all eidostates in E are constructed. We assign entropies and components of content to these states, which extend to composite states by additivity. Let us consider a simple but non-trivial example that has just one component of content Q. A suitable set A of atomic states is shown in Figure 5. It includes a special state r (with S(r) = 0 and Q(r) = 0) and a continuous set of states s λ with Q(s λ ) = 1 and S(s λ ) = λ. The parameter λ ranges over the closed interval [0, 1]. S Q r s 0 s 1 s Figure 5. Atomic states in our simple "macrostate" model.
The set of states S includes everything that can be constructed from A by finite application of the pairing operation. In this way, we can build up a ∈ S with any non-negative integer value of the component of content Q(a) and any entropy value 0 ≤ S(a) ≤ Q(a). Indeed, there will typically be many different ways to create given (Q, S) values. To obtain Q(a) = 2 and S(a) = 1, for example, we might have a = s 1 + s 0 , s 0 + s 1 , (s 1/2 + s 1/2 ) + r, . . .. Anticipating somewhat, we call an eidostate uniform if all of its elements have the same Q-value. We calculate the entropy of a uniform eidostate by applying Theorem 11 to it.
Our model allows any finite nonempty set of states to play the role of an eidostate. Hence, we have both uniform and non-uniform eidostates in E . Each eidostate A has a finite Cartesian factorization If A is uniform, then all of its factors are also uniform. If none of its factors are uniform, we say that A is completely non-uniform. More generally, we can write down an NU-decomposition for any eidostate A: where N A is a completely non-uniform eidostate and U A is a uniform eidostate. Of course, if A itself is either completely non-uniform or uniform, one or the other of these eidostates may be absent from the decomposition. The NU-decomposition is unique up to similarity: If N A + U A ∼ N A + U A for completely non-uniform Ns and uniform Us, then N A ∼ N A and U A ∼ U A .
We can now define the → relation on E in our model. If A, B ∈ E , we first write down NU-decompositions A ∼ N A + U A and B ∼ N B + U B . We say that A → B provided three conditions hold: We may call these the N-criterion, Q-criterion, and S-criterion, and summarize their meaning as follows: A → B provided we can transform A to B by: (a) rearranging the non-uniform factors; and (b) transforming the uniform factors in a way that conserves Q and does not decrease S. Now, let us examine each of the axioms in turn.

Axiom 1
The basic properties of eidostates follow by construction. Axiom 2 Part (a) holds because A ∼ B implies that A and B can have the same NU-decomposition. Part (b) holds because similarity, equality (for Q) and inequality (for S) are all transitive. Parts (c) and (d) make use of the general facts that N A+B ∼ N A + N B and U A+B ∼ U A + U B . Axiom 3 If N A ∼ N B , then A B. If N A ∼ N B , then it must be true that U B U A , and so S(U A ) > S(U B ). The S-criterion fails, so A B in this case as well. Axiom 4 For Part (a), we note that A must be uniform, and so A ⊆ A is also uniform. The statement follows from the S-criterion. Part (b) also follows from the S-criterion. Axiom 5 The atomic state r with Q(r) = 0 and S(r) = 0 is a record state, as is r + r, etc. We can take our bit state to be I b = {r, r + r}. Since every information state I ∈ I is uniform with Q(I) = 0, every information process (including Θ b = r, I b ) is possible. Axiom 6 Since all of the states of the form a + I are uniform, the statements in this axiom follow from the S-criterion. Axiom 7 Suppose nA → nB + J. Since J is uniform, it must be that nN A ∼ nN B , from which it follows that N A ∼ N B . The S-criterion for U A and U B follows from a typical stability argument-that is, if nx ≤ ny + z for arbitrarily large values of n, then it must be true x ≤ y. Axiom 8 The set M of mechanical states may be defined to include all states that can be constructed from the zero-entropy atomic state s 0 (s 0 + s 0 , s 0 + (s 0 + s 0 ), etc.). The required properties of M follow. Axiom 9 The uniform eidostate E has Q(E) = q ≥ 0 and S(E) = σ ≥ 0. Choose an integer n > σ. Now, let e = qs 0 (or e = r if q = 0), x = ns 0 , and y = ns λ where λ = σ/n. We find that Q(e) = q, Q(x) = Q(y) = n, S(e) = S(x) = 0 and S(y) = n(σ/n) = σ. It follows that x → y and E + x ↔ e + y.
This model based on a simple set of atomic states has several sophisticated characteristics, including a non-trivial component of content Q and possible processes involving non-uniform eidostates. It is not difficult to create models of this type that are even richer and more complex. However, it may be objected that this type of model obscures one of the key features of axiomatic information thermodynamics. Here, the entropy function S does not emerge from the → relation among eidostates, but instead is imposed by hand to define → within the model. We address this deficiency in our next model.

A Simple Quantum Model
Now, we present a model for the axioms in which the entropy function does emerge from the underlying structure. The model is a simple one without mechanical states or non-trivial components of content. Every eidostate is uniform and every process is possible. On the other hand, the model is based on quantum mechanics, and so is not devoid of features of interest.
Consider an agent A that can act upon an external qubit system Q having Hilbert space Q. Based on the information the agent possesses, it may assign the states |ψ 1 or |ψ 2 to the qubit. These state vectors need not be orthogonal. That is, it may be that no measurement of Q can perfectly distinguish which of the two states is actually present. The states, however, correspond to states of knowledge of agent A, and the agent is able to perform different operations on Q depending on whether it judges the qubit to be in one state or the other. Our notion of information possessed by the agent is thus similar to Zurek's concept of actionable information [22]. Roughly speaking, information is actionable if it can be used as a control variable for conditional unitary dynamics. This means that the two states of the agent's memory (|µ 1 and |µ 2 in a Hilbert space A) must be distinguishable. Hence, if we include the agent in our description of the entire system, the states |µ 1 ⊗ |ψ 1 and |µ 2 ⊗ |ψ 2 are orthogonal, even if |ψ 1 and |ψ 2 are not.
Our model for axiomatic information thermodynamics envisions a world consisting of an agent A and an unbounded number of external qubits. Nothing essential in our model would be altered if the external systems had dim Q = d -"qudits" instead of qubits. The thermodynamic states of the qubit systems are actually states of knowledge of the agent, and so we must include the corresponding state of the agent's memory in our physical description. The quantum state space for our model is of the form: To be a bit more rigorous, we restrict H to vectors of the form |Ψ ⊗ |0 ⊗ |0 ⊗ · · · , where |Ψ ∈ A ⊗ Q ⊗n for some finite n, and |0 is a designated "zero" ket in Q. Physical states in H have Ψ|Ψ = 1. (The space H is not quite a Hilbert space, since it is not topologically complete, but this mathematical nicety will not affect our discussion.) Since the Qs are qubits, dim Q = 2. The agent space A, however, must be infinite-dimensional, so that it contains a countably infinite set of orthogonal quantum states. These are to be identified as distinct records of the agent's memory. In our thermodynamic model, the elements of S (the thermodynamic "states") are projection operators on H. For any a ∈ S , we have a projection on H of the form where |a is an agent state in A and π a is a non-null projection in Q ⊗n for some finite n ≥ 1. The value of n is determined by a specified integer function L(a), which we call the length of the state a. Heuristically, the thermodynamic state a means that the state of the world lies in the subspace S a onto which Π a projects. The agent's memory is in the state |a and the the quantum state of the first L(a) external qubits lies somewhere in the subspace onto which π a projects. (All of the subsequent qubits are in the state |0 .) Two distinct thermodynamic states correspond to orthogonal states of the agent's memory. If a, b ∈ S with a = b, then a|b = 0. The projections Π a and Π b are orthogonal to each other (so that Π a Π b = 0). However, it need not be the case that π a and π b are orthogonal.
Given a, the projection Π a projects onto the subspace S a The dimension of this subspace is d a = dim S a = Tr Π a = Tr π a . Note that d a ≤ 2 L(a) . We will assume that there are S a subspaces of every finite dimension: For any integer n ≥ 1, there exists a ∈ S with d a = n.
Suppose a, b ∈ S correspond to Π a = |a a| ⊗ π a ⊗ · · · and Π b = |b b| ⊗ π b ⊗ · · · . Then, we will specify that the combined state a + b corresponds to Since the state a + b entails a distinct state of the agent's knowledge, the agent state vector |a + b is orthogonal to both |a and |b . We also note that L(a + b) = L(a) + L(b).
Here is a clarifying example. Suppose a, b, c ∈ S . Then, are distinct thermodynamic states and hence orthogonal projections in H, even though they correspond to exactly the same qubit states. The difference between (a + b) + c and a + (b + c) entirely lies in the distinct representations of the states in the agent's memory.
The eidostates in our model are the finite nonempty collections of states in S . We can associate each eidostate with a projection operator as well. Let E = {a, . . .} be an eidostate. We define This is a projection operator because the Π a projections are orthogonal to one another. Π E projects onto a subspace S E , which is the linear span of the collection of subspaces {S a , . . .}. Interestingly, this subspace S E might contain quantum states in which the agent A is entangled with one or more external qubits. Suppose a, b ∈ S are associated with single-qubit projections onto distinct states |ψ a and |ψ b . The eidostate E = {a, b} is associated with the projection which projects onto a subspace S E that contains the quantum state In this state, the agent does not have a definite memory record state. However, if a measurement is performed on the agent (perhaps by asking it a question), then the resulting memory record a or b would certainly be found to be consistent with the state of the qubit system, |ψ a or |ψ b .
Suppose we combine two eidostates A = {a, . . .} and B = {b, . . .}. Then, the quantum state lies in a subspace of dimension When eidostates combine, subspace dimension is multiplicative. It remains to define the → relation in our quantum model. We say that A → B if there exists a unitary time evolution operator U on H such that |Ψ ∈ S A implies that U |Ψ ∈ S B . That is, every quantum state consistent with A evolves to one consistent with B under the time evolution U. (Note that the evolution includes a suitable updating of the agent's own memory state.) This requirement is easily expressed as a subspace dimension criterion: We are now ready to verify our axioms.
Axiom 1 This follows from our construction of the eidostates E . Axiom 2 All of these basic properties of the → relation follow from the subspace dimension criterion. Axiom 3 If B A, then d A > d B , and so A B. Axiom 4 Again, both parts of this axiom follow from the subspace dimension criterion. If eidostate A is a disjoint union of eidostates A 1 and A 2 , then d A = d A 1 + d A 2 . Axiom 5 Any state r with d r = 1 functions as a record state. We have assumed that such a state exists.
We can take I b = {r, r + r}. The bit process Θ b = r, I b is natural (r → I b ) by the subspace dimension criterion. Notice that, for any information state I, d I = #(I). Axiom 6 For any b ∈ S and I ∈ I , we have d b+I = d b · #(I). For Part (a), we can always find a large enough information state so that d b ≤ d a · #(I). For Part (b), either d a ≤ d b+I or d b+I ≤ d a . Axiom 7 If (d A ) n ≤ (d B ) n · #(J) for arbitrarily large values of n, then d A ≤ d B . Axiom 8 It is consistent to take M = ∅. Axiom 9 All of our eidostates are uniform. For any eidostate E, we can choose e so that d e = d E .
(Recall that we have assumed states with every positive subspace dimension.) If we chose x = y to be any state, then E + x ↔ e + y.
Our quantum mechanical model is therefore a model of the axioms of information thermodynamics. It is a relatively simple model, of course, having no non-trivial conserved components of content and no mechanical states.
In the quantum model, the entropy of any eidostate is simply the logarithm of the dimension of the corresponding subspace: S(E) = log d E . This is the von Neumann entropy of a uniform density operator ρ E = 1 d E Π E . We can, in fact, recast our entire discussion in terms of these mixed states, and this approach does yield some insights. For example, we find that the density operator ρ E for eidostate E is a mixture of the density operators for its constituent states: where P(a|E) is the entropic probability There are, of course, many quantum states of the agent and its qubit world that do not lie within any eidostate subspace S E . For example, consider a state associated with a pure state projection Π a = |a a| ⊗ |ψ a ψ a | ⊗ · · · . Let |χ be a state orthogonal to |ψ a . Then, |a ⊗ |χ ⊗ · · · is a perfectly legitimate quantum state that is orthogonal to S a and every other eidostate subspace. This state represents a situation in which the agent's memory record indicates that the first L(a) external qubits are in state |ψ a , but the agent is wrong.
The exclusion of such physically possible but incongruous quantum states tells us something significant about our theory of axiomatic information thermodynamics. The set E does not necessarily include all possible physical situations; the arrow relations → between eidostates do not necessarily represent all possible time evolutions. Our axiomatic system is simply a theory of what transformations are possible among a collection of allowable states. In this, it is similar to ordinary classical thermodynamics, which is designed to consider processes that begin and end with states in internal thermodynamic equilibrium.

Remarks
The emergence of the entropy S, a state function that determines the irreversibility of processes, is a key benchmark for any axiomatic system of thermodynamics. Our axiomatic system does not yield a unique entropy on S , since it is based on the extension of an irreversibility function to impossible processes. However, many of our results and formulas for entropy are uniquely determined by our axioms. The entropy measure for information states, the Hartley-Shannon entropy log #(I), is unique up to the choice of logarithm base. This in turn uniquely determines the irreversibility function on possible singleton processes, since this is defined in terms of the creation and erasure of bit states. There is a unique relationship between the entropy of a uniform eidostate and the entropy of the possible states it contains. Finally, the entropic probability distribution on a uniform eidostate, which might appear at first to depend on the singleton state entropy, is nonetheless unique. It remains to be seen how the axiomatic system developed here for state transformations is related to the axiomatic system, similar in some respects, given by Knuth and Skilling for considering problems of inference [23]. There, symmetry axioms in a lattice of states give rise to probability and entropy measures. In a similar way, the "entropy first, probability after" idea presented here is reminiscent of Caticha's "entropic inference" [24], in which the probabilistic Bayes rule is derived (along with the maximum entropy method) from a single relative entropy functional.
The entropy in information thermodynamics has two aspects: a measure of the information in a pure information state and an irreversibility measure for a singleton eidostate (the kind most closely analogous to a conventional thermodynamic state). The most general expression for the entropy of a uniform eidostate in Theorem 12 exhibits this twofold character. However, the two aspects of entropy are not really distinct in our axiomatic system. Both are based on the structure of the → relation among eidostates, which tells how one eidostate may be transformed into another by processes that may include demons.
The connection between information and thermodynamic entropy is nowhere more clearly stated than in Landauer's principle, the minimum thermodynamic cost of information erasure. It is easy to state a theorem of our system corresponding to Landauer's principle. Suppose a, b ∈ S and I b is a bit state. If a + I b → b, then it follows that S(b) ≥ S(a) + 1. Erasing a bit state is necessarily accompanied by an increase of at least one unit in the thermodynamic entropy. More generally, suppose A and B are uniform eidostates with A → B. If we use Theorem 12 to write the entropies of the two states as then it follows that ∆ S ≥ −∆H for the process taking A to B. In other words, any decrease of the "Shannon information" part of the eidostate entropy (−∆H) must be accompanied by an increase, on average, in the thermodynamic entropy of the possible states (∆ S ).
These theorems do not really constitute a "proof" of Landauer's principle, because they depend upon the physical applicability of our axioms. On the other hand, our axiomatic framework does obviate some of the objections that have been raised to existing derivations of Landauer's principle. Norton [25], for example, has argued that many "proofs" of the principle improperly mix together two different kinds of probability distribution, the microstate distributions associated with thermodynamic equilibrium states and the probability assignments to memory records that represent information. He calls this the problem of "illicit ensembles". Our approach, by contrast, is not based on concepts of probability, except for those that emerge naturally from the structure of the → relation. We do not resolve thermodynamic states into their microstates at all, and we represent information by a simple enumeration of possible memory records.
The Second Law of Thermodynamics, the law of non-decrease of entropy, is the canonical example of the "arrow of time" in physics. Time asymmetry in our axiomatic theory is found in the direction of the → relation. It is an enlightening exercise to consider the axioms of information thermodynamics with the arrows reversed. Some axioms (e.g., Axioms 1 and 2) are actually unchanged by this. Others may be modified but still remain true statements in the theory. A few become false. The most striking example of the last is Axiom 3, which states that no eidostate may be transformed into a proper subset of itself. That is, no process can deterministically delete one of a list of possible states. This is the principle that leads to irreversbility in information processes, and through them, to more general irreversibility and entropy measures.