Representing Measurement as a Thermodynamic Symmetry Breaking

Descriptions of measurement typically neglect the observations required to identify the apparatus employed to either prepare or register the final state of the “system of interest.” Here, we employ category-theoretic methods, particularly the theory of classifiers, to characterize the full interaction between observer and world in terms of information and resource flows. Allocating a subset of the received bits to system identification imposes two separability constraints and hence breaks two symmetries: first, between observational outcomes held constant and those allowed to vary; and, second, between observational outcomes regarded as “informative” and those relegated to purely thermodynamic functions of free-energy acquisition and waste heat dissipation. We show that breaking these symmetries induces decoherence, contextuality, and measurement-associated disturbance of the system of interest.


Introduction
Measurements of macroscopic systems (indeed, any observations made in any setting, however informal) pose a problem in quantum theory because they appear to violate the theory's fundamental symmetry: unitarity, or conservation of information (see [1,2] for comprehensive reviews). Irreversibly recordable observational outcomes are, in particular, classical by definition, as emphasized by Bohr [3] and many others. Obtaining such classical outcomes, however, requires physically interacting with the world. What is the relationship between the classical outcomes that observers obtain and the physical interactions via which they obtain them? To investigate this question, we turn to a relatively neglected aspect of measurement: the identification, by the observer, of the physical system being observed. We are interested, in particular, in how an observer identifies macroscopic systems such as laboratory apparatus.
The ability of observers to identify macroscopic systems is generally taken for granted even when discussing systems with explicitly quantum properties (see [4] for a recent review). Schrödinger, for example, had no problem identifying the steel chamber containing his cat, although it contains, and may well itself be a component of, a quantum system in an entangled state [5]. More tellingly, Wigner had no trouble identifying and conversing with his friend, although his friend is by assumption a component of a quantum system in an entangled state, the other component of which may be as large as an entire laboratory [6]. When performing a Bell/EPR experiment, Alice and Bob similarly have no trouble identifying and reading their respective apparatus pointers, although these pointers are components of a spacelike-extended quantum system in an entangled state […]

[…] where $H_{OW}$ is the $O$–$W$ interaction, and can choose bases for $O$ and $W$ such that, with respect to a time parameter $t$ characterizing $U$:

$$H_{OW} = \beta^k k_B T^k \sum_{i=1}^{N} \alpha^k_i(t)\, M^k_i \qquad (2)$$

where $k = O$ or $W$; $i = 1 \ldots N$ for finite $N$; the $\alpha^k_i(t)$ are real functions with codomains $[0, 1]$ such that: […] for every finite $\Delta t$; $k_B$ is Boltzmann's constant; $T^k$ is $k$'s temperature; $\beta^k \geq \ln 2$ is an inverse measure of $k$'s average per-bit thermodynamic efficiency that depends on the internal dynamics $H_k$; and the $M^k_i$ are Hermitian operators with binary eigenvalues representing "questions to Nature" with yes-no answers [21]. Here, $O$ and $W$ are each regarded as "observing" the other; the operators $M^O_i$ act on $W$, while the operators $M^W_i$ act on $O$. To ensure that $O$ and $W$ are "interesting" from the point of view of system identification, we assume that the sets $\{M^k_i\}$ of operators are large enough that, for any $M^k_i$, there is at least one $M^k_j$ such that $[M^k_i, M^k_j] = 0$ and at least one $M^k_l$ such that $[M^k_i, M^k_l] \neq 0$. The latter requirement enables violation of the Leggett-Garg inequality [22] or, equivalently, Kochen-Specker contextuality [23].
Hence, $H_{OW}$ is manifestly a quantum interaction. However, as noted above, Equations (2) and (3) make no reference to either $H_O$ or $H_W$, and are indeed independent of any assumptions about the purity or separability of the states $|O\rangle$ or $|W\rangle$, the decomposition of either $O$ or $W$ into subsystems, or the interactions, if any, between such subsystems.
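The "interesting" condition on the operator sets can be checked mechanically. The following is a minimal sketch under stated assumptions, not part of the original formalism: it takes a single qubit, uses four hypothetical $M^k_i$ (the projectors onto the eigenstates of $\sigma_z$ and $\sigma_x$, whose eigenvalues are the binary values 0 and 1), works in units with $k_B T^k = 1$, and evaluates the form of Equation (2) at one instant with illustrative weights $\alpha^k_i$. All function names are hypothetical.

```python
import math

Matrix = list[list[complex]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def commutator(a: Matrix, b: Matrix) -> Matrix:
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]


def is_zero(m: Matrix, tol: float = 1e-12) -> bool:
    return all(abs(x) < tol for row in m for x in row)


def is_hermitian(m: Matrix, tol: float = 1e-12) -> bool:
    n = len(m)
    return all(abs(m[i][j] - complex(m[j][i]).conjugate()) < tol for i in range(n) for j in range(n))


def is_interesting(ops: list[Matrix]) -> bool:
    """Each M_i needs some other M_j with [M_i, M_j] = 0 and some M_l with [M_i, M_l] != 0."""
    for i, m in enumerate(ops):
        flags = [is_zero(commutator(m, o)) for j, o in enumerate(ops) if j != i]
        if not any(flags) or all(flags):
            return False
    return True


def interaction_hamiltonian(ops: list[Matrix], alphas: list[float], beta: float, kT: float = 1.0) -> Matrix:
    """beta * kT * sum_i alpha_i M_i at a single instant t (the form of Equation (2))."""
    if beta < math.log(2):
        raise ValueError("beta must satisfy beta >= ln 2")
    if any(not 0.0 <= a <= 1.0 for a in alphas):
        raise ValueError("each alpha_i must lie in [0, 1]")
    n = len(ops[0])
    return [[beta * kT * sum(a * m[r][c] for a, m in zip(alphas, ops)) for c in range(n)] for r in range(n)]


# Hypothetical M_i: projectors onto the sigma_z and sigma_x eigenstates (eigenvalues 0 and 1).
P_Z0 = [[1, 0], [0, 0]]
P_Z1 = [[0, 0], [0, 1]]
P_X0 = [[0.5, 0.5], [0.5, 0.5]]
P_X1 = [[0.5, -0.5], [-0.5, 0.5]]
OPS = [P_Z0, P_Z1, P_X0, P_X1]

H = interaction_hamiltonian(OPS, [0.25] * 4, beta=1.0)
```

Each $\sigma_z$ projector commutes with its complement but not with either $\sigma_x$ projector, so every operator has both kinds of partner; restricting to the two $\sigma_z$ projectors makes `is_interesting` return `False`, because no operator then has a non-commuting partner.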
The idea that nature "answers" an observer's questions is classical, and implies an irreversible state change [24]: each question from $O$ that $W$ "answers" transfers one bit from $W$ to $O$ and is paid for by the transfer of $\beta^O k_B T^O$ from $O$ to $W$. This situation is completely symmetric: each question from $W$ that $O$ "answers" transfers one bit from $O$ to $W$ and is paid for by the transfer of $\beta^W k_B T^W$ from $W$ to $O$. Given Equation (3), the action required for $k$ to transfer $N$ bits in time $\Delta t$ is: […]

The representation of observation given by Equations (2)–(4) simply describes an exchange of energy for information. How, then, is identifying the system of interest distinguished, physically, from measuring its state? The answer, we suggest, lies in two asymmetries: the asymmetry between bits representing the "system of interest" and bits representing the "apparatus", and the asymmetry between bits that are processed as "information about $W$" and bits that are processed as fuel, i.e., as free energy to drive the processing of the bits considered informative, or as waste heat. These asymmetries are imposed not by $H_{OW}$ but rather by the structure of $H_O$ (or, if viewing $W$ as an observer, by $H_W$); hence, they are features of the information-processing architecture of $O$ (or $W$). In what follows, we make this suggestion precise:
• We rigorously define a "system" contained within $W$, relative to $O$'s observational capabilities, by employing the natural equivalence between binary-valued observables and binary classifiers as defined in [25] and the category-theoretic construction of a cocone (for review, see [26]).
• We formulate the distinction between system identification and pointer-state measurement as a collection of equivalence relations on cocones, and show that: (1) transitions between cocone equivalence classes can be represented more generally as groupoid operations; and (2) these groupoid operations correspond to entanglement swaps that result in $O$-relative decoherence [27].
• We show that such entanglement swaps can also be viewed as context switches as defined in [28–30], and hence that Born-rule probability distributions over measurement outcomes are generically context-dependent, i.e., generically display Kochen-Specker contextuality.
• We show that free-energy acquisition and waste-heat dissipation into the "environment" component of $W$ can generically have non-negligible effects on observational outcomes due to entanglement swapping/contextuality.
We begin in Section 2 by providing a fully sequential model of interaction as (mutual) measurement, and then coarse-graining time to generalize this sequential model. We then show that, for finite U, the assumption of separability is equivalent to the assumption that O and W communicate by classical communication, i.e., exchange of fungible [31] bit strings, via an ancillary, noise-free classical channel. We provide a formal description of system identification in Section 3, showing that any "system of interest" must have distinct reference and pointer components, that these components can be represented as cocones over distinct sets of observables related by a predictability sieve [9], and that distinguishing these components induces both component-scale decoherence and system-scale contextuality. We then show in Section 4 that, if system identification is held fixed, system state is generically vulnerable to disturbance as free energy is extracted from and heat is dissipated into the environment. We conclude in Section 5 by briefly discussing the implications of these results for the architectures of "information gathering and using systems" (IGUSs, i.e., observers [32]) and the relationship between decoherence and temporal coarse-graining.

Sequential Measurements
The simplest physical interactions, and hence the simplest measurements, are sequential: $O$ selects and deploys one operator $M^O_i$ during each interval $\Delta t$, in consequence receiving one bit of information from, and transferring $\beta^O k_B T^O$ of heat to, $W$. The interaction has the same description from $W$'s perspective, only replacing superscripts $O$ with $W$.
Following [17], sequential measurements can be represented by choosing the functions $\alpha^k_i(t)$ to be rectangular functions, but replacing the typical fixed duty cycle $n$ with variable duty cycles $n^k_i(t)$ "chosen" by $k$, i.e., determined by the unspecified internal dynamics $H_k$: […] where the offset $s$, $0 \leq s \leq n(0) - 1$, determines when $M^k_i$ is first deployed, $m$ is the total (finite but unlimited) number of times $M^k_i$ is deployed, and: […] Each $\Pi^{(i,m)}(t)$ is a sequence, starting at $t = s$, of $m$ unit-height rectangular pulses with width $\Delta t$, with separation given by $n^k_i(t)\Delta t$. In this case, we have: […] with action still given by Equation (4). The quantization of both action and information guarantees that time can always be fine-grained sufficiently to render measurements sequential.

It is convenient in what follows, however, to coarse-grain time and view some measurements as carried out simultaneously (or "in parallel"). Clearly, this is only possible for subsets of mutually-commuting measurement operators. In what follows, we abuse notation by employing $M^P_k$ to indicate either the $k$th individual operator as above or the $k$th subset of mutually-commuting operators where the ambiguity presents no problems, and the notation $\{M^P\}_k$ for the $k$th subset of mutually-commuting operators where more explicitness is called for. We reserve $\Delta t$ to indicate the interval required to deploy one operator and obtain one bit as above, and use $\tau$ to indicate an integer multiple of this interval during which a mutually-commuting subset of operators is deployed and multiple bits are obtained.
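The rectangular-pulse picture of sequential deployment can be sketched on a discrete time grid. This sketch assumes a constant duty cycle $n$ in place of the variable $n^k_i(t)$, measures time in integer slots of width $\Delta t$, and reads "separation $n\Delta t$" as the start-to-start spacing of pulses; the helper names are hypothetical.

```python
def pulse_slots(s: int, n: int, m: int) -> list[int]:
    """Slots (in units of Δt) occupied by m unit-height pulses of width Δt.

    Assumes a constant duty cycle n (start-to-start spacing n·Δt) and an
    offset s with 0 <= s <= n - 1.
    """
    if not 0 <= s <= n - 1:
        raise ValueError("offset s must satisfy 0 <= s <= n - 1")
    return [s + j * n for j in range(m)]


def alpha(t_slot: int, s: int, n: int, m: int) -> int:
    """Rectangular-pulse weight at integer slot t_slot: 1 if the operator is deployed, else 0."""
    return 1 if t_slot in pulse_slots(s, n, m) else 0


def is_sequential(schedule: dict[str, list[int]]) -> bool:
    """True if at most one operator is deployed in any slot Δt."""
    occupied: set[int] = set()
    for slots in schedule.values():
        for t in slots:
            if t in occupied:
                return False
            occupied.add(t)
    return True


# Three hypothetical operators sharing duty cycle n = 3 with offsets 0, 1, 2.
schedule = {
    "M_1": pulse_slots(0, 3, 4),
    "M_2": pulse_slots(1, 3, 4),
    "M_3": pulse_slots(2, 3, 4),
}
```

The three operators interleave without overlap, so each slot $\Delta t$ carries exactly one deployment, as sequential measurement requires; giving two operators the same offset makes `is_sequential` return `False`.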

Mutual Measurement Is Classical Communication
We can, as is now standard, think of $O$ and $W$ as communicating agents, i.e., we can think of $H_{OW}$ as specifying a communication channel.

Theorem 1. With $H_{OW}$ given by Equation (2), the specified communication channel is classical, ancillary, and free from classical noise.

Proof of Theorem 1. No generality is lost by regarding the interaction $H_{OW}$ as specifying a strictly sequential deployment of the $M^k_i$ as in Equation (6). In this case, the action in Equation (4) transfers one bit, corresponding to one of the two eigenvalues of the deployed operator $M^O_i$, from $W$ to $O$ during each interval $\Delta t$, and one bit, corresponding to one of the two eigenvalues of the deployed operator $M^W_j$, from $O$ to $W$ during this same $\Delta t$. This sequential exchange of bits constitutes classical communication. There are no physical degrees of freedom other than those of $O$ and $W$; thus, the communication channel is ancillary and free from classical noise.
While the exchange of bits between $O$ and $W$ is free of classical noise, the $O$–$W$ channel flips each bit, independently, with finite probability unless $O$ and $W$ share a quantum reference frame a priori [33], i.e., unless they share a basis. We assume such a shared basis for simplicity. As an explicit example, suppose $O$ and $W$ alternate preparation and measurement of an array of non-interacting qubits, as shown in Figure 1. Assuming a shared reference frame for $s_z$ preparations and measurements, the encoded bit values are preserved in either direction.

It is important to emphasize that while by Theorem 1 $H_{OW}$ can be considered to define a classical communication channel, $H_{OW}$ is, as noted above, a manifestly quantum interaction that can violate the Leggett-Garg inequality and display Kochen-Specker contextuality, as discussed in detail in Section 3.5 below. It is also important to emphasize that, while the bits received by $O$ provide a representation, for $O$, of the state $|W\rangle$, they do not, by themselves, provide any information to $O$ about the decompositional structure of $W$, if any, and do not specify the internal Hamiltonian $H_W$. This is a simple consequence of linearity: we are free to choose any decomposition $W = SE$ and write $\mathcal{H}_W = \mathcal{H}_S \otimes \mathcal{H}_E$ and $H_W = H_S + H_E + H_{SE}$ without affecting the interaction $H_{OW}$, and hence $O$'s observational outcomes, in any way. We therefore need make no assumptions about the "ontic" state of $W$, as noted above; whether this state is separable or pure makes no difference to $O$'s observational outcomes. What is of interest in what follows is solely the "epistemic" state of $W$ for $O$; the notation "$|W\rangle$" is henceforth used strictly to denote this "epistemic" state. As above, the same considerations apply to the representation of $|O\rangle$ by $W$.
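The role of the shared reference frame in the qubit example can be made concrete with a standard spin-1/2 calculation. The sketch assumes, hypothetically, that $W$'s measurement axis is tilted by a fixed angle $\theta$ from $O$'s $z$ axis; the Born rule then gives a per-bit flip probability $\sin^2(\theta/2)$, independent for each qubit. With $\theta = 0$ (the shared basis assumed in the text) every bit is preserved. Function names are illustrative.

```python
import math
import random


def flip_probability(theta: float) -> float:
    """Born-rule probability that W reads the opposite s_z bit to the one O prepared.

    Assumes W's axis is tilted by `theta` radians from O's z axis in the x-z plane,
    so W's eigenvectors in O's basis are (cos θ/2, sin θ/2) and (-sin θ/2, cos θ/2);
    |<-sin θ/2, cos θ/2 | 1, 0>|^2 = sin^2(θ/2).
    """
    return math.sin(theta / 2) ** 2


def transmit(bits: list[int], theta: float, rng: random.Random) -> list[int]:
    """Send bits one qubit at a time; each bit flips independently with the same probability."""
    p = flip_probability(theta)
    return [b ^ (rng.random() < p) for b in bits]
```

Because the flip probability depends only on the relative angle, any misalignment $\theta \neq 0$ produces the independent, finite-probability bit flips described above, while $\theta = \pi$ inverts every bit deterministically.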

Systems Require Reference Components with Invariant States
Identifying a system $S$ in $W$ requires distinguishing it from the rest of $W$, which we call the "environment" $E$ of $S$. In order for this $S$ to have a measurable state $|S\rangle$, $|W\rangle$ must be separable as $|W\rangle = |SE\rangle = |S\rangle|E\rangle$. Here, the recognition that $|W\rangle$, and hence also $|S\rangle$ and $|E\rangle$, are "epistemic" states, i.e., states for $O$, is consistent with the general observer-relativity of separability [34–37].
The system $S$ having a measurable state is, however, not yet sufficient for $S$ to be identifiable by $O$. In addition, $S$ must have some component, which we call the "reference" $R$, that has a time-invariant, and hence reliably recognizable, state. We must, therefore, be able to write $S = RP$, where $P$ is the usual "pointer" component of $S$, and require ($O$-relative) separability $|S\rangle = |RP\rangle = |R\rangle|P\rangle$, where $|R\rangle$ is a time-invariant "reference state" used for system identification and $|P\rangle$ is the usual time-varying "pointer state" that is of interest to $O$. This decomposition into $R$ and $P$ is illustrated in Figure 2. Any identifiable macroscopic object, e.g., any item of laboratory apparatus, clearly must support such a decomposition.
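The separability requirement $|S\rangle = |R\rangle|P\rangle$ can be checked explicitly in the simplest case. This sketch assumes, purely for illustration, that $R$ and $P$ are single qubits in a pure joint state: such a state factorizes exactly when its $2 \times 2$ coefficient matrix has rank 1, i.e., zero determinant, and the concurrence $2|c_{00}c_{11} - c_{01}c_{10}|$ measures how far it is from factorizing. The example states are hypothetical.

```python
import math


def is_product_state(c00: complex, c01: complex, c10: complex, c11: complex, tol: float = 1e-12) -> bool:
    """True if c00|00> + c01|01> + c10|10> + c11|11> equals |R>|P> for some |R>, |P>.

    The coefficient matrix [[c00, c01], [c10, c11]] must have rank 1,
    which for a nonzero 2x2 matrix means its determinant vanishes.
    """
    return abs(c00 * c11 - c01 * c10) < tol


def concurrence(c00: complex, c01: complex, c10: complex, c11: complex) -> float:
    """Concurrence of a normalized pure two-qubit state: 0 if separable, 1 if maximally entangled."""
    return 2 * abs(c00 * c11 - c01 * c10)


h = 1 / math.sqrt(2)
ref_times_pointer = (h, h, 0.0, 0.0)  # |0>_R (|0> + |1>)_P / sqrt(2): separable
bell = (h, 0.0, 0.0, h)               # (|00> + |11>) / sqrt(2): not separable
```

In the first state $R$ has the definite state $|0\rangle$ regardless of the pointer, which is what system identification requires; in the Bell state no reference state can be assigned to $R$ at all.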
Given the decomposition $W = RPE$, we partition $O$'s measurement operators as $\{M^O_i\} = \{M^R_j\} \cup \{M^P_k\} \cup \{M^E_l\}$, where all unions are disjoint. We will also write $H_{OW} = H_{OR} + H_{OP} + H_{OE}$, where $H_{OR}$ implements system identification, $H_{OP}$ implements pointer-state measurement, and $H_{OE}$ implements free-energy acquisition and waste-heat dissipation. Each of these interactions can, by Theorem 1, be regarded as classical bit exchange. We write numbers of exchanged bits as $N = N_R + N_P + N_E$ and the corresponding inverse thermodynamic efficiencies as $\beta_R$, $\beta_P$, and $\beta_E$. As shown in Section 4 below, it is inequalities among the $\beta_R$, $\beta_P$, and $\beta_E$ that break the thermodynamic symmetry of $H_{OW}$. Without such symmetry breaking, $R$ and $P$ cannot be distinguished from $E$, and thus neither system identification nor pointer measurement can occur.
For $S$ to remain identifiable over multiple rounds of measurement, the $M^R_j$ must satisfy the commutation and predictability sieve [9] requirements of Equation (7); in particular, the $M^R_j$ must commute with one another and with every $M^P_k$. In general, neither the $M^P_k$ nor the $M^E_l$ will all commute. Failure of commutativity among the $M^P_k$ is exemplified by apparatus calibration procedures [17], failure of commutativity among the $M^E_l$ by such events as laboratory power failures.

Reference Components Can Be Represented as Cocones over One-Bit Classifiers
As emphasized above, system identification must be implemented by the internal process executed by the observer, i.e., by the Hamiltonian $H_O$. Indeed, it can only be $H_O$ that assigns different functions to the inputs provided by $H_{OW}$, and hence decomposes $H_{OW}$ into component system-identification, measurement, and energy-management interactions. While it is now commonplace to regard observers as Bayesian agents [38,39] and a handful of architectural models of observers have been proposed [32,40–42], the question of generic requirements on $H_O$ to implement "observation" in any meaningful sense has been largely neglected. We approach this question here using tools and methods from the theory of formal languages and category theory. These allow us to specify a minimal virtual machine [43] that must be implemented by $H_O$ in any $O$ capable of both identifying and measuring the pointer states of an external system. This approach is independent of the physical implementation of $H_O$, up to requiring that $O$ has sufficient degrees of freedom. We can, without loss of generality, view $O$ as implemented by a suitable circuit of quantum gates [44].

Barwise and Seligman [25] introduced the idea of a "classifier" as implementing the relation between "tokens" in some language and the "types" to which they belong. We first characterize this notion formally, and then show that the one-bit measurement operators $M^O_i$ can be identified with one-bit classifiers. Mathematical operations on classifiers become, in this case, formal specifications of computations on measurement outcomes implemented by $H_O$.
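Definition 1. A classifier $A$ is a triple $\langle \mathrm{Tok}(A), \mathrm{Typ}(A), \models_A \rangle$, where $\mathrm{Tok}(A)$ is a set of "tokens", $\mathrm{Typ}(A)$ is a set of "types", and $\models_A$ is a "classification" relation between tokens and types.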
A classifier for voltmeters, for example, would assign all examples (i.e., tokens) of voltmeters to the general class (i.e., type) "voltmeters"; similarly, a classifier for observations (tokens) of some particular voltmeter V would assign all such observations, but no observations of other systems, including other voltmeters, to the class (type) "observations of V" (see [26] for extensive review with examples, and [45] for a specific application to system identification). The simplest classifiers implement one-bit, yes-no classification decisions, i.e., they group entities (tokens) having some property P into classes (types) "entities with property P"; for example, a one-bit classifier for black objects would group together all black objects, excluding all objects that are not black.
A natural map between classifiers is the "infomorphism" [25], defined as follows:

Definition 2. Given two classifiers $A = \langle \mathrm{Tok}(A), \mathrm{Typ}(A), \models_A \rangle$ and $B = \langle \mathrm{Tok}(B), \mathrm{Typ}(B), \models_B \rangle$, an infomorphism $f: A \rightarrow B$ is a pair of maps $f^{\uparrow}: \mathrm{Typ}(A) \rightarrow \mathrm{Typ}(B)$ and $f^{\downarrow}: \mathrm{Tok}(B) \rightarrow \mathrm{Tok}(A)$ such that, for all $b \in \mathrm{Tok}(B)$ and all $\alpha \in \mathrm{Typ}(A)$, $f^{\downarrow}(b) \models_A \alpha$ if and only if $b \models_B f^{\uparrow}(\alpha)$.

Intuitively, an infomorphism "transmits the information" from one classifier to another, so that, e.g., "$b$ is type $B$" can encode or represent the information "$a$ is type $A$". "Information" here is not simply a quantity of bits ("Shannon information"), but is rather the set of logical constraints imposed by Definition 2; hence, it is "pragmatic information" as defined in [46]. This idea of transmitting information motivates the definition of an "information channel" [25], itself a classifier, expressed as a collection $\{g_i: A_i \rightarrow C\}$ of infomorphisms with a common codomain, or "core", $C$. Here, the classifier $C$ encodes or represents the information (i.e., the logical constraints) encoded jointly by the $A_i$; alternatively, $C$ can be thought of as a shared memory jointly accessed by the $A_i$. The sense in which channels encode sets of mutual constraints holding between classifiers is further elaborated in [25,26], where the notion of a classifier is extended to that of a "local logic" by specifying a subset (possibly a singleton) of tokens satisfying all of the types, and the notion of an infomorphism is extended to a "logic infomorphism" that preserves this additional structure. It is natural to think of a local logic as "identifying" the token(s) that satisfy all of its types, of logic infomorphisms as transferring token-identification information between local logics, and of channels comprising sets of logic infomorphisms as encoding mutual constraints that assemble multiple identified tokens (which can naturally be thought of as "parts" [45]) into a larger identified system.
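Definitions 1 and 2 translate directly into a check over finite sets. The sketch below is a minimal illustration; the class, the function, and the example classifiers (a one-bit classifier of events and a classifier of meter readings) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Classifier:
    """A finite classifier <Tok(A), Typ(A), |=_A>; `relation` holds (token, type) pairs with token |= type."""
    tokens: frozenset
    types: frozenset
    relation: frozenset

    def classifies(self, token, typ) -> bool:
        return (token, typ) in self.relation


def is_infomorphism(a: Classifier, b: Classifier, f_up: dict, f_down: dict) -> bool:
    """Definition 2: f_up maps Typ(A) -> Typ(B), f_down maps Tok(B) -> Tok(A), and
    f_down(tok) |=_A alpha  iff  tok |=_B f_up(alpha) for every tok in Tok(B), alpha in Typ(A)."""
    if set(f_up) != set(a.types) or not set(f_up.values()) <= set(b.types):
        return False
    if set(f_down) != set(b.tokens) or not set(f_down.values()) <= set(a.tokens):
        return False
    return all(
        a.classifies(f_down[tok], alpha) == b.classifies(tok, f_up[alpha])
        for tok in b.tokens
        for alpha in a.types
    )


# Hypothetical example: events e1, e2 typed 0/1, and meter readings r1, r2 typed high/low.
A = Classifier(frozenset({"e1", "e2"}), frozenset({0, 1}), frozenset({("e1", 1), ("e2", 0)}))
B = Classifier(frozenset({"r1", "r2"}), frozenset({"high", "low"}),
               frozenset({("r1", "high"), ("r2", "low")}))
f_up = {1: "high", 0: "low"}
f_down = {"r1": "e1", "r2": "e2"}
```

Note the opposite directions of the two component maps: types are pushed forward from $A$ to $B$, while tokens are pulled back from $B$ to $A$. Swapping the token map (sending `r1` to `e2`) breaks the biconditional, and the check fails.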
Given suitable commutativity conditions, a collection of channels $\{g_{ij}: A_{ij} \rightarrow C_j\}$ admits a colimit, a single channel $C$ that collects all of the classification information encoded by the $A_{ij}$. The conditions are:
• The $A_{ij}$ must be representable as a finite nonredundant set $\{A_k\}$ with infomorphisms $f_{ij}: A_i \rightarrow A_j$.
• There exist infomorphisms $h_i: C_i \rightarrow C$ and $h_{ij}: C_i \rightarrow C_j$.
• All compositions of infomorphisms with codomain $C$ commute.
These can be summarized by requiring that all diagrams of the form shown in Figure 3 commute. As the colimit in this case is a cocone over the $A_k$, we refer to such diagrams as "cocone diagrams" (CCDs).

Figure 3.
A cocone diagram (CCD) is a commuting diagram depicting maps (infomorphisms) $f_{ij}$ between classifiers $A_i$ and $A_j$, maps $g_{kl}$ from the $A_k$ to one or more channels $C_l$ over a subset of the $A_i$, and maps $h_l$ from channels $C_l$ to the colimit $C$ (cf. Equation (6.7) of [26]).
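Commutativity of a diagram such as Figure 3 means that every path between two objects induces the same composite map. The sketch below checks this for the type-level components of infomorphisms, which compose covariantly; it assumes a finite, acyclic diagram with arrows given as dictionaries, and the object and arrow names are hypothetical.

```python
from typing import Iterator

# An arrow is (source, target, type-level map); infomorphisms compose covariantly on types.
Arrow = tuple[str, str, dict]


def paths(arrows: dict[str, Arrow], src: str, dst: str) -> Iterator[list[str]]:
    """Yield every directed path (as a list of arrow names) from src to dst; assumes no cycles."""
    if src == dst:
        yield []
        return
    for name, (s, t, _) in arrows.items():
        if s == src:
            for rest in paths(arrows, t, dst):
                yield [name] + rest


def apply_path(arrows: dict[str, Arrow], path: list[str], x):
    """Push type x through the arrows of `path` in order."""
    for name in path:
        x = arrows[name][2][x]
    return x


def commutes(arrows: dict[str, Arrow], type_sets: dict[str, set], src: str, dst: str) -> bool:
    """True if every path src -> dst induces the same map on Typ(src)."""
    composites = {
        tuple(apply_path(arrows, p, x) for x in sorted(type_sets[src]))
        for p in paths(arrows, src, dst)
    }
    return len(composites) <= 1


# Hypothetical CCD fragment: A1 -> A2 (f12), A1 -> C1 (g1), A2 -> C1 (g2), C1 -> C (h1).
type_sets = {"A1": {0, 1}, "A2": {0, 1}, "C1": {"x", "y"}, "C": {"X", "Y"}}
arrows = {
    "f12": ("A1", "A2", {0: 1, 1: 0}),
    "g1": ("A1", "C1", {0: "y", 1: "x"}),
    "g2": ("A2", "C1", {0: "x", 1: "y"}),
    "h1": ("C1", "C", {"x": "X", "y": "Y"}),
}
```

Here the direct route `g1, h1` and the route through `A2` agree on both types, so the fragment commutes; redefining `g1` as `{0: "x", 1: "y"}` makes the two routes disagree and `commutes` returns `False`.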
Let us now consider a family of $N_R$ one-bit classifiers $A_i = \langle \{e : W(e)\}, \{0, 1\}, \models_i \rangle$, where $W(e)$ if and only if $e$ is an event in $W$, and $\models_i$ is the $i$th of $N_R$ distinct, but not necessarily mutually exclusive, classification criteria. An infomorphism $f_{ij}$ encodes a correlation between the criteria $\models_i$ and $\models_j$: if $e$ meets the criterion $\models_i$, i.e., $e \models_i 1$, then $f_{ij}(e) \models_j 1$; otherwise, $f_{ij}(e) \models_j 0$. The infomorphisms $f_{ij}$ are taken here to be invertible, i.e., $f_{ij} f_{ji} = \mathrm{Identity}$; hence, this correlation is bidirectional. Assuming now that a cocone $C$ above the $A_i$ exists, this $C$ encodes the information that some event $e^*$ in $W$ simultaneously satisfies all $N_R$ of the criteria $\models_i$. It is thus natural to view this $C$ as identifying events of some type $[e^*]$ that satisfy these criteria. Identifying $[e^*]$ as "events in which $R$ is present in state $|R\rangle$", this is precisely what the set of mutually-commuting (by Equation (7)) operators $\{M^R_j\}$ have been defined as doing. We therefore identify the operators $M^R_j$ with classifiers $A^R_j$, and consider the cocone $C^R$ as the definition, for $O$, of the reference $R$. As discussed in Section 2.1, we can regard the $M^R_j$ as deployed simultaneously if we coarse-grain time. This identification of the $M^R_j$ as the base of a CCD gives an explicit meaning both to the label $R$ and to the idea that the $M^R_j$ together identify $R$. As identifying $R$ requires that $R$ be in the fixed state $|R\rangle$, we can also consider the CCD to give an explicit meaning to the idea of identifying $R$ in state $|R\rangle$. This identification process is, as noted above, a computation implemented by $H_O$.
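The content of the cocone $C^R$, that a single event satisfies all $N_R$ one-bit criteria at once, can be sketched as a joint test over events. This captures only the "jointly satisfied" information encoded by the cocone, not its categorical universal property; the feature names for a hypothetical voltmeter $V$ are illustrative assumptions.

```python
from typing import Callable

Event = dict[str, bool]
Criterion = Callable[[Event], bool]  # one-bit classifier: e |=_i 1 iff criterion(e) is True


def classify(criterion: Criterion, event: Event) -> int:
    """Type (0 or 1) that a one-bit classifier assigns to an event."""
    return 1 if criterion(event) else 0


def jointly_identified(criteria: list[Criterion], events: list[Event]) -> list[int]:
    """Indices of events e* with e* |=_i 1 for every criterion i (the type [e*])."""
    return [k for k, e in enumerate(events) if all(classify(c, e) == 1 for c in criteria)]


# Hypothetical reference criteria for one particular voltmeter V.
criteria = [
    lambda e: e["grey_case"],
    lambda e: e["analog_dial"],
    lambda e: e["serial_matches_V"],
]
events = [
    {"grey_case": True, "analog_dial": True, "serial_matches_V": True},    # V itself
    {"grey_case": True, "analog_dial": True, "serial_matches_V": False},   # another voltmeter
    {"grey_case": False, "analog_dial": False, "serial_matches_V": False},  # an unrelated object
]
```

With all three criteria only the first event is identified as $V$; dropping the serial-number criterion also admits the second voltmeter, illustrating why the reference criteria must jointly single out one system.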

Measuring the Pointer State |P of an Identified System S
By identifying R, the observer O by assumption identifies the system S, including its pointer component P. We now turn to the question of measuring the pointer state |P , the state "of interest" of S. We again employ the approach of formally specifying a computational process to be implemented by H O .
Considering the set of measurement operators $\{M^P_k\}$, it is clear that it can be partitioned into subsets, each of which contains only operators that mutually commute; in the limit, each of these subsets is a singleton. Moreover, Equation (7) guarantees that each of the $M^P_k$ commutes with all of the $M^R_j$. Hence, we can consider, without loss of generality, measurements made by deploying the $M^R_j$ to identify $S$, and one of the $M^P_k$ to determine its (perhaps partial) pointer state. As above, the generalization from a single $M^P_k$ to a $k$th subset $\{M^P\}_k$ of mutually-commuting operators only requires coarse-graining time.
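A partition of the $\{M^P_k\}$ into mutually-commuting subsets can be computed greedily. The sketch assumes, for illustration, that the pointer observables are Pauli strings (binary eigenvalues $\pm 1$), for which two strings commute exactly when they differ at an even number of positions where both are non-identity. Greedy placement always yields a valid partition but not necessarily one with the fewest subsets; finding the minimum is a graph-coloring problem. Function names are hypothetical.

```python
def commute(p: str, q: str) -> bool:
    """Pauli strings commute iff they differ at an even number of positions where both are non-identity."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def greedy_commuting_partition(ops: list[str]) -> list[list[str]]:
    """Place each operator in the first subset whose members all commute with it."""
    groups: list[list[str]] = []
    for op in ops:
        for group in groups:
            if all(commute(op, other) for other in group):
                group.append(op)
                break
        else:
            groups.append([op])
    return groups


pointer_ops = ["ZI", "IZ", "XI", "ZZ", "XX"]  # hypothetical two-qubit pointer observables
partition = greedy_commuting_partition(pointer_ops)
```

Each resulting subset can be deployed "in parallel" during one coarse-grained interval $\tau$, while moving between subsets requires a transition between CCDs.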
As the selected $M^P_k$ satisfies $[M^P_k, M^R_j] = 0$ for all $j$, the methods of the previous section can be used to construct a CCD over the $M^R_j$ together with $M^P_k$. We can view the colimit $C^R_k$ in this CCD as encoding the information that $P$ is in state $|P_k\rangle$. In total, we can construct $N_P$ such CCDs. Call the $k$th such CCD "CCD$_k$". We can now ask: What is the relation between the different CCD$_k$, given that the $M^P_k$ generically do not (all) commute? To address this question, we introduce a discrete, coarse-grained time parameter $\tau \geq (N_R + 1)\Delta t$ during which the $M^R_j$ and one or more mutually-commuting $M^P_k$ are executed. We then define an operation $G_{ij}: \mathrm{CCD}_i \rightarrow \mathrm{CCD}_j$ that transitions between CCD$_i$ and CCD$_j$ in one unit of $\tau$. Physically, this operation $G_{ij}$ can be viewed as a (formal specification of a) discrete sample of the action of the propagator $\mathcal{P}(t)$ appearing in Equation (4).
As $O$ can choose to make pointer-state measurements in any order, the operation $G_{ij}$ exists for any $i$ and $j$. In particular, $G_{ji}$ exists whenever $G_{ij}$ exists. Sequential transitions between CCDs can be represented as compositions, e.g., $G_{ij} \bullet G_{jk}$ represents the sequential transition $\mathrm{CCD}_i \rightarrow \mathrm{CCD}_j \rightarrow \mathrm{CCD}_k$, i.e., a sequence of measurements deploying $\sum_l M^R_l + M^P_i$, then $\sum_l M^R_l + M^P_j$, then $\sum_l M^R_l + M^P_k$, where again we allow $M^P_i$ to also stand for $\{M^P\}_i$ for simplicity. We explicitly assume that $\bullet$ is independent of $\tau$.
Hence, the elements $\{G_{ij}\}$ with associative composition $\bullet$ define a groupoid [47,48], not a group: $G_{ij} \bullet G_{kl}$ is defined only when $j = k$, and each $G_{ij}$ has the inverse $G_{ji}$. Elements of this groupoid, which we denote $\mathcal{G}$, are labeled by the parameter $\tau$, which distinguishes groupoid actions at distinct times. A sequence of such actions is illustrated in Figure 4.

Figure 4.
A sequence of CCDs identifying $R$ (blue triangles) and measuring pointer components $P_i, P_j, P_k, \ldots, P_l$. Transitions between CCDs are implemented by groupoid elements, e.g., $G_{ij}$, and labeled by discrete times, e.g., $\tau_i$. The operators $M^P_k$ can equally well be generalized to subsets $\{M^P\}_k$ of mutually-commuting pointer-state observables.
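The composition structure of the $G_{ij}$ is that of the pair groupoid on the set of CCDs: one arrow for every ordered pair $(i, j)$, with composition defined only when endpoints match. The sketch below models only this structure (it ignores the $\tau$ labels and the physical content of each CCD) and uses hypothetical names; `compose(g, h)` follows the text's diagrammatic order, $g \bullet h$ meaning "first $g$, then $h$".

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Transition:
    """G_ij: a transition CCD_i -> CCD_j (G_ii is the identity on CCD_i)."""
    src: int
    dst: int


def compose(g: Transition, h: Transition) -> Transition:
    """g • h: first g, then h; defined only if g ends where h starts."""
    if g.dst != h.src:
        raise ValueError(f"G_{g.src}{g.dst} • G_{h.src}{h.dst} is undefined")
    return Transition(g.src, h.dst)


def inverse(g: Transition) -> Transition:
    return Transition(g.dst, g.src)


def check_groupoid(n: int) -> bool:
    """Verify associativity, identities and inverses on all transitions among n CCDs."""
    elems = [Transition(i, j) for i, j in product(range(n), repeat=2)]
    for f, g, h in product(elems, repeat=3):
        if f.dst == g.src and g.dst == h.src:
            if compose(compose(f, g), h) != compose(f, compose(g, h)):
                return False
    for g in elems:
        if compose(Transition(g.src, g.src), g) != g or compose(g, Transition(g.dst, g.dst)) != g:
            return False
        if compose(g, inverse(g)) != Transition(g.src, g.src):
            return False
    return True
```

The `ValueError` raised for mismatched endpoints is exactly what distinguishes this structure from a group, in which every pair of elements can be composed.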
Clearly, a groupoid (indeed, a permutation group) can also be defined within each CCD$_k$. The elements of these groupoids $\mathcal{G}_k$ are operators $G^k_{lm}$ that exchange the $l$th and $m$th elements of the subset $\{M^R_j\} \cup \{M^P\}_k$ of mutually-commuting measurement operators. These $\mathcal{G}_k$ are exchange-symmetry groupoids over the elements of their respective subsets $\{M^R_j\} \cup \{M^P\}_k$. Actions by elements of $\mathcal{G}$ break this exchange symmetry, resulting in decoherence as outlined below.

Sequential Measurements Induce Decoherence
Let us denote the unmeasured components of $P$ at time $\tau_i$ as $\bar{P}_i$, so $P = P_i \otimes \bar{P}_i$ at $\tau_i$. Again, $P_i$ can also indicate those components of $P$ measured by an $i$th subset of mutually-commuting observables.
With this notation, at $\tau_i$, $W$ comprises a measured system $R \otimes P_i$ and an unmeasured system $\bar{P}_i \otimes E$, i.e., we have a decomposition $W = (R \otimes P_i) \otimes (\bar{P}_i \otimes E)$. The state $|R \otimes P_i\rangle$ is measurable (indeed, it is the pointer state of interest), so $|W\rangle$ must be separable as $|W\rangle = |R \otimes P_i\rangle|\bar{P}_i \otimes E\rangle$.
Sequential measurements at $\tau_i$ and $\tau_j$ swap entanglement of components of $P$ between $R$ and $E$, i.e., the groupoid elements $G_{ij}$ implement entanglement swaps:

$$G_{ij}: |R \otimes P_i\rangle|\bar{P}_i \otimes E\rangle \mapsto |R \otimes P_j\rangle|\bar{P}_j \otimes E\rangle \qquad (9)$$

Such entanglement swaps implement decoherence [27]. Hence, we can view the groupoid elements $G_{ij}$ as decoherence operators. These operators are implemented by $H_O$, i.e., the observer $O$ decoheres each measured state $|R \otimes P_i\rangle$ by selecting it for measurement and coupling it to its own specific decohering environment $\bar{P}_i \otimes E$. The functional asymmetry between the operator subsets $\{M^R_j\}$, $\{M^P_k\}$, and $\{M^E_l\}$ is evident in this representation, as is the asymmetry between the "selected" pointer operators $M^P_i$ and the "unselected" operators $M^P_j$, $j \neq i$. The "general" environment $E$ serves as a resource for decoherence that is accessed by $O$ at each measurement.
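The link between entanglement with an environment and decoherence can be illustrated with the standard textbook calculation, which is simpler than the entanglement swaps implemented by the $G_{ij}$. The sketch assumes a system qubit prepared in $(|0\rangle + |1\rangle)/\sqrt{2}$ and a single environment qubit that records the system state with overlap $\langle e_0|e_1\rangle = \cos\theta$; the off-diagonal element of the system's reduced density matrix is then $\tfrac{1}{2}\cos\theta$, vanishing when the environment states are orthogonal. The function name is hypothetical.

```python
import math


def reduced_coherence(theta: float) -> float:
    """|rho_01| of a system qubit after it becomes correlated with an environment qubit.

    The system starts in (|0> + |1>)/sqrt(2); the coupling maps |s>|e_0> to |s>|e_s>,
    with real environment states e_0 = (1, 0) and e_1 = (cos theta, sin theta).
    """
    env = {0: (1.0, 0.0), 1: (math.cos(theta), math.sin(theta))}
    amp = 1 / math.sqrt(2)
    # Joint amplitudes psi[s][e] after the coupling.
    psi = {s: [amp * env[s][e] for e in (0, 1)] for s in (0, 1)}
    # Partial trace over the environment: rho_01 = sum_e psi[0][e] * conj(psi[1][e]);
    # all amplitudes are real here, so conjugation is omitted.
    rho_01 = sum(psi[0][e] * psi[1][e] for e in (0, 1))
    return abs(rho_01)
```

At $\theta = 0$ the environment learns nothing and full coherence $\tfrac{1}{2}$ survives; at $\theta = \pi/2$ the environment perfectly records the system state and the coherence vanishes.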

Entanglement Swapping Induces Contextuality
The Kochen-Specker theorem [23] shows that quantum theory generically admits contextuality, i.e., the outcome probability distribution obtained for some (subset of) observable(s) can depend on what other observables are simultaneously measured, even when all simultaneously-measured observables mutually commute. Dzhafarov and colleagues have proposed an extension of classical probability theory, termed "contextuality by default" (CbD), in which any measurement system is prescribed in terms of "bunches" of random variables coupled through degrees of connectedness, leading to a context label that distinguishes between "true contextuality" on the one hand and "non-contextual description" on the other [28,29]. The latter typifies the habitual imperfection of empirical systems subject to direct influences, for example, classical signaling between collections of degrees of freedom. "True" contexts distinguish sets of measurements with non-direct influences that can be unpredictable, or indeed empirically unfathomable. Further, it has been shown both that this extension is sufficient to capture Kochen-Specker contextuality [30] and that such contextuality can be observed in human decision making [49] (cf. [50] for an analysis of experiments that only appear to demonstrate contextuality). Effectively, all measured probabilities are expressed as conditioned on some context label c, i.e., Prob(x) becomes Prob(x|c) for any event x measured in c. The context labels in CbD can be viewed simply as a bookkeeping device; there is no formal requirement that the contexts be fully characterized by the available observations. The unmeasured systems $\bar{P}_i \otimes E$ provide natural context labels for comparing outcome probability distributions over the measured systems $R \otimes P_i$. As the outcomes specifying $|R\rangle$ must remain fixed to permit system identification, the outcome probability distributions of interest are those of the $|P_i\rangle$.
If each of the $P_i$ is a one-dimensional component of $P$, then no observables are measured in multiple contexts and no contextuality is observable. If, however, the $P_i$ are multidimensional components of $P$, each measured using a subset of mutually-commuting observables $\{M^P\}_i$, contextuality can be expected by default whenever $\{M^P\}_i \cap \{M^P\}_j \neq \emptyset$ for some subsets $i$ and $j$, i.e., whenever some observable is measured in more than one context. The mechanism inducing contextuality is the entanglement swap implemented by the $G_{ij}$ in Equation (9). The special case of noncontextuality only occurs if (in the sense of approximation) $|\bar{P}_i \otimes E\rangle \approx |\bar{P}_j \otimes E\rangle$. This can occur if and only if $[M^P_i, M^P_j] \approx 0$ for all $M^P_i \in \{M^P\}_i$ and $M^P_j \in \{M^P\}_j$ [30]. It can occur, in other words, only if $G_{ij}$ does not (significantly) break the exchange symmetry within each subset of mutually-commuting operators.
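For the simplest multi-context case, a cyclic system of rank $n$ in which each $\pm 1$ observable appears in exactly two contexts, CbD gives an explicit test: the system is noncontextual if and only if $s_{\mathrm{odd}}$ of the $n$ within-context correlations is at most $n - 2 + \Delta$, where $\Delta$ sums the differences between each observable's expectations in its two contexts. The sketch applies this to Alice and Bob's Bell/EPR setup, assuming singlet correlations $-\cos(x - y)$ at the standard CHSH angles; the settings and function names are illustrative.

```python
import math
from itertools import product


def s_odd(values: list[float]) -> float:
    """Max of sum(sign_i * x_i) over sign vectors with an odd number of minus signs."""
    best = -math.inf
    for signs in product((1, -1), repeat=len(values)):
        if signs.count(-1) % 2 == 1:
            best = max(best, sum(s * x for s, x in zip(signs, values)))
    return best


def is_contextual_cyclic(correlations: list[float], marginals: list[tuple[float, float]]) -> bool:
    """CbD criterion for a cyclic system of rank n of +/-1 variables.

    correlations[i]: product expectation of the two observables in context i.
    marginals[i]: expectations of observable i in its two contexts.
    """
    n = len(correlations)
    delta = sum(abs(first - second) for first, second in marginals)
    return s_odd(correlations) > n - 2 + delta + 1e-12  # small tolerance for rounding


def singlet(x: float, y: float) -> float:
    """Spin-singlet correlation for measurement angles x and y."""
    return -math.cos(x - y)


# Cyclic order A, B, A', B' with contexts (A,B), (B,A'), (A',B'), (B',A).
a, b, a2, b2 = 0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4
chsh = [singlet(a, b), singlet(b, a2), singlet(a2, b2), singlet(b2, a)]
no_signalling = [(0.0, 0.0)] * 4  # singlet marginals vanish in every context
```

The quantum value $2\sqrt{2}$ exceeds the bound $n - 2 + \Delta = 2$, so this system is contextual; a deterministic assignment with every correlation equal to $+1$ reaches exactly 2 and is not.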

CCD Commutativity Enforces Bayesian Coherence
As mentioned above, it is now commonplace to consider observers to be Bayesian agents. A Bayesian agent is a system that implements Bayes' theorem for conditional probabilities, Prob(a|b) = Prob(b|a)(Prob(a)/Prob(b)), where a and b are any two events or conditions and Prob(b) ≠ 0. In the CbD framework, Bayes' theorem only applies within a context. If probabilities are computed using complex amplitudes for states and the Born rule, contextuality is taken into account via phase interference, and Bayes' theorem can be applied across contexts [38].
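Within a single context, Bayes' theorem and the sequential updating in which a posterior serves as the next prior can be checked with exact arithmetic. The hypotheses ("calibrated" versus "miscalibrated" apparatus) and likelihoods below are hypothetical, and the two observations are assumed conditionally independent so that sequential and batch updating must agree.

```python
from fractions import Fraction


def bayes_update(prior: dict[str, Fraction], likelihood: dict[str, Fraction]) -> dict[str, Fraction]:
    """Prob(h|d) = Prob(d|h) Prob(h) / Prob(d) for every hypothesis h in one context."""
    evidence = sum(likelihood[h] * p for h, p in prior.items())  # Prob(d)
    if evidence == 0:
        raise ZeroDivisionError("Prob(d) = 0, so Prob(h|d) is undefined")
    return {h: likelihood[h] * p / evidence for h, p in prior.items()}


prior = {"calibrated": Fraction(1, 2), "miscalibrated": Fraction(1, 2)}
d1 = {"calibrated": Fraction(9, 10), "miscalibrated": Fraction(1, 2)}
d2 = {"calibrated": Fraction(4, 5), "miscalibrated": Fraction(1, 4)}

sequential = bayes_update(bayes_update(prior, d1), d2)  # posterior after d1 is the prior for d2
batch = bayes_update(prior, {h: d1[h] * d2[h] for h in prior})
```

Using `Fraction` avoids rounding, so the equality of the sequential and batch posteriors (both give 144/169 for "calibrated") holds exactly rather than approximately.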
While Bayes' theorem follows from the usual Kolmogorov axioms, following de Finetti [51], it is generally motivated by the rationality of avoiding "Dutch book" probability assignments that violate the Kolmogorov axioms. Assigning probabilities, including conditional probabilities, only in ways that satisfy the Kolmogorov axioms achieves "Bayesian coherence." Here, we show that if the CCD formalism is extended to include probability labels on observational outcomes, the requirement of commutativity enforces Bayesian coherence and hence compliance with the Kolmogorov axioms.
To motivate this, let us represent a single binary measurement somewhat redundantly as implemented by two classifiers, $A^{(1)}$ and $A^{(0)}$, that test for outcomes "1" and "0", respectively. The probabilities of these outcomes are Prob(1|R) and Prob(0|R), respectively; we can treat these probabilities as labels as shown in Figure 5. Here, the horizontal arrow is an infomorphism encoding the classical correlation between $A^{(1)}$ and $A^{(0)}$. Commutativity requires that all paths through the diagram yield the same result; it is natural to extend this requirement to the probability labels by requiring: (1) that only the shortest paths between objects in a diagram (such as a CCD) are labeled, and that the probability of such a path is the product of the probabilities of its component arrows; and (2) that the probabilities of all paths sum to unity. Additivity is thus ensured by commutativity, and conjunction is subsumed by conditionals when defined as in Definition 3 below.
Assigning a probability $1/n$ to each horizontal arrow (including right-to-left inverse arrows, even if implicit) between pointer-component classifiers in a diagram with $n$ such classifiers, we have in the case of Figure 5 that Prob(1|R) = 1 − Prob(0|R). Collectively, then, the diagram exhibits Bayesian coherence; in particular, the posterior of a prior at one stage can be regarded as the prior for the next [45]. This procedure clearly generalizes to any CCD with $n$ binary pointer-state classifiers (i.e., binary-valued measurement operators) and hence $2^n$ distinct sets of binary-valued observational outcomes. As the state $|R\rangle$ must remain fixed, probabilities are not assigned within the $R$-identifying cocones of such CCDs.

The idea of "assigning" probabilities to arrows in CCDs over classifiers can be made precise by interpreting classifiers themselves as probabilistic. Following [52] (cf. [53]), we define: […]

The sequent relation can be weakened by requiring only that if $x \models_A M$, there is some probability Prob(N|M) that $x \models_A N$. This is essentially how a conditional probability interprets the logical implication "⇒" [54] (see [26] for details). If sequents conditioned on $R$ are defined within the colimit classifier $R \otimes P_k$ of a CCD$_k$ over a set of pointer-state operators $\{M^P\}_k$ and then relaxed to probabilities, these probabilities can be migrated downward as labels on the infomorphisms from the elements of $\{M^P\}_k$ to $R \otimes P_k$. As these probabilities are defined within CCD$_k$, they are CbD compliant with context label $k$.
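The weakening of a sequent to a conditional probability can be sketched over a finite classifier. This sketch assumes a frequency reading with every token weighted equally, which is one possible choice rather than the construction of [52]; under it, the strict sequent $M \vdash N$ corresponds to Prob(N|M) = 1. Token and type names are hypothetical.

```python
from fractions import Fraction


def conditional(relation: set[tuple[str, str]], tokens: list[str], m: str, n: str) -> Fraction:
    """Prob(N|M) as the fraction of M-tokens that are also N-tokens (uniform token weights)."""
    of_type_m = [x for x in tokens if (x, m) in relation]
    if not of_type_m:
        raise ValueError("no token is of type M, so Prob(N|M) is undefined")
    both = [x for x in of_type_m if (x, n) in relation]
    return Fraction(len(both), len(of_type_m))


def sequent_holds(relation: set[tuple[str, str]], tokens: list[str], m: str, n: str) -> bool:
    """Strict sequent M |- N: every token of type M is also of type N."""
    return conditional(relation, tokens, m, n) == 1


relation = {("x1", "M"), ("x1", "N"), ("x2", "M"), ("x2", "N"), ("x3", "M"), ("x4", "N")}
tokens = ["x1", "x2", "x3", "x4"]
```

Here token `x3` is of type M but not N, so the strict sequent fails while the weakened relation still assigns Prob(N|M) = 2/3; removing `x3` restores the sequent.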

Information Processing Demands Are Asymmetrical between R, P and E
The formal treatment above admits, via the explicitly thermodynamic representation of $H_{OW}$ in Equation (2), a straightforward physical interpretation in terms of information and resource flows. The environment E is represented above as a passive resource for decoherence. The thermodynamic role of E, however, is that of free-energy source and waste-heat sink. The conversion of free energy to waste heat ("metabolism") funds the thermodynamically-irreversible state changes of O that are interpretable as "recording" observational outcomes (the "informative" information about W that O is using $H_{OW}$ to discover) on some "memory" implemented by $H_O$. By including the interaction with E as an explicit component of the "measurement" interaction between O and W, Equation (2) avoids the usual "open system" assumption that allows the effects of free-energy extraction and waste-heat dissipation on the measurement process to be neglected. It thus makes explicit the asymmetry between the effects on O of "informative" information flows and those of "mere" resource flows. In a simple system with a state trajectory describable as a Markov process, the memory on which observational outcomes are recorded persists for no more than one time step $\Delta t$. However, for "interesting" observers capable of performing multiple observations of the same identified system, memories must persist for multiple time steps. The identification criteria for R, in particular, must persist for multiple time steps in any observer capable of re-identifying R across multiple cycles of measurement. Persistence raises the question of implementation. In a classical system, a memory is persistent only if it has a time-invariant classical record. In a quantum system, however, the classical record of a memory can be erased or modified (i.e., erased and replaced), but it may leave an implicit record of induced phase coherence relations [55].
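For orientation, the free-energy cost of recording outcomes can be estimated from the per-bit factor $\beta_k k_B T_k$ of Equation (2), with $\beta_k \geq \ln 2$ (the Landauer bound). The Python sketch below assumes a temperature of 300 K and, for the second call, a hypothetical $\beta = 10$ and $N = 1000$ bits per time step $\Delta t$; these numbers are illustrative only and do not come from the text.

```python
from math import log

K_B = 1.380649e-23  # Boltzmann's constant, J/K (exact SI value)


def free_energy_per_step(n_bits, beta, temperature):
    """Free energy (J) needed to record n_bits in one time step at per-bit factor beta."""
    if beta < log(2):
        raise ValueError("beta below ln 2 would violate the Landauer bound")
    return n_bits * beta * K_B * temperature


landauer = free_energy_per_step(1, log(2), 300.0)
print(f"{landauer:.3e} J per bit at the Landauer limit")            # 2.871e-21 J
print(f"{free_energy_per_step(1000, 10.0, 300.0):.3e} J per step")  # 4.142e-17 J
```

Any memory that must persist over many steps pays a cost of this kind repeatedly, which is why persistence is not thermodynamically free.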
Such implicit records enable non-Markovian behavior, e.g., Leggett-Garg violations. In the present context, we consider memory of previously-obtained outcomes to be a function of $H_O$, with one effect of memory being the choice of which measurements to make next, e.g., the choice of the functions $n^k_i(t)$ in Equation (5). Maintaining memories against thermal noise (classical systems) or decoherence (quantum systems) requires the expenditure of free energy. The effective thermodynamic efficiencies $N_R \beta_R$ and $N_P \beta_P$ of processing measurements of R and P, respectively, must therefore incorporate the free-energy demands of maintaining the functional integrity of the virtual-machine architecture described in the last section, however it is implemented by $H_O$. Hence, we can generically expect $\beta_R, \beta_P \gg 1$, with magnitudes scaling roughly with $N_R$ and $N_P$, respectively, and with the number of elements of G, i.e., with the complexity of the commutativity constraints between the $M^P_k$. Because the time-varying outcomes generated by the $M^P_k$ can be expected to have a larger influence on what is measured next than the time-invariant outcomes generated by the $M^R_j$, we can also generically expect $\beta_P > \beta_R$. What is "of interest" about an external system S naturally requires more free energy to process, remember, and act upon than what is not [46].
The acquisition of free energy from E must, on the other hand, be relatively efficient if the free energy obtained is to fund information processing (i.e., internal "work") as well as the "metabolic" processes that convert it to waste heat, i.e., actions back on E (external "work"). Hence, we must generically have $N_E \beta_E < N_R \beta_R, N_P \beta_P$. If we make the reasonable assumption that information processing is somehow compartmentalized in O, e.g., to provide isolation from noise and/or decoherence, then a large, uniform heat bath will be less efficient as a power source than a smaller, higher-temperature bath local to the processing compartment. We can, therefore, expect that a typical O will allocate a component of $H_{OE}$ to high-efficiency free-energy acquisition, i.e., that $H_{OE}$ will typically comprise a high-efficiency component $H^h_{OE}$ and a low-efficiency component $H^l_{OE}$, where superscripts h and l indicate high and low efficiency, respectively. Both organisms and apparatus, including all practical computers, employ this compartmentalization strategy ubiquitously.
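The qualitative constraints of this section, $N_E \beta_E < N_R \beta_R, N_P \beta_P$ together with $\beta_R, \beta_P \gg 1$ and $\beta_P > \beta_R$, can be stated as a simple check. The sketch assumes hypothetical bit counts and efficiency factors chosen only to satisfy or violate the constraints, and it compares the dimensionless products $N\beta$ directly, as the inequalities do; the threshold used for "≫ 1" (a factor of 10) is likewise an arbitrary assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Bits per time step N and per-bit efficiency factor beta for one component of H_OW."""
    n_bits: int
    beta: float

    @property
    def cost(self):
        # Dimensionless product N * beta, as compared in the inequalities.
        return self.n_bits * self.beta


def asymmetry_holds(r, p, e, much_greater=10.0):
    """Check N_E*beta_E < N_R*beta_R, N_P*beta_P; beta_R, beta_P >> 1; beta_P > beta_R."""
    return (
        e.cost < r.cost
        and e.cost < p.cost
        and min(r.beta, p.beta) >= much_greater
        and p.beta > r.beta
    )


# Hypothetical numbers for a compartmentalized observer.
R = Channel(n_bits=64, beta=20.0)
P = Channel(n_bits=16, beta=100.0)
E = Channel(n_bits=200, beta=1.0)
print(asymmetry_holds(R, P, E))  # True
```

Raising the free-energy channel's cost above either processing channel, or swapping the roles of R and P, makes the check fail.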

Thermodynamic Interactions with E Generically Disturb $|P\rangle$
It is standard to assume that the environment E is "large" even though a large environment is not strictly needed for decoherence [2,10]. One motivation for a large environment is to assure that any classical disturbances to E during the course of observation are well away from the system S of interest and hence negligible.
The assumption of negligible disturbance breaks down, however, when both the thermodynamics of measurement and the mechanism of observer-relative decoherence are taken into account. As shown above, the free energy required per bit to process outcomes obtained by the $M^R_j$ and $M^P_k$ is large compared to $k_B T_O$ and hence, for typical, near-isothermal measurement interactions, large compared to $k_B T_E$, at least in the near vicinity of O. The action in Equation (4) of O on W, and hence on E, is therefore large compared to $\hbar$. Disturbances to E at the scale of $\hbar$ are insignificant classically, but can be significant to entanglement swaps involving E, as in Equation (9). As the swap in Equation (9) is executed every time a new subset of mutually-commuting $M^P_k$ is deployed, these disturbances are not incidental to, but rather are direct consequences of, the measurement process. Hence, the action in Equation (4) cannot be considered negligible, but rather must be assumed generically to disturb $|P\rangle$. As shown in [17], the scale of this disturbance increases as the measurement resolution, and hence the input bandwidth $N_P$, increases.
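An order-of-magnitude estimate shows why the action of O on E dwarfs $\hbar$. The sketch assumes that the action of one measurement step is roughly the free energy expended, $N \beta k_B T$, multiplied by $\Delta t$; the values $T = 300$ K, $\Delta t = 1$ ms, $\beta = 10$, and $N = 1000$ are hypothetical.

```python
K_B = 1.380649e-23      # Boltzmann's constant, J/K
HBAR = 1.054571817e-34  # reduced Planck constant, J s


def action_in_hbar_units(n_bits, beta, temperature, dt):
    """Rough action (energy x time) of one measurement step, in units of hbar."""
    return n_bits * beta * K_B * temperature * dt / HBAR


ratio = action_in_hbar_units(n_bits=1000, beta=10.0, temperature=300.0, dt=1e-3)
print(f"{ratio:.2e}")  # 3.93e+14
```

Even a disturbance that is a vanishingly small fraction of this action remains far above the $\hbar$ scale relevant to entanglement swaps.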

Discussion
We argue here that the physical interactions that implement observations appear to lose information for a simple and somewhat pedestrian reason: $H_{OW}$ appears not to conserve information because only a (typically small) fraction of the information transferred is considered "informative" by the observer. The rest is, of necessity, employed as free energy to fund the processing and memory storage of the "informative" fraction. In typical discussions of measurement, even the information required to identify R is ignored; only the pointer state is regarded as "of interest" and hence informative, while the rest of the transferred information is relegated to noise or decoherence (e.g., [56], where this is made fully explicit).
When the measurement interaction $H_{OW}$ is considered fully explicitly, two distinct symmetries are broken. Not only must input bits to be processed as information be distinguished from input bits to be processed as free-energy supply, but input bits indicating the time-varying pointer state $|P\rangle$ of S must also be distinguished from input bits identifying the reference component of S, the state $|R\rangle$ of which must remain time-invariant. These distinctions in how input bits are processed are reflected by thermodynamic asymmetries, i.e., by the requirement that $N_E \beta_E < N_R \beta_R, N_P \beta_P$ and the generic expectation that $\beta_P > \beta_R$.
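The requirement that R-identifying bits stay fixed while pointer bits vary can be illustrated with a toy read-out routine. The sketch assumes that each measurement cycle delivers a fixed-length tuple of informative bits (free-energy bits from E are omitted), that a hypothetical subset of positions carries the stored identification criteria for R, and that pointer bits are attributed to S only in cycles in which those criteria are met. The function name read_pointer and all bit values are hypothetical.

```python
from typing import Dict, Optional, Sequence, Tuple


def read_pointer(record: Sequence[int], r_criteria: Dict[int, int]) -> Optional[Tuple[int, ...]]:
    """Return the pointer bits of one cycle if R is re-identified, otherwise None."""
    if any(record[i] != value for i, value in r_criteria.items()):
        return None  # R not re-identified: these bits cannot be attributed to S
    # Bits outside the R criteria are free to vary from cycle to cycle.
    return tuple(bit for i, bit in enumerate(record) if i not in r_criteria)


# Hypothetical stored criteria: positions 0-2 identify R and must not change.
R_CRITERIA = {0: 1, 1: 0, 2: 1}
cycles = [
    (1, 0, 1, 0, 1),
    (1, 0, 1, 1, 0),
    (0, 0, 1, 1, 1),  # position 0 differs, so R is not re-identified
]
print([read_pointer(c, R_CRITERIA) for c in cycles])  # [(0, 1), (1, 0), None]
```

The stored criteria play the role of the persistent memory for R; only the remaining bits are treated as the time-varying pointer state.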
The fundamental asymmetry between R and P is reflected in the architecture of the minimal virtual machine that must be implemented by any observer capable of identifying a system and making a sequence of pointer-state measurements. Simultaneous (in coarse-grained time) measurements of mutually-commuting subsets of observables, including in every instance the $M^R_j$, are processed by a CCD. Switching between mutually-commuting subsets $i$ and $j$ of observables is implemented by a groupoid operator $G_{ij}$. These operators execute entanglement swaps or, equivalently, context swaps. In either picture, they induce decoherence and generically disturb $|P\rangle$.
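The context structure just described can be illustrated with a small numerical toy. The Python sketch below (requiring NumPy) assumes a hypothetical two-qubit system in which qubit 0 carries a single reference observable $M^R$ and qubit 1 carries two non-commuting pointer observables, each written as a binary-valued Hermitian operator $(I + \sigma)/2$ with eigenvalues 0 and 1. It enumerates the maximal mutually-commuting subsets (contexts), each of which always contains $M^R$, and models the $G_{ij}$ as morphisms of the pair groupoid on contexts (one morphism per ordered pair, with composition and inverses). That choice of groupoid is an assumption made for illustration; it captures only the bookkeeping of context switching, not the entanglement swaps the $G_{ij}$ physically implement.

```python
from dataclasses import dataclass
from itertools import combinations

import numpy as np

I2 = np.eye(2)
PAULI_Z = np.diag([1.0, -1.0])
PAULI_X = np.array([[0.0, 1.0], [1.0, 0.0]])


def binary_projector(pauli):
    """Hermitian operator (I + P)/2 with eigenvalues 0 and 1: a yes/no question."""
    return (np.eye(pauli.shape[0]) + pauli) / 2


def commute(a, b):
    return np.allclose(a @ b, b @ a)


# Hypothetical toy: qubit 0 carries the reference R, qubit 1 the pointer P.
M_R = {"R_z": np.kron(binary_projector(PAULI_Z), I2)}
M_P = {
    "P_z": np.kron(I2, binary_projector(PAULI_Z)),
    "P_x": np.kron(I2, binary_projector(PAULI_X)),
}


def contexts(m_r, m_p):
    """Maximal mutually-commuting subsets of m_p, each extended by every operator in m_r."""
    ops = {**m_r, **m_p}
    maximal = []
    for size in range(len(m_p), 0, -1):
        for subset in combinations(m_p, size):
            labels = list(m_r) + list(subset)
            if not all(commute(ops[a], ops[b]) for a, b in combinations(labels, 2)):
                continue
            if not any(set(subset) <= c for c in maximal):
                maximal.append(frozenset(subset))
    return [frozenset(m_r) | c for c in maximal]


@dataclass(frozen=True)
class ContextSwap:
    """Morphism G_ij from context i to context j in the pair groupoid on contexts."""
    src: int
    dst: int

    def then(self, nxt):
        """Composite 'first self, then nxt': G_ij followed by G_jk gives G_ik."""
        if self.dst != nxt.src:
            raise ValueError("morphisms are not composable")
        return ContextSwap(self.src, nxt.dst)

    def inverse(self):
        return ContextSwap(self.dst, self.src)


ctx = contexts(M_R, M_P)
print([sorted(c) for c in ctx])  # [['P_z', 'R_z'], ['P_x', 'R_z']]
g01 = ContextSwap(0, 1)
print(g01.then(g01.inverse()))  # ContextSwap(src=0, dst=0), the identity on context 0
```

Every context contains the reference operator, while the two pointer observables can never share a context; moving between them requires a $G_{ij}$.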
We do not explicitly consider the operations required to prepare a system S for measurement. We note, however, that preparing S requires identifying S and hence identifying R, and that "measurement settings" are components of P, as illustrated in Figure 2. Preparation requires, in this case, the same operations as measurement; indeed, the two can be considered duals [57]. The dual of a cocone is a cone; combining the two with a single network of classifiers yields a cone-cocone diagram (CCCD) depicting information flow into, through, and then out of the classifier network [26]. It is natural to interpret the limit (of the cone) in a CCCD as a complete prepared state (including R) and the colimit (of the cocone) as a complete measured state, as in Section 3.2. Mutually non-commuting subsets of preparation procedures induce groupoid operations on the cone analogous to the $G_{ij}$.
We emphasize that these results depend only on standard quantum theory, with no assumptions about the structures or properties of O and W beyond the Hilbert-space representation and the assumptions of separability and finiteness specified in the Introduction. They thus apply to any physical system, and depend quantitatively only on the number of degrees of freedom interpretable as subserving memory functions in some form.
We expect the groupoid concept to arise in a similar formalism to the one presented here when equivalence relations are used to classify various types of entanglement [58]. In addition, it has not escaped our notice that CCCDs bear certain structural similarities to another representation of information flow from preparation to measurement, the "amplituhedron" of [59]. This and a deeper understanding of the physical meaning of G await further work.
Author Contributions: Conceptualization, C.F. and J.F.G.; writing-original draft preparation, C.F. and J.F.G.; and writing-review and editing, C.F. and J.F.G. All authors have read and agreed to the published version of the manuscript.

Funding:
The work of C.F. was supported by the Federico and Elvia Faggin Foundation.

Conflicts of Interest:
The authors declare no conflict of interest.