Quantum Relative Entropy of Tagging and Thermodynamics

Thermodynamics establishes a relation between the work that can be obtained in a transformation of a physical system and its relative entropy with respect to the equilibrium state. It also describes how the bits of an informational reservoir can be traded for work using Heat Engines. Therefore, an indirect relation between the relative entropy and the informational bits is implied. From a different perspective, we define procedures to store information about the state of a physical system into a sequence of tagging qubits. Our labeling operations provide reversible ways of trading the relative entropy gained from the observation of a physical system for adequately initialized qubits, which are used to hold that information. After taking into account all the qubits involved, we reproduce the relations mentioned above between relative entropies of physical systems and the bits of information reservoirs. Some of these relations hold only for a restricted class of coding bases, because quantum states do not necessarily commute. However, we prove that it is always possible to find a basis (equivalent to the total angular momentum basis) for which Thermodynamics and our labeling system yield the same relation.

Physical systems in a state $\rho$ out of thermal equilibrium also allow the production of work. The obtainable work turns out to be related to the relative entropy $S(\rho\|\tau) := \mathrm{Tr}\{\rho\log\rho - \rho\log\tau\}$ with respect to the equilibrium state $\tau$, again an informational quantity (in this paper, $\log(x)$ always denotes the binary logarithm of $x$). Appendix B contains a short derivation of this result. Some recent reviews compile a variety of properties and functional descriptions of relative entropy [39][40][41][42]. The one most closely connected to this paper is probably its interpretation as the average number of extra bits that are spent when a code optimized for one probability distribution of words is used for another. This paper contributes a new procedure that also reveals a direct connection between the relative entropy of physical systems and information reservoirs, circumventing Thermodynamics. It focuses on the quantum case, particularly when the relative entropy is defined for non-commuting density matrices.
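The definition of $S(\rho\|\tau)$ can be evaluated numerically for small systems. The following Python sketch assumes NumPy is available. The function names are our own. It computes the von Neumann and relative entropies in bits through eigendecompositions, uses the convention $0\log 0 = 0$, and requires the second argument to be full rank, as any Gibbs state at finite temperature is. The example states are arbitrary illustrations: one commutes with $\sigma$ and the other does not.

```python
import numpy as np

def von_neumann_entropy(rho, tol=1e-12):
    """S(rho) = -Tr{rho log2 rho} in bits, with the convention 0 log 0 = 0."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > tol]
    return float(-np.sum(p * np.log2(p)))

def relative_entropy(rho, sigma):
    """S(rho||sigma) = Tr{rho log2 rho - rho log2 sigma} in bits.
    sigma must be full rank (any Gibbs state at finite T is)."""
    vals, vecs = np.linalg.eigh(sigma)
    if np.any(vals <= 0):
        raise ValueError("sigma must be full rank")
    log_sigma = vecs @ np.diag(np.log2(vals)) @ vecs.conj().T
    cross = -np.trace(rho @ log_sigma).real          # -Tr{rho log2 sigma}
    return cross - von_neumann_entropy(rho)

sigma = np.diag([0.8, 0.2])                          # sigma_up = 0.8 >= sigma_down = 0.2
rho_commuting = np.diag([0.6, 0.4])                  # diagonal in the same basis
rho_plus = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])  # |+><+|, does not commute with sigma
print(relative_entropy(rho_commuting, sigma))
print(relative_entropy(rho_plus, sigma))
```

For the commuting pair, the result reduces to the classical Kullback-Leibler divergence of the eigenvalue lists, $0.6\log(0.6/0.8) + 0.4\log(0.4/0.2) \approx 0.151$ bits.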
The generation of work in Information Heat Engines always requires the transfer of thermal energy from a heat reservoir and an adequate steering of a Hamiltonian. In Szilard Engines [14,43,44], both occur at the same time as the piston moves within the cylinder. In the one-particle case, every bit from an information reservoir enables the generation of $k_B(\ln 2)T$ of mechanical work. Other thermal machines, such as turbines, also involve tuning Hamiltonians and transferring heat. By combining these devices, it is possible to produce work by increasing the entropy of informational qubits and to use it to build up the relative entropy of a physical system with respect to its thermal equilibrium state. The net heat and work would vanish (see Figure 1). This consideration naturally raises the question of whether the same transformation could be achieved with informational manipulations only, without any reference to Hamiltonians, temperatures or heat baths. According to Thermodynamics, work can be reversibly obtained from a heat bath either by consuming $B$ bits of an information reservoir or by decreasing the relative entropy $S(\rho\|\tau)$ of a physical system with respect to the thermal state $\tau$.

Out of Equilibrium
In order to simplify the quantification of the resources involved, we exclusively consider unitary operations. In addition, we only allow observational interactions with the physical system. This restricts the set of transformations to those defined by controlled unitary gates in which the physical system only acts as the control. Essentially, we compute the informational cost of labeling a physical system as the number of tagging qubits in the pure state $|0\rangle$ at the input minus those recovered at the output. In the following, we use the terms "initialize" or "reset" to denote the action of driving a tagging qubit to the state $|0\rangle$.
The tagging operation uses a coding procedure to correlate the quantum state of a physical system with its label. Conversely, we also consider the process of deleting that correlation and returning the tagging qubits to an information reservoir. We require the code to be optimal in the sense that it uses the least average number of bits to identify the state of a physical system, the average being taken with respect to an underlying probability distribution of states. For this reason, we choose a Shannon coding technique: it is asymptotically optimal and provides a simple relation between the lengths and the probabilities of the codewords. We consider two degrees of tagging. Tight-labeling implies a reversible assignment of a label to every physical system; it is described in Section 2. Loose-labeling consists of tight-labeling a collection of physical systems followed by a random shuffling of the labels; it is studied in Section 3. The discussion is presented in Section 4 and the conclusions in Section 5. A simple example using magnetic spins is given in Appendix E to illustrate some of the ideas presented in the paper.

Tight Labeling
In this section, we present a method for writing a label, consisting of a set of qubits, that represents the quantum state of a physical system. We call it tight because each label is unambiguously assigned to the state of the physical system it is attached to.
We consider a sufficiently large set of identical physical systems that will be referred to as atoms. For simplicity, we assume that their Hilbert space is two-dimensional. The statistical distribution for each atom is determined by its quantum state, which is known to be $\sigma$. The eigenstates of $\sigma$ are denoted by $|{\uparrow}\rangle, |{\downarrow}\rangle$ and their eigenvalues are ordered as $\sigma_\downarrow \le \sigma_\uparrow$. Atoms are grouped in clusters of a common length $N$. Besides the atoms, we assume unlimited availability of tagging qubits in either the pure state $|0\rangle$ or a maximally mixed state (see Figure 2).

Figure 2. Our labeling procedures assume the availability of tagging qubits in either a $|0\rangle$ or a maximally mixed state, and of physical systems, referred to as atoms. The labeling assigns a set of $H$ tagging qubits to a cluster of $N$ atoms. The cost is defined as the number of tagging qubits in state $|0\rangle$ employed.
We consider a coding basis $B_C$ that diagonalizes $\sigma^{\otimes N}$. Its $2^N$ vectors are denoted by $|b_1\ldots b_N\rangle_{B_C}$, where each $b_i$ is either 0 or 1.
The operations considered are:

1. Any unitary transformation on a system $T$ of tagging qubits.
2. Unitary transformations on joint states of a cluster $C$ and a system $T$ of $K$ tagging qubits, $U: (C\otimes T)\to(C\otimes T)$, that are defined, using the $B_C$ basis for $C$ and the computational basis $B_T$ for $T$, by
$$U\,|b_1\ldots b_N\rangle_{B_C}\otimes|t_1\ldots t_K\rangle_{B_T} = |b_1\ldots b_N\rangle_{B_C}\otimes|(t_1\ldots t_K)\oplus f(b_1\ldots b_N)\rangle_{B_T}, \tag{1}$$
where $f(b_1\ldots b_N)$ is any function that transforms binary strings of $N$ bits into strings of $K$ bits, and $\oplus$ denotes bitwise addition modulo 2.

They can be considered as mere probing operations with respect to clusters. It is also easy to check that $U^2$ is the identity. The cost of any operation will be tallied as the number of tagging qubits in the pure state $|0\rangle$ that are input and not recovered at the output.
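Because the probing operation $U$ of item 2 maps basis states to basis states, it is a permutation matrix. It is therefore unitary exactly when that map is a bijection. The plain-Python sketch below uses hypothetical helper names. It builds the action of $U$ for an arbitrary function $f$ and checks both unitarity and $U^2 = \mathbb{1}$. The particular $f$, which takes the parity and the AND of two cluster bits, is only an illustration.

```python
from itertools import product

def controlled_xor_permutation(f, n, k):
    """Action of U|b>|t> = |b>|t XOR f(b)> on the 2**(n+k) basis states.
    n cluster bits (B_C basis) control k tagging qubits (computational basis);
    f maps an n-bit tuple to a k-bit tuple and can be any function."""
    perm = {}
    for b in product((0, 1), repeat=n):
        fb = f(b)
        for t in product((0, 1), repeat=k):
            t_out = tuple(ti ^ fi for ti, fi in zip(t, fb))
            perm[b + t] = b + t_out
    return perm

def is_unitary_permutation(perm):
    """A map between basis states defines a unitary iff it is a bijection."""
    return set(perm.values()) == set(perm.keys())

def f(b):
    """Hypothetical example: parity and AND of two cluster bits."""
    return (b[0] ^ b[1], b[0] & b[1])

U = controlled_xor_permutation(f, 2, 2)
print(is_unitary_permutation(U))        # -> True (bijective, hence unitary)
print(all(U[U[s]] == s for s in U))     # -> True (U applied twice is the identity)
print(U[(1, 1, 0, 0)])                  # -> (1, 1, 0, 1): tags in |00> end up holding f(b)
```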
We further choose a binary Shannon lossless code for the sequences $b_1\ldots b_N$, according to a frequency given by the eigenvalue $\sigma_{b_1\ldots b_N}$ of $\sigma^{\otimes N}$ that corresponds to $|b_1\ldots b_N\rangle_{B_C}$. The procedure defines a function $c(b_1\ldots b_N)$ that assigns a binary codeword to every string $b_1\ldots b_N$. Let $H$ be the maximum length of all the codewords. We define a label as an array of $H$ tagging qubits. The $2^H$ vectors of the computational basis $B_L$ of the label are denoted by $|\ell_1\ldots\ell_H\rangle_{B_L}$, where each $\ell_i$ is either 0 or 1.
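The Shannon code can be constructed explicitly. The sketch below uses one standard construction, which we assume is adequate for illustration: codeword lengths are $\lceil -\log p\rceil$, and the bits come from truncated cumulative probabilities. Other constructions with the same lengths would serve equally well. The sketch uses exact rational arithmetic and takes the hypothetical inputs $\sigma_\uparrow = 4/5$, $\sigma_\downarrow = 1/5$ and $N = 2$. It also forms the zero-padded codewords of length $H$.

```python
from fractions import Fraction
from itertools import product

def shannon_code(probs):
    """Binary Shannon code. Symbols are sorted by non-increasing probability; a symbol
    with probability p gets the first ceil(-log2 p) bits of the binary expansion of
    the cumulative probability of the symbols before it. Exact arithmetic (Fraction)."""
    code, cumulative = {}, Fraction(0)
    for s in sorted(probs, key=lambda sym: probs[sym], reverse=True):
        p = probs[s]
        length = 0
        while p * 2 ** length < 1:           # smallest length with 2**-length <= p
            length += 1
        bits, x = "", cumulative
        for _ in range(length):              # binary expansion of the cumulative sum
            x *= 2
            bit = int(x >= 1)
            bits += str(bit)
            x -= bit
        code[s] = bits
        cumulative += p
    return code

def product_eigenvalues(sig_up, sig_down, n):
    """Eigenvalues of sigma^{(x)n}; bit 0 stands for |up>, bit 1 for |down> (assumed)."""
    return {b: sig_up ** b.count(0) * sig_down ** b.count(1)
            for b in product((0, 1), repeat=n)}

code = shannon_code(product_eigenvalues(Fraction(4, 5), Fraction(1, 5), 2))
H = max(len(w) for w in code.values())                    # label length
padded = {b: w.ljust(H, "0") for b, w in code.items()}    # codeword plus trailing 0s
print(code, H)
```

The resulting code is prefix-free, which is what later allows the width of a codeword to be read back from a label.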
The coding procedure determines a unitary operator $U_c$ that acts on every cluster-label pair $(C, L)$. It can be defined by specifying how the elements of the basis $B_C\otimes B_L$ transform:
$$U_c\,|b_1\ldots b_N\rangle_{B_C}\otimes|\ell_1\ldots\ell_H\rangle_{B_L} = |b_1\ldots b_N\rangle_{B_C}\otimes|(\ell_1\ldots\ell_H)\oplus c^*(b_1\ldots b_N)\rangle_{B_L}, \tag{2}$$
where $c^*(b_1,\ldots,b_N)$ represents the Shannon codeword $c(b_1,\ldots,b_N)$ supplemented with the necessary trailing 0s to match a length $H$.
Labeling an $N$-atom cluster $C$ in state $\rho$ means applying the unitary operator $U_c$ to the cluster and a label of $H$ tagging qubits in the state $|0\ldots 0\rangle_{B_L}$, as depicted in Figure 3. When $\rho$ is diagonal in the $B_C$ basis, the operation is equivalent to the classical operation of coding according to the Shannon method and storing the codeword in the label. The resulting state is a classical statistical mixture of cluster-label pairs.
Figure 3. (a) Coding procedure represented as a unitary operation $U_c$, controlled by the cluster $C$, on the tagging qubits of the label $L$. If the tagging qubits are initially in the state $|0\rangle$, they end up holding the codeword for the cluster. (b) The inverse operation, which is equal to $U_c$, so that $U_c^2$ is the identity transformation.
We define the width $w(c(b_1\ldots b_N))$ of the codeword $c(b_1\ldots b_N)$ assigned to the ket $|b_1\ldots b_N\rangle_{B_C}$ as the number of bits in $c(b_1\ldots b_N)$. Only the leading $w(c(b_1\ldots b_N))$ qubits of the label contain information. The $H - w(c(b_1\ldots b_N))$ trailing qubits of $L$ are superfluous and should be replaced by others in a completely mixed state. For this purpose, we define a new unitary operation $U_{\mathrm{trimming}}$ by which the trailing qubits are swapped with those of a new label $D$. In the labeling process, $D$ contains $H$ maximally mixed qubits. The operation of $U_{\mathrm{trimming}}$ is depicted in Figure 4 and explained in more detail in Appendix C.
Figure 4. Procedure employed for replacing the trailing qubits that are not used by the codeword with maximally mixed ones, as explained in Appendix C. It is split into two unitary transformations. The first one, represented in (a), copies the first $w(c(\cdot))$ qubits of $L$ into a label $L'$ that enters with all its tagging qubits in the state $|0\rangle$; the remaining qubits of $L'$ are copied from the maximally mixed qubits of $D$. The second transformation, represented in (b), resets the label $L$ and the $H - w(c(\cdot))$ trailing qubits of $D$. The overall effect is recovering $H - w(c(\cdot))$ qubits in state $|0\rangle$ and generating a new label with maximally mixed trailing qubits.
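On basis states, the two gates of the trimming procedure act as reversible bitwise XOR operations, so their bookkeeping can be simulated classically. The plain-Python sketch below is illustrative. The function `trim` is hypothetical, and random bits stand in for the maximally mixed label $D$. The width $w$ is read from $L$ by prefix-free decoding. $L'$ receives the codeword followed by the trailing bits of $D$. The second gate then resets $L$ and the trailing $H - w$ bits of $D$.

```python
import random

def trim(label, codebook, d):
    """Classical sketch of U_trimming acting on basis states.
    label: H bits holding the codeword followed by trailing zeros.
    codebook: set of codewords (tuples); being prefix-free, it fixes the width w.
    d: H bits standing in for the maximally mixed label D (random bits here).
    Returns (L, L', D, w) after both gates."""
    h = len(label)
    w = next(len(c) for c in codebook if tuple(label[:len(c)]) == c)
    new = [0] * h                                   # L' enters as |0...0>
    # Gate 1: XOR the leading w bits of L and the trailing bits of D into L'.
    for i in range(h):
        new[i] ^= label[i] if i < w else d[i]
    # Gate 2, controlled by L': XOR resets L and the trailing h - w bits of D.
    label = [label[i] ^ new[i] if i < w else label[i] for i in range(h)]
    d = [d[i] ^ new[i] if i >= w else d[i] for i in range(h)]
    return label, new, d, w

codebook = {(0,), (1, 0, 1), (1, 1, 0), (1, 1, 1, 1, 0)}  # Shannon code, N = 2, sigma_up = 4/5
random.seed(1)
d0 = [random.randint(0, 1) for _ in range(5)]
L, L_new, D, w = trim([1, 0, 1, 0, 0], codebook, d0)
print(L, L_new, D, "recovered |0> qubits:", len(L) - w)
```

Counting only the tagging qubits returned in $|0\rangle$, the trimming gains $H - w$ of them. The net cost of tight-labeling a cluster found in this basis state is therefore the codeword width $w$.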
In our setting, the width of the codeword is a quantum variable represented by the operator $\hat W$, which acts on the Hilbert space of the cluster. Its eigenvectors are those of the $B_C$ basis, and its eigenvalues are the widths of their Shannon codewords. Its effect on the basis vectors is
$$\hat W\,|b_1\ldots b_N\rangle_{B_C} = w(c(b_1\ldots b_N))\,|b_1\ldots b_N\rangle_{B_C}, \tag{3}$$
and it also represents the cost of the labeling procedure. We further define the atomic width $\hat W_1$ as $\hat W/N$. For sufficiently large $N$, $w(c(b_1\ldots b_N))/N$ converges to $-\log\sigma_{b_1\ldots b_N}/N$. Accordingly,
$$\hat W_1 \to -\frac{1}{N}\log\sigma^{\otimes N}. \tag{4}$$
Thus, for sufficiently large $N$, the average atomic width $W_1(\rho)$ for a codeword in state $\rho^{\otimes N}$ is given by
$$W_1(\rho) = -\frac{1}{N}\mathrm{Tr}\{\rho^{\otimes N}\log\sigma^{\otimes N}\} = \frac{1}{N}\left[S(\rho^{\otimes N}) + S(\rho^{\otimes N}\|\sigma^{\otimes N})\right], \tag{5}$$
which, taking into account that $S(\rho^{\otimes N}) = N S(\rho)$ and $S(\rho^{\otimes N}\|\sigma^{\otimes N}) = N S(\rho\|\sigma)$, can be rewritten as
$$W_1(\rho) = S(\rho) + S(\rho\|\sigma) \tag{6}$$
and represents the atomic cost of labeling clusters in state $\rho^{\otimes N}$. It is straightforward to reverse the process and check that an atomic yield equal to this cost is obtained.

A cluster in state $\rho^{\otimes N}$ can be described as being in a probabilistic mixture of eigenstates of $\rho^{\otimes N}$. Each of them defines a tight-label that allows for unambiguously identifying which eigenstate of $\rho^{\otimes N}$ it is attached to. The cost expressed by Equation (6) represents the average length of the codewords that correspond to clusters drawn according to the distribution defined by $\rho^{\otimes N}$. The equivalent situation in Thermodynamics corresponds to averaging the work that is necessary to produce pure-state physical systems out of equilibrium (such as the spin systems in the example of Appendix E), following the distribution given by $\rho^{\otimes N}$. However, there is a subtle difference from the case that we want to model in the next section. There, we still have clusters whose states are drawn from the distribution defined by $\rho^{\otimes N}$, but we ignore the particular state of each cluster. To cope with this new situation, it suffices to overlook the precise label attached to each cluster while keeping the distribution that corresponds to $\rho^{\otimes N}$.
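The asymptotic statement behind Equation (6) can be checked numerically when $\rho$ and $\sigma$ commute. In that case, both are diagonal in $B_C$, and the average width only depends on how many atoms of a cluster are in $|{\downarrow}\rangle$. The sketch assumes Shannon widths $\lceil -\log\sigma_{b_1\ldots b_N}\rceil$ and uses the hypothetical eigenvalues $\rho_\uparrow = 0.6$ and $\sigma_\uparrow = 0.8$. Since $\lceil x\rceil < x + 1$, the average atomic width must lie between $S(\rho)+S(\rho\|\sigma)$ and that value plus $1/N$.

```python
from math import comb, ceil, log2

def average_atomic_width(rho_up, sig_up, n):
    """Average of w/n over clusters drawn from rho^{(x)n}, with Shannon widths
    w = ceil(-log2 sigma_b). Assumes rho and sigma commute, so that rho is diagonal
    in B_C with eigenvalues (rho_up, 1 - rho_up)."""
    rho_dn, sig_dn = 1 - rho_up, 1 - sig_up
    total = 0.0
    for k in range(n + 1):                               # k = number of |down> atoms
        prob = comb(n, k) * rho_up ** (n - k) * rho_dn ** k
        width = ceil(-((n - k) * log2(sig_up) + k * log2(sig_dn)))
        total += prob * width
    return total / n

rho_up, sig_up = 0.6, 0.8
limit = -(rho_up * log2(sig_up) + (1 - rho_up) * log2(1 - sig_up))  # S(rho) + S(rho||sigma)
for n in (1, 4, 16, 64):
    print(n, average_atomic_width(rho_up, sig_up, n), limit)
```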
This is implemented through a process of label shuffling for a collection of clusters in state $\rho^{\otimes N}$, which is described in Section 3.

Loose Labeling
The tagging procedure put forth in the previous section outputs maximally correlated cluster-label pairs. In this section, we describe a procedure that reduces the correlation. To this end, the label $L$ assigned to cluster $C$ will be the codeword of another cluster $C'$ that belongs to a collection $F$ of $M$ clusters, all of them in state $\rho_C = \rho^{\otimes N}$. Therefore, the state of $F$ is $\rho_F = (\rho^{\otimes N})^{\otimes M} = \rho^{\otimes NM}$. Figure 5 represents the process with unitary gates that act on a collection of $M$ labeled cluster pairs $w_1 - c_1, \ldots, w_M - c_M$, a random set of $P$ tagging qubits $p_1,\ldots,p_P$ and an auxiliary label collection $v_1,\ldots,v_M$. The role played by $p_1,\ldots,p_P$ is to generate a random shuffling of the labels. The process is analyzed in Appendix D, where it is shown that the average number $R(F)$ of qubits in the $p_1,\ldots,p_P$ array that exit in the state $|0\rangle$ satisfies, for large $M$,
$$\frac{R(F)}{M} \to S_L,$$
where $S_L$ is the entropy of the probability distribution of the $2^N$ possible labels of the $2^N$ elements of the $B_C$ basis.
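Therefore, taking into account the cost of tight-labeling the $F$ collection, given by Equation (5), the cost of loosely tagging the clusters of $F$ is
$$NM\,W_1(\rho) - R(F), \tag{8}$$
which leads to a value per atom of
$$W_1(\rho) - \frac{S_L}{N}. \tag{9}$$
Next, we find a suitable expression for $S_L$. The probabilities for the set of codewords are the eigenvalues of the density matrix
$$\mathcal{E}_{B_C}(\rho^{\otimes N}) := \sum_{b_1\ldots b_N} |b_1\ldots b_N\rangle_{B_C}\langle b_1\ldots b_N|\,\rho^{\otimes N}\,|b_1\ldots b_N\rangle_{B_C}\langle b_1\ldots b_N|. \tag{10}$$
It is immediate to check that $S_L = S(\mathcal{E}_{B_C}(\rho^{\otimes N}))$. Thus, Equation (9) can be rewritten as
$$\frac{1}{N}\left[S(\rho^{\otimes N}) - S(\mathcal{E}_{B_C}(\rho^{\otimes N}))\right] + S(\rho\|\sigma). \tag{11}$$
Notice that $\mathcal{E}_{B_C}(\cdot)$ represents a CPTP (Completely Positive Trace Preserving) map, which is not a unitary transformation, whereas all the operations of our labeling system are unitary. The CPTP map is used here just as a means to find a convenient expression for $S_L$, not as a real operation on the state of the clusters and labels.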
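Equation (11) can be evaluated directly for small clusters. The NumPy sketch below uses helper names of our own. It dephases $\rho^{\otimes N}$ in a chosen coding basis, given as the columns of a unitary matrix, and returns the atomic loose-labeling cost. As an assumed example, it takes $\sigma = \mathrm{diag}(0.8, 0.2)$, a state $\rho$ with the same spectrum but non-zero off-diagonal elements, and the product basis $\{|{\uparrow}\rangle,|{\downarrow}\rangle\}^{\otimes N}$.

```python
import numpy as np
from functools import reduce

def entropy_bits(m, tol=1e-12):
    """Von Neumann entropy in bits, with 0 log 0 = 0."""
    p = np.linalg.eigvalsh(m)
    p = p[p > tol]
    return float(-np.sum(p * np.log2(p)))

def relative_entropy(rho, sigma):
    """S(rho||sigma) in bits; sigma is assumed to be full rank."""
    vals, vecs = np.linalg.eigh(sigma)
    log_sigma = vecs @ np.diag(np.log2(vals)) @ vecs.conj().T
    return float(-np.trace(rho @ log_sigma).real) - entropy_bits(rho)

def loose_cost_per_atom(rho, sigma, n, basis):
    """Equation (11) for clusters of n atoms. `basis` is a unitary whose columns are
    the vectors of B_C; it must diagonalize sigma^{(x)n}."""
    rho_n = reduce(np.kron, [rho] * n)
    in_basis = basis.conj().T @ rho_n @ basis
    dephased = np.diag(np.diag(in_basis))            # E_{B_C}(rho^{(x)n}) in B_C coordinates
    gap = entropy_bits(rho_n) - entropy_bits(dephased)
    return gap / n + relative_entropy(rho, sigma)

sigma = np.diag([0.8, 0.2])
rho = 0.5 * np.array([[1.0, 0.6], [0.6, 1.0]])       # eigenvalues 0.8, 0.2; not diagonal
print("S(rho||sigma) =", relative_entropy(rho, sigma))
for n in (1, 2, 3):
    print(n, loose_cost_per_atom(rho, sigma, n, np.eye(2 ** n)))   # product basis
```

In the product basis, the dephased state factorizes. As a result, the cost does not approach $S(\rho\|\sigma) = 0.6$ bits as $N$ grows. Instead, it stays at the classical divergence of the diagonal of $\rho$ from the spectrum of $\sigma$, about $0.322$ bits.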
Next, we define a particular basis $B_C$ for which Equation (11) retains only its last term in the $N\to\infty$ limit.
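Let $X, Y, Z$ be the operators represented by the Pauli matrices in the $|{\uparrow}\rangle, |{\downarrow}\rangle$ basis, which diagonalizes $\sigma$. In the Hilbert space of the $i$-th atom of a cluster, they are denoted by $X_i, Y_i, Z_i$. For the whole cluster, we define:
$$S_x := \frac{1}{2}\sum_{i=1}^{N} X_i,\qquad S_y := \frac{1}{2}\sum_{i=1}^{N} Y_i,\qquad S_z := \frac{1}{2}\sum_{i=1}^{N} Z_i,\qquad S^2 := S_x^2 + S_y^2 + S_z^2. \tag{12}$$
Let $B_M$ be the basis that diagonalizes $S_z$ and $S^2$ (the well-known total angular momentum basis in Quantum Physics). Since the eigenvalues of $\sigma^{\otimes N}$ only depend on the number of atoms in $|{\uparrow}\rangle$, that is, on the eigenvalue of $S_z$, the basis $B_M$ also diagonalizes $\sigma^{\otimes N}$ and is therefore a valid coding basis.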
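In the $B_M$ basis, any state $\rho^{\otimes N}$ is a mixture $\rho^{\otimes N} = \sum_i r_i\rho_i$, where $\rho_i$ is a state defined within the $i$-th invariant subspace of $S^2$. Because these subspaces are orthogonal, the entropy of this mixture is the sum of the entropy $S_r$ associated with the mixture and the weighted entropy of the different $\rho_i$:
$$S(\rho^{\otimes N}) = S_r + \sum_i r_i\,S(\rho_i). \tag{13}$$
The same decomposition can be applied to the state $\mathcal{E}_{B_M}(\rho^{\otimes N})$. Since each vector of $B_M$ lies within one of the invariant subspaces, we have
$$\mathcal{E}_{B_M}(\rho^{\otimes N}) = \sum_i r_i\,\mathcal{E}_{B_M}(\rho_i), \tag{14}$$
so that, using Equation (13) and subtracting both entropies, we arrive at
$$S(\mathcal{E}_{B_M}(\rho^{\otimes N})) - S(\rho^{\otimes N}) = \sum_i r_i\left[S(\mathcal{E}_{B_M}(\rho_i)) - S(\rho_i)\right]. \tag{15}$$
The maximum entropy of a state $\rho_i$ is $\log d_i$, where $d_i$ is the dimension of the $i$-th invariant subspace, whose maximum value is $N + 1$. Therefore, the absolute value of the right-hand side of Equation (15) cannot be greater than $\log(N+1)$. In the $N\to\infty$ limit,
$$\frac{1}{N}\left[S(\rho^{\otimes N}) - S(\mathcal{E}_{B_M}(\rho^{\otimes N}))\right] \to 0, \tag{16}$$
so that, for sufficiently large $N$, the atomic loose-labeling cost of Equation (11), evaluated in the $B_M$ basis, becomes
$$\frac{1}{N}\left[S(\rho^{\otimes N}) - S(\mathcal{E}_{B_M}(\rho^{\otimes N}))\right] + S(\rho\|\sigma) \simeq S(\rho\|\sigma). \tag{17}$$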
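The bound used in this argument rests on the dimensions of the invariant subspaces of $S^2$. For $N$ spin-1/2 atoms, the plain-Python sketch below lists each total spin $j$ with its block dimension $2j+1$ and its multiplicity. It uses the standard counting formula $\binom{N}{k} - \binom{N}{k-1}$ with $k = N/2 - j$. It then checks that the blocks fill the $2^N$-dimensional space and that the largest block has dimension $N+1$. The last printed column is the per-atom bound $\log(N+1)/N$, which tends to zero.

```python
from math import comb, log2

def su2_blocks(n):
    """Irreducible blocks of n spin-1/2 atoms under total angular momentum:
    (j, dimension 2j + 1, multiplicity), with multiplicity C(n, k) - C(n, k - 1)
    for k = n/2 - j."""
    blocks = []
    for k in range(n // 2 + 1):
        j = n / 2 - k
        mult = comb(n, k) - (comb(n, k - 1) if k > 0 else 0)
        blocks.append((j, int(2 * j + 1), mult))
    return blocks

for n in (2, 3, 10, 100):
    blocks = su2_blocks(n)
    assert sum(dim * mult for _, dim, mult in blocks) == 2 ** n   # blocks span the space
    assert max(dim for _, dim, _ in blocks) == n + 1             # largest block: j = n/2
    print(n, len(blocks), log2(n + 1) / n)                       # per-atom bound on the gap
```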

Discussion
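A well-known situation in Thermodynamics is the availability of a thermal reservoir of freely available, non-interacting atoms in a Gibbs state, given by
$$\tau = \frac{e^{-\hat H/(k_B T)}}{Z(\hat H, T)}, \qquad Z(\hat H, T) = \mathrm{Tr}\{e^{-\hat H/(k_B T)}\}, \tag{18}$$
where $\hat H$ is the Hamiltonian of each atom, $T$ is the temperature and $Z(\hat H, T)$ represents the partition function.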
A cluster of atoms in another state $\rho$ can be used to obtain work from the energy of the thermal reservoir. The obtainable work per atom, as given in [41,45] and derived in Appendix B, is
$$W = k_B(\ln 2)T\,S(\rho\|\tau), \tag{19}$$
and is usually collected by some physical means (mechanical, electromagnetic, etc.), which involves coupling to thermal baths and to mechanical or electromagnetic energy-storage systems. Remarkably, it only depends on the relative entropy $S(\rho\|\tau)$, which is connected to the physics through the dependence of $\tau$ on the Hamiltonian $\hat H$ and the temperature $T$. The work obtained matches the heat transferred from the thermal reservoir plus the decrease of the internal energy of the atom. After the process, the state of the atom is $\tau$. The reverse operation, driving a system initially in state $\tau$ to an out-of-equilibrium state $\rho$, demands that the same amount of work be supplied.
In this paper, we have come across the relative entropy from a very different approach. We have chosen a coding system that asymptotically minimizes the labeling cost for $\tau$; Shannon coding for state $\tau$ satisfies this requirement. In the case of loose-labeling, the cost incurred can be evaluated by substituting $\tau$ for $\sigma$ in Equation (17). It is equivalent to the process of driving a system from state $\tau$ to state $\rho$, with one difference: while the Thermodynamic transformation requires work, the labeling approach requires qubits in the state $|0\rangle$.
In a different Thermodynamic setting, we know that work can also be obtained from informational qubits at Information Heat Engines in the presence of a heat bath. Typically, the qubits enter in a known pure state (say $|0\rangle$) and exit in a maximally mixed one. They need to be coupled to a physical system that can equilibrate with a thermal reservoir and couple to an external energy storage. The work obtained per qubit is
$$W = k_B(\ln 2)T. \tag{20}$$
Accordingly, it is clear that the relative entropy of physical systems with respect to a thermal state can be traded for bits of information reservoirs by means of engines and heat baths within a Thermodynamic context.
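To attach numbers to Equations (19) and (20), the sketch below converts between work and reservoir bits at a given temperature, with $k_B$ in SI units. The helper names are our own. The temperature of 300 K and the value $S(\rho\|\tau) = 0.6$ bits are illustrative assumptions only.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant in J/K (exact in the 2019 SI)

def work_per_bit(temperature):
    """Work from one pure reservoir qubit, k_B (ln 2) T, in joules."""
    return K_B * math.log(2) * temperature

def work_from_relative_entropy(s_rel_bits, temperature):
    """k_B (ln 2) T S(rho||tau), with S(rho||tau) given in bits."""
    return work_per_bit(temperature) * s_rel_bits

def bits_for_work(work, temperature):
    """Number of reservoir bits an Information Heat Engine converts into `work`."""
    return work / work_per_bit(temperature)

T = 300.0                                       # illustrative temperature in kelvin
w_atom = work_from_relative_entropy(0.6, T)     # illustrative S(rho||tau) = 0.6 bits
print(work_per_bit(T), w_atom, bits_for_work(w_atom, T))
```

At 300 K, one bit corresponds to about $2.87\times 10^{-21}$ J, and the work obtained from the illustrative state converts back into exactly $S(\rho\|\tau)$ bits.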
We claim that, in this paper, we have described a way to do the same with purely informational manipulations. Furthermore, the physical system is accessed only through probing operations, which reduce to acting on the informational qubits according to its state. Labeling is a particular kind of these processes.
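However, the loose-labeling cost, given by Equation (11), depends on the labeling strategy through the choice of the coding basis $B_C$. The atomic cost does not converge to the relative entropy unless $S(\rho^{\otimes N})/N$ and $S(\mathcal{E}_{B_C}(\rho^{\otimes N}))/N$ converge to the same value. This is trivial if $\rho$ and $\sigma$ commute, but it cannot be assumed in a general quantum scenario. Nonetheless, even in the non-commuting case, it is accomplished by using the eigenbasis described in Section 3. When another basis is chosen, $S(\rho^{\otimes N})/N$ does not, in general, converge to $S(\mathcal{E}_{B_C}(\rho^{\otimes N}))/N$, and the loose-labeling costs are lower than the quantum relative entropy. This can be deduced by subtracting both:
$$S(\rho\|\sigma) - \left\{\frac{1}{N}\left[S(\rho^{\otimes N}) - S(\mathcal{E}_{B_C}(\rho^{\otimes N}))\right] + S(\rho\|\sigma)\right\} = \frac{1}{N}\left[S(\rho^{\otimes N}\|\sigma^{\otimes N}) - S(\mathcal{E}_{B_C}(\rho^{\otimes N})\|\sigma^{\otimes N})\right], \tag{21}$$
where we have taken into account that, because $\sigma^{\otimes N}$ is diagonal in $B_C$,
$$\mathrm{Tr}\{\mathcal{E}_{B_C}(\rho^{\otimes N})\log\sigma^{\otimes N}\} = \mathrm{Tr}\{\rho^{\otimes N}\log\sigma^{\otimes N}\}. \tag{22}$$
Notice that $\sigma^{\otimes N} = \mathcal{E}_{B_C}(\sigma^{\otimes N})$, so that Equation (21) can be written as
$$\frac{1}{N}\left[S(\rho^{\otimes N}\|\sigma^{\otimes N}) - S(\mathcal{E}_{B_C}(\rho^{\otimes N})\|\mathcal{E}_{B_C}(\sigma^{\otimes N}))\right], \tag{23}$$
which is always non-negative by the monotonicity of quantum relative entropy under CPTP maps. At any point, tracing out the label brings the reduced state of the cluster back to $\sigma$. Therefore, we interact with the cluster just to obtain or delete information about it. The parallel with the situation in Thermodynamics is clear: bits from information reservoirs are traded for changes in the relative entropy of state $\rho$ with respect to $\sigma$ (or $\tau$ in the thermodynamic setting). From this point of view, the most important aspect of a physical system in a thermodynamical setting is knowing its state, so that it can be used to supply the corresponding work. It is the state that fixes the process by which work is obtained.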
The two types of labeling described exhibit different costs. This is expected, because loose-labeling implies shuffling, which leads to labels that are related not to a particular cluster but to a collection of clusters sharing a common state. It is natural that the work given by Equation (19) is related to this cost, because it assumes a process that is common to all of them. However, if we have a tight-labeled collection of clusters, we can process each one in a different way, chosen according to its label. Then, each cluster would contribute a work given by the relative entropy of the pure state $|b_1\ldots b_N\rangle_{B_C}$ identified by the label. With $\sigma$ playing the role of $\tau$, the average work per atom, in units of $k_B(\ln 2)T$, would be given by
$$\frac{1}{N}\sum_{b_1\ldots b_N}\langle b_1\ldots b_N|\rho^{\otimes N}|b_1\ldots b_N\rangle\,S\!\left(|b_1\ldots b_N\rangle\langle b_1\ldots b_N|\,\big\|\,\sigma^{\otimes N}\right) = -\frac{1}{N}\mathrm{Tr}\{\rho^{\otimes N}\log\sigma^{\otimes N}\} = S(\rho) + S(\rho\|\sigma), \tag{24}$$
which corresponds to the cost of tight-labeling, given by Equation (6), irrespective of the particular choice of the coding basis. From another perspective, loose-labeling is essentially the process of disarranging the tight-labels of a collection of $M$ clusters. Let us first assume that $\sigma$ and $\rho$ commute. Asymptotically, as $M\to\infty$, the logarithm of the number of possible orders for the set of labels tends to $NM\,S(\rho)$, that is, $S(\rho)$ per atom, which is precisely the difference between Equations (6) and (17). From a physical point of view, both expressions point to slightly different situations. When a thermal engine is tuned to supply work from a physical system out of equilibrium, its configuration depends on the state of the system: each pure or mixed state requires different settings. Let $W_i$ be the work obtained from a system in a pure state $|r_i\rangle$. Next, we consider two cases:

(a) The pure state of each physical system is known, and the setting can be adjusted according to it. Then, the average work obtained by processing a collection of physical systems is the weighted average of all the $W_i$, each one contributing according to its corresponding eigenvalue $r_i$ in the density matrix $\rho^{\otimes N} = \sum_i r_i|r_i\rangle\langle r_i|$. It is given by Equation (24).

(b) Only the collective mixed state $\rho^{\otimes NM}$ of the collection is known. Then, the engine is tuned with a different set of parameters, and the average value of the extracted work is lower than in the previous case. It is given by Equation (19).
Situations (a) and (b) correspond to the tight and loose-labeling techniques, respectively. Work is immediately translated by Information Heat Engines into reservoir bits, with the conversion factor given in Equation (20). For a collection of $M$ clusters of $N$ atoms, the difference in the average value of work between (a) and (b) translates exactly into $NM\,S(\rho)$ bits. The same results can be extended to the case when $\sigma$ and $\rho$ do not commute, provided that coding is defined in a suitable basis.

Conclusions
In this paper, we have described two ways to label physical systems grouped in clusters. In both procedures, unitary transformations operate on a Hilbert space determined by the cluster and additional informational qubits. In the tight-labeling method, the label identifies pure states of the cluster, while, in the loose-labeling case, the label is chosen at random from a collection of clusters that share the same mixed quantum state. The costs of both procedures have been deduced in the asymptotic limit of infinitely many identical systems. The evaluation has been made by counting the number of informational qubits in the pure state $|0\rangle$ in the final and the initial situations. The costs are related to the von Neumann and relative entropies $S(\rho)$ and $S(\rho\|\sigma)$, where $\rho$ is the state of the physical systems and $\sigma$ is the state for which the labeling is optimized. In both processes, no manipulation of the physical system is attempted; its role is purely observational.
We have shown that the atomic cost of tight-labeling converges to the sum of the von Neumann and relative entropies, $W_1 = S(\rho) + S(\rho\|\sigma)$. Remarkably, this result does not depend on the coding basis. However, in the loose-labeling case, the atomic cost depends not only on $\rho$ and $\sigma$ but also on the coding basis. It is bounded by the relative entropy $S(\rho\|\sigma)$, to which it converges when a suitable basis is chosen, as explained in Section 3.
Assuming this choice of basis, we have shown that the costs of labeling, both in the tight and loose versions, correspond to what thermodynamical processing predicts by combining the models of (a) work extraction from physical systems out of equilibrium, and (b) information heat engines powered by pure state qubits. Through writing and erasing labels, we have presented a procedure to trade relative entropy for von Neumann entropy of the physical system just by informational manipulation.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. Magnetic Spin Information Heat Engine
In order to illustrate the connection between information reservoirs and thermal machines that is mentioned in the Introduction, this appendix describes a short version of a magnetic Quantum Information Heat Engine [17]. Its input is a low-entropy qubit $\Lambda$ from an information reservoir, and its output is twofold: the same qubit in a higher-entropy state, and electrical work delivered to a magnetic coil through an induced electromotive force. The energy stems from a thermal reservoir at temperature $T$. The engine is reversible and can be employed to lower the entropy of qubits from an information reservoir if some work is supplied.

Figure A1. On the left, the basic components of the magnetic quantum information engine. The CNOT (Controlled NOT) gate is used to correlate the information qubit $\Lambda$ and the magnetic qubit $\Gamma$. The evolution of the electric current $I_{\mathrm{coil}}$ is controlled by the state of $\Lambda$. The magnetic induction field at $\Gamma$ generated by the coil is $\mathbf{B} = kI_{\mathrm{coil}}\hat z$. The right part of the figure contains a graph of the evolutions of $I_{\mathrm{coil}}$ and $\mu_z$ (solid and dashed lines, respectively). The upper branches represent the case $\lambda = 0$, whereas the lower branches take place when $\lambda = 1$, as described in Appendix A.
The physical system within the engine (see Figure A1) consists of an internal magnetic spin-1/2 particle $\Gamma$ that sits in the magnetic field generated by a classical induction coil $C$. The z-component of the magnetic moment of $\Gamma$ is $\mu_\Gamma\sigma_z$, where $\sigma_z$ is the third Pauli matrix. The machine operates cyclically, according to the following steps:

(i) The electric current in the coil, $I_{\mathrm{coil}}$, is initially null. The qubit from the information reservoir enters in a partially mixed state $\lambda_0 = q_0|0\rangle\langle 0| + (1-q_0)|1\rangle\langle 1|$. The magnetic qubit is initially in the maximally mixed state $\gamma_0 = \frac{1}{2}\left(|{\uparrow}\rangle\langle{\uparrow}| + |{\downarrow}\rangle\langle{\downarrow}|\right)$. They undergo a CNOT (Controlled NOT) operation, controlled by the magnetic qubit. The result is a correlated mixed state, which defines two conditional states $\gamma_{\lambda=0,1}$ for the magnetic qubit $\Gamma$, each one corresponding to a state of the informational qubit $\Lambda$.
(ii) In this stage, the magnetic qubit $\Gamma$ is isolated from the thermal reservoir. The current in the coil is raised to a value $I^{(\lambda)}_{\mathrm{coil}}$, $\lambda = 0, 1$, controlled by $\lambda$. It is determined by making $\gamma_\lambda$ equal to the Gibbs state for the corresponding value of the magnetic field $\mathbf{B}_\Gamma = kI_{\mathrm{coil}}\hat z$ generated at $\Gamma$ by the current $I_{\mathrm{coil}}$. This Gibbs condition, together with Equation (A2), fixes the two possible values of $I_{\mathrm{coil}}$.

(iii) The current is gradually turned down until it is completely switched off. The process occurs slowly enough to assume a thermal equilibrium state for $\Gamma$ throughout this stage. Therefore, the final states of $\Lambda$ and $\Gamma$ are maximally mixed. Only the $\Lambda$ qubit exits in a different state than it started in: the current and the $\Gamma$ qubit are reset to their initial conditions.
Let $\Phi_{C\to C}$ and $\Phi_{\Gamma\to C}$ be the contributions to the magnetic flux through the coil from its own current and from the magnetic spin $\Gamma$, respectively. We further assume that the induction field $\mathbf{B}_\Gamma$ generated by the coil current $I_{\mathrm{coil}}$ at the position of $\Gamma$ is $kI_{\mathrm{coil}}\hat z$. By the reciprocity theorem [46], we can state that $\Phi_{\Gamma\to C} = k\mu_z$. The differential work on the coil can then be evaluated from the variation of $\Phi_{\Gamma\to C}$ and integrated over a cycle, which gives the net energy supplied to the current source (Equation (A8)) in terms of $\bar\mu_z$, the value of $\mu_z^{(\lambda)}$ throughout stage (ii). Rewriting Equation (A8) in terms of $q_0$ yields, in both cases $\lambda = 0, 1$, the same result:
$$W = k_B(\ln 2)T\left[1 - h(q_0)\right], \qquad h(q) := -q\log q - (1-q)\log(1-q), \tag{A11}$$
which represents the potential of the qubits in the information reservoir to produce work. If qubits in the state $\lambda_0 = |0\rangle\langle 0|$ are used ($q_0 = 1$, so that $h(q_0) = 0$), Equation (20) is obtained.
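The output of the magnetic engine can be tabulated as a function of the purity of the reservoir qubit. The sketch below evaluates $k_B(\ln 2)T\,[1 - h(q_0)]$, where $h$ is the binary entropy. This is the reversible work bound for a qubit that goes from $\lambda_0$ to the maximally mixed state, and we assume it is the content of the final expression of this appendix. For $q_0 = 1$ it reduces to $k_B(\ln 2)T$. The function names and the temperature are illustrative.

```python
import math

def binary_entropy(q):
    """h(q) in bits, with 0 log 0 = 0."""
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

def engine_work(q0, temperature, k_b=1.380649e-23):
    """Reversible work (J) when a reservoir qubit in q0|0><0| + (1-q0)|1><1|
    leaves the engine maximally mixed: k_B (ln 2) T [1 - h(q0)]."""
    return k_b * math.log(2) * temperature * (1 - binary_entropy(q0))

for q0 in (1.0, 0.9, 0.75, 0.5):
    print(q0, engine_work(q0, 300.0))
```

A maximally mixed input ($q_0 = 1/2$) yields no work, and the result is symmetric under $q_0 \leftrightarrow 1 - q_0$, consistent with both branches $\lambda = 0, 1$ giving the same value.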

Appendix B. Work Output from a System Out of Thermal Equilibrium
In this appendix, we derive the average work that can be obtained from a physical system in an initial state $\rho_i$ with Hamiltonian $\hat H_i$ when we reversibly drive the system to a thermal equilibrium state $\rho_f$ with Hamiltonian $\hat H_f$, related through Equation (18). The conclusion has already been given elsewhere [41], and we only present an alternative derivation that fits better with this paper. The process is reversible, so that the work obtained is the maximum that can be achieved. We split the transformation into three steps (represented in Figure A2).

Figure A2. Representation of the process described in Appendix B for the work extraction in a process beginning with Hamiltonian $\hat H_i$ in state $\rho_i$ and ending with Hamiltonian $\hat H_f$ and state $\rho_f$. Stages 2 and 3 are isothermal.
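1. The Hamiltonian is changed from $\hat H_i$ to the operator $\hat H_s$ for which $\rho_i$ (assumed to be full rank) is in thermal equilibrium at temperature $T$. It is given by
$$\hat H_s = -k_B(\ln 2)T\log\rho_i. \tag{A12}$$
In this step, the Hamiltonian varies quickly, so that the state does not change. Considering that there is no heat exchange in this step, the work obtained is
$$W_1 = \mathrm{Tr}\{\rho_i\hat H_i\} - \mathrm{Tr}\{\rho_i\hat H_s\}. \tag{A13}$$
2. The Hamiltonian is slowly taken back to $\hat H_i$, while thermal equilibrium is maintained. Therefore, the system is driven to the state $\rho_r$ given by
$$\rho_r = \frac{e^{-\hat H_i/(k_B T)}}{Z(\hat H_i, T)}. \tag{A14}$$
The second principle of Thermodynamics implies that the work extracted is
$$W_2 = U(\rho_i,\hat H_s) - U(\rho_r,\hat H_i) - k_B(\ln 2)T\left[S(\rho_i) - S(\rho_r)\right], \tag{A15}$$
where $S(\rho) = -\mathrm{Tr}\{\rho\log\rho\}$ and $U(\rho,\hat H) = \mathrm{Tr}\{\rho\hat H\}$. Taking into account Equations (A12) and (A14), $W_2$ evaluates to
$$W_2 = k_B(\ln 2)T\log Z(\hat H_i, T), \tag{A16}$$
which, together with $W_1$, adds up to
$$W_1 + W_2 = \mathrm{Tr}\{\rho_i\hat H_i\} - k_B(\ln 2)T\,S(\rho_i) + k_B(\ln 2)T\log Z(\hat H_i, T). \tag{A17}$$
3. While keeping thermal equilibrium, the Hamiltonian is taken from $\hat H_i$ to $\hat H_f$. The work is obtained as in the previous step and yields
$$W_3 = k_B(\ln 2)T\left[\log Z(\hat H_f, T) - \log Z(\hat H_i, T)\right]. \tag{A18}$$
The final count gives
$$W = W_1 + W_2 + W_3 = \mathrm{Tr}\{\rho_i\hat H_i\} - k_B(\ln 2)T\,S(\rho_i) + k_B(\ln 2)T\log Z(\hat H_f, T). \tag{A19}$$
The quantity $F(\hat H, T) := -k_B(\ln 2)T\log Z(\hat H, T)$ is known as the free energy of a physical system with Hamiltonian $\hat H$ in equilibrium at temperature $T$. Equation (A19) can be rewritten as
$$W = U(\rho_i,\hat H_i) - k_B(\ln 2)T\,S(\rho_i) - F(\hat H_f, T). \tag{A20}$$
In the context of this paper, we assume that $\hat H_i = \hat H_f$, and the work that can be extracted from a system out of equilibrium is directly related to the relative entropy:
$$W = k_B(\ln 2)T\,S(\rho_i\|\rho_f). \tag{A21}$$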
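The chain of steps above can be checked numerically. For a random Hamiltonian $\hat H = \hat H_i = \hat H_f$ and a random full-rank state $\rho_i$, the final count $\mathrm{Tr}\{\rho_i\hat H\} - k_B(\ln 2)T\,S(\rho_i) + k_B(\ln 2)T\log Z(\hat H, T)$ must coincide with $k_B(\ln 2)T\,S(\rho_i\|\rho_f)$, where $\rho_f$ is the Gibbs state. For convenience, the NumPy sketch below works in units where $k_B T = 1$, which is an assumption, and evaluates both expressions. The helper names and the random seed are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_hermitian(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    return (a + a.conj().T) / 2

def random_density_matrix(d):
    a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = a @ a.conj().T                 # positive definite with probability 1
    return rho / np.trace(rho).real

def func_of_hermitian(m, f):
    """Apply a scalar function to a Hermitian matrix through its eigenvalues."""
    vals, vecs = np.linalg.eigh(m)
    return vecs @ np.diag(f(vals)) @ vecs.conj().T

d, kT = 3, 1.0                           # units with k_B T = 1 (assumed)
H = random_hermitian(d)
rho = random_density_matrix(d)
boltzmann = func_of_hermitian(H, lambda e: np.exp(-e / kT))
Z = np.trace(boltzmann).real
tau = boltzmann / Z                      # Gibbs state

log_rho = func_of_hermitian(rho, np.log2)
S_rho = -np.trace(rho @ log_rho).real                                   # bits
S_rel = np.trace(rho @ (log_rho - func_of_hermitian(tau, np.log2))).real  # bits

w_final_count = np.trace(rho @ H).real - kT * np.log(2) * S_rho + kT * np.log(2) * np.log2(Z)
w_relative = kT * np.log(2) * S_rel
print(w_final_count, w_relative)
```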

Appendix C. Label Trimming
In this appendix, we describe the trimming operation depicted in Figure 4. The $w(c(\cdot))$ leading qubits of label $L$ hold the codeword assigned to the cluster, and the trailing qubits are in the state $|0\rangle$. The purpose of the operation is to replace them with worthless qubits in a maximally mixed state. It is split into two unitary operations. The first one performs a transformation $U_{\mathrm{trimming},1}$ on three labels $L$, $L'$ and $D$. $L'$ enters the process with all its tagging qubits in the state $|0\rangle$, while those of $D$ are maximally mixed. The operation is controlled by the qubits of $L$ (the width $w(c(\cdot))$ can be read from $L$ itself because the Shannon code is prefix-free) and copies its leading $w(c(\cdot))$ qubits to $L'$; the remaining $H - w(c(\cdot))$ qubits of $L'$ are copied from $D$. Now, $L'$ holds the desired label. The following operation, carried out by the gate $U_{\mathrm{trimming},2}$, resets the label $L$ and the trailing $H - w(c(\cdot))$ qubits of $D$. It is controlled by $L'$, where the codeword is now stored. Considering the whole operation, $H - w(c(\cdot))$ valuable qubits are recovered in the process.

Appendix D. Label Shuffling
In this appendix, we explain how the shuffling of labels depicted in Figure 5 can be implemented by a unitary transformation. We have split the process into three operations, performed by the unitary gates $U_{\mathrm{shuffling}}$, $U_{\mathrm{which}\,p}$ and $U_{\mathrm{dispose}\,w}$. The first gate operates on three groups of tagging qubits: $W$ and $V$ are collections of $M$ labels, and $J$ is a collection of $P$ qubits, where $P$ is equal to $M!$. Initially, $W$ contains the labels generated for a group $F$ of $M$ clusters, all the tagging qubits in $V$ are in the state $|0\rangle$, and all the qubits in $J$ are in a maximally mixed state. The gate writes in $V$ the values in $W$ according to the $p$-th permutation of the labels $w_1,\ldots,w_M$, where the value of $p$ is read from the random group $J$. The unnecessary trailing qubits of each label are then trimmed according to the procedure described in Appendix C.
The second gate, $U_{\mathrm{which}\,p}$, identifies the index $p$ of the permutation from the values of $W$ and $V$ and resets the qubits of $J$ that host the value of $p$.
The number of possible values for each label $w_i$ is $2^N$. Let $L_i$ be the $i$-th one and $n_i(F)$ the number of labels in the collection $F$ that are equal to $L_i$. The number of distinguishable permutations of the labels is
$$P(F) = \frac{M!}{\prod_{i=1}^{2^N} n_i(F)!}.$$
The leading $R(F) := \log P(F)$ qubits of $p_1,\ldots,p_P$ are reset to the state $|0\rangle$ after the gate. For large $M$,
$$\frac{R(F)}{M} \to S_L,$$
where $S_L$ is the entropy of the probability distribution of the $2^N$ possible labels of the $2^N$ elements of the $B_C$ basis.
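The limit $R(F)/M \to S_L$ follows from Stirling's approximation of the multinomial coefficient $P(F)$. The plain-Python sketch below evaluates $\log P(F)/M$ through `math.lgamma`, which avoids huge integers. It uses collections whose label counts match a hypothetical three-label distribution $(1/2, 1/4, 1/4)$ and compares the result with the entropy of that distribution, 1.5 bits.

```python
import math

def log2_multinomial(counts):
    """log2( M! / prod_i n_i! ), computed with lgamma to avoid huge integers."""
    m = sum(counts)
    ln = math.lgamma(m + 1) - sum(math.lgamma(n + 1) for n in counts)
    return ln / math.log(2)

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]                     # hypothetical label distribution
for m in (8, 400, 40000, 4000000):
    counts = [round(p * m) for p in probs]    # a typical collection F of m labels
    print(m, log2_multinomial(counts) / m, entropy(probs))
```

The ratio approaches 1.5 bits from below, with a correction of order $\log M / M$.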
After the $U_{\text{which } p}$ gate, $W$ still holds the ordered collection of labels, while $V$ holds a shuffled set. The last operation, performed by $U_{\text{dispose } w}$, resets the ordered set by using the collection of clusters $c_1, \dots, c_M$ that generated it. The final state is a collection of reset labels at $W$, a shuffled set at $V$, and a collection of qubits in the $|0\rangle$ state that correspond to the leading $R(F) = \log P(F)$ members of $J$.

Appendix E. Example
In order to illustrate the contents of this paper, we next present a simple example. Let us assume that the atoms are magnetic spin-1/2 systems in the presence of a Hamiltonian that determines a Gibbs state $\tau$, diagonal in the $\{|\uparrow\rangle, |\downarrow\rangle\}$ basis. If we have a cluster of $N$ spins in the pure state $\rho_\uparrow := |\uparrow\rangle\langle\uparrow|$, we know, from Equation (A21), that we can obtain an average work of

$$W_{ex} = N k_B T \ln 2 \; S(\rho_\uparrow \| \tau).$$

The same work $W_{ex}$ must be supplied for the reverse operation, whereby a system of $N$ spins in state $\rho_\uparrow$ is obtained from $N$ systems in state $\tau$. The same work can be obtained at an Information Heat Engine if $W_{ex}/(k_B T \ln 2)$ bits from an information reservoir are used. This consideration sets an exchange rate between spins and bits of $S(\rho_\uparrow \| \tau)$ bits per spin. In [47], we described a particular model of Information Heat Engine that employs magnetic spins. Work is extracted by exposing the spins to a sequence of interactions with a suitable magnetic field in the presence of a heat bath at temperature $T$, the magnetic field being chosen according to the state of the spin.
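For a two-level Gibbs state $\tau = \mathrm{diag}(p_\uparrow, 1 - p_\uparrow)$, the pure state $\rho_\uparrow$ has $\mathrm{Tr}\,\rho_\uparrow \log \rho_\uparrow = 0$ and $\mathrm{Tr}\,\rho_\uparrow \log \tau = \log p_\uparrow$, so $S(\rho_\uparrow \| \tau) = -\log p_\uparrow$. The sketch below turns this into the work and the matching number of reservoir bits. The population `p_up` is a hypothetical parameter standing in for the Gibbs state, which in practice depends on the Hamiltonian and temperature; the function names are illustrative.

```python
from math import log, log2

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def rel_entropy_up(p_up):
    """S(rho_up || tau) in bits for tau = diag(p_up, 1 - p_up); equals -log2(p_up)."""
    return -log2(p_up)


def extractable_work(n_spins, p_up, temperature):
    """W_ex = N k_B T ln 2 S(rho_up || tau), in joules."""
    return n_spins * K_B * temperature * log(2) * rel_entropy_up(p_up)


def reservoir_bits(work, temperature):
    """Number of information-reservoir bits matching `work`: W / (k_B T ln 2)."""
    return work / (K_B * temperature * log(2))


# Hypothetical example: 1000 spins, Gibbs population of |up> equal to 1/4, T = 300 K.
W = extractable_work(1000, 0.25, 300.0)
print(W, reservoir_bits(W, 300.0) / 1000)  # bits per spin = S(rho_up || tau) = 2
```

Dividing the reservoir bits by $N$ recovers the exchange rate of $S(\rho_\uparrow \| \tau)$ bits per spin, independently of $T$.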
Tight-labeling a cluster $C$ of $N$ spins implies applying a Shannon coding procedure that assigns a sequence $L$ of tagging qubits to $C$. The coding procedure defined in Section 2 consists of two steps. We have found that the best way to keep track of the costs is to employ unitary transformations, defined by their action on the elements of a particular basis $B_C$. We require $\tau$ to be diagonal in $B_C$. For $N > 1$, there are many possible choices of $B_C$, because the eigenvalues of $\tau^{\otimes N}$ are degenerate. Let $p(b)$ be the eigenvalue of $\tau$ for the element $b$ of $B_C$. Our coding procedure begins by assigning to $b$ a sequence of tagging qubits. The leading $-\log p(b)$ qubits hold the Shannon codeword for $b$, while the trailing ones are in the $|0\rangle$ state (see Figure 3). The operation is defined in Equation (2). Then, the valuable trailing qubits are recovered by the trimming procedure described in Appendix C and represented in Figure 4.
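The first coding step can be sketched with the classical Shannon construction: sort the basis elements by decreasing probability and take as codeword the first $\lceil -\log p(b) \rceil$ binary digits of the cumulative probability of the preceding elements; the result is prefix-free. The sketch assumes, hypothetically, that $B_C$ is the product basis of $N$ spins and that $\tau = \mathrm{diag}(p_\uparrow, 1 - p_\uparrow)$, so that $p(b) = p_\uparrow^{k}(1 - p_\uparrow)^{N-k}$ for a basis element with $k$ spins up. Exact fractions avoid rounding errors in the code lengths, and each codeword is padded with zeros to a common label length $H$.

```python
from fractions import Fraction
from itertools import product


def shannon_length(prob):
    """Smallest integer l with 2**-l <= prob, i.e. ceil(-log2 prob), computed exactly."""
    l = 0
    while Fraction(1, 2**l) > prob:
        l += 1
    return l


def shannon_code(probs):
    """Shannon code: decreasing probabilities, codeword = first l bits of the cumulative sum."""
    order = sorted(probs, key=probs.get, reverse=True)
    code, cumulative = {}, Fraction(0)
    for b in order:
        l = shannon_length(probs[b])
        code[b] = format(int(cumulative * 2**l), f"0{l}b") if l > 0 else ""
        cumulative += probs[b]
    return code


def tight_labels(n_spins, p_up):
    """Hypothetical B_C = product basis; p(b) = p_up**k (1 - p_up)**(N - k)."""
    probs = {
        b: p_up ** sum(b) * (1 - p_up) ** (n_spins - sum(b))
        for b in product((0, 1), repeat=n_spins)  # 1 = spin up
    }
    code = shannon_code(probs)
    H = max(len(c) for c in code.values())  # common label length
    return {b: c + "0" * (H - len(c)) for b, c in code.items()}, code


labels, code = tight_labels(3, Fraction(1, 4))  # hypothetical p_up = 1/4
print(code[(0, 0, 0)], labels[(0, 0, 0)])  # 00 000000
```

The trailing zeros in each label are the valuable qubits that the trimming procedure later recovers.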
In Section 2, we prove that the tight-labeling of a cluster of $N$ spins in state $\rho_\uparrow$, according to a Shannon coding system optimized for a distribution of states given by $\tau$, requires exactly $S(\rho_\uparrow) + S(\rho_\uparrow \| \tau) = S(\rho_\uparrow \| \tau)$ bits per spin in the $N \to \infty$ limit, since the pure state $\rho_\uparrow$ has $S(\rho_\uparrow) = 0$. This is the same number of bits per spin found for the Thermodynamic approach at the beginning of this section. However, we would like to underline the different contexts in which they have been derived. They come as answers to the following questions:
1. How many bits from an information reservoir are needed to match the work that is required to take a cluster of $N$ spins out of thermal equilibrium?
2. How many bits are needed to label the state of a cluster of $N$ spins using a coding system optimized for a given distribution of states?
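For commuting states, the answer to the second question can be checked directly: the average Shannon codeword length per spin tends to $-\mathrm{Tr}\,\rho \log \tau = S(\rho) + S(\rho \| \tau)$, approaching it from above with a gap smaller than $1/N$ caused by the integer rounding of code lengths. The sketch below assumes, hypothetically, that both $\rho = \mathrm{diag}(q_\uparrow, 1 - q_\uparrow)$ and $\tau = \mathrm{diag}(p_\uparrow, 1 - p_\uparrow)$ are diagonal in the spin basis; the case $q_\uparrow = 1$ is $\rho_\uparrow$. The function names are illustrative, and $N$ is kept below about 1000 so that the binomial weights stay within floating-point range.

```python
from math import ceil, comb, log2


def cross_entropy(q_up, p_up):
    """S(rho) + S(rho || tau) = -Tr rho log2 tau for commuting qubit states (bits per spin)."""
    return -sum(q * log2(p) for q, p in ((q_up, p_up), (1 - q_up, 1 - p_up)) if q > 0)


def mean_label_length(n_spins, q_up, p_up):
    """Average Shannon codeword length per spin for rho^{(x)N}, code built for tau^{(x)N}."""
    total = 0.0
    for k in range(n_spins + 1):  # k = number of spins up
        weight = comb(n_spins, k) * q_up**k * (1 - q_up) ** (n_spins - k)
        if weight == 0.0:
            continue
        length = ceil(-(k * log2(p_up) + (n_spins - k) * log2(1 - p_up)))
        total += weight * length
    return total / n_spins


# rho_up (q_up = 1) against a hypothetical tau with p_up = 0.3:
for n in (10, 100, 500):
    print(n, mean_label_length(n, 1.0, 0.3), cross_entropy(1.0, 0.3))
```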
Note that, in the second context, there are no concepts like temperature, heat, work or Hamiltonian. However, the coincidence of the answers to both questions could have been predicted by abstracting the first situation. Considering the initial and final states of the spins and of the information reservoir bits, the net result is that some spins are obtained in a known state after a number of bits from an information reservoir have been used. The same process takes place in the labeling scenario. In both cases, before the operation, a cluster picked at random is described by the quantum state $\tau$, whereas at the end we know that it is in state $\rho_\uparrow$. For the reverse process, given the unitarity of the tight-labeling procedure, un-labeling a state restores the $|0\rangle$ state of the label qubits. In the Thermodynamic scenario, processing a spin in state $\rho_\uparrow$ produces an average amount of work that can be used to reset the given number of informational bits. Again, both contexts share the same initial and final situations.
Collections of spin clusters sampled from the same state can then undergo a loose-labeling procedure, by which their tight-labels are reassigned at random. In this particular example, defined by the pure state $\rho_\uparrow$, all the labels are equal, so $P(F) = 1$ and no informational qubits are recovered. In fact, $S(\rho_\uparrow) = 0$, so the unit costs of tight- and loose-labeling are the same, both given by the relative entropy $S(\rho_\uparrow \| \tau)$.
The most relevant result of the paper is that the unit cost of labeling is still given by the relative entropy even when the state to be labeled does not commute with the state $\sigma$ that describes the underlying distribution, provided that a particular basis is chosen to define the coding procedure. In the example presented here, it is the basis $B_M$ that diagonalizes the magnitude and the $z$ component of the total spin of the cluster.
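Why $B_M$ is an admissible coding basis can be seen in a short derivation, sketched here under the assumption that $\tau = p\,|\uparrow\rangle\langle\uparrow| + (1-p)\,|\downarrow\rangle\langle\downarrow|$, with $p$ a hypothetical name for the Gibbs population of $|\uparrow\rangle$, $J_z = \tfrac{1}{2}\sum_{i} \sigma_z^{(i)}$ (with $\hbar = 1$), and $\alpha$ a hypothetical label distinguishing degenerate multiplets with the same $j$. The eigenvalue of $\tau^{\otimes N}$ on a product state depends only on the number $k$ of spins up, so $\tau^{\otimes N}$ is a function of $J_z$ and is diagonal in any common eigenbasis of $\mathbf{J}^2$ and $J_z$:

```latex
\tau^{\otimes N} = \sum_{k=0}^{N} p^{k}(1-p)^{N-k}\,\Pi_k ,
\qquad
J_z\,\Pi_k = \left(k - \tfrac{N}{2}\right)\Pi_k ,
% Pi_k: projector onto the product states with k spins up (a spectral projector of J_z)
\\[4pt]
[\mathbf{J}^2, J_z] = 0
\;\Longrightarrow\;
[\tau^{\otimes N}, \mathbf{J}^2] = [\tau^{\otimes N}, J_z] = 0 ,
\\[4pt]
\tau^{\otimes N}\,|j, m, \alpha\rangle = p^{N/2 + m}(1-p)^{N/2 - m}\,|j, m, \alpha\rangle .
```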
Let us consider the state $|\rightarrow\rangle := 2^{-1/2}(|\uparrow\rangle + |\downarrow\rangle)$ and its density matrix $\rho_\rightarrow := |\rightarrow\rangle\langle\rightarrow|$. Unlike $\rho_\uparrow$, $\rho_\rightarrow$ does not commute with $\tau$, so it is impossible to find a basis that simultaneously diagonalizes $\rho_\rightarrow^{\otimes N}$ and $\tau^{\otimes N}$. Therefore, the tight-label assigned to $|\rightarrow\rangle^{\otimes N}$ must be a linear combination of the labels defined for the elements of $B_C$. In the $B_C^{\otimes M}$ basis, a collection of $M$ clusters in the state $\rho_\rightarrow^{\otimes N}$ is described by a superposition of basis states. Some of them contain the same labels for the $M$ clusters, but others do not and can thus undergo a shuffling process that allows the recovery of some reset qubits. The average number of recovered qubits per cluster is given by $S_L$, as defined in Appendix D. However, if the $B_M$ basis is chosen, then $S_L/N$ tends to 0 and the relative entropy is recovered as the unit cost, as explained in Section 3.
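The contrast between the two bases can be made quantitative, assuming that $S_L$ for a cluster in $|\rightarrow\rangle^{\otimes N}$ reduces to the Shannon entropy of the weights $|\langle b|\rightarrow\rangle^{\otimes N}|^2$ over the labeled basis elements $b$. In the product basis, all $2^N$ weights equal $2^{-N}$, giving $N$ bits. The state is symmetric, so in $B_M$ it lives in the $j = N/2$ multiplet: $|\rightarrow\rangle^{\otimes N} = \sum_k \sqrt{\binom{N}{k} 2^{-N}}\,|N/2, k - N/2\rangle$, and the binomial weights have an entropy that grows only like $\tfrac{1}{2}\log(\pi e N/2)$. The helper names below are hypothetical; log-gamma keeps the binomial coefficients manageable for large $N$.

```python
from math import lgamma, log


def log2_binom_prob(n, k):
    """log2 of C(n, k) / 2**n, via log-gamma to avoid huge integers."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)) / log(2) - n


def entropy_product_basis(n):
    """Each of the 2**n product states has weight 2**-n; grouped by k this sums to n bits."""
    return sum(2.0 ** log2_binom_prob(n, k) * n for k in range(n + 1))


def entropy_dicke_basis(n):
    """Binomial weights C(n, k)/2**n over the n + 1 states |n/2, k - n/2> of B_M."""
    h = 0.0
    for k in range(n + 1):
        lp = log2_binom_prob(n, k)
        h -= 2.0 ** lp * lp
    return h


for n in (10, 50, 200):
    print(n, entropy_product_basis(n) / n, entropy_dicke_basis(n) / n)
```

The first ratio stays at 1 bit per spin, while the second decreases towards 0, which is the behaviour of $S_L/N$ described above for the $B_M$ basis.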
The same result extends to the case in which the labels are shuffled in a collection of clusters in any mixed state $\rho$, whether or not it commutes with $\sigma$, as described in Section 3.