Battery Charging in Collision Models with Bayesian Risk Strategies

We constructed a collision model where measurements in the system, together with a Bayesian decision rule, are used to classify the incoming ancillas as having either high or low ergotropy (maximum extractable work). The former are allowed to leave, while the latter are redirected for further processing, aimed at increasing their ergotropy further. The ancillas play the role of a quantum battery, and the collision model, therefore, implements a Maxwell demon. To make the process autonomous and with a well-defined limit cycle, the information collected by the demon is reset after each collision by means of a cold heat bath.

A particularly nice feature of these models is that they allow for a clean implementation of autonomous processes: Ancillas arrive, undergo some physical process, and then leave. Different implementations can be used to perform different tasks, which are gauged by the changes in the ancilla's state. Moreover, the process is allowed to continue indefinitely, as long as new ancillas continue to arrive. Indeed, there have already been several proposals that employ collision models, e.g., quantum heat engines [30][31][32][33][34][35][36][37][38] or quantum thermometers [39][40][41].
In this paper, we discuss the implementation of an autonomous collision model engine aimed at charging quantum batteries. Battery charging in the quantum domain is an active field of study [42][43][44][45][46][47][48][49]. The present framework aims to produce a model in which this charging occurs autonomously, for an arbitrary number of charging units, and in a way that works for arbitrary initial battery states.
The input of the engine is a stream of ancillas, drawn randomly from some ensemble of states. The thermodynamic "usefulness" of each ancilla will be characterized by its ergotropy [50], which quantifies the maximum amount of work that can be extracted from it by means of a unitary interaction. The goal of the engine is then to increase the average ergotropy of the outgoing ancillas. This is accomplished by using information extracted from measurements in the system, as depicted in Figure 1 (the ancillas are never measured). This setup was inspired by Ref. [51], which studied the ergotropy that could be extracted from quantum correlations between a system and a single ancilla. Furthermore, it is opposite in spirit to, e.g., continuously monitored systems [52,53], where one uses measurements in the ancillas to learn something about the system [54][55][56][57]; here we instead use information about the system to learn about the ancillas.
The present study is also closely related to Ref. [58], which studied the flows of information through a Maxwell demon in a sequential collision scheme. The new feature of the present study is the description of the actual decision process of the demon. That is, while Ref. [58] is concerned with the overall flow of information, it does not specify which strategies the demon should adopt in order to act optimally upon that information, since this is task-dependent. The present manuscript provides a concrete example of such a decision process, where the focus is on the ergotropy.
The measurement outcomes are used to classify the ancillas as having either high or low ergotropy, which we model using Bayesian decision theory [59]. This, therefore, implements a Maxwell demon [60], which autonomously decides what to do with each ancilla. High ergotropy ancillas (defined according to some threshold) are allowed to leave, while low ergotropy ones are flagged for further processing. That is, they are redirected to go through another quantum channel aimed at increasing their ergotropy further (Figure 1). In our case, we will model this in terms of an additional unitary pulse, but more general quantum channels can also be used.
The system, in this case, plays the role of a memory. As is well known, the process of acquiring information can, in principle, be done without any energetic cost. However, there is a fundamental cost in erasing the information [61,62], given by Landauer's principle [63]. We model this by assuming that the system is coupled to a cold heat bath that acts for a finite time in between collisions. As we show, this is crucial for the engine to operate autonomously.
Figure 1. A stream of ancillas, drawn from random states, interacts with a system S. Measurements in S are then used to distinguish whether the ancillas have low or high ergotropy. This information is used by a (space invader) demon, operating under the paradigm of Bayesian decision theory, to decide whether or not the ancillas should be further processed, with the goal of increasing their ergotropy even further.

Basic Model
We consider a stream of ancillas, each prepared in a state $|\psi_A\rangle$ drawn from an ensemble of $d$ possible states $\{|\psi_i\rangle\}$ (not necessarily orthogonal), with probability $q_i$. Often, in the collision model literature, one assumes that the ancillas are in mixed states. This is a natural choice if one is interested in the steady-state properties of the system. However, for the task at hand, it is much more natural to assume that the ancillas are in pure states. Nonetheless, all results below also hold for ensembles of mixed states. The notation $\psi_A = |\psi_A\rangle\langle\psi_A|$ will be used whenever the ancilla state is pure.
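The thermodynamic utility of each ancilla can be quantified by its ergotropy [50], which, for a generic ancilla state $\rho_A$, is defined as
$$W(\rho_A) = \mathrm{tr}(H_A \rho_A) - \min_V \mathrm{tr}(H_A V \rho_A V^\dagger), \tag{1}$$
where $H_A$ is the ancilla Hamiltonian, and the minimization is over all unitaries $V$. When the state is pure, this reduces to the more intuitive result
$$W(\psi_A) = \langle\psi_A|H_A|\psi_A\rangle - E_A^{\rm gs}, \tag{2}$$
where $E_A^{\rm gs}$ is the ground-state energy of $H_A$.
The stream of ancillas is first put to interact with a system S, one at a time, for a fixed duration $\tau_{SA}$, according to some Hamiltonian $H_{SA}$. If the system is in $\rho_S$ and the ancilla is in $\psi_A$, this produces the map
$$\rho'_{SA} = U_{SA}(\rho_S \otimes \psi_A)U_{SA}^\dagger, \qquad U_{SA} = e^{-iH_{SA}\tau_{SA}}. \tag{3}$$
Immediately afterwards, the system is measured, which we describe by a set of Kraus operators $\{M_x\}$, with $m$ possible outcomes, $x = 1, \ldots, m$, and satisfying $\sum_x M_x^\dagger M_x = 1$. The probability of outcome $x$, conditioned on the initial ancilla state, is
$$P(x|\psi_A) = \mathrm{tr}\{(M_x^\dagger M_x \otimes I_A)\rho'_{SA}\}, \tag{4}$$
where $I_A$ is the identity acting on the ancilla. Moreover, if outcome $x$ is observed, the state of SA should be updated to
$$\rho_{SA|x,\psi_A} = \frac{(M_x \otimes I_A)\rho'_{SA}(M_x^\dagger \otimes I_A)}{P(x|\psi_A)}. \tag{5}$$
From this, the reduced states of system and ancilla, $\rho_{S|x,\psi_A}$ and $\rho_{A|x,\psi_A}$, can be obtained by taking the partial trace.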
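The ergotropy defined above can be evaluated numerically without an explicit search over unitaries, because the minimum is attained by the passive state, in which the populations of the state are arranged in decreasing order along increasing energies. The following is a minimal sketch under that standard result; the function name `ergotropy`, the value `omega = 1.0`, and the qubit Hamiltonian (borrowed from the qubit-qubit model discussed later) are illustrative assumptions.

```python
import numpy as np

def ergotropy(rho, H):
    """Ergotropy of state rho for Hamiltonian H (both Hermitian matrices).

    The minimization over unitaries is attained by the passive state, in which
    the largest population of rho occupies the lowest energy level of H.
    """
    rho = np.asarray(rho, dtype=complex)
    H = np.asarray(H, dtype=complex)
    energies = np.linalg.eigvalsh(H)               # ascending energies
    populations = np.linalg.eigvalsh(rho)[::-1]    # descending populations
    passive_energy = float(np.dot(populations, energies))
    mean_energy = float(np.real(np.trace(H @ rho)))
    return mean_energy - passive_energy

# Illustrative qubit with H_A = -omega * sigma_z / 2 and omega = 1 (assumed values).
omega = 1.0
H_A = -0.5 * omega * np.array([[1.0, 0.0], [0.0, -1.0]])

ground = np.array([[1.0, 0.0], [0.0, 0.0]])        # |0><0|
excited = np.array([[0.0, 0.0], [0.0, 1.0]])       # |1><1|
plus = 0.5 * np.array([[1.0, 1.0], [1.0, 1.0]])    # |+><+|
mixed = 0.5 * np.eye(2)                            # maximally mixed state

for name, rho in [("ground", ground), ("excited", excited),
                  ("plus", plus), ("mixed", mixed)]:
    print(name, round(ergotropy(rho, H_A), 6))
```

For pure states the result coincides with the mean energy measured from the ground-state energy, while the maximally mixed state is passive and carries no ergotropy.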
In between collisions, the state of the system is allowed to relax in contact with a heat bath, which we describe by a Lindblad master equation acting for a fixed time $\tau_{SE}$. It is assumed for simplicity that $\tau_{SA} \ll \tau_{SE}$ so that, during the system-ancilla interaction, the system is approximately uncoupled from the bath.
Based on the outcome $x$, a demon tries to classify whether an ancilla has a high or a low ergotropy $W$ (according to some model-specific threshold). The former can leave the process, while the latter are redirected for additional processing, aimed at increasing their ergotropy further. We describe this in terms of a unitary pulse $O$, so that the final state of the ancilla will be
$$\tilde\rho_{A|x,\psi_A} = \begin{cases} \rho_{A|x,\psi_A}, & \text{if classified as high ergotropy}, \\ O\,\rho_{A|x,\psi_A}\,O^\dagger, & \text{if classified as low ergotropy}. \end{cases} \tag{6}$$
The meaning of low or high ergotropy is model specific and will be discussed further below. The ultimate goal of the engine is thus to produce an ensemble with average ergotropy higher than that of the initial ensemble $\{q_i, |\psi_i\rangle\}$,
$$W_{\rm raw} = \sum_i q_i W(\psi_i), \tag{7}$$
where the subscript "raw" will always refer to the ancillas before entering the engine.

Bayesian Risk Analysis
Before discussing an actual implementation, we must first discuss the type of rationale that will be used by the demon in deciding whether the ergotropy is high or low. We do this using the concept of Bayesian risk analysis, as a general tool for implementing the decision process.
There are $d$ possible preparations $\psi_i$, and $m$ possible outcomes $x$, each pair associated with a certain quantum state $\rho_{A|x,\psi_i}$ [Equation (5)]. It is assumed that the demon knows the possible set of states $\{\psi_i\}$, but does not know the current ancilla state, nor the probabilities $q_i$ with which they were sampled (the latter restriction could be lifted without significantly altering the problem). At each collision, all the demon knows is, therefore, the outcome $x$.
Based on this, it may take one of a set of $a$ actions $\alpha_k$, $k = 1, \ldots, a$. Generally speaking, we could associate each action with a quantum channel $\mathcal{E}_k$, which will process the quantum state of the ancilla further. For example, in the case of Equation (6), action $\alpha_1$ stands for "do nothing," while $\alpha_2$ stands for the unitary channel $O \bullet O^\dagger$. However, more generally, all kinds of channels can, in principle, be used.
In Bayesian risk analysis, we quantify each action by a certain gain, described by a non-negative function $\lambda(\alpha_k|x, \psi_i)$ determining how much is gained from using action $\alpha_k$ when the outcome is $x$ and the state is $\psi_i$ (one could equivalently frame the problem in terms of risks, instead of gains). This set of functions determines the type of strategy the demon will use, and different functions will lead to different engine performances. An example could be the ergotropy (1) of the processed ancilla,
$$\lambda(\alpha_k|x, \psi_i) = W\big(\mathcal{E}_k(\rho_{A|x,\psi_i})\big). \tag{8}$$
However, as we will show below, in specific models, simpler functions can often be employed.
For each outcome $x$, the demon's decision will then be to choose the action $\alpha_k$ that maximizes the Bayesian gain
$$\Lambda(\alpha_k|x) = \sum_{i=1}^{d} \lambda(\alpha_k|x, \psi_i)\,P(\psi_i|x), \tag{9}$$
where $P(\psi_i|x)$ is the probability that the initial state was $|\psi_i\rangle$ given that the outcome in the system was $x$. According to Bayes's rule, this is further given by
$$P(\psi_i|x) = \frac{P(x|\psi_i)\,P(\psi_i)}{\sum_j P(x|\psi_j)\,P(\psi_j)}, \tag{10}$$
where $P(x|\psi_i)$ is the likelihood function, given in Equation (4), and $P(\psi_i)$ is the prior probability the demon associates to the ancilla being in $|\psi_i\rangle$. If the demon does not know in advance how the ancillas are sampled, the priors $P(\psi_i)$ will, in general, differ from the $q_i$. In fact, at the beginning of the process, a natural choice of prior would be $P(\psi_i) = 1/d$. After each collision, however, the demon updates $P(\psi_i)$ to the posterior $P(\psi_i|x)$, which can then be used as the prior for the next step. Under mild conditions, it is expected that in the steady-state this should converge to the true sampling probabilities $q_i$.
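The decision rule described above amounts to two small steps for each observed outcome: a Bayes update of the probabilities over candidate states, followed by a maximization of the posterior-weighted gain over actions. The sketch below illustrates this with made-up numbers; the function names `posterior` and `choose_action`, the likelihood values, and the gain table are all hypothetical and serve only to show the mechanics.

```python
import numpy as np

def posterior(likelihood_x, prior):
    """Bayes' rule for one outcome x: P(psi_i | x) from P(x | psi_i) and P(psi_i)."""
    unnormalized = np.asarray(likelihood_x, dtype=float) * np.asarray(prior, dtype=float)
    return unnormalized / unnormalized.sum()

def choose_action(gains_x, post):
    """Pick the action with the largest posterior-weighted gain.

    gains_x[k, i] holds lambda(alpha_k | x, psi_i) for the observed outcome x.
    Returns the 0-based index of the best action and the gain of every action.
    """
    bayes_gain = np.asarray(gains_x, dtype=float) @ post
    return int(np.argmax(bayes_gain)), bayes_gain

# Hypothetical numbers: d = 2 candidate states, flat prior, one observed outcome x.
likelihood_x = [0.9, 0.2]       # P(x | psi_1), P(x | psi_2)
prior = [0.5, 0.5]              # P(psi_i) = 1/d
gains_x = [[0.0, 1.0],          # row 0: "do nothing" pays off only for psi_2
           [1.0, 0.0]]          # row 1: "apply pulse" pays off only for psi_1

post = posterior(likelihood_x, prior)
action, bayes_gain = choose_action(gains_x, post)
print(post, bayes_gain, action)
```

In a sequential setting, `post` would be fed back in as the `prior` of the next collision, which is the update scheme described in the text.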
We also mention that, in general, the state of the system is constantly changing. As a consequence, when the above procedure is used sequentially, it may cause $P(\psi_i|x)$ at the $n$-th step to depend on the outcomes of all past collisions, thus making the process highly non-Markovian. In fact, even in the limiting case of projective measurements, $P(\psi_i|x)$ would still depend on the previous outcome. This is directly associated with Bennett's exorcism of Maxwell's demon [61]: while there is no minimum cost to acquire information, there is always a fundamental heat cost for erasing it (see also [62]). If the engine is to operate autonomously, the memory (which is, in this case, the system) must be reset at each step. In practice, the demon may continue to employ the same gain function (9), which would happen when it is unaware of whether the system has been fully reset or not. The only problem is that this may cause it to make wrong decisions. The better the memory is reset, the more accurate the demon's decisions are.

Qubit-Qubit Model
We now consider a concrete implementation of this approach, where we assume that the system and ancillas are all made of qubits. The ancilla Hamiltonian is taken to be $H_A = -\omega\sigma_z^A/2$, where $\sigma_z$ is a Pauli matrix. The ground state is thus the computational basis state $|0\rangle$; i.e., $\sigma_z|0\rangle = |0\rangle$. The ergotropy (1) then lies in the interval $W \in [0, \omega]$, with the maximum attained for the excited state $|1\rangle$.

The system-ancilla interaction is taken as
$$H_{SA} = g\,\sigma_z^A \sigma_y^S, \tag{11}$$
where $g$ is the interaction strength.
This is a typical pointer-basis type of measurement [64], with information on the ancilla's population being directly encoded in the system while at the same time causing the coherences to dephase. The ergotropy (1) has contributions from both the populations and coherences [65]. The interaction with the system will keep the former intact but disturb the latter (measurement backaction). The goal, therefore, is to see if one can increase the ergotropy from the populations while, at the same time, not excessively harming that from the coherences.
The system is measured after each step in the eigenbasis $|\pm\rangle = (|0\rangle \pm |1\rangle)/\sqrt{2}$ of the $\sigma_x$ operator. To understand why this is a good measurement strategy, suppose that the system is initially prepared in $\rho_S = |0\rangle\langle 0|$, while the ancilla is in $|\psi_A\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$. Then Equation (4) will produce the likelihoods
$$P(x = \pm 1|\psi_A) = \frac{1}{2}\left[1 \pm \sin(2g\tau_{SA})\cos\theta\right]. \tag{12}$$
For $\theta \in [0, \pi/2]$ (northern hemisphere of the Bloch sphere), the outcome $x = +1$ is more likely, while for $\theta \in [\pi/2, \pi]$ (southern hemisphere) it is $x = -1$. At the same time, the ergotropy is directly related to the position on the Bloch sphere, being low in the former and high in the latter. This means that if $x = +1$ is observed, it is more likely that the ancilla has a low ergotropy. A very simple Bayesian strategy is thus to take the gain of no action ($\alpha_1$) as $\lambda(\alpha_1|x, \psi_i) = 1$ when $x = -1$, and zero otherwise; and similarly $\lambda(\alpha_2|x, \psi_i) = 1$ when $x = +1$, and zero otherwise.
When the ancilla is flagged, it is more likely to be in the northern hemisphere. In this case, we can then apply an additional unitary pulse $O = \sigma_x^A$, which flips the ancilla's state to the southern hemisphere. Note that if the ergotropy is already high, this will generally spoil it. That is to say, whenever the demon makes a mistake, it will actually be degrading the ancilla's ergotropy. However, since correct decisions are more likely, it will, on average, increase it.
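The collision, measurement, and conditional pulse described above can be traced numerically for a single ancilla. The sketch below assumes a coupling of the form $g\,\sigma_z^A\sigma_y^S$, chosen here because it commutes with $H_A$ and writes the ancilla population onto the system; this form, the tensor ordering (system first), and the function names are assumptions of the example. Under this assumption the outcome probabilities come out as $\frac{1}{2}[1 \pm \sin(2g\tau_{SA})\cos\theta]$.

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
ket0 = np.array([1, 0], dtype=complex)
x_basis = {+1: np.array([1, 1]) / np.sqrt(2), -1: np.array([1, -1]) / np.sqrt(2)}

def collide_and_measure(theta, phi, g_tau):
    """Return {x: (probability, post-measurement ancilla ket)} for x = +1, -1.

    Tensor ordering is system (x) ancilla, and the system starts in |0>.
    """
    psi_A = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])
    # (sigma_y (x) sigma_z) squares to the identity, so the exponential is closed-form.
    K = np.kron(sy, sz)
    U = np.cos(g_tau) * np.eye(4) - 1j * np.sin(g_tau) * K
    joint = U @ np.kron(ket0, psi_A)
    out = {}
    for x, ket in x_basis.items():
        bra_S = np.kron(ket.conj().reshape(1, 2), np.eye(2))   # <x|_S (x) I_A
        anc = bra_S @ joint                                     # unnormalized ancilla ket
        p = float(np.vdot(anc, anc).real)
        out[x] = (p, anc / np.sqrt(p) if p > 1e-15 else anc)
    return out

def demon(theta, phi, g_tau):
    """Apply the sigma_x pulse to the ancilla only when x = +1 is observed."""
    res = collide_and_measure(theta, phi, g_tau)
    return {x: (p, sx @ ket if x == +1 else ket) for x, (p, ket) in res.items()}

res = collide_and_measure(np.pi / 3, 0.7, np.pi / 8)
print({x: round(p, 6) for x, (p, _) in res.items()})
```

For an ancilla in the northern hemisphere, as in the usage line, the outcome $x = +1$ is the more probable one, so the pulse is applied more often than not.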
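Finally, between measurements, the system is taken to interact with a zero temperature heat bath for a time $\tau_{SE}$, described by the master equation
$$\frac{d\rho_S}{dt} = -i[H_S, \rho_S] + \gamma D[L_S]\rho_S, \tag{13}$$
where $L_S = |0\rangle\langle 1|$ takes the system to its ground state, $\gamma$ is the coupling strength and $D[L]\rho = L\rho L^\dagger - \frac{1}{2}\{L^\dagger L, \rho\}$. Moreover, we assume $H_S = -\omega_S\sigma_z^S/2$, where $\omega_S$ is not necessarily resonant with the ancilla frequency $\omega$.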

Results
In what follows, the ancillas are all uniformly sampled from generic states $|\psi_i\rangle$ on the Bloch sphere, using the appropriate Haar measure. We start by assuming that $\gamma\tau_{SE}$ is sufficiently large so that, after each step, the state of the system is fully reset back to $\rho_S = |0\rangle\langle 0|$. Illustrative results are shown in Figure 2. The histogram in Figure 2a compares the raw ergotropy with that obtained at the output of the engine for fixed $g\tau_{SA} = \pi/8$. As is evident, the engine charges the ancillas, leading to a final ensemble with clearly larger ergotropy.
In Figure 2b, we show the average ergotropy as a function of $g\tau_{SA}$, where it is evident that stronger interactions lead to monotonic improvements in the charging process. This is expected since higher $g\tau_{SA}$ implies more information is available to the demon to make the decision. We also show, for comparison, the ergotropy that would be obtained if all ancillas were to be processed by the engine, irrespective of the measurement outcomes (labeled "engine"). In this case, the interaction with the system causes an overall degradation of $W$.
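The trend described above can be reproduced with a short numerical experiment in the full-reset regime. The sketch below samples Haar-random pure qubits, runs each through the collision, and averages the final ergotropy over both measurement outcomes using their exact probabilities. It is only an illustration: the coupling $g\,\sigma_z^A\sigma_y^S$, the sample size, the seed, and all function names are assumptions of the example, and it does not attempt to reproduce the "engine" curve.

```python
import numpy as np

omega = 1.0
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.array([[1, 0], [0, -1]], dtype=complex)
H_A = -0.5 * omega * sz
K = np.kron(sy, sz)                  # assumed coupling operator, system (x) ancilla
ket0 = np.array([1, 0], dtype=complex)
bras = {+1: np.array([[1, 1]]) / np.sqrt(2), -1: np.array([[1, -1]]) / np.sqrt(2)}

def pure_ergotropy(ket):
    """Ergotropy of a pure qubit: <H_A> minus the ground-state energy -omega/2."""
    return float(np.real(np.vdot(ket, H_A @ ket))) + 0.5 * omega

def haar_qubits(n, rng):
    """Haar-random pure qubits: cos(theta) uniform in [-1, 1], phi uniform in [0, 2 pi)."""
    theta = np.arccos(rng.uniform(-1.0, 1.0, n))
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.stack([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)], axis=1)

def demon_average(kets, g_tau):
    """Ensemble- and outcome-averaged ergotropy after collision, measurement, and pulse."""
    U = np.cos(g_tau) * np.eye(4) - 1j * np.sin(g_tau) * K
    total = 0.0
    for psi in kets:
        joint = U @ np.kron(ket0, psi)
        for x, bra in bras.items():
            anc = np.kron(bra, np.eye(2)) @ joint     # project the system onto |x>
            p = float(np.vdot(anc, anc).real)
            if p < 1e-14:
                continue                              # outcome cannot occur for this state
            anc = anc / np.sqrt(p)
            if x == +1:
                anc = sx @ anc                        # flagged ancilla receives the pulse
            total += p * pure_ergotropy(anc)
    return total / len(kets)

rng = np.random.default_rng(0)
kets = haar_qubits(2000, rng)
raw = float(np.mean([pure_ergotropy(k) for k in kets]))
for g_tau in (0.0, np.pi / 8, np.pi / 4):
    print(round(g_tau, 4), round(raw, 4), round(demon_average(kets, g_tau), 4))
```

Under these assumptions the demon-processed average grows from $\omega/2$ at $g\tau_{SA} = 0$ to $\omega$ at $g\tau_{SA} = \pi/4$, while the raw average stays close to $\omega/2$ up to sampling noise.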
This happens because the interaction in Equation (11) dephases the ancillas. Hence, the coherent part of the ergotropy tends to be lost (while the population part is unaffected).
Next, we investigate what happens when the state of the system is not fully reset after each step. Due to the projective nature of the measurement, after each collision, the system will be either in $|+\rangle$ or in $|-\rangle$. After relaxing for a time $\tau_{SE}$ under the action of Equation (13), its state, written in the basis $\{|0\rangle, |1\rangle\}$, will thus be
$$\rho_S^{\pm} = \frac{1}{2}\begin{pmatrix} 2 - e^{-\gamma\tau_{SE}} & \pm e^{-\gamma\tau_{SE}/2}\,e^{i\omega_S\tau_{SE}} \\ \pm e^{-\gamma\tau_{SE}/2}\,e^{-i\omega_S\tau_{SE}} & e^{-\gamma\tau_{SE}} \end{pmatrix}, \tag{14}$$
which are thus taken as the initial states of the next collision. Results for the average ergotropy are shown in Figure 3. As can be seen, when $\gamma\tau_{SE}$ is finite, the ergotropy is gradually reduced. This happens because when the system is not properly erased, it affects the demon's ability to make proper decisions. In fact, if $\gamma\tau_{SE}$ is very small, one can even obtain an average ergotropy that is worse than that of a fully random ensemble.
Figure 3. The curve marked "finite reset" depicts the dependence of the average ergotropy on the system relaxation time $\gamma\tau_{SE}$. The data was sampled from $N = 10^4$ simulations, with the system-ancilla interaction strength fixed at $g\tau_{SA} = \pi/8$. The other two curves, marked "raw" and "processed," are shown for comparison and are similar to those from Figure 2b.
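The partial reset discussed above can be made concrete with a small numerical check. The sketch below assumes a zero-temperature damping channel with jump operator $|0\rangle\langle 1|$ and system Hamiltonian $H_S = -\omega_S\sigma_z/2$; it compares a closed-form solution against a direct Runge-Kutta integration of the master equation. The jump operator, the parameter values, and the function names are assumptions of this example.

```python
import numpy as np

def relaxed_state(rho0, gamma, omega_S, t):
    """Closed-form solution for zero-temperature damping toward |0> (basis |0>, |1>)."""
    rho0 = np.asarray(rho0, dtype=complex)
    p1 = rho0[1, 1].real * np.exp(-gamma * t)                        # excited population decays
    c01 = rho0[0, 1] * np.exp(-0.5 * gamma * t + 1j * omega_S * t)   # coherence decays and rotates
    return np.array([[1.0 - p1, c01], [np.conj(c01), p1]])

def lindblad_rhs(rho, gamma, omega_S):
    """Right-hand side of the assumed master equation, used as a cross-check."""
    H = -0.5 * omega_S * np.array([[1, 0], [0, -1]], dtype=complex)
    L = np.array([[0, 1], [0, 0]], dtype=complex)                    # |0><1|
    LdL = L.conj().T @ L
    return (-1j * (H @ rho - rho @ H)
            + gamma * (L @ rho @ L.conj().T - 0.5 * (LdL @ rho + rho @ LdL)))

def integrate_rk4(rho0, gamma, omega_S, t, steps=2000):
    """Fixed-step fourth-order Runge-Kutta integration of the master equation."""
    rho = np.asarray(rho0, dtype=complex)
    h = t / steps
    for _ in range(steps):
        k1 = lindblad_rhs(rho, gamma, omega_S)
        k2 = lindblad_rhs(rho + 0.5 * h * k1, gamma, omega_S)
        k3 = lindblad_rhs(rho + 0.5 * h * k2, gamma, omega_S)
        k4 = lindblad_rhs(rho + h * k3, gamma, omega_S)
        rho = rho + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return rho

plus = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)     # |+><+| after a measurement
minus = 0.5 * np.array([[1, -1], [-1, 1]], dtype=complex)  # |-><-| after a measurement
for gamma_tau in (0.1, 1.0, 10.0):
    # Residual excited population left in the memory after the reset stage.
    print(gamma_tau, relaxed_state(plus, 1.0, 0.3, gamma_tau)[1, 1].real)
```

The residual excited population and coherence vanish only for $\gamma\tau_{SE} \gg 1$; for short reset times the system still remembers the previous outcome, which is what degrades the demon's decisions.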

Energetics
We now discuss in further detail the energetics of the problem. A closely related discussion can also be found in [66]. We divide the problem into three steps: interaction, measurement, and conditional unitary pulse. For simplicity, we focus on full system resets ($\gamma\tau_{SE} \to \infty$). The interaction in Equation (11) does not affect the energy of the ancillas since $[H_{SA}, H_A] = 0$. However, it does affect the energy of the system. The net change in energy of system plus ancilla, in one collision, assuming the ancilla is in $\psi_A$, is thus given by
$$\Delta E_{SA} = \mathrm{tr}\{(H_S + H_A)(\rho'_{SA} - \rho_S \otimes \psi_A)\} = \omega_S \sin^2(g\tau_{SA}). \tag{15}$$
This change reflects the inherent work cost associated with the interaction $H_{SA}$, known as on/off work [5,56]. Notice, however, that this depends on the Hamiltonian of the system, which has a generic gap $\omega_S$ (not necessarily resonant with the ancilla's gap $\omega$). The on/off work can thus be made arbitrarily small by choosing $\omega_S$ to be small. This means that it is possible to operate the engine in a regime where the energy cost of the collision is negligible.
Next, we turn to the effects of the measurement. We assume that the ancilla's initial state has the generic form $|\psi_A\rangle = \cos(\theta/2)|0\rangle + e^{i\phi}\sin(\theta/2)|1\rangle$. The average energy of the ancillas after the measurement, given outcomes $x = \pm 1$, will then be
$$E_{\pm} = -\frac{\omega}{2}\,\frac{\cos\theta \pm \sin(2g\tau_{SA})}{1 \pm \sin(2g\tau_{SA})\cos\theta}. \tag{16}$$
Averaging this over the probabilities from Equation (12) recovers the initial average energy $\langle\psi_A|H_A|\psi_A\rangle = -(\omega/2)\cos\theta$. Thus, up to this point, no work is performed on the ancillas (on average).
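The actual work comes from the controlled unitary pulse, which is applied only when $x = +1$. This causes the energy of the ancillas to change to
$$\tilde{E}_+ = -E_+ = \frac{\omega}{2}\,\frac{\cos\theta + \sin(2g\tau_{SA})}{1 + \sin(2g\tau_{SA})\cos\theta}, \qquad \tilde{E}_- = E_-. \tag{17}$$
The net work is therefore
$$\mathcal{W}_+ = \tilde{E}_+ - E_+ = \omega\,\frac{\cos\theta + \sin(2g\tau_{SA})}{1 + \sin(2g\tau_{SA})\cos\theta}, \tag{18}$$
while $\mathcal{W}_- = 0$ when $x = -1$. The average work is thus
$$\langle\mathcal{W}\rangle = \sum_x P(x|\psi_A)\,\mathcal{W}_x = \frac{\omega}{2}\left[\cos\theta + \sin(2g\tau_{SA})\right]. \tag{19}$$
Notice how work is still performed even if the system and ancilla do not interact ($g\tau_{SA} = 0$). This happens because, even though they do not interact, we assume that the system is nonetheless still measured, thus yielding equally likely outcomes $x = \pm 1$. That is to say, half of the time, the pulse is applied.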
We now analyze this from the perspective of the ergotropy. The initial ergotropy is $W_0 = \omega\sin^2(\theta/2)$. After the measurements (but before the pulse), the ergotropies conditioned on each outcome are
$$W_{\pm} = \omega\sin^2(\theta/2)\,\frac{1 \mp \sin(2g\tau_{SA})}{1 \pm \sin(2g\tau_{SA})\cos\theta}. \tag{20}$$
Since the measurement does not perform any work, on average, we simply have $\sum_x P(x|\psi_A)W_x = W_0 = \omega\sin^2(\theta/2)$, as it must be.
The net change in ergotropy is, of course, the work injected,
$$\Delta W = \frac{\omega}{2}\left[\cos\theta + \sin(2g\tau_{SA})\right]. \tag{21}$$
The final average ergotropy is then
$$W_{\rm final} = W_0 + \Delta W = \frac{\omega}{2}\left[1 + \sin(2g\tau_{SA})\right]. \tag{22}$$
If $g\tau_{SA} = 0$, this reduces to $\omega/2$, which is half the maximum value it may have. Thus, if the machine is applied with no information about the ancillas whatsoever, it would result in an average ergotropy of $\omega/2$. Furthermore, if $g\tau_{SA} = \pi/4$, the average ergotropy achieves its maximum value $\omega$. This, therefore, fully accounts for the behavior observed in Figure 2.
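The energy bookkeeping of this section can be cross-checked outcome by outcome for a single ancilla. The sketch below uses the post-measurement ancilla amplitudes that follow from an assumed coupling $g\,\sigma_z^A\sigma_y^S$ with the system starting in $|0\rangle$ and measured in the $\sigma_x$ basis; the function name `energy_ledger` and the chosen angles are hypothetical.

```python
import numpy as np

def energy_ledger(theta, phi, g_tau, omega=1.0):
    """Per-outcome ancilla energies before and after the pulse, for x = +1 and x = -1."""
    c, s = np.cos(g_tau), np.sin(g_tau)
    a0, a1 = np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)
    ledger = {}
    for x in (+1, -1):
        # Unnormalized ancilla amplitudes after projecting the system onto |x>.
        b0 = a0 * (c + x * s) / np.sqrt(2)
        b1 = a1 * (c - x * s) / np.sqrt(2)
        p = abs(b0) ** 2 + abs(b1) ** 2
        if p < 1e-14:
            continue                                    # outcome cannot occur
        sz_mean = (abs(b0) ** 2 - abs(b1) ** 2) / p
        e_meas = -0.5 * omega * sz_mean                 # energy right after the measurement
        e_final = -e_meas if x == +1 else e_meas        # sigma_x pulse flips <sigma_z>
        ledger[x] = {"p": p, "E": e_meas, "E_final": e_final, "work": e_final - e_meas}
    return ledger

theta, phi, g_tau, omega = 1.1, 0.4, np.pi / 8, 1.0
ledger = energy_ledger(theta, phi, g_tau, omega)

initial_energy = -0.5 * omega * np.cos(theta)
mean_energy_after_measurement = sum(v["p"] * v["E"] for v in ledger.values())
mean_work = sum(v["p"] * v["work"] for v in ledger.values())
# All conditional states are pure, so ergotropy = energy + omega / 2.
final_ergotropy = sum(v["p"] * (v["E_final"] + 0.5 * omega) for v in ledger.values())

print(initial_energy, mean_energy_after_measurement)
print(mean_work, final_ergotropy)
```

The first printed pair should agree, reflecting that the measurement performs no work on average, while the second pair shows the work injected by the pulse and the resulting average ergotropy.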

Discussion
In this paper, we put forth the idea of an autonomous engine, which processes random incoming ancillas with the goal of increasing their ergotropy. There are endless possible variations of such an engine that one might construct. The goal of the present proposal was to build a minimal engine where the basic effects could be made evident. Three points in particular stand out. The first is the idea that, in reality, ancillas are usually sampled from an ensemble of pure states. Collision models often assume that the ancillas arrive in mixed states $\rho_A$, which could be viewed as the ensemble average. However, for the present purposes, it is much more realistic to assume that in each collision, the state of the ancilla is pure but not necessarily known. In fact, for the example in Figure 2, the ensemble average would simply be the maximally mixed state $\rho_A = I_A/2$. Sampling over pure states, therefore, naturally accounts for mixed states as well.
The second relevant aspect of this construction is the need for the state of the system to be properly reset after each step, as it plays the role of a memory. If this is not done, the ability of the demon to make a decision based on the measurement outcomes is severely degraded, as Figure 3 illustrates very clearly.
Finally, the third relevant point is the energetic balance of the problem. This has long been a major advantage of collision models, as they enable precise accounting of all possible energy sources and sinks. The analysis in the Energetics section showed how this can be used to pinpoint, at the level of each possible measurement outcome, whether or not work is being performed, and how this affects the ergotropy at each step. Of course, the process also does not violate the second law of thermodynamics, provided one includes the information about the demon within the entropic balance.

Conflicts of Interest:
The authors declare no conflict of interest.