## 1. Introduction

In this article, we aim to clarify some issues related to the “coarse graining” process that is typically asserted to be associated with the monotone increase of entropy implied by the second law of thermodynamics. While many sources discuss the “coarse graining” process in a somewhat intuitive manner, it is extremely difficult to find explicit, controllable and fully-calculable models; see, for instance, [1,2,3,4,5,6,7,8] (a more general background, on coarse graining in various renormalization group settings, can be found in [9,10,11,12,13,14,15,16,17]). What we seek in this article is some process (either purely mathematical or possibly physical) that takes an arbitrary system with specified initial entropy S and monotonically drives the entropy to its maximum value; that is, $\ln N$, for any finite classical or quantum system described by the Shannon or von Neumann entropies [18,19,20]. Whenever possible, it would be desirable to introduce an adjustable smearing parameter to continuously interpolate between the original system and the smeared system.

For some purposes, it is better to deal with a gedanken-process, since a gedanken-process is in principle reversible, and it is then more obvious how the “hidden information” is hiding in the correlations. The techniques that work best for classical Shannon entropy often do not carry over to quantum von Neumann entropy, or vice versa. We shall develop a number of different and complementary models for coarse graining, adapted to either classical or quantum behaviour, and (classically) to the continuum versus discretium character of the system. (The discretium is often alternatively referred to as the discretum; the choice is a matter of taste. Discretum is more Latinate; discretium is more Anglophone.)

The relevance of having explicit, controllable and calculable methods available to describe coarse graining emerges when we deal with physical issues of information; specifically, the “hidden information” lurking in the correlations rendered unobservable by the coarse graining. The current study provides some tools to treat these processes, which we plan to adapt to understand what is happening with the quantum information in black hole formation and evaporation, where the classical gravitational collapse that initially forms the black hole is an extremely dramatic coarse graining process; one that encodes ideas about a possible resolution of the black hole information puzzle [21,22,23,24,25,26,27,28,29].

## 2. Structure of the Article

We shall first present a brief historical survey of the coarse graining problem in Section 3 and then discuss coarse graining in the context of classical Shannon entropy in Section 4. For Shannon entropy in the continuum, a coarse graining in terms of an idealized and generalized diffusion process (possibly a gedanken-process, rather than a real physical diffusion) is the easiest to implement. More generally, for Shannon entropy, the distinction between continuum and discretium is important, and any coarse graining that maps the continuum into the discretium is particularly delicate. We then turn to the quantum von Neumann entropy in Section 5, elucidating several coarse graining processes based on maximal mixing, on partial traces and on decoherence. We introduce Hawking’s super-scattering operator in Section 6, with the aim of using it in Section 7, where we define a wide class of diffusion-like processes on Hilbert space (rather than in physical space). Finally, we wrap up with a discussion in Section 8.

## 3. History

Coarse graining plays a crucial role in our description of the world. It allows us to work with very large or complex systems by choosing a more reasonable and tractable number of variables (and consequently, reducing the observable degrees of freedom of the system under consideration). See, for instance, [1,2,3,4,5,6,7,8]. Coarse graining should be chosen so as to still describe the key properties we are most interested in, “integrating over” properties of perceived lesser importance. These coarse graining processes are used in very different fields (from biology to physics to astronomy), but for the current article, we are mostly interested in applications to quantum information theory (though in the future, we plan to consider quantum fields in curved spacetimes).

The price of coarse graining is that some properties or physical effects, corresponding to the degrees of freedom that we are not looking at, will be “hidden” in the system. When calculating the entropy of these coarse-grained systems, there is an associated increase of system entropy due to uncertainty regarding the coarse-grained degrees of freedom. We shall see that the difference of entropy after imposing coarse graining can be understood as “hidden information”:

$$I_{\mathrm{hidden}}=S_{\text{coarse-grained}}-S_{\text{before coarse graining}}.$$

Because coarse graining can sometimes be viewed as a gedanken-process, an agreement to simply “not look at” certain features of the system, the coarse-grained entropy is unavoidably contextual and may be observer dependent.

Classically, the way to describe entropy associated with information is provided by the Shannon entropy, which first introduced the interpretation of entropy as a measure of the lack of information [18,19]. In the language of communications engineers:

$$S=-\sum_{i}p_{i}\log_{2}p_{i},$$

where $p_{i}$ is the probability associated with each microstate and the entropy is measured in “shannons”/“bits”. Physicists prefer to use the natural logarithm, ln, and measure the entropy in “natural units” (nats). There are several ways to define information, though in general, the information can be considered as the reduction of uncertainty. The maximum information can be defined as the difference between the maximum possible entropy and the value we measure:

$$I_{\max}=S_{\max}-S.$$

This is often called negentropy [18,19], and it is a measure of distance to complete uncertainty, $p_{i}=\frac{1}{N}$. In the context of quantum resource theory, this can be interpreted as the unique measure of information that is stable under a subset of “noisy operations” [30]. In terms of our hidden information due to coarse graining:

$$I_{\mathrm{hidden}}=S_{\text{coarse-grained}}-S_{\text{before coarse graining}}\le I_{\max}.$$

In the realm of quantum mechanics, the previous picture changes because the information is stored in a different way, and the states can be entangled. The statistics of the physical states are now represented by a density matrix. The analogous entropy now refers to the entanglement entropy between different subsystems that were coarse-grained, and it is given by the von Neumann entropy [20]:

$$S=-\mathrm{tr}(\rho\ln\rho).$$

Here, $\rho$ is the density matrix, and $\mathrm{tr}$ denotes the trace. This von Neumann entropy is conserved under unitary quantum evolution. However, there exists the possibility (or necessity in some cases) that we decide to “look” only at one part of the system and forget about the rest. This process is different when quantum properties have to be taken into account and will often correspond to taking a “trace” over a specified subsystem. In this quantum case, it can also be seen that coarse graining corresponds to an increase of the entropy that is entirely associated with the “hidden” quantum information.

Other interesting questions emerge when the transition from quantum to classical systems is taken into account. Despite valiant efforts, this process, the emergence of classicality and its interpretation are still not completely understood. One of the strong possibilities comes from the idea of decoherence [31,32,33,34,35,36,37], which is based on separating the relevant and irrelevant degrees of freedom of the system, and removing the interference terms (off-diagonal terms of the density matrix) that arise precisely from the interaction of the emergent classical system (relevant degrees of freedom) with the environment (irrelevant degrees of freedom). It is easy to see that this process is itself a specific way of coarse graining the system.

Therefore, in this approximation, the classical realm corresponds to a set of coarse-grained non-interfering histories (the so-called decoherent histories approach to quantum mechanics) [2,37]. It is important to note that there also exist other interesting ideas about classical emergence that also involve some notion of coarse graining (see, for instance, [3,4]).

In both cases, classical and quantum, it is important to understand that the “hidden information” corresponds to correlations that we decide not to look at with our particular coarse graining method. This implies that the measured entropy depends on the particular coarse graining chosen; entropy is contextual and can be observer dependent. For this reason, it is useful for the physical interpretation to have some knowledge of “where” the hidden information is hidden, “what” we hide, and how we hide it, in order to have complete control of our system. The importance of understanding these processes is evident, for instance, in the study of some open questions related to information (for example, the black hole information puzzle [21,22,23,24,25,26,27,28,29], where it is not well understood where the information flows during the Hawking evaporation process). In previous literature, coarse graining has been widely considered (for instance, in [1,2,3,4,5,6,7,8] and [38,39,40,41,42,43,44,45,46]), but in most of those cases, coarse graining is assumed ab initio, preventing one from having complete control of the system. In this paper, we shall study different quantifiable models of coarse graining the entropy, and the selection of a suitable tunable smearing parameter that will allow us to manipulate coarse graining in a controllable manner and understand what is hidden and what is correlated.

## 4. Coarse Graining the Classical Shannon Entropy

For a classical system with a finite, or at worst denumerable, number of states (a discretium of states), the Shannon entropy can be defined as:

$$S=-\sum_{i}p_{i}\ln p_{i},\qquad \sum_{i}p_{i}=1.$$

There are many abstract theorems that can be proven for this quantity, but these will not be the main focus here [47,48,49,50]. In contrast, for a non-denumerable number of states (a continuum of states; say three-space for definiteness), it is better to write:

$$S=-\int \rho(x)\,\ln\!\left(\frac{\rho(x)}{\rho_{*}}\right)d^{3}x,\qquad \int \rho(x)\,d^{3}x=1.$$

Here, $\rho_{*}$ is some arbitrary-but-fixed parameter that is needed for dimensional consistency; it is just a convenient normalizing constant that effectively sets the zero for S (in particular, continuum entropy differences are always independent of $\rho_{*}$). Note that under re-scaling, we have the trivial result:

$$\rho_{*}\to\lambda\,\rho_{*}\quad\Longrightarrow\quad S\to S+\ln\lambda.$$

Below, we exhibit three forms of classical coarse graining suitable for three situations:

Continuum ⟶ continuum (independent of ${\rho}_{*}$);

Continuum ⟶ discretium (not manifestly independent of $\rho_{*}$; care and discretion required);

Discretium ⟶ discretium (trivially independent of ${\rho}_{*}$).

#### 4.1. Continuum ⟶ Continuum: Diffusion-Based Forms of Coarse Graining

Starting from the probability density $\rho(x)$, coarse-grain using some sort of convolution process:

$$\rho(x)\to\rho_{t}(x)=\int K(x,y;t)\,\rho(y)\,d^{3}y,\qquad \int K(x,y;t)\,d^{3}x=1.$$

Here, $K(x,y;t)$ is some positive normalized kernel; t is not necessarily physical time; it is simply a parameter that controls how far you smear out the probability distribution. This kernel can be viewed as representing a “defocussing” process. A delta-function initially concentrated at $x_{0}$, that is $\delta(x-x_{0})$, becomes “smeared out” to a time-dependent and space-dependent “blob” $K(x,x_{0};t)$. Now, from the basic definition of Shannon entropy (in the continuum), we have:

$$S_{t}=S(\rho_{t})=-\int \rho_{t}(x)\,\ln\!\left(\frac{\rho_{t}(x)}{\rho_{*}}\right)d^{3}x.$$

Even before specializing the kernel K, we see:

$$\dot{S}_{t}=-\int \dot{\rho}_{t}(x)\,\ln\!\left(\frac{\rho_{t}(x)}{\rho_{*}}\right)d^{3}x.$$

Here, thanks to the normalizing condition (which implies $\int\dot{\rho}_{t}(x)\,d^{3}x=0$), the RHS is actually independent of $\rho_{*}$.

Now, let us be more specific: let K be an ordinary diffusion operator (that is, formally we can write $K=\exp\left\{t\sigma\nabla^{2}\right\}$), where $\sigma$ is the diffusion constant. Then:

$$\dot{\rho}_{t}(x)=\sigma\,\nabla^{2}\rho_{t}(x).$$

Therefore, in this ordinary diffusion situation,

$$\dot{S}_{t}=-\sigma\int \nabla^{2}\rho_{t}(x)\,\ln\!\left(\frac{\rho_{t}(x)}{\rho_{*}}\right)d^{3}x.$$

Integration by parts (noting that $\rho_{*}$ drops out of the calculation) now yields:

$$\dot{S}_{t}=\sigma\int \frac{|\nabla\rho_{t}(x)|^{2}}{\rho_{t}(x)}\,d^{3}x\ge 0.$$

Therefore, under diffusion-like-smoothing, entropy always rises as the smearing parameter t increases. We can easily generalize this to position-dependent, direction-dependent (and even density-dependent) diffusivity. Suppose the kernel (and so the probability density) satisfies:

$$\dot{K}=\partial_{i}\!\left(\sigma^{ij}(x,\rho(x))\,\partial_{j}K\right).$$

This now describes a generalized diffusion process, with $\sigma^{ij}(x,\rho(x))$ being a position-dependent, direction-dependent (and even density-dependent), diffusion “constant”. Then, under the same logic as previously:

$$\dot{S}_{t}=\int \sigma^{ij}(x,\rho_{t}(x))\,\frac{\partial_{i}\rho_{t}(x)\,\partial_{j}\rho_{t}(x)}{\rho_{t}(x)}\,d^{3}x\ge 0,$$

with non-negativity holding as long as the matrix $\sigma^{ij}(x,\rho_{t}(x))$ is positive semidefinite. For either simple or generalized diffusion, we can then define:

$$I_{\mathrm{correlations}}=S(\rho_{t})-S(\rho_{0})\ge 0.$$

Here, t is not necessarily physical time; it is simply the smearing parameter. If we view the diffusion process as a gedanken-process, a simple de-focussing of our vision described by the smearing parameter t, then the effect is clearly reversible, and $I_{\mathrm{correlations}}$ is simply the information hiding in the correlations that we can no longer see due to our deliberately blurred vision; but this is information we can easily regain simply by refocussing our vision. Overall, this construction yields a nice explicitly controllable entropy flow as a function of the coarse graining parameter t.
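The monotone entropy flow under diffusion can be checked numerically. The following is a minimal sketch, not part of the original construction: it assumes a one-dimensional periodic grid and an explicit finite-difference step (with `alpha` standing in for $\sigma\,\Delta t/\Delta x^{2}$), and the names `continuum_entropy`, `diffuse_step` and `entropy_history` are purely illustrative.

```python
import math

def continuum_entropy(rho, dx, rho_star=1.0):
    """Riemann-sum estimate of S = -int rho ln(rho/rho_star) dx on a uniform grid."""
    return -sum(r * math.log(r / rho_star) * dx for r in rho if r > 0.0)

def diffuse_step(rho, alpha):
    """One explicit diffusion step on a periodic grid.

    alpha = sigma * dt / dx**2 is assumed to lie in [0, 1/2], so each new
    value is a convex average of a cell and its two neighbours.
    """
    n = len(rho)
    return [rho[i] + alpha * (rho[i - 1] - 2.0 * rho[i] + rho[(i + 1) % n])
            for i in range(n)]

def entropy_history(rho0, dx, alpha, steps, rho_star=1.0):
    """Entropy after 0, 1, ..., steps diffusion steps."""
    rho, history = list(rho0), []
    for _ in range(steps + 1):
        history.append(continuum_entropy(rho, dx, rho_star))
        rho = diffuse_step(rho, alpha)
    return history

# Illustrative initial density: two narrow bumps on 64 cells of width 0.1.
n, dx = 64, 0.1
raw = [math.exp(-0.5 * ((i - 20) / 2.0) ** 2)
       + 0.5 * math.exp(-0.5 * ((i - 45) / 3.0) ** 2) for i in range(n)]
norm = sum(raw) * dx
rho0 = [r / norm for r in raw]

history = entropy_history(rho0, dx, alpha=0.25, steps=200)
# Information hidden in the correlations: S(rho_t) - S(rho_0) at each step.
hidden = [s - history[0] for s in history]
```

In this discretized setting, each step is a doubly stochastic averaging of the cell probabilities, so the entropy cannot decrease, and changing `rho_star` shifts every entry of `history` by the same constant while leaving `hidden` unchanged.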

#### 4.2. Continuum ⟶ Discretium: Box-Averaging Forms of Coarse Graining

Starting from the continuum, let us now divide the universe up into a denumerable set of non-overlapping boxes $B_{i}$, which nevertheless cover the entire universe. Then, define individual probabilities $p_{i}$ for each box,

$$p_{i}=\int_{B_{i}}\rho(x)\,d^{3}x,\qquad \sum_{i}p_{i}=1,$$

so that you are throwing away (agreeing not to look at) detailed information of the probability distribution inside each individual box. Now, compare the continuum and discretium entropies:

$$S(\rho)=-\int \rho(x)\,\ln\!\left(\frac{\rho(x)}{\rho_{*}}\right)d^{3}x\qquad\text{versus}\qquad S_{B}=-\sum_{i}p_{i}\ln p_{i}.$$

We shall consider two cases: a “simple” all-or-nothing box averaging and a more subtle box averaging where one first diffuses the probability density in the continuum before implementing box-averaging.

#### 4.2.1. Simple Box-Averaging

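For “simple” box-averaging, where we fix individual boxes $B_{i}$, but do not have any tunable parameter to work with, we shall show that (unfortunately, the $\rho_{*}$ dependence is unavoidable):

$$S_{B}+\ln(\overline{V}\rho_{*})\ge S(\rho),$$

where $\overline{V}$ is a mean box volume whose precise definition emerges in the course of the proof.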

To prove this, use the idea of (classical) relative entropy (a measure of distinguishability between two probability distributions). It is a standard result that if $\rho(x)$ and $\rho_{0}(x)$ are any two normalized probability distributions, then:

$$\int \rho(x)\,\ln\!\left(\frac{\rho(x)}{\rho_{0}(x)}\right)d^{3}x\ge 0.$$

Let us now define a “box-wise-constant” probability density $\rho_{B}(x)$ as follows:

$$\rho_{B}(x)=\frac{p_{i}}{V_{i}}\quad\text{for }x\in B_{i},\qquad V_{i}=\int_{B_{i}}d^{3}x.$$

(We do not care what happens on the box boundaries, simply because they are of measure zero.)

Then, $\rho_{B}(x)$ is certainly a normalized probability distribution, $\int \rho_{B}(x)\,d^{3}x=1$. Consequently:

$$-\int \rho(x)\,\ln\!\left(\frac{\rho_{B}(x)}{\rho_{*}}\right)d^{3}x\ge-\int \rho(x)\,\ln\!\left(\frac{\rho(x)}{\rho_{*}}\right)d^{3}x=S(\rho).$$

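However, since $\rho_{B}(x)$ is piecewise constant (in fact, box-wise constant), on the LHS we can replace $\rho(x)$ by $\rho_{B}(x)$ under the integral, and we then have:

$$S(\rho_{B})=-\int \rho_{B}(x)\,\ln\!\left(\frac{\rho_{B}(x)}{\rho_{*}}\right)d^{3}x\ge S(\rho),$$

and we see that $S(\rho_{B})\ge S(\rho)$. However, we can actually do quite a bit more, since working in a slightly different direction, we have:

$$S(\rho_{B})=-\sum_{i}p_{i}\,\ln\!\left(\frac{p_{i}}{V_{i}\,\rho_{*}}\right).$$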

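Now, using the normalization $\sum_{i}p_{i}=1$, we have:

$$S(\rho_{B})=-\sum_{i}p_{i}\ln p_{i}+\sum_{i}p_{i}\ln V_{i}+\ln\rho_{*}.$$

However, note that we can interpret:

$$\sum_{i}p_{i}\ln V_{i}=\ln\!\left(\prod_{i}V_{i}^{\,p_{i}}\right)=\ln\overline{V},$$

where $\overline{V}$ is just the (probability weighted) geometric mean volume of the boxes $B_{i}$. That is, we have:

$$S(\rho_{B})=S_{B}+\ln(\overline{V}\rho_{*}).$$

Rearranging, we have the rigorous inequality:

$$S_{B}\ge S(\rho)-\ln(\overline{V}\rho_{*}).$$

The RHS is actually independent of $\rho_{*}$. To verify this note:

$$S(\rho)-\ln(\overline{V}\rho_{*})=-\int \rho(x)\,\ln\!\left(\rho(x)\,\overline{V}\right)d^{3}x.$$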

Therefore, we are free to pick $\rho_{*}$ to taste. Three particularly useful options are:

Simply pick $\rho_{*}=\overline{V}^{-1}$, in which case, we have:

$$S_{B}\ge S(\rho).$$

If all boxes have the same volume V, then automatically, we have $\overline{V}=V$. In this situation, we simply pick $\rho_{*}=V^{-1}$, in which case:

$$S_{B}\ge S(\rho).$$

Alternatively, keep $\rho_{*}$ arbitrary, and simply live with the fact that the best we can say is:

$$S_{B}\ge S(\rho)-\ln(\overline{V}\rho_{*}).$$

The central result is always:

$$S(\rho_{B})=S_{B}+\ln(\overline{V}\rho_{*})\ge S(\rho).$$

Subject to these technical quibbles, we can then define:

$$I_{\mathrm{hidden}}=S(\rho_{B})-S(\rho)=S_{B}+\ln(\overline{V}\rho_{*})-S(\rho)\ge 0.$$

This is both independent of $\rho_{*}$ and guaranteed non-negative. Physically, this represents the “hidden information” encoded in the density-density correlations inside each of the boxes $B_{i}$, correlations that we are simply agreeing not to look into. Note that box-averaging in this context is an all-or-nothing proposition; there is no tunable parameter to adjust.
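The box-averaging bound can be illustrated on a fine one-dimensional grid that stands in for the continuum. This is only a sketch under that assumption; the helper names (`box_average`, `hidden_information`) and the particular density and box sizes are hypothetical choices made for illustration.

```python
import math

def continuum_entropy(rho, dx, rho_star):
    """Riemann-sum estimate of S(rho) = -int rho ln(rho/rho_star) dx."""
    return -sum(r * math.log(r / rho_star) * dx for r in rho if r > 0.0)

def box_average(rho, dx, box_sizes):
    """Box probabilities p_i and volumes V_i for consecutive boxes.

    box_sizes[i] is the number of fine cells in box i (boxes may be unequal).
    """
    assert sum(box_sizes) == len(rho)
    probs, vols, start = [], [], 0
    for size in box_sizes:
        probs.append(sum(rho[start:start + size]) * dx)
        vols.append(size * dx)
        start += size
    return probs, vols

def hidden_information(rho, dx, box_sizes, rho_star):
    """I_hidden = S_B + ln(Vbar * rho_star) - S(rho)."""
    probs, vols = box_average(rho, dx, box_sizes)
    s_box = -sum(p * math.log(p) for p in probs if p > 0.0)
    # ln of the probability-weighted geometric mean box volume
    log_vbar = sum(p * math.log(v) for p, v in zip(probs, vols))
    return s_box + log_vbar + math.log(rho_star) - continuum_entropy(rho, dx, rho_star)

# Illustrative density: a single Gaussian bump on 60 cells of width 0.05.
n, dx = 60, 0.05
raw = [math.exp(-0.5 * ((i - 25) / 6.0) ** 2) for i in range(n)]
norm = sum(raw) * dx
rho = [r / norm for r in raw]
box_sizes = [5, 10, 15, 30]

info_a = hidden_information(rho, dx, box_sizes, rho_star=1.0)
info_b = hidden_information(rho, dx, box_sizes, rho_star=3.7)
# A density that is already box-wise constant hides nothing.
info_flat = hidden_information([1.0 / (n * dx)] * n, dx, box_sizes, rho_star=1.0)
```

Here `info_a` and `info_b` agree up to rounding, which reflects the $\rho_{*}$-independence of the hidden information, and `info_flat` vanishes because nothing is thrown away when the density has no structure inside the boxes.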

#### 4.2.2. Continuum Diffusion Followed by Box-Averaging

Now, let us introduce a tunable parameter t, allowing the continuum probability density $\rho(x)$ to evolve under some diffusion process (either physical or a gedanken-process), so that $\rho(x)\to\rho_{t}(x)$ before box-averaging. Then, in terms of the (now time-dependent) boxed probability density:

$$\rho_{B_{t}}(x)=\frac{p_{i}(t)}{V_{i}}\quad\text{for }x\in B_{i},\qquad p_{i}(t)=\int_{B_{i}}\rho_{t}(x)\,d^{3}x,$$

we immediately have without any further need of calculation:

$$S(\rho_{B_{t}})\ge S(\rho_{t})\ge S(\rho).$$

However, a more subtle result is this:

Here, we have used the fact that $\rho_{B_{t}}$ is box-wise constant as a function of x and the positivity of relative Shannon entropy. This argument now guarantees that $S(\rho_{B_{t}})$ is monotone non-decreasing as a function of the smearing parameter t. If $\overline{V}$ is held fixed (for example, if all boxes are the same size so that $\overline{V}=V$), this furthermore implies $\dot{S}_{B_{t}}\ge 0$. However, in general, because:

$$S(\rho_{B_{t}})=S_{B_{t}}+\ln(\overline{V}\rho_{*}),$$

we have:

$$\dot{S}(\rho_{B_{t}})=\dot{S}_{B_{t}}+\frac{\dot{\overline{V}}}{\overline{V}}.$$

Thus:

$$\dot{S}(\rho_{B_{t}})-\dot{S}_{B_{t}}=\frac{\dot{\overline{V}}}{\overline{V}}=\sum_{i}\dot{p}_{i}(t)\,\ln V_{i},$$

which in general is nonzero (but independent of $\rho_{*}$) without imposing extra conditions. Nevertheless, what we do have is already quite powerful: we always have the result that $S(\rho_{B_{t}})$ is monotone non-decreasing, and whenever $\dot{\overline{V}}=0$, we even have $S_{B_{t}}$ monotone non-decreasing as a function of the smearing parameter t. We can then define the “hidden information” as:

$$I_{\mathrm{hidden}}=S(\rho_{B_{t}})-S(\rho)=S_{B_{t}}+\ln(\overline{V}\rho_{*})-S(\rho)\ge 0.$$

This is both independent of ${\rho}_{*}$ and guaranteed non-negative. Physically, this represents the “hidden information” encoded in the now smearing-parameter-dependent density-density correlations inside each of the boxes ${B}_{i}$, correlations that we are simply agreeing not to look into. Note that box-averaging in this context thus yields a nice explicitly controllable entropy flow as a function of the coarse graining parameter t.

#### 4.3. Discretium ⟶ Discretium: Aggregation/Averaging Forms of Coarse Graining

Suppose we have a denumerable set of states with probabilities $p_{i}$ for which we define:

$$S=-\sum_{i}p_{i}\ln p_{i}.$$

(This could be done for instance, after breaking the universe up into a denumerable set of boxes.) Let us now consider two ways of “aggregating” these probabilities.

#### 4.3.1. Naive Aggregation

Take any two states (any two boxes), say a and b, and simply average their probabilities, so that:

$$p_{a}\to\overline{p}=\frac{p_{a}+p_{b}}{2},\qquad p_{b}\to\overline{p}=\frac{p_{a}+p_{b}}{2}.$$

Note that this process leaves the number of states (boxes) fixed; it is inadvisable to merge the boxes, as that tends ultimately to decrease the entropy in a manner that is physically uninteresting. Then, set $p_{a}=\overline{p}\pm\epsilon$ and $p_{b}=\overline{p}\mp\epsilon$ (where we can always choose signs so that $\epsilon\ge 0$). We have:

$$-(\overline{p}+\epsilon)\ln(\overline{p}+\epsilon)-(\overline{p}-\epsilon)\ln(\overline{p}-\epsilon)\le-2\,\overline{p}\ln\overline{p},$$

which follows from the concavity of the function $-x\ln x$. Thus, the result of aggregating (averaging) any two states in this manner is always to increase the Shannon entropy:

$$S_{\text{after aggregation}}\ge S_{\text{before aggregation}}.$$

Note that the end point of this aggregation process occurs when all of the $p_{i}$ are equal, so $p_{i}=1/N$ and $S\to\ln N$, which is its maximum possible value. We can now define the “hidden information” as:

$$I_{\mathrm{hidden}}=S_{\text{after aggregation}}-S_{\text{before aggregation}}\ge 0.$$

These are now correlations between the states (boxes) that we lumped together in the averaging process (this is at this stage an “all-or-nothing” process with no tunable parameter).
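As a small numerical illustration of naive aggregation, the sketch below repeatedly averages two states and tracks the Shannon entropy. The rule for choosing which pair to average (here the currently largest and smallest probabilities) is an assumption made only for this example, and the function names are illustrative.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; states with p = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def aggregate_pair(probs, a, b):
    """Replace p_a and p_b by their common average; the number of states is unchanged."""
    out = list(probs)
    out[a] = out[b] = 0.5 * (probs[a] + probs[b])
    return out

probs = [0.5, 0.25, 0.125, 0.125]
history = [shannon_entropy(probs)]
for _ in range(60):
    # Illustrative pairing rule: average the largest and the smallest probability.
    a = max(range(len(probs)), key=probs.__getitem__)
    b = min(range(len(probs)), key=probs.__getitem__)
    probs = aggregate_pair(probs, a, b)
    history.append(shannon_entropy(probs))

hidden_information = history[-1] - history[0]
```

Each averaging step leaves the total probability fixed and never lowers the entropy, and with this pairing rule the distribution approaches the uniform one, so `history[-1]` approaches $\ln 4$.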

#### 4.3.2. Asymmetric Aggregation

A generalization of this notion of aggregation is to define an “asymmetric averaging”:

$$p_{a,\mathrm{new}}=\lambda\,p_{a}+(1-\lambda)\,p_{b},\qquad p_{b,\mathrm{new}}=(1-\lambda)\,p_{a}+\lambda\,p_{b},\qquad \lambda\in(0,1).$$

Then, $\overline{p}_{\mathrm{new}}=\overline{p}$, the average is unaffected, whereas $|p_{a,\mathrm{new}}-p_{b,\mathrm{new}}|=|2\lambda-1|\,|p_{a}-p_{b}|<|p_{a}-p_{b}|$. Therefore, the new $p_{a}$ and $p_{b}$ move “closer together”.

Then, we can again write $p_{a}=\overline{p}\pm\epsilon$ and $p_{b}=\overline{p}\mp\epsilon$, and in addition, we have $p_{a,\mathrm{new}}=\overline{p}\pm\epsilon_{2}$ and $p_{b,\mathrm{new}}=\overline{p}\mp\epsilon_{2}$, with both $\epsilon$ and $\epsilon_{2}$ positive, and $\epsilon_{2}<\epsilon$. However, then, the same argument as used previously now gives:

$$-p_{a,\mathrm{new}}\ln p_{a,\mathrm{new}}-p_{b,\mathrm{new}}\ln p_{b,\mathrm{new}}\ge-p_{a}\ln p_{a}-p_{b}\ln p_{b}.$$

Thus, the result of aggregating (averaging) any two states in this manner is always to increase the Shannon entropy. The rest of the argument follows as previously, with the same conclusions (this argument can also be rephrased in terms of the concavity of the Shannon entropy as a function of the $p_{i}$). We can again define the “hidden information” as:

$$I_{\mathrm{hidden}}=S_{\text{after aggregation}}-S_{\text{before aggregation}}\ge 0.$$

These are now correlations between the states (boxes) that we lumped together in the asymmetric averaging process.
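The asymmetric averaging rule can be explored in the same way. The sketch below assumes the mixing weight satisfies $0<\lambda<1$ and uses an illustrative three-state distribution; the function name `asymmetric_aggregate` is hypothetical.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in nats; states with p = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def asymmetric_aggregate(probs, a, b, lam):
    """Mix states a and b: p_a -> lam*p_a + (1-lam)*p_b and p_b -> (1-lam)*p_a + lam*p_b."""
    if not 0.0 < lam < 1.0:
        raise ValueError("lam is assumed to lie strictly between 0 and 1")
    out = list(probs)
    out[a] = lam * probs[a] + (1.0 - lam) * probs[b]
    out[b] = (1.0 - lam) * probs[a] + lam * probs[b]
    return out

probs = [0.7, 0.1, 0.2]
s_before = shannon_entropy(probs)

# Sweep lam from almost no mixing (0.99) to the symmetric average (0.5).
lams = [0.99, 0.9, 0.75, 0.5]
entropies = [shannon_entropy(asymmetric_aggregate(probs, 0, 1, lam)) for lam in lams]
hidden = [s - s_before for s in entropies]
```

Read this way, $\lambda$ behaves like a tunable parameter for a single pair of states: the closer it is to $1/2$, the closer together the two probabilities move and the larger the hidden information becomes, with $\lambda=1/2$ reproducing naive aggregation.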

#### 4.4. Summary: Coarse Graining Classical Shannon Entropy

As we have just seen, it is possible to construct a number of reasonably well-behaved models for classical coarse graining of the Shannon entropy. Some of these models are “all-or-nothing”, without any tunable smearing parameter, whereas other models depend on a tunable coarse graining parameter (that we have chosen to designate t even when it might not represent physical time). The trickiest of these classical constructions is when one coarse-grains a continuum down to a discretium, since the normalizing parameter ${\rho}_{*}$ (which is essential to normalizing the continuum Shannon entropy) no longer appears in the discrete Shannon entropy. Which if any of these models is most relevant depends on the particular process under consideration; in all cases, these constructions provide a well-defined framework to use and allow explicit definition and determination of the hidden information.

## 5. Coarse Graining the Quantum Von Neumann Entropy

For quantum systems described by a normalized density matrix $\rho$, the von Neumann entropy is:

$$S=-\mathrm{tr}(\rho\ln\rho).$$

There is no continuum versus discretium distinction to worry about, and since the density matrix is automatically dimensionless, there is never any need to introduce a “reference density” $\rho_{*}$. There are still at least three interesting and fundamental notions of coarse graining to consider:

Maximal mixing;

Partial traces;

Decoherence.

In Section 7, we shall subsequently extend the constructions of this current section by the use of a variant of Hawking’s super-scattering operator to be introduced in Section 6.

#### 5.1. Quantum Coarse Graining by Maximal Mixing

For simplicity, let the dimension of the Hilbert space be finite: N. Now, let ${I}_{N}$ denote the N-dimensional identity matrix. Then, $\frac{{I}_{N}}{N}$ is a properly normalized density matrix with entropy $lnN$. In fact, this is the maximally-mixed density matrix with maximum entropy given the dimensionality of the Hilbert space.

Now, consider this gedanken-process: Acting on the density matrix $\rho$, to keep things properly normalized, take:

$$\rho\to\rho_{s}=(1-s)\,\rho+s\,\frac{I_{N}}{N},\qquad s\in[0,1].$$

Here, one is using the smearing parameter s (which is certainly not physical time) to selectively ignore what would otherwise be physically accessible information; effectively one is “blurring” (that is, coarse graining) the system by throwing away (ignoring) position information in Hilbert space. (This is strictly a gedanken-process; we know of no physical way of implementing this particular mapping. Nevertheless, it is still appropriate to call this a (mathematical) coarse graining.)

Now, use the fact that the von Neumann entropy is strictly concave in the density matrix:

$$S(\rho_{s})\ge(1-s)\,S(\rho)+s\,\ln N,$$

whence:

$$S(\rho_{s})\ge S(\rho)+s\left[\ln N-S(\rho)\right]\ge S(\rho).$$

Then, we can formally define the hidden information by setting:

$$I_{\mathrm{hidden}}=S(\rho_{s})-S(\rho)\ge 0.$$

This represents the amount of information that is temporarily lost in the s-dependent blurring process, which in view of the gedanken-nature of the blurring process is recoverable simply by de-blurring one’s vision. In this construction, one also has a strict upper bound on the hidden information:

$$I_{\mathrm{hidden}}\le\ln N-S(\rho).$$

It is important to note that this partial maximal mixing procedure involves some subtleties in its interpretation. Compared to other more physical coarse graining procedures, maximal mixing has a more mathematical flavour. Nevertheless, it is still appropriate to call it coarse graining because it continues to satisfy the crucial feature of selectively ignoring some physical aspects of the system, which in turn leads to an effective amount of hidden information.
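A short numerical sketch of partial maximal mixing is given below. It assumes NumPy is available and uses a randomly generated density matrix purely as test data; `von_neumann_entropy`, `partially_mix` and `random_density_matrix` are illustrative helper names, not part of any established API.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -tr(rho ln rho) in nats, from the eigenvalues of a Hermitian matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log(evals)))

def partially_mix(rho, s):
    """rho_s = (1 - s) rho + s I/N, interpolating toward the maximally mixed state."""
    n = rho.shape[0]
    return (1.0 - s) * rho + s * np.eye(n) / n

def random_density_matrix(n, seed=0):
    """A random positive, unit-trace matrix used only as example input."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = a @ a.conj().T
    return rho / np.trace(rho).real

N = 4
rho = random_density_matrix(N)
s_values = np.linspace(0.0, 1.0, 11)
entropies = [von_neumann_entropy(partially_mix(rho, s)) for s in s_values]
hidden = [S - entropies[0] for S in entropies]   # I_hidden(s) = S(rho_s) - S(rho)
```

The values in `entropies` rise from $S(\rho)$ at $s=0$ to $\ln N$ at $s=1$, and every entry of `hidden` stays between zero and $\ln N-S(\rho)$.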

#### 5.2. Quantum Coarse Graining by Partial Traces

Many quantum information sources and texts hint at how taking partial traces is an example of “coarse graining”, but never quite seem to really get explicit about it. Below, we outline a few explicit constructions, some tunable, some “all-or-nothing”.

#### 5.2.1. Non-Tunable Partial Traces

Suppose the Hilbert space factorizes $H_{AB}=H_{A}\otimes H_{B}$ (though the density matrix need not factorize), and define “simultaneous coarse grainings” onto the sub-spaces $H_{A}$ and $H_{B}$ by:

$$\rho_{A}=\mathrm{tr}_{B}(\rho_{AB}),\qquad \rho_{B}=\mathrm{tr}_{A}(\rho_{AB}),\qquad \rho_{A:B}=\rho_{A}\otimes\rho_{B}.$$

Here, the notation A:B effectively means this: slide a (unphysical, purely mathematical) barrier between the sub-spaces A and B, so there is no longer any cross-talk between the two sub-spaces (notationally, it is more common to see notation such as $S_{A;B}$, but this seems to us to unnecessarily suggest an asymmetry between the two factor sub-spaces). The idea is that people “living in” A agree not to look into B and vice versa. Then, it is relatively easy to prove the quite standard result that:

$$S_{A:B}=S(\rho_{A}\otimes\rho_{B})=S_{A}+S_{B},$$

and furthermore, to derive the “triangle inequalities”:

$$|S_{A}-S_{B}|\le S_{AB}\le S_{A}+S_{B}.$$

Combining these results, we see:

$$|S_{A}-S_{B}|\le S_{AB}\le S_{A}+S_{B}=S_{A:B}.$$

The hidden information can be defined by:

$$I_{A:B}=S_{A:B}-S_{AB}=S_{A}+S_{B}-S_{AB}.$$

This is just the completely usual notion of mutual information. The triangle inequalities give:

$$I_{A:B}\ge 0$$

(so the hidden information is always non-negative), and furthermore:

$$I_{A:B}\le S_{A}+S_{B}-|S_{A}-S_{B}|=2\min\{S_{A},S_{B}\},$$

so the hidden information is always bounded from above. Note that the current construction has no tunable parameter; the coarse graining is “all-or-nothing”, a feature that we shall now show how to evade.
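The partial-trace construction can be made concrete with a few lines of NumPy. This is a sketch only: it assumes the bipartite density matrix is stored in the usual Kronecker-product ordering with A as the first factor, and the function names are illustrative.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -tr(rho ln rho) in nats, from the eigenvalues of a Hermitian matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log(evals)))

def partial_traces(rho_ab, dim_a, dim_b):
    """Return (rho_A, rho_B) for a density matrix on H_A tensor H_B."""
    r = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)
    rho_a = np.einsum('ibjb->ij', r)      # trace over the B indices
    rho_b = np.einsum('aiaj->ij', r)      # trace over the A indices
    return rho_a, rho_b

def hidden_information(rho_ab, dim_a, dim_b):
    """Mutual information S_A + S_B - S_AB, returned together with S_A and S_B."""
    rho_a, rho_b = partial_traces(rho_ab, dim_a, dim_b)
    s_a, s_b = von_neumann_entropy(rho_a), von_neumann_entropy(rho_b)
    return s_a + s_b - von_neumann_entropy(rho_ab), s_a, s_b

# Maximally entangled two-qubit state (|00> + |11>)/sqrt(2).
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
bell = np.outer(psi, psi)
info_bell, s_a, s_b = hidden_information(bell, 2, 2)

# An uncorrelated product state on a qubit and a qutrit hides nothing.
product = np.kron(np.diag([0.7, 0.3]), np.diag([0.5, 0.25, 0.25]))
info_product, _, _ = hidden_information(product, 2, 3)
```

For the Bell state, the hidden information saturates the upper bound $2\min\{S_{A},S_{B}\}=2\ln 2$, whereas for the product state it vanishes.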

#### 5.2.2. Tunable Partial Traces

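Let us now consider this model:

$$\rho_{AB}\to\rho_{s}=(1-s)\,\rho_{AB}+s\,\rho_{A}\otimes\rho_{B},\qquad s\in[0,1].$$

This smoothly interpolates between the identity process at $s=0$ and the factorized subspace process described in the previous subsection at $s=1$ (the parameter s is not physical time; it merely characterizes the amount of coarse graining; so this is at this stage a gedanken-process). Concavity of the von Neumann entropy yields:

$$S(\rho_{s})\ge(1-s)\,S_{AB}+s\,(S_{A}+S_{B}),$$

whence:

$$S(\rho_{s})\ge S_{AB}+s\,(S_{A}+S_{B}-S_{AB}).$$

By the triangle inequality, this implies:

$$S(\rho_{s})\ge S_{AB}.$$

On the other hand, because partial trace applied to $\rho_{s}$ yields $\rho_{s}\to\rho_{A}\otimes\rho_{B}$ (independent of the value of s as long as it is in the range $s\in[0,1]$), we have:

$$S(\rho_{s})\le S_{A}+S_{B}.$$

Now, consider the entropy flow under evolution of the parameter s:

$$\frac{dS(\rho_{s})}{ds}=-\mathrm{tr}\!\left(\left[\rho_{A}\otimes\rho_{B}-\rho_{AB}\right]\ln\rho_{s}\right).$$

However, we can write this as:

$$\frac{dS(\rho_{s})}{ds}=-\frac{1}{1-s}\,\mathrm{tr}\!\left(\left[\rho_{A}\otimes\rho_{B}-\rho_{s}\right]\ln\rho_{s}\right).$$

Now, applying the Klein inequality for quantum relative entropies (the positivity of relative von Neumann entropy), we have:

$$\frac{dS(\rho_{s})}{ds}\ge\frac{S_{A}+S_{B}-S(\rho_{s})}{1-s}\ge 0.$$

Thus, this tuned entropy is nicely monotonic (non-decreasing) in the tuning parameter s and is a good candidate for an (explicit and controllable) coarse graining process. We can then safely define the “hidden information” as:

$$I_{\mathrm{hidden}}=S(\rho_{s})-S_{AB}.$$

There is still a nice lower bound:

$$I_{\mathrm{hidden}}\ge s\,(S_{A}+S_{B}-S_{AB})\ge 0,$$

guaranteeing positivity of the hidden information. On the other hand, the best general upper bound seems to be relatively weak:

$$I_{\mathrm{hidden}}\le S_{A}+S_{B}-S_{AB}\le 2\min\{S_{A},S_{B}\}.$$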

#### 5.2.3. Asymmetric Partial Trace and Maximal Mixing

Here is a more brutal asymmetric combination of partial trace and maximal mixing:

$$\rho_{AB}\to\rho_{A}\otimes\frac{I_{B}}{N_{B}}.$$

Here, one is not just taking a partial trace over the subsystem B; one is also maximally mixing the subspace B. Therefore, this process is asymmetric in the way it treats Subsystems A and B (this is best thought of as a gedanken-process). To verify that entropy has increased in this “all-or-nothing” coarse graining process, we simply note that:

$$S\!\left(\rho_{A}\otimes\frac{I_{B}}{N_{B}}\right)=S_{A}+\ln N_{B}\ge S_{A}+S_{B}\ge S_{AB},$$

where, specifically, it is easy to see that:

$$S\!\left(\frac{I_{B}}{N_{B}}\right)=\ln N_{B}\ge S_{B}.$$

The hidden information can easily be defined by:

$$I_{\mathrm{hidden}}=S_{A}+\ln N_{B}-S_{AB},$$

and is guaranteed non-negative. There is also an upper bound:

$$I_{\mathrm{hidden}}\le S_{A}+\ln N_{B}-|S_{A}-S_{B}|.$$

This seems to be the best upper bound one can extract in this context.
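The asymmetric “trace over B, then maximally mix B” map is easy to test numerically. The sketch below assumes NumPy, a qubit-qutrit system, and a random bipartite state used only as example data; `trace_out_b` and `trace_and_mix` are illustrative names.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -tr(rho ln rho) in nats, from the eigenvalues of a Hermitian matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log(evals)))

def trace_out_b(rho_ab, dim_a, dim_b):
    """Reduced density matrix rho_A (A is the first tensor factor)."""
    return np.einsum('ibjb->ij', rho_ab.reshape(dim_a, dim_b, dim_a, dim_b))

def trace_and_mix(rho_ab, dim_a, dim_b):
    """rho_AB -> rho_A tensor I_B/N_B: discard B and replace it by the maximally mixed state."""
    return np.kron(trace_out_b(rho_ab, dim_a, dim_b), np.eye(dim_b) / dim_b)

# Random qubit-qutrit state, used only as example input.
dim_a, dim_b = 2, 3
rng = np.random.default_rng(1)
a = rng.normal(size=(6, 6)) + 1j * rng.normal(size=(6, 6))
rho_ab = a @ a.conj().T
rho_ab = rho_ab / np.trace(rho_ab).real

s_ab = von_neumann_entropy(rho_ab)
s_a = von_neumann_entropy(trace_out_b(rho_ab, dim_a, dim_b))
s_coarse = von_neumann_entropy(trace_and_mix(rho_ab, dim_a, dim_b))
hidden_information = s_coarse - s_ab
```

In this example, `s_coarse` agrees with $S_{A}+\ln N_{B}$ up to rounding, and `hidden_information` is non-negative as expected.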

#### 5.2.4. Asymmetric Partial Trace Plus Maximal Mixing, with Tunable Parameter

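Starting from the previous case, now consider the effect of adding a tunable parameter:

$$\rho_{AB}\to\rho_{s}=(1-s)\,\rho_{AB}+s\,\rho_{A}\otimes\frac{I_{B}}{N_{B}},\qquad s\in[0,1].$$

(This is again best thought of as a gedanken-process, rather than a physical coarse graining.) By concavity of the von Neumann entropy:

$$S(\rho_{s})\ge(1-s)\,S_{AB}+s\,(S_{A}+\ln N_{B}),$$

so that:

$$S(\rho_{s})\ge S_{AB}+s\,(S_{A}+\ln N_{B}-S_{AB})\ge S_{AB}.$$

To establish an upper bound, note that under the process envisaged in the previous Section 5.2.3 (asymmetric partial trace and mixing), we have (now for all s in the appropriate interval):

$$\rho_{s}\to\rho_{A}\otimes\frac{I_{B}}{N_{B}},$$

so by the arguments in the previous Section 5.2.3, we have:

$$S(\rho_{s})\le S_{A}+\ln N_{B}.$$

Checking monotonicity in the parameter s now requires that we investigate:

$$\frac{dS(\rho_{s})}{ds}=-\mathrm{tr}\!\left(\left[\rho_{A}\otimes\frac{I_{B}}{N_{B}}-\rho_{AB}\right]\ln\rho_{s}\right).$$

To proceed, we adapt the discussion in Section 5.2.2 (tunable partial traces). We note we can write the above as:

$$\frac{dS(\rho_{s})}{ds}=-\frac{1}{1-s}\,\mathrm{tr}\!\left(\left[\rho_{A}\otimes\frac{I_{B}}{N_{B}}-\rho_{s}\right]\ln\rho_{s}\right).$$

Now, applying the Klein inequality (the positivity of relative von Neumann entropy), we have:

$$\frac{dS(\rho_{s})}{ds}\ge\frac{S_{A}+\ln N_{B}-S(\rho_{s})}{1-s}\ge 0.$$

Thus, this partial trace plus maximal mixing tunable entropy is nicely monotonic (non-decreasing) in the tuning parameter s and is another good candidate for an (explicit and controllable) coarse graining process. Then, for the “hidden information”, we can safely define:

$$I_{\mathrm{hidden}}=S(\rho_{s})-S_{AB}.$$

We note:

$$I_{\mathrm{hidden}}\ge s\,(S_{A}+\ln N_{B}-S_{AB})\ge 0,$$

thereby guaranteeing non-negativity and a nice lower bound. The best general upper bound in this context seems to be:

$$I_{\mathrm{hidden}}\le S_{A}+\ln N_{B}-S_{AB}\le S_{A}+\ln N_{B}-|S_{A}-S_{B}|.$$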

#### 5.3. Quantum Coarse-Graining Defined by Decoherence

Choose any fixed, but arbitrary basis for the Hilbert space, and let $P_{a}$ be the corresponding one-dimensional projection operators onto these basis elements. Therefore, we have:

$$P_{a}P_{b}=\delta_{ab}\,P_{a},\qquad \sum_{a}P_{a}=I_{N},\qquad \mathrm{tr}(P_{a})=1.$$

Now, use these projection operators to define decoherence (relative to the specified basis). Decoherence is typically viewed as a physical process associated with a semi-classical environment associated with the specified basis, but we could also think of it as a gedanken-process where we are just agreeing to ignore part of the density matrix (for a general background on decoherence, see for instance, [31,32,33,34,35,36,37]).

#### 5.3.1. Full Decoherence

Define full decoherence as follows:

$$\rho\to\rho_{D}=\sum_{a}P_{a}\,\rho\,P_{a}.$$

(This preserves the $\mathrm{tr}(\rho)=1$ condition and kills off all non-diagonal elements in the density matrix.) When viewed as a gedanken-process, one is effectively agreeing to not look at the off-diagonal elements. Now, in this particular case, concavity does not tell one anything useful:

$$S(\rho_{D})=S\!\left(\sum_{a}\mathrm{tr}(\rho P_{a})\,P_{a}\right)\ge\sum_{a}\mathrm{tr}(\rho P_{a})\,S(P_{a})=0.$$

One just obtains the trivial bound $S(\rho_{D})\ge 0$. Considerably more interesting in this context is the relative entropy inequality (the Klein inequality), which states that for any two density matrices $\rho$ and $\sigma$:

$$\mathrm{tr}\!\left(\rho\left[\ln\rho-\ln\sigma\right]\right)\ge 0.$$

In particular:

$$\mathrm{tr}\!\left(\rho\left[\ln\rho-\ln\rho_{D}\right]\right)\ge 0,$$

which can be rearranged to yield:

$$-\mathrm{tr}(\rho\ln\rho_{D})\ge-\mathrm{tr}(\rho\ln\rho)=S(\rho).$$

Now, $\rho_{D}$ is diagonal, so $\ln\rho_{D}$ is diagonal, so in the trace in the LHS above, only the diagonal elements of $\rho$ contribute:

$$-\mathrm{tr}(\rho\ln\rho_{D})=-\mathrm{tr}(\rho_{D}\ln\rho_{D})=S(\rho_{D}).$$

Therefore, overall, we see that $S(\rho_{D})\ge S(\rho)$; the decoherence process is entropy non-decreasing (as it should be). In a completely standard manner, we now define the hidden information as:

$$I_{\mathrm{hidden}}=S(\rho_{D})-S(\rho)\ge 0,$$

where the hidden information is hiding in the off-diagonal correlations that one has agreed to not look at in implementing the decoherence process. The best upper bound seems to be:

$$I_{\mathrm{hidden}}\le\ln N-S(\rho).$$

The full decoherence of the off-diagonal terms allows us to completely forget about quantum effects; the coarse-grained system is effectively classical (albeit a classical stochastic system).
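Full decoherence in a fixed basis amounts to keeping only the diagonal of the density matrix, which can be sketched as follows. The example assumes NumPy and takes the computational basis as the decoherence basis; the function names and the sample states are illustrative choices.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S = -tr(rho ln rho) in nats, from the eigenvalues of a Hermitian matrix."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # drop numerically zero eigenvalues
    return float(-np.sum(evals * np.log(evals)))

def fully_decohere(rho):
    """rho_D = sum_a P_a rho P_a in the computational basis: keep only the diagonal."""
    return np.diag(np.diag(rho))

# Pure qubit state |+> = (|0> + |1>)/sqrt(2): zero entropy before decoherence.
plus = np.array([1.0, 1.0]) / np.sqrt(2.0)
rho_plus = np.outer(plus, plus)
hidden_plus = von_neumann_entropy(fully_decohere(rho_plus)) - von_neumann_entropy(rho_plus)

# A mixed three-level example with off-diagonal coherences.
rho = np.array([[0.5, 0.2, 0.1],
                [0.2, 0.3, 0.1],
                [0.1, 0.1, 0.2]])
s_before = von_neumann_entropy(rho)
s_after = von_neumann_entropy(fully_decohere(rho))
hidden_information = s_after - s_before
```

For the state $|+\rangle$, all of the entropy $\ln 2$ of the decohered state is hidden information, since the original state is pure; in the three-level example, `s_after` is simply the Shannon entropy of the diagonal entries.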

#### 5.3.2. Partial (Tunable) Decoherence

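Now, define partial decoherence as follows:

$$\rho \to \rho_{D_s} = (1-s)\,\rho + s\,\rho_D = (1-s)\,\rho + s\sum_a P_a\,\mathrm{tr}(P_a\rho), \qquad s\in[0,1].$$

(This preserves the $\mathrm{tr}(\rho)=1$ condition, but now only partially suppresses all non-diagonal elements in the density matrix.) We note that concavity implies:

$$S(\rho_{D_s}) \ge (1-s)\,S(\rho) + s\,S(\rho_D) \ge S(\rho).$$

By using the Klein inequality:

$$S(\rho_{D_s}) = -\mathrm{tr}(\rho_{D_s}\ln\rho_{D_s}) \le -\mathrm{tr}(\rho_{D_s}\ln\rho_D).$$

However, since in the current situation, $\rho_D$ is diagonal (and so, $\ln\rho_D$ is also diagonal) and since all of the diagonal elements of $\rho_{D_s}$ are by construction constants independent of $s$, this implies:

$$-\mathrm{tr}(\rho_{D_s}\ln\rho_D) = -\mathrm{tr}(\rho_D\ln\rho_D) = S(\rho_D).$$

Thus, $S(\rho_D)\ge S(\rho_{D_s})$, which has the nice interpretation that partial decoherence involves less entropy than full decoherence. Overall, we obtain the chain of inequalities:

$$S(\rho) \le S(\rho_{D_s}) \le S(\rho_D).$$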

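We can do even more and demonstrate that under partial decoherence, the entropy increases monotonically in the control parameter $s$. We note:

$$\frac{dS(\rho_{D_s})}{ds} = -\mathrm{tr}\left([\rho_D-\rho]\ln\rho_{D_s}\right) = -\frac{1}{1-s}\,\mathrm{tr}\left([\rho_D-\rho_{D_s}]\ln\rho_{D_s}\right).$$

Again appealing to the Klein inequality, we have $\mathrm{tr}(\rho_D\ln\rho_{D_s})\le \mathrm{tr}(\rho_D\ln\rho_D)=-S(\rho_D)$, whence:

$$\frac{dS(\rho_{D_s})}{ds} \ge \frac{S(\rho_D)-S(\rho_{D_s})}{1-s} \ge 0.$$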

Therefore, again, we have an explicit, calculable model for coarse graining in which the entropy is monotonically non-decreasing. In a completely standard manner, we now define the hidden information as:

$$I_{\mathrm{hidden}} = S(\rho_{D_s}) - S(\rho) \ge 0,$$

where the hidden information is hiding in the off-diagonal correlations that have been partially suppressed because one has agreed to not look at them too carefully in implementing the decoherence process. The best generic upper bound seems to be:

$$I_{\mathrm{hidden}} \le S(\rho_D) - S(\rho) \le \ln N - S(\rho).$$

Thus, we have obtained the same upper bound as in the previous subsection, where we completely suppressed the off-diagonal terms.
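The monotonicity of the entropy in the control parameter $s$ can be checked numerically. The sketch below again assumes computational-basis projectors; the function names, the random test state and the grid of $s$ values are illustrative assumptions rather than part of the construction itself.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho ln rho), from the eigenvalues, with 0 ln 0 := 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def partial_decoherence(rho, s):
    """rho_{D_s} = (1 - s) rho + s rho_D, with rho_D the diagonal part of rho."""
    rho_D = np.diag(np.diag(rho))
    return (1.0 - s) * rho + s * rho_D

# Hypothetical test state: a random 4x4 density matrix.
rng = np.random.default_rng(1)
a = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
rho = a @ a.conj().T
rho /= np.trace(rho).real

s_values = np.linspace(0.0, 1.0, 21)
entropies = np.array(
    [von_neumann_entropy(partial_decoherence(rho, s)) for s in s_values]
)

for s, S in zip(s_values[::5], entropies[::5]):
    print(f"s = {s:.2f}   S(rho_Ds) = {S:.6f}")
```

The entropies should rise from $S(\rho)$ at $s=0$ to $S(\rho_D)$ at $s=1$ without ever decreasing.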

#### 5.4. Summary: Coarse Graining Quantum Von Neumann Entropy

In the current section, we have built three distinct explicit mathematical/physical models of quantum coarse graining of the von Neumann entropy. These models were based on maximal mixing; on partial traces (four sub-varieties); and on decoherence (two sub-varieties), seven sub-varieties in total. Some of these models were “all-or-nothing” (untunable) models; others have an explicit coarse graining parameter $s$, and for these tunable models, we carefully proved that the entropy was monotonic in the smearing parameter. This already gives one a much more concrete idea of what coarse graining in a quantum context should be taken to be. In the next two sections, we shall extend these ideas, first by showing that these coarse graining processes can be viewed as specific examples of Hawking’s super-scattering operator [21] and then by applying super-scattering to define a notion of diffusion in Hilbert space.

## 6. Hawking’s Super-Scattering Operator

In an attempt to develop a non-unitary version of quantum mechanics, Hawking introduced the notion of a super-scattering operator $\$$; see [21]. At its most basic, a super-scattering operator is taken to be a trace-preserving linear mapping from density matrices to density matrices: $\rho\to\$\rho$, with $\mathrm{tr}(\rho)=1$ implying $\mathrm{tr}(\$\rho)=1$. A proper super-scattering operator is one that cannot simply be factorized into a tensor product of unitary scattering matrices $\mathbb{S}$. That is, $\$\ne\mathbb{S}^{\dagger}\otimes\mathbb{S}$, implying $\rho\to\$\rho\ne\mathbb{S}^{\dagger}\,\rho\,\mathbb{S}$. While these ideas were originally mooted in a black hole context, there is nothing intrinsically black hole related in the underlying concept. It is sometimes implicitly demanded that super-scattering operators be entropy non-decreasing, $S(\$\rho)\ge S(\rho)$, but one could just as easily demand linearity and trace-preservation as the key defining characteristics and then divide super-scattering operators into entropy non-decreasing, entropy non-increasing, and other. This point of view has the advantage that the inverse of a super-scattering operator $\$^{-1}$, if it exists, is also a super-scattering operator.

To add to the confusion, depending on the sub-field, people often use different terminology for essentially the same concept, such as “trace-preserving (completely) positive operators”, or “quantum maps”, or “quantum process”, or “quantum channels”. Usage (and precise definition) within various sub-fields is (unfortunately) not entirely standardized. We will try to keep things simple with a minimal set of assumptions and with terminology that is likely to resonate widely with the larger sub-communities of interest.

Perhaps the simplest super-scattering process one can think of is this: pick a set of positive numbers $p_i$, such that $\sum_i p_i=1$ (which you might like to think of as classical probabilities) and a set of unitary matrices $U_i$. Now, consider the mapping:

$$\rho \to \$_{\{p,U\}}\,\rho = \sum_i p_i\,U_i^{\dagger}\,\rho\,U_i.$$

This is not a unitary transformation; it is instead an “incoherent sum of unitary transformations”. However, note that the process is linear and trace preserving. Then, for this particular super-scattering process, we have:

$$S_{\{p,U\}} = S\Big(\sum_i p_i\,U_i^{\dagger}\,\rho\,U_i\Big).$$

By concavity of the von Neumann entropy:

$$S_{\{p,U\}} \ge \sum_i p_i\,S(U_i^{\dagger}\,\rho\,U_i) = \sum_i p_i\,S(\rho) = S(\rho).$$

That is, this particular super-scattering operator is entropy non-decreasing, $S_{\{p,U\}}\ge S$. Unfortunately, we do not have any nice monotonicity result describing how $S_{\{p,U\}}$ grows as a function of the $p_i$ and the $U_i$; this is again an “all-or-nothing scenario” (indeed, systematically finding constructions for which we are always guaranteed a nice monotonicity result is the topic of the next section).
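A small numerical sketch of the incoherent sum of unitary transformations follows. The unitaries are drawn from the QR decomposition of complex Gaussian matrices and the weights from a Dirichlet distribution; both choices, like all the function names, are assumptions made purely for illustration.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho ln rho), from the eigenvalues, with 0 ln 0 := 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def random_unitary(n, rng):
    """A unitary matrix from the QR decomposition of a complex Gaussian matrix
    (unitary by construction, although not exactly Haar-distributed)."""
    z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    q, _ = np.linalg.qr(z)
    return q

def incoherent_sum(rho, probs, unitaries):
    """rho -> sum_i p_i U_i^dagger rho U_i."""
    return sum(p * (U.conj().T @ rho @ U) for p, U in zip(probs, unitaries))

rng = np.random.default_rng(2)
n = 3
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = a @ a.conj().T
rho /= np.trace(rho).real

probs = rng.dirichlet(np.ones(4))                    # positive weights summing to 1
unitaries = [random_unitary(n, rng) for _ in probs]

rho_out = incoherent_sum(rho, probs, unitaries)
S_in = von_neumann_entropy(rho)
S_out = von_neumann_entropy(rho_out)

print(f"tr(rho_out) = {np.trace(rho_out).real:.12f}")
print(f"S(rho)      = {S_in:.6f}")
print(f"S(rho_out)  = {S_out:.6f}")
```

The output state keeps unit trace, and its entropy should be at least that of the input, as the concavity argument requires.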

As an aside, we point out that if one chooses the $p_i$ and $U_i$ from some classical stochastic ensemble, then one should work with (classical) expectation values:

$$\rho \to \Big\langle \sum_i p_i\,U_i^{\dagger}\,\rho\,U_i \Big\rangle.$$

A suitable choice of the (classical) statistical ensemble for the $p_i$ and $U_i$ could then be used to build the decoherence operator $\rho\to\rho_D$ described in Section 5.3.1 above.
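One ensemble that reproduces full decoherence exactly is sketched below; it is offered as an illustrative assumption, not necessarily the ensemble the construction above has in mind. Take the $U_i$ to be all diagonal matrices with entries $\pm 1$, each with equal weight. Averaging multiplies the element $\rho_{ab}$ by the mean of $u_a u_b$, which is $1$ for $a=b$ and $0$ otherwise.

```python
import itertools
import numpy as np

def sign_flip_average(rho):
    """Average U^dagger rho U uniformly over all diagonal sign matrices U."""
    n = rho.shape[0]
    patterns = list(itertools.product([1.0, -1.0], repeat=n))
    weight = 1.0 / len(patterns)              # uniform classical probabilities
    rho_avg = np.zeros_like(rho, dtype=complex)
    for signs in patterns:
        U = np.diag(signs)                    # real diagonal, hence unitary
        rho_avg += weight * (U.conj().T @ rho @ U)
    return rho_avg

# Hypothetical test state: a random 3x3 density matrix.
rng = np.random.default_rng(4)
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = a @ a.conj().T
rho /= np.trace(rho).real

rho_avg = sign_flip_average(rho)
rho_D = np.diag(np.diag(rho))                 # full decoherence, computational basis

print(np.allclose(rho_avg, rho_D))
```

The cost of this brute-force average grows as $2^N$, so the sketch is only meant for small matrices.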

In a rather different direction, many of the coarse graining processes considered in Section 5 above can be reinterpreted as super-scattering processes. For instance, in addition to the construction given immediately above, the following are all entropy non-decreasing super-scattering processes:

| **Name** | **Definition** | **Condition** |
|---|---|---|
| Maximal | $\rho\to\$_{M}\,\rho=(1-s)\,\rho+s\,\frac{I_N}{N}$ | $s\in[0,1]$ |
| Partial/maximal | $\rho\to\$_{P}\,\rho=\rho_A\otimes\frac{I_B}{N_B}$ | - |
| Partial/maximal/tunable | $\rho\to\$_{P_s}\,\rho=(1-s)\,\rho+s\,\rho_A\otimes\frac{I_B}{N_B}$ | $s\in[0,1]$ |
| Full decoherence | $\rho\to\$_{D}\,\rho=\sum_a P_a\,\mathrm{tr}(P_a\rho)$ | - |
| Partial decoherence | $\rho\to\$_{D_s}\,\rho=(1-s)\,\rho+s\left\{\sum_a P_a\,\mathrm{tr}(P_a\rho)\right\}$ | $s\in[0,1]$ |
| Incoherent sum | $\rho\to\$_{\{p,U\}}\,\rho=\sum_i p_i\,U_i^{\dagger}\,\rho\,U_i$ | - |

Oddly enough, it is the partial trace process ${\rho}_{AB}\to {\rho}_{A}\otimes {\rho}_{B}$ that is not a super-scattering process. The problem is that while the individual partial traces ${\rho}_{AB}\to {\rho}_{A}$ and ${\rho}_{AB}\to {\rho}_{B}$ are both linear functions, the direct-product reassembly process ${\rho}_{A}\otimes {\rho}_{B}$ is not linear. Therefore, ${\rho}_{AB}\to {\rho}_{A}\otimes {\rho}_{B}$ is not a linear mapping; it is not a super-scattering process. However, when viewed as a nonlinear process ${\rho}_{AB}\to \mathcal{N}{\rho}_{AB}={\rho}_{A}\otimes {\rho}_{B}$, the nonlinear operator $\mathcal{N}$ is idempotent, ${\mathcal{N}}^{2}=\mathcal{N}$. We will come back to this point later.
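The failure of linearity, and the idempotence of $\mathcal{N}$, can be seen in a two-qubit example. The sketch below uses the states $|00\rangle\langle 00|$ and $|11\rangle\langle 11|$ as a convenient (assumed) test pair; the function names are hypothetical.

```python
import numpy as np

def partial_traces(rho_ab, dim_a, dim_b):
    """Return (rho_A, rho_B) for a state on H_A (x) H_B in Kronecker ordering."""
    r = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)   # r[a, b, a', b']
    rho_a = np.einsum("ibjb->ij", r)                 # trace over subsystem B
    rho_b = np.einsum("aiaj->ij", r)                 # trace over subsystem A
    return rho_a, rho_b

def product_of_marginals(rho_ab, dim_a=2, dim_b=2):
    """The nonlinear map N: rho_AB -> rho_A (x) rho_B."""
    rho_a, rho_b = partial_traces(rho_ab, dim_a, dim_b)
    return np.kron(rho_a, rho_b)

def basis_projector(index, dim=4):
    p = np.zeros((dim, dim))
    p[index, index] = 1.0
    return p

rho_1 = basis_projector(0)            # |00><00|
rho_2 = basis_projector(3)            # |11><11|
mixture = 0.5 * (rho_1 + rho_2)       # classically correlated state

lhs = product_of_marginals(mixture)   # N applied to the mixture
rhs = 0.5 * (product_of_marginals(rho_1) + product_of_marginals(rho_2))

is_linear = np.allclose(lhs, rhs)
is_idempotent = np.allclose(product_of_marginals(lhs), lhs)

print("N(mixture) equals mixture of N's:", is_linear)
print("N(N(rho)) equals N(rho):        ", is_idempotent)
```

Here $\mathcal{N}$ sends the mixture to $I_4/4$, whereas the mixture of the individual images is the original correlated state, so the two sides differ.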

## 7. Diffusion on Hilbert Space

We shall now develop the notion of diffusion on Hilbert space, similar to what we did in physical space when considering continuum Shannon entropies. There are two routes to consider:

#### 7.1. Super-Scattering-Based Diffusion

Now, consider the class of super-scattering operators $\$_{*}$ that are entropy non-decreasing:

$$S(\$_{*}\rho) \ge S(\rho).$$

Certainly, the decoherence super-scattering operator $\$_{D}$ falls in this class; so does $\$_{D_s}$. Ditto for the other super-scattering operators (based on maximal mixing, partial traces and the like), considered above: $\$_{M}$, $\$_{P}$, $\$_{P_s}$ and $\$_{\{p,U\}}$. In particular, such entropy non-decreasing super-scattering operators must leave maximally mixed states maximally mixed:

$$\$_{*}\,\frac{I_N}{N} = \frac{I_N}{N}.$$

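For any one of these entropy non-decreasing $\$_{*}$ operators, we now define:

$$\rho \to \rho_t = e^{-t}\,e^{t\$_{*}}\rho.$$

Then, for the von Neumann entropy, we have:

$$\frac{dS(\rho_t)}{dt} = -\mathrm{tr}\left(\dot{\rho}_t\ln\rho_t\right) = -\mathrm{tr}\left([\$_{*}\rho_t-\rho_t]\ln\rho_t\right) = -\mathrm{tr}\left(\$_{*}\rho_t\,\ln\rho_t\right) - S(\rho_t).$$

However, the relative entropy inequality (the Klein inequality) is:

$$\mathrm{tr}\left(\$_{*}\rho_t\,[\ln(\$_{*}\rho_t)-\ln\rho_t]\right)\ge 0,$$

so automatically:

$$-\mathrm{tr}\left(\$_{*}\rho_t\,\ln\rho_t\right) \ge -\mathrm{tr}\left(\$_{*}\rho_t\,\ln(\$_{*}\rho_t)\right) = S(\$_{*}\rho_t),$$

whence:

$$\frac{dS(\rho_t)}{dt} \ge S(\$_{*}\rho_t) - S(\rho_t) \ge 0.$$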

That is, as long as the super-scattering operator $\$_{*}$ is guaranteed entropy non-decreasing, then so is the Hilbert-space diffusion process $\rho\to\rho_t=e^{-t}e^{t\$_{*}}\rho$. Since the entropy is non-decreasing and bounded above by $\ln N$, it converges as $t\to\infty$; when the maximally mixed state $\frac{I_N}{N}$ is the only fixed point of $\$_{*}$, $\rho_t$ converges on that state, with the maximal entropy $\ln N$ (for an idempotent operator such as $\$_{D}$, $\rho_t$ instead converges on $\$_{D}\rho$). That is, the compound object:

$$e^{-t}\,e^{t\$_{*}} = e^{t(\$_{*}-I_{N\otimes N})}$$

is itself a super-scattering process that can be thought of as a diffusion operator on Hilbert space, corresponding to a “Laplacian”-like operator $\Delta=\$_{*}-I_{N\otimes N}$, which has the property $\mathrm{tr}(\Delta\rho)=0$ (in analogy with the property $\int\Delta\rho(x)\,d^3x=0$ holding in the continuum for classical probability distributions). As usual, we can define the hidden information as:

$$I_{\mathrm{hidden}} = S(\rho_t) - S(\rho) \ge 0,$$

which is guaranteed non-negative and monotonic in the coarse graining parameter $t$. The $t$ parameter may or may not represent physical time; we could view the entire construction as a gedanken-process, in which case $t$ would just be a tuning parameter giving explicit control of the extent of coarse graining. The only generic upper bound on the hidden information in this context seems to be:

$$I_{\mathrm{hidden}} \le \ln N - S(\rho).$$
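The diffusion process $\rho_t=e^{-t}e^{t\$_{*}}\rho$ can be explored numerically by truncating the exponential series. The sketch below is one possible implementation under stated assumptions: $\$_{*}$ is taken to be an incoherent sum of three unitaries with fixed weights, the series is cut off after a hypothetical `n_terms` terms, and a second check uses computational-basis dephasing, for which idempotence gives the closed form $e^{-t}\rho+(1-e^{-t})\rho_D$.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho ln rho), from the eigenvalues, with 0 ln 0 := 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def diffuse(rho, channel, t, n_terms=80):
    """rho_t = exp(-t) * sum_n (t^n / n!) channel^n(rho), truncated.

    Relies on `channel` being linear, so that scalar factors pass through it.
    """
    term = rho.astype(complex)
    total = term.copy()
    for n in range(1, n_terms):
        term = (t / n) * channel(term)   # builds t^n/n! channel^n(rho) recursively
        total = total + term
    return np.exp(-t) * total

def dephase(rho):
    """Full decoherence in the computational basis (an idempotent channel)."""
    return np.diag(np.diag(rho))

rng = np.random.default_rng(3)
n = 3
a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = a @ a.conj().T
rho /= np.trace(rho).real

# Assumed example channel: incoherent sum of three unitaries from QR factors.
probs = [0.5, 0.3, 0.2]
unitaries = [
    np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))[0]
    for _ in probs
]

def incoherent_sum(x):
    return sum(p * (U.conj().T @ x @ U) for p, U in zip(probs, unitaries))

t_values = np.linspace(0.0, 5.0, 26)
entropies = np.array(
    [von_neumann_entropy(diffuse(rho, incoherent_sum, t)) for t in t_values]
)

# Closed-form comparison for the idempotent dephasing channel.
t_check = 1.7
closed_form = np.exp(-t_check) * rho + (1.0 - np.exp(-t_check)) * dephase(rho)
series_form = diffuse(rho, dephase, t_check)

print("entropy non-decreasing in t:", bool(np.all(np.diff(entropies) >= -1e-9)))
print("series matches closed form: ", np.allclose(series_form, closed_form))
```

The truncation is adequate for the modest values of $t$ used here; larger $t$ would need more terms or a matrix-exponential formulation of the super-operator.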

#### 7.2. Partial Trace Based Diffusion

Recall that the partial trace process can be viewed as a nonlinear process $\rho_{AB}\to\mathcal{N}\rho_{AB}=\rho_A\otimes\rho_B$. The nonlinear operator $\mathcal{N}$ is idempotent, $\mathcal{N}^2=\mathcal{N}$. Now, consider:

$$e^{-t}\,e^{t\mathcal{N}}\rho_{AB} = e^{-t}\,\rho_{AB} + (1-e^{-t})\,\rho_A\otimes\rho_B = (1-s)\,\rho_{AB} + s\,\rho_A\otimes\rho_B.$$

Here, we have set $s=1-e^{-t}\in[0,1)$ for $t\in[0,\infty)$. That is, a diffusion-like process based on the nonlinear operator $\mathcal{N}$ recovers the process of quantum coarse graining by tuning the partial traces that we had previously considered; see Equation (71). That is, the compound object:

$$e^{-t}\,e^{t\mathcal{N}} = e^{-t}\,I + (1-e^{-t})\,\mathcal{N},$$

while it represents a nonlinear process when acting on density matrices, nevertheless has formal similarities to a diffusion process. As usual, we can define the hidden information as:

$$I_{\mathrm{hidden}} = S\left(e^{-t}\,e^{t\mathcal{N}}\rho_{AB}\right) - S(\rho_{AB}) \ge 0,$$

which is guaranteed non-negative and monotonic in the coarse graining parameter $t$. Again, the $t$ parameter may or may not represent physical time; we could view the entire construction as a gedanken-process, in which case $t$ would just be a tuning parameter giving explicit control of the extent of coarse graining. The only generic upper bound on the hidden information in this context seems to be:

$$I_{\mathrm{hidden}} \le S(\rho_A) + S(\rho_B) - S(\rho_{AB}).$$
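For a concrete feel for the partial-trace-based diffusion, the sketch below applies it to a two-qubit Bell state, an assumed test case chosen because the initial entropy vanishes while the product of marginals is maximally mixed. The function names and the grid of $t$ values are hypothetical.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -tr(rho ln rho), from the eigenvalues, with 0 ln 0 := 0."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log(evals)))

def product_of_marginals(rho_ab, dim_a=2, dim_b=2):
    """The nonlinear map N: rho_AB -> rho_A (x) rho_B."""
    r = rho_ab.reshape(dim_a, dim_b, dim_a, dim_b)   # r[a, b, a', b']
    rho_a = np.einsum("ibjb->ij", r)                 # trace over subsystem B
    rho_b = np.einsum("aiaj->ij", r)                 # trace over subsystem A
    return np.kron(rho_a, rho_b)

def partial_trace_diffusion(rho_ab, t):
    """(1 - s) rho_AB + s rho_A (x) rho_B with s = 1 - exp(-t)."""
    s = 1.0 - np.exp(-t)
    return (1.0 - s) * rho_ab + s * product_of_marginals(rho_ab)

# Bell state (|00> + |11>)/sqrt(2): pure, so S(rho_AB) = 0.
bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2.0)
rho_ab = np.outer(bell, bell.conj())

t_values = np.linspace(0.0, 6.0, 31)
entropies = np.array(
    [von_neumann_entropy(partial_trace_diffusion(rho_ab, t)) for t in t_values]
)
hidden = entropies - entropies[0]

# S(rho_A) + S(rho_B) - S(rho_AB), using additivity of S on product states.
upper_bound = (
    von_neumann_entropy(product_of_marginals(rho_ab)) - von_neumann_entropy(rho_ab)
)

print(f"hidden information at t = {t_values[-1]:.1f}: {hidden[-1]:.6f}")
print(f"upper bound (2 ln 2):           {upper_bound:.6f}")
```

In this example, the hidden information grows monotonically with $t$ and approaches $2\ln 2$ from below.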

## 8. Discussion

We see that we can model the coarse graining procedure, in a quantifiable and controllable manner, by following different methods depending on what aspect of the system one is taking into account. Classically, when considering the Shannon entropy, we have studied coarse graining allowing the system to change (or not) between a continuum and discretium character, and also differentiating situations where we introduce a tunable smearing parameter to track the change from continuum to discretium. We have studied coarse graining that involves diffusion, density-density correlations in a box (depending on a smearing parameter or not) and correlations between the states of two boxes, obtaining in every case a controllable and monotonic flux of entropy.

On the other hand, we have considered the quantum von Neumann entropy, coarse-grained by maximal mixing or by partial traces, tunable or not, asymmetric in the treatment and also combined with maximal mixing. These correspond to a blurring process depending on a smearing parameter and to the choice of separate subsystems that “decide” not to look at each other. In all cases, we have found a positivity of entropy flow and a generic upper bound.

We have also studied the coarse graining induced by decoherence, either partial or complete, allowing us to either remove all off-diagonal terms or control the extent to which we partially suppress them. We have also obtained a consistent definition of the hidden information and an upper bound for it, one that corresponds (as it should) to complete decoherence.

We have introduced an analogy with Hawking’s super-scattering operator that allows us to define a generic class of quantum coarse grainings. Finally, we have developed a notion of diffusion in Hilbert space, defined by means of super-scattering or partial traces, again finding a consistent and controlled definition of the entropy flow.

It is important to note that all of these models give an explicit and calculable idea of how the entropy flow increases and can be controlled when we coarse grain a system. These models can be useful tools for studying physical processes concerning information issues, such as the well-known information puzzle in black hole physics [21,22,23,24,25,26,27,28,29], the ontological status of Bekenstein’s horizon entropy [51,52,53,54,55,56] and the very existence of horizons themselves [57,58,59].