Entropy of the Canonical Occupancy (Macro) State in the Quantum Measurement Theory

The paper analyzes the probability distribution of the occupancy numbers and the entropy of a system at equilibrium composed of an arbitrary number of non-interacting bosons. The probability distribution is obtained through two approaches: one involves tracing out the environment from a bosonic eigenstate of the combined environment and system of interest (the empirical approach), while the other involves tracing out the environment from the mixed state of the combined environment and system of interest (the Bayesian approach). In the thermodynamic limit, the two distributions coincide and are equal to the multinomial distribution. Furthermore, the paper proposes to identify the physical entropy of the bosonic system with the Shannon entropy of the occupancy numbers, which resolves certain contradictions that arise in the classical analysis of thermodynamic entropy. Finally, by leveraging an information-theoretic inequality between the entropy of the multinomial distribution and the entropy of the multivariate hypergeometric distribution, the Bayesianism of information theory and the empiricism of statistical mechanics are integrated into a common “infomechanical” framework.


I. INTRODUCTION
The standard approach to computing the entropy of systems of indistinguishable particles is to subtract $\log(N!)$, where $N$ is the number of particles of the system, from the entropy of the system's microstates. However, when the entropy of microstates is smaller than $\log(N!)$, e.g., at low enough temperature, the difference between the entropy of microstates and $\log(N!)$ becomes negative. Since the entropy of any discrete random variable is guaranteed to be non-negative, the entropy of microstates minus $\log(N!)$ cannot be an entropy. This pathology is not surprising because, after the subtraction of $\log(N!)$, neither the random variable nor the probability distribution that the resulting formula refers to is specified any longer. Our crucial observation is that the random variable is the vector of the occupancy numbers. This leads us to work out the probability distribution of the occupancy numbers of a system of bosons at equilibrium. The paper also shows that the empirical distribution derived from the modern quantum approach to thermodynamics of [2] and [3] converges to the information-theoretic distribution derived from Jaynes' MaxEnt principle [1].

II. NOTATION AND TERMINOLOGY
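Let the eigenstates allowed to a quantum particle be identified by the quantum numbers belonging to the set $\mathbb{C}$ (the set of colors), where, here and in what follows, blackboard bold characters denote discrete sets and $|\mathbb{C}|$ is the number of elements of $\mathbb{C}$. Consider a system made of $N$ such non-interacting particles (the $N$ colored balls). The Hamiltonian of the system is

$$\hat{H}_{\mathbb{C}^N} = \sum_{\bar{c} \in \mathbb{C}^N} \left( \sum_{i=1}^{N} \epsilon(c_i) \right) |\bar{c}\rangle\langle\bar{c}|,$$

where $\mathbb{C}^N$ is the Cartesian product of $\mathbb{C}$ with itself $N$ times, the eigenvector $|\bar{c}^N\rangle$ (the overline denotes vectors, and the size of the vector will be omitted for brevity in the following when it is unnecessary) is

$$|\bar{c}^N\rangle = |c_1\rangle \otimes |c_2\rangle \otimes \cdots \otimes |c_N\rangle,$$

$c_i$ is the color of the $i$-th particle, and $\epsilon(c)$ is the $c$-th energy eigenvalue of the single-particle Hamiltonian, in other words, the energy of color $c$. In the first quantization formalism, the set of eigenkets $\{|\bar{c}\rangle, \bar{c} \in \mathbb{C}^N\}$ is a complete orthonormal basis for the Hilbert space $\mathsf{C}^N$ of the system. In statistical mechanics, $\bar{c}$ is the microstate of the system of distinguishable particles. $N$ particles of the same species can be distinguishable, for instance, when they are confined in $N$ identical but distinct regions of space.

Let the particles be bosons and call $n(c, \bar{c})$ the occupancy number of color $c$, that is, the number of bosons of microstate $\bar{c}$ whose color is $c$:

$$n(c, \bar{c}) = \sum_{i=1}^{N} \delta(c - c_i),$$

where $\delta(\cdot)$ is the indicator function:

$$\delta(x) = \begin{cases} 1, & x = 0, \\ 0, & x \neq 0. \end{cases}$$

In what follows, the dependency on the microstate will be omitted when only the occupancy numbers matter. Also, it is understood that the occupancy numbers obey the constraint

$$\sum_{c \in \mathbb{C}} n(c) = N.$$

For finite $|\mathbb{C}|$ and $N$, the size $|\mathbb{N}|$ of the set $\mathbb{N}$ spanned by the vector $\bar{n}$ of the occupancy numbers is the negative binomial coefficient [4]:

$$|\mathbb{N}| = \binom{N + |\mathbb{C}| - 1}{|\mathbb{C}| - 1}.$$

In statistical mechanics, the occupancy macrostate, or, in short, macrostate, denoted $\mathbb{C}^N(\bar{n})$, is the set of microstates whose occupancy numbers are $\bar{n}$. The number of elements of $\mathbb{C}^N(\bar{n})$ is the multinomial coefficient

$$W(\bar{n}) = \frac{N!}{\prod_{c \in \mathbb{C}} n(c)!},$$

which is equal to the number of distinct permutations of the elements of a vector $\bar{c}$ whose occupancy numbers are $\bar{n}$. The elements of the set $\{\mathbb{C}^N(\bar{n}), \bar{n} \in \mathbb{N}\}$ are disjoint and their union is $\mathbb{C}^N$.

In the second quantization formalism, the occupancy quantum state $|\bar{n}\rangle$ of a system of $N$ mutually non-interacting bosons is specified by the occupancy numbers:

$$|\bar{n}\rangle = \frac{1}{\sqrt{W(\bar{n})}} \sum_{\bar{c} \in \mathbb{C}^N(\bar{n})} |\bar{c}\rangle, \tag{5}$$

see [5] for a derivation of (5) based on standard arguments. The $|\mathbb{N}|$ states of the set $\{|\bar{n}\rangle, \bar{n} \in \mathbb{N}\}$ form a complete set of orthogonal basis states for the bosonic Hilbert subspace $\mathsf{N}$ of the Hilbert space $\mathsf{C}^N$, $\mathsf{N} \subseteq \mathsf{C}^N$. The Hamiltonian operator $\hat{H}_{\mathbb{N}}$ of the bosonic system is the projection of the Hamiltonian operator $\hat{H}_{\mathbb{C}^N}$ onto the bosonic subspace:

$$\hat{H}_{\mathbb{N}} = \sum_{\bar{n} \in \mathbb{N}} |\bar{n}\rangle\langle\bar{n}| \, \hat{H}_{\mathbb{C}^N} \, |\bar{n}\rangle\langle\bar{n}| = \sum_{\bar{n} \in \mathbb{N}} E(\bar{n}) \, |\bar{n}\rangle\langle\bar{n}|,$$

where $E(\bar{n}) = \sum_{c \in \mathbb{C}} n(c)\,\epsilon(c)$ is the energy of the macrostate $\bar{n}$.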
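The counting statements of this section can be checked by brute force on a toy system. The following is a minimal sketch, assuming a hypothetical set of three colors and $N = 4$ particles; the names `occupancy` and `multiplicity` are illustrative and do not come from the paper. It enumerates all microstates, groups them into macrostates, and compares the results with the negative binomial coefficient and with the multinomial coefficient $W(\bar{n})$.

```python
from collections import Counter
from itertools import product
from math import comb, factorial, prod


def occupancy(microstate, colors):
    """Occupancy numbers n(c, c_bar): how many particles have each color."""
    counts = Counter(microstate)
    return tuple(counts[c] for c in colors)


def multiplicity(n):
    """Multinomial coefficient W(n) = N! / prod_c n(c)!."""
    return factorial(sum(n)) // prod(factorial(k) for k in n)


colors = ("r", "g", "b")  # hypothetical color set, |C| = 3
N = 4                     # hypothetical number of particles

# Group the |C|^N microstates by their occupancy numbers (macrostates).
macrostates = Counter(occupancy(c, colors) for c in product(colors, repeat=N))

num_macrostates = len(macrostates)
expected_count = comb(N + len(colors) - 1, len(colors) - 1)
sizes_match = all(size == multiplicity(n) for n, size in macrostates.items())

print(num_macrostates, expected_count, sizes_match)
```

Because every microstate has exactly one occupancy vector, the macrostate sizes also add up to $|\mathbb{C}|^N$, which is the partition property used when passing from first to second quantization.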

III. EMPIRICAL APPROACH
Let us call "universe" the union of the system of interest and the environment. In our analogy between the system of interest and the $N$ colored balls, the universe is represented by the colored balls contained in the urn from which the $N$ colored balls are drawn, while the environment consists of the colored balls that remain inside the urn after the extraction. In the following we use the subscripts $U$ and $E$ to identify the universe and the environment, e.g., $N_U$ is the number of bosons of the universe, while we use no subscript for the system of interest, so $N_U \geq N$ and $N_U \geq N_E$. Following [2], we do not limit the scope of our analysis to the thermal state; rather, we consider the system in a generalized canonical state of equilibrium with the environment. Let us consider the case where the pure state of the universe is a bosonic eigenstate $|\bar{n}_U\rangle$, and let $\bar{n}_U$ play the role of a vector of known parameters in the mixed state of the system of interest. The case of pure states of the universe that are linear combinations of bosonic eigenstates will be discussed later in the paper.
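Let us consider the mixed state

$$\nu = \mathrm{Tr}_E\left( |\bar{n}_U\rangle\langle\bar{n}_U| \right)$$

and write the pure state in the form

$$|\bar{n}_U\rangle = \frac{1}{\sqrt{W(\bar{n}_U)}} \sum_{\bar{n} \in \mathbb{N}(\bar{n}_U)} \; \sum_{\bar{c} \in \mathbb{C}^N(\bar{n})} \; \sum_{\bar{c}_E \in \mathbb{C}^{N_E}(\bar{n}_U - \bar{n})} |\bar{c}, \bar{c}_E\rangle,$$

where $\mathbb{N}(\bar{n}_U)$ is the set of macrostates of the system of interest compatible with $\bar{n}_U$. The partial trace of the outer product $|\bar{c}, \bar{c}_E\rangle\langle\bar{c}', \bar{c}'_E|$ is

$$\mathrm{Tr}_E\left( |\bar{c}, \bar{c}_E\rangle\langle\bar{c}', \bar{c}'_E| \right) = \delta(\bar{c}_E - \bar{c}'_E) \, |\bar{c}\rangle\langle\bar{c}'|.$$

Since $(\bar{c}, \bar{c}_E)$ and $(\bar{c}', \bar{c}'_E)$ belong to the same macrostate $\bar{n}_U$ of the universe, there are $W(\bar{n}_E) = W(\bar{n}_U - \bar{n})$ microstates $\bar{c}_E$ that are equal to $\bar{c}'_E$ when both $\bar{c}$ and $\bar{c}'$ belong to the same macrostate of the system of interest, while $\delta(\bar{c}_E - \bar{c}'_E)$ is always zero when $\bar{c}$ and $\bar{c}'$ belong to different macrostates of the system of interest, because in this case $\bar{c}_E$ and $\bar{c}'_E$ must also belong to different macrostates of the environment, so they cannot be equal. Based on this reasoning, we conclude that

$$\nu = \sum_{\bar{n} \in \mathbb{N}(\bar{n}_U)} \frac{W(\bar{n}_U - \bar{n})}{W(\bar{n}_U)} \sum_{\bar{c}, \bar{c}' \in \mathbb{C}^N(\bar{n})} |\bar{c}\rangle\langle\bar{c}'| = \sum_{\bar{n} \in \mathbb{N}(\bar{n}_U)} p_{\bar{\mathcal{N}}}(\bar{n}, \bar{n}_U) \, |\bar{n}\rangle\langle\bar{n}|, \tag{10}$$

$$p_{\bar{\mathcal{N}}}(\bar{n}, \bar{n}_U) = \frac{W(\bar{n}) \, W(\bar{n}_U - \bar{n})}{W(\bar{n}_U)}, \tag{11}$$

where the uppercase calligraphic character denotes random variables and $p_{\mathcal{X}}(x)$ is the probability that $\mathcal{X} = x$. The probability distribution $\{p_{\bar{\mathcal{N}}}(\bar{n}, \bar{n}_U)\}$ defined by equality (11) is the multivariate hypergeometric distribution. This result is expected: the multivariate hypergeometric is the distribution of the occupancy numbers of colors when drawing $N$ balls without replacement out of an urn containing $N_U$ colored balls. For large but finite $N_U$ and $N_U \gg N$, tail bounds on the probability of deviations of $\mathcal{N}(c)$ from its expectation can be found in [6].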
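The identification of the ratio of multiplicities with the multivariate hypergeometric distribution can be verified numerically. The sketch below assumes a hypothetical universe with occupancy numbers $(5, 3, 2)$ and a system of $N = 4$ particles; function names are illustrative. It evaluates $W(\bar{n}) W(\bar{n}_U - \bar{n}) / W(\bar{n}_U)$ in exact rational arithmetic, checks that the probabilities sum to one, and checks that they coincide with the usual urn form $\prod_c \binom{n_U(c)}{n(c)} / \binom{N_U}{N}$.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, prod


def multiplicity(n):
    """Multinomial coefficient W(n) = N! / prod_c n(c)!."""
    return factorial(sum(n)) // prod(factorial(k) for k in n)


def compatible_macrostates(n_u, N):
    """All n with 0 <= n(c) <= n_U(c) and sum_c n(c) = N."""
    ranges = [range(min(k, N) + 1) for k in n_u]
    return [n for n in product(*ranges) if sum(n) == N]


def hypergeom_pmf(n, n_u):
    """W(n) W(n_U - n) / W(n_U), evaluated exactly."""
    n_e = tuple(u - k for u, k in zip(n_u, n))
    return Fraction(multiplicity(n) * multiplicity(n_e), multiplicity(n_u))


n_u = (5, 3, 2)  # hypothetical occupancy numbers of the universe
N = 4            # hypothetical number of particles of the system

pmf = {n: hypergeom_pmf(n, n_u) for n in compatible_macrostates(n_u, N)}
total = sum(pmf.values())

# Compare with the textbook urn form of the multivariate hypergeometric.
urn_form_ok = all(
    p == Fraction(prod(comb(u, k) for u, k in zip(n_u, n)), comb(sum(n_u), N))
    for n, p in pmf.items()
)
print(total, urn_form_ok)
```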
Writing the joint probability distribution of microstates in the form of a chain of conditional probability distributions, we promptly recognize that

$$p_{\bar{\mathcal{C}}}(\bar{c}) = \prod_{i=1}^{N} \frac{n_U(c_i) - \sum_{j=1}^{i-1} \delta(c_i - c_j)}{N_U - i + 1}.$$

The first link of the chain is the empirical one-particle distribution, that is, the relative number of balls of color $c = c_1$ among the $N_U$ balls, which, as expected, is

$$p_{\mathcal{C}}(c) = \frac{n_U(c)}{N_U}. \tag{13}$$

Using Stirling's formula and the Law of Large Numbers (LLN) we compute

$$\lim_{N_U \to \infty} p_{\bar{\mathcal{C}}}(\bar{c}) = \prod_{i=1}^{N} p_{\mathcal{C}}(c_i), \qquad p_{\mathcal{C}}(c) = \lim_{N_U \to \infty} \frac{n_U(c)}{N_U},$$

which is the generalized canonical distribution of microstates. Substituting (15) in the multivariate hypergeometric distribution, we find that the generalized canonical distribution of the occupancy numbers is the multinomial distribution:

$$\nu = \sum_{\bar{n} \in \mathbb{N}} p_{\bar{\mathcal{N}}}(\bar{n}) \, |\bar{n}\rangle\langle\bar{n}|, \qquad p_{\bar{\mathcal{N}}}(\bar{n}) = W(\bar{n}) \prod_{c \in \mathbb{C}} \left(p_{\mathcal{C}}(c)\right)^{n(c)}. \tag{17}$$

For $N_U \to \infty$, the dependency on the absolute occupancy numbers of the universe that characterizes (9) turns, thanks to the LLN, into a dependency on the relative occupancy numbers of the universe. As shown by (16), the consequence is that microstates become independent and identically distributed (i.i.d.) and that, by the weak LLN, the empirical one-particle distribution and, consequently, the mixed state $\nu$ of (17), are the same for almost every bosonic eigenstate of the universe compatible with the constraints imposed on the universe, e.g., volume, temperature, or, equivalently, expected energy (here and in what follows $p_{\mathcal{C}}$ is a shorthand for $\{p_{\mathcal{C}}(c)\}$).
The multinomial distribution defined by equality (17) is the occupancy distribution of colors when drawing $N$ balls with replacement out of an urn containing colored balls whose relative frequency distribution of colors is equal to $p_{\mathcal{C}}$. Indeed, by the LLN, drawing without replacement tends to drawing with replacement as the number of balls contained in the urn tends to infinity. For large but finite $N$, convergence in the weak sense of $N^{-1}\bar{\mathcal{N}}$ to $p_{\mathcal{C}}$ is studied by concentration inequalities that bound the probability of deviations of $N^{-1}\bar{\mathcal{N}}$ from $p_{\mathcal{C}}$; see [7] for recent advances on the subject.
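The passage from drawing without replacement to drawing with replacement can be illustrated numerically. The following sketch assumes a hypothetical urn whose color proportions are fixed at $(0.5, 0.3, 0.2)$ while its size $N_U$ grows, and a system of $N = 4$ particles. It measures the total variation distance between the multivariate hypergeometric distribution and the multinomial distribution; the choice of total variation as the distance is ours, made only for illustration.

```python
from math import comb, factorial, prod


def compositions(total, parts):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def hypergeometric_pmf(n, n_u):
    """Drawing without replacement; comb(u, k) is 0 when k > u."""
    return prod(comb(u, k) for u, k in zip(n_u, n)) / comb(sum(n_u), sum(n))


def multinomial_pmf(n, p):
    """Drawing with replacement from relative frequencies p."""
    w = factorial(sum(n)) // prod(factorial(k) for k in n)
    return w * prod(q ** k for q, k in zip(p, n))


base = (5, 3, 2)  # hypothetical color counts; proportions stay fixed
N = 4             # hypothetical number of particles of the system
p = tuple(k / sum(base) for k in base)

tv = []  # total variation distance for growing universe size
for scale in (1, 10, 100, 1000):
    n_u = tuple(scale * k for k in base)
    dist = 0.5 * sum(
        abs(hypergeometric_pmf(n, n_u) - multinomial_pmf(n, p))
        for n in compositions(N, len(base))
    )
    tv.append(dist)
    print(sum(n_u), dist)
```

The printed distance shrinks as $N_U$ grows with $N$ fixed, which is the regime $N_U \gg N$ discussed above.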

IV. BAYESIAN APPROACH
In the Bayesian approach, the vector of known parameters $\bar{n}_U$ becomes the vector of random parameters $\bar{\mathcal{N}}_U$. The multivariate hypergeometric distribution (11) is the Bayesian likelihood $\{p_{\bar{\mathcal{N}}|\bar{\mathcal{N}}_U}(\bar{n}, \bar{n}_U)\}$, that is, a conditional distribution where the occupancy numbers of the universe play the role of random conditions; the distribution $\{p_{\bar{\mathcal{N}}_U}(\bar{n}_U)\}$ of the random occupancy numbers of the universe is the Bayesian prior; and the sought distribution $\{p_{\bar{\mathcal{N}}}(\bar{n})\}$ of the system's occupancy numbers is the Bayesian marginal. The system's mixed state is obtained by tracing out the environment from the mixed state of the universe characterized by the prior. Substituting the multinomial distribution for the prior and using the partial trace (10), after straightforward manipulations we find that, whatever the number of particles of the universe, the marginal that characterizes the mixed state of the system of interest is the multinomial distribution. This shows that the Bayesian approach is self-consistent, in the sense that the mixed state of any sub-system is always multinomially distributed when the mixed state of the system is multinomially distributed, or, equivalently, when the distribution of microstates is i.i.d., hence when it maximizes the Shannon entropy of microstates for the given one-particle distribution $p_{\mathcal{C}}$. In Jaynes' information-theoretic approach, the famous (and debated) MaxEnt approach, the one-particle distribution is not found empirically by (13) from the knowledge of the occupancy numbers of the universe; instead, it is found by maximizing the one-particle entropy under the constraints imposed on the system. Theorem 2 of [8] proves that multinomial models maximize Shannon's entropy of the occupancy distribution constrained to $N p_{\mathcal{C}}$; see also [9] for the multinomial distribution as the MaxEnt distribution in statistical mechanics.
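The self-consistency claim can be checked exactly on a small case. The sketch below assumes a hypothetical one-particle distribution $(1/2, 3/10, 1/5)$, a universe of $N_U = 6$ particles and a system of $N = 2$ particles. It mixes the multivariate hypergeometric likelihood over a multinomial prior and compares the resulting marginal with the multinomial distribution of the system, using rational arithmetic so that the comparison is exact.

```python
from fractions import Fraction
from math import comb, factorial, prod


def compositions(total, parts):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def multinomial_pmf(n, p):
    """W(n) * prod_c p(c)^n(c), with exact Fraction probabilities."""
    w = factorial(sum(n)) // prod(factorial(k) for k in n)
    return w * prod(q ** k for q, k in zip(p, n))


def hypergeometric_pmf(n, n_u):
    """Likelihood of the system's occupancy n given the universe's n_U."""
    return Fraction(prod(comb(u, k) for u, k in zip(n_u, n)),
                    comb(sum(n_u), sum(n)))


p = (Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))  # hypothetical p_C
N_U, N = 6, 2                                          # hypothetical sizes

# Bayesian marginal: sum over n_U of prior(n_U) * likelihood(n | n_U).
marginal = {
    n: sum(multinomial_pmf(n_u, p) * hypergeometric_pmf(n, n_u)
           for n_u in compositions(N_U, len(p)))
    for n in compositions(N, len(p))
}
target = {n: multinomial_pmf(n, p) for n in compositions(N, len(p))}

print(marginal == target)
```

The agreement does not depend on $N_U$ in this construction, in line with the statement that the marginal is multinomial whatever the number of particles of the universe.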

V. ENTROPY
The Shannon entropy $H_{\bar{\mathcal{X}}}$ of the random vector $\bar{\mathcal{X}}$ is the expectation of the surprise $H(\bar{\mathcal{X}})$:

$$H_{\bar{\mathcal{X}}} = E\{H(\bar{\mathcal{X}})\}, \qquad H(\bar{x}) = -\log\left(p_{\bar{\mathcal{X}}}(\bar{x})\right),$$

where

$$E\{f(\bar{\mathcal{X}})\} = \sum_{\bar{x}} p_{\bar{\mathcal{X}}}(\bar{x}) \, f(\bar{x})$$

is the classical expectation operator over the random variable $\bar{\mathcal{X}}$ inside the argument of the deterministic function $f(\cdot)$. In physics, $\log(x)$ is the natural logarithm of $x$, and the Boltzmann constant, which multiplies the logarithm, is omitted here for brevity. The random $H(\bar{\mathcal{X}})$ is called surprise because it reflects the surprise that the experimenter experiences when the result of the experiment is $\bar{\mathcal{X}}$. It is based on a probability distribution, but it is not an expectation: it is a property of the specific result $\bar{\mathcal{X}}$ of the experiment. As such, the surprise $H(\bar{\mathcal{N}})$ can be regarded as the pre-measurement physical entropy of the system that the measurement finds in state $|\bar{\mathcal{N}}\rangle$. The use of a quantum equivalent of the surprise is not standard. We suggest that, in analogy to classical entropy, one could use the surprise for the eigenvalues of the following entropy observable $\hat{S}_\nu$:

$$\hat{S}_\nu = -\log(\nu) = \sum_{\bar{n} \in \mathbb{N}} H(\bar{n}) \, |\bar{n}\rangle\langle\bar{n}|, \qquad \mathrm{Tr}\left(\nu \hat{S}_\nu\right) = \sum_{\bar{n} \in \mathbb{N}} p_{\bar{\mathcal{N}}}(\bar{n}) \, H(\bar{n}) = H_{\bar{\mathcal{N}}},$$

where the diagonal form follows from the orthogonality of the bosonic eigenstates. Note that we completely skip the notion of phase space, leading to the exact probability distribution (17) of the quantum occupancy numbers and, as a consequence, to the exact surprise and to the exact Shannon entropy. Conversely, the standard phase-space approach inherently leads to approximations to entropy that need improvement at low temperature-to-density ratio, see e.g. [10], and that remain approximations even after improvement.
The entropy of a probability distribution of the form (11) is

$$H_{\bar{\mathcal{N}}} = H_{\bar{\mathcal{C}}} - H_{\bar{\mathcal{C}}|\bar{\mathcal{N}}} = H_{\bar{\mathcal{C}}} - E\{\log(W(\bar{\mathcal{N}}))\}, \tag{21}$$

where $H_{\bar{\mathcal{C}}}$ is the Shannon entropy of microstates, hence of the distinguishable particles. The conditional Shannon entropy $H_{\bar{\mathcal{C}}|\bar{\mathcal{N}}}$ is due to the indistinguishability of particles. Indistinguishability of particles prevents access to $\log(W(\bar{\mathcal{N}}))$ units of information, whose expectation

$$E\{\log(W(\bar{\mathcal{N}}))\} = \log(N!) - \sum_{c \in \mathbb{C}} E\{\log(\mathcal{N}(c)!)\} \tag{22}$$

is just the term that is subtracted from the Shannon entropy of distinguishable particles in (21). The term $\log(N!)$ in (22) was introduced by Gibbs to make the non-quantized phase-space (differential) entropy of systems of indistinguishable particles compatible with his paradox. We observe that, while the probability that two or more particles have the same position and momentum is zero because position and momentum are dense variables, the probability that two or more particles occupy the same quantum state is not zero. This probability leads to the sum of expectations in (22). As the entropy of microstates becomes lower and lower, this sum becomes closer and closer to $\log(N!)$, until it equals $\log(N!)$ when all the particles occupy the ground state. This prevents the system's entropy from becoming negative, as happens for instance with the Sackur-Tetrode formula, even when the entropy of microstates becomes vanishingly small. Equality (21) is equation (11) of [11], where the authors call the entropy of the distribution of the occupancy numbers "entropy fluctuations". Apart from certain exceptions, the authors of [11] consider these "entropy fluctuations" negligible compared to the "entropy" of the system, failing to recognize that the entropy of the occupancy numbers is the thermodynamic entropy of a system of indistinguishable particles.

When microstates are i.i.d. random variables we have

$$H_{\bar{\mathcal{C}}} = N H_{\mathcal{C}}. \tag{23}$$

The following sequence of inequalities sandwiches the Boltzmann entropy $\log(W(N p_{\mathcal{C}}))$ between the two terms that contribute to $H_{\bar{\mathcal{N}}}$:

$$N H_{\mathcal{C}} \geq \log(W(N p_{\mathcal{C}})) \geq E\{\log(W(\bar{\mathcal{N}}))\}, \tag{24}$$

where, with some abuse of notation, here and in what follows the factorials of the real numbers in the denominator of $W(N p_{\mathcal{C}})$ are intended as $x! = \Gamma(x + 1)$, where $\Gamma(\cdot)$ is the Gamma function. The first inequality is (11.22) of [4]; the second inequality is obtained by applying the Jensen inequality

$$E\{f(\mathcal{X})\} \geq f(E\{\mathcal{X}\})$$

to the convex function $f(x) = \log(x!)$ in each term of the sum in (22). In statistical mechanics it is standard to derive from Stirling's formula an approximate equality between the first two terms of (24). The expectation appearing in (22) is

$$E\{\log(\mathcal{N}(c)!)\} = \sum_{n=0}^{N} \binom{N}{n} \left(p_{\mathcal{C}}(c)\right)^{n} \left(1 - p_{\mathcal{C}}(c)\right)^{N-n} \log(n!),$$

see [12]; see [13] for the calculation of the above expectation in integral form; see also [11] for approximations to the entropy of the multinomial distribution in the context of statistical mechanics.
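The decomposition of the entropy of the occupancy numbers and the sandwich of the Boltzmann entropy can be checked numerically for i.i.d. microstates. The sketch below assumes a hypothetical one-particle distribution $(0.5, 0.3, 0.2)$ and $N = 10$ particles; it uses `math.lgamma` so that factorials of real numbers are evaluated as $\Gamma(x+1)$. It computes the multinomial entropy directly, compares it with $N H_{\mathcal{C}} - E\{\log W\}$, evaluates the expectation through its binomial form, and tests the two inequalities.

```python
from math import comb, exp, lgamma, log


def compositions(total, parts):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def log_multiplicity(n):
    """log W(n) with x! = Gamma(x + 1), so real arguments are allowed."""
    return lgamma(sum(n) + 1) - sum(lgamma(k + 1) for k in n)


p = (0.5, 0.3, 0.2)  # hypothetical one-particle distribution
N = 10               # hypothetical number of particles

# Multinomial pmf over all occupancy vectors.
pmf = {
    n: exp(log_multiplicity(n) + sum(k * log(q) for k, q in zip(n, p)))
    for n in compositions(N, len(p))
}

h_direct = -sum(q * log(q) for q in pmf.values())           # entropy of occupancies
h_one = -sum(q * log(q) for q in p)                         # one-particle entropy
e_log_w = sum(q * log_multiplicity(n) for n, q in pmf.items())

# Same expectation through the binomial marginals of the occupancy numbers.
e_log_w_binomial = lgamma(N + 1) - sum(
    sum(comb(N, k) * q ** k * (1 - q) ** (N - k) * lgamma(k + 1)
        for k in range(N + 1))
    for q in p
)

boltzmann = log_multiplicity(tuple(N * q for q in p))       # log W(N p_C)

print(h_direct, N * h_one - e_log_w)
print(N * h_one, boltzmann, e_log_w)
```

The first printed pair should agree to numerical precision, and the second line should be in non-increasing order.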
Before concluding this section we remark that, if we require entropy to be a state variable, then the probability distribution of the occupancy numbers must depend only on the state of the system. However, the multivariate hypergeometric distribution also depends on the state of the universe. The dependency becomes weaker and weaker as the number of particles of the universe tends to infinity, but the fact remains that it makes the empirical approach incompatible with the notion of entropy as a state variable: entropy can be a state variable only if we renounce the empirical approach.

VI. TWO EXAMPLES
In the case of an ideal monoatomic dilute gas in a cubic container of side $L$, one particle of the gas is modelled as a quantum "particle in a box" with three degrees of freedom, whose energy eigenvalues with aperiodic boundary conditions are

$$\epsilon(c) = \frac{h^2}{8 m L^2} \left(c_x^2 + c_y^2 + c_z^2\right),$$

where $c$ consists of the three quantum numbers $(c_x, c_y, c_z)$, $m$ is the mass of the particle and $h = 6.626 \cdot 10^{-34}$ J$\cdot$s is the Planck constant. When the gas is at thermal equilibrium with the heat bath at temperature $T$ (in kelvin), maximization of entropy with constrained temperature leads to the Boltzmann distribution for $p_{\mathcal{C}}$ and, by the i.i.d. assumption, to the multinomial distribution

$$p_{\bar{\mathcal{N}}}(\bar{n}) = W(\bar{n}) \prod_{c \in \mathbb{C}} \left( \frac{e^{-\epsilon(c)/k_B T}}{Z} \right)^{n(c)},$$

where $k_B = 1.38 \cdot 10^{-23}$ J/K is the Boltzmann constant and $Z$ is the one-particle partition function:

$$Z = \sum_{c \in \mathbb{C}} e^{-\epsilon(c)/k_B T}.$$

When the temperature-to-density ratio is high, it becomes possible to employ two approximations. In the first one, the partition function is approximated by an integral, see eqn. 19.54 of [14], leading to

$$Z \approx L^3 \left( \frac{2 \pi m k_B T}{h^2} \right)^{3/2}.$$

In the second one, the sum of expectations in (22) is neglected and, for a large number of particles, $\log(N!)$ is approximated by $N \log(N) - N$ using Stirling's formula. The result is the textbook Sackur-Tetrode entropy formula:

$$H_{\bar{\mathcal{N}}} \approx N \left( \log\left( \frac{L^3}{N} \left( \frac{2 \pi m k_B T}{h^2} \right)^{3/2} \right) + \frac{5}{2} \right).$$

A detailed numerical analysis of the entropy of the ideal gas can be found in [15], [16].
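The quality of the integral approximation to the partition function can be probed with a short computation. The sketch below works with one degree of freedom, for which the exact partition function is $\sum_{n \geq 1} e^{-a n^2}$ with $a = h^2 / (8 m L^2 k_B T)$ and the integral approximation is $\frac{1}{2}\sqrt{\pi / a} = L \sqrt{2 \pi m k_B T} / h$; the three-dimensional partition function of the cubic box is the cube of the one-dimensional one. The two parameter sets are assumptions chosen for illustration: a helium-like mass in a micrometre-sized box, and the electron in a 20 nm box used in the second example of this section.

```python
from math import exp, pi, sqrt

H_PLANCK = 6.626e-34  # J*s, as quoted in the text
K_B = 1.38e-23        # J/K, as quoted in the text


def level_spacing(mass, length, temperature):
    """Dimensionless a = h^2 / (8 m L^2 k_B T) of the 1D particle in a box."""
    return H_PLANCK ** 2 / (8 * mass * length ** 2 * K_B * temperature)


def partition_sum(a):
    """Exact 1D partition function sum_{n>=1} exp(-a n^2), truncated safely."""
    z, n = 0.0, 1
    while True:
        term = exp(-a * n * n)
        if term < 1e-18:
            return z
        z += term
        n += 1


def partition_integral(a):
    """Integral approximation: int_0^inf exp(-a x^2) dx = sqrt(pi / a) / 2."""
    return 0.5 * sqrt(pi / a)


def relative_error(mass, length, temperature):
    a = level_spacing(mass, length, temperature)
    exact = partition_sum(a)
    return (partition_integral(a) - exact) / exact


# Hypothetical dilute-gas regime: helium-like mass, 1 micrometre box, 300 K.
rel_err_large_box = relative_error(6.646e-27, 1e-6, 300.0)
# Small box: electron mass, 20 nm box, 300 K.
rel_err_small_box = relative_error(9.11e-31, 20e-9, 300.0)

print(rel_err_large_box, rel_err_small_box)
```

In the first regime the two values differ by a tiny fraction, while in the second the integral overestimates the sum appreciably, which is where the exact occupancy-number treatment matters.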
As a second example, consider a container that contains one particle of gas at thermal equilibrium, and divide the container into two sub-containers of equal size by inserting a wall. If the wall functions as a piston, the system's state is the second state of a one-particle Szilard engine. The state of the system after the insertion of the wall is represented by the joint random variable $\mathcal{C} = (\mathcal{B}, \mathcal{C}')$, where $p_{\mathcal{C}'}$ is the Boltzmann distribution of the states of one particle in the sub-container and $\mathcal{B}$ is a binary random variable independent of $\mathcal{C}'$ that indicates which of the two sub-containers the measurement will localize the particle in.
The entropy of the state after the insertion of the wall is

$$H_{\mathcal{C}} = H_{\mathcal{B}} + H_{\mathcal{C}'} = \log(2) + H_{\mathcal{C}'},$$

where the famous $\log(2)$ of Landauer [17] comes from the random variable $\mathcal{B}$. We have numerically evaluated the partition function of the Boltzmann distribution with the parametrization of [18], that is, mass of the particle $m = 9.11 \cdot 10^{-31}$ kg, temperature $T = 300$ K, and one-dimensional box of size $L = 20 \cdot 10^{-9}$ m. We obtain that the entropy of the single particle with one degree of freedom before the insertion of the piston is 1.988 in $k_B$ units, while with the size of the one-dimensional box equal to $10 \cdot 10^{-9}$ m, that is, after the insertion of the piston, the entropy in $k_B$ units is 1.243, leading to the difference $1.988 - 0.693 - 1.243 = 0.052$, in excellent agreement with the entropy fall shown in Fig. 3 of [18], where the result is derived by the phase-space approach.
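The numerical evaluation described above can be sketched in a few lines. The code below is an illustrative reimplementation, not the authors' program: it assumes the one-dimensional particle-in-a-box levels $\epsilon_n = n^2 h^2 / (8 m L^2)$ with $n \geq 1$, uses the constants quoted in the text, and computes the Shannon entropy of the Boltzmann distribution as $\log Z + \langle \epsilon \rangle / k_B T$. Small discrepancies in the third decimal place with respect to the quoted values may arise from the rounding of the physical constants.

```python
from math import exp, log

H_PLANCK = 6.626e-34  # J*s, as quoted in the text
K_B = 1.38e-23        # J/K, as quoted in the text


def box_entropy(length, mass, temperature):
    """Entropy (in k_B units) of one particle in a 1D box at temperature T.

    Uses H = log Z + <eps> / (k_B T) for the Boltzmann distribution over
    the levels eps_n = n^2 h^2 / (8 m L^2), n = 1, 2, ...
    """
    a = H_PLANCK ** 2 / (8 * mass * length ** 2 * K_B * temperature)
    z, mean_energy, n = 0.0, 0.0, 1
    while True:
        weight = exp(-a * n * n)
        if weight < 1e-18:  # remaining terms are numerically irrelevant
            break
        z += weight
        mean_energy += a * n * n * weight
        n += 1
    return log(z) + mean_energy / z


mass = 9.11e-31      # kg
temperature = 300.0  # K

entropy_before = box_entropy(20e-9, mass, temperature)  # full box
entropy_after = box_entropy(10e-9, mass, temperature)   # half box
entropy_drop = entropy_before - log(2) - entropy_after

print(entropy_before, entropy_after, entropy_drop)
```

The last printed number is the entropy difference between the state before the insertion of the wall and the state after it, to be compared with the value 0.052 quoted above.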
In the case of $N$ particles, $\mathcal{B}$ is a binomial random variable with parameters $(N, V'/V)$, where $V$ is the total volume and $V'$ is the volume of one of the two sub-containers. The probability distribution of the macrostate is

VII. DISCUSSION
So far, we have proved that, for $N_U \to \infty$, the system's state converges to canonicality for almost every bosonic eigenstate of the universe. We hereafter disprove the more general claim of [2], [3] that, for $N_U \to \infty$, the system's mixed state converges to canonicality for almost every pure state of the universe. First we present a counterexample, then we discuss why the arguments of [2], [3] fail.
Suppose that bosons can occupy two states, let $N_U = 2M_U + 1$ and let Consider $N = 1$ and use the first quantization formalism to express the state of the one-particle system. Computing the partial traces and taking the limit for $M_U \to \infty$ one finds This example shows that, when the universe is not in a bosonic eigenstate, the presence of terms coming from the cross outer products $|\bar{n}_U\rangle\langle\bar{n}'_U|$ prevents the system's mixed state from converging to canonicality.
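The role of the cross outer products can be made concrete with a small computation. The state used below is an assumption made for illustration and is not necessarily the state of the counterexample above: the universe is taken in the equal-weight superposition of the two bosonic eigenstates with occupancy numbers $(M_U + 1, M_U)$ and $(M_U, M_U + 1)$. The sketch builds the state in first quantization, traces out all particles but one, and prints the diagonal and off-diagonal entries of the resulting one-particle state. For this assumed state, direct counting gives the off-diagonal entry $(M_U + 1) / (2 (2 M_U + 1))$, which tends to $1/4$ rather than to zero.

```python
from math import comb, sqrt


def bosonic_eigenstate(n0, n1):
    """First-quantization amplitudes of |n0, n1>.

    Microstates are bit strings of length N = n0 + n1; the eigenstate is
    uniform over the strings that contain exactly n1 ones.
    """
    N = n0 + n1
    amp = 1 / sqrt(comb(N, n1))
    return [amp if bin(idx).count("1") == n1 else 0.0
            for idx in range(2 ** N)]


def one_particle_state(psi, N):
    """Trace out N - 1 particles; the kept particle is the leading bit."""
    half = 2 ** (N - 1)
    rows = [psi[:half], psi[half:]]
    return [[sum(x * y for x, y in zip(rows[i], rows[j])) for j in range(2)]
            for i in range(2)]


def reduced_superposition(M):
    """System state when the universe is (|M+1, M> + |M, M+1>) / sqrt(2)."""
    N = 2 * M + 1
    a = bosonic_eigenstate(M + 1, M)
    b = bosonic_eigenstate(M, M + 1)
    psi = [(x + y) / sqrt(2) for x, y in zip(a, b)]
    return one_particle_state(psi, N)


def reduced_eigenstate(M):
    """System state when the universe is the single eigenstate |M+1, M>."""
    return one_particle_state(bosonic_eigenstate(M + 1, M), 2 * M + 1)


for M in (1, 2, 3, 4):
    rho = reduced_superposition(M)
    print(M, rho[0][0], rho[0][1], (M + 1) / (2 * (2 * M + 1)))

rho_super = reduced_superposition(3)
rho_eigen = reduced_eigenstate(3)
```

For a single bosonic eigenstate the off-diagonal entry vanishes, so the surviving coherence comes entirely from the cross terms between the two eigenstates.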
The claim of [2] and [3] is based on typicality of microstates of the universe, and for this reason convergence to canonicality of the system is called canonical typicality in [3]. When applied to the surprise of i.i.d. microstates, the LLN leads to

$$\lim_{N \to \infty} \Pr\left( \left| N^{-1} H(\bar{\mathcal{C}}) - H_{\mathcal{C}} \right| \leq \eta \right) = 1, \tag{31}$$

where, here and in the rest of this section, $N$ is the number of particles of the universe and $\eta$ is an arbitrarily small positive real number. For any $N$ and $\eta$, the set of microstates that satisfy the inequality in (31) is the information-theoretic typical set, which, for small $\eta$ and $N$ large enough, in statistical mechanics is the set of the accessible microstates. Paper [19] uses the properties of the information-theoretic typical set to characterize weak convergence to equiprobability of the accessible microstates. In the geometrical view of typicality, the distribution of microstates tends to be uniform over a spherical shell of the Hilbert space that surrounds the surface of the sphere of the Hilbert space determined by the constraints imposed on the universe. As $N$ increases, the relative thickness of the spherical shell (i.e., the thickness of the shell divided by the radius of the sphere) diminishes, becoming zero in the thermodynamic limit. Thanks to (23) and (31), considering the relative thickness of the spherical shell is essentially enough to capture the entropy of microstates. However, the increasing absolute randomness of the occupancy numbers, which in the geometrical view is represented by the absolute thickness of the spherical shell, makes the entropy of macrostates bigger and bigger as $N$ grows, because the spherical shell of increasing absolute thickness contains a number of bosonic eigenstates that increases with $N$, none of which dominates in probability over the others. Therefore, unlike what happens with microstates, when macrostates are considered, the absolute thickness of the spherical shell cannot be ignored, otherwise we miss the very concept of entropy of macrostates. Indeed, looking at the relative thickness amounts to looking at the entropy per particle, and it is easy to see that

$$\lim_{N \to \infty} N^{-1} H_{\bar{\mathcal{N}}} = 0.$$

Papers [2] and [3], like virtually all the textbooks and research papers of statistical mechanics, guided by the idea of capturing properties that descend from typicality of microstates, claim more or less explicitly that, in the thermodynamic limit, all the properties of the system are preserved if the spherical shell is identified with the surface of the sphere. As shown by the previous discussion about entropy of microstates and entropy of macrostates, this is true when dealing with properties of microstates, but it is no longer true when dealing with properties of macrostates. Specifically, since at most one bosonic eigenstate can lie exactly on the surface of the sphere, if the surface is considered in place of the spherical shell, then at most one bosonic eigenstate will survive. We conclude that considering the surface of the sphere in place of the spherical shell wrongly knocks out all the bosonic eigenstates except at most one and, with them, it also wrongly knocks out all the cross outer products between two different bosonic eigenstates of the previous counterexample.

VIII. CONCLUSION
Entropy is a macroscopic property of a physical system and, at the same time, it is a mathematical property of a random variable or vector. Given this, entropy must be a property of the system's random macrostates, specifically, in the case of bosonic systems, of the system's random occupancy macrostates. This intuition motivated us to work out the empirical probability distribution and the Bayesian probability distribution of macrostates of bosonic systems at equilibrium. As expected from previous results, the empirical probability distribution converges to the Bayesian one when the number of particles of the universe from which the empirical distribution is obtained tends to infinity.
Before concluding the paper, we propose for future study a new engaging connection between statistical mechanics and information theory. Let us regard a PVM operated on the universe as a POVM operated on the system. In quantum information theory, the non-negative difference between the entropy of the multinomial Bayesian marginal of the system and the expectation over the multinomial Bayesian prior of the universe of the entropy of the multivariate hypergeometric Bayesian likelihood of the system, that is,

$$H_{\bar{\mathcal{N}}} + \sum_{\bar{n}_U} p_{\bar{\mathcal{N}}_U}(\bar{n}_U) \sum_{\bar{n} \in \mathbb{N}(\bar{n}_U)} p_{\bar{\mathcal{N}}|\bar{\mathcal{N}}_U}(\bar{n}, \bar{n}_U) \log\left(p_{\bar{\mathcal{N}}|\bar{\mathcal{N}}_U}(\bar{n}, \bar{n}_U)\right),$$

is equal to the quantum information brought by the POVM operated on the system. The Bayesian approach, which is controversial in physics, can be avoided by observing that, according to [8], the difference between the entropy $H_{\bar{\mathcal{N}},\mathrm{empirical}}$ of the multinomial distribution of the system based on the empirical one-particle distribution of microstates (13) and the entropy of the multivariate hypergeometric distribution of the system in (10) is always non-negative:

$$H_{\bar{\mathcal{N}},\mathrm{empirical}} + \sum_{\bar{n} \in \mathbb{N}} p_{\bar{\mathcal{N}}}(\bar{n}, \bar{n}_U) \log\left(p_{\bar{\mathcal{N}}}(\bar{n}, \bar{n}_U)\right) \geq 0.$$

This difference is the "empirical information" about the system brought by the bosonic eigenstate of the universe resulting from the PVM.
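The sign of the "empirical information" can be explored numerically on a small instance. The sketch below assumes a hypothetical universe with occupancy numbers $(5, 3, 2)$ and a system of $N = 4$ particles; it computes the entropy of the multinomial distribution built on the empirical one-particle distribution $n_U(c)/N_U$ and subtracts the entropy of the multivariate hypergeometric distribution. It checks one instance only and is not a proof of the inequality of [8].

```python
from math import comb, factorial, log, prod


def compositions(total, parts):
    """All tuples of `parts` non-negative integers that sum to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest


def entropy(pmf):
    """Shannon entropy in natural units; zero-probability terms are skipped."""
    return -sum(q * log(q) for q in pmf if q > 0)


n_u = (5, 3, 2)  # hypothetical occupancy numbers of the universe
N_U = sum(n_u)
N = 4            # hypothetical number of particles of the system
p_empirical = tuple(u / N_U for u in n_u)

support = list(compositions(N, len(n_u)))

# Multivariate hypergeometric: drawing N balls without replacement.
hyper = [prod(comb(u, k) for u, k in zip(n_u, n)) / comb(N_U, N)
         for n in support]

# Multinomial built on the empirical one-particle distribution.
multi = [factorial(N) / prod(factorial(k) for k in n)
         * prod(q ** k for q, k in zip(p_empirical, n))
         for n in support]

empirical_information = entropy(multi) - entropy(hyper)
print(entropy(multi), entropy(hyper), empirical_information)
```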

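For the $N$-particle case of the second example of Section VI, the probability distribution of the macrostate is

$$p_{\bar{\mathcal{N}}'|\mathcal{B}}(\bar{n}', b) \, p_{\bar{\mathcal{N}}''|\mathcal{B}}(\bar{n}'', b) \, p_{\mathcal{B}}(b),$$

where $\{p_{\bar{\mathcal{N}}'|\mathcal{B}}(\bar{n}', b)\}$ ($\{p_{\bar{\mathcal{N}}''|\mathcal{B}}(\bar{n}'', b)\}$) is the probability distribution of the occupancy numbers of a gas with $\mathcal{B}$ ($N - \mathcal{B}$) particles in the sub-container of volume $V'$ ($V - V'$).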