Article

Entropic Updating of Probabilities and Density Matrices

Department of Physics, University at Albany (SUNY), Albany, NY 12222, USA
Entropy 2017, 19(12), 664; https://doi.org/10.3390/e19120664
Submission received: 2 November 2017 / Revised: 1 December 2017 / Accepted: 2 December 2017 / Published: 4 December 2017
(This article belongs to the Special Issue Quantum Information and Foundations)

Abstract

We find that the standard relative entropy and the Umegaki entropy are designed for the purpose of inferentially updating probabilities and density matrices, respectively. From the same set of inferentially guided design criteria, both of the previously stated entropies are derived in parallel. This formulates a quantum maximum entropy method for the purpose of inferring density matrices in the absence of complete information.

1. Introduction

We design an inferential updating procedure for probability distributions and density matrices such that inductive inferences may be made. The inferential updating tools found in this derivation take the form of the standard and quantum relative entropy functionals, and thus we find these functionals are designed for the purpose of updating probability distributions and density matrices, respectively. Previously formulated design derivations, which found the entropy to be a tool for inference, originally required five design criteria (DC) [1,2,3]; this was reduced to four in [4,5,6], and then down to three in [7]. We reduce the number of required DC to two while also providing the first design derivation of the quantum relative entropy—using the same design criteria and inferential principles in both instances.
The designed quantum relative entropy takes the form of Umegaki’s quantum relative entropy, and thus it has the “proper asymptotic form of the relative entropy in quantum (mechanics)” [8,9,10]. Recently, Wilming et al. [11] gave an axiomatic characterization of the quantum relative entropy that “uniquely determines the quantum relative entropy”. Our derivation differs from theirs, again in that we design the quantum relative entropy for a purpose, but also in that our DCs are imposed on what turns out to be the functional derivative of the quantum relative entropy rather than on the quantum relative entropy itself. The use of a quantum entropy for the purpose of inference has a long history: Jaynes [12,13] invented the notion of the quantum maximum entropy method [14], which was perpetuated by [15,16,17,18,19,20,21,22] and many others. However, we find the quantum relative entropy to be the suitable entropy for updating density matrices, rather than the von Neumann entropy [23], as is suggested in [24]. I believe the present article provides the desired motivation for why the appropriate quantum relative entropy for updating density matrices, from prior to posterior, should be logarithmic in form while also providing a solution for updating non-uniform prior density matrices [24]. The relevant results of these papers may be found using the quantum relative entropy with suitably chosen prior density matrices.
It should be noted that because the relative entropies were reached by design, they may be interpreted as such, “the relative entropies are tools for updating”, which means we no longer need to attach an interpretation ex post facto—as a measure of disorder or amount of missing information. In this sense, the relative entropies were built for the purpose of saturating their own interpretation [4,7], and, therefore, the quantum relative entropy is the tool designed for updating density matrices.
This article takes an inferential approach to probabilities and density matrices that is expected to be notionally consistent with the Bayesian derivations of Quantum Mechanics, such as Entropic Dynamics [7,25,26,27], as well as Bayesian interpretations of Quantum Mechanics, such as QBism [28]. The quantum maximum entropy method is, however, expected to be useful independent of one’s interpretation of Quantum Mechanics because the entropy is designed at the level of density matrices rather than being formulated from arguments about the “inner workings” of Quantum Mechanics. This inferential approach is, at the very least, verbally convenient so we will continue writing in this language.
A few applications of the quantum maximum entropy method are given in another article [29]. By maximizing the quantum relative entropy with respect to a “data constraint” and the appropriate prior density matrix, the Quantum Bayes Rule [30,31,32,33,34] (a positive-operator valued measure (POVM) measurement and collapse) is derived. The quantum maximum entropy method can reproduce the density matrices in [35,36] that are cited as “Quantum Bayes Rules”, but the required constraints are difficult to motivate; however, it is expected that the results of this paper may be useful for further understanding Machine Learning techniques that involve the quantum relative entropy [37]. The Quantum Bayes Rule derivation in [29] is analogous to the standard Bayes Rule derivation from the relative entropy given in [38], as was suggested to be possible in [24]. This article provides the foundation for [29], and thus the quantum maximum entropy method unifies a few topics in Quantum Information and Quantum Measurement through entropic inference.
As is described in this article and in [29], the quantum maximum entropy method is able to provide solutions even if the constraints and prior density matrix in question do not all mutually commute. This might be useful for subjects as far reaching as [39], which seeks to use Quantum Theory as a basis for building models for cognition. The immediate correspondence is that the quantum maximum entropy method might provide a solution toward addressing the empirical evidence for noncommutative cognition, which is how one’s cognition changes when addressing questions in permuted order [39]. A simpler model for noncommutative cognition may also be possible by applying sequential updates via the standard maximum entropy method with their order permuted. Sequential updating does not, in general, give the same resultant probability distribution when the updating order is permuted—this is argued to be a feature of the standard maximum entropy method [40]. Similarly, sequential updating in the quantum maximum entropy method also has this feature, but it should be noted that the noncommutativity of sequential updating is different in principle than simultaneously updating with respect to expectation values of noncommuting operators.
The remainder of the paper is organized as follows: first, we will discuss some universally applicable principles of inference and motivate the design of an entropy function able to rank probability distributions. This entropy function will be designed such that it is consistent with inference by applying a few reasonable design criteria, which are guided by the aforementioned principles of inference. Using the same principles of inference and design criteria, we find the form of the quantum relative entropy suitable for inference. The solution to an example of updating 2 × 2 prior density matrices, with respect to expectation values of spin matrices that do not commute with the prior, via the quantum maximum entropy method is given in Appendix B. We end with concluding remarks (I thank the reviewers for providing several useful references in this section).

2. The Design of Entropic Inference

Inference is the appropriate updating of probability distributions when new information is received. Bayes rule and Jeffrey’s rule are both equipped to handle information in the form of data; however, the updating of a probability distribution due to the knowledge of an expectation value was realized by Jaynes [12,13,14] through the method of maximum entropy. The two methods for inference were thought to be distinct from one another until the work of [38,40], which showed Bayes Rule and Jeffrey’s Rule to be consistent with the method of maximum entropy when the expectation values were in the form of data. In the spirit of the derivation, we will carry on as if the maximum entropy method were not known and show how it may be derived as an application of inference.
Given a probability distribution φ(x) over a general set of propositions x ∈ X, it is self-evident that if new information is learned, we are entitled to assign a new probability distribution ρ(x) that somehow reflects this new information while also respecting our prior probability distribution φ(x). The main question we must address is: “Given some information, to what posterior probability distribution ρ(x) should we update our prior probability distribution φ(x)?”, that is,
$$\varphi(x) \xrightarrow{\;*\;} \rho(x)\,?$$
This specifies the problem of inductive inference. Since “information” has many colloquial, yet potentially conflicting, definitions, we remove potential confusion by defining information operationally as the rationale that causes a probability distribution to change (inspired by and adapted from [7]). Directly from [7]:
Our goal is to design a method that allows a systematic search for the preferred posterior distribution. The central idea, first proposed in [4], is disarmingly simple: to select the posterior, first rank all candidate distributions in increasing order of preference and then pick the distribution that ranks the highest. Irrespective of what it is that makes one distribution preferable over another (we will get to that soon enough), it is clear that any ranking according to preference must be transitive: if distribution ρ 1 is preferred over distribution ρ 2 , and ρ 2 is preferred over ρ 3 , then ρ 1 is preferred over ρ 3 . Such transitive rankings are implemented by assigning to each ρ ( x ) a real number S [ ρ ] , which is called the entropy of ρ , in such a way that if ρ 1 is preferred over ρ 2 , then S [ ρ 1 ] > S [ ρ 2 ] . The selected distribution (one or possibly many, for there may be several equally preferred distributions) is that which maximizes the entropy functional.
Because we wish to update from prior distributions φ to posterior distributions ρ by ranking, the entropy functional S [ ρ , φ ] is a real function of both φ and ρ . In the absence of new information, there is no available rationale to prefer any ρ to the original φ , and thereby the relative entropy should be designed such that the selected posterior is equal to the prior φ (in the absence of new information). The prior information encoded in φ ( x ) is valuable and we should not change it unless we are informed otherwise. Due to our definition of information, and our desire for objectivity, we state the predominate guiding principle for inductive inference:
The Principle of Minimal Updating (PMU):
A probability distribution should only be updated to the extent required by the new information.
This simple statement provides the foundation for inference [7]. If the updating of probability distributions is to be done objectively, then possibilities should not be needlessly ruled out or suppressed. Being informationally stingy, that we should only update probability distributions when the information requires it, pushes inductive inference toward objectivity. Thus, using the PMU helps formulate a pragmatic (and objective) procedure for making inferences using (informationally) subjective probability distributions [41].
This method of inference is only as universal and general as its ability to apply equally well to any specific inference problem. The notion of “specificity” is the notion of statistical independence; a special case is only special in that it is separable from other special cases. The notion that systems may be “sufficiently independent” plays a central and deep-seated role in science, and the idea that some things can be neglected and that not everything matters is implemented by imposing criteria that tell us how to handle independent systems [7]. Ironically, the property universally shared by all specific inference problems is their ability to be independent of one another—they share independence. Thus, a universal inference scheme based on the PMU permits:
Properties of Independence (PI):
Subdomain Independence: When information is received about one set of propositions, it should not affect or change the state of knowledge (probability distribution) of the other propositions (else information was also received about them);
And,
Subsystem Independence: When two systems are a priori believed to be independent and we only receive information about one, then the state of knowledge of the other system remains unchanged.
The PIs are special cases of the PMU that ultimately take the form of design criteria in this design derivation. The process of constraining the form of S [ ρ , φ ] by imposing design criteria may be viewed as the process of eliminative induction, and after sufficient constraining, a single form for the entropy remains. Thus, the justification behind the surviving entropy is not that it leads to demonstrably correct inferences, but, rather, that all other candidate entropies demonstrably fail to perform as desired [7]. Rather than the design criteria instructing one how to update, they instruct in what instances one should not update. That is, rather than justifying one way to skin a cat over another, we tell you when not to skin it, which is operationally unique—namely you don’t do it—luckily enough for the cat.

The Design Criteria and the Standard Relative Entropy

The following design criteria (DC), guided by the PMU, are imposed and formulate the standard relative entropy as a tool for inference. The form of this presentation is inspired by [7].
DC1: Subdomain Independence
We keep DC1 from [7] and review it below. DC1 imposes the first instance of when one should not update—the Subdomain PI. Suppose the information to be processed does not refer to a particular subdomain D of the space X of xs. In the absence of new information about D, the PMU insists we do not change our minds about probabilities that are conditional on D. Thus, we design the inference method so that φ(x|D), the prior probability of x conditional on x ∈ D, is not updated, and therefore the selected conditional posterior is
P ( x | D ) = φ ( x | D ) .
(The notation will be as follows: we denote priors by φ , candidate posteriors by lower case ρ , and the selected posterior by upper case P.) We emphasize the point is not that we make the unwarranted assumption that keeping φ ( x | D ) unchanged is guaranteed to lead to correct inferences. It need not; induction is risky. The point is, rather, that, in the absence of any evidence to the contrary, there is no reason to change our minds and the prior information takes priority.
DC1 Implementation
Consider the set of microstates xᵢ ∈ X belonging to either of two non-overlapping domains D or its complement D′, such that X = D ∪ D′ and ∅ = D ∩ D′. For convenience, let ρ(xᵢ) = ρᵢ. Consider the following constraints:
$$\rho(D) = \sum_{i\in D}\rho_i \quad\text{and}\quad \rho(D') = \sum_{i\in D'}\rho_i,$$
such that ρ(D) + ρ(D′) = 1, and the following “local” expectation value constraints over D and D′,
$$\langle A\rangle = \sum_{i\in D}\rho_i A_i \quad\text{and}\quad \langle A'\rangle = \sum_{i\in D'}\rho_i A_i,$$
where A = A(x) is a scalar function of x and Aᵢ ≡ A(xᵢ). As we are searching for the candidate distribution which maximizes S while obeying (2) and (3), we maximize the entropy S ≡ S[ρ, φ] with respect to these expectation value constraints using the Lagrange multiplier method,
$$0 = \delta\Big(S - \lambda\Big[\rho(D) - \sum_{i\in D}\rho_i\Big] - \mu\Big[\langle A\rangle - \sum_{i\in D}\rho_i A_i\Big] - \lambda'\Big[\rho(D') - \sum_{i\in D'}\rho_i\Big] - \mu'\Big[\langle A'\rangle - \sum_{i\in D'}\rho_i A_i\Big]\Big),$$
and, thus, the entropy is maximized when the following differential relationships hold:
$$\frac{\delta S}{\delta\rho_i} = \lambda + \mu A_i \quad \forall\, i\in D,$$
$$\frac{\delta S}{\delta\rho_i} = \lambda' + \mu' A_i \quad \forall\, i\in D'.$$
Equations (2)–(5) are n + 4 equations we must solve to find the four Lagrange multipliers {λ, λ′, μ, μ′} and the n probability values {ρᵢ} associated to the n microstates {xᵢ}. If the subdomain constraint DC1 is imposed in the most restrictive case, then it will hold in general. The most restrictive case requires splitting X into a set of {Dᵢ} domains such that each Dᵢ singularly includes one microstate xᵢ. This gives,
$$\frac{\delta S}{\delta\rho_i} = \lambda_i + \mu_i A_i \quad \text{in each } D_i.$$
Because the entropy S = S[ρ₁, ρ₂, …; φ₁, φ₂, …] is a functional over the probability of each microstate’s posterior and prior distribution, its variational derivative is also a function of said probabilities in general,
$$\frac{\delta S}{\delta\rho_i} \equiv \phi_i(\rho_1, \rho_2, \ldots; \varphi_1, \varphi_2, \ldots) = \lambda_i + \mu_i A_i \quad \text{for each } (i, D_i).$$
DC1 is imposed by constraining the form of ϕᵢ(ρ₁, ρ₂, …; φ₁, φ₂, …) = ϕᵢ(ρᵢ; φ₁, φ₂, …) to ensure that changes in Aᵢ → Aᵢ + δAᵢ have no influence over the value of ρⱼ in domain Dⱼ, through ϕᵢ, for i ≠ j. If there is no new information about propositions in Dⱼ, its distribution should remain equal to φⱼ by the PMU. We further restrict ϕᵢ such that an arbitrary variation of φⱼ → φⱼ + δφⱼ (a change in the prior state of knowledge of microstate j) has no effect on ρᵢ for i ≠ j, and therefore DC1 imposes ϕᵢ = ϕᵢ(ρᵢ, φᵢ), as is guided by the PMU. At this point, it is easy to generalize the analysis to continuous microstates such that the indices become continuous, i → x, sums become integrals, and discrete probabilities become probability densities, ρᵢ → ρ(x).
Remark
We are designing the entropy for the purpose of ranking posterior probability distributions (for the purpose of inference); however, the highest ranked distribution is found by setting the variational derivative of S [ ρ , φ ] equal to the variations of the expectation value constraints by the Lagrange multiplier method,
$$\frac{\delta S}{\delta\rho(x)} = \lambda + \sum_i \mu_i A_i(x).$$
Therefore, the real quantity of interest is δS/δρ(x) rather than the specific form of S[ρ, φ]. All forms of S[ρ, φ] that give the correct form of δS/δρ(x) are equally valid for the purpose of inference. Thus, every design criterion may be imposed on the variational derivative of the entropy rather than the entropy itself, which we do. When maximizing the entropy, for convenience, we will let,
$$\frac{\delta S}{\delta\rho(x)} \equiv \phi_x\big(\rho(x), \varphi(x)\big),$$
and further use the shorthand ϕₓ(ρ, φ) ≡ ϕₓ(ρ(x), φ(x)) in all cases.
DC1’: In the absence of new information, our new state of knowledge ρ ( x ) is equal to the old state of knowledge φ ( x ) .
This is a special case of DC1, and is implemented differently than in [7]. The PMU is in principle a statement about informational honesty—that is, one should not “jump to conclusions” in light of new information, and, in the absence of new information, one should not change their state of knowledge. If no new information is given, the prior probability distribution φ(x) does not change, that is, the posterior probability distribution ρ(x) = φ(x) is equal to the prior probability. If we maximize the entropy without applying constraints,
$$\frac{\delta S}{\delta\rho(x)} = 0,$$
then DC1’ imposes the following condition:
$$\frac{\delta S}{\delta\rho(x)} = \phi_x(\rho, \varphi) = \phi_x(\varphi, \varphi) = 0,$$
for all x in this case. This special case of the DC1 and the PMU turns out to be incredibly constraining as we will see over the course of DC2.
Comment
If the variable x is continuous, DC1 requires that information referring to points infinitely close but just outside the domain D will have no influence on probabilities conditional on D [7]. This may seem surprising as it may lead to updated probability distributions that are discontinuous. Is this a problem? No.
In certain situations (e.g., physics) we might have explicit reasons to believe that conditions of continuity or differentiability should be imposed and this information might be given to us in a variety of ways. The crucial point, however—and this is a point that we keep and will keep reiterating—is that unless such information is explicitly given, we should not assume it. If the new information leads to discontinuities, so be it.
DC2: Subsystem Independence
DC2 imposes the second instance of when one should not update—the Subsystem PI. We emphasize that DC2 is not a consistency requirement. The argument we deploy is not that both the prior and the new information tells us the systems are independent, in which case consistency requires that it should not matter whether the systems are treated jointly or separately. Rather, DC2 refers to a situation where the new information does not say whether the systems are independent or not, but information is given about each subsystem. The updating is being designed so that the independence reflected in the prior is maintained in the posterior by default via the PMU and the second clause of the PIs [7].
The point is not that when we have no evidence for correlations we draw the firm conclusion that the systems must necessarily be independent. They could indeed have turned out to be correlated and then our inferences would be wrong. Again, induction involves risk. The point is rather that if the joint prior reflects independence and the new evidence is silent on the matter of correlations, then the prior independence takes precedence. As before, as with subdomain independence, the probability distribution should not be updated unless the information requires it [7].
DC2 Implementation
Consider a composite system, x = (x₁, x₂) ∈ X = X₁ × X₂. Assume that all prior evidence led us to believe the subsystems are independent. This belief is reflected in the prior distribution: if the individual system priors are φ₁(x₁) and φ₂(x₂), then the prior for the whole system is their product φ₁(x₁)φ₂(x₂). Further suppose that new information is acquired such that φ₁(x₁) would by itself be updated to P₁(x₁) and that φ₂(x₂) would itself be updated to P₂(x₂). By design, the implementation of DC2 constrains the entropy functional such that, in this case, the joint product prior φ₁(x₁)φ₂(x₂) updates to the selected product posterior P₁(x₁)P₂(x₂) [7].
The argument below is considerably simplified if we expand the space of probabilities to include distributions that are not necessarily normalized. This does not represent any limitation because a normalization constraint may always be applied. We consider a few special cases below:
Case 1: We receive the extremely constraining information that the posterior distribution for system 1 is completely specified to be P 1 ( x 1 ) while we receive no information at all about system 2. We treat the two systems jointly. Maximize the joint entropy S [ ρ ( x 1 , x 2 ) , φ ( x 1 ) φ ( x 2 ) ] subject to the following constraints on the ρ ( x 1 , x 2 ) :
$$\int dx_2\, \rho(x_1, x_2) = P_1(x_1).$$
Notice that the probability of each x₁ ∈ X₁ within ρ(x₁, x₂) is being constrained to P₁(x₁) in the marginal. We therefore need one Lagrange multiplier λ₁(x₁) for each x₁ ∈ X₁ to tie each value of ∫dx₂ ρ(x₁, x₂) to P₁(x₁). Maximizing the entropy with respect to this constraint is,
$$\delta\Big(S - \int dx_1\, \lambda_1(x_1)\Big[\int dx_2\, \rho(x_1, x_2) - P_1(x_1)\Big]\Big) = 0,$$
which requires that
$$\lambda_1(x_1) = \phi_{x_1 x_2}\big(\rho(x_1, x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big),$$
for arbitrary variations of ρ(x₁, x₂). By design, DC2 is implemented by requiring φ₁φ₂ → P₁φ₂ in this case; therefore,
$$\lambda_1(x_1) = \phi_{x_1 x_2}\big(P_1(x_1)\varphi_2(x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big).$$
This equation must hold for all choices of x₂ and all choices of the prior φ₂(x₂), as λ₁(x₁) is independent of x₂. Suppose we had chosen a different prior φ₂′(x₂) = φ₂(x₂) + δφ₂(x₂) that disagrees with φ₂(x₂). For all x₂ and δφ₂(x₂), the multiplier λ₁(x₁) remains unchanged, as it constrains the independent ρ(x₁) → P₁(x₁). This means that any dependence that the right-hand side might potentially have had on x₂ and on the prior φ₂(x₂) must cancel out. This means that
$$\phi_{x_1 x_2}\big(P_1(x_1)\varphi_2(x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big) = f_{x_1}\big(P_1(x_1), \varphi_1(x_1)\big).$$
Since φ 2 is arbitrary in f, suppose further that we choose a constant prior set equal to one, φ 2 ( x 2 ) = 1 , therefore
$$f_{x_1}\big(P_1(x_1), \varphi_1(x_1)\big) = \phi_{x_1 x_2}\big(P_1(x_1)\cdot 1,\, \varphi_1(x_1)\cdot 1\big) = \phi_{x_1}\big(P_1(x_1), \varphi_1(x_1)\big)$$
in general. This gives
$$\lambda_1(x_1) = \phi_{x_1}\big(P_1(x_1), \varphi_1(x_1)\big).$$
The left-hand side does not depend on x 2 , and therefore neither does the right-hand side. An argument exchanging systems 1 and 2 gives a similar result.
Case 1—Conclusion: When the system 2 is not updated the dependence on φ 2 and x 2 drops out,
$$\phi_{x_1 x_2}\big(P_1(x_1)\varphi_2(x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big) = \phi_{x_1}\big(P_1(x_1), \varphi_1(x_1)\big),$$
and vice-versa when system 1 is not updated,
$$\phi_{x_1 x_2}\big(\varphi_1(x_1)P_2(x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big) = \phi_{x_2}\big(P_2(x_2), \varphi_2(x_2)\big).$$
As we seek the general functional form of ϕ x 1 x 2 , and because the x 2 dependence drops out of (19) and the x 1 dependence drops out of (20) for arbitrary φ 1 , φ 2 and φ 12 = φ 1 φ 2 , the explicit coordinate dependence in ϕ consequently drops out of both such that,
$$\phi_{x_1 x_2} \to \phi,$$
as ϕ = ϕ(ρ(x), φ(x)) must only depend on coordinates through the probability distributions themselves. (As a double check, explicit coordinate dependence was included in the following computations but inevitably dropped out due to the form of the functional equations and DC1’. By the argument above, and for simplicity, we drop the explicit coordinate dependence in ϕ here.)
Case 2: Now consider a different special case in which the marginal posterior distributions for systems 1 and 2 are both completely specified to be P 1 ( x 1 ) and P 2 ( x 2 ) , respectively. Maximize the joint entropy S [ ρ ( x 1 , x 2 ) , φ ( x 1 ) φ ( x 2 ) ] subject to the following constraints on the ρ ( x 1 , x 2 ) ,
$$\int dx_2\, \rho(x_1, x_2) = P_1(x_1) \quad\text{and}\quad \int dx_1\, \rho(x_1, x_2) = P_2(x_2).$$
Again, this is one constraint for each value of x 1 and one constraint for each value of x 2 , which, therefore, require the separate multipliers μ 1 ( x 1 ) and μ 2 ( x 2 ) . Maximizing S with respect to these constraints is then,
$$0 = \delta\Big(S - \int dx_1\, \mu_1(x_1)\Big[\int dx_2\, \rho(x_1, x_2) - P_1(x_1)\Big] - \int dx_2\, \mu_2(x_2)\Big[\int dx_1\, \rho(x_1, x_2) - P_2(x_2)\Big]\Big),$$
leading to
$$\mu_1(x_1) + \mu_2(x_2) = \phi\big(\rho(x_1, x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big).$$
The updating is being designed so that φ₁φ₂ → P₁P₂, as the independent subsystems are being updated based on expectation values which are silent about correlations. DC2 thus imposes,
$$\mu_1(x_1) + \mu_2(x_2) = \phi\big(P_1(x_1)P_2(x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big).$$
Write (25) as,
$$\mu_1(x_1) = \phi\big(P_1(x_1)P_2(x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big) - \mu_2(x_2).$$
The left-hand side is independent of x₂, so we can perform a trick similar to the one used before. Suppose we had chosen a different constraint P₂′(x₂) that differs from P₂(x₂) and a new prior φ₂′(x₂) that differs from φ₂(x₂) except at the value x̄₂. At the value x̄₂, the multiplier μ₁(x₁) remains unchanged for all P₂′(x₂), φ₂′(x₂), and thus x₂. This means that any dependence that the right-hand side might potentially have had on x₂ and on the choice of P₂(x₂), φ₂(x₂) must cancel out, leaving μ₁(x₁) unchanged. That is, the Lagrange multiplier μ₂(x₂) “pushes out” these dependences such that
$$\phi\big(P_1(x_1)P_2(x_2),\, \varphi_1(x_1)\varphi_2(x_2)\big) - \mu_2(x_2) = g\big(P_1(x_1), \varphi_1(x_1)\big).$$
Because g(P₁(x₁), φ₁(x₁)) is independent of arbitrary variations of P₂(x₂) and φ₂(x₂), the left-hand side (LHS) above is satisfied equally well for all such choices. The form g = ϕ(P₁(x₁), φ₁(x₁)) becomes apparent if P₂(x₂) = φ₂(x₂) = 1, as then μ₂(x₂) = 0, similar to Case 1 as well as DC1’. Therefore, the Lagrange multiplier is
$$\mu_1(x_1) = \phi\big(P_1(x_1), \varphi_1(x_1)\big).$$
A similar analysis carried out for μ 2 ( x 2 ) leads to
$$\mu_2(x_2) = \phi\big(P_2(x_2), \varphi_2(x_2)\big).$$
Case 2—Conclusion: Substituting back into (25) gives us a functional equation for ϕ ,
$$\phi\big(P_1 P_2,\, \varphi_1\varphi_2\big) = \phi\big(P_1, \varphi_1\big) + \phi\big(P_2, \varphi_2\big).$$
The general solution to this functional equation is derived in Appendix A.3, and is
$$\phi(\rho, \varphi) = a_1\ln\big(\rho(x)\big) + a_2\ln\big(\varphi(x)\big),$$
where a 1 , a 2 are constants. The constants are fixed by using DC1’. Letting ρ 1 ( x 1 ) = φ 1 ( x 1 ) = φ 1 gives ϕ ( φ , φ ) = 0 by DC1’, and, therefore,
$$\phi(\varphi, \varphi) = (a_1 + a_2)\ln(\varphi) = 0,$$
so we are forced to conclude a₁ = −a₂ for arbitrary φ. Letting a₁ ≡ −A = −|A|, such that we are really maximizing the entropy (although this is purely aesthetic), gives the general form of ϕ to be
$$\phi(\rho, \varphi) = -|A|\ln\frac{\rho(x)}{\varphi(x)}.$$
As long as A ≠ 0, the value of A is arbitrary, as it can always be absorbed into the Lagrange multipliers. The general form of the entropy designed for the purpose of inference of ρ is found by integrating ϕ, and, therefore,
$$S(\rho(x), \varphi(x)) = -|A|\int dx\,\Big(\rho(x)\ln\frac{\rho(x)}{\varphi(x)} - \rho(x)\Big) + C[\varphi].$$
The constant in ρ, C[φ], will always drop out when varying ρ. The apparent extra term, |A|∫ρ(x) dx, from integration cannot be dropped while simultaneously satisfying DC1’, which requires ρ(x) = φ(x) in the absence of constraints or when there is no change to one’s information. In previous versions where the integration term |A|∫ρ(x) dx is dropped, one obtains solutions like ρ(x) = e⁻¹φ(x) (independent of whether φ(x) was previously normalized or not) in the absence of new information. Obviously, this factor can be taken care of by normalization, and, in this way, both forms of the entropy are equally valid; however, this form of the entropy better adheres to the PMU through DC1’. Given that we may regularly impose normalization, we may drop the extra ∫ρ(x) dx term and C[φ]. For convenience then, (34) becomes
$$S(\rho(x), \varphi(x)) \to S^{*}(\rho(x), \varphi(x)) = -|A|\int dx\, \rho(x)\ln\frac{\rho(x)}{\varphi(x)},$$
which is a special case when the normalization constraint is being applied. Given normalization is applied, the same selected posterior ρ ( x ) maximizes both S ( ρ ( x ) , φ ( x ) ) and S * ( ρ ( x ) , φ ( x ) ) , and the star notation may be dropped.
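As a small numerical illustration of this point (a sketch, not code from the article; it assumes numpy and scipy are available and uses an arbitrary toy prior), maximizing the two forms above over an unnormalized discrete distribution with no constraints shows that only the form containing the extra ∫ρ(x) dx term returns the prior untouched:
```python
# Sketch: maximize S (Eq. (34), with the +rho term) and S* (Eq. (35)) over an
# unnormalized discrete "posterior" with no constraints, to illustrate that
# the +rho term is what enforces DC1' (posterior = prior with no information).
import numpy as np
from scipy.optimize import minimize

phi = np.array([0.1, 0.2, 0.3, 0.4])        # toy prior (already normalized)

def S_full(rho):                            # -sum(rho*ln(rho/phi) - rho)
    return -np.sum(rho * np.log(rho / phi) - rho)

def S_star(rho):                            # -sum(rho*ln(rho/phi))
    return -np.sum(rho * np.log(rho / phi))

x0 = np.full(4, 0.25)
bounds = [(1e-9, None)] * 4                 # keep rho_i > 0
rho_full = minimize(lambda r: -S_full(r), x0, bounds=bounds).x
rho_star = minimize(lambda r: -S_star(r), x0, bounds=bounds).x

print(rho_full)                             # ~ phi       : DC1' holds directly
print(rho_star / phi)                       # ~ exp(-1)   : needs normalization
```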
Remarks
It can be seen that the relative entropy is invariant under coordinate transformations. This implies that a system of coordinates carries no information, and it is the “character” of the probability distributions that is being ranked against one another rather than the specific set of propositions or microstates they describe.
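A quick numerical check of this invariance (a sketch assuming numpy; the densities and the change of variables y = x² below are arbitrary illustrative choices):
```python
# Sketch: the relative entropy computed in x and in the transformed coordinate
# y = x**2 agree, since the densities and the measure pick up opposite Jacobians.
import numpy as np

def trap(f, t):                             # simple trapezoid rule
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))

x = np.linspace(1e-4, 1.0, 20001)
rho = 2.0 * x                               # a normalized posterior density on (0, 1)
phi = 3.0 * x**2                            # a normalized prior density on (0, 1)
S_x = -trap(rho * np.log(rho / phi), x)

y = x**2                                    # new coordinate; dx/dy = 1/(2*sqrt(y))
jac = 1.0 / (2.0 * np.sqrt(y))
rho_y, phi_y = rho * jac, phi * jac         # transformed densities
S_y = -trap(rho_y * np.log(rho_y / phi_y), y)

print(S_x, S_y)                             # agree to several decimal places
```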
The general solution to the maximum entropy procedure with respect to N linear constraints in ρ, ⟨Aᵢ(x)⟩, and normalization gives a canonical-like selected posterior probability distribution,
$$\rho(x) = \varphi(x)\exp\Big(\sum_i \alpha_i A_i(x)\Big).$$
The positive constant |A| may always be absorbed into the Lagrange multipliers, so we may let it equal unity without loss of generality. DC1’ is fully realized when we maximize with respect to a constraint on ρ(x) that is already held by φ(x): for instance, if ⟨x²⟩ = ∫x²ρ(x) dx happens to have the same value as ⟨x²⟩_φ = ∫x²φ(x) dx, then its Lagrange multiplier is forcibly zero, α₁ = 0 (as can be seen in (36) using (34)), in agreement with Jaynes. This gives the expected result ρ(x) = φ(x), as there is no new information. Our design has arrived at a refined maximum entropy method [12] as a universal probability updating procedure [38].
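As a minimal numerical sketch of (36) (assuming numpy/scipy; the prior, constraint function A, and target expectation value are toy choices), the Lagrange multiplier is fixed by root-finding on the constraint, and constraining ⟨A⟩ to the value already held by the prior returns α ≈ 0 and ρ = φ, as claimed:
```python
# Sketch: canonical posterior rho_i ∝ phi_i * exp(alpha * A_i) with alpha
# determined by the expectation value constraint <A> = target.
import numpy as np
from scipy.optimize import brentq

phi = np.array([0.1, 0.2, 0.3, 0.4])        # toy prior
A   = np.array([1.0, 2.0, 3.0, 4.0])        # toy constraint function A(x_i)

def posterior(alpha):
    w = phi * np.exp(alpha * A)
    return w / w.sum()                      # normalization absorbs exp(alpha_0)

def mismatch(alpha, target):
    return posterior(alpha) @ A - target

alpha = brentq(mismatch, -50, 50, args=(2.5,))       # impose <A> = 2.5
print(posterior(alpha), posterior(alpha) @ A)        # constraint satisfied

# DC1': constraining <A> to the prior's own value gives alpha ~ 0 and rho = phi
alpha0 = brentq(mismatch, -50, 50, args=(phi @ A,))
print(alpha0, np.allclose(posterior(alpha0), phi))
```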

3. The Design of the Quantum Relative Entropy

In the last section, we assumed that the universe of discourse (the set of relevant propositions or microstates) X = A × B × ⋯ was known. In quantum physics, things are a bit more ambiguous because many probability distributions, or many experiments, can be associated with a given density matrix. In this sense, it is helpful to think of density matrices as “placeholders” for probability distributions rather than as probability distributions themselves. As any probability distribution from a given density matrix, ρ(·) = Tr(|·⟩⟨·| ρ̂), may be ranked using the standard relative entropy, it is unclear why we would choose one universe of discourse over another. In lieu of this, such that no one universe of discourse is given preferential treatment, we consider ranking entire density matrices against one another. Probability distributions of interest may be found from the selected posterior density matrix. This moves our universe of discourse from sets of propositions X to Hilbert space(s) H.
When the objects of study are quantum systems, we desire an objective procedure to update from a prior density matrix φ ^ to a posterior density matrix ρ ^ . We will apply the same intuition for ranking probability distributions (Section 2) and implement the PMU, PI, and design criteria to the ranking of density matrices. We therefore find the quantum relative entropy S ( ρ ^ , φ ^ ) to be designed for the purpose of inferentially updating density matrices.

3.1. Designing the Quantum Relative Entropy

In this section, we design the quantum relative entropy using the same inferentially guided design criteria as were used in the standard relative entropy.
DC1: Subdomain Independence
The goal is to design a function S(ρ̂, φ̂) that is able to rank density matrices. This insists that S(ρ̂, φ̂) be a real scalar-valued function of the posterior ρ̂ and prior φ̂ density matrices, which we will call the quantum relative entropy or simply the entropy. An arbitrary variation of the entropy with respect to ρ̂ is,
$$\delta S(\hat{\rho}, \hat{\varphi}) = \sum_{ij}\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta\rho_{ij}}\,\delta\rho_{ij} = \sum_{ij}\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta(\hat{\rho})_{ij}}\,\delta(\hat{\rho})_{ij} = \sum_{ij}\Big(\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta\hat{\rho}^{T}}\Big)_{ji}\,\delta(\hat{\rho})_{ij} = \mathrm{Tr}\Big(\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta\hat{\rho}^{T}}\,\delta\hat{\rho}\Big),$$
where Tr(·) is the trace. We wish to maximize this entropy with respect to expectation value constraints on ρ̂, such as ⟨A⟩ = Tr(Âρ̂). Using the Lagrange multiplier method to maximize the entropy with respect to ⟨A⟩ and normalization, and setting the variation equal to zero,
$$\delta\Big(S(\hat{\rho}, \hat{\varphi}) - \lambda\big[\mathrm{Tr}(\hat{\rho}) - 1\big] - \alpha\big[\mathrm{Tr}(\hat{A}\hat{\rho}) - \langle A\rangle\big]\Big) = 0,$$
where λ and α are the Lagrange multipliers for the respective constraints. Because S(ρ̂, φ̂) is a real number, we inevitably require δS to be real; without imposing this directly, we find that requiring δS to be real requires ρ̂ and Â to be Hermitian. At this point, it is simpler to allow for arbitrary variations of ρ̂ such that,
$$\mathrm{Tr}\Big(\Big[\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta\hat{\rho}^{T}} - \lambda\hat{1} - \alpha\hat{A}\Big]\delta\hat{\rho}\Big) = 0.$$
For these arbitrary variations, the variational derivative of S must satisfy,
$$\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta\hat{\rho}^{T}} = \lambda\hat{1} + \alpha\hat{A}$$
at the maximum. As in the remark earlier, all forms of S that give the correct form of δS(ρ̂, φ̂)/δρ̂ᵀ under variation are equally valid for the purpose of inference. For notational convenience, we let
$$\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta\hat{\rho}^{T}} \equiv \phi(\hat{\rho}, \hat{\varphi}),$$
which is a matrix valued function of the posterior and prior density matrices. The form of ϕ ( ρ ^ , φ ^ ) is already “local” in ρ ^ (the variational derivative is with respect to the whole density matrix), so we don’t need to constrain it further as we did in the original DC1.
DC1’: In the absence of new information, the new state ρ ^ is equal to the old state φ ^
Applied to the ranking of density matrices, in the absence of new information, the density matrix φ ^ should not change, that is, the posterior density matrix ρ ^ = φ ^ is equal to the prior density matrix. Maximizing the entropy without applying any constraints gives,
$$\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta\hat{\rho}^{T}} = \hat{0},$$
and, therefore, DC1’ imposes the following condition in this case:
$$\frac{\delta S(\hat{\rho}, \hat{\varphi})}{\delta\hat{\rho}^{T}} = \phi(\hat{\rho}, \hat{\varphi}) = \phi(\hat{\varphi}, \hat{\varphi}) = \hat{0}.$$
As in the original DC1’, if φ ^ is known to obey some expectation value A ^ , and then if one goes out of their way to constrain ρ ^ to that expectation value and nothing else, it follows from the PMU that ρ ^ = φ ^ , as no information has been gained. This is not imposed directly but can be verified later.
DC2: Subsystem Independence
The discussion of DC2 is the same as the standard relative entropy DC2—it is not a consistency requirement, and the updating is designed so that the independence reflected in the prior is maintained in the posterior by default via the PMU when the information provided is silent about correlations.
DC2 Implementation
Consider a composite system living in the Hilbert space H = H₁ ⊗ H₂. Assume that all prior evidence led us to believe the systems were independent. This is reflected in the prior density matrix: if the individual system priors are φ̂₁ and φ̂₂, then the joint prior for the whole system is φ̂₁ ⊗ φ̂₂. Further suppose that new information is acquired such that φ̂₁ would itself be updated to ρ̂₁ and that φ̂₂ would itself be updated to ρ̂₂. By design, the implementation of DC2 constrains the entropy functional such that, in this case, the joint product prior density matrix φ̂₁ ⊗ φ̂₂ updates to the product posterior ρ̂₁ ⊗ ρ̂₂ so that inferences about one do not affect inferences about the other.
The argument below is considerably simplified if we expand the space of density matrices to include density matrices that are not necessarily normalized. This does not represent any limitation because normalization can always be easily achieved as one additional constraint. We consider a few special cases below:
Case 1: We receive the extremely constraining information that the posterior distribution for system 1 is completely specified to be ρ̂₁ while we receive no information about system 2 at all. We treat the two systems jointly. Maximize the joint entropy S[ρ̂₁₂, φ̂₁ ⊗ φ̂₂], subject to the following constraints on the ρ̂₁₂,
T r 2 ( ρ ^ 12 ) = ρ ^ 1 .
Notice all of the N² elements in H₁ of ρ̂₁₂ are being constrained. We therefore need a Lagrange multiplier which spans H₁, and therefore it is a square matrix λ̂₁. This is readily seen by observing the component form expressions of the Lagrange multipliers (λ̂₁)ᵢⱼ = λᵢⱼ. Maximizing the entropy with respect to this H₂-independent constraint is
$$0 = \delta\Big(S - \sum_{ij}\lambda_{ij}\big[\mathrm{Tr}_2(\hat{\rho}_{12}) - \hat{\rho}_1\big]_{ij}\Big),$$
but reexpressing this with its transpose ( λ ^ 1 ) i j = ( λ ^ 1 T ) j i , gives
$$0 = \delta\Big(S - \mathrm{Tr}_1\big(\hat{\lambda}_1\big[\mathrm{Tr}_2(\hat{\rho}_{12}) - \hat{\rho}_1\big]\big)\Big),$$
where we have relabeled λ̂₁ᵀ → λ̂₁, for convenience, as the names of the Lagrange multipliers are arbitrary. For arbitrary variations of ρ̂₁₂, we therefore have
$$\hat{\lambda}_1 \otimes \hat{1}_2 = \phi\big(\hat{\rho}_{12},\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big).$$
DC2 is implemented by requiring φ̂₁ ⊗ φ̂₂ → ρ̂₁ ⊗ φ̂₂, such that the function ϕ is designed to reflect subsystem independence in this case; therefore, we have
$$\hat{\lambda}_1 \otimes \hat{1}_2 = \phi\big(\hat{\rho}_1\otimes\hat{\varphi}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big).$$
Had we chosen a different prior φ̂₂′ = φ̂₂ + δφ̂₂, for all δφ̂₂ the LHS λ̂₁ ⊗ 1̂₂ remains unchanged, given that ϕ is independent of scalar functions (I would like to thank M. Krumm for pointing this out.) of φ̂₂, as those could be lumped into λ̂₁ while keeping ρ̂₁ fixed. The potential dependence on scalar functions of φ̂₂ can be removed by imposing DC2 in a subsystem independent situation where ρ̂₁ in ϕ need not be fixed under variations of φ̂₂. The resulting equation in such a situation, for instance maximizing the entropy of an independent joint prior with respect to Tr((Â₁ ⊗ 1̂₂)ρ̂₁₂) = ⟨A⟩, facilitated by a scalar Lagrange multiplier λ and after imposing DC2, is
$$\lambda\,\hat{A}_1 \otimes \hat{1}_2 = \phi\big(\hat{\rho}_1\otimes\hat{\varphi}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big).$$
For subsystem independence to be imposed here, ρ ^ 1 must be independent of variations in φ ^ 2 , and, therefore, in a general subsystem independent case, ϕ is independent of scalar functions of φ ^ 2 . This means that any dependence that the right-hand side of (48) might potentially have had on φ ^ 2 must drop out, meaning,
$$\phi\big(\hat{\rho}_1\otimes\hat{\varphi}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big) = f(\hat{\rho}_1, \hat{\varphi}_1)\otimes\hat{1}_2.$$
Since φ̂₂ is arbitrary, suppose further that we choose a unit prior, φ̂₂ = 1̂₂, and note that ρ̂₁ ⊗ 1̂₂ and φ̂₁ ⊗ 1̂₂ are block diagonal in H₂. Because the LHS is block diagonal in H₂,
$$f(\hat{\rho}_1, \hat{\varphi}_1)\otimes\hat{1}_2 = \phi\big(\hat{\rho}_1\otimes\hat{1}_2,\, \hat{\varphi}_1\otimes\hat{1}_2\big).$$
The RHS is block diagonal in H 2 and, because the function ϕ is understood to be a power series expansion in its arguments,
$$f(\hat{\rho}_1, \hat{\varphi}_1)\otimes\hat{1}_2 = \phi\big(\hat{\rho}_1\otimes\hat{1}_2,\, \hat{\varphi}_1\otimes\hat{1}_2\big) = \phi\big(\hat{\rho}_1, \hat{\varphi}_1\big)\otimes\hat{1}_2.$$
This gives
$$\hat{\lambda}_1 \otimes \hat{1}_2 = \phi\big(\hat{\rho}_1, \hat{\varphi}_1\big)\otimes\hat{1}_2,$$
and, therefore, the 1̂₂ factors out and λ̂₁ = ϕ(ρ̂₁, φ̂₁). A similar argument exchanging systems 1 and 2 shows λ̂₂ = ϕ(ρ̂₂, φ̂₂).
Case 1—Conclusion: The analysis leads us to conclude that when the system 2 is not updated, the dependence on φ ^ 2 drops out,
$$\phi\big(\hat{\rho}_1\otimes\hat{\varphi}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big) = \phi\big(\hat{\rho}_1, \hat{\varphi}_1\big)\otimes\hat{1}_2,$$
and, similarly,
$$\phi\big(\hat{\varphi}_1\otimes\hat{\rho}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big) = \hat{1}_1\otimes\phi\big(\hat{\rho}_2, \hat{\varphi}_2\big).$$
Case 2: Now consider a different special case in which the marginal posterior distributions for systems 1 and 2 are both completely specified to be ρ̂₁ and ρ̂₂, respectively. Maximize the joint entropy, S[ρ̂₁₂, φ̂₁ ⊗ φ̂₂], subject to the following constraints on the ρ̂₁₂,
T r 2 ( ρ ^ 12 ) = ρ ^ 1 and T r 1 ( ρ ^ 12 ) = ρ ^ 2 ,
where Trᵢ(·) is the partial trace function, which is a trace over the vectors in Hᵢ. Here, each expectation value constrains the entire space Hᵢ where ρ̂ᵢ lives. The Lagrange multipliers must span their respective spaces, so we implement the constraint with the Lagrange multiplier operator μ̂ᵢ; then,
$$0 = \delta\Big(S - \mathrm{Tr}_1\big(\hat{\mu}_1\big[\mathrm{Tr}_2(\hat{\rho}_{12}) - \hat{\rho}_1\big]\big) - \mathrm{Tr}_2\big(\hat{\mu}_2\big[\mathrm{Tr}_1(\hat{\rho}_{12}) - \hat{\rho}_2\big]\big)\Big).$$
For arbitrary variations of ρ ^ 12 , we have
$$\hat{\mu}_1\otimes\hat{1}_2 + \hat{1}_1\otimes\hat{\mu}_2 = \phi\big(\hat{\rho}_{12},\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big).$$
By design, DC2 is implemented by requiring φ̂₁ ⊗ φ̂₂ → ρ̂₁ ⊗ ρ̂₂ in this case; therefore, we have
$$\hat{\mu}_1\otimes\hat{1}_2 + \hat{1}_1\otimes\hat{\mu}_2 = \phi\big(\hat{\rho}_1\otimes\hat{\rho}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big).$$
Write (59) as
$$\hat{\mu}_1\otimes\hat{1}_2 = \phi\big(\hat{\rho}_1\otimes\hat{\rho}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big) - \hat{1}_1\otimes\hat{\mu}_2.$$
The LHS is independent of changes that might occur in H 2 on the RHS of (60). This means that any variation of ρ ^ 2 and φ ^ 2 must be “pushed out” by μ ^ 2 —it removes the dependence of ρ ^ 2 and φ ^ 2 in ϕ . Any dependence that the RHS might potentially have had on ρ ^ 2 , φ ^ 2 must cancel out in a general subsystem independent case, leaving μ ^ 1 unchanged. Consequently,
$$\phi\big(\hat{\rho}_1\otimes\hat{\rho}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big) - \hat{1}_1\otimes\hat{\mu}_2 = g(\hat{\rho}_1, \hat{\varphi}_1)\otimes\hat{1}_2.$$
Because g(ρ̂₁, φ̂₁) is independent of arbitrary variations of ρ̂₂ and φ̂₂, the LHS above is satisfied equally well for all such choices. The form of g(ρ̂₁, φ̂₁) reduces to the form of f(ρ̂₁, φ̂₁) from Case 1 when ρ̂₂ = φ̂₂ = 1̂₂ and, similarly, DC1’ gives μ̂₂ = 0. Therefore, the Lagrange multiplier is
$$\hat{\mu}_1\otimes\hat{1}_2 = \phi(\hat{\rho}_1, \hat{\varphi}_1)\otimes\hat{1}_2.$$
A similar analysis is carried out for μ ^ 2 leading to
$$\hat{1}_1\otimes\hat{\mu}_2 = \hat{1}_1\otimes\phi(\hat{\rho}_2, \hat{\varphi}_2).$$
Case 2—Conclusion: Substituting back into (59) gives us a functional equation for ϕ ,
$$\phi\big(\hat{\rho}_1\otimes\hat{\rho}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big) = \phi(\hat{\rho}_1, \hat{\varphi}_1)\otimes\hat{1}_2 + \hat{1}_1\otimes\phi(\hat{\rho}_2, \hat{\varphi}_2),$$
which is
$$\phi\big(\hat{\rho}_1\otimes\hat{\rho}_2,\, \hat{\varphi}_1\otimes\hat{\varphi}_2\big) = \phi\big(\hat{\rho}_1\otimes\hat{1}_2,\, \hat{\varphi}_1\otimes\hat{1}_2\big) + \phi\big(\hat{1}_1\otimes\hat{\rho}_2,\, \hat{1}_1\otimes\hat{\varphi}_2\big).$$
The general solution to this matrix valued functional equation is derived in Appendix A.5 and is
$$\phi(\hat{\rho}, \hat{\varphi}) = \tilde{A}\ln(\hat{\rho}) + \tilde{B}\ln(\hat{\varphi}),$$
where Ã is a “super-operator” having constant coefficients and twice the number of indices as ρ̂ and φ̂, as discussed in the Appendix (i.e., (Ã ln(ρ̂))_{ij} = Σ_{ℓk} A_{ijℓk} (ln(ρ̂))_{ℓk}, and similarly for B̃ ln(φ̂)). DC1’ imposes
$$\phi(\hat{\varphi}, \hat{\varphi}) = \tilde{A}\ln(\hat{\varphi}) + \tilde{B}\ln(\hat{\varphi}) = \hat{0},$$
which is satisfied in general when Ã = −B̃, and, now,
$$\phi(\hat{\rho}, \hat{\varphi}) = \tilde{A}\big(\ln(\hat{\rho}) - \ln(\hat{\varphi})\big).$$
We may fix the constant Ã by substituting our solution into the RHS of Equation (64), which is equal to the RHS of Equation (65),
$$\tilde{A}_1\big(\ln(\hat{\rho}_1) - \ln(\hat{\varphi}_1)\big)\otimes\hat{1}_2 + \hat{1}_1\otimes\tilde{A}_2\big(\ln(\hat{\rho}_2) - \ln(\hat{\varphi}_2)\big) = \tilde{A}_{12}\big(\ln(\hat{\rho}_1\otimes\hat{1}_2) - \ln(\hat{\varphi}_1\otimes\hat{1}_2)\big) + \tilde{A}_{12}\big(\ln(\hat{1}_1\otimes\hat{\rho}_2) - \ln(\hat{1}_1\otimes\hat{\varphi}_2)\big),$$
where Ã₁₂ acts on the joint space of 1 and 2 and Ã₁, Ã₂ act on the single subspaces 1 or 2, respectively. Using the well-known log tensor product identity ln(ρ̂₁ ⊗ 1̂₂) = ln(ρ̂₁) ⊗ 1̂₂ in this case (the proof is demonstrated by taking the log of ρ̂₁′ ⊗ 1̂₂ ≡ exp(ρ̂₁) ⊗ 1̂₂ = exp(ρ̂₁ ⊗ 1̂₂) and substituting ρ̂₁ = log(ρ̂₁′)), the RHS of Equation (69) becomes
$$= \tilde{A}_{12}\big(\ln(\hat{\rho}_1)\otimes\hat{1}_2 - \ln(\hat{\varphi}_1)\otimes\hat{1}_2\big) + \tilde{A}_{12}\big(\hat{1}_1\otimes\ln(\hat{\rho}_2) - \hat{1}_1\otimes\ln(\hat{\varphi}_2)\big).$$
Note that arbitrarily letting ρ ^ 2 = φ ^ 2 gives
$$\tilde{A}_1\big(\ln(\hat{\rho}_1) - \ln(\hat{\varphi}_1)\big)\otimes\hat{1}_2 = \tilde{A}_{12}\big(\ln(\hat{\rho}_1)\otimes\hat{1}_2 - \ln(\hat{\varphi}_1)\otimes\hat{1}_2\big),$$
or arbitrarily letting ρ ^ 1 = φ ^ 1 gives
$$\hat{1}_1\otimes\tilde{A}_2\big(\ln(\hat{\rho}_2) - \ln(\hat{\varphi}_2)\big) = \tilde{A}_{12}\big(\hat{1}_1\otimes\ln(\hat{\rho}_2) - \hat{1}_1\otimes\ln(\hat{\varphi}_2)\big).$$
As Ã₁₂, Ã₁, and Ã₂ are constant tensors, inspecting the above equalities determines the form of the tensor to be Ã = A·1̃, where A is a scalar constant and 1̃ is the super-operator identity over the appropriate (joint) Hilbert space.
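As a quick numerical sanity check of the tensor-product logarithm identity used above (a sketch assuming numpy/scipy, with an arbitrary 2 × 2 positive-definite matrix standing in for ρ̂₁):
```python
# Sketch: verify ln(rho_1 ⊗ 1_2) = ln(rho_1) ⊗ 1_2 numerically.
import numpy as np
from scipy.linalg import logm

rho1 = np.array([[0.7, 0.1],
                 [0.1, 0.3]])               # positive definite, trace 1
eye2 = np.eye(2)

lhs = logm(np.kron(rho1, eye2))             # ln(rho_1 ⊗ 1_2)
rhs = np.kron(logm(rho1), eye2)             # ln(rho_1) ⊗ 1_2
print(np.allclose(lhs, rhs))                # True
```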
Because our goal is to maximize the entropy function, we let the arbitrary constant A = −|A| and distribute 1̃ identically, which gives the final functional form,
$$\phi(\hat{\rho}, \hat{\varphi}) = -|A|\big(\ln(\hat{\rho}) - \ln(\hat{\varphi})\big).$$
“Integrating” ϕ gives a general form for the quantum relative entropy,
$$S(\hat{\rho}, \hat{\varphi}) = -|A|\,\mathrm{Tr}\big(\hat{\rho}\log\hat{\rho} - \hat{\rho}\log\hat{\varphi} - \hat{\rho}\big) + C[\hat{\varphi}] = -|A|\,S_U(\hat{\rho}, \hat{\varphi}) + |A|\,\mathrm{Tr}(\hat{\rho}) + C[\hat{\varphi}],$$
where S_U(ρ̂, φ̂) is Umegaki’s form of the relative entropy [42,43,44], the extra |A| Tr(ρ̂) from integration is an artifact present for the preservation of DC1’, and C[φ̂] is a constant in the sense that it drops out under arbitrary variations of ρ̂. This entropy leads to the same inferences as Umegaki’s form of the entropy, with an added bonus that ρ̂ = φ̂ in the absence of constraints or changes in information—rather than ρ̂ = e⁻¹φ̂, which would be given by maximizing Umegaki’s form of the entropy. In this sense, the extra |A| Tr(ρ̂) only improves the inference process as it more readily adheres to the PMU through DC1’; however, now, because S_U ≥ 0, we have S(ρ̂, φ̂) ≤ Tr(ρ̂) + C[φ̂], which provides little nuisance. In the spirit of this derivation, we will keep the Tr(ρ̂) term there, but, for all practical purposes of inference, as long as there is a normalization constraint, it plays no role, and we find (letting |A| = 1 and C[φ̂] = 0),
$$S(\hat{\rho}, \hat{\varphi}) \to S^{*}(\hat{\rho}, \hat{\varphi}) = -S_U(\hat{\rho}, \hat{\varphi}) = -\mathrm{Tr}\big(\hat{\rho}\log\hat{\rho} - \hat{\rho}\log\hat{\varphi}\big),$$
Umegaki’s form of the relative entropy. S * ( ρ ^ , φ ^ ) is an equally valid entropy because, given normalization is applied, the same selected posterior ρ ^ maximizes both S ( ρ ^ , φ ^ ) and S * ( ρ ^ , φ ^ ) .

3.2. Remarks

Due to the universality and the equal application of the PMU, by using the same design criteria for both the standard and quantum case, the quantum relative entropy reduces to the standard relative entropy when [ρ̂, φ̂] = 0 or when the experiment being performed, ρ̂ → ρ(a) = Tr(ρ̂|a⟩⟨a|), is known. The quantum relative entropy we derive has the correct asymptotic form of the standard relative entropy in the sense of [8,9,10]. Further connections will be illustrated in a follow-up article that is concerned with direct applications of the quantum relative entropy. Because the two entropies are derived in parallel, we expect the well-known inferential results and consequences of the relative entropy to have a quantum relative entropy representation.
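The commuting case can be checked directly with a short numerical sketch (assuming numpy/scipy; the eigenvalue vectors and the shared eigenbasis are arbitrary choices): when [ρ̂, φ̂] = 0, the Umegaki form Tr(ρ̂ log ρ̂ − ρ̂ log φ̂) equals the standard relative entropy of the shared eigenvalue distributions.
```python
# Sketch: for commuting rho and phi, the quantum relative entropy reduces to
# the classical relative entropy of their eigenvalue distributions.
import numpy as np
from scipy.linalg import logm

p = np.array([0.2, 0.3, 0.5])               # eigenvalues of rho
q = np.array([0.5, 0.25, 0.25])             # eigenvalues of phi (same eigenbasis)

U = np.linalg.qr(np.random.randn(3, 3))[0]  # a common (random) eigenbasis
rho = U @ np.diag(p) @ U.T
phi = U @ np.diag(q) @ U.T

S_U = np.trace(rho @ logm(rho) - rho @ logm(phi)).real
S_classical = np.sum(p * np.log(p / q))
print(np.isclose(S_U, S_classical))         # True
```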
Maximizing the quantum relative entropy with respect to some constraints ⟨Âᵢ⟩, where {Âᵢ} are a set of arbitrary Hermitian operators, and normalization ⟨1̂⟩ = 1, gives the following general solution for the posterior density matrix:
$$\hat{\rho} = \exp\Big(\alpha_0\hat{1} + \sum_i \alpha_i\hat{A}_i + \ln(\hat{\varphi})\Big) = \frac{1}{Z}\exp\Big(\sum_i \alpha_i\hat{A}_i + \ln(\hat{\varphi})\Big) \equiv \frac{1}{Z}\exp\big(\hat{C}\big),$$
where the αᵢ are the Lagrange multipliers of the respective constraints, and normalization may be factored out of the exponential in general because the identity commutes universally. If φ̂ ∝ 1̂, it is well known that the analysis arrives at the same expression for ρ̂ after normalization as it would if the von Neumann entropy were used, and thus one can find expressions for thermalized quantum states ρ̂ = (1/Z) e^{−βĤ}. The remaining problem is to solve for the N Lagrange multipliers using their N associated expectation value constraints. In principle, their solution is found by computing Z and using standard methods from Statistical Mechanics,
$$\langle\hat{A}_i\rangle = \frac{\partial}{\partial\alpha_i}\ln(Z),$$
and inverting to find αᵢ = αᵢ(⟨Âᵢ⟩), which has a unique solution due to the joint concavity (convexity, depending on the sign convention) of the quantum relative entropy [8,9] when the constraints are linear in ρ̂. The simple proof that (77) is monotonic in α, and therefore invertible, is that its derivative is ∂⟨Âᵢ⟩/∂αᵢ = ⟨Âᵢ²⟩ − ⟨Âᵢ⟩² ≥ 0. Between the Zassenhaus formula [45]
$$e^{t(\hat{A}+\hat{B})} = e^{t\hat{A}}\, e^{t\hat{B}}\, e^{-\frac{t^2}{2}[\hat{A},\hat{B}]}\, e^{\frac{t^3}{6}\big(2[\hat{B},[\hat{A},\hat{B}]] + [\hat{A},[\hat{A},\hat{B}]]\big)}\cdots,$$
and Horn’s inequality [46,47,48], the solutions to (77) lack a certain calculational elegance because it is difficult to express the eigenvalues of Ĉ = log(φ̂) + Σᵢ αᵢÂᵢ (in the exponential) in simple terms of the eigenvalues of the Âᵢ’s and φ̂, in general, when the matrices do not commute. The solution requires solving the eigenvalue problem for Ĉ, such that the exponential of Ĉ may be taken and evaluated in terms of the eigenvalues of the αᵢÂᵢ’s and the prior density matrix φ̂. A pedagogical exercise is starting with a prior that is a mixture of spin-z up and down, φ̂ = a|+⟩⟨+| + b|−⟩⟨−| (a, b ≠ 0), and maximizing the quantum relative entropy with respect to an expectation value of a general Hermitian operator with which the prior density matrix does not commute. This example for spin is given in Appendix B.
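A minimal numerical sketch of this exercise (assuming numpy/scipy; the prior weights and the target expectation value below are illustrative choices, and this is not the closed-form Appendix B solution): the posterior of (76) is built from exp(ln φ̂ + ασ̂ₓ), and the single Lagrange multiplier is found by root-finding on the ⟨σ̂ₓ⟩ constraint.
```python
# Sketch: update a sigma_z-diagonal prior with a constraint on <sigma_x>,
# a case where the prior and the constraint operator do not commute.
import numpy as np
from scipy.linalg import expm, logm
from scipy.optimize import brentq

sx = np.array([[0.0, 1.0],
               [1.0, 0.0]])                 # Pauli sigma_x
phi = np.diag([0.7, 0.3])                   # prior: mixture of spin-z up/down
log_phi = logm(phi).real                    # phi is positive definite, so real

def posterior(alpha):
    M = expm(log_phi + alpha * sx)
    return M / np.trace(M)                  # normalization supplies 1/Z

def mismatch(alpha, target):
    return np.trace(posterior(alpha) @ sx) - target

alpha = brentq(mismatch, -20.0, 20.0, args=(0.4,))   # impose <sigma_x> = 0.4
rho = posterior(alpha)
print(rho)                                  # posterior no longer commutes with phi
print(np.trace(rho @ sx))                   # ~0.4, constraint satisfied
```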

4. Conclusions

This approach emphasizes the notion that entropy is a tool for performing inference and downplays counter-notional issues that arise if one interprets entropy as a measure of disorder, a measure of distinguishability, or an amount of missing information [7]. Because the same design criteria, guided by the PMU, are applied equally well to the design of a relative and quantum relative entropy, we find that both the relative and quantum relative entropy are designed for the purpose of inference. Because the quantum relative entropy is the functional that fits the requirements of a tool designed for the inference of density matrices, we now know what it is and how to use it—formulating an inferential quantum maximum entropy method. This article provides the foundation for [29], which, in particular, derives the Quantum Bayes Rule and collapse as special cases of the quantum maximum entropy method, as was craved in [24], analogous to [38,40]’s treatment for deriving Bayes Rule using the standard maximum entropy method. The quantum maximum entropy method thereby unifies a few topics in Quantum Information and Quantum Measurement through entropic inference.

Acknowledgments

I must give ample acknowledgment to Ariel Caticha who suggested the problem of justifying the form of the quantum relative entropy as a criterion for ranking of density matrices. He cleared up several difficulties by suggesting that design constraints be applied to the variational derivative of the entropy rather than the entropy itself. In addition, he provided substantial improvements to the method for imposing DC2 that led to the functional equations for the variational derivatives ( ϕ 12 = ϕ 1 + ϕ 2 )—with more rigor than in earlier versions of this article. His time and guidance are all greatly appreciated—thanks, Ariel. I would also like to thank M. Krumm, the reviewers, as well as our information physics group at UAlbany for our many intriguing discussions about probability, inference, and quantum mechanics.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A

The Appendix loosely follows the relevant sections in [49], and then uses the methods reviewed to solve the relevant functional equations for ϕ . The last section is an example of the quantum maximum entropy method applied to a mixed spin state.

Appendix A.1. Simple Functional Equations

From [49] pages 31–44.
Theorem A1.
If Cauchy’s functional equation
f ( x + y ) = f ( x ) + f ( y )
is satisfied for all real x, y, and if the function f(x) is (a) continuous at a point, (b) nonnegative for small positive x’s, or (c) bounded in an interval, then,
f ( x ) = c x
is the solution to (A1) for all real x. If (A1) is assumed only over all positive x, y, then under the same conditions, (A2) holds for all positive x.
Proof. 
The most natural assumption for our purposes is that f(x) is continuous at a point (which later extends to continuity at all points, as given by Darboux [50]). Cauchy solved the functional equation by induction. In particular, Equation (A1) implies,
$$f\Big(\sum_i x_i\Big) = \sum_i f(x_i),$$
and if we let each x i = x as a special case to determine f, we find
f ( n x ) = n f ( x ) .
We may let n x = m t such that
$$f(x) = f\Big(\frac{m}{n}\,t\Big) = \frac{m}{n}\,f(t).$$
Letting lim_{t→1} f(t) = f(1) = c gives
$$f\Big(\frac{m}{n}\Big) = \frac{m}{n}\,f(1) = \frac{m}{n}\,c,$$
and, because for t = 1, x = m/n above, we have
f ( x ) = c x ,
which is the general solution of the linear functional equation. In principle, c can be complex. The importance of Cauchy’s solution is that it can be used to give general solutions to the following Cauchy equations:
f ( x + y ) = f ( x ) f ( y ) ,
f ( x y ) = f ( x ) + f ( y ) ,
f ( x y ) = f ( x ) f ( y ) ,
by performing consistent substitutions until they are of the same form as (A1), as given by Cauchy. We will briefly discuss the first two. ☐
Theorem A2.
The general solution of f(x + y) = f(x)f(y) is f(x) = e^{cx} for all real, or for all positive, x, y that are continuous at one point and, in addition to the exponential solution, the solution f(0) = 1 and f(x) = 0 for (x > 0) is in these classes of functions.
The first functional equation, f(x + y) = f(x)f(y), is solved by first noting that f(x) is strictly positive for real x, y, which can be shown by considering x = y,
$$f(2x) = f(x)^2 > 0.$$
If there exists f(x₀) = 0, then it follows that f(x) = f((x − x₀) + x₀) = 0, a trivial solution; hence the reason why the possibility of being equal to zero is excluded above. Given f(x) is nowhere zero, we are justified in taking the natural logarithm of f(x), due to its positivity f(x) > 0. This gives,
ln ( f ( x + y ) ) = ln ( f ( x ) ) + ln ( f ( y ) ) ,
and letting g ( x ) = ln ( f ( x ) ) gives,
g ( x + y ) = g ( x ) + g ( y ) ,
which is Cauchy’s linear equation, and thus has the solution g(x) = cx. Because g(x) = ln(f(x)), one finds in general that f(x) = e^{cx}.
Theorem A3.
If the functional equation f ( x y ) = f ( x ) + f ( y ) is valid for all positive x , y then its general solution is f ( x ) = c ln ( x ) given it is continuous at a point. If x = 0 (or y = 0 ) are valid, then the general solution is f ( x ) = 0 . If all real x , y are valid except 0, then the general solution is f ( x ) = c ln ( | x | ) .
In particular, we are interested in the functional equation f ( x y ) = f ( x ) + f ( y ) when x , y are positive. In this case, we can again follow Cauchy and substitute x = e u and y = e v to get,
$$f(e^u e^v) = f(e^u) + f(e^v),$$
and letting g(u) = f(e^u) gives g(u + v) = g(u) + g(v). Again, the solution is g(u) = cu and, therefore, the general solution is f(x) = c ln(x) when we substitute for u. If x could equal 0, then f(0) = f(x) + f(0), which has the trivial solution f(x) = 0. The general solution for x ≠ 0, y ≠ 0 and x, y positive is therefore f(x) = c ln(x).

Appendix A.2. Functional Equations with Multiple Arguments

From [49] pages 213–217. Consider the functional equation,
F ( x 1 + y 1 , x 2 + y 2 , , x n + y n ) = F ( x 1 , x 2 , , x n ) + F ( y 1 , y 2 , , y n ) ,
which is a generalization of Cauchy’s linear functional Equation (A1) to several arguments. Letting x 2 = x 3 = = x n = y 2 = y 3 = = y n = 0 gives
F ( x 1 + y 1 , 0 , , 0 ) = F ( x 1 , 0 , , 0 ) + F ( y 1 , 0 , , 0 ) ,
which is the Cauchy linear functional equation having solution F ( x 1 , 0 , , 0 ) = c 1 x 1 , where F ( x 1 , 0 , , 0 ) is assumed to be continuous or at least measurable majorant. Similarly,
F ( 0 , , 0 , x k , 0 , , 0 ) = c k x k ,
and if we consider
F(x_1 + 0, 0 + y_2, 0, \dots, 0) = F(x_1, 0, \dots, 0) + F(0, y_2, 0, \dots, 0) = c_1 x_1 + c_2 y_2,
then, because y_2 is arbitrary, we could have let y_2 = x_2, and so on, such that in general
F(x_1, x_2, \dots, x_n) = \sum_i c_i x_i,
which is the general solution.
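The same kind of numerical check (again an illustrative addition, not part of the derivation) applies to the multi-argument solution, here with randomly chosen constants c_i:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
c = rng.normal(size=n)   # the constants c_i of the general solution

def F(x):
    # F(x_1, ..., x_n) = sum_i c_i x_i
    return c @ x

x, y = rng.normal(size=n), rng.normal(size=n)
# F(x_1 + y_1, ..., x_n + y_n) = F(x_1, ..., x_n) + F(y_1, ..., y_n)
assert np.isclose(F(x + y), F(x) + F(y))
```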

Appendix A.3. Relative Entropy

We are interested in the following functional equation:
ϕ ( ρ 1 ρ 2 , φ 1 φ 2 ) = ϕ ( ρ 1 , φ 1 ) + ϕ ( ρ 2 , φ 2 ) .
This is an equation of the form,
F ( x 1 y 1 , x 2 y 2 ) = F ( x 1 , x 2 ) + F ( y 1 , y 2 ) ,
where x_1 = ρ(x_1), y_1 = ρ(x_2), x_2 = φ(x_1), and y_2 = φ(x_2). First, assume all of the values of ρ and φ are greater than zero. Then, substitute x_i = e^{x_i′} and y_i = e^{y_i′} and let F′(x_1′, x_2′) = F(e^{x_1′}, e^{x_2′}), and so on, such that
F'(x_1' + y_1', x_2' + y_2') = F'(x_1', x_2') + F'(y_1', y_2'),
which is of the form of (A15). The general solution for F′ is therefore
F'(x_1' + y_1', x_2' + y_2') = a_1 (x_1' + y_1') + a_2 (x_2' + y_2') = a_1 \ln(x_1 y_1) + a_2 \ln(x_2 y_2) = F(x_1 y_1, x_2 y_2),
which means the general solution for ϕ is
ϕ ( ρ 1 , φ 1 ) = a 1 ln ( ρ ( x 1 ) ) + a 2 ln ( φ ( x 1 ) ) .
In such a case, when φ(x_0) = 0 for some value x_0 ∈ X, we may let φ(x_0) = ϵ, where ϵ is as close to zero as we could possibly want; the trivial general solution ϕ = 0 is saturated by the special case when ρ = φ from DC1’. Here, we return to the text.
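As a brief numerical aside (an illustrative addition): with one common choice of the constants (a_1 = -1, a_2 = +1, up to terms fixed by normalization), the associated entropy functional takes the familiar form S(ρ, φ) = -∑_x ρ(x) ln(ρ(x)/φ(x)). The sketch below evaluates it for discrete distributions; the function name and the ε cutoff used for zero prior values are assumptions of this sketch.

```python
import numpy as np

def relative_entropy(rho, phi, eps=1e-12):
    """Standard relative entropy S(rho, phi) = -sum_x rho(x) ln(rho(x)/phi(x)).

    Zero prior values phi(x0) = 0 are regularized by eps, as discussed above;
    the sign convention here is one common choice, assumed for illustration.
    """
    rho = np.asarray(rho, dtype=float)
    phi = np.clip(np.asarray(phi, dtype=float), eps, None)
    return -np.sum(rho * np.log(rho / phi))

rho = np.array([0.2, 0.5, 0.3])
phi = np.array([0.25, 0.25, 0.5])
print(relative_entropy(rho, phi))   # strictly negative since rho != phi
print(relative_entropy(rho, rho))   # zero when the posterior equals the prior
```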

Appendix A.4. Matrix Functional Equations

(This derivation is implied in [49] pages 347–349). First, consider a Cauchy matrix functional equation,
f ( X ^ + Y ^ ) = f ( X ^ ) + f ( Y ^ ) ,
where X ^ and Y ^ are n × n square matrices. Rewriting the matrix functional equation in terms of its components gives
f i j ( x 11 + y 11 , x 12 + y 12 , , x n n + y n n ) = f i j ( x 11 , x 12 , , x n n ) + f i j ( y 11 , y 12 , , y n n )
which is now in the form of (A15); therefore, the solution is
f_{ij}(x_{11}, x_{12}, \dots, x_{nn}) = \sum_{\ell, k = 1}^{n} c_{ij\ell k}\, x_{\ell k}
for i, j = 1, \dots, n. We find it convenient to introduce super indices, A = (i, j) and B = (\ell, k), such that the component equation becomes
f_A = \sum_B c_{AB}\, x_B,
and resembles the solution for the linear transformation of a vector from [49]. In general, we will be discussing matrices X̂ = X̂_1 ⊗ X̂_2 ⊗ ⋯ ⊗ X̂_N, which stem from tensor products of density matrices. In this situation, X̂ can be thought of as a 2N-index tensor, as a z × z matrix where z = ∏_i^N n_i is the product of the ranks of the matrices in the tensor product, or even as a vector of length z^2. In such a case, we may abuse the super index notation and let A and B lump together the appropriate number of indices, such that (A28) is the form of the solution for the components in general. The matrix form of the general solution is
f ( X ^ ) = C ˜ X ^ ,
where C ˜ is a constant super-operator having components c A B .
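A brief illustration (added here as a sketch; representing the super-operator by its action on the vectorized matrix is an assumption of convenience) of how a constant super-operator acts linearly on a matrix and therefore satisfies the matrix Cauchy equation:

```python
import numpy as np

rng = np.random.default_rng(2)
z = 4                                  # dimension of the square matrices X
C = rng.normal(size=(z * z, z * z))    # the super-operator C~ as a z^2 x z^2 matrix

def f(X):
    """Apply the super-operator to X by acting on its vectorization."""
    return (C @ X.reshape(-1)).reshape(z, z)

X, Y = rng.normal(size=(z, z)), rng.normal(size=(z, z))
# The linear solution satisfies f(X + Y) = f(X) + f(Y).
assert np.allclose(f(X + Y), f(X) + f(Y))
```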

Appendix A.5. Quantum Relative Entropy

The functional equation of interest is
\phi\!\left(\hat{\rho}_1 \otimes \hat{\rho}_2,\, \hat{\varphi}_1 \otimes \hat{\varphi}_2\right) = \phi\!\left(\hat{\rho}_1 \otimes \hat{1}_2,\, \hat{\varphi}_1 \otimes \hat{1}_2\right) + \phi\!\left(\hat{1}_1 \otimes \hat{\rho}_2,\, \hat{1}_1 \otimes \hat{\varphi}_2\right).
These density matrices are Hermitian, positive semi-definite, have positive eigenvalues, and are not equal to 0̂. Because every invertible matrix can be expressed as the exponential of some other matrix, we can substitute ρ̂_1 = e^{ρ̂_1′}, and so on for all four density matrices, giving,
\phi\!\left(e^{\hat{\rho}_1'} \otimes e^{\hat{\rho}_2'},\, e^{\hat{\varphi}_1'} \otimes e^{\hat{\varphi}_2'}\right) = \phi\!\left(e^{\hat{\rho}_1'} \otimes \hat{1}_2,\, e^{\hat{\varphi}_1'} \otimes \hat{1}_2\right) + \phi\!\left(\hat{1}_1 \otimes e^{\hat{\rho}_2'},\, \hat{1}_1 \otimes e^{\hat{\varphi}_2'}\right).
Now, we use the following identities for Hermitian matrices:
e^{\hat{\rho}_1'} \otimes e^{\hat{\rho}_2'} = e^{\hat{\rho}_1' \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\rho}_2'}
and
e^{\hat{\rho}_1'} \otimes \hat{1}_2 = e^{\hat{\rho}_1' \otimes \hat{1}_2},
to recast the functional equation as,
\phi\!\left(e^{\hat{\rho}_1' \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\rho}_2'},\, e^{\hat{\varphi}_1' \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\varphi}_2'}\right) = \phi\!\left(e^{\hat{\rho}_1' \otimes \hat{1}_2},\, e^{\hat{\varphi}_1' \otimes \hat{1}_2}\right) + \phi\!\left(e^{\hat{1}_1 \otimes \hat{\rho}_2'},\, e^{\hat{1}_1 \otimes \hat{\varphi}_2'}\right).
Letting G(\hat{\rho}_1' \otimes \hat{1}_2, \hat{\varphi}_1' \otimes \hat{1}_2) = \phi\!\left(e^{\hat{\rho}_1' \otimes \hat{1}_2}, e^{\hat{\varphi}_1' \otimes \hat{1}_2}\right), and the like, gives
G(\hat{\rho}_1' \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\rho}_2',\, \hat{\varphi}_1' \otimes \hat{1}_2 + \hat{1}_1 \otimes \hat{\varphi}_2') = G(\hat{\rho}_1' \otimes \hat{1}_2,\, \hat{\varphi}_1' \otimes \hat{1}_2) + G(\hat{1}_1 \otimes \hat{\rho}_2',\, \hat{1}_1 \otimes \hat{\varphi}_2').
This functional equation is of the form
G ( X ^ 1 + Y ^ 1 , X ^ 2 + Y ^ 2 ) = G ( X ^ 1 , X ^ 2 ) + G ( Y ^ 1 , Y ^ 2 ) ,
which has the general solution
G(\hat{X}, \hat{Y}) = \tilde{A} \hat{X} + \tilde{B} \hat{Y},
analogous to (A19), and finally, in general,
\phi(\hat{\rho}, \hat{\varphi}) = \tilde{A} \ln(\hat{\rho}) + \tilde{B} \ln(\hat{\varphi}),
where Ã and B̃ are super-operators having constant coefficients. Here, we return to the text.
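As a closing numerical aside to this appendix (an illustrative addition, assuming SciPy is available; the sign convention of the Umegaki form used below is one common choice), the sketch checks that the quantum relative entropy is additive over tensor products of independent priors and posteriors, which is the property encoded in the functional equation above:

```python
import numpy as np
from scipy.linalg import logm

def random_density_matrix(n, rng):
    """A random full-rank density matrix: Hermitian, strictly positive, unit trace."""
    A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    rho = A @ A.conj().T + 1e-3 * np.eye(n)
    return rho / np.trace(rho)

def quantum_relative_entropy(rho, phi):
    """Umegaki form (sign convention assumed here): -Tr[rho (ln rho - ln phi)]."""
    return -np.real(np.trace(rho @ (logm(rho) - logm(phi))))

rng = np.random.default_rng(3)
rho1, rho2 = random_density_matrix(2, rng), random_density_matrix(3, rng)
phi1, phi2 = random_density_matrix(2, rng), random_density_matrix(3, rng)

# Additivity over independent subsystems, as required by the design criteria:
lhs = quantum_relative_entropy(np.kron(rho1, rho2), np.kron(phi1, phi2))
rhs = quantum_relative_entropy(rho1, phi1) + quantum_relative_entropy(rho2, phi2)
assert np.isclose(lhs, rhs)
```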

Appendix B. Spin Example

Consider an arbitrarily mixed prior (in the spin-z basis for convenience) with a, b ≠ 0,
\hat{\varphi} = a\,|+\rangle\langle+| + b\,|-\rangle\langle-|
and a general Hermitian matrix in the spin- 1 / 2 Hilbert space,
c_\mu \hat{\sigma}^\mu = c_1 \hat{1} + c_x \hat{\sigma}_x + c_y \hat{\sigma}_y + c_z \hat{\sigma}_z
= (c_1 + c_z)\,|+\rangle\langle+| + (c_x - i c_y)\,|+\rangle\langle-| + (c_x + i c_y)\,|-\rangle\langle+| + (c_1 - c_z)\,|-\rangle\langle-|,
having a known expectation value,
T r ( ρ ^ c μ σ ^ μ ) = c .
Maximizing the entropy subject to this general expectation value constraint and to normalization is:
0 = \delta\Big( S - \lambda\,[\mathrm{Tr}(\hat{\rho}) - 1] - \alpha\,\big(\mathrm{Tr}(\hat{\rho}\, c_\mu \hat{\sigma}^\mu) - c\big) \Big),
which after varying gives the solution,
\hat{\rho} = \frac{1}{Z} \exp\!\left(\alpha\, c_\mu \hat{\sigma}^\mu + \log(\hat{\varphi})\right).
Letting
C ^ = α c μ σ ^ μ + log ( φ ^ )
gives
\hat{\rho} = \frac{1}{Z} e^{\hat{C}} = \frac{1}{Z}\, U e^{U^{-1} \hat{C} U} U^{-1} = \frac{1}{Z}\, U e^{\hat{\lambda}} U^{-1} = \frac{e^{\lambda_+}}{Z}\, U |\lambda_+\rangle\langle\lambda_+| U^{-1} + \frac{e^{\lambda_-}}{Z}\, U |\lambda_-\rangle\langle\lambda_-| U^{-1},
where λ ^ is the diagonalized matrix of C ^ having real eigenvalues. They are
\lambda_\pm = \lambda \pm \delta\lambda,
due to the quadratic formula, where explicitly:
\lambda = \alpha c_1 + \tfrac{1}{2} \log(ab),
and
\delta\lambda = \tfrac{1}{2}\sqrt{\left(2\alpha c_z + \log\tfrac{a}{b}\right)^{2} + 4\alpha^{2}\,(c_x^{2} + c_y^{2})}.
Because λ± and a, b, c_1, c_x, c_y, c_z are real, δλ is real and ≥ 0. The normalization constraint specifies the Lagrange multiplier Z,
1 = \mathrm{Tr}(\hat{\rho}) = \frac{e^{\lambda_+} + e^{\lambda_-}}{Z},
so Z = e^{\lambda_+} + e^{\lambda_-} = 2 e^{\lambda} \cosh(\delta\lambda). The expectation value constraint specifies the Lagrange multiplier α,
c = \mathrm{Tr}(\hat{\rho}\, c_\mu \hat{\sigma}^\mu) = \frac{\partial}{\partial\alpha} \log(Z) = c_1 + \tanh(\delta\lambda)\, \frac{\partial\,\delta\lambda}{\partial\alpha},
which becomes
c = c_1 + \frac{\tanh(\delta\lambda)}{2\,\delta\lambda} \left[\, 2\alpha\,(c_x^{2} + c_y^{2} + c_z^{2}) + c_z \log\tfrac{a}{b} \,\right],
or
c = c_1 + \tanh\!\left( \tfrac{1}{2}\sqrt{\left(2\alpha c_z + \log\tfrac{a}{b}\right)^{2} + 4\alpha^{2}(c_x^{2} + c_y^{2})} \right) \frac{2\alpha\,(c_x^{2} + c_y^{2} + c_z^{2}) + c_z \log\tfrac{a}{b}}{\sqrt{\left(2\alpha c_z + \log\tfrac{a}{b}\right)^{2} + 4\alpha^{2}(c_x^{2} + c_y^{2})}}.
This equation is monotonic in α, and therefore α is uniquely specified by the value of c. Ultimately, this is a consequence of the concavity of the entropy. The specific proof of the monotonicity of (A52) is below:
Proof. 
For ρ̂ to be Hermitian, Ĉ is Hermitian and δλ = (1/2)√f(α) is real; furthermore, because δλ is real, f(α) ≥ 0 and thus δλ ≥ 0. Because f(α) is quadratic in α and positive, it may be written in vertex form,
f(\alpha) = a(\alpha - h)^{2} + k,
where a > 0 here denotes the leading coefficient of the quadratic (not the prior weight a above), k ≥ 0, and (h, k) are the (x, y) coordinates of the minimum of f(α). Notice that the form of (A52) is
F(\alpha) = \frac{\tanh\!\left(\tfrac{1}{2}\sqrt{f(\alpha)}\right)}{\sqrt{f(\alpha)}} \times \frac{\partial f(\alpha)}{\partial\alpha}.
Making the change of variables α′ = α − h centers the function such that f(α′) = aα′^2 + k is symmetric about α′ = 0. We can then write
F(\alpha') = \frac{\tanh\!\left(\tfrac{1}{2}\sqrt{f(\alpha')}\right)}{\sqrt{f(\alpha')}} \times 2a\alpha',
where the derivative has been computed. Because f(α′) is positive, symmetric, and monotonically increasing in |α′| on each (symmetric) half-line (α′ greater than or less than zero), S(α′) ≡ tanh((1/2)√f(α′))/√f(α′) is also positive and symmetric, but it is unclear whether S(α′) is strictly monotonic on the half-line or not. We may restate
F(\alpha') = S(\alpha') \times 2a\alpha'.
We are now in a convenient position to perform the derivative test for monotonic functions:
\frac{\partial}{\partial\alpha'} F(\alpha') = 2a\, S(\alpha') + 2a\alpha'\, \frac{\partial}{\partial\alpha'} S(\alpha') = 2a\, S(\alpha')\left(1 - \frac{a\alpha'^{2}}{a\alpha'^{2} + k}\right) + a\,\frac{a\alpha'^{2}}{a\alpha'^{2} + k}\left(1 - \tanh^{2}\!\left(\tfrac{1}{2}\sqrt{a\alpha'^{2} + k}\right)\right) \;\geq\; 2a\, S(\alpha')\left(1 - \frac{a\alpha'^{2}}{a\alpha'^{2} + k}\right) \;\geq\; 0,
because a, k, and S(α′) are all ≥ 0, and therefore aα′^2/(aα′^2 + k) ≤ 1. The function of interest F(α′) is therefore monotonic for all α′, and therefore F(α) is monotonic for all α, completing the proof that there exists a unique real Lagrange multiplier α in (A52).
Although (A52) is monotonic in α, it is seemingly a transcendental equation. It can be solved graphically (or numerically) for the given values c, c_1, c_x, c_y, c_z, i.e., given that the Hermitian matrix and its expectation value are specified. Equation (A52) and the eigenvalues take a simpler form when a = b = 1/2 because, in this instance, φ̂ ∝ 1̂ and commutes universally, so it may be factored out of the exponential in (A44). ☐
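To make the numerical route concrete, here is a short sketch (an illustrative addition; the parameter values a, b, c_μ, and c below are arbitrary assumptions, not taken from the text). It solves the transcendental Equation (A52) for α with a bracketing root finder, exploiting the monotonicity proven above, and then reconstructs ρ̂ from (A44):

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import brentq

# Pauli matrices and the identity in the spin-z basis
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

# Illustrative values (assumptions): prior weights, operator coefficients, target c
a, b = 0.7, 0.3                      # prior phi = a|+><+| + b|-><-|
c1, cx, cy, cz = 0.1, 0.4, 0.2, 0.5
c_target = 0.3                       # the known expectation value c

log_phi = np.diag(np.log([a, b])).astype(complex)
A_op = c1 * I2 + cx * sx + cy * sy + cz * sz     # the operator c_mu sigma^mu

def expectation(alpha):
    """Tr(rho c_mu sigma^mu) for rho = exp(alpha c_mu sigma^mu + log phi) / Z, as in (A44)."""
    rho = expm(alpha * A_op + log_phi)
    rho /= np.trace(rho)
    return np.real(np.trace(rho @ A_op))

# Equation (A52) is monotonic in alpha, so a bracketing root finder suffices.
alpha = brentq(lambda al: expectation(al) - c_target, -50.0, 50.0)
rho = expm(alpha * A_op + log_phi)
rho /= np.trace(rho)

assert np.isclose(np.real(np.trace(rho)), 1.0)               # normalization
assert np.isclose(np.real(np.trace(rho @ A_op)), c_target)   # expectation constraint
```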

References

1. Shore, J.E.; Johnson, R.W. Axiomatic derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy. IEEE Trans. Inf. Theory 1980, 26, 26–37.
2. Shore, J.E.; Johnson, R.W. Properties of Cross-Entropy Minimization. IEEE Trans. Inf. Theory 1981, 27, 472–482.
3. Csiszár, I. Why least squares and maximum entropy: An axiomatic approach to inference for linear inverse problems. Ann. Stat. 1991, 19, 2032.
4. Skilling, J. The Axioms of Maximum Entropy. In Maximum-Entropy and Bayesian Methods in Science and Engineering; Erickson, G.J., Smith, C.R., Eds.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1988.
5. Skilling, J. Classic Maximum Entropy. In Maximum-Entropy and Bayesian Methods in Science and Engineering; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1988.
6. Skilling, J. Quantified Maximum Entropy. In Maximum-Entropy and Bayesian Methods in Science and Engineering; Fougére, P.F., Ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1990.
7. Caticha, A. Entropic Inference and the Foundations of Physics (Monograph Commissioned by the 11th Brazilian Meeting on Bayesian Statistics—EBEB-2012). Available online: http://www.albany.edu/physics/ACaticha-EIFP-book.pdf (accessed on 30 November 2017).
8. Hiai, F.; Petz, D. The Proper Formula for Relative Entropy and its Asymptotics in Quantum Probability. Commun. Math. Phys. 1991, 143, 99–114.
9. Petz, D. Characterization of the Relative Entropy of States of Matrix Algebras. Acta Math. Hung. 1992, 59, 449–455.
10. Ohya, M.; Petz, D. Quantum Entropy and Its Use; Springer: New York, NY, USA, 1993; ISBN 0-387-54881-5.
11. Wilming, H.; Gallego, R.; Eisert, J. Axiomatic Characterization of the Quantum Relative Entropy and Free Energy. Entropy 2017, 19, 241.
12. Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. 1957, 106, 620–630.
13. Jaynes, E.T. Probability Theory: The Logic of Science; Cambridge University Press: Cambridge, UK, 2003.
14. Jaynes, E.T. Information Theory and Statistical Mechanics II. Phys. Rev. 1957, 108, 171–190.
15. Balian, R.; Vénéroni, M. Incomplete descriptions, relevant information, and entropy production in collision processes. Ann. Phys. 1987, 174, 229–244.
16. Balian, R.; Balazs, N.L. Equiprobability, inference and entropy in quantum theory. Ann. Phys. 1987, 179, 97–144.
17. Balian, R. Justification of the Maximum Entropy Criterion in Quantum Mechanics. In Maximum Entropy and Bayesian Methods; Skilling, J., Ed.; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1989; pp. 123–129.
18. Balian, R. On the principles of quantum mechanics. Am. J. Phys. 1989, 57, 1019–1027.
19. Balian, R. Gain of information in a quantum measurement. Eur. J. Phys. 1989, 10, 208–213.
20. Balian, R. Incomplete descriptions and relevant entropies. Am. J. Phys. 1999, 67, 1078–1090.
21. Blankenbecler, R.; Partovi, H. Uncertainty, Entropy, and the Statistical Mechanics of Microscopic Systems. Phys. Rev. Lett. 1985, 54, 373–376.
22. Blankenbecler, R.; Partovi, H. Quantum Density Matrix and Entropic Uncertainty. In Proceedings of the Fifth Workshop on Maximum Entropy and Bayesian Methods in Applied Statistics, Laramie, WY, USA, 5–8 August 1985.
23. Von Neumann, J. Mathematische Grundlagen der Quantenmechanik; Springer: Berlin, Germany, 1932; English Translation: Mathematical Foundations of Quantum Mechanics; Princeton University Press: Princeton, NJ, USA, 1983.
24. Ali, S.A.; Cafaro, C.; Giffin, A.; Lupo, C.; Mancini, S. On a Differential Geometric Viewpoint of Jaynes’ Maxent Method and its Quantum Extension. AIP Conf. Proc. 2012, 1443, 120–128.
25. Caticha, A. Entropic Dynamics: Quantum Mechanics from Entropy and Information Geometry. Available online: https://arxiv.org/abs/1711.02538 (accessed on 30 November 2017).
26. Reginatto, M.; Hall, M.J.W. Quantum-classical interactions and measurement: A consistent description using statistical ensembles on configuration space. J. Phys. Conf. Ser. 2009, 174, 012038.
27. Reginatto, M.; Hall, M.J.W. Information geometry, dynamics and discrete quantum mechanics. AIP Conf. Proc. 2013, 1553, 246–253.
28. Caves, C.; Fuchs, C.; Schack, R. Quantum probabilities as Bayesian probabilities. Phys. Rev. A 2002, 65, 022305.
29. Vanslette, K. The Quantum Bayes Rule and Generalizations from the Quantum Maximum Entropy Method. Available online: https://arxiv.org/abs/1710.10949 (accessed on 30 November 2017).
30. Schack, R.; Brun, T.; Caves, C. Quantum Bayes rule. Phys. Rev. A 2001, 64, 014305.
31. Korotkov, A. Continuous quantum measurement of a double dot. Phys. Rev. B 1999, 60, 5737–5742.
32. Korotkov, A. Selective quantum evolution of a qubit state due to continuous measurement. Phys. Rev. B 2000, 63, 115403.
33. Jordan, A.; Korotkov, A. Qubit feedback and control with kicked quantum nondemolition measurements: A quantum Bayesian analysis. Phys. Rev. B 2006, 74, 085307.
34. Hellmann, F.; Kamiński, W.; Kostecki, P. Quantum collapse rules from the maximum relative entropy principle. New J. Phys. 2016, 18, 013022.
35. Warmuth, M. A Bayes Rule for Density Matrices. In Advances in Neural Information Processing Systems 18, Proceedings of the Neural Information Processing Systems Conference, Montréal, QC, Canada, 7–12 December 2005; Neural Information Processing Systems Foundation, Inc.: La Jolla, CA, USA, 2015.
36. Warmuth, M.; Kuzmin, D. A Bayesian Probability Calculus for Density Matrices. Mach. Learn. 2010, 78, 63–101.
37. Tsuda, K. Machine learning with quantum relative entropy. J. Phys. Conf. Ser. 2009, 143, 012021.
38. Giffin, A.; Caticha, A. Updating Probabilities. Presented at the 26th International Workshop on Bayesian Inference and Maximum Entropy Methods (MaxEnt 2006), Paris, France, 8–13 July 2006.
39. Wang, Z.; Busemeyer, J.; Atmanspacher, H.; Pothos, E. The Potential of Using Quantum Theory to Build Models of Cognition. Top. Cogn. Sci. 2013, 5, 672–688.
40. Giffin, A. Maximum Entropy: The Universal Method for Inference. Ph.D. Thesis, University at Albany (SUNY), Albany, NY, USA, 2008.
41. Caticha, A. Toward an Informational Pragmatic Realism. Minds Mach. 2014, 24, 37–70.
42. Umegaki, H. Conditional expectation in an operator algebra, IV (entropy and information). Kōdai Math. Sem. Rep. 1962, 14, 59–85.
43. Uhlmann, A. Relative entropy and the Wigner-Yanase-Dyson-Lieb concavity in an interpolation theory. Commun. Math. Phys. 1977, 54, 21–32.
44. Schumacher, B.; Westmoreland, M. Relative entropy in quantum information theory. In Proceedings of the AMS Special Session on Quantum Information and Computation, Washington, DC, USA, 19–21 January 2000.
45. Suzuki, M. On the Convergence of Exponential Operators—The Zassenhaus Formula, BCH Formula and Systematic Approximants. Commun. Math. Phys. 1977, 57, 193–200.
46. Horn, A. Eigenvalues of sums of Hermitian matrices. Pac. J. Math. 1962, 12, 225–241.
47. Bhatia, R. Linear Algebra to Quantum Cohomology: The Story of Alfred Horn’s Inequalities. Am. Math. Mon. 2001, 108, 289–318.
48. Knutson, A.; Tao, T. Honeycombs and Sums of Hermitian Matrices. Not. AMS 2001, 48, 175–186.
49. Aczél, J. Lectures on Functional Equations and Their Applications; Academic Press Inc.: New York, NY, USA, 1966; Volume 19, pp. 31–44, 141–145, 213–217, 301–302, 347–349.
50. Darboux, G. Sur le théorème fondamental de la géométrie projective. Math. Ann. 1880, 17, 55–61.
