Article

Born’s Rule from Contextual Relative-Entropy Minimization

College of Engineering, University of Connecticut, Unit 3037, 261 Glenbrook Rd., Storrs, CT 06269, USA
Entropy 2025, 27(9), 898; https://doi.org/10.3390/e27090898
Submission received: 27 July 2025 / Revised: 21 August 2025 / Accepted: 23 August 2025 / Published: 25 August 2025
(This article belongs to the Special Issue Quantum Foundations: 100 Years of Born’s Rule)

Abstract

We give a variational characterization of the Born rule. For each measurement context, we project a quantum state $\rho$ onto the corresponding abelian algebra by minimizing Umegaki relative entropy; Petz’s Pythagorean identity makes the dephased state the unique local minimizer, so the Born weights $p_C(i) = \operatorname{Tr}(\rho P_i)$ arise as a consequence, not an assumption. Globally, we measure contextuality by the minimum classical Kullback–Leibler distance from the bundle $\{p_C(\rho)\}$ to the noncontextual polytope, yielding a convex objective $\Phi(\rho)$. Thus, $\Phi(\rho) = 0$ exactly when a sheaf-theoretic global section exists (noncontextuality), and $\Phi(\rho) > 0$ otherwise; the closest noncontextual model is the classical I-projection of the Born bundle. Assuming finite dimension, full-rank states, and rank-1 projective contexts, the construction is unique and non-circular; it extends to degenerate PVMs and POVMs (via Naimark dilation) without change to the statements. Conceptually, the work unifies information-geometric projection, the presheaf view of contextuality, and categorical classical structure into a single optimization principle. Compared with Gleason-type, decision-theoretic, or envariance approaches, our scope is narrower but more explicit about contextuality and the relational, context-dependent status of quantum probabilities.

1. Introduction

Modern quantum theory still rests on an empirical prescription, the Born rule, that converts the formal wave function into concrete outcome frequencies. Nearly a century after Born’s original proposal, the rule remains the last standing axiom that resists unanimous reduction to deeper principles [1,2]. In this paper, we attempt to close that gap by showing that the Born probabilities arise uniquely from a single variational requirement: minimize the information-geometric distance to the non-contextual polytope across all measurement contexts. This derivation weaves together three previously disparate strands: (i) Umegaki–Petz relative entropy projections inside each maximal abelian sub-algebra, (ii) the sheaf-cohomological obstruction that defines contextuality, and (iii) a relational, observer-relative ontology. The result elevates the Born rule from an axiom to the least-disturbance bridge between incompatible classical standpoints, thereby reconciling quantum probability with the demands of contextuality, relativity of states, and categorical naturality.
Max Born’s 1926 insight [1] that $|\psi|^2$ yields statistical weights created an operational rule that Dirac soon canonized in the Principles [3]. Ever since, theorists have sought a derivation from first principles. Gleason’s measure-theoretic theorem secures the trace form on Hilbert spaces of dimension $\geq 3$, but only by postulating a non-contextual frame function that is itself stronger than what any single measurement requires [4]. The Kochen–Specker theorem later showed that such a globally non-contextual frame cannot exist at all [5,6]. Alternative programmes invoke special physical assumptions: Zurek’s envariance symmetry recovers equal-amplitude cases yet still needs continuity to reach arbitrary moduli [7,8,9]; the Deutsch–Wallace decision-theoretic route derives quantum credences from rational preferences inside the Everett picture [10,11,12]; Hartle’s frequency-operator spectra tie probabilities to infinite repetition limits [13]; Bayesian reconstructions exploit exchangeability in quantum de Finetti theorems [14]; Busch’s POVM-Gleason generalisation closes the qubit loophole by enlarging the effect space [15]; and operational reconstructions à la Hardy and Chiribella–D’Ariano–Perinotti start from abstract information-processing axioms [16,17]. Each story illuminates part of the landscape, yet all smuggle in extra structure (continuity, rationality, purification, or non-contextuality) whose physical inevitability remains debated.
The modern moral is that quantum probability is intrinsically contextual [6,18]. Sheaf theory makes this precise by treating a “measurement scenario” as a cover of contexts and identifying contextuality with the obstruction to a global section [19]. Cohomological refinements classify the obstruction and reveal hierarchies of contextual strength [20,21,22], while information-theoretic measures such as the relative-entropy of contextuality supply quantitative monotones [23]. Our work adopts this viewpoint wholesale: the non-contextual polytope is the reference body, and “distance” from it is the resource cost of contextuality.
Relative entropy furnishes our notion of statistical deviation. Umegaki introduced the quantum version in 1962 [24]; it is jointly convex, strictly convex in its first argument, and monotone under CPTP maps (data processing) [25]. Petz showed that conditional expectation onto a von Neumann subalgebra satisfies a Pythagorean identity, which implies that dephasing onto a context (MASA) uniquely minimizes $S(\rho\,\|\,\sigma)$ over classical states $\sigma$ in that context [26]. On the classical side, Csiszár characterized KL minimizers as I-projections under linear expectation constraints [27]. We use exactly these two facts: (i) per-context dephasing as the unique quantum projection, and (ii) exponential-family solutions for linear constraints.
Our construction is relational in Rovelli’s sense: states are attributes of interactions, not intrinsic properties [28,29,30]. In categorical quantum mechanics, each context corresponds to a special commutative dagger Frobenius algebra (SCFA) on H; its copy/delete maps ( δ , ε ) implement classical data, and the associated decoherence (dephasing) idempotent realizes the passage from quantum to classical within that context [31,32]. In the completely positive maps (CPM) semantics, probabilities are scalars obtained by composing states with effects; for SCFAs in FHilb these scalars are precisely the Born weights Tr ( ρ P i ) . Our variational principle respects this naturality: the local information projection is exactly the SCFA-induced dephasing, and the global step chooses the least-informative joint consistent with the noncontextual constraints. Therefore, the rule selected by the category is the same one singled out by the optimization—Born’s rule—in harmony with relational/process-theoretic accounts [28].
Building on the preliminaries, we proceed in four steps. (1) We quantify contextuality by the contextual divergence $\Phi(\rho)$: the minimum (weighted) classical KL distance from the empirical bundle $p_{\mathcal{M}}(\rho) = \{p_C(\rho)\}_{C \in \mathcal{M}}$ to the noncontextual set $\mathrm{NC}$. (2) For each context $C$ (a MASA; see the glossary), Petz’s Pythagorean identity implies that the information projection of $\rho$ onto the classical face $\mathcal{S}(C)$ is the dephasing $E_C(\rho)$, which fixes the Born weights $p_C(i) = \operatorname{Tr}(\rho P_i)$ without assuming them. (3) Any globally consistent assignment that is locally entropy-optimal in every context must therefore use these Born weights; the global problem reduces to the classical I-projection of the Born bundle onto $\mathrm{NC}$. When the cover is contextual, the minimizer deviates minimally from Born in order to satisfy global consistency, and $\Phi(\rho) > 0$. (4) We interpret the resulting probabilities as the least-informative classical summaries compatible with the relational structure of the cover, linking the sheaf-theoretic obstruction to an explicit optimization principle.
Appendix A extends the analysis to degenerate projection-valued measures (PVMs) and general positive operator-valued measures (POVMs) via Naimark dilation, providing the corresponding exponential-family (quantum Jeffrey) updates and showing that all statements are dilation-stable (Lüders appears only in the projective case). Appendix B gathers the convex-optimization machinery: the existence of a minimizer, a full Karush–Kuhn–Tucker (KKT) analysis that handles zero-probability coordinates, and the uniqueness of the optimal marginals. Under informational completeness and noncontextuality, these marginals identify the generating state $\rho$ uniquely.
In short, the Born trace rule is not postulated but forced by a two-stage variational principle: local least-disturbance (dephasing) plus a global I-projection onto NC . Contextuality is quantified by Φ ( ρ ) —the minimal information cost of fitting a single classical narrative—and the relational stance is built into the very form of the optimization. No other assignment simultaneously minimizes information loss and remains as classical as the contextual fabric of the scenario allows.
For readability we define all acronyms at first use and provide a consolidated ‘Abbreviations and Symbols’ table before the appendices.

2. Mathematical Preliminaries

2.1. Hilbert Space, Contexts, and Empirical Models

We work with a finite-dimensional Hilbert space $\mathcal{H}$ and fix once and for all a cover $\mathcal{M}$ of measurement contexts, with each context $C$ being a maximal abelian subalgebra of $\mathcal{B}(\mathcal{H})$, i.e., a commuting set of projectors $\{P_i^C\}$ summing to the identity. Crucially, $\mathcal{M}$ is chosen independently of any state, so we do not “tailor” contexts to $\rho$.
For a state $\rho$, each context $C$ yields an empirical distribution $p_C(i;\rho) = F(\rho, P_i^C, C)$, assigning to outcome $P_i^C$ the probability $p_C(i;\rho)$. In orthodox quantum theory $F(\rho, P_i^C, C) = \operatorname{Tr}(\rho P_i^C)$, but here we do not assume the Born rule. Rather, we collect all the assignments $C \mapsto p_C(\cdot\,;\rho)$ into a presheaf of distributions over $\mathcal{M}$: whenever two contexts $C$ and $C'$ overlap, $p_C$ and $p_{C'}$ restrict to the same marginal on $C \cap C'$. Our goal is to show, via a simple variational principle, that the only way these context-wise shadows can consistently arise from a single density operator is if $F$ collapses to the familiar trace form.
A state $\rho$ is noncontextual for the cover $\mathcal{M}$ if there exists a single joint distribution $g$ over all outcomes $X = \bigcup \mathcal{M}$ whose marginal on each context $C$ agrees with $p_C(\cdot\,;\rho)$ [33]. If no such $g$ exists, the empirical model is contextual, reflecting the Kochen–Specker obstruction to a hidden-variable assignment consistent across all $C$ [6]. In sheaf-cohomological language, contextuality can be witnessed by a nontrivial class in the first Čech cohomology $\check{H}^1(\mathcal{M}, \mathcal{F})$: the local $p_C$ form a 1-cocycle that fails to glue into a global section [19,21]. A nonvanishing class rules out a global classical model of the data; as discussed in Section 3.2.3, a vanishing class does not by itself guarantee one.
Noncontextual behaviors form a convex polytope $\mathrm{NC} \subseteq \prod_{C \in \mathcal{M}} \Delta_C$; namely, all families $\{g_C\}$ admitting a global joint $g$ on $X$ with marginals $g_C$ [33]. Equivalently, $\mathrm{NC}$ is the convex hull of deterministic value assignments. In $d \geq 3$, most quantum empirical models lie outside $\mathrm{NC}$, while for qubits one needs a Kochen–Specker configuration to see contextuality [6]. Rather than seek an exact (and generally impossible) global section, we will measure contextuality by the distance of $\{p_C\}$ from $\mathrm{NC}$.

2.2. Umegaki Relative Entropy as Divergence Measure

To gauge how far a quantum empirical model $\{p_C\}$ lies outside the noncontextual polytope $\mathrm{NC}$, we employ the Umegaki–Petz relative entropy [24]. For two states $\rho, \sigma$ on $\mathcal{H}$,
$$ S(\rho\,\|\,\sigma) = \operatorname{Tr}\,\rho\,(\ln\rho - \ln\sigma), $$
defined whenever $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$ (and $+\infty$ otherwise). As the quantum analogue of classical KL, $S(\rho\,\|\,\sigma) \geq 0$ with equality iff $\rho = \sigma$ on the stated support domain. It is strictly convex in its first argument, $\rho \mapsto S(\rho\,\|\,\sigma)$, and jointly convex in $(\rho, \sigma)$; it obeys the data-processing inequality for every completely positive trace-preserving (CPTP) map $\Lambda$ (abbreviations are collected in the table before the appendices),
$$ S\big(\Lambda(\rho)\,\|\,\Lambda(\sigma)\big) \leq S(\rho\,\|\,\sigma), $$
so coarse-grainings never increase the divergence. It is unitarily invariant under simultaneous conjugation, $S(U\rho U^\dagger\,\|\,U\sigma U^\dagger) = S(\rho\,\|\,\sigma)$, but in general does not depend only on the spectra of $\rho$ and $\sigma$ unless they commute [34]. Though not a metric, its lower semicontinuity, convexity, and monotonicity under coarse-graining make it the canonical divergence for projecting quantum states onto commutative (classical) models; in those settings the objective reduces to a classical KL and, under full-support conditions, yields a unique minimizer.
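As a numerical illustration of the definition and of data processing, the following minimal sketch evaluates the relative entropy for two random full-rank states and compares it before and after a computational-basis dephasing, which is one particular CPTP map. The helper names (`random_density_matrix`, `herm_log`, `umegaki`, `dephase_rank1`) and the random test states are assumptions introduced here for illustration; they do not come from the paper.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix (hypothetical helper for this sketch)."""
    a = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = a @ a.conj().T + 1e-3 * np.eye(dim)
    return rho / np.trace(rho).real

def herm_log(x):
    """Natural log of a positive-definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * np.log(w)) @ v.conj().T

def umegaki(rho, sigma):
    """S(rho || sigma) = Tr rho (ln rho - ln sigma); assumes both states are full rank."""
    return np.trace(rho @ (herm_log(rho) - herm_log(sigma))).real

def dephase_rank1(x):
    """Dephasing in the computational basis: keep only the diagonal."""
    return np.diag(np.diag(x))

rng = np.random.default_rng(0)
rho = random_density_matrix(3, rng)
sigma = random_density_matrix(3, rng)

s_before = umegaki(rho, sigma)
s_after = umegaki(dephase_rank1(rho), dephase_rank1(sigma))
print(f"S(rho||sigma) = {s_before:.6f}, after dephasing both = {s_after:.6f}")
```

The eigendecomposition route avoids a dependency on a matrix-logarithm routine, at the cost of requiring strictly positive eigenvalues; states with a nontrivial kernel would need the support condition handled explicitly.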
Two features of the Umegaki–Petz relative entropy make it ideal for our variational framework. First, $S(\tau\,\|\,\rho)$ is strictly convex in its first argument and jointly convex in $(\tau, \rho)$. Hence (a) in the POVM/overlap problems where we minimize $S(\tau\,\|\,\rho)$ over $\tau$ subject to linear expectation constraints, the optimizer $\tau^\star$ is unique; and (b) in the per-context projection $\min_{\sigma \in \mathcal{S}(C)} S(\rho\,\|\,\sigma)$, Petz’s identity reduces the objective to a classical KL divergence $D_{\mathrm{KL}}\big(p_C(\rho)\,\|\,q\big)$ (with $q$ the diagonal of $\sigma$), which is strictly convex in $q$, yielding the unique solution $E_C(\rho)$ whenever $p_C(\rho)$ has full support [25,26]. See Appendix A for the POVM/degenerate case via Naimark dilation and exponential-family updates. Second, we use Petz’s Pythagorean identity for the conditional expectation (dephasing) onto $C$,
$$ S(\rho\,\|\,\sigma) = S\big(\rho\,\|\,E_C(\rho)\big) + S\big(E_C(\rho)\,\|\,\sigma\big) \qquad (\sigma \in \mathcal{S}(C)), $$
which splits the divergence into a context-dependent term plus a purely classical term. This decouples the optimization across contexts and then lets us glue the optimal marginals consistently.

2.3. Sheaf-Theoretic View of Noncontextuality and Divergence

In a fixed measurement scenario $(X, \mathcal{M})$, let $X$ be a set of rank-1 projectors and $\mathcal{M}$ a cover by contexts $C \subseteq X$ (each a maximal commuting set), with outcomes $O_x = \{0, 1\}$. A state $\rho$ defines an empirical presheaf $C \mapsto p_C(\cdot\,;\rho)$ whose marginals agree on every overlap $C \cap C'$. Categorically, $\{p_C\}$ is a Čech 1-cocycle, and a global section $g$ with $p_C = g|_C$ for all $C$ exhibits $\rho$ as noncontextual, in which case the cocycle is a coboundary. Conversely, a nonzero class in $\check{H}^1(\mathcal{M}, \mathcal{F})$ certifies that no such $g$ exists, i.e., contextuality [21].
We introduce a quantitative measure of contextuality using the divergence defined above. Recall that the Umegaki relative entropy $S(\rho\,\|\,\sigma)$ is defined when $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$ (and is $+\infty$ otherwise); it is strictly convex in its first argument and jointly convex in $(\rho, \sigma)$, obeys data processing $S(\Lambda(\rho)\,\|\,\Lambda(\sigma)) \leq S(\rho\,\|\,\sigma)$ for all CPTP $\Lambda$, and is unitarily invariant under simultaneous conjugation, though not determined solely by the spectra unless $\rho$ and $\sigma$ commute. Intuitively, we ask: “How much must one alter $\rho$’s empirical model to make it noncontextual?” This leads to the contextual divergence $\Phi(\rho)$, defined as the minimal information divergence between the quantum model and any noncontextual model in $\mathrm{NC}$. Formally, let $p_{\mathcal{M}}(\rho) = \{p_C(\cdot\,;\rho)\}_{C \in \mathcal{M}}$ denote the full bundle of contextual distributions for $\rho$. We define
$$ \Phi(\rho) = \min_{g \in \mathrm{NC}} S\big(p_{\mathcal{M}}(\rho)\,\|\,g\big), $$
where $S\big(p_{\mathcal{M}}(\rho)\,\|\,g\big)$ denotes the aggregate classical divergence, for example,
$$ \sum_{C \in \mathcal{M}} \mu_C\, D_{\mathrm{KL}}\big(p_C(\rho)\,\|\,g_C\big), $$
tagging each outcome by its context to avoid double counting. We assume $\mathrm{NC} \neq \emptyset$ (otherwise interpret $\Phi(\rho) = +\infty$ or use an $\varepsilon$-noisy relaxation).
Our derivation will enforce the principle that $\Phi(\rho)$ be minimized. In other words, we seek an assignment of probabilities to measurement outcomes that makes a given state $\rho$ as nearly noncontextual as possible. Subject to the usual constraints of quantum probabilities, such as normalization, positivity, and the functional relations imposed by projectors, we will find that this variational principle singles out a unique assignment, one that turns out to coincide with the Born rule. Crucially, this conclusion will emerge without ever assuming the Born rule in advance, as we have treated $p_C(i;\rho)$ abstractly so far. Rather, the trace form $p_C(i) = \operatorname{Tr}(\rho P_i)$ will appear as a consequence of minimizing information divergence under the structural constraints of locality and global consistency.

2.4. Categorical Framework and Classical Structures

Before the analytical proof, we recast the problem in categorical quantum mechanics [35], which makes explicit the structural ingredients—quantum states, measurement contexts, and probabilistic outcomes. We model our system as an object A in a dagger-compact symmetric monoidal category C , with processes as morphisms. We assume C supports abstract states, effects, and—for each measurement context C—a commutative †-Frobenius algebra on A that encodes the classical copy-and-delete structure for that basis.

2.4.1. Commutative Frobenius Algebra

A special commutative Frobenius algebra on an object $A$ consists of
$$ m : A \otimes A \to A, \qquad u : I \to A, \qquad \delta : A \to A \otimes A, \qquad \epsilon : A \to I, $$
satisfying the usual Frobenius and unit laws. Intuitively, $\delta$ duplicates and $\epsilon$ discards classical data in $A$. Commutativity means $m \circ \tau = m$, where $\tau$ is the swap on $A \otimes A$ (inputs unordered), and the “special” condition $m \circ \delta = \mathrm{id}_A$ ensures copying then merging returns the original.
In $\mathbf{FHilb}$, each orthonormal basis $\{|i\rangle\}$ of $\mathcal{H}$ yields such an algebra. The basis vectors arise as the unique comonoid homomorphisms (classical points)
$$ \delta_i : I \to A, \qquad \delta_i(1) = |i\rangle, $$
and their adjoints $\delta_i^\dagger : A \to I$ are the corresponding effects. Concretely, on basis vectors
$$ \delta(|i\rangle) = |i\rangle \otimes |i\rangle, \qquad \epsilon(|i\rangle) = 1, $$
extended linearly, while $m(|i\rangle \otimes |j\rangle) = \delta_{ij}\,|i\rangle$; one convenient (unnormalized) choice of unit is $u(1) = \sum_i |i\rangle$.
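The basis-induced algebra can be checked concretely with small matrices. The sketch below builds $\delta$, $m$, $\epsilon$, $u$ and the swap for the computational basis of a three-dimensional space and tests the special, commutative, unit, counit, and Frobenius laws numerically. The dimension, the matrix encoding (index $i \cdot d + j$ for $|i\rangle \otimes |j\rangle$), and the variable names are choices made here for illustration.

```python
import numpy as np

d = 3
I = np.eye(d)

# Copy map delta : A -> A (x) A, |i> -> |i>|i>, stored as a (d*d, d) matrix.
delta = np.zeros((d * d, d))
for i in range(d):
    delta[i * d + i, i] = 1.0

m = delta.T              # merge map m = delta^dagger : A (x) A -> A
eps = np.ones((1, d))    # counit eps : A -> I, |i> -> 1
u = eps.T                # unit u = eps^dagger : I -> A, 1 -> sum_i |i>

# Swap tau : |i>|j> -> |j>|i>
tau = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        tau[j * d + i, i * d + j] = 1.0

special = np.allclose(m @ delta, I)                   # m . delta = id_A
commutative = np.allclose(m @ tau, m)                 # m . tau = m
unit_law = np.allclose(m @ np.kron(u, I), I)          # m . (u (x) id) = id_A
counit_law = np.allclose(np.kron(eps, I) @ delta, I)  # (eps (x) id) . delta = id_A
frobenius = (
    np.allclose(np.kron(m, I) @ np.kron(I, delta), delta @ m)
    and np.allclose(np.kron(I, m) @ np.kron(delta, I), delta @ m)
)
print(special, commutative, unit_law, counit_law, frobenius)
```

With this encoding the classical points are the columns of the identity, and each is copied by `delta` to the corresponding product vector, which is the comonoid-homomorphism property stated above.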

2.4.2. States and Effects as Morphisms

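A pure state is the morphism $|\psi\rangle : I \to A$, sending $1$ to $|\psi\rangle$. In CPM one represents it instead as the density-operator morphism
$$ \rho = |\psi\rangle\langle\psi| : I \to A. $$
Each classical point $\delta_i$ induces a projector $P_i = \delta_i \circ \delta_i^\dagger = |i\rangle\langle i|$, and the Born probability is obtained by composing with $\rho$:
$$ I \xrightarrow{\ \rho\ } A \xrightarrow{\ P_i\ } I \;=\; \operatorname{Tr}(P_i \rho) \;=\; |\langle i|\psi\rangle|^2. $$
Equivalently, one may insert the bra morphism explicitly:
$$ I \xrightarrow{\ |\psi\rangle\ } A \xrightarrow{\ P_i\ } A \xrightarrow{\ \langle\psi|\ } I \;=\; \langle\psi|P_i|\psi\rangle \;=\; |\langle i|\psi\rangle|^2. $$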
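In the concrete Hilbert-space model, the two compositions above are ordinary matrix products. The following sketch, with a randomly drawn pure state and the computational basis as the assumed context, checks that the trace form, the sandwich form, and the squared amplitude agree; the seed and dimension are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

# Random normalized pure state |psi> and its density operator rho = |psi><psi|
psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

basis = np.eye(d, dtype=complex)  # rows are the basis vectors |i>
projectors = [np.outer(basis[i], basis[i].conj()) for i in range(d)]

weights_trace = np.array([np.trace(P @ rho).real for P in projectors])         # Tr(P_i rho)
weights_sandwich = np.array([(psi.conj() @ P @ psi).real for P in projectors]) # <psi|P_i|psi>
weights_amplitude = np.abs(basis.conj() @ psi) ** 2                            # |<i|psi>|^2

print(weights_trace)
```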

2.4.3. Unified Effect

Define for context $C$ the effect
$$ !_i^C = \delta_i^\dagger \circ P_i^C : A \to I. $$
Then for a state $\rho$ (pure or mixed),
$$ \Pr(i \mid \rho, C) = \;!_i^C \circ \rho : I \to I, $$
which in $\mathbf{FHilb}$ evaluates to $\langle\psi|P_i^C|\psi\rangle = \operatorname{Tr}(P_i^C \rho)$, recovering the Born rule.
Axiomatic scope: In this work we assume from the outset that our ambient category is dagger-compact, or equivalently that each †-SCFA carries a faithful Frobenius trace. All subsequent KL-minimisation and Born-rule emergence rest on that dagger/trace structure; no further inner-product or Gleason-type postulate is invoked.
Crucially, one can show from the Frobenius-algebra axioms (copying, deleting, and monoidal composition) that this is the only way to produce a well-defined real scalar from a state–outcome pair. Hence, once a classical context structure is assumed and probabilities are required to be scalar morphisms in a monoidal category, the usual Born rule is forced: compatibility with classical structures and functoriality uniquely picks out the Hilbert-space trace as the probability assignment.

2.4.4. Categorical Consistency Check

In FHilb , each context carries a commutative †-Frobenius algebra that supplies copy/delete for classical data. Composing a state with the effect and the counit yields a scalar that, in FHilb , evaluates to Tr ( ρ P i ) . Thus, the variationally selected weights coincide with the scalar produced by the standard categorical interface. We use this as a consistency check rather than an independent derivation.
In summary, the categorical formulation assures us that nothing mysterious is hiding in our choice of measurement contexts: each context C supplies a classical interface (copy/delete operations) through which quantum states produce scalar outcomes. The Born rule appears as the inevitable scalar morphism arising from composing a state with a context’s effect and the counit (discard) map. This provides a high-level consistency check for our approach: any variational or information-theoretic argument we make in the Hilbert-space formalism will align with the fundamental categorical structure that already encapsulates the Born rule. In particular, it means that if our optimization principle selects a unique candidate for p C ( i ; ρ ) , that candidate must correspond to Tr ( ρ P i C ) in the concrete model—otherwise it would contradict the established classical interface of FHilb . With this assurance, we now proceed to the core of the argument: identifying the optimal local classical approximations and understanding how (and whether) they can be “glued” into a global noncontextual model.

3. Quantifying Contextuality Locally and Globally

This section turns contextuality from a logical obstruction into a quantitative optimization problem and extracts the Born weights from a local variational principle. Locally (Section 3.1), we show that for each context $C$ the Umegaki–Petz projection sends $\rho$ to its dephasing $E_C(\rho)$, thereby fixing the Born probabilities $p_C(i) = \operatorname{Tr}(\rho P_i)$ without assuming them. Globally (Section 3.2 and Section 3.3), we compare the Born bundle $\{p_C(\rho)\}_{C \in \mathcal{M}}$ to the noncontextual set $\mathrm{NC}$ and define the contextual divergence $\Phi(\rho)$, which vanishes exactly on noncontextual models and otherwise quantifies the minimal information cost of enforcing a single classical narrative. We then prove a two-stage theorem: the only locally entropy-optimal assignments are the Born weights, and the best global fit is the classical I-projection of the Born bundle onto $\mathrm{NC}$. Existence/uniqueness (and handling zero-probability entries) are deferred to Appendix B; POVM/degenerate contexts follow by Naimark dilation in Appendix A.

3.1. Optimal Classical Approximations in a Single Context

3.1.1. Setup and Notation

Fix a context $C \in \mathcal{M}$ given by a projective measurement $\{P_i : i \in I_C\}$ on a finite-dimensional Hilbert space $\mathcal{H}$, with $P_i P_j = \delta_{ij} P_i$ and $\sum_i P_i = \mathbb{1}$. Let $r_i := \operatorname{rank}(P_i)$. We write $\pi_i := P_i / r_i$ (the maximally mixed state on the $i$th outcome subspace). The classical states on context $C$ form the convex set
$$ \mathcal{S}(C) := \Big\{ \sigma \in \mathcal{D}(\mathcal{H}) : \sigma = \sum_{i \in I_C} q_i\, \pi_i \ \text{for some probability vector } q = (q_i)_i \Big\}. $$
For rank-1 contexts ($r_i = 1$ for all $i$), this reduces to $\mathcal{S}(C) = \{ \sigma = \sum_i q_i P_i \}$.

3.1.2. Conditional Expectation (Dephasing) onto C

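Let $E_C : \mathcal{B}(\mathcal{H}) \to \operatorname{span}\{P_i\}$ be the trace-preserving conditional expectation onto the abelian von Neumann subalgebra generated by $\{P_i\}$ (the “dephasing” onto $C$). Concretely,
$$ E_C(X) = \sum_{i \in I_C} \frac{\operatorname{Tr}(P_i X)}{r_i}\, P_i = \sum_{i \in I_C} \operatorname{Tr}(X \pi_i)\, P_i \qquad (X \in \mathcal{B}(\mathcal{H})). \tag{3} $$
This map satisfies $E_C(\sigma) = \sigma$ for all $\sigma \in \mathcal{S}(C)$ and
$$ \operatorname{Tr}\big(E_C(X)\, A\big) = \operatorname{Tr}(X A) \qquad \text{for all } A \in \operatorname{span}\{P_i\}. \tag{4} $$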
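A direct implementation of the dephasing map makes these properties easy to check numerically, including for degenerate outcomes. The sketch below assumes a hypothetical context on $\mathbb{C}^3$ with one rank-2 and one rank-1 projector and a random test state; the function name `dephase` is introduced here and is not from the paper.

```python
import numpy as np

def dephase(x, projectors):
    """E_C(X) = sum_i Tr(P_i X) / r_i * P_i, with r_i = rank(P_i) = Tr(P_i)."""
    out = np.zeros_like(x, dtype=complex)
    for p in projectors:
        r = np.trace(p).real
        out += (np.trace(p @ x) / r) * p
    return out

# Assumed degenerate context on C^3: P_0 has rank 2, P_1 has rank 1.
P0 = np.diag([1.0, 1.0, 0.0]).astype(complex)
P1 = np.diag([0.0, 0.0, 1.0]).astype(complex)
projectors = [P0, P1]

# Random density matrix used only as test input.
rng = np.random.default_rng(2)
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = a @ a.conj().T
rho /= np.trace(rho).real

e_rho = dephase(rho, projectors)

# Outcome weights before and after dephasing (they should coincide).
weights_rho = [np.trace(rho @ p).real for p in projectors]
weights_e = [np.trace(e_rho @ p).real for p in projectors]
print(weights_rho, weights_e)
```

Applying `dephase` twice returns the same operator, which reflects that the map is an idempotent projection onto the span of the context's projectors.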

3.1.3. The Pythagorean Identity (Petz)

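Write the Umegaki relative entropy
$$ S(\rho\,\|\,\sigma) := \operatorname{Tr}\,\rho\,(\log\rho - \log\sigma) $$
whenever $\operatorname{supp}(\rho) \subseteq \operatorname{supp}(\sigma)$ (and $+\infty$ otherwise). Petz’s decomposition theorem gives the Pythagorean identity for the conditional expectation $E_C$ (see, e.g., [26]):
$$ S(\rho\,\|\,\sigma) = S\big(\rho\,\|\,E_C(\rho)\big) + S\big(E_C(\rho)\,\|\,\sigma\big) \qquad \text{for every } \sigma \in \mathcal{S}(C). \tag{5} $$
The first term is independent of $\sigma$.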

3.1.4. Consequences of Equation (5)

Equation (5) immediately yields that the unique minimizer of $S(\rho\,\|\,\sigma)$ over $\sigma \in \mathcal{S}(C)$ is $E_C(\rho)$, provided the usual interior (full-support) condition holds in this context (stated below). Moreover, since $E_C(\rho) \in \mathcal{S}(C)$ and every $\sigma \in \mathcal{S}(C)$ commutes with $E_C(\rho)$, the second term is classical:
$$ S\big(E_C(\rho)\,\|\,\sigma\big) = D_{\mathrm{KL}}(p\,\|\,q), $$
where
$$ p_i := \operatorname{Tr}\big(E_C(\rho)\, P_i\big) = \operatorname{Tr}(\rho P_i), \qquad q_i := \operatorname{Tr}(\sigma P_i), $$
and $D_{\mathrm{KL}}(p\,\|\,q) = \sum_i p_i (\log p_i - \log q_i)$ is the classical Kullback–Leibler divergence. The equality $p_i = \operatorname{Tr}(\rho P_i)$ follows from Equation (4) with $A = P_i$; it is thus a consequence of the variational setup rather than an assumption.
Proposition 1
(Optimal classical state in a context). For any density operator $\rho$ and any context $C$ with projectors $\{P_i\}$,
$$ \arg\min_{\sigma \in \mathcal{S}(C)} S(\rho\,\|\,\sigma) = E_C(\rho) = \sum_{i \in I_C} \frac{\operatorname{Tr}(\rho P_i)}{r_i}\, P_i = \sum_{i \in I_C} p_C(i;\rho)\, \pi_i, \tag{6} $$
where $p_C(i;\rho) := \operatorname{Tr}(\rho P_i)$. In the rank-1 case this reads $E_C(\rho) = \sum_i \operatorname{Tr}(\rho P_i)\, P_i$.
Proof (one line via Equation (5)).
By Equation (5), $S(\rho\,\|\,\sigma) = S(\rho\,\|\,E_C(\rho)) + S(E_C(\rho)\,\|\,\sigma)$ for all $\sigma \in \mathcal{S}(C)$. The first term does not depend on $\sigma$, and the second term is minimized if and only if $\sigma = E_C(\rho)$. Uniqueness holds whenever $p_i > 0$ for all $i$ (strict convexity of $D_{\mathrm{KL}}$ on the simplex interior). The displayed formula for $E_C(\rho)$ follows from Equation (3). □
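The proposition can be sanity-checked numerically in the rank-1 case. The sketch below, under the assumption of a random full-rank state on $\mathbb{C}^4$ and the computational basis as context, compares $S(\rho\,\|\,\sigma)$ for randomly drawn classical states $\sigma = \sum_i q_i P_i$ against the value at the dephased state, and measures how well the Pythagorean split holds. All helper names and the sampling scheme are illustrative choices made here.

```python
import numpy as np

def herm_log(x):
    """Natural log of a positive-definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(x)
    return (v * np.log(w)) @ v.conj().T

def umegaki(rho, sigma):
    """S(rho || sigma) for full-rank states."""
    return np.trace(rho @ (herm_log(rho) - herm_log(sigma))).real

def kl(p, q):
    """Classical KL divergence for strictly positive probability vectors."""
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(3)
d = 4
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T + 1e-3 * np.eye(d)
rho /= np.trace(rho).real

p = np.diag(rho).real                # p_i = Tr(rho P_i) for P_i = |i><i|
e_rho = np.diag(p).astype(complex)   # dephased state E_C(rho)
s_min = umegaki(rho, e_rho)

excess, identity_error = [], []
for _ in range(200):
    q = rng.dirichlet(np.ones(d))    # random classical state sigma = sum_i q_i P_i
    sigma = np.diag(q).astype(complex)
    s = umegaki(rho, sigma)
    excess.append(s - s_min)                         # should be >= 0
    identity_error.append(abs(s - s_min - kl(p, q)))  # Pythagorean split

print(min(excess), max(identity_error))
```

Random sampling only probes the claim; it does not replace the proof, but it is a quick guard against sign or normalization slips when the formulas are reimplemented.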

3.1.5. Interpretation

$E_C(\rho)$ is the information projection of $\rho$ onto the classical face $\mathcal{S}(C)$: it preserves exactly the measurement statistics of $\rho$ in context $C$ and discards all phases (coherences) that $C$ cannot detect. In particular, $\operatorname{Tr}\big(E_C(\rho)\, P_i\big) = \operatorname{Tr}(\rho P_i)$ for each $i$, so the usual trace formula for outcome weights appears naturally at the minimizer.

3.1.6. Remarks on Degeneracy, Support, and Uniqueness

  • Degenerate outcomes. When $r_i > 1$, every $\sigma \in \mathcal{S}(C)$ is block-constant, $\sigma = \sum_i q_i \pi_i$, and $E_C(\rho) = \sum_i p_i \pi_i$ with $p_i = \operatorname{Tr}(\rho P_i)$. In this case Equation (6) holds verbatim and the proof does not change.
  • Support. If some $p_i = 0$ then the minimizer need not be unique on the face $\{q_i = 0 \text{ whenever } p_i = 0\}$, but $E_C(\rho)$ is always a minimizer. If $p_i > 0$ for all $i$ (equivalently, $E_C(\rho)$ has full support in $\mathcal{S}(C)$) then the minimizer is unique.
  • No “chain-rule” needed. The argument uses only the Pythagorean identity, Equation (5), for the conditional expectation $E_C$; it does not assume Born weights in advance and it avoids ill-defined terms such as $S(\tilde{\rho}_i\,\|\,P_i)$.

3.2. Consistency on Overlaps and the Contextual Obstruction

3.2.1. Local Compatibility from Dephasing

Fix a measurement cover $\mathcal{M}$ of contexts $C \subseteq X$ and, for each $C \in \mathcal{M}$, write the optimal classical approximation (information projection) as
$$ E_C(\rho) = \sum_{i \in I_C} p_C(i;\rho)\, \pi_i, \qquad p_C(i;\rho) := \operatorname{Tr}(\rho P_i), \qquad \pi_i := \frac{P_i}{\operatorname{rank} P_i}, $$
as established in the previous subsection. If $C, C' \in \mathcal{M}$ share an outcome projector $P \in C \cap C'$, then
$$ \operatorname{Tr}\big(E_C(\rho)\, P\big) = \operatorname{Tr}(\rho P) = \operatorname{Tr}\big(E_{C'}(\rho)\, P\big), $$
and, more generally, the marginals of $E_C(\rho)$ and $E_{C'}(\rho)$ agree on the overlap $C \cap C'$. Thus, the family of classical distributions $\{p_C(\rho)\}_{C \in \mathcal{M}}$ obtained from the dephasings forms a compatible 0-cochain on the presheaf of outcome distributions: restrictions to overlaps coincide by construction.
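The overlap condition is easy to see on a small example. The sketch below assumes two rank-1 contexts on $\mathbb{C}^3$ that share the projector onto the first basis vector and differ by a rotation in the remaining two-dimensional subspace; the specific vectors, the random state, and the helper names are illustrative assumptions.

```python
import numpy as np

def rank1(v):
    """Projector onto the normalized vector v."""
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

def dephase(rho, projectors):
    """Rank-1 dephasing: E_C(rho) = sum_i Tr(rho P_i) P_i."""
    return sum(np.trace(p @ rho) * p for p in projectors)

shared = rank1([1, 0, 0])
context_c = [shared, rank1([0, 1, 0]), rank1([0, 0, 1])]
context_c_prime = [shared, rank1([0, 1, 1]), rank1([0, 1, -1])]

# Random density matrix used only as test input.
rng = np.random.default_rng(4)
a = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
rho = a @ a.conj().T
rho /= np.trace(rho).real

w_direct = np.trace(rho @ shared).real
w_c = np.trace(dephase(rho, context_c) @ shared).real
w_c_prime = np.trace(dephase(rho, context_c_prime) @ shared).real
print(w_direct, w_c, w_c_prime)
```

The two dephased states differ as operators, since each discards different coherences, yet they assign the same weight to the shared projector.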

3.2.2. Global Sections and Noncontextual Models

Let $D(U)$ denote the simplex of probability distributions on the outcomes of a measurement set $U \subseteq X$, with restriction maps given by marginalization. A global section is a distribution $g \in D(X)$ whose marginals $g_C$ match the empirical data in every context:
$$ g_C = \mathrm{res}^{X}_{C}(g) \quad \text{for all } C \in \mathcal{M}, \qquad \text{and} \qquad g_C = p_C(\rho). $$
Following the sheaf-theoretic formulation, the empirical model $\{p_C(\rho)\}$ is noncontextual iff a global section exists; failure of existence is exactly contextuality. In our setting, this means that although every pair (indeed every finite family) of contexts agrees on its overlaps, there may be no single joint $g$ on $X$ that glues all contexts simultaneously.

3.2.3. Cohomological Witness (Čech Obstruction)

There is a canonical way to assign to a compatible 0-cochain a cohomology class $[z] \in \check{H}^1(\mathcal{M}, \mathcal{F})$ (for a suitable abelian coefficient presheaf $\mathcal{F}$ derived from the support). If $[z] \neq 0$ then no global section exists, certifying contextuality. This obstruction is sufficient but not necessary in full generality; vanishing of $[z]$ does not guarantee noncontextuality in every scenario. We use it as a robust witness rather than a complete characterization.

3.2.4. Quantifying the Obstruction by an Optimal Global Glue

Denote by $\mathrm{NC}$ the noncontextual polytope: the set of all joint distributions $g \in D(X)$ whose marginals $\{g_C\}$ are obtained from convex mixtures of deterministic global assignments (equivalently, the convex hull of $\{0,1\}$-valued global sections consistent with the functional relations). We quantify the failure to glue by the convex program
$$ \Phi(\rho) := \min_{g \in \mathrm{NC}} \sum_{C \in \mathcal{M}} \mu_C\, D_{\mathrm{KL}}\big(p_C(\rho)\,\|\,g_C\big), \qquad \mu_C > 0, \quad \sum_C \mu_C = 1, \tag{7} $$
where $g_C$ is the marginal of $g$ onto $C$. Any minimizer $g^\star$ is the closest noncontextual model to the Born-rule bundle in the sense of (weighted) classical relative entropy.

3.2.5. Properties of the Optimization

  • Existence. $\mathrm{NC}$ is a compact polytope and the objective is lower-semicontinuous on its relative interior; hence a minimizer $g^\star$ exists.
  • Convexity and (near) uniqueness. The map $g \mapsto g_C$ is linear, and $D_{\mathrm{KL}}(\cdot\,\|\,\cdot)$ is strictly convex in its second argument on the simplex interior. Thus the objective in Equation (7) is convex in $g$. If all $p_C(\rho)$ have full support and the cover separates global assignments (so that the linear map $g \mapsto (g_C)_{C \in \mathcal{M}}$ is injective on the face touched by $g^\star$), then the minimizer is unique. In degenerate/boundary cases, the set of minimizers is a face; a canonical choice is the maximum-entropy point on that face.
  • KKT (I-projection) form. Writing $A_C$ for the marginalization matrix onto $C$, the objective is $\sum_C \mu_C\, D_{\mathrm{KL}}(p_C\,\|\,A_C g)$. At an interior optimum $g^\star$, there exist Lagrange multipliers $(\lambda, \nu)$ for the affine constraints ($\sum_s g_s = 1$ and $g \in \mathrm{NC}$) such that
    $$ -\sum_{C \in \mathcal{M}} \mu_C\, A_C^{\top}\!\left(\frac{p_C(\rho)}{A_C\, g^\star}\right) + \lambda \mathbf{1} - \nu = 0, $$
    with $\nu_s\, g^\star_s = 0$ (componentwise division; complementarity). Equivalently, $g^\star$ is the classical Csiszár I-projection of the Born-rule bundle onto $\mathrm{NC}$. See Appendix B for the convex-optimization details (existence, KKT with zeros, uniqueness of optimal marginals).
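The global program can be explored numerically on a toy scenario. The sketch below parametrizes a candidate noncontextual model by a distribution $g$ over deterministic global assignments and minimizes $\sum_C \mu_C D_{\mathrm{KL}}(p_C\,\|\,A_C g)$ with a multiplicative, EM-type update; this solver is a technique swapped in here for illustration and is not the paper's procedure. The scenario (three binary observables measured in pairs), the input bundles from `anticorrelated`, and all function names are hypothetical; the bundles are chosen only to exercise the optimizer and are not claimed to be quantum-realizable.

```python
import itertools
import numpy as np

def marginalization_matrix(context, n_vars):
    """A_C[o, s] = 1 if global assignment s restricts to joint outcome o on the context."""
    assignments = list(itertools.product([0, 1], repeat=n_vars))
    outcomes = list(itertools.product([0, 1], repeat=len(context)))
    a = np.zeros((len(outcomes), len(assignments)))
    for s_idx, s in enumerate(assignments):
        a[outcomes.index(tuple(s[v] for v in context)), s_idx] = 1.0
    return a

def kl(p, q):
    """Classical KL divergence with the convention 0 * log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def contextual_divergence(bundle, mats, weights, n_iter=5000):
    """Minimize sum_C mu_C KL(p_C || A_C g) over the simplex by multiplicative updates."""
    n = mats[0].shape[1]
    g = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        # ratio_s = sum_C mu_C sum_o A_C[o, s] p_C(o) / (A_C g)(o); equals 1 on the support at a fixed point
        ratio = sum(mu * a.T @ (p / (a @ g)) for mu, a, p in zip(weights, mats, bundle))
        g = g * ratio
        g /= g.sum()  # guard against floating-point drift
    phi = sum(mu * kl(p, a @ g) for mu, a, p in zip(weights, mats, bundle))
    return phi, g

def anticorrelated(v):
    """Toy pair distribution over (0,0),(0,1),(1,0),(1,1) with correlator -v (hypothetical data)."""
    return np.array([1 - v, 1 + v, 1 + v, 1 - v]) / 4.0

contexts = [(0, 1), (1, 2), (0, 2)]
mats = [marginalization_matrix(c, 3) for c in contexts]
weights = [1 / 3] * 3

phi_nc, g_nc = contextual_divergence([anticorrelated(0.2)] * 3, mats, weights)
phi_ctx, g_ctx = contextual_divergence([anticorrelated(0.9)] * 3, mats, weights)
print(phi_nc, phi_ctx)
```

At a fixed point of the update, `ratio` equals 1 wherever $g$ is positive and is at most 1 elsewhere, which matches the stationarity and complementarity conditions of the KKT form with $\lambda = 1$. For three pairwise correlators all equal to $-v$, a joint distribution exists only when $v \leq 1/3$, so the first bundle should give a divergence near zero and the second a strictly positive one.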

3.2.6. Two Payoffs

  • Optimal local shadows. Each $E_C(\rho)$ is the unique (full-support) minimizer of $S(\rho\,\|\,\sigma)$ in the classical face $\mathcal{S}(C)$, so every context reproduces the Born statistics while discarding undetectable phases.
  • Quantitative global glue. $\Phi(\rho)$ measures the minimal total information loss required to reconcile all contexts within $\mathrm{NC}$. When $\rho$ is noncontextual, $\Phi(\rho) = 0$ and the unique minimizer satisfies $g^\star_C = p_C(\rho)$ for all $C$. When $\rho$ is contextual, $\Phi(\rho) > 0$ and $g^\star$ deviates from $p_C(\rho)$ only insofar as needed to satisfy the global linear constraints coupling the contexts.

3.2.7. Technical Remarks

  • Boundary behavior. If some $p_C(i;\rho) = 0$, KL imposes $g_C(i) = 0$ at the minimizer, which can generate flat directions. Working on the common support or adding an $\varepsilon$-smoothing yields stable numerics; the limit $\varepsilon \to 0$ recovers the exact value.
  • Choice of weights. Uniform $\mu_C$ captures symmetry; other choices can encode experimental frequencies or confidence levels. All results above hold for any strictly positive weights summing to 1.
  • Alternatives and cross-checks. Other quantitative notions include the contextual fraction and the relative entropy of contextuality; our $\Phi(\rho)$ fits the same resource-theoretic template and can be compared empirically across scenarios.

3.3. Born Rule as the Unique Variational Solution

3.3.1. Synthesis

For each context $C \in \mathcal{M}$, the previous subsection showed that the unique information projection of $\rho$ onto the classical face $\mathcal{S}(C)$ is the dephasing $E_C(\rho) = \sum_{i \in I_C} \frac{\operatorname{Tr}(\rho P_i)}{\operatorname{rank} P_i}\, P_i$, with context-wise probabilities
$$ p_C(i;\rho) := \operatorname{Tr}(\rho P_i). $$
These are fixed by the variational principle alone (Petz’s Pythagorean identity), so we take the Born bundle $p(\rho) := \{p_C(\rho)\}_{C \in \mathcal{M}}$ as the locally optimal data. To quantify global consistency, let $\mathrm{NC}$ be the noncontextual polytope and define the global contextual divergence
$$ \Phi(\rho) := \min_{g \in \mathrm{NC}} \sum_{C \in \mathcal{M}} \mu_C\, D_{\mathrm{KL}}\big(p_C(\rho)\,\|\,g_C\big), \qquad \mu_C > 0, \quad \sum_C \mu_C = 1. \tag{8} $$
When $\rho$ is noncontextual, there exists $g \in \mathrm{NC}$ with $g_C = p_C(\rho)$ for all $C$ and $\Phi(\rho) = 0$; otherwise $\Phi(\rho) > 0$.
Theorem 1
(Two-stage variational characterization of the Born rule). Consider the joint optimization
$$ \min_{\{\sigma_C \in \mathcal{S}(C)\},\ g \in \mathrm{NC}} \ \sum_{C \in \mathcal{M}} \mu_C\, S\big(\rho\,\|\,\sigma_C\big) + \sum_{C \in \mathcal{M}} \mu_C\, D_{\mathrm{KL}}\big(p_C(\rho)\,\|\,g_C\big). \tag{9} $$
Then:
(i)
For every $C$, the unique minimizer in the first sum is $\sigma_C^\star = E_C(\rho)$ (full-support case), hence the only context-wise probabilities that can occur at any global optimum are the Born weights $p_C(i;\rho) = \operatorname{Tr}(\rho P_i)$.
(ii)
With $\sigma_C^\star$ fixed, the second sum reduces to Equation (8) and attains its minimum at a unique $g^\star \in \mathrm{NC}$ (on the appropriate face when supports are not full). Consequently,
$$ \min \text{ of Equation (9)} = \sum_C \mu_C\, S\big(\rho\,\|\,E_C(\rho)\big) + \Phi(\rho), \tag{10} $$
with the pair $(\{\sigma_C^\star\}, g^\star)$ optimal.
Proof sketch. 
For each $C$, Petz’s Pythagorean identity gives $S(\rho\,\|\,\sigma_C) = S(\rho\,\|\,E_C(\rho)) + S(E_C(\rho)\,\|\,\sigma_C)$. Because $E_C(\rho), \sigma_C \in \mathcal{S}(C)$ commute, $S(E_C(\rho)\,\|\,\sigma_C) = D_{\mathrm{KL}}\bigl(p_C(\rho)\,\|\,p_C(\sigma_C)\bigr) \ge 0$, with equality iff $\sigma_C = E_C(\rho)$. This proves (i) and yields Equation (10) once the $\sigma_C$ are fixed to $E_C(\rho)$. The second term is a convex problem in $g$ over the compact polytope $\mathrm{NC}$ with a strictly convex objective on the interior, hence an optimizer $g^\star$ exists and is unique under the usual support and injectivity conditions. □
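The Pythagorean split used in the proof sketch can be checked numerically. The following is a minimal sketch, assuming a rank-1 context given by the computational basis of a qutrit, a randomly generated full-rank state, and a random diagonal comparison state; the helper names `herm_log` and `rel_entropy` are illustrative and not taken from the text.

```python
import numpy as np

def herm_log(a):
    """Matrix logarithm of a positive-definite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """Umegaki relative entropy S(rho || sigma) in nats (full-rank inputs assumed)."""
    return float(np.real(np.trace(rho @ (herm_log(rho) - herm_log(sigma)))))

rng = np.random.default_rng(0)
d = 3

# Random full-rank density matrix (assumed test state).
a = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = a @ a.conj().T + 0.1 * np.eye(d)
rho = rho / np.trace(rho).real

# Dephasing E_C(rho) in the computational basis, and an arbitrary diagonal sigma_C.
born = np.diag(rho).real
dephased = np.diag(born).astype(complex)
q = rng.dirichlet(np.ones(d))
sigma = np.diag(q).astype(complex)

lhs = rel_entropy(rho, sigma)
rhs = rel_entropy(rho, dephased) + rel_entropy(dephased, sigma)
classical_kl = float(np.sum(born * np.log(born / q)))

print("S(rho||sigma)                          =", lhs)
print("S(rho||E(rho)) + S(E(rho)||sigma)      =", rhs)
print("S(E(rho)||sigma) vs classical KL(p||q) =", rel_entropy(dephased, sigma), classical_kl)
```

Because the second term is a classical KL divergence between the Born weights and the diagonal of $\sigma_C$, it vanishes only when the two diagonals coincide, which is the uniqueness statement in part (i).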

3.3.2. Meaning and Consequences

  • Local uniqueness. The Born weights are the only per-context probabilities compatible with any global variational optimum; any attempt to alter the context-wise diagonals increases the first term in Equation (9) and cannot improve the second, so total cost rises.
  • Global projection. The second stage projects the Born bundle onto $\mathrm{NC}$ in KL geometry: $g^\star$ is the classical I-projection of $p(\rho)$, and $\Phi(\rho)$ measures “how far” $\rho$ is from noncontextuality for the chosen cover and weights.
  • Noncontextual vs. contextual cases. If $\rho$ is noncontextual then $g_C^\star = p_C(\rho)$ for all $C$ and $\Phi(\rho) = 0$. If $\rho$ is contextual, $g^\star$ necessarily deviates from Born on at least one context and $\Phi(\rho) > 0$.
The derivation uses only density operators, projective contexts, Petz’s identity for conditional expectations, and classical KL geometry on $\mathrm{NC}$. No Gleason-type or continuity postulates are required.
In summary, this section recasts contextuality as a two-stage variational problem. Locally, Petz’s identity forces the dephasing $E_C(\rho)$ as the unique information projection onto each context $C$, so the Born weights $p_C(i) = \operatorname{Tr}(\rho P_i)$ are obtained rather than assumed. Globally, we introduced the contextual divergence $\Phi(\rho) = \min_{g \in \mathrm{NC}} \sum_C \mu_C\, D_{\mathrm{KL}}(p_C(\rho)\,\|\,g_C)$, which is non-negative and vanishes iff the empirical model is noncontextual; otherwise, the closest noncontextual model is the classical I-projection of the Born bundle. Under full-support hypotheses the local minimizers are unique and the global optimal marginals are uniquely determined, providing a principled, quantitative baseline for the extensions and operational consequences developed in the next sections.

4. Transition and Update Rules for Changing Contexts

In the sheaf-theoretic view, contexts form a category $\mathbf{Ctx}$ whose objects are maximal abelian subalgebras $C \subseteq \mathcal{B}(\mathcal{H})$ and whose morphisms are inclusions $C \hookrightarrow C'$. A contravariant state presheaf $\mathrm{St} : \mathbf{Ctx}^{\mathrm{op}} \to \mathbf{Conv}$ assigns each $C$ the convex set $\mathcal{S}(C)$ of states block-diagonal in $C$, and each inclusion $i$ the restriction $i^*$ given by the conditional expectation onto $C$. Any global state $\rho$ induces a 0-cochain $\{\sigma_C = E_C[\rho]\}$, where $E_C$ is the trace-preserving decoherence map in context $C$. Abramsky–Brandenburger’s theorem [19,21] says contextuality is exactly the failure of this presheaf to admit a global section. Having shown that the Born rule uniquely fits a fixed cover of contexts, we now extend our variational principle to ask: how should one update these context-dependent state assignments when moving between contexts, while staying consistent on overlaps?
Problem 1
(Context Switch). Given a prior context $C$ with $\sigma_C = E_C[\rho]$ and a new context $C'$, find $\sigma_{C'} \in \mathcal{S}(C')$ such that:
1. 
Overlap consistency: $i^*(\sigma_{C'}) = \sigma_{C \cap C'}$.
2. 
Minimal perturbation: $\sigma_{C'}$ deviates as little as possible from $\rho$.
Condition 1 ensures the gluing condition: the local classical state on $C'$ must agree with the old state on any observable they share, so that no already-established facts are contradicted. Condition 2 enforces a variational minimal-change principle: we only change what is necessary to accommodate the new context. These two requirements are captured by the quantum Jeffrey update, a quantum generalization of Jeffrey’s rule (and of Lüders’ rule for projective measurement) obtained via constrained relative entropy minimization:
Theorem 2
(Optimal Contextual Update). For prior state $\rho$ on context $C$ and target context $C'$, the unique state $\sigma_{C'} \in \mathcal{S}(C')$ satisfying conditions 1 and 2 above is given by the minimal divergence projection:
$$\sigma_{C'} = \arg\min_{\tau \in \mathcal{S}(C')} S(\tau\,\|\,\rho) \quad \text{s.t.} \quad i^*(\tau) = \sigma_{C \cap C'}. \tag{11}$$
Here $S(\tau\,\|\,\rho) = \operatorname{Tr}(\tau \ln \tau - \tau \ln \rho)$ is the Umegaki relative entropy. The solution of Equation (11) exists and is unique. Moreover, Equation (11) yields a functorial update: it is the right Kan extension of the presheaf state $\sigma_C$ along $i : C \to C'$ in the category of convex state spaces. Equivalently, successive context updates associate: if $C \to C' \to C''$, then $\sigma_{C''}$ obtained by Equation (11) in one step equals the result of first updating $C \to C'$ and then $C' \to C''$.
Proof sketch. 
The feasible set
$$\bigl\{\tau \in \mathcal{S}(C') : \operatorname{Tr}(\tau P) = \operatorname{Tr}(\rho P)\ \ \forall P \in D\bigr\}$$
is an affine submanifold of $\mathcal{S}(C')$, and $S(\tau\,\|\,\rho)$ is strictly convex in $\tau$ [24,34]; hence a unique minimizer $\sigma_{C'}$ exists by convex programming theory [36]. Introducing Lagrange multipliers $\{\lambda_P : P \in D\}$ for the linear constraints, one finds the stationary point by setting [37]
$$\nabla_\tau \Bigl[ S(\tau\,\|\,\rho) + \sum_{P \in D} \lambda_P \bigl(\operatorname{Tr}(\tau P) - \operatorname{Tr}(\rho P)\bigr) \Bigr] = 0.$$
This yields the quantum Bayes rule solution:
$$\log \sigma_{C'} = \log \rho - \sum_{P \in D} \lambda_P P, \quad \text{so that} \quad \sigma_{C'} = \frac{\exp\bigl(\log \rho - \sum_{P \in D} \lambda_P P\bigr)}{\operatorname{Tr}\bigl[\exp\bigl(\log \rho - \sum_{P \in D} \lambda_P P\bigr)\bigr]}.$$
The $\lambda_P$ are chosen such that $\operatorname{Tr}(\sigma_{C'} P) = \operatorname{Tr}(\rho P)$ for all $P \in D$. In particular, if $D$ is generated by a single projector $P$, e.g., a yes/no evidence, then
$$\sigma_{C'} = \frac{e^{\log \rho - \lambda P}}{\operatorname{Tr}\bigl[e^{\log \rho - \lambda P}\bigr]},$$
which reproduces Lüders’ rule in the special case of a projective measurement ($\rho_P = P \rho P$). Equation (11) thus generalizes classical Jeffrey updating and Jaynes’ maximum entropy principle to the quantum setting. Formally, Equation (11) implements a universal lifting of the state presheaf along the inclusion $i$: it is the right Kan extension of $\sigma_C$ to $C'$, guaranteeing that no information in $D$ is lost and that $\sigma_{C'}$ is the “least biased” extension consistent with $D$. This extension is natural: if $i' : C' \to C''$, then writing $i_* := \mathrm{Ran}_i$ for the right Kan extension,
$$\sigma_{C''} := (i' \circ i)_*(\sigma_C) \cong i'_*\, \sigma_{C'},$$
so the update is well defined (path-independent / context-functorial). □
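For readers who want the intermediate step between the Lagrangian and the exponential-family solution, the stationarity condition can be spelled out as follows. This is a sketch of the standard computation, using $\frac{d}{dt}\operatorname{Tr} f(\tau + tX)\big|_{t=0} = \operatorname{Tr}\bigl(f'(\tau)X\bigr)$ with $f(x) = x \ln x$; the constant produced by the identity term is absorbed into the normalization.

```latex
\begin{aligned}
\nabla_\tau \operatorname{Tr}(\tau \ln \tau - \tau \ln \rho)
  &= \ln \tau + \mathbb{1} - \ln \rho, \\
\nabla_\tau \sum_{P \in D} \lambda_P \bigl(\operatorname{Tr}(\tau P) - \operatorname{Tr}(\rho P)\bigr)
  &= \sum_{P \in D} \lambda_P P, \\
\ln \tau + \mathbb{1} - \ln \rho + \sum_{P \in D} \lambda_P P = 0
  \;&\Longrightarrow\;
  \tau = e^{-1} \exp\Bigl(\ln \rho - \sum_{P \in D} \lambda_P P\Bigr).
\end{aligned}
```

Imposing $\operatorname{Tr}\tau = 1$ replaces the prefactor $e^{-1}$ by the inverse of the trace of the exponential, which gives the normalized form stated in the proof sketch.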
Crucially, Equation (11) preserves the contextuality invariant. It enforces agreement on $D = C \cap C'$ without adding hidden variables, simply lifting $\sigma_C$ to $\sigma_{C'}$ within the same Čech cohomology class. Any 1-cocycle obstruction $\delta\sigma$ is left untouched; rebasing never “patches” the global gap.
Proposition 2
(Cohomology Invariance). The update of Equation (11) leaves any cohomological measure of contextuality (for example, the contextual fraction) unchanged. Moreover, Equation (11) satisfies the Petz recovery condition: there is a CPTP map $R : C' \to C$ with
$$\rho = R(\sigma_{C'}) \quad \text{on } C,$$
so no overlap data are lost: off-diagonals are dropped, but all $D$-statistics can be recovered. Thus, the Born rule remains the dynamic variational glue, continually enforcing Born-rule consistency on overlaps while “forgetting” only the contextual (non-commuting) parts.

5. Multi-Observer Coordination via Shared Contexts

In this section we generalize our single-observer variational update to the multi-observer setting, showing how independently held context states can be glued into a single joint assignment whenever they agree on shared measurements. This is crucial because, in practice, different agents often have access to incompatible sets of observables yet must reconcile their beliefs into a coherent quantum description, which is precisely the problem captured by Abramsky–Brandenburger’s sheaf-theoretic contextuality obstruction [19,21]. By proving a precise compatibility theorem and constructing the unique entropic barycentre via a small SDP plus dual optimization, we provide both necessary-and-sufficient criteria and an explicit algorithm for two-party consensus. Crucially, this section demonstrates that the Born rule plays the role of a universal “glue”, preserving cohomological invariants across contexts while minimizing total informational disturbance.

5.1. Setting and Compatibility Criterion

Consider two agents, A and B, who model the same physical system on a finite-dimensional Hilbert space $\mathcal{H} \cong \mathbb{C}^d$. Each agent restricts attention to a maximal abelian sub-algebra (MASA)
$$C_A = \operatorname{Alg}\{P_i\}_{i=1}^{d}, \qquad C_B = \operatorname{Alg}\{Q_j\}_{j=1}^{d},$$
and holds a context state
$$\sigma_{C_A} = \sum_i p_i P_i, \qquad \sigma_{C_B} = \sum_j q_j Q_j.$$
The MASAs overlap in the (possibly non-trivial) sub-algebra $D = C_A \cap C_B$. Agreement on the overlap means $\sigma_{C_A}|_D = \sigma_{C_B}|_D =: \sigma_D$. Define the feasible set
$$S_{AB} = \bigl\{\rho \ge 0 : \operatorname{Tr}\rho = 1,\ E_{C_A}(\rho) = \sigma_{C_A},\ E_{C_B}(\rho) = \sigma_{C_B}\bigr\}, \tag{13}$$
where $E_C$ is the Umegaki–Petz conditional expectation. A joint state exists exactly when $S_{AB} \neq \emptyset$.
Proposition 3
(Two-context classical gluing). Let $C_A, C_B$ be two contexts with overlap $D = C_A \cap C_B$, and let $p_A, p_B$ be the empirical distributions in these contexts. The following are equivalent:
(i)
There exists a joint distribution $g$ on $C_A \cup C_B$ whose marginals are $p_A$ and $p_B$ (i.e., the model is noncontextual for the cover $\{C_A, C_B\}$).
(ii)
Overlap agreement: $p_A|_D = p_B|_D$.
(iii)
The linear feasibility problem “find $g \ge 0$, $\sum g = 1$ with $A_A g = p_A$ and $A_B g = p_B$” is feasible.
For a two-set cover the Čech 1-cohomology obstruction reduces to (ii), so there is no additional topological condition [19,21]. The test (ii)⇔(iii) is a small linear program (indeed, just checking marginals on $D$).
Proof sketch. 
(i)⇒(ii) is trivial by marginalization. (ii)⇒(iii): construct $g$ on $C_A \cup C_B$ by any consistent coupling of the two marginals agreeing on $D$ (or solve the stated LP). (iii)⇒(i) is by definition. For two contexts the nerve is acyclic, so the Čech obstruction coincides with overlap agreement. □
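The step (ii)⇒(iii) can be made concrete with one particular consistent coupling. The sketch below is an illustration under stated assumptions: outcomes of each context are labelled by integers, the arrays `d_of_a` and `d_of_b` (hypothetical names) record which overlap outcome each context outcome refines, and the coupling chosen is the conditionally independent one, $g(a,b) = p_A(a)\,p_B(b)/p_D(d)$ when both outcomes sit over the same $d$. Any other coupling with the right marginals would do equally well.

```python
import numpy as np

def glue(p_a, p_b, d_of_a, d_of_b):
    """Return a joint g with marginals p_a and p_b, or None if the overlap marginals disagree."""
    p_a = np.asarray(p_a, dtype=float)
    p_b = np.asarray(p_b, dtype=float)
    d_of_a = np.asarray(d_of_a)
    d_of_b = np.asarray(d_of_b)
    n_d = int(max(d_of_a.max(), d_of_b.max())) + 1

    # Marginalize each context distribution onto the overlap D.
    p_d_from_a = np.bincount(d_of_a, weights=p_a, minlength=n_d)
    p_d_from_b = np.bincount(d_of_b, weights=p_b, minlength=n_d)
    if not np.allclose(p_d_from_a, p_d_from_b):
        return None  # condition (ii) fails, so no joint distribution exists

    # Conditionally independent coupling given the shared overlap outcome.
    g = np.zeros((p_a.size, p_b.size))
    for a, da in enumerate(d_of_a):
        for b, db in enumerate(d_of_b):
            if da == db and p_d_from_a[da] > 0:
                g[a, b] = p_a[a] * p_b[b] / p_d_from_a[da]
    return g

# Toy data (assumed): four outcomes per context, two overlap outcomes.
p_a = [0.1, 0.2, 0.3, 0.4]
p_b = [0.15, 0.15, 0.5, 0.2]
labels = [0, 0, 1, 1]

g = glue(p_a, p_b, labels, labels)
print(g.sum(axis=1), g.sum(axis=0))            # recovers p_a and p_b
print(glue(p_a, [0.25] * 4, labels, labels))   # overlap mismatch gives None
```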
Proposition 4
(Two-context quantum realisability). Given projectors $\{P_i^A\}$ on $C_A$ and $\{P_j^B\}$ on $C_B$ and target marginals $p_A, p_B$, there exists a state $\rho$ with $\operatorname{Tr}(\rho P_i^A) = p_A(i)$ and $\operatorname{Tr}(\rho P_j^B) = p_B(j)$ iff the SDP
$$\text{find } \rho \ge 0,\ \operatorname{Tr}\rho = 1 \quad \text{s.t.} \quad \operatorname{Tr}(\rho P_i^A) = p_A(i),\ \operatorname{Tr}(\rho P_j^B) = p_B(j)$$
is feasible.

5.2. Entropic Consensus: The Constrained Minimizer

On the non-empty convex set $S_{AB}$ define
$$F(\rho) = S(\rho\,\|\,\sigma_{C_A}) + S(\rho\,\|\,\sigma_{C_B}), \tag{14}$$
where $S(\rho\,\|\,\sigma) = \operatorname{Tr}\rho(\log \rho - \log \sigma)$ is the Umegaki relative entropy. $F$ is strictly convex and coercive on positive density operators, hence possesses a unique minimizer.
Theorem 3
(Entropic barycentre). Assume the overlap agreement of Proposition 3 holds and $S_{AB}$ contains a full-rank state. Then
1. 
there exists a unique $\tau_{AB} \in S_{AB}$ minimizing $F$;
2. 
$\tau_{AB}$ satisfies
$$\tau_{AB} = \frac{\exp\bigl[\tfrac{1}{2}(\log \sigma_{C_A} + \log \sigma_{C_B}) - \Lambda\bigr]}{\operatorname{Tr}\exp[\cdots]} \tag{15}$$
for the unique $\Lambda \in D$ solving the linear system $\operatorname{Tr}(\tau_{AB} P_i) = p_i$, $\operatorname{Tr}(\tau_{AB} Q_j) = q_j$.
Proof sketch. 
Apply the KKT conditions to Equation (14) under the affine constraints Equation (13). The gradient $\nabla_\rho F = \log \rho - \tfrac{1}{2}(\log \sigma_{C_A} + \log \sigma_{C_B}) + I$ [25], together with Lagrange multipliers in $D$ and the trace hyperplane, yields Equation (15). Strict convexity of $F$ gives uniqueness; positivity of the exponential ensures $\tau_{AB}$ is full rank, closing the Slater loop. Equation (15) is the matrix log-Euclidean/Karcher mean with linear constraints [38]. □

5.3. Structural Properties

  • Associativity or independence. The map
    $$(\sigma_{C_k})_{k=1}^{m} \mapsto \arg\min_{\rho} \sum_k S(\rho\,\|\,\sigma_{C_k})$$
    is a right Kan extension in the 2-category of convex state spaces; Kan extensions compose, so multi-observer consensus is order-independent [19].
  • Minimal disturbance. Each agent’s new marginal equals its old context state: $E_{C_A}(\tau_{AB}) = \sigma_{C_A}$ and $E_{C_B}(\tau_{AB}) = \sigma_{C_B}$. Information-geometrically, $\tau_{AB}$ is the unique Bregman projection of the midpoint $\tfrac{1}{2}(\sigma_{C_A}, \sigma_{C_B})$ onto the linear family Equation (13) [39].
  • Cohomology is preserved. The barycentre does not alter the Čech class; if the original cover is contextual, no sequence of pairwise barycentres can remove the obstruction. Conversely, if iterative gluing cancels every cocycle the resulting global state witnesses non-contextuality (Abramsky hierarchy) [21].

5.4. Algorithmic Note

Solving Equation (15) numerically amounts to maximizing the strictly concave dual
$$g(\Lambda) = -\log \operatorname{Tr}\exp\bigl[\tfrac{1}{2}(\log \sigma_{C_A} + \log \sigma_{C_B}) - \Lambda\bigr] - \sum_i \alpha_i p_i - \sum_j \beta_j q_j - \gamma,$$
where $\Lambda = \sum_i \alpha_i P_i + \sum_j \beta_j Q_j + \gamma I$. Newton or mirror-descent converges in time poly($d$); each step requires a matrix exponential and a handful of traces. In low dimensions closed-form Klyachko inequalities allow an analytic feasibility check [40], but SDP solvers scale better in practice.
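As a hedged illustration of the dual iteration, the sketch below treats the smallest case: a single qubit with $C_A$ the $\sigma_z$ context and $C_B$ the $\sigma_x$ context. Several choices here are assumptions made for brevity rather than the procedure of the text: the multiplier is parameterized as $\Lambda = \alpha\,\sigma_z + \beta\,\sigma_x$ (equivalent to the projector form up to the identity term, which normalization absorbs), plain gradient ascent with a fixed step replaces Newton or mirror descent, and the target values `rz`, `rx` are arbitrary.

```python
import numpy as np

I2 = np.eye(2)
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def sym_fun(a, f):
    """Apply a scalar function to a real symmetric matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.T

# Context states held by the two agents (assumed example values).
rz, rx = 0.6, 0.3
sigma_a = 0.5 * (I2 + rz * Z)   # diagonal in the Z basis
sigma_b = 0.5 * (I2 + rx * X)   # diagonal in the X basis
a0 = 0.5 * (sym_fun(sigma_a, np.log) + sym_fun(sigma_b, np.log))

def tau_of(alpha, beta):
    """Normalized exponential-family state exp(a0 - alpha*Z - beta*X) / Tr[...]."""
    m = sym_fun(a0 - alpha * Z - beta * X, np.exp)
    return m / np.trace(m)

# Gradient ascent on the concave dual: d g / d alpha = Tr(tau Z) - rz, and likewise for beta.
alpha, beta, step = 0.0, 0.0, 0.5
for _ in range(2000):
    tau = tau_of(alpha, beta)
    alpha += step * (np.trace(tau @ Z) - rz)
    beta += step * (np.trace(tau @ X) - rx)

tau_ab = tau_of(alpha, beta)
print("Tr(tau Z), Tr(tau X):", np.trace(tau_ab @ Z), np.trace(tau_ab @ X))
print("eigenvalues of tau_AB:", np.linalg.eigvalsh(tau_ab))
```

At convergence the two expectation values match the agents' context states, which is the minimal-disturbance property $E_{C_A}(\tau_{AB}) = \sigma_{C_A}$, $E_{C_B}(\tau_{AB}) = \sigma_{C_B}$ in this toy case, and the state stays full rank because it is a matrix exponential.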
This section shows that the Born rule emerges not only as a static axiom but as a dynamic law: Least informational disturbance + overlap agreement ⇒ unique global density compatible with all contexts.
Any alternative rule would either break agreement on D or yield higher total divergence, violating universal optimality. Thus, the entropic barycentre furnishes a universal, natural transformation on the sheaf of states, governing belief updates for single agents and consensus among many. In categorical terms, quantum probability is the only way to glue local classical pictures into a coherent whole, representing exactly the content of the Abramsky-Brandenburger obstruction-theoretic analysis.

6. Worked Analytical Examples

To make the abstract variational machinery concrete, this section walks through four non-trivial cases—ranging from a single qubit to a three-qubit GHZ paradox—showing exactly how the Petz-projection/entropy-minimisation principle singles out Born-rule weights and how contextuality manifests in the gluing step. Each example is chosen to illuminate a different subtlety: complementarity, state-independent contextuality, Čech-cocycle obstruction, and quantitative resource cost.

6.1. Single Qubit in Complementary Contexts

6.1.1. Contexts

Take the Bloch state
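$$\rho = \tfrac{1}{2}\bigl(\mathbb{1} + \mathbf{r} \cdot \boldsymbol{\sigma}\bigr), \qquad \|\mathbf{r}\| \le 1,$$
and the two MASAs
$$C_Z = \operatorname{Alg}\{\sigma_z\}, \qquad C_X = \operatorname{Alg}\{\sigma_x\}.$$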

6.1.2. Local Petz Projections

Dephasing is simply
$$E_{C_Z}(\rho) = \tfrac{1}{2}\bigl(\mathbb{1} + r_z \sigma_z\bigr), \qquad E_{C_X}(\rho) = \tfrac{1}{2}\bigl(\mathbb{1} + r_x \sigma_x\bigr),$$
each of which minimizes the Umegaki relative entropy within its context [41].

6.1.3. Born Weights Recovered

Reading off diagonals gives
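$$p_{\uparrow} = \tfrac{1}{2}(1 + r_z),\quad p_{\downarrow} = \tfrac{1}{2}(1 - r_z) \ \text{ in } C_Z, \qquad q_{+} = \tfrac{1}{2}(1 + r_x),\quad q_{-} = \tfrac{1}{2}(1 - r_x) \ \text{ in } C_X,$$
i.e., the usual $p_i = \operatorname{Tr}(P_i \rho)$.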
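The dephasing and the resulting Born weights for this qubit example can be reproduced in a few lines. This is a minimal sketch; the Bloch vector `r` is an arbitrary assumed example and the helper names are illustrative.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def bloch_state(r):
    """Density matrix (1 + r . sigma) / 2 for a Bloch vector r with |r| <= 1."""
    rx, ry, rz = r
    return 0.5 * (I2 + rx * SX + ry * SY + rz * SZ)

def dephase(rho, projectors):
    """Pinching map: sum_i P_i rho P_i for a complete set of orthogonal projectors."""
    return sum(p @ rho @ p for p in projectors)

def born(rho, projectors):
    """Context-wise probabilities Tr(rho P_i)."""
    return np.array([np.trace(rho @ p).real for p in projectors])

proj_z = [0.5 * (I2 + SZ), 0.5 * (I2 - SZ)]
proj_x = [0.5 * (I2 + SX), 0.5 * (I2 - SX)]

r = (0.3, 0.2, 0.5)            # assumed example Bloch vector
rho = bloch_state(r)

p_z = born(rho, proj_z)        # (1 + r_z)/2, (1 - r_z)/2
p_x = born(rho, proj_x)        # (1 + r_x)/2, (1 - r_x)/2
print(p_z, p_x)
print(np.allclose(dephase(rho, proj_z), 0.5 * (I2 + r[2] * SZ)))
```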

6.1.4. Gluing Check

Because $C_Z \cap C_X = \mathbb{C}\,\mathbb{1}$, overlaps are trivial and the Born probabilities always glue; hence a single qubit is non-contextual in this two-context scenario.

6.1.5. Jensen–Shannon Cost

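The quantum JS distance between $\rho$ and its Z-dephasing is
$$d_{\mathrm{QJS}}(\rho, E_{C_Z}\rho) = S\Bigl(\frac{\rho + E_{C_Z}\rho}{2}\Bigr) - \tfrac{1}{2} S(\rho) - \tfrac{1}{2} S(E_{C_Z}\rho),$$
a closed-form function of $r_z$ and $r_\perp = \sqrt{r_x^2 + r_y^2}$ that vanishes iff $r_\perp = 0$, i.e., iff $\rho$ is already diagonal [42].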
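A short numerical check of this quantity is sketched below, with entropies in bits. The helper names and the two example Bloch vectors are assumptions; the point is only that the value is strictly positive when off-diagonal components are present and zero when they are absent.

```python
import numpy as np

def vn_entropy_bits(rho):
    """Von Neumann entropy in bits, ignoring numerically zero eigenvalues."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-np.sum(w * np.log2(w)))

def qjs(rho, sigma):
    """Quantum Jensen-Shannon divergence S((rho+sigma)/2) - S(rho)/2 - S(sigma)/2."""
    mid = 0.5 * (rho + sigma)
    return vn_entropy_bits(mid) - 0.5 * vn_entropy_bits(rho) - 0.5 * vn_entropy_bits(sigma)

def bloch_state(rx, ry, rz):
    """Qubit density matrix (1 + r . sigma) / 2."""
    return 0.5 * np.array([[1 + rz, rx - 1j * ry], [rx + 1j * ry, 1 - rz]])

def z_dephase(rho):
    """Dephasing in the sigma_z eigenbasis: keep only the diagonal."""
    return np.diag(np.diag(rho))

coherent = bloch_state(0.6, 0.0, 0.3)   # has r_perp = 0.6
diagonal = bloch_state(0.0, 0.0, 0.3)   # already diagonal

cost_coherent = qjs(coherent, z_dephase(coherent))
cost_diagonal = qjs(diagonal, z_dephase(diagonal))
print(cost_coherent, cost_diagonal)
```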

6.2. Two-Qubit Mermin–Peres Magic Square

The magic square provides a state-independent contextuality proof with nine observables arranged in three row contexts and three column contexts, as shown in Table 1 [43].

6.2.1. Contexts

Each row and each column forms a commuting triple, giving six MASAs $C_{R_i}$, $C_{C_j}$.

6.2.2. Local Minimizers

For any two-qubit state $\rho$, the Petz projection onto, say, $C_{R_1}$ zeros all off-diagonals in the joint eigenbasis of the three row-1 observables and reproduces the Born weights on the four common eigenstates (labelled by eigenvalues $\pm 1$).

6.2.3. Čech Cocycle

Overlaps such as $C_{R_1} \cap C_{C_1} = \operatorname{Alg}\{\sigma_z \otimes \mathbb{1}\}$ carry incompatible assignments (their product signs differ by $-1$). Computing the Čech 1-cocycle shows $[g] \neq 0$, so no global section exists: contextuality in action.
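The sign mismatch behind the obstruction can be verified directly on the operators. The sketch below assumes the standard Pauli arrangement of the square, built from the generator pairs listed in Section 6.5.1 with the third entry of each row and column equal to the product of the other two; the concrete layout is an assumption insofar as Table 1 is not reproduced here.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

k = np.kron
square = [
    [k(Z, I2), k(I2, Z), k(Z, Z)],
    [k(I2, X), k(X, I2), k(X, X)],
    [k(Z, X), k(X, Z), k(Y, Y)],
]

def commute(a, b):
    return np.allclose(a @ b, b @ a)

def product_sign(ops):
    """Product of a commuting triple; for this square it is +identity or -identity."""
    prod = reduce(np.matmul, ops)
    sign = prod[0, 0].real
    assert np.allclose(prod, sign * np.eye(4))
    return int(round(sign))

rows = square
cols = [[square[i][j] for i in range(3)] for j in range(3)]

all_commute = all(commute(a, b) for ctx in rows + cols for a in ctx for b in ctx)
row_signs = [product_sign(ctx) for ctx in rows]
col_signs = [product_sign(ctx) for ctx in cols]
print(all_commute, row_signs, col_signs)   # True [1, 1, 1] [1, 1, -1]
```

A noncontextual assignment of values $\pm 1$ to the nine observables would have to multiply to $+1$ over the rows and to $-1$ over the columns, although both products run over the same nine values. That parity clash is independent of the state.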

6.2.4. Resource Cost

We quantify contextuality by the distribution-level relative entropy
$$\Phi(\rho) := \min_{g \in \mathrm{NC}} \sum_{C \in \mathcal{M}} \mu_C\, D_{\mathrm{KL}}\bigl(p_C(\rho)\,\|\,g_C\bigr),$$
where $\mathrm{NC}$ is the noncontextual polytope in the space of empirical models. In the standard CHSH cover, the maximally entangled Bell state yields $\Phi(\rho_{\mathrm{Bell}}) > 0$ (its correlations lie outside $\mathrm{NC}$), so the resource cost is strictly positive [23].
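For the CHSH cover the minimization defining $\Phi$ is small enough to run directly. The sketch below rests on several assumptions that are not fixed by the text: uniform weights $\mu_C = 1/4$, Bell-state statistics at the Tsirelson-optimal settings, $p(a,b|x,y) = \tfrac{1}{4}\bigl(1 + (-1)^{a \oplus b \oplus xy}/\sqrt{2}\bigr)$, and a multiplicative (EM-style) update over the 16 deterministic assignments in place of a general convex solver. Function names are illustrative.

```python
import itertools
import numpy as np

def chsh_quantum_box():
    """Assumed Bell-state statistics at Tsirelson settings; outcomes ordered (a, b) = 00, 01, 10, 11."""
    box = {}
    for x, y in itertools.product((0, 1), repeat=2):
        probs = np.empty(4)
        for idx, (a, b) in enumerate(itertools.product((0, 1), repeat=2)):
            sign = (-1) ** (a ^ b ^ (x & y))
            probs[idx] = 0.25 * (1 + sign / np.sqrt(2))
        box[(x, y)] = probs
    return box

def marginal_maps():
    """A_C: 4 x 16 matrices sending a distribution over (a0, a1, b0, b1) to context (x, y)."""
    assignments = list(itertools.product((0, 1), repeat=4))
    maps = {}
    for x, y in itertools.product((0, 1), repeat=2):
        a_c = np.zeros((4, len(assignments)))
        for col, (a0, a1, b0, b1) in enumerate(assignments):
            a, b = (a0, a1)[x], (b0, b1)[y]
            a_c[2 * a + b, col] = 1.0
        maps[(x, y)] = a_c
    return maps

def contextual_divergence(box, maps, weights, iters=5000):
    """Minimize sum_C mu_C KL(p_C || A_C lam) over global distributions lam; result in bits."""
    contexts = list(box)
    p = np.concatenate([weights[c] * box[c] for c in contexts])
    m = np.vstack([weights[c] * maps[c] for c in contexts])   # columns sum to 1
    lam = np.full(m.shape[1], 1.0 / m.shape[1])
    for _ in range(iters):
        lam = lam * (m.T @ (p / (m @ lam)))                    # multiplicative EM-style update
    g = m @ lam
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / g[mask])))

box = chsh_quantum_box()
maps = marginal_maps()
weights = {c: 0.25 for c in box}
phi_bell = contextual_divergence(box, maps, weights)
print("Phi for the assumed Bell-state CHSH box (bits):", phi_bell)
```

Under these assumptions the iteration settles at a strictly positive value of a few hundredths of a bit, while a box that already lies in $\mathrm{NC}$ (for example the uniform one) returns zero.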

6.3. Qutrit Kochen–Specker (18-Vector) Set

Peres’ minimal 18-projector construction yields a state-independent proof in d = 3 [44]. The measurement cover has 18 rank-1 projectors grouped into 9 orthonormal triads C k .

6.3.1. Local Born Weights

For any qutrit state $\rho$ the Petz map dephases in each triad basis, giving probabilities $p_{ki} = \langle v_{ki} | \rho | v_{ki} \rangle$.

6.3.2. Gluing Obstruction

Because each projector appears in exactly two contexts, assigning $\{0,1\}$ values that sum to one per triad leads to a parity contradiction. The Čech cocycle therefore never vanishes, independent of $\rho$.

6.3.3. Analytic Metric Gap

Using the convex program
$$C_{\mathrm{rel}}(\rho) = \min_{\sigma} S(\rho\,\|\,\sigma) \quad \text{s.t.} \quad \operatorname{Tr}(P_{ki}\,\sigma) = x_{ki}, \quad x_{k1} + x_{k2} + x_{k3} = 1,$$
one finds $C_{\mathrm{rel}}(\rho) \ge \log\tfrac{4}{3}$ for the maximally mixed state, a strictly positive, state-independent contextuality gap.

6.4. Three-Qubit GHZ Paradox

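The GHZ state
$$|\mathrm{GHZ}\rangle = \tfrac{1}{\sqrt{2}}\bigl(|000\rangle + |111\rangle\bigr)$$
exhibits maximal contradiction among four commuting stabilizer contexts:
$$C_1 = \bigl\{\sigma_x \otimes \sigma_x \otimes \sigma_x,\ \ \sigma_z \otimes \sigma_z \otimes \mathbb{1},\ \ \sigma_z \otimes \mathbb{1} \otimes \sigma_z,\ \ \mathbb{1} \otimes \sigma_z \otimes \sigma_z\bigr\},$$
cyclically permuted to $C_4$ [45].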

6.4.1. Local Projections

Dephasing $\rho_{\mathrm{GHZ}}$ in each $C_i$ yields Born weights with perfect correlations (e.g., $\langle \sigma_x^{\otimes 3} \rangle = +1$ while the product of the three $\sigma_z \otimes \sigma_z \otimes \mathbb{1}$-type observables equals $1$).
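The perfect correlations quoted for the first context can be checked numerically. This is a minimal sketch restricted to the four observables of $C_1$ as listed above; the dictionary keys are just labels chosen here.

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

def kron_all(*ops):
    """Tensor product of single-qubit operators, leftmost factor first."""
    return reduce(np.kron, ops)

ghz = np.zeros(8)
ghz[0] = ghz[7] = 1.0 / np.sqrt(2.0)   # (|000> + |111>) / sqrt(2)

context_1 = {
    "XXX": kron_all(X, X, X),
    "ZZ1": kron_all(Z, Z, I2),
    "Z1Z": kron_all(Z, I2, Z),
    "1ZZ": kron_all(I2, Z, Z),
}

expectations = {name: float(ghz @ op @ ghz) for name, op in context_1.items()}
pairwise_commute = all(
    np.allclose(a @ b, b @ a) for a in context_1.values() for b in context_1.values()
)
zz_product = context_1["ZZ1"] @ context_1["Z1Z"] @ context_1["1ZZ"]

print(expectations)                          # every value equals +1 for the GHZ state
print(pairwise_commute)                      # the four observables commute
print(np.allclose(zz_product, np.eye(8)))    # product of the three ZZ-type observables is the identity
```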

6.4.2. Čech Obstruction & No-Sign Problem

The four contexts overlap pairwise in non-trivial subalgebras. Computing the product of assigned eigenvalues around the Čech 2-cycle gives $-1$, so no classical section exists.

6.4.3. Quantitative Contextuality

The relative entropy cost to the closest non-contextual distribution equals two bits for the perfect GHZ correlations:
$$C_{\mathrm{rel}}(\rho_{\mathrm{GHZ}}) = 2\ \text{bits},$$
matching the theoretical maximum for three dichotomic observables [23].

6.5. Numerical Illustration: Contextuality vs. Entanglement in the Magic-Square Cover

To complement our analytic results, we carried out a synthetic experiment on the two-qubit “magic-square” measurement cover to track how the global contextuality cost grows as the state’s entanglement increases. We parametrize a family of pure states
$$|\psi(\theta)\rangle = \cos\theta\,|00\rangle + \sin\theta\,|11\rangle, \qquad \theta \in [0, \tfrac{\pi}{4}],$$
whose local entanglement entropy
$$S_{\mathrm{ent}}(\theta) = -\cos^2\theta\,\log_2 \cos^2\theta - \sin^2\theta\,\log_2 \sin^2\theta$$
runs from 0 bits (product state) to 1 bit (maximally entangled).

6.5.1. Procedure

  • Contexts. We use the standard Mermin–Peres square: three “row” MASAs $\{Z \otimes I,\ I \otimes Z\}$, $\{I \otimes X,\ X \otimes I\}$, $\{Z \otimes X,\ X \otimes Z\}$ and three “column” MASAs $\{Z \otimes I,\ I \otimes X\}$, $\{I \otimes Z,\ X \otimes I\}$, $\{Z \otimes Z,\ X \otimes X\}$.
  • Joint probabilities. For each context $C$ and each $\theta$, we compute
    $$p^{C}_{s_1, s_2}(\theta) = \operatorname{Tr}\bigl[P^{C}_{s_1, s_2}\, |\psi(\theta)\rangle\langle\psi(\theta)|\bigr],$$
    where $P^{C}_{s_1, s_2} = \tfrac{1}{4}(\mathbb{1} + s_1 O_1)(\mathbb{1} + s_2 O_2)$ projects onto the joint eigenspace of the two commuting Pauli generators $O_1, O_2$ with eigenvalues $s_1, s_2 \in \{\pm 1\}$.
  • Contextuality proxy. As a proof-of-concept, we define
    $$\tilde{\Phi}(\theta) = \sum_{C} D_{\mathrm{KL}}\bigl(p^{C}(\theta)\,\|\,p^{C}_{(1)}(\theta) \otimes p^{C}_{(2)}(\theta)\bigr),$$
    i.e., the sum of per-context Kullback–Leibler divergences between each joint distribution and the product of its one-marginals. By construction $\tilde{\Phi} = 0$ for product states and increases with inter-observable correlations.
  • Sweep and plot. We sampled $\theta$ at 60 evenly spaced points in $[0, \tfrac{\pi}{4}]$, computed $S_{\mathrm{ent}}(\theta)$ and $\tilde{\Phi}(\theta)$, and plotted one against the other.
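The procedure above can be sketched in code as follows. This is a reconstruction from the listed steps, not the script used for the figure; function names, the base-2 logarithm, and the numerical cut-offs are assumptions.

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
k = np.kron

# Generator pairs for the six contexts, as listed in the procedure.
CONTEXTS = {
    "row1": (k(Z, I2), k(I2, Z)),
    "row2": (k(I2, X), k(X, I2)),
    "row3": (k(Z, X), k(X, Z)),
    "col1": (k(Z, I2), k(I2, X)),
    "col2": (k(I2, Z), k(X, I2)),
    "col3": (k(Z, Z), k(X, X)),
}

def state(theta):
    """|psi(theta)> = cos(theta)|00> + sin(theta)|11> as a real 4-vector."""
    psi = np.zeros(4)
    psi[0], psi[3] = np.cos(theta), np.sin(theta)
    return psi

def joint(psi, o1, o2):
    """2 x 2 table of probabilities for eigenvalues (s1, s2) in {+1, -1}^2."""
    p = np.empty((2, 2))
    for i, s1 in enumerate((1, -1)):
        for j, s2 in enumerate((1, -1)):
            proj = 0.25 * (np.eye(4) + s1 * o1) @ (np.eye(4) + s2 * o2)
            p[i, j] = psi @ proj @ psi
    return np.clip(p, 0.0, None)

def kl_to_product_bits(p):
    """KL divergence (bits) between a joint table and the product of its marginals."""
    prod = np.outer(p.sum(axis=1), p.sum(axis=0))
    mask = p > 1e-12
    return float(np.sum(p[mask] * np.log2(p[mask] / prod[mask])))

def proxy(theta):
    psi = state(theta)
    return sum(kl_to_product_bits(joint(psi, o1, o2)) for o1, o2 in CONTEXTS.values())

def entanglement_entropy_bits(theta):
    probs = np.array([np.cos(theta) ** 2, np.sin(theta) ** 2])
    probs = probs[probs > 1e-12]
    return float(-np.sum(probs * np.log2(probs)))

thetas = np.linspace(0.0, np.pi / 4, 60)
curve = [(entanglement_entropy_bits(t), proxy(t)) for t in thetas]
print(curve[0], curve[-1])   # endpoints: product state and maximally entangled state
```

At the maximally entangled endpoint the three row contexts each contribute one bit and the three column contexts contribute nothing, which accounts for the total of about 3 bits quoted in the results.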

6.5.2. Results

The curve shown in Figure 1 is strictly increasing and convex-looking. At $\theta = 0$, $|\psi\rangle$ is separable and $\tilde{\Phi} \approx 0$. As $\theta$ approaches $\tfrac{\pi}{4}$, the two qubits develop stronger correlations in every context, driving $\tilde{\Phi}$ up to roughly 3 bits of summed mutual information.

6.5.3. Discussion

  • Although $\tilde{\Phi}$ is only a proxy for the true global cost $\Phi$, it already captures the hallmark trend: no entanglement ⇒ no contextual correlations; more entanglement ⇒ more contextuality cost.
  • Replacing the product-of-marginals by the exact noncontextual assignments $g_C^\star$ (via a small convex program) yields the rigorous $\Phi(\theta)$, which will follow the same monotonic shape but sit uniformly above $\tilde{\Phi}$.
  • This numerical demonstration reinforces our variational framework: entanglement is a resource for contextuality, with the latter rising smoothly as one “turns on” quantum correlations in the magic-square cover.

6.6. Take-Aways

  • Complementarity (Section 6.1) shows that the variational principle reduces to ordinary dephasing when contexts do not overlap.
  • Magic-square contextuality (Section 6.2) demonstrates how Born-rule weights can be locally optimal yet globally obstructed.
  • State-independent KS (Section 6.3) underlines that the obstruction can survive every possible state, emphasizing the lattice, not the state.
  • GHZ paradox (Section 6.4) illustrates maximal contextual “distance” and provides a benchmark where the entropy-of-contextuality attains its upper bound.
  • Two-qubit magic-square simulation (Section 6.5) tracks a proxy contextuality cost versus entanglement, confirming that contextual divergence grows monotonically with entanglement.
Together these worked examples make the abstract sheaf-theoretic and information geometric ideas tangible, and confirm that the Born rule emerges as the unique least disturbance probability assignment in every non-trivial scenario we can analyze analytically.

7. Philosophical Reverberations

  • From axiom to rule-of-reason. Elevating the Born formula from a postulate to the unique minimizer of an information-geometric variational problem anchors quantum probability in the same rational-update logic that underlies classical Bayesian inference. As with Jaynes’ maximum-entropy principle, the “dice” nature seems to disappear; we merely adopt the least-disturbing classical portrait that any context allows. In this light the trace rule becomes a normative prescription on agents confronted with incompatible frames, resonating with the subjective-Bayesian spirit of QBism yet grounded in an objective optimization over state space [46].
  • Relational ontology made precise. Rovelli’s relational quantum mechanics asserts that physical quantities obtain values only relative to an interaction, not in vacuo [28]. Our framework realises that creed mathematically: a density matrix has meaning only inside a maximal abelian sub-algebra; probabilities are coordinates in that chart. No “view from nowhere” survives, because a global, chart-independent distribution is blocked by the Čech cocycle of contextuality.
  • Relational perspectivalism made quantitative. The sheaf-theoretic obstruction already denies a view-from-nowhere: there need not exist a single global section compatible with all contexts. The divergence Φ ( ρ ) strengthens this statement by assigning a magnitude to that failure. Relationality thus becomes a quantitative law: how far one must move to glue all local perspectives into a single classical narrative.
  • Contextuality as intrinsic curvature. Abramsky and Brandenburger first cast contextuality as the obstruction to a global section of a measurement sheaf [19]. We show that this obstruction is not merely logical but metric: the bundle of classical charts is twisted in such a way that any attempt to flatten it incurs a strictly positive entropy cost. In analogy with gauge theory, where curvature measures the failure of local trivializations to mesh, contextuality is the “field strength” of quantum probability. Philosophers who argue that gauge potentials encode real holism rather than surplus structure will recognise the parallel [47,48].
  • Epistemic–ontic unification. The same relative-entropy functional that tells an observer how to compress her expectations also quantifies the ontic impossibility of a non-contextual hidden-variable model. Hence the epistemic (agent-centred) and ontic (world-centred) aspects of quantum theory are not two realms but two facets of one geometric object. Spekkens’ operational contextuality criterion—originally couched in ontological-model language—fits seamlessly into this picture when rephrased as a distance to the non-contextual polytope [49].
  • Non-classicality hierarchies converge. Work equating Wigner-function negativity with contextuality suggests that many signatures of “quantumness” are different cuts of the same topological cloth [50]. By deriving probabilities from a divergence to the non-contextual set, our framework subsumes negativity, entanglement phases and measurement incompatibility into a single resource metric—hinting at a unified taxonomy of quantum resources.
  • Rehabilitating structural realism. If properties exist only as chart-dependent relational structures, then what is real are precisely those structural relations—class-to-class transition maps and their curvature. This echoes the structural realist stance that takes morphisms, not objects, as primitive. Quantum foundations thus align with modern philosophy of science, where laws manifest as constraints on possible relational structures rather than as intrinsic traits of isolated systems.
  • Prospects for a gauge-theoretic language of measurement. Viewing Born-rule assignment as a choice of local gauge, while contextuality plays the role of curvature, opens the door to exporting the rich toolkit of fibre-bundle mathematics into quantum foundations. Categories, connections and holonomies may become the natural dialect for future debates about “where the weirdness lives,” replacing the venerable but limited particle–wave and ontology–epistemology binaries.
Together, these reflections recast quantum mechanics as a geometrically ordered, relationally woven fabric in which chance and incompatibility arise not from hidden variables or observer caprice, but from the irreducible twist of the classical charts through which any observer must gaze.

8. Conclusions

In this work we show that the Born rule is not an independent postulate but the output of a two-stage variational principle. Locally, for each measurement context (a MASA), the Umegaki relative entropy together with Petz’s Pythagorean identity forces the dephasing $E_C(\rho)$ as the unique information projection of $\rho$ onto the classical face $\mathcal{S}(C)$; the resulting diagonals are precisely the Born weights $p_C(i) = \operatorname{Tr}(\rho P_i)$, obtained rather than assumed. Globally, we compare the Born bundle $\{p_C(\rho)\}$ to the noncontextual set $\mathrm{NC}$ and define the contextual divergence
$$\Phi(\rho) = \min_{g \in \mathrm{NC}} \sum_C \mu_C\, D_{\mathrm{KL}}\bigl(p_C(\rho)\,\|\,g_C\bigr).$$
When $\rho$ is noncontextual, the minimizer matches Born on every context and $\Phi(\rho) = 0$; when $\rho$ is contextual, no exact global section exists, and the minimizer deviates minimally from Born to satisfy the global consistency constraints.
Our finite-dimensional analysis rests on three pillars. (1) Quantification: logical (sheaf) obstruction becomes an operational cost via $\Phi(\rho)$. (2) Local uniqueness: in each context the Petz identity reduces the problem to a classical KL on the diagonal, yielding a unique dephased minimizer under full support. (3) Global optimality: no alternative assignment, even when constrained to be noncontextual, can achieve a lower total divergence than the classical I-projection of the Born bundle. The framework extends to degenerate PVMs (block dephasing) and to POVMs via Naimark dilation: the constrained minimizer is the quantum exponential-family state $\exp\bigl(\log \rho - \sum_i \lambda_i E_i\bigr)/Z$, with Lüders’ map appearing only in the projective case. A companion appendix gives the convex-optimization details (existence, full KKT conditions handling zeros, and uniqueness of the optimal marginals); under informational completeness and noncontextuality these marginals identify $\rho$ uniquely.
In short, once one insists on least-disturbing classical shadows locally and a best possible classical glue globally, the trace-form probabilities are compelled. Contextuality then has a quantitative meaning: the minimal information cost Φ ( ρ ) of forcing a single classical narrative for inherently relational quantum data. No other assignment simultaneously minimizes information loss and remains as classical as the scenario allows.
Philosophically, our variational perspective recasts quantum probabilities as rational updates—the least-informative inferences compatible with each observer’s measurement frame—while embedding relational quantum mechanics and sheaf-cohomological contextuality in a common information-geometric language. Contextuality itself is revealed to be a kind of curvature in the fiber bundle of classical charts, and the Born rule the only flat connection that minimally disturbs the quantum state. Technically, this unifies disparate threads—categorical classical structures, resource-theoretic monotones, and operational reconstructions—under the umbrella of entropy-minimization, suggesting that negativity, incompatibility and entanglement may all be facets of one geometric resource.
Looking ahead, three directions seem especially promising. (i) Beyond finite dimension. Our proofs assume finite d and faithful states to ensure uniqueness and compactness. Extending to separable Hilbert spaces with normal, faithful states would replace Umegaki’s finite-dimensional S ( · · ) by its von Neumann–algebra analogue (Araki relative entropy) and use conditional expectations onto von Neumann subalgebras (via Petz/Takesaki). The POVM case is already structurally covered by Naimark dilation; the genuinely new work is handling domain issues (supports, l.s.c.) and measurability in the infinite-dimensional setting. (ii) Protocol-level implications. Because relative entropy governs optimal error exponents (quantum Stein), Φ ( ρ ) naturally lower-bounds the asymptotic penalty for simulating the Born bundle with any noncontextual model. This invites concrete bounds in device-independent randomness certification, calibrated classical-simulation overheads (sample complexity/regret), and conservative benchmarks related to contextuality-powered computation (e.g., magic-state distillation)—with the caveat that translating Φ to thresholds requires specifying the free operations and noise models. (iii) Geometry and “gauge.” The sheaf obstruction shows there is no global potential; our divergence Φ quantifies the failure to patch local I-projections. A careful “gauge-theoretic” reading—contexts as local trivializations, overlap data as transition functions, and Φ as an obstruction functional—is tempting. At present this is a suggestive analogy; making it precise would mean identifying a connection/holonomy picture compatible with KL/Bregman geometry rather than asserting literal curvature of space-time.
Above all, the lesson is methodological: if one demands locally least-disturbing classical shadows (the dephased E C ( ρ ) from minimizing Umegaki divergence) and then glues them by a global I-projection onto the noncontextual set, the resulting per-context probabilities are forced to be the Born weights p C ( i ) = Tr ( ρ P i ) . Contextuality is what remains—the quantified obstruction to a single classical narrative. This reverses the usual question. Rather than “why is quantum theory contextual?”, ask: given contextual data, what is the least-biased noncontextual approximation? The two-stage variational answer singles out the Born rule under our stated assumptions (finite d, faithful states, rank-1 PVMs; with degenerate PVMs/POVMs handled via dephasing/Naimark). We expect this reframing to inform both foundational debates and pedagogy by turning contextuality from a slogan into a calculus.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author because the research is ongoing.

Acknowledgments

The core concepts, theoretical constructs, and novel arguments presented in this article are a synthesis and concretization of my original ideas. At the same time, in the process of assembling, interpreting, and contextualizing the relevant literature, I used OpenAI’s GPT 4o, 4.5, o3 and o4 as tools to help organize, clarify, and refine my understanding of existing research. In addition, I used reasoning models from OpenAI (San Francisco, CA, USA) and sought their assistance in refining the presentation of the text and the mathematics. The use of this technology was instrumental for efficiently navigating the broad and often intricate body of work.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
| Abbreviation / symbol | Meaning |
|---|---|
| CPTP | Completely positive, trace-preserving map (quantum channel) |
| POVM | Positive operator-valued measure |
| MASA | Maximal abelian self-adjoint algebra (projective measurement context) |
| RQM | Relational quantum mechanics |
| RQD | Relational quantum dynamics |
| $S(\rho \Vert \sigma)$ | Umegaki relative entropy (quantum KL; strictly convex in first argument) |
| $D_{\mathrm{KL}}(p \Vert q)$ | Classical Kullback–Leibler divergence |
| $D_{\mathrm{JS}}^{Q}$ | Quantum Jensen–Shannon divergence (symmetric; $\sqrt{D_{\mathrm{JS}}^{Q}}$ is a metric) |
| $\mathcal{S}(C)$ | Classical state space for context $C$ (states diagonal in $\mathrm{alg}\{P_i\}$) |
| $E_C(\rho)$ | Conditional expectation (dephasing) of $\rho$ onto context $C$ |
| $\Pi_C$ | Conditional expectation map onto algebra $C$ (pinching operator) |
| $\Phi(\rho)$ | Contextual divergence (min. weighted KL to $\mathrm{NC}$; $= 0$ iff noncontextual) |
| $\mu_C$ | Positive weight for context $C$ in the sum defining $\Phi$ ($\sum_C \mu_C = 1$) |
| $p_C(\rho)$, $g_C$ | Born probabilities $p_C(i;\rho) = \mathrm{Tr}(\rho P_i)$; model marginals $g_C$ in context $C$ |
| $\mathrm{NC}$ | Noncontextual polytope (empirical models admitting a global joint/section) |
| $\mathcal{M}$ | Measurement cover (family of contexts $C \subseteq X$) |
| $A_C$ | Marginalization map from global $g$ to $g_C$ (so $g_C = A_C g$) |
| $\mathcal{O}_{EF}$ | Overlap algebra generated by POVMs $E$ and $F$ |
| $M_E$ | Classical measurement channel of POVM $E$ (outputs $\mathrm{Tr}(\rho E_i)$ on basis $\lvert i \rangle$) |

Appendix A. Degenerate & POVM Contexts Survive Naimark Dilation

Appendix A.1. Preliminaries and Notation

Let $E = \{E_i\}_{i=1}^{m}$ be a POVM on a finite-dimensional Hilbert space $\mathcal{H}$ with $\sum_i E_i = \mathbb{1}_{\mathcal{H}}$. By Naimark’s theorem there exist an ancilla $K$, an isometry $V : \mathcal{H} \to \hat{\mathcal{H}} := \mathcal{H} \otimes K$, and a commuting family of orthogonal projections $\hat{P}_i$ such that $E_i = V^{\dagger} \hat{P}_i V$ [51]. Denote by $\hat{C} = \mathrm{alg}\{\hat{P}_i\} \subseteq B(\hat{\mathcal{H}})$ the resulting MASA and by
$$\Pi_{\hat{C}}(X) = \sum_i \hat{P}_i X \hat{P}_i \tag{A1}$$
the (trace-preserving) conditional expectation (a.k.a. pinching) onto $\hat{C}$ [26,52]. Define the classical measurement channel
$$M_E(\rho) = \sum_{i=1}^{m} \mathrm{Tr}(\rho E_i)\, |i\rangle\langle i|, \tag{A2}$$
whose adjoint is
$$M_E^{\dagger}\big(\mathrm{diag}(x_1, \dots, x_m)\big) = \sum_i x_i E_i. \tag{A3}$$
We use the Umegaki relative entropy $S(\tau\|\rho) = \mathrm{Tr}[\tau(\log\tau - \log\rho)]$, which is invariant under isometries, monotone under CPTP maps, and strictly convex in its first argument [25,53].
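As a concrete illustration of these preliminaries, the following minimal sketch builds one standard Naimark dilation, $V = \sum_i \sqrt{E_i} \otimes |i\rangle$ with $\hat{P}_i = \mathbb{1} \otimes |i\rangle\langle i|$, for a qubit trine POVM and checks the identities quoted above numerically. The trine POVM, the test state, and the helper names `psd_sqrt` and `naimark_dilation` are illustrative assumptions, not taken from the article; other dilations exist.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # remove tiny negative round-off
    return (v * np.sqrt(w)) @ v.conj().T

def naimark_dilation(povm):
    """Return (V, projectors) with V = sum_i sqrt(E_i) (x) |i> and P_i = 1 (x) |i><i|."""
    m, d = len(povm), povm[0].shape[0]
    basis = np.eye(m)
    V = sum(np.kron(psd_sqrt(E), basis[:, [i]]) for i, E in enumerate(povm))
    projectors = [np.kron(np.eye(d), np.outer(basis[i], basis[i])) for i in range(m)]
    return V, projectors

# Illustrative qubit trine POVM: E_k = (1/3)(1 + n_k . sigma), n_k in the x-z plane.
I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
angles = [0.0, 2 * np.pi / 3, 4 * np.pi / 3]
povm = [(I2 + np.sin(a) * SX + np.cos(a) * SZ) / 3 for a in angles]

# Illustrative full-rank state with Bloch vector (0.3, 0.2, 0.5).
rho = 0.5 * (I2 + 0.3 * SX + 0.2 * SY + 0.5 * SZ)

V, projectors = naimark_dilation(povm)
rho_hat = V @ rho @ V.conj().T  # dilated state V rho V^dagger

# Born probabilities before and after dilation.
p_povm = np.array([np.trace(rho @ E).real for E in povm])
p_dilated = np.array([np.trace(rho_hat @ P).real for P in projectors])

# Pinching of the dilated state onto alg{P_i}, as in Equation (A1).
pinched = sum(P @ rho_hat @ P for P in projectors)
p_pinched = np.array([np.trace(pinched @ P).real for P in projectors])
```

The vector `p_povm` is the diagonal of the classical channel output $M_E(\rho)$; it agrees with both `p_dilated` and `p_pinched`, which is the sense in which the dilation turns a POVM context into a projective one without changing the statistics.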

Appendix A.2. KL Projection with Fixed POVM Statistics

Appendix A.2.1. Problem

Given $\rho \in D(\mathcal{H})$, minimize the change to $\rho$ measured by relative entropy while preserving the observed POVM statistics:
$$\min_{\tau \in D(\mathcal{H})} S(\tau\|\rho) \quad \text{s.t.} \quad M_E(\tau) = M_E(\rho) \;\iff\; \mathrm{Tr}(\tau E_i) = \mathrm{Tr}(\rho E_i) \;\; \forall i. \tag{A4}$$
Theorem A1
(Minimum-change state under fixed POVM outcomes). The unique solution of Equation (A4) has the exponential-family form
$$\tau^{\star} = \frac{\exp\!\big(\log\rho - \sum_{i=1}^{m} \lambda_i E_i\big)}{\mathrm{Tr}\,\exp\!\big(\log\rho - \sum_{i=1}^{m} \lambda_i E_i\big)}, \quad \text{with } \mathrm{Tr}(\tau^{\star} E_i) = \mathrm{Tr}(\rho E_i) \;\; \forall i, \tag{A5}$$
for a (generally unique) vector of Lagrange multipliers $(\lambda_i)_i$. Special cases. (i) If $E_i = P_i$ are orthogonal projectors (a PVM), then $\tau^{\star} = \sum_i \mathrm{Tr}(\rho P_i) P_i$ (conditional expectation onto the MASA). (ii) If all $E_i$ commute, $\tau^{\star}$ belongs to the abelian algebra $\mathrm{alg}\{E_i\}$ and Equation (A5) reduces to a classical exponential family.
Proof sketch. 
Form the Lagrangian $L(\tau, \lambda) = S(\tau\|\rho) + \sum_i \lambda_i \big[\mathrm{Tr}(\tau E_i) - \mathrm{Tr}(\rho E_i)\big]$. Stationarity on the manifold of density operators yields the exponential form Equation (A5); strict convexity of $S(\cdot\,\|\rho)$ in its first argument gives uniqueness on the common support [27,54]. The PVM case follows because the constraints commute and the minimizer collapses to the conditional expectation (pinching) [26]. □

Appendix A.2.2. Remark (Instruments vs. Variational Projection)

The completely positive map
$$L_E(\rho) = \sum_i \sqrt{E_i}\, \rho\, \sqrt{E_i} \tag{A6}$$
gives the (non-selective) post-measurement state of the canonical Lüders instrument for $E$. It generally does not solve Equation (A4) (nor preserve the $E$-statistics except in special cases); it coincides with the minimizer $\tau^{\star}$ only for PVMs (and certain commutative situations). See the SIC example in Appendix A.6 below.

Appendix A.3. Quantum Jeffrey Updates Between Contexts

Let $E = \{E_i\}$ and $F = \{F_j\}$ be POVMs. Write the overlap algebra
$$\mathcal{O}_{EF} = \mathrm{alg}\{E_i, F_j\}.$$
There are two natural (and distinct) variational updates:
(A)
Preserve all expectations on the overlap algebra. Minimize $S(\tau\|\rho)$ subject to $\mathrm{Tr}(\tau X) = \mathrm{Tr}(\rho X)$ for all $X$ in a generating set of $\mathcal{O}_{EF}$. Then
$$\tau^{\star} = \frac{\exp(\log\rho - \Lambda)}{\mathrm{Tr}\,\exp(\log\rho - \Lambda)}, \quad \Lambda \in \mathcal{O}_{EF} \text{ chosen to match the constraints}.$$
When $\mathcal{O}_{EF}$ is abelian (e.g., PVMs), $\tau^{\star}$ equals the conditional expectation (pinching) of $\rho$ onto $\mathcal{O}_{EF}$ [26].
(B)
Preserve the $F$-POVM outcome distribution. Minimize $S(\tau\|\rho)$ subject to $\mathrm{Tr}(\tau F_j) = \mathrm{Tr}(\rho F_j)$ for all $j$. The solution is the exponential family Equation (A5) with $\{E_i\}$ replaced by $\{F_j\}$.
In either case, Naimark dilations can be used to realize the constraints with commuting projectors, but the minimizer is still of exponential form; it is not generally the Lüders channel $\sum_j \sqrt{F_j}\, \rho\, \sqrt{F_j}$ unless $F$ is a PVM. See also [55,56].

Appendix A.4. Global Contextuality Divergence Is Naimark-Stable

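For a POVM cover $\{E^{\alpha}\}_{\alpha \in \mathcal{C}}$ define (as in the main text)
$$\Phi(\rho) = \min_{g \in \mathrm{NC}} \sum_{\alpha} \mu_{\alpha}\, D_{\mathrm{KL}}\big(p_{\alpha}(\rho)\,\big\|\, g_{\alpha}\big),$$
where $p_{\alpha}(\rho)$ are the Born probabilities for context $\alpha$. If $V$ is any Naimark isometry dilating all contexts to projective measurements $\{\hat{P}^{\alpha}\}$, then
$$\Phi_{\{E^{\alpha}\}}(\rho) = \Phi_{\{\hat{P}^{\alpha}\}}\big(V \rho V^{\dagger}\big),$$
because Born probabilities are preserved by dilation ($\mathrm{Tr}(\rho E_i^{\alpha}) = \mathrm{Tr}(V \rho V^{\dagger} \hat{P}_i^{\alpha})$) and $\Phi$ depends only on those probabilities. Hence all results stated for PVM covers apply verbatim to POVM covers.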

Appendix A.5. Degenerate Projectors

If a projective context contains degenerate spectral projectors $P_k$ (rank $> 1$), the conditional expectation is the block pinching
$$E_P(\rho) = \sum_k P_k\, \rho\, P_k,$$
which is basis-independent within each block and remains the unique minimizer of $S(\tau\|\rho)$ over the block-diagonal algebra (strict convexity argument unchanged) [57,58].
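The Pythagorean identity invoked in the abstract can be checked numerically for a degenerate context. The sketch below is an illustration under stated assumptions: it uses a qutrit with one rank-2 and one rank-1 projector, random full-rank states, and the argument order of the identity with the state in the first slot and a block-diagonal candidate $\sigma$ in the second, i.e. $S(\rho\|\sigma) = S(\rho\|E_P(\rho)) + S(E_P(\rho)\|\sigma)$. All function names are hypothetical helpers.

```python
import numpy as np

def herm_log(m):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy(a, b):
    """Umegaki relative entropy S(a||b) = Tr[a (log a - log b)] in nats (full-rank inputs)."""
    return float(np.real(np.trace(a @ (herm_log(a) - herm_log(b)))))

def random_state(dim, rng):
    """Random full-rank density matrix (normalized complex Wishart matrix)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    m = g @ g.conj().T
    return m / np.trace(m).real

def pinch(x, projectors):
    """Block pinching x -> sum_k P_k x P_k."""
    return sum(p @ x @ p for p in projectors)

rng = np.random.default_rng(0)

# Degenerate qutrit context: one rank-2 block and one rank-1 block.
projectors = [np.diag([1.0, 1.0, 0.0]), np.diag([0.0, 0.0, 1.0])]

rho = random_state(3, rng)
rho_pinched = pinch(rho, projectors)            # E_P(rho)
sigma = pinch(random_state(3, rng), projectors)  # arbitrary block-diagonal state

lhs = rel_entropy(rho, sigma)
rhs = rel_entropy(rho, rho_pinched) + rel_entropy(rho_pinched, sigma)
print(f"S(rho||sigma) = {lhs:.6f}, Pythagorean sum = {rhs:.6f}")
```

Because $S(E_P(\rho)\|\sigma) \ge 0$ with equality only at $\sigma = E_P(\rho)$, the identity shows that among block-diagonal $\sigma$ the pinched state is the closest one to $\rho$ in this argument order.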

Appendix A.6. Illustrative Toy Example: Qubit Tetrahedral SIC

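For the symmetric-informationally-complete POVM $E = \{\tfrac{1}{4}(\mathbb{1} + \mathbf{n}_i \cdot \boldsymbol{\sigma})\}_{i=1}^{4}$ and a qubit state $\rho = \tfrac{1}{2}(\mathbb{1} + \mathbf{r} \cdot \boldsymbol{\sigma})$, the Lüders channel Equation (A6) gives
$$L_E(\rho) = \sum_{i=1}^{4} \frac{1}{4}\,(1 + \mathbf{n}_i \cdot \mathbf{r})\, \frac{\mathbb{1} + \mathbf{n}_i \cdot \boldsymbol{\sigma}}{2} = \frac{1}{2}\Big(\mathbb{1} + \frac{1}{3}\, \mathbf{r} \cdot \boldsymbol{\sigma}\Big),$$
i.e., the Bloch vector shrinks by a factor of $1/3$ [59]. This does not preserve the SIC outcome probabilities unless $\rho$ is maximally mixed, so $L_E(\rho)$ is not the minimizer of Equation (A4). The true minimizer is the exponential-family state Equation (A5) with multipliers $(\lambda_i)$ determined by the four linear constraints $\mathrm{Tr}(\tau E_i) = \mathrm{Tr}(\rho E_i)$.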
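The $1/3$ shrink can be verified directly. The following sketch, which assumes one particular orientation of the tetrahedron and an arbitrary test Bloch vector (both illustrative choices), applies $\rho \mapsto \sum_i \sqrt{E_i}\,\rho\,\sqrt{E_i}$ and compares Bloch vectors and SIC statistics before and after.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = (SX, SY, SZ)

def dot_sigma(vec):
    """Return vec . sigma for a real 3-vector."""
    return sum(c * s for c, s in zip(vec, PAULIS))

def bloch_vector(rho):
    """Bloch vector (Tr rho sigma_x, Tr rho sigma_y, Tr rho sigma_z)."""
    return np.array([np.trace(rho @ s).real for s in PAULIS])

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

# Tetrahedral directions (one illustrative orientation); they sum to zero.
tetra = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)
sic = [0.25 * (I2 + dot_sigma(n)) for n in tetra]

# Illustrative test state.
r = np.array([0.3, -0.4, 0.5])
rho = 0.5 * (I2 + dot_sigma(r))

# Lueders channel of Equation (A6).
luders = sum(psd_sqrt(E) @ rho @ psd_sqrt(E) for E in sic)

p_before = np.array([np.trace(rho @ E).real for E in sic])
p_after = np.array([np.trace(luders @ E).real for E in sic])

print("Bloch vector after Lueders map:", bloch_vector(luders))
print("SIC statistics before:", p_before)
print("SIC statistics after: ", p_after)
```

The output Bloch vector equals `r / 3`, and the two probability vectors differ, in line with the remark that the Lüders channel does not preserve the SIC statistics for a non-maximally-mixed state.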

Appendix B. Rigorous Variational Proof (Finite-Context Setting)

This appendix replaces the informal argument in Section 3 with a fully rigorous derivation that: (i) formulates the optimization over the noncontextual feasible set; (ii) establishes existence of a minimizer using lower-semicontinuity on a compact domain (no hand-waving); (iii) handles zero-probability coordinates via complete KKT conditions; (iv) proves uniqueness of the optimal marginals by strict convexity; and (v) identifies when the unique optimal marginals coincide with the Born distributions, and how informational completeness then yields the underlying state.

Appendix B.1. Setting and Notation

  • Hilbert space: $\mathcal{H} \cong \mathbb{C}^{d}$; density matrix $\rho \in D(\mathcal{H})$.
  • Context cover: $\mathcal{M} = \{C_1, \dots, C_m\}$, each $C_j = \{P_{j,1}, \dots, P_{j,d}\}$ a rank-1 PVM with $\sum_i P_{j,i} = \mathbb{1}$.
  • Born (context-wise) distributions: $p_{C_j}(i) := \mathrm{Tr}(\rho P_{j,i})$.
  • Deterministic global assignments: let $\Omega$ be the (finite) set of functions $\omega$ that assign to each context $C_j$ precisely one outcome index $\omega(j) \in \{1, \dots, d\}$ and are context-consistent on overlaps (if $P_{j,i} = P_{k,\ell}$, then $\omega(j) = i$ iff $\omega(k) = \ell$). (This is the usual “global section” set in the sheaf model; $\Omega$ may be empty in strongly contextual scenarios. In that case, one can enlarge the cover (e.g., add symmetric white-noise coarse-graining) to restore feasibility; here we assume $\Omega \neq \emptyset$ so the noncontextual set is nonempty.)
  • Noncontextual polytope (global variable): $\Delta_{\Omega} := \{g \in \mathbb{R}_{\ge 0}^{\Omega} : \sum_{\omega \in \Omega} g_{\omega} = 1\}$.
  • Context-wise marginals of $g$: for each $j$ and $i$,
    $$(A_j g)(i) := \sum_{\omega \in \Omega:\, \omega(j) = i} g_{\omega}, \quad \text{so } g_{C_j} := A_j g \in \Delta_{C_j}.$$
    Here $A_j$ is the marginalization matrix from $\mathbb{R}^{\Omega}$ to $\mathbb{R}^{d}$.
  • Weights: $\mu_j > 0$ with $\sum_j \mu_j = 1$.
  • Objective (global “glue” cost):
    $$F(g) := \sum_{j=1}^{m} \mu_j\, D_{\mathrm{KL}}\big(p_{C_j} \,\big\|\, A_j g\big) = \sum_{j=1}^{m} \sum_{i=1}^{d} \mu_j\, p_{C_j}(i) \ln \frac{p_{C_j}(i)}{(A_j g)(i)}.$$
We minimize $F$ over the noncontextual set $\Delta_{\Omega}$. This enforces both per-context normalization and global consistency (all $g_{C_j}$ are marginals of a single $g$).
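To make the objects in this list concrete, here is a minimal sketch for a toy cover of two qubit contexts that share no projector (so every pair of outcomes is a consistent global assignment). It builds the 0/1 marginalization matrices $A_j$ and evaluates the glue cost $F(g)$ with the usual KL conventions. The probability vectors, the weights, and the function names are illustrative assumptions; a cover with overlapping projectors would additionally require filtering $\Omega$ for consistency on overlaps, which is not shown here.

```python
import itertools
import numpy as np

def marginalization_matrices(omega, m, d):
    """A_j[i, w] = 1 if the assignment omega[w] selects outcome i in context j, else 0."""
    mats = []
    for j in range(m):
        A = np.zeros((d, len(omega)))
        for w, assignment in enumerate(omega):
            A[assignment[j], w] = 1.0
        mats.append(A)
    return mats

def kl(p, y):
    """Classical KL divergence with 0 log(0/y) = 0 and p log(p/0) = +inf for p > 0."""
    p, y = np.asarray(p, float), np.asarray(y, float)
    support = p > 0
    if np.any(y[support] == 0):
        return np.inf
    return float(np.sum(p[support] * np.log(p[support] / y[support])))

def glue_cost(g, born, mats, mu):
    """F(g) = sum_j mu_j KL(p_{C_j} || A_j g)."""
    return sum(w * kl(p, A @ g) for w, p, A in zip(mu, born, mats))

m, d = 2, 2
# No shared projectors: Omega is the full product set {0,1} x {0,1}.
omega = list(itertools.product(range(d), repeat=m))
mats = marginalization_matrices(omega, m, d)

# Illustrative Born distributions (e.g., Z and X statistics of some qubit state).
born = [np.array([0.7, 0.3]), np.array([0.8, 0.2])]
mu = [0.5, 0.5]

g_uniform = np.full(len(omega), 1.0 / len(omega))
g_product = np.outer(born[0], born[1]).ravel()   # matches the ordering of omega
g_point = np.array([1.0, 0.0, 0.0, 0.0])         # deterministic assignment (0, 0)

print("F(uniform) =", glue_cost(g_uniform, born, mats, mu))
print("F(product) =", glue_cost(g_product, born, mats, mu))
print("F(point)   =", glue_cost(g_point, born, mats, mu))
```

In this overlap-free toy case the product distribution reproduces both Born marginals, so its cost vanishes, while a deterministic assignment puts zero mass on outcomes the Born distribution supports and is assigned infinite cost.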

Appendix B.2. Existence of a Minimizer

We view $F$ as an extended-value function by the usual KL convention: $p \log \frac{p}{y}$ is $+\infty$ when $p > 0$ and $y = 0$, and $0$ when $p = 0$ and $y \ge 0$.
Lemma A1
(Lower-semicontinuity and coercivity on the boundary). For each $j$, the map $g \mapsto D_{\mathrm{KL}}(p_{C_j} \| A_j g)$ is lower-semicontinuous on $\Delta_{\Omega}$; moreover, it diverges to $+\infty$ along any sequence in $\Delta_{\Omega}$ for which $(A_j g)(i) \to 0$ with $p_{C_j}(i) > 0$.
Proof. 
The map $g \mapsto A_j g$ is linear and continuous. On $[0, 1]$, $y \mapsto p \log(p/y)$ is lower-semicontinuous for each fixed $p \ge 0$, with the stated extended value; composition with a continuous map preserves lower semicontinuity; finite nonnegative sums preserve it. The divergence claim is immediate from $p \log(p/y) \to +\infty$ as $y \downarrow 0$ for $p > 0$. □
Proposition A1
(Existence of an optimal noncontextual model). $F$ attains its minimum on $\Delta_{\Omega}$.
Proof. 
$\Delta_{\Omega}$ is a nonempty compact simplex. By Lemma A1, $F$ is lower-semicontinuous on $\Delta_{\Omega}$ and not identically $+\infty$ (e.g., take $g$ uniform on $\Omega$, which yields strictly positive marginals). Hence $F$ achieves its infimum by the Weierstrass theorem; see, e.g., Theorem 1.9 in [60] and Section 2.7 in [54]. □

Appendix B.3. Uniqueness of Optimal Marginals via Strict Convexity

Lemma A2
(Strict convexity in the marginals). For fixed $p \in \Delta_{C_j}$, the function $y \mapsto D_{\mathrm{KL}}(p \| y)$ is strictly convex on $\{y \in \Delta_{C_j} : y_i > 0 \text{ whenever } p_i > 0\}$. Consequently,
$$y := \big(y^{(j)}\big)_{j=1}^{m} \;\mapsto\; \sum_{j=1}^{m} \mu_j\, D_{\mathrm{KL}}\big(p_{C_j} \,\big\|\, y^{(j)}\big)$$
is strictly convex on the Cartesian product of those interiors.
Proof. 
The Hessian of $y \mapsto -\sum_i p_i \log y_i$ is $\mathrm{diag}(p_i / y_i^{2})$, positive definite on the stated domain. □
Proposition A2
(Uniqueness of optimal marginals). Let $g^{\star}$ minimize $F$ on $\Delta_{\Omega}$. Then the vector of marginals $\big(A_j g^{\star}\big)_{j=1}^{m}$ is unique. If two minimizers $g^{(1)}, g^{(2)}$ exist, they have identical marginals $A_j g^{(1)} = A_j g^{(2)}$ for all $j$ (they may differ along the affine fiber $\{g : A_j g = A_j g^{\star} \;\forall j\}$).
Proof. 
$F$ depends on $g$ only through $y := (A_j g)_j$. The image set $Y := \{(A_j g)_j : g \in \Delta_{\Omega}\}$ is convex and compact. By Lemma A2 the objective in $y$ is strictly convex on the relevant domain, hence admits a unique minimizer $y^{\star} \in Y$. Any two $g$ that minimize $F$ must map to the same $y^{\star}$. □

Appendix B.4. KKT Characterization with Zeros on the Support

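Write the Lagrangian with a single global variable $g \in \Delta_{\Omega}$:
$$L(g, \lambda, \nu) = \sum_{j=1}^{m} \mu_j \sum_{i=1}^{d} p_{C_j}(i) \ln \frac{p_{C_j}(i)}{(A_j g)(i)} + \lambda \Big(\sum_{\omega \in \Omega} g_{\omega} - 1\Big) - \sum_{\omega \in \Omega} \nu_{\omega}\, g_{\omega},$$
where $\lambda \in \mathbb{R}$ enforces $\mathbf{1}^{\top} g = 1$ and $\nu_{\omega} \ge 0$ enforce $g_{\omega} \ge 0$.
  • Stationarity
For any optimal $g^{\star}$ in the (relative) interior of its support,
$$\nabla_g L(g^{\star}, \lambda, \nu) = -\sum_{j=1}^{m} \mu_j\, A_j^{\top} \frac{p_{C_j}}{A_j g^{\star}} + \lambda \mathbf{1} - \nu = 0,$$
with the fraction taken componentwise. Coordinates $i$ with $p_{C_j}(i) = 0$ simply do not contribute to the gradient.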

Appendix B.4.1. Complementary Slackness and Feasibility

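$$\nu_{\omega}\, g^{\star}_{\omega} = 0 \;\; \forall \omega, \qquad \nu_{\omega} \ge 0, \qquad g^{\star}_{\omega} \ge 0, \qquad \sum_{\omega} g^{\star}_{\omega} = 1.$$
Moreover, $F(g^{\star}) < \infty$ forces support compatibility: $(A_j g^{\star})(i) = 0$ is allowed whenever $p_{C_j}(i) = 0$, but $(A_j g^{\star})(i) > 0$ is required whenever $p_{C_j}(i) > 0$.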
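A short worked consequence of the conditions above, written out as a sketch: contracting the stationarity condition with an optimizer (denoted $g^{\star}$ here) and using complementary slackness fixes the normalization multiplier and gives a fixed-point form of the optimality condition. The notation $A_j[i,\omega]$ for the 0/1 entries of the marginalization matrix is introduced only for this derivation.

```latex
% Multiply the omega-component of the stationarity condition by g*_omega,
% sum over omega, and use nu_omega g*_omega = 0 and sum_omega g*_omega = 1:
\begin{aligned}
0 &= \sum_{\omega \in \Omega} g^{\star}_{\omega}
     \Big[ -\sum_{j=1}^{m} \mu_j \sum_{i=1}^{d} A_j[i,\omega]\,
     \frac{p_{C_j}(i)}{(A_j g^{\star})(i)} + \lambda - \nu_{\omega} \Big] \\
  &= -\sum_{j=1}^{m} \mu_j \sum_{i:\, p_{C_j}(i) > 0} p_{C_j}(i)
     \,\frac{(A_j g^{\star})(i)}{(A_j g^{\star})(i)} + \lambda \\
  &= \lambda - 1 .
\end{aligned}
% Hence lambda = 1. Since A_j[i,omega] = 1 exactly when omega(j) = i,
% the condition on each assignment omega reads
\sum_{j=1}^{m} \mu_j\,
  \frac{p_{C_j}(\omega(j))}{(A_j g^{\star})(\omega(j))}
  \;=\; 1 - \nu_{\omega}
  \;\begin{cases} = 1, & g^{\star}_{\omega} > 0, \\ \le 1, & g^{\star}_{\omega} = 0 . \end{cases}
```

The second step uses $\sum_j \mu_j = 1$ and the normalization of each Born distribution, together with the support condition $(A_j g^{\star})(i) > 0$ whenever $p_{C_j}(i) > 0$.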

Appendix B.4.2. Interpretation (Csiszár I-Projection)

Equation (A14) is the optimality condition that the image $\big(A_j g^{\star}\big)_j$ be the (weighted) Csiszár I-projection of the Born bundle $(p_{C_j})_j$ onto the convex set $Y = \{(A_j g)_j : g \in \Delta_{\Omega}\}$; see [27,54]. When $\Omega$ separates images (e.g., columns of the stacked matrix $A := [A_1^{\top} \cdots A_m^{\top}]^{\top}$ are affinely independent over the face touched by $g^{\star}$), the minimizer $g^{\star}$ itself is unique; otherwise, the optimal fiber $\{g : A_j g = A_j g^{\star} \;\forall j\}$ is a (possibly higher-dimensional) face. A canonical selector is the maximum-entropy point on that face.
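One simple way to compute such a projection numerically is a multiplicative, EM-type fixed-point iteration on the global weights. This is a standard scheme for fitting mixture weights and is used here only as an illustrative solver, not as the article's algorithm; the toy cover (two overlap-free binary contexts), the probability vectors, and all names are assumptions of the sketch.

```python
import itertools
import numpy as np

def kkt_score(g, born, mats, mu):
    """score[w] = sum_j mu_j * p_j(omega_w(j)) / (A_j g)(omega_w(j))."""
    score = np.zeros(len(g))
    for weight, p, A in zip(mu, born, mats):
        y = A @ g
        ratio = np.divide(p, y, out=np.zeros_like(p), where=p > 0)
        score += weight * (A.T @ ratio)
    return score

def em_glue(born, mats, mu, iters=500):
    """Multiplicative update g <- g * score, started from the uniform distribution."""
    n = mats[0].shape[1]
    g = np.full(n, 1.0 / n)
    for _ in range(iters):
        g = g * kkt_score(g, born, mats, mu)  # preserves sum(g) = 1
    return g

# Toy cover: two binary contexts with no shared projector, Omega = {0,1} x {0,1}.
m, d = 2, 2
omega = list(itertools.product(range(d), repeat=m))
mats = []
for j in range(m):
    A = np.zeros((d, len(omega)))
    for w, assignment in enumerate(omega):
        A[assignment[j], w] = 1.0
    mats.append(A)

born = [np.array([0.7, 0.3]), np.array([0.8, 0.2])]  # illustrative Born marginals
mu = [0.5, 0.5]

g_opt = em_glue(born, mats, mu)
marginals = [A @ g_opt for A in mats]
score = kkt_score(g_opt, born, mats, mu)

print("optimal marginals:", marginals)
print("KKT score on each assignment:", score)
```

At a fixed point the score equals 1 on the support of the weights, which is the stationarity condition with normalization multiplier 1. In this noncontextual toy case the iteration recovers the Born marginals; for a contextual bundle the same iteration would instead return the closest noncontextual marginals, with a strictly positive residual cost.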

Appendix B.5. When Does the Minimizer Equal the Born Distributions?

Proposition A3
(Characterization of equality). Let $g^{\star}$ minimize $F$ on $\Delta_{\Omega}$, and let $p(\rho) := (p_{C_j})_j$. Then the following are equivalent:
(a)
$A_j g^{\star} = p_{C_j}$ for all $j$ (the optimal marginals equal the Born distributions in every context).
(b)
$p(\rho) \in Y$, i.e., the Born bundle is noncontextual (admits a global joint $g \in \Delta_{\Omega}$).
In this case the minimum value is $F(g^{\star}) = 0$ and any optimizer is supported on the affine fiber $\{g : A_j g = p_{C_j} \;\forall j\}$.
Proof. 
(b)⇒(a): If $p(\rho) \in Y$, choose $g$ with $A_j g = p_{C_j}$; then $F(g) = 0$ is the global minimum, and by uniqueness of optimal marginals (Proposition A2) any optimizer must have $A_j g^{\star} = p_{C_j}$. (a)⇒(b): Trivial, since $(A_j g^{\star})_j = p(\rho) \in Y$ by feasibility of $g^{\star}$. □
When $\rho$ is contextual for the chosen cover (so $p(\rho) \notin Y$), Propositions A2 and A3 imply that $A_j g^{\star}$ deviates from $p_{C_j}$ on at least one context and $F(g^{\star}) > 0$.

Appendix B.6. Informational Completeness and Reconstruction of ρ

Definition A1
(Informational completeness (IC)). The cover $\mathcal{M}$ is informationally complete if $\mathrm{span}\{P_{j,i} : 1 \le j \le m,\; 1 \le i \le d\} = B(\mathcal{H})$. Equivalently, the linear map $\rho \mapsto \big(\mathrm{Tr}(\rho P_{j,i})\big)_{j,i}$ is injective.
Corollary A1
(Reconstruction under IC). If $\mathcal{M}$ is IC and $p(\rho) \in Y$ (noncontextual case), then any minimizer satisfies $A_j g^{\star} = p_{C_j}$ for all $j$, and these equalities determine $\rho$ uniquely via linear inversion on the IC frame $\{P_{j,i}\}$. In particular, the variational program picks out the unique quantum state consistent with the optimal marginals.
Remark A1.
(i) The results above remain valid for degenerate PVMs if one replaces $P_{j,i}$ by the normalized block states $\pi_{j,i} := P_{j,i} / \mathrm{rank}\, P_{j,i}$ in the definitions of $A_j$ and $p_{C_j}$; the proofs are unchanged. (ii) For POVMs, the global-variable formulation still applies once $A_j$ is defined by the corresponding classical post-processing; the uniqueness of optimal marginals remains a consequence of strict convexity of $D_{\mathrm{KL}}$ in its second argument.
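The linear-inversion step in Corollary A1 can be illustrated on the simplest informationally complete cover, the three Pauli eigenbases of a qubit. The sketch below is a toy example under stated assumptions (the cover, the test state, and the helper names are illustrative): it stacks the projectors into a linear map, checks injectivity via the rank, and recovers $\rho$ from its Born probabilities by least squares.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def eigenprojectors(pauli):
    """Rank-1 spectral projectors (1 + s)/2 and (1 - s)/2 of a Pauli operator s."""
    return [(I2 + pauli) / 2, (I2 - pauli) / 2]

# Cover of three rank-1 PVM contexts (X, Y, Z bases).
cover = [eigenprojectors(s) for s in (SX, SY, SZ)]
projectors = [P for context in cover for P in context]

# Illustrative state and its Born probabilities in every context.
rho = 0.5 * (I2 + 0.3 * SX - 0.4 * SY + 0.5 * SZ)
born = np.array([np.trace(rho @ P).real for P in projectors])

# Tr(rho P) = sum_ab conj(P_ab) rho_ab for Hermitian P, so each projector
# contributes one row conj(P).reshape(-1) acting on the flattened state.
M = np.array([P.conj().reshape(-1) for P in projectors])
rank = np.linalg.matrix_rank(M)  # 4 = dim B(H) means the cover is IC

vec, *_ = np.linalg.lstsq(M, born.astype(complex), rcond=None)
rho_reconstructed = vec.reshape(2, 2)

print("rank of measurement map:", rank)
print("reconstruction error:", np.linalg.norm(rho_reconstructed - rho))
```

Because the stacked map has full column rank, the least-squares solution is the unique operator reproducing all the probabilities, which is the content of the injectivity condition in Definition A1.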

References

  1. Born, M. Zur Quantenmechanik der Stoßvorgänge. Z. für Phys. 1926, 37, 863–867.
  2. Dirac, P.A.M. The Principles of Quantum Mechanics; Clarendon Press: Oxford, UK, 1930.
  3. Neumaier, A. The Born Rule–100 Years Ago and Today. Entropy 2025, 27, 415.
  4. Gleason, A.M. Measures on the Closed Subspaces of a Hilbert Space. J. Math. Mech. 1957, 6, 885–893.
  5. Budroni, C.; Cabello, A.; Gühne, O.; Kleinmann, M.; Larsson, J.-Å. Kochen–Specker contextuality. Rev. Mod. Phys. 2022, 94, 045007.
  6. Kochen, S.; Specker, E.P. The Problem of Hidden Variables in Quantum Mechanics. J. Math. Mech. 1967, 17, 59–87.
  7. Zurek, W.H. Environment-assisted invariance, entanglement, and probabilities in quantum physics. Phys. Rev. Lett. 2003, 90, 120404.
  8. Zurek, W.H. Probabilities from entanglement, Born’s rule from envariance. Phys. Rev. A 2005, 71, 052105.
  9. Schlosshauer, M.; Fine, A. On Zurek’s derivation of the Born rule. arXiv 2003, arXiv:quant-ph/0312058.
  10. Deutsch, D. Quantum Theory of Probability and Decisions. Proc. R. Soc. A Math. Phys. Eng. Sci. 1999, 455, 3129–3137.
  11. Wallace, D. The Emergent Multiverse: Quantum Theory According to the Everett Interpretation; Oxford University Press: Oxford, UK, 2012.
  12. Wallace, D. A formal proof of the Born rule from decision-theoretic assumptions. arXiv 2009, arXiv:0906.2718.
  13. Das Gupta, P. Born Rule and Finkelstein–Hartle Frequency Operator Revisited. arXiv 2011, arXiv:1105.4499.
  14. Caves, C.M.; Fuchs, C.A.; Schack, R. Unknown quantum states: The quantum de Finetti representation. J. Math. Phys. 2002, 43, 4537–4559.
  15. Busch, P. Quantum States and Generalized Observables: A Simple Proof of Gleason’s Theorem. Phys. Rev. Lett. 2003, 91, 120403.
  16. Hardy, L. Quantum Theory From Five Reasonable Axioms. arXiv 2001, arXiv:quant-ph/0101012. Available online: http://arxiv.org/abs/quant-ph/0101012 (accessed on 26 July 2025).
  17. Chiribella, G.; D’Ariano, G.M.; Perinotti, P. Informational derivation of quantum theory. Phys. Rev. A 2011, 84, 012311.
  18. Shimony, A. Contextual hidden-variables theories and Bell’s inequalities. Br. J. Philos. Sci. 1984, 35, 25–45.
  19. Abramsky, S.; Brandenburger, A. The Sheaf-Theoretic Structure of Non-Locality and Contextuality. New J. Phys. 2011, 13, 113036.
  20. Carù, G. On the Cohomology of Contextuality. arXiv 2017, arXiv:1701.00656.
  21. Abramsky, S.; Mansfield, S.; Barbosa, R.S. The Cohomology of Non-Locality and Contextuality. arXiv 2012, arXiv:1111.3620.
  22. Raussendorf, R. Putting paradoxes to work: Contextuality in measurement-based quantum computation. arXiv 2022, arXiv:2208.06624.
  23. Grudka, A.; Horodecki, K.; Horodecki, M.; Horodecki, P.; Horodecki, R.; Joshi, P.; Kłobus, W.; Wójcik, A. Quantifying contextuality. Phys. Rev. Lett. 2014, 112, 120401.
  24. Umegaki, H. Conditional expectation in an operator algebra. IV. Entropy and information. Kodai Math. Semin. Rep. 1962, 14, 59–85.
  25. Hiai, F.; Petz, D. The proper formula for relative entropy and its asymptotics in quantum probability. Commun. Math. Phys. 1991, 143, 99–114.
  26. Petz, D. Sufficient subalgebras and the relative entropy of states of a von Neumann algebra. Commun. Math. Phys. 1986, 105, 123–131.
  27. Csiszár, I. I-Divergence Geometry of Probability Distributions and Minimization Problems. Ann. Probab. 1975, 3, 146–158.
  28. Rovelli, C. Relational Quantum Mechanics. Int. J. Theor. Phys. 1996, 35, 1637–1678.
  29. Rovelli, C. Relational Quantum Mechanics. In The Stanford Encyclopedia of Philosophy, Spring 2025 ed.; Zalta, E.N., Nodelman, U., Eds.; Metaphysics Research Lab, Stanford University: Stanford, CA, USA, 2025.
  30. Zaghi, A. Integrated Information in Relational Quantum Dynamics (RQD). Appl. Sci. 2025, 15, 7521.
  31. Heunen, C. Categories and Quantum Informatics: Monoidal Categories; Lecture Notes; University of Edinburgh: Edinburgh, UK, 2018.
  32. Heunen, C.; Vicary, J. Categorical Quantum Mechanics: An Introduction; Lecture Notes; Department of Computer Science, University of Oxford: Oxford, UK, 2019.
  33. Fine, A. Hidden Variables, Joint Probability, and the Bell Inequalities. Phys. Rev. Lett. 1982, 48, 291–295.
  34. Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information; Cambridge University Press: Cambridge, UK, 2000.
  35. Abramsky, S.; Coecke, B. Categorical Quantum Mechanics. In Handbook of Quantum Logic and Quantum Structures; Engesser, K., Gabbay, D.M., Lehmann, D., Eds.; Elsevier: Amsterdam, The Netherlands, 2009; pp. 261–323.
  36. Boyd, S.; Vandenberghe, L. Convex Optimization, 1st ed.; Cambridge University Press: Cambridge, UK, 2004.
  37. Donald, M.J. On the relative entropy. Commun. Math. Phys. 1986, 105, 13–34.
  38. Moakher, M. A Differential Geometric Approach to the Geometric Mean of Symmetric Positive-Definite Matrices. SIAM J. Matrix Anal. Appl. 2005, 26, 735–747.
  39. Ji, Z. Classical and Quantum Iterative Optimization Algorithms Based on Matrix Legendre-Bregman Projections. arXiv 2022, arXiv:2209.14185.
  40. Klyachko, A. Quantum marginal problem and representations of the symmetric group. arXiv 2004, arXiv:quant-ph/0409113.
  41. Bardet, I.; Capel, A.; Rouzé, C. Approximate Tensorization of the Relative Entropy for Noncommuting Conditional Expectations. Ann. Henri Poincaré 2022, 23, 101–140.
  42. Briët, J.; Harremoës, P. Properties of classical and quantum Jensen–Shannon divergence. Phys. Rev. A 2009, 79, 052311.
  43. La Cour, B.R. Quantum contextuality in the Mermin-Peres square: A hidden variable perspective. arXiv 2021, arXiv:2105.00940.
  44. Cabello, A.; Estebaranz, J.M.; García-Alcaine, G. Bell–Kochen–Specker theorem: A proof with 18 vectors. Phys. Lett. A 1996, 212, 183–187.
  45. Ren, C.; Su, H.; Xu, Z.; Wu, C.; Chen, J. Optimal GHZ Paradox for Three Qubits. Sci. Rep. 2015, 5, 13080.
  46. Fuchs, C.A.; Mermin, N.D.; Schack, R. An Introduction to QBism with an Application to the Locality of Quantum Mechanics. Am. J. Phys. 2014, 82, 749–754.
  47. Healey, R. Gauge Theories and Holisms. Stud. Hist. Philos. Sci. Part B Stud. Hist. Philos. Mod. Phys. 2004, 35, 619–642.
  48. Rivat, S. Wait, Why Gauge? PhilSci-Archive Preprint: Pittsburgh, PA, USA, 2023.
  49. Spekkens, R.W. Contextuality for preparations, transformations, and unsharp measurements. Phys. Rev. A 2005, 71, 052108.
  50. Spekkens, R.W. Negativity and contextuality are equivalent notions of nonclassicality. Phys. Rev. Lett. 2008, 101, 020401.
  51. Pellonpää, J.P.; Designolle, S.; Uola, R. Naimark dilations of qubit POVMs and joint measurements. J. Phys. A Math. Theor. 2023, 56, 155303.
  52. Uhlmann, A. Relative entropy and the Wigner–Yanase–Dyson–Lieb concavity in an interpolation theory. Commun. Math. Phys. 1977, 54, 21–32.
  53. Olivares, S.; Paris, M.G.A. Quantum estimation via minimum Kullback entropy principle. Phys. Rev. A 2007, 76, 042120.
  54. Cover, T.M.; Thomas, J.A. Elements of Information Theory, 2nd ed.; Wiley Series in Telecommunications and Signal Processing; Wiley-Interscience: Hoboken, NJ, USA, 2006.
  55. Koßmann, G.; Schwonnek, R. Optimising the relative entropy under semi definite constraints – A new tool for estimating key rates in QKD. arXiv 2024, arXiv:2404.17016.
  56. Fedida, S. Einstein causality of quantum measurements in the Tomonaga–Schwinger picture. arXiv 2025, arXiv:2506.14693.
  57. Quantum Computing Stack Exchange Community. Does Neumark’s/Naimark’s Extension Theorem Only Apply to Rank-1 POVMs? Quantum Computing Stack Exchange Q&A, Question ID 26018; Stack Exchange Inc.: New York, NY, USA, 2021.
  58. Quantum Computing Stack Exchange Community. Characterise, via Naimark’s Theorem, the POVM Corresponding to a PVM in a Dilated Space; Quantum Computing Stack Exchange Q&A, Question ID 26029; Stack Exchange Inc.: New York, NY, USA, 2021.
  59. Singh, J.; Arvind; Goyal, S.K. Implementation of discrete positive operator valued measures on linear optical systems using cosine–sine decomposition. Phys. Rev. Res. 2022, 4, 013007.
  60. Rockafellar, R.T. Convex Analysis; Number 28 in Princeton Mathematical Series; Princeton University Press: Princeton, NJ, USA, 1970.
Figure 1. Proxy contextuality cost $\tilde{\Phi}(\rho)$ versus entanglement entropy $S(\rho_A)$ for the two-qubit Schmidt family $|\psi(\theta)\rangle = \cos\theta\, |00\rangle + \sin\theta\, |11\rangle$. The monotonic rise from zero (product state) to a few bits (maximally entangled) confirms that contextual divergence increases smoothly with entanglement in the magic-square cover.
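Table 1. Tensor-product combinations of Pauli and identity operators on two qubits.
| | Row 1 | Row 2 | Row 3 |
|---|---|---|---|
| Col 1 | $\sigma_z \otimes \mathbb{1}$ | $\mathbb{1} \otimes \sigma_z$ | $\sigma_z \otimes \sigma_z$ |
| Col 2 | $\mathbb{1} \otimes \sigma_x$ | $\sigma_x \otimes \mathbb{1}$ | $\sigma_x \otimes \sigma_x$ |
| Col 3 | $\sigma_z \otimes \sigma_x$ | $\sigma_x \otimes \sigma_z$ | $\sigma_y \otimes \sigma_y$ |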
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zaghi, A. Born’s Rule from Contextual Relative-Entropy Minimization. Entropy 2025, 27, 898. https://doi.org/10.3390/e27090898

