Extending Kolmogorov’s Axioms for a Generalized Probability Theory on Collections of Contexts

Kolmogorov’s axioms of probability theory are extended to conditional probabilities among distinct (and sometimes intertwining) contexts. Formally, this amounts to row stochastic matrices whose entries characterize the conditional probability of finding some observable (postselection) in one context, given an observable (preselection) in another context. As the respective probabilities need not be of the Born rule type (but, depending on the physical/model realization, can be), this generalizes approaches to quantum probabilities by Auffèves and Grangier, which in turn are inspired by Gleason’s theorem.


Kolmogorov-type conditional probabilities among distinct contexts
A physical system or a mathematical entity may permit not only one "view" on it but may allow "many" such views. Think of a single crystal luster whose light, depending on the viewpoint, may appear very different.
Schrödinger [1, p. 15 & 95] quoted the Vedantic analogy of a "many-faceted crystal which, while showing hundreds of little pictures of what is in reality a single existent object, does not really multiply that object. . . . A comparison used in Hinduism is of the many almost identical images which a many-faceted diamond makes of some one object such as the sun." Another example is the coordinatization, or coding and encryption, of a vector with respect to different bases, thereby in physical terms appearing as "coherent superpositions" (aka linear combinations) of the respective vectors of these bases. Still another example is the representation of an entity by isomorphic graphs. This idea is grounded in epistemology and in issues related to the (empirical) cognition of ontology, and might appear both trivial and sophistic at first glance. Nevertheless, it may be difficult to find means or formal models exhibiting multiple contextual views of one and the same entity [4][5].

A "view" or (used synonymously) "frame" 6 or "context" will be characterized in full generality, and thus informally (glancing at heuristics from quantum mechanics and partition logic), as some domain or set of observables or properties which are (i) largest or maximal, in the sense that any extension yields redundancies; (ii) yet at the same time of the finest resolution, in the sense that the respective observables or properties are not composites of "more elementary" ones; (iii) mutually exclusive, in the sense that one property or observation excludes another, different property or observation; and (iv) simultaneously measurable, that is, compatible observables or properties.
In what follows I shall develop a conceptual framework for very general probabilities on such collections of contexts. This amounts to an extension of Kolmogorov probabilities, which are defined in a single context, to a multi-context situation. Scrutinized separately, every single context has "legit local" classical Kolmogorov probabilities. In addition to those local structures and measures, (intertwined) multi-context configurations and their probabilities have to be "joined", "woven", "meshed", or "stitched" together to result in consistent and coherent "global" multi-aspect views and probabilities.
In particular, one needs to cope with possible overlaps of contexts in common, intertwining observables [11]. This text is organized as follows: first, Kolmogorov's axioms or principles for probabilities are generalized to arbitrary event structures not necessarily dominated by the quantum formalism. Then these principles will be applied to quantum bistochasticity, as well as to partition logics, which offer an abundance of alternate configurations. Some "exotic" probabilities as well as possible generalizations by Cauchy's functional equations are briefly discussed. Throughout this article, only finite contexts will be considered.

Generalization of Kolmogorov's axioms to arbitrary event structures
Suppose that, as is assumed for classical Kolmogorov probabilities, the elements $c_i$ within any given single, individual finite context $C = \{c_1, \ldots, c_n\}$ are mutually exclusive, compatible, and exhaustive; that is, the context contains a "maximal" set of mutually exclusive, compatible elements. Kolmogorov's axioms demand (i) that probabilities are non-negative, $P(c_i) \ge 0$; (ii) additivity of mutually exclusive events or outcomes, $P(c_i) + P(c_j) = P(c_i \cup c_j)$ for $i \neq j$; and (iii) that the probability of the tautology formed by the union of all elements in the context adds up to one, that is, $\sum_{c_i \in C} P(c_i) = P\left(\bigcup_{c_i \in C} c_i\right) = 1$.

Inspired by the multi-context quantum case discussed later, the following generalization to two or, by induction, to more contexts is suggested. Suppose two arbitrary contexts $C_1 = \{e_1, \ldots, e_n\}$ and $C_2 = \{f_1, \ldots, f_m\}$. The conditional probabilities $P(f_j|e_i)$, with $1 \le j \le m$ and $1 \le i \le n$, which can be considered either as measuring the Bayesian degree of reasonable expectation representing a state of knowledge, as quantification of a personal belief 12, or as the frequency of occurrence of "$f_j$ given $e_i$", can be arranged into an $(n \times m)$-matrix whose entries are $P(f_j|e_i)$, that is,

$$
[P(C_2|C_1)] = \begin{pmatrix} P(f_1|e_1) & \cdots & P(f_m|e_1) \\ \vdots & \ddots & \vdots \\ P(f_1|e_n) & \cdots & P(f_m|e_n) \end{pmatrix}. \qquad (1)
$$

Assume as an axiom the following criterion: the conditional probabilities of the elements of the second context with respect to an arbitrary element $e_k \in C_1$ of the first context $C_1$ are non-negative and additive, and, if their sum is extended over the entire second context $C_2$, they add up to one:

$$
P(f_j|e_k) \ge 0, \qquad P(f_i|e_k) + P(f_j|e_k) = P(f_i \cup f_j|e_k) \ \ (i \neq j), \qquad \sum_{f_j \in C_2} P(f_j|e_k) = P\Big(\bigcup_{f_j \in C_2} f_j \,\Big|\, e_k\Big) = 1. \qquad (2)
$$

That is, the sum taken within every single row of $[P(C_2|C_1)]$ adds up to one. This presents a generalization of Kolmogorov's axioms, as it allows cases in which the two contexts do not coincide. It reduces to the classical axioms for single contexts if, instead of a single element $e_k \in C_1$ of the first context, the union of the elements of the entire context $C_1$, and thus the tautology $\bigcup_{e_i \in C_1} e_i$, is inserted into (2).
We shall mostly be concerned with cases for which $n = m$; that is, the associated matrix is a row (aka right) stochastic square matrix. Formally, such a matrix $A$ has non-negative entries $a_{ij} \ge 0$ for $i, j = 1, \ldots, n$ whose row sums add up to one: $\sum_{j=1}^n a_{ij} = 1$ for $i = 1, \ldots, n$. If, in addition to the row sums, the column sums also add up to one, that is, if $\sum_{i=1}^n a_{ij} = 1$ for $j = 1, \ldots, n$, then $A$ is called doubly stochastic (bistochastic).

It is instructive to ponder why, intuitively, those conditional probabilities should be arranged in right stochastic, but not necessarily in bistochastic, matrices. Suppose a (physical or other model) system is in a state characterized by some element $e_j \in C_1$ of the first context $C_1$. Then, if one takes the (union of the elements of the) entire other context $C_2$, thereby exhausting all possible outcomes of the second "view", the conditional probability for this system to be in any element of $C_2$, given $e_j \in C_1$, should add up to one, because this includes all that can be (aka happen or exist) with respect to the second "view". Indeed, if these conditional probabilities did not add up to one, say if they added up to something strictly smaller or larger than one, then either some elements would be missing from, or be "external" to, the context $C_2$, which cannot occur since by assumption contexts are "maximal".
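The two defining conditions can be checked mechanically. The following is a minimal sketch, not part of the original framework: the function names and example matrices are illustrative, and exact rational arithmetic (`fractions.Fraction`) is used so that sums can be compared with one without a rounding tolerance. The first matrix satisfies the row condition of axiom (2) but not the column condition; the second satisfies both.

```python
from fractions import Fraction

def is_row_stochastic(matrix):
    """Non-negative entries and every row summing to one (the row condition of axiom (2))."""
    return all(all(x >= 0 for x in row) and sum(row) == 1 for row in matrix)

def is_bistochastic(matrix):
    """Square, row stochastic, and additionally every column summing to one."""
    n = len(matrix)
    if any(len(row) != n for row in matrix):
        return False
    return is_row_stochastic(matrix) and all(sum(col) == 1 for col in zip(*matrix))

half, third = Fraction(1, 2), Fraction(1, 3)
# Rows: preselected e_i in C_1; columns: postselected f_j in C_2.
right_only = [[half, half, 0], [third, 2 * third, 0], [0, 0, 1]]
doubly = [[half, half, 0], [half, half, 0], [0, 0, 1]]

print(is_row_stochastic(right_only), is_bistochastic(right_only))
print(is_row_stochastic(doubly), is_bistochastic(doubly))
```

With floating-point data (for instance, Born-rule probabilities) the equality tests would have to be replaced by comparisons within a small tolerance.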
On the other hand, if a particular element $f_i \in C_2$ of the second context $C_2$ remains fixed and the column sum $\sum_{e_j \in C_1} P(f_i|e_j)$ extends over all $e_j \in C_1$, then there is no convincing reason why this column sum should add up to one. Indeed, as will be argued later, while quantum mechanics results in bistochastic matrices, generalized urn models, which result in partitions of (hidden) variables, will in general not induce bistochasticity.

Cauchy's functional equation encoding additivity
One way of looking at generalized global probabilities obtained by "stitching" together local classical Kolmogorov probabilities is to maintain the essence of the axioms: namely positivity, probability one (aka certainty) for tautologies, and, in particular, additivity. Additivity requires that, for mutually exclusive compatible events $c_i$ and $c_j$ within a given context, their probabilities satisfy a Cauchy-type functional equation $P(c_i) + P(c_j) = P(c_i \cup c_j)$. With "reasonable" side assumptions, this amounts to the linearity of probabilities in the argument. 13,14 [17][18][19][20] The general case may involve other, hitherto unknown, arguments besides scalars and entities related to vector (or Hilbert) spaces. The discussion will not be extended to potential inputs and sources for generalized probabilities, as the main interest is in developing a generalized probability theory in the multi-context setting, but clearly these questions remain pertinent.
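As an illustration of what such "reasonable" side assumptions can look like (one standard route, not necessarily the one taken in the cited works), positivity alone suffices to turn Cauchy's additive equation into linearity for non-negative arguments: additivity fixes $f$ on the rationals, and positivity makes $f$ monotone, which pins it down between rationals.

```latex
\begin{aligned}
&f(x+y) = f(x) + f(y) \;\Rightarrow\; f(0) = 0, \quad f(kx) = k\,f(x) \quad (k \in \mathbb{N}),\\
&f(1) = f\!\left(q \cdot \tfrac{1}{q}\right) = q\,f\!\left(\tfrac{1}{q}\right) \;\Rightarrow\; f\!\left(\tfrac{p}{q}\right) = \tfrac{p}{q}\,f(1) \quad (p, q \in \mathbb{N},\ q \ge 1),\\
&f \ge 0 \;\Rightarrow\; f(y) = f(x) + f(y - x) \ge f(x) \quad (0 \le x \le y),\\
&r_1 \le x \le r_2,\ r_{1}, r_{2} \in \mathbb{Q} \;\Rightarrow\; r_1 f(1) \le f(x) \le r_2 f(1) \;\Rightarrow\; f(x) = x\,f(1).
\end{aligned}
```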

Examples of application of the generalized Kolmogorov axioms

Quantum bistochasticity
The multi-context quantum case has been studied in great detail, with emphasis on motivating and deriving the Born rule 21,22 from elementary foundations. Recall that a context has been defined as the "largest" or "maximal" domain of both mutually exclusive and simultaneously measurable, compatible observables. In quantum mechanics "simultaneous measurability" translates into compatibility and commutativity; that is, such observables are not complementary and can be jointly measured without restrictions. "Mutual exclusivity" is defined in terms of orthogonality of the respective observables. The spectral theorem asserts mutual orthogonality of the unit eigenvectors $|e_i\rangle$ and of the associated orthogonal projection operators $E_i$ formed by the dyadic products $E_i = |e_i\rangle\langle e_i|$. A context can be equivalently represented by (i) an orthonormal basis, (ii) the respective one-dimensional orthogonal projection operators associated with the basis elements, or (iii) a single maximal operator (aka maximal observable) whose spectral sum is non-degenerate. 9,23

An essential assumption entering Gleason's derivation 6 of the Born rule for quantum probabilities is the validity of classical probability theory whenever the respective observables are compatible. Formally, this amounts to the validity of Kolmogorov probability theory for mutually commuting observables, and in particular to the assumption of Kolmogorov's axioms within contexts.
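The equivalence of representations (i) to (iii) can be illustrated numerically. The sketch below uses plain Python complex arithmetic; the basis $(1, i)/\sqrt{2}$, $(1, -i)/\sqrt{2}$ of $\mathbb{C}^2$ and the spectral values 1 and 2 are arbitrary illustrative choices, and the helper names are hypothetical. It checks that the projectors $E_i = |e_i\rangle\langle e_i|$ are idempotent, mutually orthogonal, and resolve the identity, and that the non-degenerate operator $1 \cdot E_1 + 2 \cdot E_2$ has the basis vectors as eigenvectors.

```python
import math

def outer(ket):
    """Dyadic product |v><v| as a nested list: entry [a][b] = v_a * conj(v_b)."""
    return [[a * b.conjugate() for b in ket] for a in ket]

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def apply(A, v):
    """Matrix-vector product A|v>."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def close(X, Y, tol=1e-12):
    """Entry-wise comparison of two nested lists of (complex) numbers."""
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

s = 1 / math.sqrt(2)
# Illustrative context in C^2: the orthonormal basis (1, i)/sqrt(2), (1, -i)/sqrt(2).
e1 = [s + 0j, 1j * s]
e2 = [s + 0j, -1j * s]
E1, E2 = outer(e1), outer(e2)

IDENTITY = [[1, 0], [0, 1]]
ZERO = [[0, 0], [0, 0]]

# Maximal operator with the non-degenerate spectral sum 1*E1 + 2*E2.
A = [[1 * x + 2 * y for x, y in zip(r1, r2)] for r1, r2 in zip(E1, E2)]

print(close(matmul(E1, E1), E1), close(matmul(E1, E2), ZERO))
print(close([[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(E1, E2)], IDENTITY))
print(close([apply(A, e1), apply(A, e2)], [e1, [2 * x for x in e2]]))
```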
Already Gleason pointed out 6 that it is quite straightforward to find an ad hoc probability satisfying this aforementioned assumption, which is based on the Pythagorean property: suppose (i) a quantized system is in a pure state $|\psi\rangle$ formalized by some unit vector, and (ii) some "measurement frame" is formalized by an orthonormal basis $C = \{|e_1\rangle, \ldots, |e_n\rangle\}$. Then the probabilities of outcomes of observable propositions associated with the orthogonal projection operators formed by the dyadic products $|e_i\rangle\langle e_i|$ of the vectors of the orthonormal basis can be obtained by taking the absolute square of the length of the projections of $|\psi\rangle$ onto $|e_i\rangle$ along the remaining basis vectors, which amounts to taking the absolute squares of the scalar products, $|\langle\psi|e_i\rangle|^2$. Since the vector associated with the pure state, as well as all the vectors in the orthonormal system, are of length one, and since these latter vectors are mutually orthogonal, the sum of all these terms, taken over all the basis elements, needs to add up to one. The respective absolute squares are bounded between zero and one. In effect, the orthonormal basis "grants a view" of the pure quantum state. The absolute square can be rewritten in terms of a trace (over some arbitrary orthonormal basis) into the standard form known as the Born rule of quantum probabilities:

$$
|\langle\psi|e_i\rangle|^2 = \langle\psi|e_i\rangle\langle e_i|\psi\rangle = \sum_{j=1}^n \langle g_j|\psi\rangle\langle\psi|e_i\rangle\langle e_i|g_j\rangle = \mathrm{Trace}\left(E_\psi E_i\right),
$$

where $E_\psi = |\psi\rangle\langle\psi|$ and $E_i = |e_i\rangle\langle e_i|$ are the orthogonal projection operators representing the state $|\psi\rangle$ and the (unit) vectors $|e_i\rangle$ of the orthonormal basis, respectively, and $C' = \{|g_1\rangle, \ldots, |g_n\rangle\}$ is an arbitrary orthonormal basis, so that a resolution of the identity is $\mathbb{1}_n = \sum_{j=1}^n |g_j\rangle\langle g_j|$. It is also well known that, at least from a formal perspective, unit vectors in quantum mechanics serve a dual role: on the one hand, they represent pure states; on the other hand, by the associated one-dimensional orthogonal projection operator, they represent an observable, namely the proposition that the system is in such a pure state. 24,25

Suppose now that we exploit this dual role by expanding the pure prepared state into a full orthonormal basis, of which its vector must be an element. (For dimensions greater than two such an expansion will not be unique, as there is a continuous infinity of ways to achieve this.) Once the latter basis is fixed, it can be used to obtain a "view" on the former (measurement) basis, and a completely symmetric situation/configuration is attained. We might even go so far as to say that which basis is associated with the "observed object" and which with the "measurement apparatus" is purely a matter of convention and subjective perspective.
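The identity between the absolute square and the trace form of the Born rule can be checked numerically. In this sketch the state $|\psi\rangle$, the measurement frame $C$, and the auxiliary basis $C'$ in $\mathbb{C}^2$ are arbitrary illustrative choices, and the helper names are hypothetical. The trace is deliberately evaluated as $\sum_j \langle g_j|X|g_j\rangle$ in the auxiliary basis to show that its value does not depend on that choice.

```python
import cmath
import math

def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def outer(ket):
    """Projector |v><v| as a nested list."""
    return [[a * b.conjugate() for b in ket] for a in ket]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def apply(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def trace_in_basis(X, basis):
    """Trace(X) = sum_j <g_j|X|g_j>, evaluated in an arbitrary orthonormal basis C'."""
    return sum(inner(g, apply(X, g)) for g in basis)

# Illustrative pure state |psi> in C^2 (angles chosen arbitrarily).
theta, phi = 0.3, 1.1
psi = [complex(math.cos(theta)), cmath.exp(1j * phi) * math.sin(theta)]

s = 1 / math.sqrt(2)
C = [[s + 0j, 1j * s], [s + 0j, -1j * s]]                         # measurement frame {|e_i>}
alpha = 0.7
C_prime = [[complex(math.cos(alpha)), complex(math.sin(alpha))],
           [complex(-math.sin(alpha)), complex(math.cos(alpha))]]  # arbitrary basis {|g_j>}

born = [abs(inner(psi, e)) ** 2 for e in C]                                     # |<psi|e_i>|^2
via_trace = [trace_in_basis(matmul(outer(psi), outer(e)), C_prime) for e in C]  # Trace(E_psi E_i)
print(born, sum(born))
```

Replacing `C_prime` by any other orthonormal basis leaves `via_trace` unchanged up to rounding, which is the basis independence of the trace used above.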
Therefore, as has been pointed out earlier, an orthogonal projection operator serves a dual role: on the one hand, it is a formalization of a dichotomic observable, more precisely, of an elementary yes-no proposition $E = |x\rangle\langle x|$ associated with the claim that "the quantized system is in state $|x\rangle$"; on the other hand, it is the formal representation of a pure quantum state $|y\rangle$, equivalent to the operator $F = |y\rangle\langle y|$. By the Born rule the conditional probabilities are symmetric with respect to the exchange of $|x\rangle$ and $|y\rangle$: let $C' = \{|g_1\rangle, \ldots, |g_n\rangle\}$ be some arbitrary orthonormal basis of $\mathbb{C}^n$; then $P(E|F) = \mathrm{Trace}(EF) = \mathrm{Trace}(FE) = P(F|E)$, or, more explicitly,

$$
P(E|F) = \sum_{j=1}^n \langle g_j|x\rangle\langle x|y\rangle\langle y|g_j\rangle = \langle x|y\rangle\langle y|x\rangle = |\langle x|y\rangle|^2 = \sum_{j=1}^n \langle g_j|y\rangle\langle y|x\rangle\langle x|g_j\rangle = P(F|E).
$$

Therefore, the respective conditional probabilities form a doubly stochastic (bistochastic) square matrix. This result is a special case of a more general result on quadratic forms on the set of eigenvectors of normal operators. 26

Consider two orthonormal bases, aka two contexts, $C_1 = \{|e_1\rangle, \ldots, |e_n\rangle\}$ and $C_2 = \{|f_1\rangle, \ldots, |f_n\rangle\}$. Their respective conditional probabilities can be arranged into a matrix form:

$$
[P(C_2|C_1)] = \begin{pmatrix} |\langle e_1|f_1\rangle|^2 & \cdots & |\langle e_1|f_n\rangle|^2 \\ \vdots & \ddots & \vdots \\ |\langle e_n|f_1\rangle|^2 & \cdots & |\langle e_n|f_n\rangle|^2 \end{pmatrix}.
$$

The $i$th row, $j$th column component corresponds to the conditional probability of occurrence of the $j$th element (observable) of the second context, given the $i$th element (observable) of the first context. Taking into account that cyclically interchanging factors inside a trace does not change its value, this matrix needs to be not only row (right) stochastic but doubly stochastic (bistochastic); 21,22 that is, the sum taken within every single row and within every single column adds up to one.
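A quick numerical check of this bistochasticity, as a sketch under illustrative choices: the first context is the standard basis of $\mathbb{C}^3$, and the second is obtained by Gram-Schmidt orthonormalization of three arbitrarily chosen independent vectors (the helper names are hypothetical). The script forms $P(f_j|e_i) = |\langle e_i|f_j\rangle|^2$, checks that both row and column sums equal one, and checks the symmetry $P(f_j|e_i) = P(e_i|f_j)$.

```python
import math

def inner(u, v):
    """Scalar product <u|v>, antilinear in the first argument."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent complex vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = inner(b, w)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in w))
        basis.append([x / norm for x in w])
    return basis

# First context C_1: the standard basis of C^3.
C1 = [[1 + 0j, 0j, 0j], [0j, 1 + 0j, 0j], [0j, 0j, 1 + 0j]]
# Second context C_2: an illustrative basis obtained from arbitrary independent vectors.
C2 = gram_schmidt([[1, 1j, 0.5], [0.2, 1, 1j], [1, 0, 1]])

# Entry (i, j): P(f_j | e_i) = |<e_i|f_j>|^2; rows are preselections from C_1.
P21 = [[abs(inner(e, f)) ** 2 for f in C2] for e in C1]
# Reversed roles: entry (j, i) = P(e_i | f_j); rows are preselections from C_2.
P12 = [[abs(inner(f, e)) ** 2 for e in C1] for f in C2]

row_sums = [sum(row) for row in P21]
column_sums = [sum(col) for col in zip(*P21)]
print(row_sums, column_sums)
```

Row sums equal one because $C_2$ is complete; column sums equal one because each $|f_j\rangle$ is normalized and $C_1$ is complete.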

Quasi-classical partition logics
In what follows we shall study sets of partitions of a given set. They have models 27 based (i) on the initial state identification problem of finite automata 28 and (ii) on generalized urns. 29,30 Partition logics are quasiclassical and value-definite insofar as they allow a separating set of "classical" two-valued states [9, Theorem 0]; and yet they feature complementarity. Many of these logics are doubles of quantum logics, such as those for spin-state measurements, and thereby their graphs also allow faithful orthogonal representations; 31 yet some of them have no quantum analog. Therefore, they neither form a proper subset of all quantum logics nor do they contain all logical structures encountered in quantum logics (they are neither continuous nor can they have a non-separating or nonexistent set of two-valued states). However, partition logics overlap significantly with quantum logics, as they bear strong similarities with the structures arising in quantum theory.
If some (partition) logic which is a pasting [32][33][34] of contexts has a separating set of two-valued states [9, Theorem 0], then there is a constructive, algorithmic 35 way of finding a "canonical" partition logic, 27 and, associated with it, all classical probabilities on it: first, find all the two-valued states on the logic, and assign consecutive numbers to these states. Then, for any atom (element of a context), find the index set of all two-valued states which are 1 on this atom. Associate with each, say the $i$th, of the two-valued states a non-negative weight $i \mapsto \lambda_i \ge 0$, and require that the (convex) sum of these weights is one, $\sum_i \lambda_i = 1$. The probability of an atom is then the sum of the weights in its index set. Since all two-valued states are included, and every two-valued state is 1 on exactly one atom of each context, every index occurs exactly once within each context; hence the Kolmogorov axioms hold, and the sum of the measures/weights within each of the contexts in the logic adds up to exactly one. It will be argued that in this case, and unlike for quantum conditional probabilities, the conditional probabilities in general do not form a bistochastic matrix.
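The construction can be sketched in a few lines for the firefly logic discussed below, encoded (as an assumption of this sketch) by its atoms $e_1, e_2, e_3 = f_3, f_1, f_2$ and its two contexts. The brute-force enumeration is only practical for small logics, and the weights are hypothetical. Enumerating assignments with 1 before 0 happens to reproduce the numbering used in the text, $e_1 \equiv \{1,2\}$, $e_2 \equiv \{3,4\}$, $e_3 \equiv \{5\}$, $f_1 \equiv \{1,3\}$, $f_2 \equiv \{2,4\}$.

```python
from fractions import Fraction
from itertools import product

# Firefly logic: two three-atom contexts sharing the atom e3 (= f3).
ATOMS = ("e1", "e2", "e3", "f1", "f2")
CONTEXTS = (("e1", "e2", "e3"), ("f1", "f2", "e3"))

def two_valued_states(atoms, contexts):
    """All 0/1 assignments with exactly one atom set to 1 in every context."""
    states = []
    for values in product((1, 0), repeat=len(atoms)):
        state = dict(zip(atoms, values))
        if all(sum(state[a] for a in ctx) == 1 for ctx in contexts):
            states.append(state)
    return states

def canonical_partition_labels(atoms, states):
    """Map each atom to the (1-based) indices of the two-valued states that are 1 on it."""
    return {a: {i for i, s in enumerate(states, start=1) if s[a] == 1} for a in atoms}

def atom_probabilities(labels, weights):
    """Classical probability of each atom: the sum of the weights lambda_i over its label set."""
    return {a: sum(weights[i] for i in idx) for a, idx in labels.items()}

states = two_valued_states(ATOMS, CONTEXTS)
labels = canonical_partition_labels(ATOMS, states)
# Hypothetical weights lambda_1..lambda_5 (non-negative, summing to one).
weights = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10),
           4: Fraction(1, 10), 5: Fraction(3, 10)}
probs = atom_probabilities(labels, weights)
print(labels)
print([sum(probs[a] for a in ctx) for ctx in CONTEXTS])
```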

Two non-intertwining two-atomic contexts
In the Babylonian spirit [36, p. 172], consider some anecdotal examples which have quantum doubles. The first one will be analogous to a spin-$\frac{1}{2}$ state measurement.

Two intertwining three-atomic contexts
The $L_{12}$ "firefly" logic depicted in Fig. 2 labels the atoms (aka elementary propositions) obtained by an "inverse construction" using all five two-valued measures thereon. 27,38 By design, it is very similar to the earlier logic with four atoms. With the identifications $e_1 \equiv \{1,2\}$, $e_2 \equiv \{3,4\}$, $e_3 = f_3 \equiv \{5\}$, $f_1 \equiv \{1,3\}$, and $f_2 \equiv \{2,4\}$, we obtain all classical probabilities by identifying $i \mapsto \lambda_i > 0$. The respective conditional probabilities are

$$
[P(C_2|C_1)] = \begin{pmatrix} \frac{\lambda_1}{\lambda_1+\lambda_2} & \frac{\lambda_2}{\lambda_1+\lambda_2} & 0 \\ \frac{\lambda_3}{\lambda_3+\lambda_4} & \frac{\lambda_4}{\lambda_3+\lambda_4} & 0 \\ 0 & 0 & 1 \end{pmatrix}
$$

as well as

$$
[P(C_1|C_2)] = \begin{pmatrix} \frac{\lambda_1}{\lambda_1+\lambda_3} & \frac{\lambda_3}{\lambda_1+\lambda_3} & 0 \\ \frac{\lambda_2}{\lambda_2+\lambda_4} & \frac{\lambda_4}{\lambda_2+\lambda_4} & 0 \\ 0 & 0 & 1 \end{pmatrix}. \qquad (6)
$$

The conditional probabilities of the firefly logic, as depicted in Fig. 2(a) and enumerated in Eq. (6), form a right stochastic matrix. As mentioned earlier, given any particular outcome $f_i$ of the second context, corresponding to some respective row in the matrix (6), the row sum of the conditional probabilities of all the conceivable mutually exclusive outcomes of the first context $C_1 = \{e_1, e_2, e_3\}$ must be one. However, the "transposed" statement is not true: the column sum of the conditional probabilities of a particular element $e_j$ with respect to all the mutually exclusive outcomes of the second context $C_2 = \{f_1, f_2, f_3\}$ need not be one; for instance, the first column of (6) sums to $\frac{\lambda_1}{\lambda_1+\lambda_3} + \frac{\lambda_2}{\lambda_2+\lambda_4}$, which in general differs from one. Take, for example, the singular distribution case such that $\lambda_1 = 1$, and therefore, by positivity and convexity, $\lambda_{i \neq 1} = 0$, that is, $\lambda_2 = \lambda_3 = \lambda_4 = \lambda_5 = 0$. This configuration, depicted in Fig. 2(c), results in a partial right stochastic matrix (7), derived from (6), in which undefined components are indicated by the symbol "0/0". In such a case, in terms of, say, a generalized urn model, the observable proposition $\{2,4\}$, associated with the plaintext "looked upon in the first color (in this case blue), the ball drawn from the urn shows the symbols 2 or 4", will never occur, regardless of which ball type associated with the other context $\{1,2\}$, $\{3,4\}$, or $\{5\}$ one would have (counterfactually) drawn, because the generalized urn is only loaded with balls of one type, namely the first type, with the symbols "$\{1,2\}$" painted on them in the first color and the symbols "$\{1,3\}$" painted on them in the second color. (Instead of labels indicating the elements of the partition one may choose other symbols, such as $\{1,3\} \equiv a \equiv \{1,2\}$, $\{2,4\} \equiv b \equiv \{3,4\}$, and $c \equiv \{5\}$ in the respective colors. 27,39)

Ultimately one may say that it is the discontinuity of the two-valued measures which "prevents" the quasiclassical conditional probabilities from being arranged in a bistochastic matrix. A similar quantum realization could, for instance, be obtained by the three-dimensional faithful orthogonal representation 37 $\{1,2\} \equiv (1,0,0)^\intercal$, $\{3,4\} \equiv (0,1,0)^\intercal$, $\{5\} \equiv (0,0,1)^\intercal$
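The conditional probabilities of the firefly logic follow directly from the label sets, $P(e_j|f_i) = \lambda(e_j \cap f_i)/\lambda(f_i)$, where $\lambda(S)$ sums the weights of the two-valued states in $S$. In the sketch below the generic weights are hypothetical and `None` is used as a stand-in for an undefined 0/0 entry. Computed directly from the weights in this way, every row whose preselected atom carries zero weight comes out undefined. For the generic weights, all rows of $[P(C_1|C_2)]$ sum to one while the first two columns sum to $\frac{11}{12}$ and $\frac{13}{12}$.

```python
from fractions import Fraction

# Partition-logic labels of the firefly logic, as identified in the text.
E = {"e1": {1, 2}, "e2": {3, 4}, "e3": {5}}   # first context C_1
F = {"f1": {1, 3}, "f2": {2, 4}, "f3": {5}}   # second context C_2 (f3 = e3)

def conditional_matrix(post, pre, weights):
    """Rows: preselected atoms of `pre`; columns: postselected atoms of `post`.

    Entry = lambda(post & pre) / lambda(pre). A row whose preselected atom has
    weight zero is undefined (0/0) and is returned as a row of None."""
    matrix = []
    for pre_idx in pre.values():
        denominator = sum(weights[i] for i in pre_idx)
        if denominator == 0:
            matrix.append([None] * len(post))
            continue
        matrix.append([sum(weights[i] for i in post_idx & pre_idx) / denominator
                       for post_idx in post.values()])
    return matrix

# Hypothetical strictly positive weights lambda_1..lambda_5 summing to one.
generic = {1: Fraction(1, 10), 2: Fraction(2, 10), 3: Fraction(3, 10),
           4: Fraction(1, 10), 5: Fraction(3, 10)}
# Singular case of the text: lambda_1 = 1, all other weights 0.
singular = {1: Fraction(1), 2: Fraction(0), 3: Fraction(0), 4: Fraction(0), 5: Fraction(0)}

P_C1_given_C2 = conditional_matrix(E, F, generic)    # [P(C_1|C_2)], rows f_i
singular_matrix = conditional_matrix(E, F, singular)
print([sum(row) for row in P_C1_given_C2])
print([sum(col) for col in zip(*P_C1_given_C2)])
print(singular_matrix)
```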

Different intrinsically operational state preparation
A different approach to partition logic would be to insist that only intrinsic operational state preparations should be allowed, that is, preparations by an embedded observer who has access only to means and methods available "from within" the system. In such a scenario it is operationally impossible for an observer with access to only one context (in the generalized urn model, only one color) to single out the particular type of two-valued measure (aka ball). Thereby effectively any state preparation is reduced to the elements of the partition in the respective context (aka color).
Therefore, in the earlier firefly model depicted in Fig. 2, the intrinsic operational resolution is among the subsets resulting from the unions of two-valued states $\{1,2\}$, $\{3,4\}$, and $\{5\}$ in the first context (aka color), and among $\{1,3\}$, $\{2,4\}$, and $\{5\}$ in the second context (aka color), as opposed to the single two-valued state discussed earlier. Stated differently, an observer accessing a generalized urn in the first context (aka color) cannot differentiate between the first and the second two-valued measure (aka ball type), and would produce a mixture of them if asked to prepare the state $\{1,2\}$. Similarly, the observer would not be able to differentiate between the third and the fourth two-valued measure (aka ball type), and would thus produce a mixture of those when preparing the state $\{3,4\}$. However, the ball type $\{5\}$ is recognized and prepared without ambiguity. Indeed, if one assumes equidistribution (uniform mixtures [40, Assumption 1]) of measures (aka ball types), a situation very similar to the one in quantum mechanics [cf. Fig. 2(d), Eq. (8)] would result: inserting $\lambda_1 = \cdots = \lambda_5 = \frac{1}{5}$ into the conditional probabilities of the firefly logic yields

$$
[P(C_2|C_1)] = [P(C_1|C_2)] = \begin{pmatrix} \frac{1}{2} & \frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (9)
$$

and one would thus "recover" the matrix in Eq. (8).
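The intrinsic preparation described above can be mimicked directly. As a modelling assumption made here only for illustration, preparing an atom of one color means drawing uniformly among the ball types (two-valued states) in its label set. The outcome distribution in the other color then follows by summing the mixture weights over each atom's labels, and the resulting matrix is bistochastic. The function names are hypothetical.

```python
from fractions import Fraction

E = {"e1": {1, 2}, "e2": {3, 4}, "e3": {5}}   # first color (context C_1)
F = {"f1": {1, 3}, "f2": {2, 4}, "f3": {5}}   # second color (context C_2)

def prepare_uniform(label):
    """Assumed intrinsic preparation: equal mixture of the ball types in `label`."""
    return {i: Fraction(1, len(label)) for i in label}

def outcome_row(mixture, context):
    """Probabilities of the atoms of `context` for a mixture of ball types."""
    return [sum(mixture.get(i, 0) for i in idx) for idx in context.values()]

# Rows: atom prepared in the first color; columns: outcome observed in the second color.
matrix = [outcome_row(prepare_uniform(idx), F) for idx in E.values()]
print(matrix)
```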
Stated pointedly, there is an epistemic issue of state preparation: if one demands that the state has to be prepared by the distinctions accessible from a single context (aka color in the generalized urn model), then there is no way to prepare or access "ontologic states", say, by selecting balls of type 1 (the first two-valued measure) only. The difference is subtle: in the "ontic" state case one can resolve (and has access to) every single two-valued measure (aka ball type). In the "epistemic", intrinsic, operational state case one is limited to the operational procedures available; for example, one cannot "take off the colored glasses" in Wright's generalized urn model. That is, the resolution of balls is limited to whatever types can be differentiated in that color.
Whenever such a scenario is considered, the respective matrices representing all conditional probabilities may be very different from those of the previous scenarios. Indeed, one may suspect that, with the assumption of preservation of equidistributed uniform mixtures across context changes, the respective matrices are bistochastic (at least for equidistributed urns) because of a certain type of "epistemic continuity": the sum of the conditional probabilities for any particular outcome of the second context, relative to all outcomes of the first context, should add up to unity.

In addition to the aforementioned 11 two-valued states, there exists another dispersionless state on cyclic pastings of an odd number of contexts, namely a state equal to $\frac{1}{2}$ on all intertwines/bi-connections. 29,41 This state and its associated probability distribution are realizable neither by quantum nor by classical probability distributions. In this case the conditional probabilities of any two distinct contexts $C_i$ and $C_j$, for $1 \le i, j \le 5$ with $i \neq j$, form the matrix $[P(C_i|C_j)]$ of Eq. (10).
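The count of 11 two-valued states on the pentagon, and the fact that the state equal to $\frac{1}{2}$ on all intertwines cannot be a classical mixture, can be checked by brute force. This sketch assumes the usual encoding of the pentagon as five three-atom contexts $(a_i, m_i, a_{i+1})$ with intertwining atoms $a_i$ and middle atoms $m_i$; these names are illustrative. Every two-valued state is 1 on at most two (non-adjacent) intertwines, so any convex combination of them has an intertwine sum of at most 2, whereas the $\frac{1}{2}$ state, which is properly normalized in every context, reaches $\frac{5}{2}$.

```python
from fractions import Fraction
from itertools import product

N = 5
INTERTWINES = [f"a{i}" for i in range(N)]   # atoms shared by two contexts
MIDDLE = [f"m{i}" for i in range(N)]        # atoms belonging to a single context
ATOMS = INTERTWINES + MIDDLE
# Context i is {a_i, m_i, a_(i+1 mod 5)}: a cyclic pasting of five contexts.
CONTEXTS = [(f"a{i}", f"m{i}", f"a{(i + 1) % N}") for i in range(N)]

def two_valued_states(atoms, contexts):
    """Brute force: all 0/1 assignments that are 1 on exactly one atom per context."""
    states = []
    for values in product((0, 1), repeat=len(atoms)):
        state = dict(zip(atoms, values))
        if all(sum(state[x] for x in ctx) == 1 for ctx in contexts):
            states.append(state)
    return states

states = two_valued_states(ATOMS, CONTEXTS)

# The "exotic" state: 1/2 on every intertwine, 0 on every middle atom.
half_state = {a: Fraction(1, 2) for a in INTERTWINES}
half_state.update({m: Fraction(0) for m in MIDDLE})

# Any convex mixture of two-valued states has an intertwine sum of at most this value.
classical_bound = max(sum(s[a] for a in INTERTWINES) for s in states)
half_state_sum = sum(half_state[a] for a in INTERTWINES)

print(len(states), classical_bound, half_state_sum)
```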

Three-colorable dense points on the sphere
There exist dense subsets of the unit sphere in three dimensions which can be colored with just three colors in such a way that the three vectors of every mutually orthogonal triple of (unit) vectors, forming an orthonormal basis, receive different colors [42][43][44]. By identifying two of these colors with the value "0" and the remaining color with the value "1", one obtains a two-valued measure on this "reduced" sphere. The resulting conditional probabilities are discontinuous.

Extrema of conditional probabilities in row and doubly stochastic matrices
The row stochastic matrices representing conditional probabilities form a polytope in $\mathbb{R}^{n^2}$ whose vertices are the $n^n$ matrices $T_i$, $i = 1, \ldots, n^n$, with exactly one entry 1 in each row (and all other entries 0) [45, p. 49]. Therefore, a row stochastic matrix can be represented as the convex sum $\sum_{i=1}^{n^n} \lambda_i T_i$, with non-negative $\lambda_i \ge 0$ and $\sum_{i=1}^{n^n} \lambda_i = 1$. For conditional probabilities yielding doubly stochastic matrices, such as in the quantum case, the Birkhoff theorem 26 yields more restricted linear bounds: it states that any doubly stochastic $(n \times n)$-matrix is a convex combination of $m \le (n-1)^2 + 1 \le n!$ permutation matrices. That is, if $A \equiv (a_{ij})$ is a doubly stochastic matrix such that $a_{ij} \ge 0$ and $\sum_{i=1}^n a_{ij} = \sum_{i=1}^n a_{ji} = 1$ for $1 \le j \le n$, then there exists a convex sum decomposition

$$
A = \sum_{k=1}^{m} \lambda_k P_k, \qquad m \le (n-1)^2 + 1 \le n!,
$$

in terms of $m$ linearly independent permutation matrices $P_k$ such that $\lambda_k \ge 0$ and $\sum_{k=1}^{m} \lambda_k = 1$.
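Both decompositions can be made constructive. The sketch below is illustrative and not taken from the cited references. For a row stochastic matrix it uses the product weights $\lambda_t = \prod_i a_{i,t(i)}$ over the $n^n$ vertex matrices, which is one valid choice of convex coefficients among many. For a doubly stochastic matrix it uses the standard greedy Birkhoff procedure: repeatedly subtract the largest possible multiple of a permutation matrix supported on the positive entries. The brute-force search over permutations is only sensible for small $n$, and all names and example matrices are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def row_stochastic_vertices(A):
    """Write a row stochastic A as sum_t w_t T_t over the n^n 0/1 matrices T_t
    with a single 1 per row (t(i) = column of that 1), using w_t = prod_i a[i][t(i)]."""
    n = len(A)
    terms = []
    for t in product(range(n), repeat=n):
        weight = Fraction(1)
        for i in range(n):
            weight *= A[i][t[i]]
        if weight:                                   # skip vertices with zero weight
            T = [[1 if j == t[i] else 0 for j in range(n)] for i in range(n)]
            terms.append((weight, T))
    return terms

def birkhoff_decomposition(A):
    """Greedy Birkhoff decomposition of a doubly stochastic A into permutation matrices.
    Brute-force permutation search: only meant for small n."""
    n = len(A)
    R = [list(row) for row in A]
    terms = []
    while any(x != 0 for row in R for x in row):
        # R stays a scaled doubly stochastic matrix, so Birkhoff's theorem guarantees
        # a permutation inside its positive support.
        sigma = next(p for p in permutations(range(n))
                     if all(R[i][p[i]] > 0 for i in range(n)))
        weight = min(R[i][sigma[i]] for i in range(n))
        for i in range(n):
            R[i][sigma[i]] -= weight
        P = [[1 if j == sigma[i] else 0 for j in range(n)] for i in range(n)]
        terms.append((weight, P))
    return terms

def recombine(terms, n):
    """Convex sum sum_k w_k M_k of (weight, matrix) pairs."""
    return [[sum(w * M[i][j] for w, M in terms) for j in range(n)] for i in range(n)]

half, third, quarter = Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)
row_stoch = [[half, half], [third, 2 * third]]
doubly = [[half, quarter, quarter], [quarter, half, quarter], [quarter, quarter, half]]

vertex_terms = row_stochastic_vertices(row_stoch)
birkhoff_terms = birkhoff_decomposition(doubly)
print(len(vertex_terms), [w for w, _ in birkhoff_terms])
```

For the $3 \times 3$ example the greedy procedure uses three permutation matrices, within the bound $(n-1)^2 + 1 = 5$.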

Summary
I have attempted to sketch a generalized probability theory for multi-context configurations of observables which may or may not be embeddable into a single classical Boolean algebra. Complementarity and distinct contexts require an extension of the Kolmogorov axioms. This has been achieved by an additional axiom ensuring that the conditional probabilities of observables in one context, given the occurrence of observables in another context, form a row stochastic matrix. Various models have been discussed. In the case of doubly stochastic matrices, linear bounds have been derived from the convex hull of permutation matrices.

Fig. 1. Greechie orthogonality diagram of a logic consisting of two non-intertwining contexts. (a) The associated (quasi)classical partition logic representation obtained by an inverse construction using all two-valued measures thereon; 27 (b) a faithful orthogonal representation 37 rendering a quantum double.

Fig. 2. Greechie orthogonality diagram of the $L_{12}$ "firefly" logic. (a) The associated (quasi)classical partition logic representation obtained through an inverse construction using all two-valued measures thereon; 27 (b) a faithful orthogonal representation 37 rendering a quantum double; (c) "classical" two-valued measure number 1; (d) a pure quantum state prepared as $(1,0,0)^\intercal$. A red square and gray and green circles indicate the value assignments 1, $\frac{1}{2}$, and 0, respectively.
