# Maximum Entropy and Probability Kinematics Constrained by Conditionals

## Abstract


## 1. Introduction

## 2. Jeffrey’s Updating Principle and the Principle of Maximum Entropy

[pme] Keep the information entropy of your probability distribution maximal within the constraints that the evidence provides (in the synchronic case), or your cross-entropy minimal (in the diachronic case).

[jup] In a diachronic updating process, keep the ratio of probabilities constant as long as they are unaffected by the constraints that the evidence poses.

## 3. Jeffrey Conditioning

Let {θ_j}, j = 1, …, n, be a partition of Ω. Let κ be an m × n matrix for which each column contains exactly one 1, otherwise 0. Let $P=P_{\mathrm{prior}}$ and $\widehat{P}=P_{\mathrm{posterior}}$. Then κ induces a coarser partition {ω_i}, i = 1, …, m, for which

$$\omega_i=\bigcup_{j=1}^{n}\theta_{ij}^{*},\qquad\text{where }\theta_{ij}^{*}=\emptyset\text{ if }\kappa_{ij}=0,\ \theta_{ij}^{*}=\theta_j\text{ otherwise.}\qquad(1)$$

Let β be the vector of prior probabilities for {θ_j}, j = 1, …, n ($P(\theta_j)=\beta_j$) and $\widehat{\beta}$ the vector of posterior probabilities ($\widehat{P}(\theta_j)=\widehat{\beta}_j$); likewise for α and $\widehat{\alpha}$ corresponding to the prior and posterior probabilities for {ω_i}, i = 1, …, m, respectively. Jeffrey conditioning then requires, for the unique i with $\kappa_{ij}=1$,

$$\widehat{\beta}_j=\beta_j\,\frac{\widehat{\alpha}_i}{\alpha_i}.\qquad(2)$$

Section 5 treats the more general case in which the ω_i do not range over the θ_j. In the meantime, here is an example to illustrate (2).

A token is pulled from a bag containing 3 yellow tokens, 2 blue tokens, and 1 purple token. You are colour blind and cannot distinguish blue tokens from purple ones. When the token is pulled, it is shown to you in poor lighting and then obscured again. Based on your observation, you conclude that the probability that the pulled token is yellow is 1/3 and that the probability that it is blue or purple is 2/3. What is your updated probability that the pulled token is blue?
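The answer follows from (2): within the block {blue, purple} the prior ratio is kept constant while the block is rescaled to its new probability. A minimal sketch in Python (the function name `jeffrey_update` and the dictionary encoding are illustrative, not from the paper):

```python
from fractions import Fraction as F

def jeffrey_update(prior, partition, new_marginals):
    """Jeffrey conditioning: rescale each elementary event so that its
    partition block attains the new marginal, keeping the ratios of
    probabilities within each block constant."""
    block_mass = {b: sum(prior[e] for e in events)
                  for b, events in partition.items()}
    posterior = {}
    for b, events in partition.items():
        for e in events:
            posterior[e] = prior[e] * new_marginals[b] / block_mass[b]
    return posterior

# Token example: 3 yellow, 2 blue, 1 purple token in the bag.
prior = {"yellow": F(3, 6), "blue": F(2, 6), "purple": F(1, 6)}
partition = {"yellow": ["yellow"], "blue-or-purple": ["blue", "purple"]}
new_marginals = {"yellow": F(1, 3), "blue-or-purple": F(2, 3)}

posterior = jeffrey_update(prior, partition, new_marginals)
print(posterior["blue"])  # → 4/9
```

The block {blue, purple} has prior mass 1/2, so the blue token's 1/3 is rescaled to (1/3) · (2/3)/(1/2) = 4/9.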

## 4. Wagner Conditioning

You encounter the native of a certain foreign country and wonder whether he is a Catholic northerner (θ_{1}), a Catholic southerner (θ_{2}), a Protestant northerner (θ_{3}), or a Protestant southerner (θ_{4}). Your prior probability p over these possibilities (based, say, on population statistics and the judgment that it is reasonable to regard this individual as a random representative of his country) is given by p(θ_{1}) = 0.2, p(θ_{2}) = 0.3, p(θ_{3}) = 0.4, and p(θ_{4}) = 0.1. The individual now utters a phrase in his native tongue which, due to the aural similarity of the phrases in question, might be a traditional Catholic piety (ω_{1}), an epithet uncomplimentary to Protestants (ω_{2}), an innocuous southern regionalism (ω_{3}), or a slang expression used throughout the country in question (ω_{4}). After reflecting on the matter you assign subjective probabilities u(ω_{1}) = 0.4, u(ω_{2}) = 0.3, u(ω_{3}) = 0.2, and u(ω_{4}) = 0.1 to these alternatives. In the light of this new evidence how should you revise p? (See [18] (p. 252) and [22] (p. 197).)
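The data of the problem can be collected into the triple (κ, β, $\widehat{\alpha}$) used in Section 5. A sketch of that encoding in Python, assuming the compatibility relations suggested by the story (piety and epithet point to Catholics, the regionalism to southerners, the slang and the catch-all event ω_5 to anyone); the variable names are illustrative:

```python
# Rows: omega_1 piety, omega_2 epithet, omega_3 regionalism,
#       omega_4 slang, omega_5 none of the four utterances.
# Columns: theta_1 Cath-N, theta_2 Cath-S, theta_3 Prot-N, theta_4 Prot-S.
kappa = [
    [1, 1, 0, 0],  # Catholic piety: compatible only with Catholics (assumed)
    [1, 1, 0, 0],  # anti-Protestant epithet: only Catholics (assumed)
    [0, 1, 0, 1],  # southern regionalism: only southerners (assumed)
    [1, 1, 1, 1],  # country-wide slang: anyone
    [1, 1, 1, 1],  # some other utterance or silence: anyone
]
beta = [0.2, 0.3, 0.4, 0.1]            # prior over theta_1..theta_4
alpha_hat = [0.4, 0.3, 0.2, 0.1, 0.0]  # posterior over omega_1..omega_5

# Sanity checks: marginals sum to 1; no all-zero row or column in kappa.
assert abs(sum(beta) - 1) < 1e-12 and abs(sum(alpha_hat) - 1) < 1e-12
assert all(any(row) for row in kappa)
assert all(any(kappa[i][j] for i in range(5)) for j in range(4))
```

Note that ω_5, the event that none of the four utterances occurs, gets posterior probability 0 and a last row of all 1s in κ, as required in Section 5.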

## 5. A Natural Generalization of Jeffrey and Wagner Conditioning

{θ_j}, j = 1, …, n, and {ω_i}, i = 1, …, m, now refer to independent partitions of Ω, i.e., (1) need not be true. Besides the marginal probabilities $P(\theta_j)=\beta_j$, $\widehat{P}(\theta_j)=\widehat{\beta}_j$, $P(\omega_i)=\alpha_i$, $\widehat{P}(\omega_i)=\widehat{\alpha}_i$, we therefore also have joint probabilities $\mu_{ij}=P(\omega_i\cap\theta_j)$ and $\widehat{\mu}_{ij}=\widehat{P}(\omega_i\cap\theta_j)$.

The last row $(\kappa_{mj})$, j = 1, …, n, of κ is special because it represents the probability of ω_m, which is the negation of the events deemed possible after the observation. In the Linguist problem, for example, ω_5 is the event (initially highly likely, but impossible after the observation of the native's utterance) that the native does not make any of the four utterances. The native may have, after all, uttered a typical Buddhist phrase, asked where the nearest bathroom was, complimented your fedora, or chosen to be silent. κ will have all 1s in the last row. Let ${\widehat{\kappa}}_{ij}={\kappa}_{ij}$ for i = 1, …, m − 1 and j = 1, …, n; and ${\widehat{\kappa}}_{mj}=0$ for j = 1, …, n. $\widehat{\kappa}$ equals κ except that its last row is all 0s, and ${\widehat{\alpha}}_{m}=0$. Otherwise the 0s are distributed over κ (and equally over $\widehat{\kappa}$) so that no row and no column is all 0s, representing the logical relationships between the ω_i s and the θ_j s ($\kappa_{ij}=0$ if and only if $\widehat{P}(\omega_i\cap\theta_j)=\mu_{ij}=0$). We set $P(\omega_m)=x$ ($\widehat{P}(\omega_m)=0$), where x depends on the specific prior knowledge. Fortunately, the value of x cancels out nicely and will play no further role. For convenience, we define

$$\zeta_m=1\quad\text{and}\quad\zeta_i=0\ \text{for }i\neq m.\qquad(5)$$

The best way to visualize such a problem is by providing the joint probability matrix $M=(\mu_{ij})$ together with the marginals α and β in the last column and row, here for the Linguist problem with m = 5 and n = 4 (note that the result is not the matrix M, which is m × n, but M expanded with the marginals in improper matrix notation). The joint probabilities obey $\mu_{ij}=0$ where $\kappa_{ij}=0$ and $\mu_{ij}\neq0$ where $\kappa_{ij}=1$; ditto, mutatis mutandis, for $\widehat{M}$, $\widehat{\alpha}$, $\widehat{\beta}$. To make this a little less abstract, Wagner's Linguist problem is characterized by the triple (κ, β, $\widehat{\alpha}$), with β = (0.2, 0.3, 0.4, 0.1) and $\widehat{\alpha}$ = (0.4, 0.3, 0.2, 0.1, 0).

In the following, $r_i=e^{\zeta_i\lambda_m}$, $s_j=e^{-1-\xi_j}$, and ${\widehat{r}}_{i}={e}^{-1-{\widehat{\lambda}}_{i}}$ represent factors arising from the Lagrange multiplier method (ζ was defined in (5)). The operator ∘ is the entry-wise Hadamard product of linear algebra. r, s, $\widehat{r}$ are the vectors containing the $r_i$, $s_j$, ${\widehat{r}}_{i}$, respectively. R, S, $\widehat{R}$ are the diagonal matrices with $R_{il}=r_i\delta_{il}$, $S_{kj}=s_j\delta_{kj}$, ${\widehat{R}}_{il}={\widehat{r}}_{i}{\delta}_{il}$ (δ is the Kronecker delta).
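The diagonal-matrix notation interacts with the Hadamard product in a simple way: for diagonal R and S, the product R M S rescales entry (i, j) of M by $r_i s_j$, i.e., $RMS=M\circ(rs^{\top})$. A quick numerical check in pure Python (the matrix and vector values are illustrative):

```python
def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

r = [2.0, 3.0]        # r_i, e.g. e^{zeta_i * lambda_m}
s = [5.0, 7.0, 11.0]  # s_j, e.g. e^{-1 - xi_j}
M = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

# Diagonal matrices R_il = r_i * delta_il and S_kj = s_j * delta_kj.
R = [[r[i] if i == l else 0.0 for l in range(2)] for i in range(2)]
S = [[s[k] if k == j else 0.0 for j in range(3)] for k in range(3)]

RMS = matmul(matmul(R, M), S)
hadamard = [[M[i][j] * r[i] * s[j] for j in range(3)] for i in range(2)]
assert RMS == hadamard  # R M S  =  M ∘ (r sᵀ)
```

This is why the Lagrange-multiplier factors can be carried around equivalently as vectors (via ∘) or as diagonal matrices.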

## 6. Conclusion

## Conflicts of Interest

## A. Appendix: PME generalizes Jeffrey Conditioning

#### A.1. Standard Conditioning

Let $y_i$ (all $y_i\neq0$) be a finite type II prior probability distribution summing to 1, i ∈ I. Let ${\widehat{y}}_{i}$ be the posterior probability distribution derived from standard conditioning with ${\widehat{y}}_{i}=0$ for all i ∈ I′ and ${\widehat{y}}_{i}\neq0$ for all i ∈ I″, I′ ∪ I″ = I. I′ and I″ specify the standard event observation. Standard conditioning requires that

$$\widehat{y}_i=\frac{y_i}{\sum_{k\in I''}y_k}\quad\text{for }i\in I''.$$

PME minimizes the cross-entropy $\sum_{i\in I''}\widehat{y}_i\ln(\widehat{y}_i/y_i)$ subject to the constraint that the ${\widehat{y}}_{i}$ sum to 1. The Lagrange function is (writing in vector form $\widehat{y}=(\widehat{y}_i)_{i\in I''}$)

$$\Lambda(\widehat{y},\lambda)=\sum_{i\in I''}\widehat{y}_i\ln\frac{\widehat{y}_i}{y_i}+\lambda\left(\sum_{i\in I''}\widehat{y}_i-1\right).$$

Differentiating with respect to ${\widehat{y}}_{i}$ and setting the result to zero gives us

$$\widehat{y}_i=y_ie^{-1-\lambda},$$

and normalization fixes $e^{-1-\lambda}=\left(\sum_{k\in I''}y_k\right)^{-1}$, which is just standard conditioning.
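The stationary point above is indeed the minimum: the renormalized prior has strictly smaller cross-entropy relative to the prior than any other distribution supported on I″. A brute-force numerical sketch in Python (the prior values and the choice of I″ are illustrative):

```python
from math import log

y = [0.1, 0.2, 0.3, 0.4]  # prior y_i, all nonzero
I2 = [1, 3]               # I'': indices that remain possible after observation

def cross_entropy(post, prior):
    """sum_i post_i * ln(post_i / prior_i), skipping zero-probability terms."""
    return sum(p * log(p / q) for p, q in zip(post, prior) if p > 0)

# Standard conditioning: renormalize the prior on I''.
z = sum(y[i] for i in I2)
y_hat = [y[i] / z if i in I2 else 0.0 for i in range(len(y))]

best = cross_entropy(y_hat, y)
# Any feasible perturbation (mass shifted between the surviving indices,
# so the sum-to-1 constraint still holds) has larger cross-entropy.
for eps in [0.05, 0.1, 0.2]:
    rival = list(y_hat)
    rival[1] += eps
    rival[3] -= eps
    assert cross_entropy(rival, y) > best
```

Here $\widehat{y}=(0,\tfrac13,0,\tfrac23)$ and the minimal cross-entropy equals $\ln\frac{1}{0.6}=\ln\frac{5}{3}$, the negative log of the conditioning event's prior probability.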

#### A.2. Jeffrey Conditioning

Let θ_i, i = 1, …, n and ω_j, j = 1, …, m be finite partitions of the event space with the joint prior probability matrix $(y_{ij})$ (all $y_{ij}\neq0$). Let κ be defined as in Section 3, with (1) true (remember that in Section 5, (1) is no longer required). Let P be the type II prior probability distribution and $\widehat{P}$ the posterior probability distribution.

Let $x_{ij}$ be the posterior probability distribution derived from Jeffrey conditioning with

$$x_{ij}=y_{ij}\,\frac{\widehat{P}(\omega_j)}{P(\omega_j)}.$$

PME minimizes the cross-entropy subject to the constraints $\sum_{i}x_{ij}=\widehat{P}(\omega_j)$ for j = 1, …, m. With $\widehat{y}=(x_{ij})$, the Lagrange function is (writing in vector form $\widehat{y}=(x_{11},\ldots,x_{n1},\ldots,x_{nm})^{\top}$ and $\lambda=(\lambda_1,\ldots,\lambda_m)^{\top}$)

$$\Lambda(\widehat{y},\lambda)=\sum_{i,j}x_{ij}\ln\frac{x_{ij}}{y_{ij}}+\sum_{j=1}^{m}\lambda_j\left(\sum_{i=1}^{n}x_{ij}-\widehat{P}(\omega_j)\right).$$

Differentiating with respect to $x_{ij}$ and setting the result to zero gives us

$$x_{ij}=y_{ij}e^{-1-\lambda_j},$$

with $e^{-1-\lambda_j}$ normalized by the constraint so that $e^{-1-\lambda_j}=\widehat{P}(\omega_j)/P(\omega_j)$, which is just Jeffrey conditioning.
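A numerical spot check of this result in Python (the joint prior and new marginals are illustrative): the Jeffrey posterior $x_{ij}=y_{ij}\widehat{P}(\omega_j)/P(\omega_j)$ satisfies the marginal constraints, and any transfer of mass within a column, which preserves those constraints, raises the cross-entropy.

```python
from math import log

# Joint prior y_ij over a 2 x 2 grid: rows theta_i, columns omega_j.
y = [[0.1, 0.2],
     [0.3, 0.4]]
omega_prior = [0.4, 0.6]  # column sums of y: P(omega_j)
omega_post = [0.7, 0.3]   # new constraints: sum_i x_ij = P^(omega_j)

def cross_entropy(x, y):
    return sum(x[i][j] * log(x[i][j] / y[i][j])
               for i in range(2) for j in range(2))

# Jeffrey conditioning: rescale each column by the ratio of marginals.
x = [[y[i][j] * omega_post[j] / omega_prior[j] for j in range(2)]
     for i in range(2)]
assert all(abs(sum(x[i][j] for i in range(2)) - omega_post[j]) < 1e-12
           for j in range(2))

best = cross_entropy(x, y)
# Shifting mass within column 0 keeps the constraints but does worse.
for eps in [0.01, 0.05, 0.1]:
    rival = [row[:] for row in x]
    rival[0][0] += eps
    rival[1][0] -= eps
    assert cross_entropy(rival, y) > best
```

Because the constraint fixes only column sums, the minimizer rescales whole columns uniformly, which is exactly the constant-ratio requirement of [jup].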

## References

- Jeffrey, R. The Logic of Decision; Gordon and Breach: New York, NY, USA, 1965.
- Majerník, V. Marginal Probability Distribution Determined by the Maximum Entropy Method. Rep. Math. Phys. **2000**, 45, 171–181.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: Hoboken, NJ, USA, 2006; Volume 6.
- Debbah, M.; Müller, R. MIMO Channel Modeling and the Principle of Maximum Entropy. IEEE Trans. Inf. Theory **2005**, 51, 1667–1690.
- Van Fraassen, B.; Hughes, R.I.G.; Harman, G. A Problem for Relative Information Minimizers, Continued. Br. J. Philos. Sci. **1986**, 37, 453–463.
- Jaynes, E.T. Optimal Information Processing and Bayes's Theorem: Comment. Am. Stat. **1988**, 42, 280–281.
- Zellner, A. Optimal Information Processing and Bayes's Theorem. Am. Stat. **1988**, 42, 278–280.
- Palmieri, F.; Ciuonzo, D. Objective Priors from Maximum Entropy in Data Classification. Inf. Fusion **2013**, 14, 186–198.
- Shannon, C. A Mathematical Theory of Communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Kullback, S. Information Theory and Statistics; Dover: London, UK, 1959.
- Kullback, S.; Leibler, R. On Information and Sufficiency. Ann. Math. Stat. **1951**, 22, 79–86.
- Guiaşu, S. Information Theory with Application; McGraw-Hill: New York, NY, USA, 1977.
- Seidenfeld, T. Entropy and Uncertainty. In Advances in the Statistical Sciences: Foundations of Statistical Inference; Springer: Berlin, Germany, 1986; pp. 259–287.
- Kampé de Fériet, J.; Forte, B. Information et probabilité. Comptes rendus de l'Académie des sciences **1967**, A 265, 110–114.
- Ingarden, R.S.; Urbanik, K. Information Without Probability. Colloq. Math. **1962**, 9, 131–150.
- Khinchin, A. Mathematical Foundations of Information Theory; Dover: New York, NY, USA, 1957.
- Kolmogorov, A. Logical Basis for Information Theory and Probability Theory. IEEE Trans. Inf. Theory **1968**, 14, 662–664.
- Wagner, C. Generalized Probability Kinematics. Erkenntnis **1992**, 36, 245–257.
- Teller, P. Conditionalization and Observation. Synthese **1973**, 26, 218–258.
- Howson, C.; Franklin, A. Bayesian Conditionalization and Probability Kinematics. Br. J. Philos. Sci. **1994**, 45, 451–466.
- Wagner, C. Probability Kinematics and Commutativity. Phil. Sci. **2002**, 69, 266–278.
- Spohn, W. The Laws of Belief: Ranking Theory and Its Philosophical Applications; Oxford University: Oxford, UK, 2012.
- Dempster, A. Upper and Lower Probabilities Induced by a Multi-Valued Mapping. Ann. Math. Stat. **1967**, 38, 325–339.
- Jaynes, E.T. Information Theory and Statistical Mechanics. Phys. Rev. **1957**, 106, 620–630.
- Csiszár, I. Information-Type Measures of Difference of Probability Distributions and Indirect Observations. Stud. Sci. Math. Hung. **1967**, 2, 299–318.
- Paris, J. The Uncertain Reasoner's Companion: A Mathematical Perspective; Cambridge University Press: Cambridge, UK, 2006.
- Caticha, A.; Giffin, A. Updating Probabilities. In Proceedings of MaxEnt 2006, the 26th International Workshop on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, CNRS, Paris, France, 8–13 July 2006; University at Albany: Albany, NY, USA, 2006.
- Friedman, K.; Shimony, A. Jaynes's Maximum Entropy Prescription and Probability Theory. J. Stat. Phys. **1971**, 3, 381–384.
- Skyrms, B. Updating, Supposing, and Maxent. Theory Decis. **1987**, 22, 225–246.
- Uffink, J. Can the Maximum Entropy Principle Be Explained as a Consistency Requirement? Stud. Hist. Philos. Sci. **1995**, 26, 223–261.
- Walley, P. Statistical Reasoning with Imprecise Probabilities; Chapman and Hall: London, UK, 1991.
- Halpern, J. Reasoning About Uncertainty; MIT: Cambridge, MA, USA, 2003.
- Joyce, J. A Defense of Imprecise Credences in Inference and Decision Making. Phil. Perspect. **2010**, 24, 281–323.
- Jaynes, E.T. Where Do We Stand on Maximum Entropy? In The Maximum Entropy Formalism; Levine, R.D., Tribus, M., Eds.; MIT: Cambridge, MA, USA, 1978; pp. 15–118.
- Williams, P. Bayesian Conditionalisation and the Principle of Minimum Information. Br. J. Philos. Sci. **1980**, 31, 131–144.
- Zubarev, D.; Morozov, V.; Röpke, G. Statistical Mechanics of Nonequilibrium Processes; Akademie: Berlin, Germany, 1996.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
