# Foundations of Inference

## Abstract

**PACS** 02.50.Cw

**MSC** 06A05

## 1. Introduction

## 2. Setting the Scene

**state** out of a finite set of mutually exclusive states (as in Figure 1, left). Since we and our tools are finite, a finite set of states, albeit possibly very large in number, suffices for all practical modeling.

**atoms**. Atoms are combined through logical `OR` to form compound statements comprising the **elements** of a **Boolean lattice** (Figure 1, right), which is isomorphic to a Boolean lattice of sets (Figure 1, center). Although carrying different interpretations, the mathematical structures are identical. Set inclusion “⊂” is equivalent to logical implication “⇒”, which we abstract to lattice order “<”. It is a matter of choice whether to include the null set ∅, equivalent to the logical absurdity ⊥. The set-based view is ontological in character and associated with Kolmogorov, while the logic-based view is epistemological in character and associated with Cox.

**Figure 1.** The Boolean lattice of potential states (**center**) is constructed by taking the powerset, with $2^{N}$ elements, of an antichain of $N$ mutually exclusive atoms (in this case $a_{1},a_{2},a_{3}$, **left**). This lattice is isomorphic to the Boolean lattice of logical statements ordered by logical implication (**right**).
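The construction in Figure 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the atom labels `a1`, `a2`, `a3` and the helper names `powerset_lattice` and `implies` are assumptions introduced here, and the bottom element ∅ is included by choice.

```python
from itertools import combinations

def powerset_lattice(atoms):
    """Return every subset of `atoms` as a frozenset (the lattice elements)."""
    atoms = list(atoms)
    return [frozenset(c)
            for r in range(len(atoms) + 1)
            for c in combinations(atoms, r)]

def implies(x, y):
    """Lattice order: x is included in y, read logically as 'x implies y'."""
    return x <= y

atoms = ["a1", "a2", "a3"]          # hypothetical labels for the three atoms
lattice = powerset_lattice(atoms)   # 2**3 elements, bottom element included

a1 = frozenset({"a1"})
a1_or_a2 = frozenset({"a1", "a2"})
print(len(lattice), implies(a1, a1_or_a2), implies(a1, frozenset({"a2"})))
```

Distinct atoms are incomparable under `implies`, which is what makes them an antichain, while each atom implies every compound statement that contains it.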

**valuation**, to elements $\mathtt{x}$. (`Typewriter` font denotes lattice elements $\mathtt{x}$, whereas their associated valuations (real numbers) $x$ are shown in italic.) We require valuations to be faithful to the lattice, in the sense that $\mathtt{x}<\mathtt{y}\;\Rightarrow\;x<y$.

**measure theory**, and the generalization of combination (of disjoint elements) to the lattice join (of arbitrary elements) is straightforward. The wide applicability of these underlying symmetries explains the wide utility of measure theory, which might otherwise be mysterious.

| Operation | Symbol | Quantification | (Eventual form) |
|---|---|---|---|
| ordering | < | < | |
| combination | ⊔ | ⊕ | (addition) |
| direct product | × | ⊗ | (multiplication) |
| chaining | , | ⊙ | (multiplication) |

**direct-product** operator ⊗ quantifies the composition of values:

**divergence** of measure $\mathbf{p}$ from measure $\mathbf{q}$.

**probability**, to predicate–context **intervals** $[\mathtt{x},\mathtt{t}]$. Such intervals can be **chained** (concatenated) so that $[\mathtt{x},\mathtt{z}]=\big[[\mathtt{x},\mathtt{y}],[\mathtt{y},\mathtt{z}]\big]$, with ⊙ representing the chaining of values.

**product rule** of probability calculus. When applied to probabilities, the divergence formula reduces to the **information**, also known as the Kullback–Leibler formula, with **entropy** being a variant.
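The reduction from divergence to the Kullback–Leibler formula can be checked numerically. The sketch below assumes the widely used generalized (unnormalized) divergence $\sum_i \left(q_i - p_i + p_i \log(p_i/q_i)\right)$; the argument order and the function names `divergence` and `kl` are assumptions made for illustration and may differ from the paper's own convention.

```python
import math

def divergence(p, q):
    """Generalized divergence between strictly positive measures p and q
    (assumed form): sum of q_i - p_i + p_i * log(p_i / q_i)."""
    return sum(qi - pi + pi * math.log(pi / qi) for pi, qi in zip(p, q))

def kl(p, q):
    """Kullback-Leibler formula for strictly positive, normalized p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

p = [0.2, 0.3, 0.5]    # hypothetical probabilities (sum to 1)
q = [0.25, 0.25, 0.5]  # hypothetical probabilities (sum to 1)

# For normalized arguments the linear terms cancel, leaving the KL formula.
print(math.isclose(divergence(p, q), kl(p, q), abs_tol=1e-12))
```

For measures that are not normalized, the extra linear terms keep the divergence non-negative, and it vanishes only when the two measures coincide.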

#### 2.1. The Order-Theoretic Perspective

**antichain**, illustrated in Figure 1 (left) as three states $a_{1}$, $a_{2}$, and $a_{3}$ situated side-by-side. Our state of knowledge about the world (more precisely, about our model of it; we make no ontological claim) is often incomplete, so that we can at best say that the world is in one of a set of potential **states**, which is a subset of the set of all possible states. In the case of total ignorance, the set of potential states includes all possible states. In contrast, perfect knowledge about our model is represented by singleton sets consisting of a single state. We refer to the singleton sets as **atoms**, and note that they are exclusive in the sense that no two can be true at once.

**Boolean lattice** (Figure 1, center), with the bottom element optional. By conceiving of a **statement** about our model of the world in terms of a set of potential states, we have an order-isomorphism from the Boolean lattice of potential states ordered by set inclusion to the Boolean lattice of statements ordered by logical implication (Figure 1, right). This isomorphism maps each set of potential states to a statement, while mapping the algebraic operations of set union ∪ and set intersection ∩ to the logical `OR` and `AND`, respectively.

**join** ∨ and **meet** ∧. This immediately broadens the scope from Boolean to more general **distributive lattices**, the first fruit of our minimalist approach. For additional details on partially ordered sets and lattices in particular, we refer the interested reader to the classic text by Birkhoff [3] or the more recent text by Davey & Priestley [4].

**fidelity**. The converse is not true: the total order imposed by quantification must be consistent with but can extend the partial order of the lattice structure.

**join** lattice operator ∨ for elements that need not be disjoint (they may have atoms in common), for which the sum rule generalizes to its standard inclusion/exclusion form [5], which uses the meet ∧ to account for any atoms in common.
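As a hedged illustration of the inclusion/exclusion form, the sketch below represents lattice elements as sets of atoms and assumes an additive valuation `m` built from atom weights. The weights and names are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

# Hypothetical atom valuations; any positive numbers would do.
weights = {"a1": 0.5, "a2": 1.5, "a3": 2.0}

def m(x):
    """Additive valuation of an element: the sum over the atoms it contains."""
    return sum(weights[a] for a in x)

x = frozenset({"a1", "a2"})
y = frozenset({"a2", "a3"})   # shares atom a2 with x, so x and y are not disjoint

join_value = m(x | y)                        # valuation of the join
incl_excl = m(x) + m(y) - m(x & y)           # inclusion/exclusion via the meet

print(math.isclose(join_value, incl_excl))
```

Subtracting the valuation of the meet removes the double counting of the shared atom `a2`; for disjoint elements the meet contributes nothing and the plain sum rule is recovered.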

**measure theory**.

**Logical deduction** is traditionally based on a Boolean lattice and proceeds “upwards” along a chain (as in the arrows sketched in Figure 1). Given some statement $\mathtt{x}$, one can deduce that $\mathtt{x}$ implies $\mathtt{x}\;\mathtt{OR}\;\mathtt{y}$ since $\mathtt{x}\;\mathtt{OR}\;\mathtt{y}$ includes $\mathtt{x}$. Similarly, $\mathtt{x}\;\mathtt{AND}\;\mathtt{y}$ implies $\mathtt{x}$ since $\mathtt{x}$ includes $\mathtt{x}\;\mathtt{AND}\;\mathtt{y}$. The ordering relationships among the elements of the lattice are encoded by the zeta function of the lattice [6], $\zeta(\mathtt{x},\mathtt{y})=1$ if $\mathtt{x}\le\mathtt{y}$ and $0$ otherwise.
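The two deductions just described can be written out as a short sketch. It assumes the standard binary form of the zeta function, equal to 1 when the first element is included in the second and 0 otherwise; the set representation and the variable names are illustrative assumptions.

```python
def zeta(x, y):
    """Binary zeta function (assumed standard form): 1 if x <= y, else 0."""
    return 1 if x <= y else 0

# Hypothetical statements, each represented by its set of potential states.
x = frozenset({"a1", "a2"})
y = frozenset({"a2", "a3"})

x_or_y = x | y    # logical OR corresponds to set union (join)
x_and_y = x & y   # logical AND corresponds to set intersection (meet)

print(zeta(x, x_or_y), zeta(x_and_y, x), zeta(x, y), zeta(x_or_y, x))
```

Deduction moves upward, so `zeta(x, x_or_y)` and `zeta(x_and_y, x)` are both 1, while the reverse direction and incomparable pairs give 0. It is this all-or-nothing behaviour that a real-valued generalization would soften.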

**Inference**, or **logical induction**, is the inverse of deduction and proceeds “downwards” along a chain, losing logical certainty as knowledge fragments. Our aim is to quantify this loss of certainty, in the expectation of deriving probability calculus. This requires generalization of the binary zeta function $\zeta (\mathtt{x},\mathtt{y})$ to some real-valued function $p(x\mid y)$ which will turn out to be the standard probability of $x$ `GIVEN` $y$. However, a firm foundation for inference must be devoid of a choice of arbitrary generalizations. By viewing quantification in terms of an order-preserving map between the partial order (Boolean lattice) and a total order (chain) subject to compelling symmetries alone, we obtain a firm foundation for inference, devoid of further assumptions of questionable merit.

**hypothesis space** of all possible statements that one can make about a particular model of the world. Quantification of join using + is the **sum rule** of probability calculus, and is required by adherence to the symmetries we list. It fixes the valuations assigned to composite elements in terms of valuations assigned to the atoms. Those latter valuations assigned to the atoms remain free, unconstrained by the calculus. That freedom allows the calculus to apply to inference in general, with the mathematically-arbitrary atom valuations being guided by insight into a particular application.
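A small sketch can make the division of labour concrete: atom valuations are chosen freely (here, hypothetical positive numbers), and the sum rule then fixes every composite valuation. The check below also illustrates fidelity, under the assumption that it means strict lattice order implies strict numerical order, and shows that the numerical order relates more pairs than the lattice does.

```python
from itertools import combinations

# Freely chosen, hypothetical atom valuations (positive by assumption).
weights = {"a1": 0.5, "a2": 1.5, "a3": 2.0}

# All non-empty elements of the Boolean lattice on three atoms.
elements = [frozenset(c)
            for r in range(1, len(weights) + 1)
            for c in combinations(weights, r)]

def m(x):
    """Composite valuation fixed by the sum rule."""
    return sum(weights[a] for a in x)

# Fidelity: whenever x lies strictly below y in the lattice, m(x) < m(y).
faithful = all(m(x) < m(y) for x in elements for y in elements if x < y)

# The converse fails: some pairs are ordered by value but not by the lattice.
extra = [(x, y) for x in elements for y in elements
         if m(x) < m(y) and not x < y]

print(faithful, len(extra) > 0)
```

The pair `({a1}, {a2})` appears in `extra`: the two atoms are incomparable in the lattice, yet their valuations are necessarily ordered on the real line.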

#### 2.2. Commentary

## 3. Symmetries

**fidelity**.

**Figure 2.** One system might, for example, be playing-card suits $\mathtt{x}\in \{\spadesuit,\heartsuit,\clubsuit,\diamondsuit\}$, while another independent system might be music keys $\mathtt{t}\in \{\flat,\natural,\sharp\}$. The direct product combines the spaces of $\mathtt{x}$ and $\mathtt{t}$ to form the joint space of $\mathtt{x}\times \mathtt{t}$ with atoms like $\heartsuit\times \natural$.
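The direct product in Figure 2 can be sketched as follows. The valuations attached to suits and keys are hypothetical, and the use of ordinary multiplication for ⊗ is taken as an assumption from the "eventual form" column of the table in Section 2.

```python
import math
from itertools import product

# Hypothetical valuations on the atoms of two independent systems.
suits = {"♠": 1.0, "♡": 2.0, "♣": 3.0, "♢": 4.0}
keys = {"♭": 0.5, "♮": 1.0, "♯": 1.5}

# Atoms of the joint space are pairs; values compose by multiplication (assumed).
joint = {(s, k): suits[s] * keys[k] for s, k in product(suits, keys)}

total_joint = sum(joint.values())
total_product = sum(suits.values()) * sum(keys.values())

print(len(joint), math.isclose(total_joint, total_product))
```

The joint space has 4 × 3 atoms, and because valuations multiply atom by atom, the total over the joint space factorizes into the product of the two separate totals.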

## 4. Axioms

## 5. Measure

#### 5.1. Disjoint arguments

#### 5.2. Arbitrary Arguments

`OR` and ∧ is logical `AND`.

#### 5.3. Independence

## 6. Variation

**bold-face** font.)

#### 6.1. Divergence and Distance

## 7. Probability Calculus

#### 7.1. Chained Arguments

#### 7.2. Arbitrary Arguments

**probability**, hereafter denoted Pr.

#### 7.3. Probability as a Ratio

**generalized intervals**, consisting of arbitrary pairs $[\mathtt{x},\mathtt{t}]$ which need not be in a chain. The bivaluation form Equation 53 still holds but now represents a general **degree of implication** between arbitrary elements.
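To illustrate a degree of implication between arbitrary elements, the sketch below assumes the ratio form suggested by this section's title, $\Pr(\mathtt{x}\mid\mathtt{t}) = m(\mathtt{x}\wedge\mathtt{t})/m(\mathtt{t})$, with an additive measure `m` built from hypothetical atom weights. The function name `pr` and all numbers are assumptions for illustration.

```python
import math

weights = {"a1": 0.5, "a2": 1.5, "a3": 2.0}   # hypothetical atom valuations

def m(x):
    """Additive measure of an element represented as a set of atoms."""
    return sum(weights[a] for a in x)

def pr(x, t):
    """Assumed ratio form: degree to which context t implies x."""
    return m(x & t) / m(t)

# A chain x <= y <= z: chaining the intervals multiplies the values.
x = frozenset({"a1"})
y = frozenset({"a1", "a2"})
z = frozenset({"a1", "a2", "a3"})
chained = pr(x, y) * pr(y, z)

# A generalized interval: u and t overlap but neither contains the other.
u = frozenset({"a1", "a2"})
t = frozenset({"a2", "a3"})

print(math.isclose(pr(x, z), chained), pr(u, t))
```

Within the chain the ratio form reproduces the product rule, since the intermediate measure cancels. Outside a chain the same formula still returns a value between 0 and 1, determined by the atoms that `u` and `t` share.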

## 8. Information and Entropy

#### 8.1. Information

#### 8.2. Entropy

- S is a continuous function of its arguments.
- If there are n equal choices, so that ${p}_{k}=1/n$, then S is monotonically increasing in n.
- If a choice is broken down into subsidiary choices, then S adds according to probabilistic expectation, meaning $S(p_{1},p_{2},p_{3})=S(p_{1},p_{2}+p_{3})+(p_{2}+p_{3})\,S\!\left(\frac{p_{2}}{p_{2}+p_{3}},\frac{p_{3}}{p_{2}+p_{3}}\right)$.
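The second and third properties can be checked numerically for the familiar form $S=-\sum_k p_k \log p_k$. This is a sketch with hypothetical probabilities; the subsidiary choice uses the renormalized probabilities $p_2/(p_2+p_3)$ and $p_3/(p_2+p_3)$, as in Shannon's statement of the composition property.

```python
import math

def S(*p):
    """Shannon entropy (natural log) of a normalized probability assignment."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)

p1, p2, p3 = 0.5, 0.3, 0.2          # hypothetical probabilities
w = p2 + p3                          # probability of the grouped choice

direct = S(p1, p2, p3)
staged = S(p1, w) + w * S(p2 / w, p3 / w)

# Monotonicity for n equal choices: S(1/n, ..., 1/n) = log(n) grows with n.
uniform = [S(*([1.0 / n] * n)) for n in range(1, 8)]

print(math.isclose(direct, staged),
      all(a < b for a, b in zip(uniform, uniform[1:])))
```

The staged evaluation first chooses between outcome 1 and the group {2, 3}, then resolves the group with weight `w`, which is the expectation structure the third property describes.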

## 9. Conclusions

#### 9.1. Summary

#### 9.2. Commentary

`NOT`), which in applications other than inference may well not be present. Neither have we assumed any additive or multiplicative behavior (as did Kolmogorov [2], de Finetti [12], and Dupré & Tipler [13]). On the contrary, we find that sum and product rules follow from elementary symmetry alone.

**Figure 4.** Cartoon graphic of the symmetries invoked, and where they lead. Ordering is drawn as upward arrows.

## Acknowledgements

## References

1. Cox, R.T. Probability, frequency, and reasonable expectation. Am. J. Phys. **1946**, 14, 1–13.
2. Kolmogorov, A.N. Foundations of the Theory of Probability, 2nd English ed.; Chelsea: New York, NY, USA, 1956.
3. Birkhoff, G. Lattice Theory; American Mathematical Society: Providence, RI, USA, 1967.
4. Davey, B.A.; Priestley, H.A. Introduction to Lattices and Order; Cambridge University Press: Cambridge, UK, 2002.
5. Klain, D.A.; Rota, G.-C. Introduction to Geometric Probability; Cambridge University Press: Cambridge, UK, 1997.
6. Knuth, K.H. Deriving Laws from Ordering Relations. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering; Erickson, G.J., Zhai, Y., Eds.; Jackson Hole, WY, USA, 2003.
7. Halmos, P.R. Measure Theory; Springer: Berlin/Heidelberg, Germany, 1974.
8. Gull, S.F.; Skilling, J. Maximum entropy method in image processing. IEE Proc. 131F, 646–659.
9. Von Mises, R. Probability, Statistics, and Truth; Dover: Mineola, NY, USA, 1981.
10. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Statist. **1951**, 22, 79–86.
11. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423, 623–656.
12. De Finetti, B. Theory of Probability, Vol. I and Vol. II; John Wiley and Sons: New York, NY, USA, 1974.
13. Dupré, M.J.; Tipler, F.J. New axioms for rigorous Bayesian probability. Bayesian Anal. **2009**, 4, 599–606.
14. Aczél, J. Lectures on Functional Equations and Their Applications; Academic Press: New York, NY, USA, 1966.
15. Goyal, P.; Knuth, K.H.; Skilling, J. Origin of complex quantum amplitudes and Feynman’s rules. Phys. Rev. A **2010**, 81, 022109.

## A. Appendix A: Associativity Theorem

`x`, `y`, `z`, ..., or disjoint lattice elements more generally, are to be assigned valuations $x,y,z,\cdots$. If valuations coincide (though other marks may differ), such atoms are said to be of the same type. We allow arbitrarily many atoms of arbitrarily many types. Our proof is constructive, with combinations built as sequences of atoms appended one at a time, $\mathtt{x}\sqcup \mathtt{y}\sqcup \cdots$ having valuation $x\oplus y\oplus \cdots$. The consequent stand-alone derivation is rather long, but avoids making what would in our finite environment be an unnatural assumption of continuity. We also avoid assuming that an inverse to combination exists.
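To give a feel for what it means for combination ⊕ to take addition as its eventual form, the sketch below uses a hypothetical combination rule, `x ⊕ y = x + y + x*y`, which is associative, commutative, and order-preserving for positive values. The regrade `log(1 + x)` is an assumption chosen for this particular rule; it is an illustration only and not the constructive argument of this appendix.

```python
import math

def combine(x, y):
    """Hypothetical combination rule, associative for all real x, y."""
    return x + y + x * y

def regrade(x):
    """Order-preserving regrade under which `combine` becomes addition."""
    return math.log1p(x)

x, y, z = 0.3, 1.2, 2.5   # hypothetical positive valuations

# Associativity: the grouping of the sequence does not matter.
left = combine(combine(x, y), z)
right = combine(x, combine(y, z))

# After regrading, combination is ordinary addition.
regraded = regrade(combine(x, y))
added = regrade(x) + regrade(y)

print(math.isclose(left, right), math.isclose(regraded, added))
```

The regrade works here because `1 + combine(x, y)` equals `(1 + x) * (1 + y)`, so taking logarithms converts the rule into a sum while preserving the order of the values.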

**Theorem:**

#### A.1. **Proof:**

#### A.2. One Type of Atom

#### Illustration

#### A.3. Induction to More Than One Type of Atom

**Figure 6.** A new value, displaced away from the existing grid, must lie within some interval. Any assignment outside the strict interior would be wrongly ordered, while any value inside could be reverted to some other selection by order-preserving regrade.

#### A.3.1 Repetition Lemma

#### A.3.2 Separation

**Figure 7.** The interval encompassing the new value lies above set $\mathcal{A}$ and below set $\mathcal{C}$.

#### A.3.3 Assignment When $\mathcal{B}$ Has Members

#### Illustration

#### A.3.4 Assignment When $\mathcal{B}$ Has no Members

#### Illustration

**Figure 9.** Multiples of a new atom can always be assigned linear values $\delta ,2\delta ,3\delta ,\cdots$. An individual multiple can be assigned anywhere within the corresponding interval, but the linear assignment can always be chosen.

#### Accuracy

#### A.3.5 End of Inductive Proof

#### A.4. Axioms are Minimal

## B. Appendix B: Product Theorem

**Theorem:**

**product Equation**

#### B.1. **Proof:**

## C. Appendix C: Variational Theorem

**Theorem:**

**variational equation**

#### C.1. **Proof:**

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Knuth, K.H.; Skilling, J.
Foundations of Inference. *Axioms* **2012**, *1*, 38-73.
https://doi.org/10.3390/axioms1010038
