
*Axioms* **2012**, *1*(1), 38-73; doi:10.3390/axioms1010038


## Abstract

We present a simple and clear foundation for finite inference that unites and significantly extends the approaches of Kolmogorov and Cox. Our approach is based on quantifying lattices of logical statements in a way that satisfies general lattice symmetries. With other applications such as measure theory in mind, our derivations assume minimal symmetries, relying on neither negation nor continuity nor differentiability. Each relevant symmetry corresponds to an axiom of quantification, and these axioms are used to derive a unique set of quantifying rules that form the familiar probability calculus. We also derive a unique quantification of divergence, entropy and information.

**PACS** 02.50.Cw

**MSC** 06A05

## 1. Introduction

The quality of an axiom rests on it being both convincing for the application(s) in mind, and compelling in that its denial would be intolerable.

We present elementary symmetries as convincing and compelling axioms, initially for measure, subsequently for probability, and finally for information and entropy. Our aim is to provide a simple and widely comprehensible foundation for the standard quantification of inference. We make minimal assumptions—not just for aesthetic economy of hypotheses, but because simpler foundations have wider scope.

It is a remarkable fact that algebraic symmetries can imply a unique calculus of quantification. Section 2 gives the background and outlines the procedure and major results. Section 3 lists the symmetries that are actually needed to derive the results, and the following Section 4 writes each required symmetry as an axiom of quantification. In Section 5, we derive the sum rule for valuation from the associative symmetry of ordered combination. This sum rule is the basis of measure theory. It is usually taken as axiomatic, but in fact it is derived from compelling symmetry, which explains its wide utility. There is also a direct-product rule for independent measures, again derived from associativity. Section 6 derives from the direct-product rule a unique quantitative divergence from source measure to destination.

In Section 7 we derive the chain product rule for probability from the associativity of chained order (in inference, implication). Probability calculus is then complete. Finally, Section 8 derives the Shannon entropy and information (a.k.a. Kullback–Leibler) as special cases of divergence of measures. All these formulas are uniquely defined by elementary symmetries alone.

Our approach is constructivist, and we avoid unnecessary formality that might unduly confine our readership. Sets and quantities are deliberately finite since it is methodologically proper to axiomatize finite systems before any optional passage towards infinity. R.T. Cox [1] showed the way by deriving the unique laws of probability from logical systems having a mere three elementary “atomic” propositions. By extension, those same laws applied to Boolean systems with arbitrarily many atoms and ultimately, where appropriate, to well-defined infinite limits. However, Cox needed to assume continuity and differentiability to define the calculus to infinite precision. Instead, we use arbitrarily many atoms to define the calculus to arbitrarily fine precision. Avoiding infinity in this way yields results that cover all practical applications, while avoiding unobservable subtleties.

Our approach unites and significantly extends the set-based approach of Kolmogorov [2] and the logic-based approach of Cox [1], to form a foundation for inference that yields not just probability calculus, but also the unique quantification of divergence and information.

## 2. Setting the Scene

We model the world (or some interesting aspect of it) as being in a particular **state** out of a finite set of mutually exclusive states (as in Figure 1, left). Since we and our tools are finite, a finite set of states, albeit possibly very large in number, suffices for all practical modeling.

As applied to inference, each state of the world is associated, via isomorphism, with a statement about the world. This results in a set of mutually exclusive statements, which we call **atoms**. Atoms are combined through logical `OR` to form compound statements comprising the **elements** of a **Boolean lattice** (Figure 1, right), which is isomorphic to a Boolean lattice of sets (Figure 1, center). Although carrying different interpretations, the mathematical structures are identical. Set inclusion “⊂” is equivalent to logical implication “⇒”, which we abstract to lattice order “<”. It is a matter of choice whether to include the null set ∅, equivalent to the logical absurdity ⊥. The set-based view is ontological in character and associated with Kolmogorov, while the logic-based view is epistemological in character and associated with Cox.

**Figure 1.** The Boolean lattice of potential states (**center**) is constructed by taking the ${2}^{N}$ powerset of an antichain of N mutually exclusive atoms (in this case ${a}_{1},{a}_{2},{a}_{3}$, **left**). This lattice is isomorphic to the Boolean lattice of logical statements ordered by logical implication (**right**).

Quantification proceeds by assigning a real number $m\left(\mathtt{x}\right)=x$, called a **valuation**, to elements $\mathtt{x}$. (`Typewriter` font denotes lattice elements $\mathtt{x}$, whereas their associated valuations (real numbers) x are shown in italic.) We require valuations to be faithful to the lattice, in the sense that

$$\mathtt{x}<\mathtt{y}\quad\Longrightarrow\quad x<y$$

Combination of two atoms (or disjoint compounds) into their compound is written as the operator ⊔, for example $\mathtt{z}=\mathtt{x}\bigsqcup \mathtt{y}$. Our first step is to quantify the combination of disjoint elements through an operator ⊕ that combines values (Table 1 below lists such operators and their eventual identifications).

We find that the symmetries underlying ⊔ place constraints on ⊕ that effectively require it to be addition +. At this stage, we already have the foundation of **measure theory**, and the generalization of combination (of disjoint elements) to the lattice join (of arbitrary elements) is straightforward. The wide applicability of these underlying symmetries explains the wide utility of measure theory, which might otherwise be mysterious.
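As a concrete sketch of the sum rule at work (in Python, with hypothetical atom values chosen purely for illustration), valuing the atoms fixes every compound valuation, and fidelity then holds automatically:

```python
from itertools import combinations

# Hypothetical atom valuations (any positive reals will do).
atom_measure = {"a1": 2.0, "a2": 0.5, "a3": 1.25}

def m(element):
    """Measure of a lattice element, represented as a frozenset of atoms.
    The sum rule fixes compound values from the atom values alone."""
    return sum(atom_measure[a] for a in element)

atoms = sorted(atom_measure)
# Build the 2^N powerset lattice (bottom element included, with m = 0).
lattice = [frozenset(c) for r in range(len(atoms) + 1)
           for c in combinations(atoms, r)]

# Fidelity: strict inclusion x < y implies m(x) < m(y).
assert all(m(x) < m(y) for x in lattice for y in lattice if x < y)

# Sum rule for disjoint elements: m(x ⊔ y) = m(x) + m(y).
x, y = frozenset({"a1"}), frozenset({"a2", "a3"})
assert m(x | y) == m(x) + m(y)
```

The atoms' values remain free parameters of the application; only the values of compounds are forced by the calculus.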

Operation | Symbol | Quantification | (Eventual form) |
---|---|---|---|
ordering | < | < | |
combination | ⊔ | ⊕ | (addition) |
direct product | × | ⊗ | (multiplication) |
chaining | , | ⊙ | (multiplication) |

We can consider the atoms ${\mathtt{a}}_{1},{\mathtt{a}}_{2},{\mathtt{a}}_{3},\cdots ,{\mathtt{a}}_{N}$ and ${\mathtt{b}}_{1},{\mathtt{b}}_{2},\cdots ,{\mathtt{b}}_{M}$ from separate problems as $NM$ composite atoms ${\mathtt{c}}_{ij}={\mathtt{a}}_{i}\times {\mathtt{b}}_{j}$ in an equivalent composite problem. The **direct-product** operator ⊗ quantifies the composition of values:

$$m({\mathtt{c}}_{ij})=m({\mathtt{a}}_{i})\otimes m({\mathtt{b}}_{j})$$
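A small sketch (again with hypothetical values) checks that valuing composite atoms by ordinary multiplication, the eventual form of ⊗, is consistent with the additive sum rule on the composite lattice:

```python
# Hypothetical measures on two independent problems.
a = {"a1": 1.0, "a2": 3.0}             # atoms of the first problem
b = {"b1": 0.5, "b2": 2.0, "b3": 4.0}  # atoms of the second

# Composite atoms c_ij = a_i × b_j, valued by the direct-product rule
# m(c_ij) = m(a_i) * m(b_j) (the eventual form of ⊗).
c = {(i, j): a[i] * b[j] for i in a for j in b}

def m_product(x_atoms, y_atoms):
    """Sum-rule value of a product element x × y on the composite lattice."""
    return sum(c[(i, j)] for i in x_atoms for j in y_atoms)

# Consistency: summing composite atoms factorizes into m(x) * m(y).
x, y = {"a1", "a2"}, {"b2", "b3"}
assert m_product(x, y) == sum(a[i] for i in x) * sum(b[j] for j in y)
```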

It is common in science to acquire numerical assignments by optimizing a variational potential. By requiring consistency with the numerical assignments of ordinary multiplication, we find that there is a unique variational potential $H(\mathbf{p}\mid \mathbf{q})$, of “$plogp$” form, known as the (generalized Kullback–Leibler) Bregman **divergence** of measure $\mathbf{p}$ from measure $\mathbf{q}$.

Inference involves the relationship of one logical statement (predicate $\mathtt{x}$) to another (context $\mathtt{t}$), initially in a situation where $\mathtt{x}\Rightarrow \mathtt{t}$ so that the context includes subsidiary predicates. To quantify inference, we assign real numbers $p(\mathtt{x}\mid \mathtt{t})$, ultimately recognised as **probability**, to predicate–context **intervals** $[\mathtt{x},\mathtt{t}]$. Such intervals can be **chained** (concatenated) so that $[\mathtt{x},\mathtt{z}]=\left[\right[\mathtt{x},\mathtt{y}],[\mathtt{y},\mathtt{z}\left]\right]$, with ⊙ representing the chaining of values.

Associativity of chaining forces ⊙ to be multiplication, yielding the **product rule** of probability calculus. When applied to probabilities, the divergence formula reduces to the **information**, also known as the Kullback–Leibler formula, with **entropy** being a variant.

#### 2.1. The Order-Theoretic Perspective

The approach we employ can be described in terms of order-preserving (monotonic) maps between order-theoretic structures. Here we present our approach, described above, from this different perspective.

Order-theoretically, a finite set of exclusive states can be represented as an **antichain**, illustrated in Figure 1(left) as three states ${a}_{1}$, ${a}_{2}$, and ${a}_{3}$ situated side-by-side. Our state of knowledge about the world (more precisely, of our model of it—we make no ontological claim) is often incomplete so that we can at best say that the world is in one of a set of potential **states**, which is a subset of the set of all possible states. In the case of total ignorance, the set of potential states includes all possible states. In contrast, perfect knowledge about our model is represented by singleton sets consisting of a single state. We refer to the singleton sets as **atoms**, and note that they are exclusive in the sense that no two can be true.

The space of all possible sets of potential states is given by the partially-ordered set obtained from the powerset of the set of states ordered by set inclusion. For an antichain of mutually exclusive states, the powerset is a **Boolean lattice** (Figure 1, center), with the bottom element optional. By conceiving of a **statement** about our model of the world in terms of a set of potential states, we have an order-isomorphism from the Boolean lattice of potential states ordered by set inclusion to the Boolean lattice of statements ordered by logical implication (Figure 1, right). This isomorphism maps each set of potential states to a statement, while mapping the algebraic operations of set union ∪ and set intersection ∩ to the logical `OR` and `AND`, respectively.

The perspective provided by order theory enables us to focus abstractly on the structure of a Boolean lattice with its generic algebraic operations **join** ∨ and **meet** ∧. This immediately broadens the scope from Boolean to more general **distributive lattices** — the first fruit of our minimalist approach. For additional details on partially ordered sets and lattices in particular, we refer the interested reader to the classic text by Birkhoff [3] or the more recent text by Davey & Priestley [4].

Quantification proceeds by assigning valuations $m\left(\mathtt{x}\right)=x$ to elements $\mathtt{x}$, to form a real-valued representation. For this to be faithful, we require an order-preserving (monotonic) map between the partial order of a distributive lattice and the total order of the chains that are to be found within. Thus $\mathtt{x}<\mathtt{y}$ is to imply that $x<y$, a relationship that we call **fidelity**. The converse is not true: the total order imposed by quantification must be consistent with but can extend the partial order of the lattice structure.

We write the combination of two atoms into a compound element (and more generally any two disjoint compounds into a compound element) as ⊔, for example $\mathtt{z}=\mathtt{x}\bigsqcup \mathtt{y}$. Derivation of the calculus of quantification starts with this disjoint combination operator, where we find that its symmetries place constraints on its representation ⊕ that allow us to adopt the convention of ordinary addition “$\oplus =+$”. This basic result generalizes to the standard **join** lattice operator ∨ for elements that (possibly having atoms in common) need not be disjoint, for which the sum rule generalizes to its standard inclusion/exclusion form [5], which involves the meet ∧ for any atoms in common.

There are two mathematical conventions concerning the handling of the nothing-is-true null element ⊥ at the bottom of the lattice, known as the absurdity. Some mathematicians opt to include the bottom element on aesthetic grounds, whereas others opt to exclude it because of its paradoxical interpretation [4]. If it is included, its quantification is zero. Either way, fidelity ensures that the other elements are quantified by positive values (or, by elementary generalization, by zero). At this stage, we already have the foundation of **measure theory**.

**Logical deduction** is traditionally based on a Boolean lattice and proceeds “upwards” along a chain (as in the arrows sketched in Figure 1). Given some statement $\mathtt{x}$, one can deduce that $\mathtt{x}$ implies $\mathtt{x}\,\mathtt{OR}\,\mathtt{y}$ since $\mathtt{x}\,\mathtt{OR}\,\mathtt{y}$ includes $\mathtt{x}$. Similarly, $\mathtt{x}\,\mathtt{AND}\,\mathtt{y}$ implies $\mathtt{x}$ since $\mathtt{x}$ includes $\mathtt{x}\,\mathtt{AND}\,\mathtt{y}$. The ordering relationships among the elements of the lattice are encoded by the zeta function of the lattice [6]

$$\zeta (\mathtt{x},\mathtt{y})=\begin{cases}1 & \text{if }\mathtt{x}\le \mathtt{y}\\ 0 & \text{otherwise}\end{cases}$$

**Inference**, or **logical induction**, is the inverse of deduction and proceeds “downwards” along a chain, losing logical certainty as knowledge fragments. Our aim is to quantify this loss of certainty, in the expectation of deriving probability calculus. This requires generalization of the binary zeta function $\zeta (\mathtt{x},\mathtt{y})$ to some real-valued function $p(x\mid y)$ which will turn out to be the standard probability of x `GIVEN` y. However, a firm foundation for inference must be devoid of a choice of arbitrary generalizations. By viewing quantification in terms of an order-preserving map between the partial order (Boolean lattice) and a total order (chain) subject to compelling symmetries alone, we obtain a firm foundation for inference, devoid of further assumptions of questionable merit.
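In code, the zeta function is just an indicator of lattice order, i.e., of subset inclusion in the powerset representation. A minimal sketch with elements modeled as frozensets:

```python
def zeta(x, y):
    """Zeta function of the lattice: 1 if x ≤ y (x implies y), else 0."""
    return 1 if x <= y else 0

a1 = frozenset({"a1"})
top = frozenset({"a1", "a2", "a3"})

assert zeta(a1, top) == 1   # deduction: a1 implies a1 OR a2 OR a3
assert zeta(top, a1) == 0   # the converse does not follow
```

Inference replaces this binary indicator by a graded real value $p(\mathtt{x}\mid \mathtt{y})$, which the symmetries below force to be probability.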

By considering atoms (singleton sets, which are the join-irreducible elements of the Boolean lattice) as precise statements about exclusive states, and composite lattice elements (sets of several exclusive states) as less precise statements involving a degree of ignorance, the two perspectives of logic and sets, on which the Cox and Kolmogorov foundations are based, become united within the order-theoretic framework.

In summary, the powerset comprises the **hypothesis space** of all possible statements that one can make about a particular model of the world. Quantification of join using + is the **sum rule** of probability calculus, and is required by adherence to the symmetries we list. It fixes the valuations assigned to composite elements in terms of valuations assigned to the atoms. Those latter valuations assigned to the atoms remain free, unconstrained by the calculus. That freedom allows the calculus to apply to inference in general, with the mathematically-arbitrary atom valuations being guided by insight into a particular application.

#### 2.2. Commentary

Our results—the sum rule and divergence for measures, and the sum and product rules with information for probabilities—are standard and well known (their uniqueness perhaps less so). The matter we address here is which assumptions are necessary and which are not. A Boolean lattice, after all, is a special structure with special properties. Insofar as fewer properties are needed, we gain generality. Wider applicability may be of little value to those who focus solely on inference. Yet, by showing that the basic foundations of inference have wider scope, we can thereby offer extra—and simpler—guidance to the scientific community at large.

Even within inference, distributive problems may have relationships between their atoms such that not all combinations of states are allowed. Rather than extend a distributive lattice to Boolean by padding it with zeros, the tighter framework immediately empowers us to work with the original problem in its own right. Scientific problems (say, the propagation of particles, or the generation of proteins) are often heavily conditional, and it could well be inappropriate or confusing to go to a full Boolean lattice when a sparser structure is a more natural model.

We also confirm that commutativity is not a necessary assumption. Rather, commutativity of measure is imposed by the associativity and order required of a scalar representation. Conversely, systems that are not commutative (matrices under multiplication, for example) cannot be both associative and ordered.

## 3. Symmetries

Here, we list the relevant symmetries on which our axioms are based. All are properties of distributive lattices, and our descriptions are styled that way so that a reader wary of further generality does not need to move beyond this particular, and important, example. However, one may note that not all the properties of a distributive lattice (such as commutativity of the join) are listed, which implies that these results are applicable to a broader class of algebraic structures that includes distributive lattices.

Valuation assignments rank statements via an order-preserving map which we call **fidelity**:

$$\mathtt{x}<\mathtt{y}\quad\Longrightarrow\quad m(\mathtt{x})<m(\mathtt{y})$$

In the specific case of Boolean lattices of logical statements, the binary ordering relation, represented generically by <, is equivalent to logical implication (⇒) between different statements, or equivalently, proper subset inclusion (⊂) in the powerset representation. Combination preserves order from the right and from the left:

$$\mathtt{x}<\mathtt{y}\quad\Longrightarrow\quad \mathtt{x}\bigsqcup \mathtt{z}<\mathtt{y}\bigsqcup \mathtt{z}\quad\text{and}\quad \mathtt{z}\bigsqcup \mathtt{x}<\mathtt{z}\bigsqcup \mathtt{y}$$

Independent systems can be considered together (Figure 2).

**Figure 2.** One system might, for example, be playing-card suits $\mathtt{x}\in \{\spadesuit,\heartsuit,\clubsuit,\diamondsuit\}$, while another independent system might be music keys $\mathtt{t}\in \{\flat,\natural,\sharp\}$. The direct product combines the spaces of $\mathtt{x}$ and $\mathtt{t}$ to form the joint space of $\mathtt{x}\times \mathtt{t}$ with atoms like $\heartsuit\times\natural$.

The direct-product operator × is taken to be (right-)distributive over ⊔:

$$(\mathtt{x}\bigsqcup \mathtt{y})\times \mathtt{z}=(\mathtt{x}\times \mathtt{z})\bigsqcup (\mathtt{y}\times \mathtt{z})$$

Finally, we consider a totally ordered set of logical statements that form a chain $\mathtt{x}<\mathtt{y}<\mathtt{z}<\mathtt{t}$. We focus on an interval on the chain, which is defined by an ordered pair of logical statements $[\mathtt{x},\mathtt{t}]$. Adjacent intervals can be chained, as in $\left[[\mathtt{x},\mathtt{y}],[\mathtt{y},\mathtt{z}]\right]=[\mathtt{x},\mathtt{z}]$, and chaining is associative:

$$\Big[\big[[\mathtt{x},\mathtt{y}],[\mathtt{y},\mathtt{z}]\big],[\mathtt{z},\mathtt{t}]\Big]=\Big[[\mathtt{x},\mathtt{y}],\big[[\mathtt{y},\mathtt{z}],[\mathtt{z},\mathtt{t}]\big]\Big]$$

These and these alone are the symmetries we need for the axioms of quantification. They are presented as a cartoon in the “Conclusions” section below.

## 4. Axioms

We now introduce a layer of quantification. Our axioms arise from the requirement that any quantification must be consistent with the symmetries indicated above. Therefore, each symmetry gives rise to an axiom. We seek scalar valuations to be assigned to elements of a lattice, while conforming to the above symmetries (#0–#5) for disjoint elements.

Fidelity (symmetry #0) requires us to choose an increasing measure so that, without loss of generality, we may set $m\left(\perp \right)=0$ and thereafter

$$\mathtt{x}<\mathtt{y}\quad\Longrightarrow\quad m(\mathtt{x})<m(\mathtt{y})$$

To conform to the distributive symmetry #3, we require ⊗ as set up in Equation 3 to obey

$$(x\oplus y)\otimes z=(x\otimes z)\oplus (y\otimes z)$$

To conform to the associative symmetry #5, we require ⊙ as set up in Equation 4 to obey

$$(x\odot y)\odot z=x\odot (y\odot z)$$

## 5. Measure

Preliminary to investigating probability, we attend to the foundation of measure.

#### 5.1. Disjoint Arguments

According to the scalar associativity theorem (Appendix A), an operator ⊕ obeying axioms 1 and 2 exists and can without loss of generality be taken to be addition +, giving the sum rule

$$m(\mathtt{x}\bigsqcup \mathtt{y})=m(\mathtt{x})+m(\mathtt{y})$$

Commutativity $x\oplus y=y\oplus x$, though not explicitly assumed, is an unsurprising property. In accordance with fidelity (axiom 0), element values are strictly positive $x>0$. In this form, positive-valued valuation $m\left(\mathtt{x}\right)=x$ of lattice elements is known as a measure. If the null element is included as the bottom of the lattice, it has zero value.

Whilst we are free to adopt additivity as a convenient convention, we are also free to adopt any order-preserving regrade Θ, for which the rule would be

$$\Theta\big(m(\mathtt{x}\bigsqcup \mathtt{y})\big)=\Theta\big(m(\mathtt{x})\big)+\Theta\big(m(\mathtt{y})\big)$$

Measure theory (see for example [7]) is usually introduced with additivity (countably additive or σ-additive) and non-negativity as “obvious” basic assumptions, with emphasis on the technical control of infinity in unbounded applications. Here we emphasize the foundation, and discover the reason why measure theory is constructed as it is. The symmetries of combination require it. Any other formulation would break these basic properties of associativity and order, and would not yield a widely useful theory.

#### 5.2. Arbitrary Arguments

For elements $\mathtt{x}$ and $\mathtt{y}$ that need not be disjoint, their join ∨ is defined as comprising all their constituent atoms counted once only, and the meet ∧ as comprising those atoms they have in common. In inference, ∨ is logical `OR` and ∧ is logical `AND`.

By putting $\mathtt{x}=\mathtt{u}\bigsqcup \mathtt{v}$ and $\mathtt{y}=\mathtt{v}\bigsqcup \mathtt{w}$ for disjoint $\mathtt{u},\mathtt{v},\mathtt{w}$, we reach the general “inclusion/exclusion” sum rule for arbitrary $\mathtt{x}$ and $\mathtt{y}$

$$m(\mathtt{x}\vee \mathtt{y})+m(\mathtt{x}\wedge \mathtt{y})=m(\mathtt{x})+m(\mathtt{y})$$
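A quick numerical check of this inclusion/exclusion rule, with elements modeled as Python sets (join as union, meet as intersection) and hypothetical atom measures:

```python
# Elements as sets of atoms; hypothetical positive atom measures.
mu = {"u": 1.5, "v": 2.0, "w": 0.25}

def m(e):
    """Additive measure of an element, a set of disjoint atoms."""
    return sum(mu[a] for a in e)

x = {"u", "v"}   # x = u ⊔ v
y = {"v", "w"}   # y = v ⊔ w

# Inclusion/exclusion: m(x ∨ y) + m(x ∧ y) = m(x) + m(y),
# where join is set union and meet is set intersection.
assert m(x | y) + m(x & y) == m(x) + m(y)
```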

#### 5.3. Independence

From the associativity of direct product (axiom 4), the associativity theorem (Appendix A again) assures the existence of an additivity relationship of the form

$$\Theta (x\otimes y)=\Theta \left(x\right)+\Theta \left(y\right)$$

The product theorem (Appendix B) shows Θ to be logarithmic, with Equation 23 reading

$$x\otimes y=x\,y$$

## 6. Variation

Variational principles are common in science—minimum energy for equilibrium, Hamilton’s principle for dynamics, maximum entropy for thermodynamics, and so on—and we seek one for measures. The aim is to discover a variational potential $H\left(\mathbf{m}\right)$ whose constrained minimum allows the valuations $\mathbf{m}=({m}_{1},{m}_{2},\cdots ,{m}_{N})$ of N atoms to be assigned subject to appropriate constraints of the form $f\left(\mathbf{m}\right)=\mathrm{constant}$. (The vectors which appear in this section are shown in **bold-face** font.)

The variational potential is required to be general, applying to arbitrary constraints. Just like values themselves, constraints on individual atom values can be combined into compound constraints that influence several values: indeed the constraints could simply be imposition of definitive values. Such combination allows a Boolean lattice, entirely analogous to Figure 1, to be developed from individual atomic constraints. The variational potential H is to be a valuation on the measures resulting from these constraints, combination being represented by some operator $\scriptstyle\bigcirc$ so that

Adding extra constraints always increases H, otherwise the variational requirement would be broken, so H must be faithful to chaining in the lattice.

Under perturbation, the minimization requirement is

One now invents supposedly constant “Lagrange multiplier” coefficients ${\lambda}_{1},{\lambda}_{2},\cdots $ and considers what appears at first to be the different problem of solving

Let the application be two-dimensional, x-by-y, in the sense of applying to values $m(\mathtt{x}\times \mathtt{y})$ of elements on a direct-product lattice. Suppose we have x-dependent constraints that yield $m\left(\mathtt{x}\right)={m}_{x}$ on one factor (say the card suits in Figure 2 above), and similar y-dependent constraints that yield $m\left(\mathtt{y}\right)={m}_{y}$ on the other factor (say music keys in Figure 2). Both factors being thus controlled, their direct-product is implicitly controlled by those same constraints. Here, we already know the target value $m(\mathtt{x}\times \mathtt{y})={m}_{x}{m}_{y}$ from the direct-product rule Equation 28. Hence the variational assignment for the particular value $m(\mathtt{x}\times \mathtt{y})$ derives from

The coefficient ${C}_{i}$ represents the intrinsic importance of atom ${\mathtt{a}}_{i}$ in the summation, but usually the atoms are a priori equivalent so that the C’s take a common value. The scaling of a variational potential is arbitrary (and is absorbed in the Lagrange multipliers), so we may set $C=1$, ensuring that H has a minimum rather than a maximum. Alternatively, $C=-1$ would ensure a maximum. However, the settings of A and B depend on the application.

#### 6.1. Divergence and Distance

One use of H is as a quantifier of the divergence of destination values $\mathbf{w}$ from source values $\mathbf{u}$ that existed before the constraints that led to $\mathbf{w}$ were applied. For this, we set $C=1$ to get a minimum, ${B}_{i}=-log{u}_{i}$ to place the unconstrained minimizing $\mathbf{w}$ at $\mathbf{u}$, and ${A}_{i}={u}_{i}$ to make the minimum value zero. This form is

$$H(\mathbf{w}\mid \mathbf{u})=\sum _{i}\left({w}_{i}\,\mathrm{log}\frac{{w}_{i}}{{u}_{i}}-{w}_{i}+{u}_{i}\right)$$

In general, H obeys neither commutativity nor the triangle inequality, $H(\mathbf{w}\mid \mathbf{u})\ne H(\mathbf{u}\mid \mathbf{w})$ and $H(\mathbf{w}\mid \mathbf{u})\nleqq H(\mathbf{w}\mid \mathbf{v})+H(\mathbf{v}\mid \mathbf{u})$. Hence it cannot be a geometrical “distance”, which is required to have both those properties. In fact, there is no definition of geometrical measure-to-measure distance that obeys the basic symmetries, because H is the only candidate, and it fails.

Here again we see our methodology yielding clear insight. “From–to” can be usefully quantified, but “between” cannot. A space of measures may have connectedness, continuity, even differentiability, but it cannot become a metric space and remain consistent with its foundation.
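The failure of commutativity and of the triangle inequality is easy to exhibit numerically. The sketch below assumes the generalized Kullback–Leibler form $H(\mathbf{w}\mid\mathbf{u})=\sum_i (w_i \log(w_i/u_i) - w_i + u_i)$ discussed above, with arbitrary example measures:

```python
import math

def divergence(w, u):
    """Generalized Kullback–Leibler divergence of measure w from measure u:
    H(w|u) = sum_i (w_i log(w_i/u_i) - w_i + u_i). Zero iff w == u."""
    return sum(wi * math.log(wi / ui) - wi + ui for wi, ui in zip(w, u))

u = [1.0, 2.0, 3.0]
v = [2.0, 2.0, 2.0]
w = [3.0, 2.0, 1.0]

assert divergence(u, u) == 0.0                       # zero at coincidence
assert abs(divergence(v, u) - divergence(u, v)) > 1e-9   # not commutative
# The triangle inequality fails here, so H is not a geometric distance:
assert divergence(w, u) > divergence(w, v) + divergence(v, u)
```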

In the limit of many small values, H admits a continuum limit

$$H(\mathbf{w}\mid \mathbf{u})=\int \left(w\left(x\right)\,\mathrm{log}\frac{w\left(x\right)}{u\left(x\right)}-w\left(x\right)+u\left(x\right)\right)\,dx$$

## 7. Probability Calculus

In inference, we seek to impose on the hypothesis space a quantified degree of implication $p(\mathtt{x}\mid \mathtt{t})$, to represent the plausibility of predicate $\mathtt{x}$ conditional on current knowledge that excludes all hypotheses outside the stated context $\mathtt{t}$. This is accomplished via a bivaluation, which is a functional that takes a pair of lattice elements to a real number. This bivaluation should depend on both $\mathtt{x}$ (obviously) and $\mathtt{t}$ (otherwise it would be just the measure assigned to $\mathtt{x}$). The natural conjecture is that probability should be identified with a normalized measure, and we proceed to prove this—measures can have arbitrary total but probabilities will (according to standard convention) sum to unity.

At the outset, though, we simply wish to set up a bivaluation for predicate $\mathtt{x}$ within context $\mathtt{t}$.

#### 7.1. Chained Arguments

Within given context $\mathtt{t}$, we require $p(\mathtt{x}\mid \mathtt{t})$ to have the order and associative symmetries #1 and #2 that define a measure. Consequently, p obeys the sum rule

$$p(\mathtt{x}\bigsqcup \mathtt{y}\mid \mathtt{t})=p(\mathtt{x}\mid \mathtt{t})+p(\mathtt{y}\mid \mathtt{t})$$

Associativity of chaining (axiom 5) for $\mathtt{a}<\mathtt{b}<\mathtt{c}<\mathtt{d}$ is represented by

$$\big(p(\mathtt{a}\mid \mathtt{b})\odot p(\mathtt{b}\mid \mathtt{c})\big)\odot p(\mathtt{c}\mid \mathtt{d})=p(\mathtt{a}\mid \mathtt{b})\odot \big(p(\mathtt{b}\mid \mathtt{c})\odot p(\mathtt{c}\mid \mathtt{d})\big)$$

The solution (Appendix B again) shows Θ to be logarithm, so that ⊙ was multiplication and

$$p(\mathtt{x}\mid \mathtt{z})=p(\mathtt{x}\mid \mathtt{y})\,p(\mathtt{y}\mid \mathtt{z})\qquad (\mathtt{x}\le \mathtt{y}\le \mathtt{z})$$

#### 7.2. Arbitrary Arguments

The chain-product rule, which as written above is valid for any chain, can be generalized to accommodate arbitrary elements. This is accomplished by noting that $\mathtt{x}\wedge \mathtt{y}=\mathtt{x}$ in a chain where $\mathtt{x}<\mathtt{y}$, so that $p(\mathtt{x}\wedge \mathtt{y}\mid \mathtt{y})=p(\mathtt{x}\mid \mathtt{y})$. The general form

$$p(\mathtt{x}\wedge \mathtt{y}\mid \mathtt{z})=p(\mathtt{x}\mid \mathtt{y}\wedge \mathtt{z})\,p(\mathtt{y}\mid \mathtt{z})$$

then holds for arbitrary elements.

The special case $p(\mathtt{t}\mid \mathtt{t})=1$ is obtained by setting $\mathtt{y}=\mathtt{z}=\mathtt{t}$ in the chain-product rule. For any $\mathtt{x}\le \mathtt{t}$, ordering requires $p(\mathtt{x}\mid \mathtt{t})\le p(\mathtt{t}\mid \mathtt{t})=1$, so that the range of values is $0\le p\le 1$ and we recognize p as **probability**, hereafter denoted Pr.

Probability calculus is now proved:

From the commutativity $Pr(\mathtt{x}\wedge \mathtt{y}\mid \mathtt{t})=Pr(\mathtt{y}\wedge \mathtt{x}\mid \mathtt{t})$ associated with ∧, we obtain Bayes’ Theorem

$$Pr(\mathtt{y}\mid \mathtt{x}\wedge \mathtt{t})\,Pr(\mathtt{x}\mid \mathtt{t})=Pr(\mathtt{x}\mid \mathtt{y}\wedge \mathtt{t})\,Pr(\mathtt{y}\mid \mathtt{t})$$

#### 7.3. Probability as a Ratio

The equations of probability calculus (range, sum rule, and chain-product rule) can all be subsumed in the single expression

$$Pr(\mathtt{x}\mid \mathtt{t})=\frac{m(\mathtt{x}\wedge \mathtt{t})}{m(\mathtt{t})}$$

This is, essentially, the original discredited frequentist definition (see [9]) of probability, as the ratio of number of successes to number of trials. However, it is here retrieved at an abstract level, which bypasses the catastrophic difficulties of literal frequentism when faced with isolated non-reproducible situations. Just as ordinary addition is forced for measures in $[0,\infty )$, so ordinary proportions in $[0,1]$ are forced for probability calculus.

Whereas the sum rule for measure and probability generalizes to the inclusion/exclusion form for general elements which need not be disjoint, so does the ratio form of probability allow generalization from intervals [3] to **generalized intervals**, consisting of arbitrary pairs $[\mathtt{x},\mathtt{t}]$ which need not be in a chain. The bivaluation form Equation 53 still holds but now represents a general **degree of implication** between arbitrary elements.
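These rules can be exercised numerically. The sketch below (with hypothetical atom measures) defines probability as the ratio of measures and checks normalization, the inclusion/exclusion sum rule, the chain-product rule, and Bayes’ Theorem:

```python
# Hypothetical atom measures on a hypothesis space of three atoms.
mu = {"a1": 1.0, "a2": 2.0, "a3": 5.0}

def m(e):
    """Additive measure of an element, a frozenset of atoms."""
    return sum(mu[a] for a in e)

def pr(x, t):
    """Probability as a ratio of measures: Pr(x|t) = m(x ∧ t) / m(t)."""
    return m(x & t) / m(t)

t = frozenset(mu)            # context: the whole space
x = frozenset({"a1", "a2"})
y = frozenset({"a2", "a3"})
z = frozenset({"a2"})

# Range and normalization.
assert pr(t, t) == 1.0 and 0.0 <= pr(x, t) <= 1.0
# Sum rule (inclusion/exclusion form).
assert pr(x | y, t) + pr(x & y, t) == pr(x, t) + pr(y, t)
# Chain-product rule on the chain z < y < t.
assert abs(pr(z, t) - pr(z, y) * pr(y, t)) < 1e-12
# Bayes' theorem: Pr(x|y∧t) Pr(y|t) = Pr(y|x∧t) Pr(x|t).
assert abs(pr(x, y & t) * pr(y, t) - pr(y, x & t) * pr(x, t)) < 1e-12
```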

## 8. Information and Entropy

Here, we take special cases of the variational potential H, appropriate for probability distributions instead of arbitrary measures.

#### 8.1. Information

Within a given context, probability is a measure, normalized to unit mass. The divergence H of destination probability $\mathbf{p}$ from source probability $\mathbf{q}$ then simplifies to

$$H(\mathbf{p}\mid \mathbf{q})=\sum _{i}{p}_{i}\,\mathrm{log}\frac{{p}_{i}}{{q}_{i}}$$

If the final destination is a fully determined state, with a single ${p}_{k}$ equal to 1 while all the others are necessarily 0, then we have the extreme case

$$H=\mathrm{log}\frac{1}{{q}_{k}}$$

In the limit of many small values, H admits a continuum limit

$$H(\mathbf{p}\mid \mathbf{q})=\int p\left(x\right)\,\mathrm{log}\frac{p\left(x\right)}{q\left(x\right)}\,dx$$

#### 8.2. Entropy

The variational potential

$$S\left(\mathbf{p}\right)=-\sum _{i}{p}_{i}\,\mathrm{log}\,{p}_{i}$$

is the **entropy** of the partitioned probability distribution $\mathbf{p}$.

Entropy happens to be the expectation value of the information gained by deciding on one particular cell instead of any of the others in a partition.

S is a continuous function of its arguments.

If there are n equal choices, so that ${p}_{k}=1/n$, then S is monotonically increasing in n.

If a choice is broken down into subsidiary choices, then S adds according to probabilistic expectation, meaning

$$S({p}_{1},{p}_{2},{p}_{3})=S({p}_{1},{p}_{2}+{p}_{3})+({p}_{2}+{p}_{3})\,S\!\left(\frac{{p}_{2}}{{p}_{2}+{p}_{3}},\frac{{p}_{3}}{{p}_{2}+{p}_{3}}\right)$$
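Shannon’s grouping property can be verified directly. A minimal sketch, assuming natural logarithms and that the subsidiary entropy takes the normalized arguments ${p}_{2}/({p}_{2}+{p}_{3})$ and ${p}_{3}/({p}_{2}+{p}_{3})$:

```python
import math

def S(*p):
    """Shannon entropy of a (normalized) probability distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

p1, p2, p3 = 0.5, 1/3, 1/6
s = p2 + p3

# Grouping: deciding among three choices equals deciding between the first
# choice and "the rest", plus the expected entropy of the subsidiary choice.
lhs = S(p1, p2, p3)
rhs = S(p1, s) + s * S(p2 / s, p3 / s)
assert abs(lhs - rhs) < 1e-12

# A determined choice carries zero entropy; equal choices maximize it.
assert S(1.0) == 0.0
assert S(1/3, 1/3, 1/3) > S(0.5, 0.25, 0.25)
```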

Information and entropy are near synonyms, and are often used interchangeably. As seen here, though, entropy S is different from H. It is a property of just one partitioned probability distribution, it has a maximum not a minimum, and it does not have a continuum limit. Its least value, attained when a single probability is 1 and all the others are 0, is zero. Its value generally diverges upwards as the partitioning deepens, whereas H usually tends towards a continuum limit.

## 9. Conclusions

#### 9.1. Summary

We start with a set $\{{\mathtt{a}}_{1},{\mathtt{a}}_{2},{\mathtt{a}}_{3},\cdots ,{\mathtt{a}}_{N}\}$ of “atomic” elements which in inference represent the most fundamental exclusive statements we can make about the states (of our model) of the world. Atoms combine to form a Boolean lattice which in inference is called the hypothesis space of statements. This structure has rich symmetry, but other applications may have less and we have selected only what we needed, so that our results apply more widely and to distributive lattices in particular. The minimal assumptions are so simple that they can be drawn as the cartoon below (Figure 4).

Axiom 1 represents the order property that is required of the combination operator ⊔. Axiom 2 says that valuation must conform to the associativity of ⊔. These axioms are compelling in inference. By the associativity theorem (Appendix A — see the latter part for a proof of minimality) they require the valuation to be a measure $m\left(\mathtt{x}\right)$, with ⊔ represented by addition (the sum rule). Any 1:1 regrading is allowed, but such change alters no content so that the standard linearity can be adopted by convention. This is the rationale behind measure theory.

The direct product operator × that represents independence is distributive (axiom 3) and associative (axiom 4), and consequently independent measures multiply (the direct-product rule). There is then a unique form of variational potential for assigning measures under constraints, yielding a unique divergence of one measure from another.

Probability $Pr(x\mid t)$ is to be a bivaluation, automatically a measure over predicate $\mathtt{x}$ within any specified context $\mathtt{t}$. Axiom 5 expresses associativity of ordering relations (in inference, implications) and leads to the chain-product rule which completes probability calculus. The variational potential defines the information (Kullback–Leibler) carried by a destination probability relative to its source, and also yields the Shannon entropy of a partitioned probability distribution.

#### 9.2. Commentary

We have presented a foundation for inference that unites and significantly extends the approaches of Kolmogorov [2] and Cox [1], yielding not just probability calculus, but also the unique quantification of divergence and information. Our approach is based on quantifying finite lattices of logical statements in such a way that quantification satisfies minimal required symmetries. This generalizes algebraic implication, or equivalently subset inclusion, to a calculus of degrees of implication. It is remarkable that the calculus is unique.

Our derivations have relied on a set of explicit axioms based on simple symmetries. In particular, we have made no use of negation (`NOT`), which in applications other than inference may well not be present. Neither have we assumed any additive or multiplicative behavior (as did Kolmogorov [2], de Finetti [12], and Dupré & Tipler [13]). On the contrary, we find that sum and product rules follow from elementary symmetry alone.

**Figure 4.** Cartoon graphic of the symmetries invoked, and where they lead. Ordering is drawn as upward arrows.

We find that associativity and order provide minimal assumptions that are convincing and compelling for scalar additivity in all its applications. Associativity alone does not force additivity, but associativity with order does. Positivity was not assumed, though it holds for all applications in this paper.

Commutativity was not assumed either, though commutativity of the resulting measure follows as a property of additivity. Associativity and commutativity do not quite force additivity because they allow degenerate solutions such as $a\oplus b=max(a,b)$. To eliminate these, strict order is required in some form, and if order is assumed then commutativity does not need to be. Hence scalar additivity rests on ordered sequences rather than the disordered sets for which commutativity would be axiomatic.
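The degeneracy is easy to check numerically. A sketch using Python's built-in `max` as the candidate operator $\oplus$ confirms that it is associative and commutative, yet fails strict order:

```python
# max(a, b) is associative and commutative, so those symmetries alone
# cannot rule it out as a combination rule...
vals = [0.5, 1.0, 2.0, 3.5]
for x in vals:
    for y in vals:
        assert max(x, y) == max(y, x)                      # commutative
        for z in vals:
            assert max(max(x, y), z) == max(x, max(y, z))  # associative

# ...but it breaks strict order: combining 3.5 with a smaller positive
# value leaves the result unchanged instead of strictly increasing it.
degenerate = max(3.5, 1.0)
```

Only the strict-order axiom excludes such solutions and forces additivity.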

Aczél [14] assumes order in the form of reducibility, and he too derives commutativity. However, his analysis assumes the continuum limit already attained, which requires him to assume continuity.

Yet there can be no requirement of continuity, which is merely a convenient convention. For example, a regrade could take the binary representations of standard arguments ($101.011_2$ representing $5\frac{3}{8}$) and reinterpret them in base-3 ternary (with $101.011_3$ representing $10\frac{4}{27}$), so that $\Theta \left(10\frac{4}{27}\right)=5\frac{3}{8}$. Valuation becomes everywhere discontinuous, but the sum rule still works, albeit less conveniently. Indeed, no finite system can ever demonstrate the infinitesimal discrimination that defines continuity, so continuity cannot possibly be a requirement of practical inference.
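The regrade $\Theta$ of this example can be made explicit. A small sketch (the helper `digits_to_value` is ours, not the paper's; exact arithmetic via `Fraction`) interprets the same digit string in base 2 and in base 3:

```python
from fractions import Fraction

def digits_to_value(digits, base):
    """Interpret a digit string like '101.011' in the given base, exactly."""
    whole, _, frac = digits.partition('.')
    value = Fraction(0)
    for d in whole:
        value = value * base + int(d)
    scale = Fraction(1, base)
    for d in frac:
        value += int(d) * scale
        scale /= base
    return value

binary_value = digits_to_value('101.011', 2)   # 5 3/8
ternary_value = digits_to_value('101.011', 3)  # 10 4/27
```

The map sending each ternary reading back to its binary reading is a 1:1 order-preserving regrade, yet is discontinuous at every point.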

At the cost of lengthening the proofs in the appendices, we have avoided assuming continuity or differentiability. Yet we remark that such infinitesimal properties ought not influence the calculus of inference. If they did, those infinitesimal properties would thereby have observable effects. But detecting whether or not a system is continuous at the infinitesimal scale would require infinite information, which is never available. So assuming continuity and differentiability, had that been demanded by the technicalities of mathematical proof (or by our own professional inadequacy), would in our view have been harmless. As it happens, each appendix touches on continuity, but the arguments are appropriately constructed to avoid the assumption, so any potential controversy over infinite sets and the rôle of the continuum disappears.

Other than reversible regrading, any deviation from the standard formulas must inevitably contradict the elementary symmetries that underlie them, so that popular but weaker justifications (e.g., de Finetti [12]) in terms of decisions, loss functions, or monetary exchange can be discarded as unnecessary. In fact, the logic is the other way round: such applications must be cast in terms of the unique calculus of measure and probability if they are to be quantified rationally. Indeed, we hold generally that it is a tactical error to buttress a strong argument (like symmetry) with a weak argument (like betting, say). Doing that merely encourages a skeptic to sow confusion by negating the weak argument, thereby casting doubt on the main thesis through an illogical impression that the strong argument might have been circumvented too.

Finally, the approach from basic symmetry is productive. Together with Goyal [15], we have used just that approach to show why quantum theory is forced to use complex arithmetic. The sum and product rules of complex arithmetic, long a deep mystery, are now seen as inevitably necessary to describe the basic interactions of physics. Elementary symmetry thus brings measure, probability, information and fundamental physics together in a remarkably unified synergy.

## Acknowledgements

The authors would like to thank Seth Chaikin, Janos Aczél, Ariel Caticha, Julian Center, Philip Goyal, Steve Gull, Jeffrey Jewell, Vassilis Kaburlasos, Carlos Rodríguez, and a thoughtful anonymous reviewer. KHK was supported in part by the College of Arts and Sciences and the College of Computing and Information of the University at Albany, NASA Applied Information Systems Research Program (NASA NNG06GI17G) and the NASA Applied Information Systems Technology Program (NASA NNX07AD97A). JS was supported by Maximum Entropy Data Consultants Ltd.

## References

- Cox, R.T. Probability, frequency, and reasonable expectation. Am. J. Phys. **1946**, 14, 1–13.
- Kolmogorov, A.N. Foundations of the Theory of Probability, 2nd English ed.; Chelsea: New York, NY, USA, 1956.
- Birkhoff, G. Lattice Theory; American Mathematical Society: Providence, RI, USA, 1967.
- Davey, B.A.; Priestley, H.A. Introduction to Lattices and Order; Cambridge University Press: Cambridge, UK, 2002.
- Klain, D.A.; Rota, G.-C. Introduction to Geometric Probability; Cambridge University Press: Cambridge, UK, 1997.
- Knuth, K.H. Deriving Laws from Ordering Relations. In Bayesian Inference and Maximum Entropy Methods in Science and Engineering; Erickson, G.J., Zhai, Y., Eds.; Jackson Hole, WY, USA, 2003.
- Halmos, P.R. Measure Theory; Springer: Berlin/Heidelberg, Germany, 1974.
- Gull, S.F.; Skilling, J. Maximum entropy method in image processing. IEE Proc. **1984**, 131F, 646–659.
- Von Mises, R. Probability, Statistics, and Truth; Dover: Mineola, NY, USA, 1981.
- Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Statist. **1951**, 22, 79–86.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423, 623–656.
- De Finetti, B. Theory of Probability, Vol. I and Vol. II; John Wiley and Sons: New York, NY, USA, 1974.
- Dupré, M.J.; Tipler, F.J. New axioms for rigorous Bayesian probability. Bayesian Anal. **2009**, 4, 599–606.
- Aczél, J. Lectures on Functional Equations and Their Applications; Academic Press: New York, NY, USA, 1966.
- Goyal, P.; Knuth, K.H.; Skilling, J. Origin of complex quantum amplitudes and Feynman’s rules. Phys. Rev. A **2010**, 81, 022109.

## A. Appendix A: Associativity Theorem

Atoms `x`, `y`, `z`,..., or disjoint lattice elements more generally, are to be assigned valuations $x,y,z,\cdots \phantom{\rule{0.166667em}{0ex}}$. If valuations coincide (though other marks may differ), such atoms are said to be of the same type. We allow arbitrarily many atoms of arbitrarily many types. Our proof is constructive, with combinations built as sequences of atoms appended one at a time, $\phantom{\rule{4pt}{0ex}}\mathtt{x}\bigsqcup \mathtt{y}\bigsqcup \cdots \phantom{\rule{4pt}{0ex}}$ having valuation $\phantom{\rule{4pt}{0ex}}x\oplus y\oplus \cdots \phantom{\rule{0.166667em}{0ex}}$. The consequent stand-alone derivation is rather long, but avoids making what would in our finite environment be an unnatural assumption of continuity. We also avoid assuming that an inverse to combination exists.

We merely assume order (axiom 1)

**Theorem:**

#### A.1. **Proof:**

The form quoted in the theorem is easily seen to satisfy both axioms 1 and 2, which demonstrates existence of a calculus ⊕ of quantification. The remaining question is whether this calculus is unique.

We start by building sequences from just one type of atom before introducing successively more types to reach the general case. In this way, we lay down successively finer grids. Whenever another atom is introduced to generate a new sequence, that new sequence’s value inevitably lies somewhere at, between, or beyond previously assigned values. If it lies within an interval, we are free to choose it to be anywhere convenient. Such choice loses no generality, because the original value could be recovered by order-preserving regrade of the assignments. Values can be freely and reversibly regraded in and only in any way that preserves their order. Any such mapping preserves axiom 1, but reversal of ordering would allow the axiom to be broken.

Most points of the continuum escape this approach and are never accessed, so we do not allow ourselves continuum properties such as continuity. We build our finite system from the bottom up, using only those values that we actually need.

By interchanging x and y in axiom 1, the same relationship holds when “<” is replaced throughout by “>”, and replacement by “=” holds trivially. So, in effect, the axiom makes a three-fold assertion

#### A.2. One Type of Atom

Consider a set of disjoint atoms $\{{\mathtt{a}}_{1},{\mathtt{a}}_{2},{\mathtt{a}}_{3},\dots ,{\mathtt{a}}_{r},{\mathtt{a}}_{r+1},\dots ,{\mathtt{a}}_{N}\}$, each of which is associated with the same value so that $m\left({\mathtt{a}}_{i}\right)=a$ for all $i\in [1,N]$. We will append such atoms one at a time, using the combination operator ⊔ to construct compound elements

In principle, we could have any of

We proceed with atoms restricted to positive style, leaving the extension to negative (if required) until the end. Chaining a sequence of positive $\mathtt{a}$’s with another $\mathtt{a}$ yields, successively, the same type of relationship between $m\left(1\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)$ and $m\left(2\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)$, then between $m\left(2\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)$ and $m\left(3\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)$, and by induction between $m\left(r\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)$ and $m\left(r+1\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)$. Hence successive multiples are ranked by cardinality, and the ranking can continue indefinitely.

#### Illustration

We are not forced to adopt this linear scale, and a user’s original assignments may well not have used it. We can allow other increasing series, such as $m\left(r\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)={r}^{3}a$, but we could not use a non-monotonic series like $m\left(r\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)=a\,\sin(r)$ without some values being the wrong way round. The only acceptable grades preserve order, so that they can be monotonically reverted to the adopted integer scale (Figure 5).
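As an illustrative numerical check (our own, not the paper's), the cubic regrade ranks multiples by cardinality while the sinusoidal one does not:

```python
from math import sin

a = 1.0
r_values = range(1, 11)
cubic = [r**3 * a for r in r_values]    # candidate regrade m(r of a) = r^3 a
sine = [a * sin(r) for r in r_values]   # candidate regrade m(r of a) = a sin(r)

# An acceptable regrade must be strictly increasing in the multiplicity r.
cubic_ordered = all(x < y for x, y in zip(cubic, cubic[1:]))
sine_ordered = all(x < y for x, y in zip(sine, sine[1:]))
```

Here `cubic_ordered` holds but `sine_ordered` fails (already $\sin 3 < \sin 2$), so the sinusoidal grading cannot be reverted to the integer scale by any monotonic map.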

#### A.3. Induction to More Than One Type of Atom

Suppose that sequences of atoms drawn from up to k types $\{\mathtt{a},\cdots ,\mathtt{c}\}$ are quantified as the grid of values

We now append an extra type $\mathtt{d}$ of atom, and investigate values of the extended function

**Figure 6.** A new value, displaced away from the existing grid, must lie within some interval. Any assignment outside the strict interior would be wrongly ordered, while any value inside could be reverted to some other selection by order-preserving regrade.

#### A.3.1 Repetition Lemma

To proceed, we need the repetition lemma, that if

Suppose the lemma holds for n. Prefix Equation 74 with “$n{r}_{0}$ of $\mathtt{a}$ and ... and $n{t}_{0}$ of $\mathtt{c}$”, and postfix with $nu$ of $\mathtt{d}$.

#### A.3.2 Separation

We define the relevant intervals for the new sequences $\mu ({r}_{0},\cdots ,{t}_{0};u)$ by listing the previous values Equation 71 that lie below (set $\mathcal{A}$), at (set $\mathcal{B}$), and above (set $\mathcal{C}$) the new targets (Figure 7).

**Figure 7.** The interval encompassing the new value lies above set $\mathcal{A}$ and below set $\mathcal{C}$.

This decomposition must hold consistently across all new sequences, for all u. Values for any particular target multiplicity u lie in subsets of $\mathcal{A},\mathcal{B},\mathcal{C}$ with u fixed appropriately. It is convenient to denote provenance with a suffix (1 for $\mathcal{A}$, 2 for $\mathcal{B}$, 3 for $\mathcal{C}$), so that these definitions can be alternatively written as

Taking $((r-{r}_{0})a+\cdots +(t-{t}_{0})c)/u$ as the statistic, all members of $\mathcal{A}$ lie beneath all members of $\mathcal{B}$, which in turn lie beneath all members of $\mathcal{C}$. We can now assign the value of $\mu ({r}_{0},\cdots ,{t}_{0};u)$ for some target multiple u. The treatment differs somewhat according to whether or not $\mathcal{B}$ is empty.

#### A.3.3 Assignment When $\mathcal{B}$ Has Members

If $\mathcal{B}$ is non-empty, we now show that all its members share a common value. Let two members be $\{r,\cdots ,t;u\}$ and $\{{r}^{\prime},\cdots ,{t}^{\prime};{u}^{\prime}\}$ (the suffix “2” is temporarily redundant), so that, by definition,

#### Illustration

Suppose for simplicity that only one type of atom has previously been assigned ($k=1$), according to the integer scale $m\left(r\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{a}\right)=ra$ with $a=1$. Suppose that the new atom $\mathtt{d}$ has value $d=\frac{5}{3}$, rationally related to a. By 3-fold repetition, this means that $m\left(3\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{d}\right)$ lies exactly at 5, and is a member of set $\mathcal{B}$. Again by 3-fold repetition, $m\left(1\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{d}\right)$ cannot lie at or below 1 because that would wrongly imply $m\left(3\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{d}\right)\le 3$. Similarly, it cannot lie at or above 2 because that would imply $m\left(3\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{d}\right)\ge 6$. So $m\left(1\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{d}\right)$ necessarily lies between 1 (which lies in set $\mathcal{A}$) and 2 (which lies in set $\mathcal{C}$) and can without loss of generality be assigned $\frac{5}{3}$. Similarly, $m\left(2\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{d}\right)$ necessarily lies between 3 and 4 and can without loss of generality be assigned $\frac{10}{3}$, and so on (Figure 8).

These assignments obey axioms 1 and 2, and we now have $\mathtt{a}$ and $\mathtt{d}$ on the same linear scale.
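The interval bookkeeping of this illustration can be verified with exact fractions (a sketch under the illustration's assumptions $a=1$, $d=\frac{5}{3}$):

```python
from fractions import Fraction
from math import floor, ceil

a = 1                   # integer grid: m(r of a) = r
d = Fraction(5, 3)      # one atom of type d, fixed by m(3 of d) = 5

# Each multiple of d either hits the integer grid exactly, or falls
# strictly inside an integer interval, as the repetition argument requires.
for u in range(1, 7):
    value = u * d
    if value.denominator == 1:
        assert u % 3 == 0               # on-grid only when u is a multiple of 3
    else:
        assert floor(value) < value < ceil(value)
```

So $m(1\ \mathrm{of}\ \mathtt{d})=\frac{5}{3}$ lies in $(1,2)$, $m(2\ \mathrm{of}\ \mathtt{d})=\frac{10}{3}$ lies in $(3,4)$, and $m(3\ \mathrm{of}\ \mathtt{d})$ sits exactly at 5.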

#### A.3.4 Assignment When $\mathcal{B}$ Has No Members

When $\mathcal{B}$ is empty, the strict inequalities Equation 86 separating $\mathcal{A}$ and $\mathcal{C}$ imply that partitioning between them can be accomplished by some real δ.

#### Illustration

Suppose that three types of atom have previously been assigned ($k=3$), according to

According to Equation 93 with ${r}_{0}={s}_{0}={t}_{0}=0$, the value of $\delta =m\left(u\phantom{\rule{4.pt}{0ex}}\mathrm{of}\phantom{\rule{4.pt}{0ex}}\mathtt{d}\right)/u$ is constrained by all the members of $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{C}$.

**Figure 9.** Multiples of a new atom can always be assigned linear values $\delta ,2\delta ,3\delta ,\cdots \phantom{\rule{0.166667em}{0ex}}$. An individual multiple can be assigned anywhere within the corresponding interval, but the linear assignment can always be chosen.

By the time these sets have expanded to cover up to 10 copies of $\mathtt{d}$, the surviving interval is

#### Accuracy

The gap between $\mathcal{A}$ and $\mathcal{C}$ might allow δ to be uncertain. We assume that δ is bounded below, otherwise the appended atoms of type $\mathtt{d}$ never have measurable effect. This implies the existence of u such that $u\delta >na$ for any multiple n, no matter how large. We also assume that δ is bounded above, otherwise even a single $\mathtt{d}$ atom always overwhelms everything else. This implies the existence of a greatest $r\ge n$ such that $ra<u\delta $ for that u. Taking other types of atom to be absent for simplicity, we have

This proves that δ can be found to arbitrarily high accuracy by allowing sufficiently high multiples. Denote the limiting value of δ by d. This value $m\left(\mathtt{d}\right)=d$ of a single atom of type $\mathtt{d}$ is now fixed to unlimited accuracy, but has no rational relationship to the previous values $a,\cdots ,c$.
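The squeeze can be illustrated numerically (our sketch, not the paper's construction) with $a=1$ and the irrational stand-in $\delta = \sqrt{2}$: the greatest $r$ with $ra < u\delta$ brackets $\delta$ in an interval of width $1/u$, which shrinks as the multiple $u$ grows.

```python
from math import isqrt

# Bracket delta = sqrt(2) (with a = 1): the greatest r with r < u*delta
# satisfies r^2 < 2*u^2, giving r/u < delta < (r+1)/u exactly.
for u in (10, 100, 10000):
    r = isqrt(2 * u * u)        # floor(u*sqrt(2)); strict, as sqrt(2) is irrational
    assert r * r < 2 * u * u < (r + 1) * (r + 1)
    width = (r + 1) / u - r / u  # interval width 1/u -> 0 as u grows
```

At $u=10000$ the bracket is already $(1.4142, 1.4143)$, fixing $\delta$ to four decimal places.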

#### A.3.5 End of Inductive Proof

Whether or not $\mathcal{B}$ had members, the assignment

Atom types in the above expression are often different, but do not need to be, and the formula represents the quantification of a general sequence. Embedded in it, and equivalent to it, is the sum rule $x\oplus y=x+y$ for the values $m\left(\mathtt{x}\right)=x$ and $m\left(\mathtt{y}\right)=y$ of arbitrary sequences. Any order-preserving regrade Θ is also permitted, but no order-breaking transform is permitted.

This completes the inductive proof for atoms of positive style. The proof holds equally well for atoms of negative style, for which the values are negative. Meanwhile, Equation 68 shows that atoms of null style have zero value. So, even if the atoms may have arbitrary style, Equation 104 offers the only consistent combination rule. The result thus holds for atom values of arbitrary sign and arbitrary magnitude, though the nature of the constructive proof requires atom multiplicities to be non-negative. ☐

#### A.4. Axioms are Minimal

Axioms 1a, 1b, 2 are individually required.

Proof:

We construct operators ◯ (“not quite ⊕”) which deny each axiom in turn, while not being monotonic strictly increasing regrades of addition.

Without axiom 1a (postfix ordering), the definition

Without axiom 1b (prefix ordering), the definition

Without axiom 2 (associativity), the definition

## B. Appendix B: Product Theorem

**Theorem:**

The solution of the functional **product Equation**

#### B.1. **Proof:**

The quoted solution is easily seen to satisfy the product equation, which demonstrates existence. The remaining question is whether the solution is unique.

First, we take the special case $\xi =\eta $, so that $\zeta -\xi $ and $\zeta -\eta $ take a common value a. This gives a 2-term recurrence

To complete the proof, take a second special case where $\zeta -\xi $ and $(\zeta -\eta )/2$ take a common value b. This gives a 3-term recurrence

This combines with the 2-term formula to make

Although this strongly suggests that Ψ will be exponential, that is not yet fully proved because offsets $mb-na$ with even m are only a subset of the reals. There could be one scaling for arguments θ of the form $mb-na$, another for the form $\sqrt{2}+mb-na$, yet another for $\pi +mb-na$, and so on. Fortunately, $b/a$ is irrational, so the offset $mb-na$ can approach any real value x arbitrarily closely. Express x as $x=mb-na+\u03f5$ with m and n chosen to make ϵ arbitrarily small. Then

This obeys the original product equation without further restriction and is the general solution, with corollary ${e}^{A\xi}+{e}^{A\eta}={e}^{A\zeta}$ defining $\zeta (\xi ,\eta )$ and confirming that $a={A}^{-1}\log 2$ and $b={A}^{-1}\log\left(\frac{1+\sqrt{5}}{2}\right)$ were appropriate constants. ☐
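A quick numerical sketch (illustrative only, taking $A=C=1$; any positive constants serve) confirms that $\Psi(\theta)=C e^{A\theta}$ satisfies both recurrences and the corollary:

```python
from math import exp, log, sqrt

A, C = 1.0, 1.0
psi = lambda theta: C * exp(A * theta)

a = log(2) / A                    # 2-term recurrence constant (case xi = eta)
b = log((1 + sqrt(5)) / 2) / A    # golden ratio: exp(-A*b) + exp(-2*A*b) = 1

# zeta(xi, eta) defined by the corollary exp(A*xi) + exp(A*eta) = exp(A*zeta)
zeta = lambda xi, eta: log(exp(A * xi) + exp(A * eta)) / A

for theta in (-1.0, 0.0, 0.7, 3.0):
    assert abs(psi(theta + a) - 2 * psi(theta)) < 1e-8               # 2-term
    assert abs(psi(theta) - psi(theta - b) - psi(theta - 2 * b)) < 1e-8  # 3-term
    assert abs(zeta(theta, theta) - (theta + a)) < 1e-12             # equal arguments
```

The 3-term check works precisely because $e^{Ab}$ is the golden ratio $\varphi$, whose defining equation $\varphi^{2}=\varphi+1$ reproduces the Fibonacci-like recurrence.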

The sought inverse, in terms of the constants A and C, is

## C. Appendix C: Variational Theorem

**Theorem:**

The solution of the functional **variational equation**

#### C.1. **Proof:**

The quoted solution is easily seen to satisfy the variational equation, with corollaries that the functions λ and μ are logarithmic, which demonstrates existence. The remaining question is whether the solution is unique.

Write $\log {m}_{x}=u$, $\log {m}_{y}=v$, and rewrite the functions as ${\lambda}^{*}\left(u\right)$, ${\mu}^{*}\left(v\right)$ and ${H}^{\prime}\left(m\right)=h(\log m)$.

To show that, we blur functions $\varphi (u,v)$ by convolving them with the following unit-mass ellipse, chosen to blur u, v and $u+v$ equally, according to

Finally, the definition $dH/dm=h(\log m)=B+f(\log m)$ yields

This obeys the original variational equation with corollaries $\lambda \left(x\right)={B}_{1}+C\log\left(x\right)$ and $\mu \left(x\right)={B}_{2}+C\log\left(x\right)$ where ${B}_{1}+{B}_{2}=B$, and is the general solution. ☐

© 2012 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).