# Objective Bayesianism and the Maximum Entropy Principle


## Abstract


## 1. Introduction

**Probability:** The strengths of an agent’s beliefs should satisfy the axioms of probability. That is, there should be a probability function, ${P}_{E}:S\mathcal{L}\longrightarrow [0,1]$, such that for each sentence θ of the agent’s language $\mathcal{L}$, ${P}_{E}(\theta )$ measures the degree to which the agent with evidence E believes sentence θ. (Here, $\mathcal{L}$ will be construed as a finite propositional language and $S\mathcal{L}$ as the set of sentences of $\mathcal{L}$, formed by recursively applying the usual connectives.)

**Calibration:** The strengths of an agent’s beliefs should satisfy constraints imposed by her evidence E. In particular, if the evidence determines just that physical probability (aka chance), ${P}^{*}$, is in some set ${\mathbb{P}}^{*}$ of probability functions defined on $S\mathcal{L}$, then ${P}_{E}$ should be calibrated to physical probability insofar as it should lie in the convex hull, $\mathbb{E}=\langle {\mathbb{P}}^{*}\rangle $, of the set ${\mathbb{P}}^{*}$. (We assume throughout this paper that chance is probabilistic, i.e., that ${P}^{*}$ is a probability function.)

**Equivocation:** The agent should not adopt beliefs that are more extreme than is demanded by her evidence E. That is, ${P}_{E}$ should be a member of $\mathbb{E}$ that is sufficiently close to the equivocator function, ${P}_{=}$, which gives the same probability to each $\omega \in \Omega $, where the state descriptions or states, ω, are sentences describing the most fine-grained possibilities expressible in the agent’s language.

The problem is that the three norms are justified in rather different ways. The probability norm is motivated by avoiding sure loss. The calibration norm is motivated by avoiding sure long-run loss or by avoiding positive expected loss. The equivocation norm is motivated by minimising worst-case expected loss. In particular, the loss function appealed to in the justification of the equivocation norm differs from that invoked by the justifications of the probability and calibration norms.

“All our lives, we are in a sense betting. Whenever we go to the station, we are betting that a train will really run, and if we had not a sufficient degree of belief in this, we should decline the bet and stay at home.” (p. 183 in [5])

## 2. Belief over Propositions

#### 2.1. Normalisation

**Definition 1 (Normalised belief function on propositions).** Given a belief function, $\mathit{bel}:\mathcal{P}\Omega \longrightarrow {\mathbb{R}}_{\ge 0}$, that is not zero everywhere, let $M=\max_{\pi \in \Pi}{\sum}_{F\in \pi}\mathit{bel}(F)$. Its normalisation, $B:\mathcal{P}\Omega \longrightarrow [0,1]$, is defined by setting $B(F)=\mathit{bel}(F)/M$ for each $F\subseteq \Omega $. We shall denote the set of normalised belief functions by $\mathbb{B}$, so:

$$\mathbb{B}=\Big\{B:\mathcal{P}\Omega \longrightarrow [0,1]\ :\ {\sum}_{F\in \pi}B(F)\le 1\text{ for all }\pi \in \Pi \text{ and }{\sum}_{F\in \pi}B(F)=1\text{ for some }\pi \in \Pi \Big\}.$$
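The normalisation step can be illustrated on a very small state space. The following is a minimal sketch rather than part of the formal development: the helper names (`set_partitions`, `all_partitions`, `normalise`) and the example belief function are hypothetical, and Π is read as including partitions that have the empty proposition as a member, as the proof of Proposition 1 suggests when it treats $\{\Omega ,\varnothing \}$ as a partition.

```python
from itertools import combinations

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (lists of frozensets)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into one of the existing blocks ...
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        # ... or into a new block of its own.
        yield partition + [frozenset({first})]

def all_partitions(omega):
    """Partitions of omega, each with and without the empty proposition (assumed reading of Pi)."""
    for partition in set_partitions(omega):
        yield partition
        yield partition + [frozenset()]

def powerset(omega):
    omega = list(omega)
    return [frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

def normalise(bel, omega):
    """Return B = bel / M, where M is the largest total belief over any partition."""
    M = max(sum(bel[F] for F in pi) for pi in all_partitions(omega))
    if M == 0:
        raise ValueError("bel must not be zero everywhere")
    return {F: value / M for F, value in bel.items()}

omega = ["w1", "w2"]
# A hypothetical, non-additive belief function: belief 2 in every proposition.
bel = {F: 2.0 for F in powerset(omega)}
B = normalise(bel, omega)
```

After normalisation, no partition carries total belief above one and at least one partition carries exactly one, which is the membership condition for $\mathbb{B}$.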

**Lemma 1 (Compactness).**$\mathbb{B}$ and $\langle \mathbb{B}\rangle $ are compact.

**Proof:**$\mathbb{B}\subset {\mathbb{R}}^{|\mathcal{P}\Omega |}$ is bounded, where ⊂ denotes strict subset inclusion. Now, consider a sequence, ${({B}_{t})}_{t\in \mathbb{N}}\in \mathbb{B}$ which converges to some $B\in {\mathbb{R}}^{|\mathcal{P}\Omega |}.$ Then, for all $\pi \in \Pi $, we find ${\sum}_{F\in \pi}B(F)\le 1.$ Assume that $B\notin \mathbb{B}$. Thus for all $\pi \in \Pi $, we have ${\sum}_{F\in \pi}B(F)<1.$ However, then there has to exist a ${t}_{0}\in \mathbb{N}$, such that for all $t\ge {t}_{0}$ and all $\pi \in \Pi $, ${\sum}_{F\in \pi}{B}_{t}(F)<1$. This contradicts ${B}_{t}\in \mathbb{B}.$ Thus, $\mathbb{B}$ is closed and, hence, compact.

**Proposition 1.**$P\in \mathbb{P}$ if and only if $P:\mathcal{P}\Omega \u27f6[0,1]$ satisfies the axioms of probability:

- **P1:** $P(\Omega )=1$ and $P(\varnothing )=0$.
- **P2:** If $F\cap G=\varnothing $, then $P(F)+P(G)=P(F\cup G)$.

**Proof:**Suppose $P\in \mathbb{P}$. $P(\Omega )=1$, because $\{\Omega \}$ is a partition. $P(\varnothing )=0$, because $\{\Omega ,\varnothing \}$ is a partition and $P(\Omega )=1$. If $F,G\subseteq \Omega $ are disjoint, then $P(F)+P(G)=P(F\cup G)$, because $\{F,G,\overline{F\cup G}\}$ and $\{F\cup G,\overline{F\cup G}\}$ are both partitions, so $P(F)+P(G)=1-P(\overline{F\cup G})=P(F\cup G)$.
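As a quick numerical companion to Proposition 1, the sketch below builds a probability function on propositions from illustrative state probabilities and checks P1 and P2 directly. It also checks that the beliefs over all propositions add up to $|\mathcal{P}\Omega |/2$, the identity that Example 1 relies on. The helper names and the numbers are assumptions chosen for illustration.

```python
from itertools import combinations

def powerset(omega):
    omega = list(omega)
    return [frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

def probability_function(state_probs):
    """Extend probabilities of states additively to every proposition F (a set of states)."""
    return {F: sum(state_probs[w] for w in F) for F in powerset(state_probs)}

state_probs = {"w1": 0.5, "w2": 0.25, "w3": 0.25}  # illustrative values
P = probability_function(state_probs)
omega = frozenset(state_probs)

# P1: certainty for Omega, zero for the empty proposition.
p1_holds = P[omega] == 1.0 and P[frozenset()] == 0.0

# P2: additivity over every pair of disjoint propositions.
p2_holds = all(abs(P[F] + P[G] - P[F | G]) < 1e-12
               for F in P for G in P if not (F & G))

# Total belief over all propositions; each state lies in half of the propositions.
total = sum(P.values())
```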

**Example 1 (Contrasting $\mathbb{B}$ with $\mathbb{P}$).**Using Equation (1), we find ${\sum}_{F\subseteq \Omega}P(F)=\frac{|\mathcal{P}\Omega |}{2}\ge {\sum}_{F\subseteq \Omega}B(F)$ for all $P\in \mathbb{P}$ and $B\in \mathbb{B}.$ For probability functions, $P\in \mathbb{P}$, probability is evenly distributed among the propositions of fixed size in the following sense:

#### 2.2. Entropy

**Definition 2 (g-entropy).** Given a weighting function $g:\Pi \longrightarrow {\mathbb{R}}_{\ge 0}$, the generalised entropy or g-entropy of a normalised belief function is defined as:

$${H}_{g}(B):=-\sum _{\pi \in \Pi}g(\pi )\sum _{F\in \pi}B(F)\log B(F).$$

**Definition 3 (Inclusive weighting function).** A weighting function $g:\Pi \longrightarrow {\mathbb{R}}_{\ge 0}$ is inclusive if for all $F\subseteq \Omega $, there is some partition π containing F such that $g(\pi )>0$.
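To make the two definitions concrete, here is a minimal sketch that computes a g-entropy of the form $-{\sum}_{\pi}g(\pi ){\sum}_{F\in \pi}B(F)\log B(F)$ (with $0\log 0$ read as zero) and tests a weighting function for inclusiveness. Two assumptions are made and should be read as such: `g_states` is an assumed reading of the standard weighting ${g}_{\Omega}$ (weight one on the partition into states, zero elsewhere), and `g_all` is an assumed reading of the partition weighting ${g}_{\Pi}$ (weight one on every partition). As before, Π is taken to include partitions with the empty proposition as a member.

```python
from itertools import combinations
from math import log

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (lists of frozensets)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        yield partition + [frozenset({first})]

def all_partitions(omega):
    """Partitions of omega, each with and without the empty proposition."""
    for partition in set_partitions(omega):
        yield partition
        yield partition + [frozenset()]

def powerset(omega):
    omega = list(omega)
    return [frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

def g_entropy(B, g, omega):
    """Weighted sum over partitions of -B(F) log B(F), skipping zero beliefs."""
    total = 0.0
    for pi in all_partitions(omega):
        weight = g(pi)
        if weight > 0:
            total -= weight * sum(B[F] * log(B[F]) for F in pi if B[F] > 0)
    return total

def is_inclusive(g, omega):
    """Every proposition must belong to some partition with positive weight."""
    partitions = list(all_partitions(omega))
    return all(any(F in pi and g(pi) > 0 for pi in partitions)
               for F in powerset(omega))

omega = ["w1", "w2", "w3"]
states = {frozenset({w}) for w in omega}

def g_states(pi):  # assumed reading of g_Omega
    return 1.0 if set(pi) == states else 0.0

def g_all(pi):     # assumed reading of g_Pi
    return 1.0

# The equivocator: every state equally probable, extended additively.
uniform = {F: len(F) / len(omega) for F in powerset(omega)}
```

Under these readings, `g_entropy(uniform, g_states, omega)` is the Shannon entropy of the uniform distribution, `g_all` is inclusive and `g_states` is not.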

**Lemma 2.**The function $-log:[0,1]\to [0,\infty ]$ is continuous in the standard topology on ${\mathbb{R}}_{\ge 0}\cup \{+\infty \}.$

**Proof:**To obtain the standard topology on ${\mathbb{R}}_{\ge 0}\cup \{+\infty \}$, take as open sets infinite unions and finite intersections over the open sets of ${\mathbb{R}}_{\ge 0}$ and sets of the form, $(r,\infty ]$, where $r\in \mathbb{R}.$ In this topology on $[0,\infty ],$ a set $M\subseteq {\mathbb{R}}_{\ge 0}$ is open if and only if it is open in the standard topology in ${\mathbb{R}}_{\ge 0}.$ Hence, $-log$ is continuous in this topology on $(0,1].$

**Proposition 2.**g-entropy is non-negative and, for inclusive g, strictly concave on $\langle \mathbb{B}\rangle $.

**Proof:** $B(F)\in [0,1]$ for all F, so $\log B(F)\le 0$, and $g(\pi ){\sum}_{F\in \pi}B(F)\log B(F)\le 0$. Hence, ${\sum}_{\pi \in \Pi}-g(\pi ){\sum}_{F\in \pi}B(F)\log B(F)\ge 0,$ i.e., g-entropy is non-negative.

**Corollary 1.** For inclusive g, if g-entropy is maximised by a function ${P}^{\dagger}$ in convex $\mathbb{E}\subseteq \mathbb{P}$, it is uniquely maximised by ${P}^{\dagger}$ in $\mathbb{E}$.

**Corollary 2.**For inclusive g, g-entropy is uniquely maximised in the closure, $\left[\mathbb{E}\right]$, of $\mathbb{E}$.

**Figure 1.** Plotted are the partition entropy, the standard entropy and the proposition entropy under the constraints, $P({\omega}_{1})+P({\omega}_{2})+P({\omega}_{3})+P({\omega}_{4})=1$, $P({\omega}_{1})+2.75P({\omega}_{2})+7.1P({\omega}_{3})=1.7,\,P({\omega}_{4})=0$, as a function of $P({\omega}_{2}).$ The dotted lines indicate the respective maxima, which obtain for different values of $P({\omega}_{2}).$
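The constraints in the caption leave a single degree of freedom, so the standard entropy curve can be traced and maximised with a one-dimensional search. The sketch below does this for the standard (Shannon) entropy only; the function names are hypothetical, and the ternary search is our choice of method for a concave function, not something taken from the paper.

```python
from math import log

def point(p2):
    """The probability function on four states fixed by the constraints and P(w2) = p2."""
    # Subtracting the two linear constraints gives 1.75*P(w2) + 6.1*P(w3) = 0.7.
    p3 = (0.7 - 1.75 * p2) / 6.1
    p1 = 1.0 - p2 - p3
    return (p1, p2, p3, 0.0)

def shannon(p):
    """Standard entropy over states, with 0 log 0 read as 0."""
    return -sum(x * log(x) for x in p if x > 0)

def argmax_concave(f, lo, hi, iterations=200):
    """Ternary search for the maximiser of a concave function on [lo, hi]."""
    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) < f(b):
            lo = a
        else:
            hi = b
    return (lo + hi) / 2

# P(w3) >= 0 forces P(w2) <= 0.4, and P(w2) >= 0 gives the lower end of the range.
best_p2 = argmax_concave(lambda p2: shannon(point(p2)), 0.0, 0.4)
best = point(best_p2)
```

Replacing `shannon` by another g-entropy would trace the other curves in the figure, given definitions of the corresponding weighting functions.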

#### 2.3. Loss

**Definition 4 (Loss function).** A loss function is a function $L:\mathcal{P}\Omega \times \langle \mathbb{B}\rangle \longrightarrow (-\infty ,\infty ]$.

- **L1.** $L(F,B)=0$ if $B(F)=1$.
- **L2.** $L(F,B)$ strictly increases as $B(F)$ decreases from one towards zero.
- **L3.** $L(F,B)$ depends only on $B(F)$.
- **L4.** Losses are additive when the language is composed of independent sublanguages: if $\mathcal{L}={\mathcal{L}}_{1}\cup {\mathcal{L}}_{2}$ for ${\mathcal{L}}_{1}{\perp\!\!\!\perp}_{B}{\mathcal{L}}_{2}$, then $L({F}_{1}\times {F}_{2},B)={L}_{1}({F}_{1},{B}_{\downharpoonright {\mathcal{L}}_{1}})+{L}_{2}({F}_{2},{B}_{\downharpoonright {\mathcal{L}}_{2}})$, where ${L}_{1},{L}_{2}$ are loss functions defined on ${\mathcal{L}}_{1},{\mathcal{L}}_{2}$, respectively.

**Theorem 1.** If loss functions are assumed to satisfy L1–4, then $L(F,B)=-k\log B(F)$ for some constant, $k>0$, that does not depend on $\mathcal{L}$.

**Proof:**We shall first focus on a loss function, L, defined with respect to a language, $\mathcal{L}$, that contains at least two propositional variables.
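The easy direction of the theorem, that logarithmic loss does satisfy L1–4, can be checked numerically. The sketch below is only an illustration: `log_loss` is a hypothetical helper, the belief values are arbitrary, and independence of the sublanguages is read as $B({F}_{1}\times {F}_{2})={B}_{\downharpoonright {\mathcal{L}}_{1}}({F}_{1})\cdot {B}_{\downharpoonright {\mathcal{L}}_{2}}({F}_{2})$, which is our assumption about the intended meaning.

```python
from math import log, inf, isclose

def log_loss(belief, k=1.0):
    """Loss -k log B(F) incurred when F turns out true; infinite if B(F) = 0."""
    # L3 holds by construction: the loss depends only on the belief in F.
    return inf if belief == 0 else -k * log(belief)

# L1: no loss when the true proposition was fully believed.
# L2: the loss strictly increases as belief falls from one towards zero.
beliefs = [1.0, 0.8, 0.5, 0.1, 0.0]
losses = [log_loss(b) for b in beliefs]

# L4: under the assumed product form for independent sublanguages,
# the loss on the joint proposition is the sum of the separate losses.
b1, b2 = 0.6, 0.3  # illustrative degrees of belief in F1 and F2
joint_loss = log_loss(b1 * b2)
separate_losses = log_loss(b1) + log_loss(b2)
```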

#### 2.4. Score

**Definition 5 (g-score).** Given a loss function, L, and an inclusive weighting function, $g:\Pi \longrightarrow {\mathbb{R}}_{\ge 0}$, the g-expected loss function or g-scoring rule or, simply, g-score is ${S}_{g}^{L}:\mathbb{P}\times \langle \mathbb{B}\rangle \longrightarrow [-\infty ,\infty ]$, such that

$${S}_{g}^{L}(P,B):=\sum _{\pi \in \Pi}g(\pi )\sum _{F\in \pi}P(F)L(F,B).$$

**Definition 6 (Strictly proper g-score).** A scoring rule, ${S}_{g}^{L}:\mathbb{P}\times \langle \mathbb{B}\rangle \longrightarrow [-\infty ,\infty ]$, is strictly proper, if for all $P\in \mathbb{P}$, the function ${S}_{g}^{L}(P,\cdot ):\langle \mathbb{B}\rangle \longrightarrow [-\infty ,\infty ]$ has a unique global minimum at $B=P$.

**Definition 7 (g-divergence).** For a weighting function, $g:\Pi \longrightarrow {\mathbb{R}}_{\ge 0}$, the g-divergence is the function, ${d}_{g}:\mathbb{P}\times \langle \mathbb{B}\rangle \longrightarrow [-\infty ,\infty ]$, defined by:

$${d}_{g}(P,B):=\sum _{\pi \in \Pi}g(\pi )\sum _{F\in \pi}P(F)\log \frac{P(F)}{B(F)}.$$

**Lemma 3 (Log sum inequality).**For ${x}_{i},{y}_{i}\in {\mathbb{R}}_{\phantom{\rule{-0.166667em}{0ex}}\ge \phantom{\rule{-0.166667em}{0ex}}0},i,j=1,\dots ,k$,

**Proposition 3.**The following are equivalent:

- ${d}_{g}(P,B)\ge 0$ with equality iff $B=P$.
- g is inclusive.

**Proof:**First we shall see that if g is inclusive, then ${d}_{g}(P,B)\ge 0$ with equality iff $B=P$.

- (i) $\varnothing \subset F\subset \Omega $. Take some $P\in \mathbb{P}$ such that $P(F)>0.$ Now, define $B(F):=0,$ and $B({F}^{\prime}):=P({F}^{\prime})$ for all other ${F}^{\prime}.$ Then, $B(\Omega )=1$ and ${\sum}_{G\in \pi}B(G)\le 1$ for all other $\pi \in \Pi $, so $B\in \mathbb{B}\subseteq \langle \mathbb{B}\rangle $. Furthermore, ${d}_{g}(P,P)={d}_{g}(P,B)=0$.
- (ii) $F=\varnothing $ or $F=\Omega $. Define $B(\varnothing ):=B(\Omega ):=0.5$ and $B(F):=P(F)$ for all $\varnothing \subset F\subset \Omega .$ Then, $B(\varnothing )+B(\Omega )=1$ and ${\sum}_{G\in \pi}B(G)\le 1$ for all other $\pi \in \Pi $, so $B\in \mathbb{B}\subseteq \langle \mathbb{B}\rangle $. Furthermore, ${d}_{g}(P,P)={d}_{g}(P,B)=0$.

**Corollary 3.**The logarithmic g-score is strictly proper.

**Proof:**Recall that in the context of a g-score, g is inclusive.

**Proposition 4.**The logarithmic g-score ${S}_{g}^{log}(P,B)$ is non-negative and convex as a function of $B\in \langle \mathbb{B}\rangle $. Convexity is strict, i.e., ${S}_{g}^{log}(P,\lambda {B}_{1}+(1-\lambda ){B}_{2})<\lambda {S}_{g}^{log}(P,{B}_{1})+(1-\lambda ){S}_{g}^{log}(P,{B}_{2})$ for $\lambda \in (0,1)$, unless ${B}_{1}$ and ${B}_{2}$ agree everywhere, except where $P(F)=0$.

**Proof:** The logarithmic g-score is non-negative, because $B(F),P(F)\in [0,1]$ for all F; so, $\log B(F)\le 0$, $P(F)\log B(F)\le 0$, and $g(\pi )>0$.
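The role of inclusiveness in this subsection can be seen in a small computation. The sketch below takes the logarithmic g-score to be $-{\sum}_{\pi}g(\pi ){\sum}_{F\in \pi}P(F)\log B(F)$ and the g-divergence to be the score of B minus the score of P itself; both forms are our reading of the definitions, and all function names are hypothetical. Partitions containing the empty proposition are left out because $P(\varnothing )=0$ makes their extra term vanish. The weighting `g_all` (weight one everywhere) is inclusive, while `g_states` (weight one on the partition into states only) is not.

```python
from itertools import combinations
from math import log, inf

def powerset(omega):
    omega = list(omega)
    return [frozenset(c) for r in range(len(omega) + 1) for c in combinations(omega, r)]

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        yield partition + [frozenset({first})]

def probability_function(state_probs):
    return {F: sum(state_probs[w] for w in F) for F in powerset(state_probs)}

def g_score(P, B, g, partitions):
    """g-expected logarithmic loss of B when P is the true chance function."""
    total = 0.0
    for pi in partitions:
        weight = g(pi)
        for F in pi:
            if weight > 0 and P[F] > 0:
                total += inf if B[F] == 0 else -weight * P[F] * log(B[F])
    return total

def g_divergence(P, B, g, partitions):
    return g_score(P, B, g, partitions) - g_score(P, P, g, partitions)

omega = ["w1", "w2", "w3"]
partitions = list(set_partitions(omega))
singletons = {frozenset({w}) for w in omega}
g_all = lambda pi: 1.0                                       # inclusive
g_states = lambda pi: 1.0 if set(pi) == singletons else 0.0  # not inclusive

P = probability_function({"w1": 0.5, "w2": 0.3, "w3": 0.2})
Q = probability_function({"w1": 0.2, "w2": 0.3, "w3": 0.5})

# A belief function that agrees with P except on one proposition, as in case (i).
B = dict(P)
B[frozenset({"w1", "w2"})] = 0.0
```

With the inclusive weighting, the divergence from P vanishes only at P itself; with the non-inclusive weighting, B differs from P yet has zero divergence, which is why propriety can fail there.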

#### 2.5. Minimising the Worst-Case Logarithmic g-Score

**Definition 8 (König [18], p. 56).**For $F:\mathbb{X}\times \mathbb{Y}\to [-\infty ,\infty ]$, we call $I\subset \mathbb{R}$ a border interval of F, if and only if I is an interval of the form $I=({sup}_{x\in \mathbb{X}}{inf}_{y\in \mathbb{Y}}F(x,y),+\infty ).$ $\Lambda \subset \mathbb{R}$ is called a border set of F if and only if $inf\Lambda ={sup}_{x\in \mathbb{X}}{inf}_{y\in \mathbb{Y}}F(x,y).$

**Lemma 4 (König’s Minimax).**Let $\mathbb{X},\mathbb{Y}$ be topological spaces, $\mathbb{Y}$ be compact and Hausdorff and let $F:\mathbb{X}\times \mathbb{Y}\to [-\infty ,\infty ]$ be lower semicontinuous. Then, if Λ is some border set, I some border interval of F and if at least one of the following conditions holds:

- for all $\lambda \in \Lambda $, all members of ${s}_{\lambda}$ and ${\tau}_{\lambda}$ are connected;
- for all $\lambda \in \Lambda $, all members of ${s}_{\lambda}$ are connected and, for all $\lambda \in I$, all members of ${t}_{\lambda}$ are connected;
- for all $\lambda \in \Lambda $, all members of ${\sigma}_{\lambda}$ and ${t}_{\lambda}$ are connected;
- for all $\lambda \in \Lambda $, all members of ${\sigma}_{\lambda}$ are connected and, for all $\lambda \in I$, all members of ${\tau}_{\lambda}$ are connected;

**Lemma 5.**${S}_{g}^{log}:\mathbb{E}\times \langle \mathbb{B}\rangle \to [0,\infty ]$ is lower semicontinuous.

**Proof:**It suffices to show that $\{(P,B)\in \mathbb{E}\times \langle \mathbb{B}\rangle |{S}_{g}^{log}(P,B)\le r\}$ is closed for all $r\in \mathbb{R}.$ For $r\in \mathbb{R}$ consider a sequence ${({P}_{t},{B}_{t})}_{t\in \mathbb{N}}$ with ${lim}_{t\to \infty}({P}_{t},{B}_{t})=(P,B)$, such that ${S}_{g}^{log}({P}_{t},{B}_{t})\le r$ for all t. Then:

**Proposition 5.**For all $\mathbb{E}$:

**Proof:**It suffices to verify that the conditions of Lemma 4 are satisfied.

**Theorem 2.**As usual, $\mathbb{E}$ is taken to be convex and g inclusive. We have that:

**Proof:**We shall prove the following slightly stronger equality, allowing B to range in $\langle \mathbb{B}\rangle $, instead of $\mathbb{B}$:

**Theorem 3.** Suppose ${\mathbb{P}}^{*}\subseteq \mathbb{P}$ is such that the unique g-entropy maximiser, ${P}^{\dagger}$, for $\left[\mathbb{E}\right]=\left[\langle {\mathbb{P}}^{*}\rangle \right]$, is in $\left[{\mathbb{P}}^{*}\right]$. Then:

**Proof:**As in the previous proof, we shall prove a slightly stronger equality:

## 3. Belief over Sentences

#### 3.1. Normalisation

**Definition 9 (Representation).**A sentence, $\theta \in S\mathcal{L}$, represents the proposition $F=\{\omega :\omega \vDash \theta \}$. Let $\mathcal{F}$ be a set of pairwise distinct propositions. We say that $\Theta \subseteq S\mathcal{L}$ is a set of representatives of $\mathcal{F},$ if and only if each sentence in Θ represents some proposition in $\mathcal{F}$ and each proposition in $\mathcal{F}$ is represented by a unique sentence in Θ. A set, ρ, of representatives of $\mathcal{P}\Omega $ will be called a representation. We denote by ϱ the set of all representations. For a set of pairwise distinct propositions, $\mathcal{F}$, and a representation, $\rho \in \varrho $, we denote by $\rho (\mathcal{F})\subset S\mathcal{L}$ the set of sentences in ρ that represent the propositions in $\mathcal{F}.$
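The notion of a sentence representing a proposition can be illustrated by brute-force evaluation over states. In the sketch below, sentences are modelled as Python predicates on truth assignments, which is purely an encoding choice of ours; the names `states`, `proposition`, `theta`, `phi` and `psi` are hypothetical.

```python
from itertools import product

variables = ["A1", "A2"]
# States: every truth assignment to the propositional variables of the language.
states = [dict(zip(variables, values))
          for values in product([True, False], repeat=len(variables))]

def proposition(sentence):
    """The proposition represented by a sentence: indices of the states where it holds."""
    return frozenset(i for i, state in enumerate(states) if sentence(state))

# Two syntactically different but logically equivalent sentences ...
theta = lambda v: not (v["A1"] and v["A2"])
phi = lambda v: (not v["A1"]) or (not v["A2"])
# ... and a tautology, which represents the whole of Omega.
psi = lambda v: v["A1"] or not v["A1"]
```

Logically equivalent sentences represent the same proposition, so a set of representatives must pick exactly one of them for each proposition.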

**Definition 10 (Normalised belief function on sentences).**Define the set of normalized belief functions on $S\mathcal{L}$ as:

**Proposition 6.** ${P}_{\mathcal{L}}\in {\mathbb{P}}_{\mathcal{L}}$ iff ${P}_{\mathcal{L}}:S\mathcal{L}\longrightarrow [0,1]$ satisfies the axioms of probability:

- **P1:** ${P}_{\mathcal{L}}(\tau )=1$ for all tautologies $\tau .$
- **P2:** If $\vDash \neg (\phi \wedge \psi )$ then ${P}_{\mathcal{L}}(\phi \vee \psi )={P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\psi )$.

**Proof:**Suppose ${P}_{\mathcal{L}}\in {\mathbb{P}}_{\mathcal{L}}$. For any tautology, $\tau \in S\mathcal{L}$, it holds that ${P}_{\mathcal{L}}(\tau )=1$, because $\{\tau \}$ is a partition in ${\Pi}_{\mathcal{L}}.$ ${P}_{\mathcal{L}}(\neg \tau )=0$, because $\{\tau ,\neg \tau \}$ is a partition in ${\Pi}_{\mathcal{L}}$ and ${P}_{\mathcal{L}}(\tau )=1$.

- (i) $\vDash \phi $ and $\vDash \neg \psi ,$ then $\vDash \phi \vee \psi .$ Thus, by the above ${P}_{\mathcal{L}}(\phi )=1$ and ${P}_{\mathcal{L}}(\psi )=0$, and hence, ${P}_{\mathcal{L}}(\phi \vee \psi )=1={P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\psi ).$
- (ii) $\vDash \neg \phi $ and $\vDash \neg \psi ,$ then $\vDash \neg \phi \vee \neg \psi .$ Thus, ${P}_{\mathcal{L}}(\phi \vee \psi )=0={P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\psi ).$
- (iii) $\nvDash \neg \phi ,$ $\nvDash \phi ,$ and $\vDash \neg \psi ,$ then $\{\phi \vee \psi ,\neg \phi \vee \psi \}$ and $\{\phi ,\neg \phi \vee \psi \}$ are both partitions in ${\Pi}_{\mathcal{L}}.$ Thus, ${P}_{\mathcal{L}}(\phi \vee \psi )+{P}_{\mathcal{L}}(\neg \phi \vee \psi )=1={P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\neg \phi \vee \psi ).$ Putting these observations together, we now find ${P}_{\mathcal{L}}(\phi \vee \psi )={P}_{\mathcal{L}}(\phi )={P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\psi ).$
- (iv) $\nvDash \neg \phi ,$ $\nvDash \neg \psi $ and $\vDash \phi \leftrightarrow \neg \psi ,$ then $\{\phi ,\psi \}$ is a partition and $\phi \vee \psi $ is a tautology. Hence, ${P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\psi )=1$ and ${P}_{\mathcal{L}}(\phi \vee \psi )=1$. This now yields ${P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\psi )={P}_{\mathcal{L}}(\phi \vee \psi )$.
- (v) $\nvDash \neg \phi ,$ $\nvDash \neg \psi $ and $\nvDash \phi \leftrightarrow \neg \psi ,$ then none of the following sentences is a tautology or a contradiction: $\phi ,\psi ,\phi \vee \psi ,\neg (\phi \vee \psi ).$ Since $\{\phi ,\psi ,\neg (\phi \vee \psi )\}$ and $\{\phi \vee \psi ,\neg (\phi \vee \psi )\}$ are both partitions in ${\Pi}_{\mathcal{L}}$, we obtain ${P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\psi )=1-{P}_{\mathcal{L}}(\neg (\phi \vee \psi ))={P}_{\mathcal{L}}(\phi \vee \psi )$. So, ${P}_{\mathcal{L}}(\phi )+{P}_{\mathcal{L}}(\psi )={P}_{\mathcal{L}}(\phi \vee \psi )$.

**Definition 11 (Respects logical equivalence).**We say that a belief function ${B}_{\mathcal{L}}\in \langle {\mathbb{B}}_{\mathcal{L}}\rangle $ respects logical equivalence if and only if $\vDash \phi \leftrightarrow \psi $ implies ${B}_{\mathcal{L}}(\phi )={B}_{\mathcal{L}}(\psi ).$

**Proposition 7.**The probability functions ${P}_{\mathcal{L}}\in {\mathbb{P}}_{\mathcal{L}}$ respect logical equivalence.

**Proof:**Suppose ${P}_{\mathcal{L}}\in {\mathbb{P}}_{\mathcal{L}}$ and assume that $\phi ,\psi \in S\mathcal{L}$ are logically equivalent. Note that $\psi \wedge \neg \phi \vDash {A}_{1}\wedge \neg {A}_{1},$ $\psi \vee \neg \phi \vDash {A}_{1}\vee \neg {A}_{1}$ and that $\{\phi ,\neg \phi \}$ and $\{\psi ,\neg \phi \}$ are partitions in ${\Pi}_{\mathcal{L}}.$ Hence:

#### 3.2. Loss

- **L1.** $L(\phi ,{B}_{\mathcal{L}})=0,$ if ${B}_{\mathcal{L}}(\phi )=1.$
- **L2.** $L(\phi ,{B}_{\mathcal{L}})$ strictly increases as ${B}_{\mathcal{L}}(\phi )$ decreases from one towards zero.
- **L3.** $L(\phi ,{B}_{\mathcal{L}})$ only depends on ${B}_{\mathcal{L}}(\phi )$.
- **L4.** Losses are additive when the language is composed of independent sublanguages: if $\mathcal{L}={\mathcal{L}}_{1}\cup {\mathcal{L}}_{2}$ for ${\mathcal{L}}_{1}{\perp\!\!\!\perp}_{{B}_{\mathcal{L}}}{\mathcal{L}}_{2}$, then $L({\varphi}_{1}\wedge {\varphi}_{2},{B}_{\mathcal{L}})={L}_{1}({\varphi}_{1},{B}_{\downharpoonright {\mathcal{L}}_{1}})+{L}_{2}({\varphi}_{2},{B}_{\downharpoonright {\mathcal{L}}_{2}})$, where ${L}_{1},{L}_{2}$ are loss functions defined on ${\mathcal{L}}_{1},{\mathcal{L}}_{2}$, respectively.

**Theorem 4.** If a loss function, L, on $S\mathcal{L}\times \langle {\mathbb{B}}_{\mathcal{L}}\rangle $ satisfies L1–4, then $L(\phi ,{B}_{\mathcal{L}})=-k\log {B}_{\mathcal{L}}(\phi )$, where the constant, $k>0$, does not depend on the language, $\mathcal{L}$.

**Proof:**We shall first focus on a loss function, L, defined with respect to a language, $\mathcal{L}$, that contains at least two propositional variables.

#### 3.3. Score, Entropy and Their Connection

**Definition 12 (g-score).** Given a loss function, $L,$ an inclusive weighting function, $g:\Pi \longrightarrow {\mathbb{R}}_{\ge 0}$, and a representation, $\rho \in \varrho $, we define the representation-relative g-score ${S}_{g,\rho}^{L}:{\mathbb{P}}_{\mathcal{L}}\times \langle {\mathbb{B}}_{\mathcal{L}}\rangle \longrightarrow [-\infty ,\infty ]$ by

**Lemma 6.**If ${B}_{\mathcal{L}}\in \langle {\mathbb{B}}_{\mathcal{L}}\rangle $ respects logical equivalence, then for all $\rho \in \varrho $, we have ${S}_{g,\mathcal{L}}^{log}({P}_{\mathcal{L}},{B}_{\mathcal{L}})={sup}_{\rho \in \varrho}{S}_{g,\rho}^{log}({P}_{\mathcal{L}},{B}_{\mathcal{L}})={S}_{g}^{log}(P,B).$

**Proof:**Simply note that ${S}_{g,\rho}^{log}({P}_{\mathcal{L}},{B}_{\mathcal{L}})$ does not depend on $\rho .$ ■

**Lemma 7.**For all convex ${\mathbb{E}}_{\mathcal{L}}\subseteq {\mathbb{P}}_{\mathcal{L}}$:

**Proof:**Suppose that:

**Theorem 5.**As usual, ${\mathbb{E}}_{\mathcal{L}}\subseteq {\mathbb{P}}_{\mathcal{L}}$ is taken to be convex and g inclusive. We have that:

**Proof:**As in the corresponding theorem for the proposition (Theorem 2), we shall prove a slightly stronger equality:

**Theorem 6.** If ${\mathbb{P}}_{\mathcal{L}}^{*}\subseteq {\mathbb{P}}_{\mathcal{L}}$ is such that the unique g-entropy maximiser, ${P}_{\mathcal{L}}^{\dagger}$, of $\left[{\mathbb{E}}_{\mathcal{L}}\right]=\left[\langle {\mathbb{P}}_{\mathcal{L}}^{*}\rangle \right]$, is in $\left[{\mathbb{P}}_{\mathcal{L}}^{*}\right],$ then:

**Proof:**Again, we shall prove a slightly stronger statement with ${B}_{\mathcal{L}}$ ranging in $\langle {\mathbb{B}}_{\mathcal{L}}\rangle .$

## 4. Relationship to Standard Entropy Maximisation

**Definition 13 (Symmetric weighting function).**A weighting function, g, is symmetric, if and only if whenever ${\pi}^{\prime}$ can be obtained from π by permuting the ${\omega}_{i}$ in $\pi ,$ then $g({\pi}^{\prime})=g(\pi ).$

**Definition 14 (Refined weighting function).**A weighting function, g, is refined, if and only if whenever ${\pi}^{\prime}$ refines π, then $g({\pi}^{\prime})\ge g(\pi ).$

- **E1:** $\Downarrow \mathbb{E}\ne \varnothing $. An agent is always entitled to hold some beliefs.
- **E2:** $\Downarrow \mathbb{E}\subseteq \mathbb{E}$. Sufficiently equivocal belief functions are calibrated with evidence.
- **E3:** For all $g\in \mathcal{G}$, there is some $\epsilon >{\inf}_{B\in \mathbb{B}}{\sup}_{P\in \mathbb{E}}{S}_{g}^{log}(P,B)$ such that if $R\in \mathbb{E}$ and ${\sup}_{P\in \mathbb{E}}{S}_{g}(P,R)<\epsilon $, then $R\in \Downarrow \mathbb{E}$, i.e., if R has sufficiently low worst-case g-expected loss for some appropriate g, then R is sufficiently equivocal.
- **E4:** $\Downarrow \Downarrow \mathbb{E}=\Downarrow \mathbb{E}$. Any function, from those that are calibrated with evidence, that is sufficiently equivocal, is a function, from those that are calibrated with evidence and are sufficiently equivocal, that is sufficiently equivocal.
- **E5:** If P is a limit point of $\Downarrow \mathbb{E}$ and $P\in \mathbb{E}$, then $P\in \Downarrow \mathbb{E}$.

**Theorem 7 (Justification of maxent).** If $\mathbb{E}$ contains its standard entropy maximiser, ${P}_{\Omega}^{\dagger}:=\arg {\sup}_{\mathbb{E}}{H}_{\Omega}$, then ${P}_{\Omega}^{\dagger}\in \Downarrow \mathbb{E}$.

**Proof:** We shall first see that there is a sequence of ${({g}_{t})}_{t\in \mathbb{N}}$ in $\mathcal{G}$ such that the ${g}_{t}$-entropy maximisers ${P}_{t}^{\dagger}\in \left[\mathbb{E}\right]$ converge to ${P}_{\Omega}^{\dagger}$. All respective entropy maximisers are unique, due to Corollary 2.

**Corollary 4.** For all $\epsilon >0$, there exists a $P\in \Downarrow \mathbb{E}$ such that $|P(\omega )-{P}_{\Omega}^{\dagger}(\omega )|<\epsilon $ for all $\omega \in \Omega .$

**Proof:** Consider the same sequence, ${g}_{t}$, as in the above proof. Recall that ${P}_{t}^{\dagger}$ converges to ${P}_{\Omega}^{\dagger}.$ Now, pick a t such that $|{P}_{t}^{\dagger}(\omega )-{P}_{\Omega}^{\dagger}(\omega )|<\frac{\epsilon}{2}$ for all $\omega \in \Omega .$ For this t, it holds that ${P}_{t,{\delta}_{t}}^{\ddagger}\in \Downarrow \mathbb{E}$ for small enough ${\delta}_{t}$ and that ${P}_{t,{\delta}_{t}}^{\ddagger}$ converges to ${P}_{t}^{\dagger}.$ Thus, for small enough ${\delta}_{t}$, we have $|{P}_{t}^{\dagger}(\omega )-{P}_{t,{\delta}_{t}}^{\ddagger}(\omega )|<\frac{\epsilon}{2}$ for all $\omega \in \Omega .$ Thus, $|{P}_{t,{\delta}_{t}}^{\ddagger}(\omega )-{P}_{\Omega}^{\dagger}(\omega )|<\epsilon $ for all $\omega \in \Omega .$ ■

**Definition 15 (Language invariant family of weighting functions).** Suppose we are given, as usual, a set $\mathbb{E}$ of probability functions on a fixed language $\mathcal{L}$. For any ${\mathcal{L}}^{\prime}$ extending $\mathcal{L}$, let ${\mathbb{E}}^{\prime}=\mathbb{E}\times {\mathbb{P}}_{{\mathcal{L}}^{\prime}\setminus \mathcal{L}}$ be the translation of $\mathbb{E}$ into the richer language ${\mathcal{L}}^{\prime}$. A family of weighting functions is language invariant, if for any such $\mathbb{E},\mathcal{L},$ any ${P}^{\dagger}\in \arg {\sup}_{P\in \mathbb{E}}{H}_{{g}^{\mathcal{L}}}(P)$ on $\mathcal{L}$, and for any language ${\mathcal{L}}^{\prime}$ extending $\mathcal{L}$, there is some ${P}^{\ddagger}\in \arg {\sup}_{P\in {\mathbb{E}}^{\prime}}{H}_{{g}^{{\mathcal{L}}^{\prime}}}(P)$ on ${\mathcal{L}}^{\prime}$ such that ${P}_{\downharpoonright \mathcal{L}}^{\ddagger}={P}^{\dagger}$, i.e., ${P}^{\ddagger}(\omega )={P}^{\dagger}(\omega )$ for each state ω of $\mathcal{L}$.

**Proposition 8.**The family of partition weightings, ${g}_{\Pi}$, and the family of proposition weightings, ${g}_{\mathcal{P}\Omega}$, are not language invariant.

**Proof:** Let $\mathcal{L}=\{{A}_{1},{A}_{2}\}$ and $\mathbb{E}=\{P\in \mathbb{P}\;:\;P({\omega}_{1})+2P({\omega}_{2})+3P({\omega}_{3})+4P({\omega}_{4})=1.7\}.$ The partition entropy maximiser ${P}_{\Pi}^{\dagger}$ and the proposition entropy maximiser ${P}_{\mathcal{P}\Omega}^{\dagger}$ for this language and this set $\mathbb{E}$ of calibrated functions are given in the first two rows of the table below.

**Table 1.** Partition entropy and proposition entropy maximisers on $\mathcal{L}$ and ${\mathcal{L}}^{\prime}$.

| | ${\omega}_{1}$ | ${\omega}_{2}$ | ${\omega}_{3}$ | ${\omega}_{4}$ |
|---|---|---|---|---|
| ${P}_{\Pi}^{\dagger}$ | $0.5331$ | $0.2841$ | $0.1324$ | $0.0504$ |
| ${P}_{\mathcal{P}\Omega}^{\dagger}$ | $0.5192$ | $0.3008$ | $0.1408$ | $0.0392$ |

| | ${\chi}_{1}$ | ${\chi}_{2}$ | ${\chi}_{3}$ | ${\chi}_{4}$ | ${\chi}_{5}$ | ${\chi}_{6}$ | ${\chi}_{7}$ | ${\chi}_{8}$ |
|---|---|---|---|---|---|---|---|---|
| ${P}_{\Pi}^{\ddagger}$ | $0.2649$ | $0.2649$ | $0.1441$ | $0.1441$ | $0.0671$ | $0.0671$ | $0.0239$ | $0.0239$ |
| ${P}_{\mathcal{P}\Omega}^{\ddagger}$ | $0.2510$ | $0.2510$ | $0.1594$ | $0.1594$ | $0.0783$ | $0.0783$ | $0.0113$ | $0.0113$ |
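The failure of language invariance can be read off Table 1 by restricting the maximisers on ${\mathcal{L}}^{\prime}$ back to $\mathcal{L}$. The sketch below does this arithmetic with the tabulated values. It assumes, as the column order suggests but the table does not state, that ${\chi}_{2i-1}$ and ${\chi}_{2i}$ are the two states of ${\mathcal{L}}^{\prime}$ that refine ${\omega}_{i}$; the variable names are hypothetical.

```python
# Values copied from Table 1 (four decimal places).
P_L = {
    "partition":   [0.5331, 0.2841, 0.1324, 0.0504],
    "proposition": [0.5192, 0.3008, 0.1408, 0.0392],
}
P_Lprime = {
    "partition":   [0.2649, 0.2649, 0.1441, 0.1441, 0.0671, 0.0671, 0.0239, 0.0239],
    "proposition": [0.2510, 0.2510, 0.1594, 0.1594, 0.0783, 0.0783, 0.0113, 0.0113],
}
coefficients = [1, 2, 3, 4]  # the constraint P(w1) + 2P(w2) + 3P(w3) + 4P(w4) = 1.7

def restrict(p_prime):
    """Marginalise to L, assuming chi_{2i-1} and chi_{2i} both refine w_i."""
    return [p_prime[2 * i] + p_prime[2 * i + 1] for i in range(4)]

def constraint_value(p):
    return sum(c * x for c, x in zip(coefficients, p))

# Largest pointwise gap between the maximiser on L and the restricted maximiser on L'.
gaps = {}
for name in P_L:
    restricted = restrict(P_Lprime[name])
    gaps[name] = max(abs(a - b) for a, b in zip(P_L[name], restricted))
```

Both restricted functions still satisfy the constraint up to rounding, yet they differ from the maximisers on $\mathcal{L}$ by more than rounding can explain.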

**Lemma 8.**Suppose a function f picks out a partition π for any language $\mathcal{L}$, in such a way that if ${\mathcal{L}}^{\prime}\supseteq \mathcal{L}$, then $f({\mathcal{L}}^{\prime})$ is a refinement of $f(\mathcal{L})$, with each $F\in f(\mathcal{L})$ being refined into the same number k of members ${F}_{1},\dots ,{F}_{k}\in f({\mathcal{L}}^{\prime})$, for $k\ge 1$. Suppose ${g}^{\mathcal{L}}$ is such that for any $\mathcal{L}$, ${g}^{\mathcal{L}}(f(\mathcal{L}))=c>0$, but ${g}^{\mathcal{L}}(\pi )=0$ for all other partitions π. Then, ${g}^{\mathcal{L}}$ is language invariant.

**Proof:** Let ${P}^{\dagger}$ denote a ${g}^{\mathcal{L}}$-entropy maximiser (in $\left[\mathbb{E}\right]$), and let ${P}^{\ddagger}$ denote a ${g}^{{\mathcal{L}}^{\prime}}$-entropy maximiser in $\left[\mathbb{E}\right]\times {\mathbb{P}}_{{\mathcal{L}}^{\prime}\setminus \mathcal{L}}$. Since ${g}^{\mathcal{L}}$ and ${g}^{{\mathcal{L}}^{\prime}}$ need not be inclusive, ${H}_{g,\mathcal{L}}$ and ${H}_{g,{\mathcal{L}}^{\prime}}$ need not be strictly concave. Thus, there need not be unique entropy maximisers. Given $F\subseteq \Omega $ refined into subsets ${F}_{1},\dots ,{F}_{k}$ of ${\Omega}^{\prime}$, ${F}^{\prime}\subseteq {\Omega}^{\prime}$ is defined by ${F}^{\prime}:={F}_{1}\cup \dots \cup {F}_{k}$. One can restrict ${P}^{\ddagger}$ to $\mathcal{L}$ by setting ${P}^{\ddagger}(\omega )={\sum}_{{\omega}^{\prime}\in {\Omega}^{\prime},{\omega}^{\prime}\vDash \omega}{P}^{\ddagger}({\omega}^{\prime})$ for $\omega \in \Omega $, so, in particular, ${P}^{\ddagger}(F)={P}^{\ddagger}({F}^{\prime})={P}^{\ddagger}({F}_{1})+\dots +{P}^{\ddagger}({F}_{k})$ for $F\subseteq \Omega $.

**Corollary 5.**The family of weighting functions ${g}_{\Omega}$ is language invariant.

**Example 2.**For $\mathcal{L}=\{{A}_{1},{A}_{2}\}$, there are three sublanguages: $\mathcal{L}$ itself and the two proper sublanguages, $\{{A}_{1}\},\{{A}_{2}\}.$ Then, ${g}_{\subseteq}^{\mathcal{L}}$ assigns the following three partitions of Ω the same positive weight: $\{\{{A}_{1}\wedge {A}_{2},{A}_{1}\wedge \neg {A}_{2}\},\{\neg {A}_{1}\wedge {A}_{2},\neg {A}_{1}\wedge \neg {A}_{2}\}\}$, $\{\{{A}_{1}\wedge {A}_{2},\neg {A}_{1}\wedge {A}_{2}\},\{{A}_{1}\wedge \neg {A}_{2},\neg {A}_{1}\wedge \neg {A}_{2}\}\}$, $\{\{{A}_{1}\wedge {A}_{2}\},\{{A}_{1}\wedge \neg {A}_{2}\},\{\neg {A}_{1}\wedge {A}_{2}\},\{\neg {A}_{1}\wedge \neg {A}_{2}\}\}$. ${g}_{\subseteq}^{\mathcal{L}}$ assigns all other $\pi \in \Pi $ weight zero.

**Proposition 9.**The family of substate weighting functions is language invariant.

**Proof:** Consider an extension, ${\mathcal{L}}^{\prime}=\{{A}_{1},\dots ,{A}_{n},{A}_{n+1}\}$, of $\mathcal{L}$. Let ${P}^{\dagger},{P}^{\ddagger}$ be ${g}_{\subseteq}$-entropy maximisers on $\mathcal{L},{\mathcal{L}}^{\prime}$, respectively. For simplicity of exposition, we shall view these functions as defined over sentences, so that we can talk of ${P}^{\ddagger}({A}_{n+1}\wedge {\omega}^{-})$, etc. For the purposes of the following calculation we shall consider the empty language to be a language. Entropies over the empty language vanish. Summing over the empty language ensures, for example, that the expression ${P}^{\ddagger}({A}_{n+1})\log {P}^{\ddagger}({A}_{n+1})$ appears in Equation (81).

**Example 3.** For $\mathcal{L}=\{{A}_{1},{A}_{2}\}$ and the substate weighting function, ${g}_{\subseteq}^{\mathcal{L}}$ on $\mathcal{L}$ (see Example 2), we find for $\mathbb{E}=\{P\in \mathbb{P}\;:\;P({A}_{1}\wedge {A}_{2})+2P({A}_{1}\wedge \neg {A}_{2})=0.1\}$ that the standard entropy maximiser, the partition entropy maximiser, the proposition entropy maximiser and the substate weighting entropy maximiser are pairwise different.

| | ${A}_{1}\wedge {A}_{2}$ | ${A}_{1}\wedge \neg {A}_{2}$ | $\neg {A}_{1}\wedge {A}_{2}$ | $\neg {A}_{1}\wedge \neg {A}_{2}$ |
|---|---|---|---|---|
| ${P}_{\Omega}^{\dagger}$ | $0.0752$ | $0.0124$ | $0.4562$ | $0.4562$ |
| ${P}_{\Pi}^{\dagger}$ | $0.0856$ | $0.0072$ | $0.4536$ | $0.4536$ |
| ${P}_{\mathcal{P}\Omega}^{\dagger}$ | $0.0950$ | $0.0025$ | $0.4513$ | $0.4513$ |
| ${P}_{{g}_{\subseteq}^{\mathcal{L}}}^{\dagger}$ | $0.0950$ | $0.0025$ | $0.4293$ | $0.4732$ |
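The first row of the table can be recomputed independently. The sketch below uses the textbook fact that, under a single linear constraint ${\sum}_{\omega}{d}_{\omega}P(\omega )=c$, the standard entropy maximiser has the exponential form $P(\omega )\propto \exp (-\lambda {d}_{\omega})$, and finds the multiplier λ by bisection. This is our illustration, not the authors' computation; the function names and the bracketing interval are assumptions.

```python
from math import exp

# States in the order A1&A2, A1&~A2, ~A1&A2, ~A1&~A2.
d = [1.0, 2.0, 0.0, 0.0]  # coefficients of the constraint sum_w d_w P(w) = 0.1
target = 0.1

def gibbs(lam):
    """Exponential-family candidate with P(w) proportional to exp(-lam * d_w)."""
    weights = [exp(-lam * dw) for dw in d]
    Z = sum(weights)
    return [w / Z for w in weights]

def constraint_value(p):
    return sum(dw * pw for dw, pw in zip(d, p))

# The constraint value falls from 0.75 at lam = 0 towards 0 as lam grows,
# so bisection on lam locates the multiplier that meets the target.
lo, hi = 0.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if constraint_value(gibbs(mid)) > target:
        lo = mid
    else:
        hi = mid
maxent = gibbs((lo + hi) / 2)
```

Under these assumptions `maxent` can be compared entry by entry with the ${P}_{\Omega}^{\dagger}$ row, with which it agrees to the four decimal places shown.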

## 5. Discussion

#### 5.1. Summary

#### 5.2. Conditionalisation, Conditional Probabilities and Bayes’ Theorem

**Theorem 8.** Suppose that $\mathbb{E}$ is the set of probability functions calibrated with evidence E, and that $\mathbb{E}$ can be written as the set of probability functions which satisfy finitely many constraints of the form, ${c}_{i}={\sum}_{\omega \in \Omega}{d}_{i,\omega}P(\omega ).$ Suppose ${\mathbb{E}}^{\prime}$ is the set of probability functions calibrated with evidence $E\cup \{G\}$, and that ${P}_{E}^{\dagger},{P}_{E\cup \{G\}}^{\dagger}$ are functions in $\mathbb{E},{\mathbb{E}}^{\prime}$, respectively, that maximise standard entropy. If:

- (i) $G\subseteq \Omega $,
- (ii) the only constraints imposed by $E\cup \{G\}$ are the constraints ${c}_{i}={\sum}_{\omega \in \Omega}{d}_{i,\omega}P(\omega )$ imposed by E together with the constraint $P(G)=1$,
- (iii) the constraints in (ii) are consistent, and
- (iv) ${P}_{E}^{\dagger}(\cdot |G)\in \mathbb{E}$,

then ${P}_{E\cup \{G\}}^{\dagger}(F)={P}_{E}^{\dagger}(F|G)$ for all $F\subseteq \Omega $.
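A small exhaustive search shows both the agreement asserted by the theorem and the work done by condition (iv). The sketch below is a toy illustration under stated assumptions: there are four states, probabilities are restricted to multiples of 1/12 (a grid chosen so that the true maximisers lie on it), G consists of the first two states, and the two evidence sets are hypothetical examples. All function names are ours.

```python
from fractions import Fraction
from math import log

N = 12  # grid resolution: every probability is a multiple of 1/12

def grid(n_states=4):
    """All probability functions on n_states states with values in {0, 1/N, ..., 1}."""
    def rec(remaining, slots):
        if slots == 1:
            yield (remaining,)
            return
        for k in range(remaining + 1):
            for tail in rec(remaining - k, slots - 1):
                yield (k,) + tail
    for counts in rec(N, n_states):
        yield tuple(Fraction(c, N) for c in counts)

def entropy(p):
    """Standard entropy over states, with 0 log 0 read as 0."""
    return -sum(float(x) * log(float(x)) for x in p if x > 0)

def maxent(calibrated):
    """Entropy maximiser among the grid points satisfying the predicate `calibrated`."""
    return max((p for p in grid() if calibrated(p)), key=entropy)

def conditional(p, G):
    mass = sum(p[i] for i in G)
    return tuple(p[i] / mass if i in G else Fraction(0) for i in range(len(p)))

G = {0, 1}

def check(in_E):
    """Return (does condition (iv) hold?, does conditionalisation agree with maxent?)."""
    in_E_prime = lambda p: in_E(p) and p[0] + p[1] == 1  # E together with P(G) = 1
    p_E, p_EG = maxent(in_E), maxent(in_E_prime)
    p_cond = conditional(p_E, G)
    return in_E(p_cond), p_cond == p_EG

no_constraint = check(lambda p: True)                   # E imposes nothing
with_constraint = check(lambda p: p[0] == Fraction(1, 2))  # E: P(w0) = 1/2
```

With no constraint, condition (iv) holds and the two updates coincide. With the constraint $P({\omega}_{0})=1/2$, conditionalising the maximiser breaks the constraint, so (iv) fails and the maximiser on ${\mathbb{E}}^{\prime}$ differs from the conditional probability function.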

**Theorem 9.** Suppose that convex and closed $\mathbb{E}$ is the set of probability functions calibrated with evidence E, and ${\mathbb{E}}^{\prime}$ is the set of probability functions calibrated with evidence $E\cup \{G\}$. Furthermore, suppose that ${P}_{E}^{\dagger},{P}_{E\cup \{G\}}^{\dagger}$ are functions in $\mathbb{E},{\mathbb{E}}^{\prime}$, respectively, that maximise g-entropy for some fixed $g\in {\mathcal{G}}_{\mathrm{inc}}\cup \{{g}_{\Omega}\}.$ If:

- (i) $G\subseteq \Omega $,
- (ii) the only constraints imposed by $E\cup \{G\}$ are the constraints imposed by E together with the constraint $P(G)=1$,
- (iii) the constraints in (ii) are consistent, and
- (iv) ${P}_{E}^{\dagger}(\cdot |G)\in \mathbb{E}$,

then ${P}_{E\cup \{G\}}^{\dagger}(F)={P}_{E}^{\dagger}(F|G)$ for all $F\subseteq \Omega $.

#### 5.3. Imprecise Probability

(We are very grateful to an anonymous referee for pointing out that Smets and Kennes adopt this sort of position, and to Hykel Hosni for alerting us to this view of Keynes.)

the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty years hence, or the obsolescence of a new invention, or the position of private wealth-owners in the social system in 1970. About these matters there is no scientific basis on which to form any calculable probability whatever. We simply do not know. Nevertheless, the necessity for action and for decision compels us as practical men to do our best to overlook this awkward fact and to behave exactly as we should if we had behind us a good Benthamite calculation of a series of prospective advantages and disadvantages, each multiplied by its appropriate probability, waiting to be summed. (p. 214 in [26])

#### 5.4. A Non-Pragmatic Justification

#### 5.5. Questions for Further Research

## Acknowledgements

## Conflicts of Interest

## Appendix

## A. Entropy of Belief Functions

**Proposition 10 (First characterisation).** Let $H(B)={\sum}_{\pi \in \Pi}g(\pi )f(\pi ,B)$, where $f(\pi ,B):=h(B({F}_{1}),...,B({F}_{k}))$ for $\pi =\{{F}_{1},...,{F}_{k}\}$ and:

- **H1:** h is continuous;
- **H2:** if $1\le {t}_{1}<{t}_{2}\in \mathbb{N}$ then $h(\frac{1}{{t}_{1}}@{t}_{1})<h(\frac{1}{{t}_{2}}@{t}_{2})$;
- **H3:** if $0<|\overrightarrow{x}{|}_{1}\le 1$ and if $|\overrightarrow{{y}_{i}}{|}_{1}=1$ for $1\le i\le k,$ then $$h({x}_{1}\cdot \overrightarrow{{y}_{1}},\cdots ,{x}_{k}\cdot \overrightarrow{{y}_{k}})=h({x}_{1},\cdots ,{x}_{k})+\sum _{i=1}^{k}{x}_{i}h(\overrightarrow{{y}_{i}});$$
- **H4:** $qh(\frac{1}{t})=h(\frac{1}{t}@q)$ for $1\le q\le t\in \mathbb{N}$;

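**Proof:** We first apply the proof of Paris [21] (pp. 77–78), which implies (using only H1, H2 and H3) that:

**A**: If $|\overrightarrow{x}{|}_{1}=1$ and if $|\overrightarrow{{y}_{i}}{|}_{1}=1$ for $1\le i\le k,$ then $$h({x}_{1}\cdot \overrightarrow{{y}_{1}},\cdots ,{x}_{k}\cdot \overrightarrow{{y}_{k}})=h({x}_{1},\cdots ,{x}_{k})+\sum _{i=1}^{k}{x}_{i}h(\overrightarrow{{y}_{i}}).$$

**B**: If $0<x<1$ and if $|\overrightarrow{y}{|}_{1}=1,$ then $$h(x\cdot \overrightarrow{y})=h(x)+xh(\overrightarrow{y}).$$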

**Proposition 11 (Second characterisation).** Let $H(B)={\sum}_{\pi \in \Pi}g(\pi )f(\pi ,B)$, where $f(\pi ,B):=h(B({F}_{1}),...,B({F}_{k}))$ for $\pi =\{{F}_{1},...,{F}_{k}\}$ and:

- **H1:** h is continuous;
- **H2:** if $1\le {t}_{1}<{t}_{2}\in \mathbb{N}$ then $h(\frac{1}{{t}_{1}}@{t}_{1})<h(\frac{1}{{t}_{2}}@{t}_{2})$;
- **A:** if $|\overrightarrow{x}{|}_{1}=1$ and if $|\overrightarrow{{y}_{i}}{|}_{1}=1$ for $1\le i\le k,$ then $$h({x}_{1}\cdot \overrightarrow{{y}_{1}},\cdots ,{x}_{k}\cdot \overrightarrow{{y}_{k}})=h({x}_{1},\cdots ,{x}_{k})+\sum _{i=1}^{k}{x}_{i}h(\overrightarrow{{y}_{i}});$$
- **B:** if $0<x<1$ and if $|\overrightarrow{y}{|}_{1}=1,$ then $$h(x\cdot \overrightarrow{y})=h(x)+xh(\overrightarrow{y});$$
- **C:** for $0<x,y<1,$ it holds that $h(x\cdot y)=xh(y)+yh(x)$;
- **D:** for $0<x<1,$ it holds that $h(x)=h(x,1-x)-h(1-x)$;
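A quick numerical sanity check may help in reading these conditions. The sketch below assumes the candidate $h({x}_{1},\dots ,{x}_{k})=-{\sum}_{i}{x}_{i}\log {x}_{i}$ and reads $\frac{1}{t}@t$ as the argument $\frac{1}{t}$ repeated t times; both are assumptions of the example, as are the particular test vectors. It checks H2, H4 and A to D at a few points only, so it illustrates the conditions rather than proving anything.

```python
import math

def h(*xs):
    """Candidate function: h(x_1, ..., x_k) = -sum_i x_i log x_i (0 log 0 := 0)."""
    return -sum(x * math.log(x) for x in xs if x > 0)

# Condition A: x sums to 1 and every y_i sums to 1.
x = [0.2, 0.3, 0.5]
ys = [[0.5, 0.5], [0.1, 0.9], [0.25, 0.25, 0.5]]
lhs_A = h(*[xi * yij for xi, yi in zip(x, ys) for yij in yi])
rhs_A = h(*x) + sum(xi * h(*yi) for xi, yi in zip(x, ys))

# Condition B: a single 0 < a < 1 and a vector y summing to 1.
a, y = 0.3, [0.6, 0.4]
lhs_B = h(*[a * yj for yj in y])
rhs_B = h(a) + a * h(*y)

# Condition C: h(a * b) = a h(b) + b h(a).
b = 0.7
lhs_C, rhs_C = h(a * b), a * h(b) + b * h(a)

# Condition D: h(a) = h(a, 1 - a) - h(1 - a).
lhs_D, rhs_D = h(a), h(a, 1 - a) - h(1 - a)

# H2: h on the uniform vector of length t is strictly increasing in t.
uniform_values = [h(*([1.0 / t] * t)) for t in range(1, 8)]

# H4: q * h(1/t) = h(1/t repeated q times), for 1 <= q <= t.
q, t = 3, 5
lhs_H4, rhs_H4 = q * h(1.0 / t), h(*([1.0 / t] * q))

print(lhs_A, rhs_A, uniform_values)
```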

**Proof:**We shall again invoke the proof in Paris [21] (pp. 77–78) to show (using only H1, H2 and A) that:

**H5**: if $0<x<1$ and if $|\overrightarrow{y}{|}_{1}\le 1,$ then

## B. Properties of g-Entropy Maximisation

#### B.1. Preserving the Equivocator

**Definition 16 (Equivocator-preserving).** A weighting function g is called equivocator-preserving if and only if $\arg {\sup}_{P\in \mathbb{P}}{H}_{g}(P)={P}_{=}.$

**Lemma 9.**For inclusive $g,$ g is equivocator-preserving if and only if:

**Proof:** Recall from Proposition 2 that g-entropy is strictly concave on $\mathbb{P}.$ Thus, every critical point in the interior of $\mathbb{P}$ is the unique maximiser of ${H}_{g}(\cdot )$ on $\mathbb{P}.$

**Corollary 6.**If g is symmetric and inclusive, then it is equivocator-preserving.

**Proof:**By Lemma 9, we only need to show that:

#### B.2. Updating

**Proposition 12.** Suppose that $\mathbb{E}$ is the set of probability functions calibrated with evidence E. Let g be inclusive and $G\subseteq \Omega $ such that ${\mathbb{E}}^{\prime}=\{P\in \mathbb{E}\;:\;P(G)=1\}\ne \varnothing $, where ${\mathbb{E}}^{\prime}$ is the set of probability functions calibrated with evidence $E\cup \{G\}$. Then, the following are equivalent:

- ${P}_{E}^{\dagger}(\cdot |G)\in \left[\mathbb{E}\right]$
- ${P}_{E\cup \{G\}}^{\dagger}(\cdot )={P}_{E}^{\dagger}(\cdot |G)$,

**Proof:** First, suppose that ${P}_{E}^{\dagger}(\cdot |G)\in \left[\mathbb{E}\right].$

**Theorem 9.** Suppose that convex and closed $\mathbb{E}$ is the set of probability functions calibrated with evidence E, and ${\mathbb{E}}^{\prime}$ is the set of probability functions calibrated with evidence $E\cup \{G\}$. Furthermore, suppose that ${P}_{E}^{\dagger},{P}_{E\cup \{G\}}^{\dagger}$ are functions in $\mathbb{E},{\mathbb{E}}^{\prime}$, respectively, that maximise g-entropy for some fixed $g\in {\mathcal{G}}_{\mathrm{inc}}\cup \{{g}_{\Omega}\}.$ If:

- (i) $G\subseteq \Omega $,
- (ii) the only constraints imposed by $E\cup \{G\}$ are the constraints imposed by E together with the constraint $P(G)=1$,
- (iii) the constraints in (ii) are consistent, and
- (iv) ${P}_{E}^{\dagger}(\cdot |G)\in \mathbb{E}$,

then ${P}_{E\cup \{G\}}^{\dagger}(F)={P}_{E}^{\dagger}(F|G)$ for all $F\subseteq \Omega $.

**Proof:** For $g\in {\mathcal{G}}_{\mathrm{inc}}$, this follows directly from Proposition 12. Simply note that $\mathbb{E}=\left[\mathbb{E}\right]$ and, thus, ${P}_{E}^{\dagger}(\cdot |G)\in \left[\mathbb{E}\right]$.

#### B.3. Paris-Vencovská Properties

**Definition 17 (1: Equivalence).** ${P}^{\dagger}$ only depends on $\mathbb{E}$ and not on the constraints that give rise to $\mathbb{E}.$

**Definition 18 (2: Renaming).** Let $per$ be an element of the permutation group on $\{1,\cdots ,|\Omega |\}.$ For a proposition $F\subseteq \Omega $ with $F=\{{\omega}_{{i}_{1}},\dots ,{\omega}_{{i}_{k}}\}$, define $per(F)=\{{\omega}_{per({i}_{1})},\dots ,{\omega}_{per({i}_{k})}\}.$ Next, let $per(B(F))=B(per(F))$ and $per(\mathbb{E})=\{per(P)\;:\;P\in \mathbb{E}\}.$ Then, g satisfies renaming if and only if ${P}_{\mathbb{E}}^{\dagger}(F)={P}_{per(\mathbb{E})}^{\dagger}(per(F)).$

**Proposition 13.**If g is inclusive and symmetric, then g satisfies renaming.

**Proof:**For $\pi \in \Pi $ with $\pi =\{{F}_{{i}_{1}},\cdots ,{F}_{{i}_{f}}\}$, define $per(\pi )=\{per({F}_{{i}_{1}}),\cdots ,per({F}_{{i}_{f}})\}.$ Using that g is symmetric for the second equality, we find:

**Definition 19 (Symmetric complement).**For $P\in \mathbb{P}$, define the symmetric complement of P with respect to ${A}_{i},$ denoted by ${\sigma}_{i}(P),$ as follows:

**Corollary 7.**For all symmetric and inclusive g and all $\mathbb{E}$ that are symmetric with respect to ${A}_{i}$, it holds that:

**Proof:** Since g is symmetric and inclusive, there is some function $\gamma :\mathbb{N}\to {\mathbb{R}}_{>0}$, such that ${H}_{g}(P)={\sum}_{F\subseteq \Omega}-\gamma (|F|)P(F)\log P(F)$ for all $P\in \mathbb{P}.$ Hence:

**Definition 20 (3: Irrelevance).** Let ${\mathbb{P}}_{1},{\mathbb{P}}_{2}$ be the sets of probability functions on disjoint ${\mathcal{L}}_{1},{\mathcal{L}}_{2}$, respectively. Then irrelevance holds if, for ${\mathbb{E}}_{1}\subseteq {\mathbb{P}}_{1}$ and ${\mathbb{E}}_{2}\subseteq {\mathbb{P}}_{2}$, we have that ${P}_{{\mathbb{E}}_{1}}^{\dagger}(F\times {\Omega}_{2})={P}_{{\mathbb{E}}_{1}\times {\mathbb{E}}_{2}}^{\dagger}(F\times {\Omega}_{2})$ for all propositions F of ${\mathcal{L}}_{1},$ where ${P}_{{\mathbb{E}}_{1}}^{\dagger}$ and ${P}_{{\mathbb{E}}_{1}\times {\mathbb{E}}_{2}}^{\dagger}$ are the g-entropy maximisers on ${\mathcal{L}}_{1}\cup {\mathcal{L}}_{2}$ with respect to ${\mathbb{E}}_{1}\times {\mathbb{P}}_{2}$ and ${\mathbb{E}}_{1}\times {\mathbb{E}}_{2}$, respectively.

**Proposition 14.** Neither the partition nor the proposition weighting satisfies irrelevance.

**Proof:**Let ${\mathcal{L}}_{1}=\{{A}_{1},{A}_{2}\},$ ${\mathcal{L}}_{2}=\{{A}_{3}\},$ ${\mathbb{E}}_{1}=\{P\in {\mathbb{P}}_{1}\phantom{\rule{0.277778em}{0ex}}:\phantom{\rule{0.277778em}{0ex}}P({A}_{1}\wedge {A}_{2})+2P(\neg {A}_{1}\wedge \neg {A}_{2})=0.2\}$ and ${\mathbb{E}}_{2}=\{P\in {\mathbb{P}}_{2}\phantom{\rule{0.277778em}{0ex}}:\phantom{\rule{0.277778em}{0ex}}P({A}_{3})=0.1\}.$ Then, with ${\omega}_{1}=\neg {A}_{1}\wedge \neg {A}_{2}\wedge \neg {A}_{3},$ ${\omega}_{2}=\neg {A}_{1}\wedge \neg {A}_{2}\wedge {A}_{3}$ and so on, we find:

| | ${\omega}_{1}$ | ${\omega}_{2}$ | ${\omega}_{3}$ | ${\omega}_{4}$ | ${\omega}_{5}$ | ${\omega}_{6}$ | ${\omega}_{7}$ | ${\omega}_{8}$ |
|---|---|---|---|---|---|---|---|---|
| ${P}_{\Pi ,{\mathbb{E}}_{1}}^{\dagger}$ | $0.0142$ | $0.0142$ | $0.2071$ | $0.2071$ | $0.2071$ | $0.2071$ | $0.0715$ | $0.0715$ |
| ${P}_{\Pi ,{\mathbb{E}}_{1}\times {\mathbb{E}}_{2}}^{\dagger}$ | $0.0312$ | $0.0004$ | $0.3692$ | $0.0466$ | $0.3692$ | $0.0466$ | $0.1304$ | $0.0064$ |
| ${P}_{\mathcal{P}\Omega ,{\mathbb{E}}_{1}}^{\dagger}$ | $0.0050$ | $0.0050$ | $0.2025$ | $0.2025$ | $0.2025$ | $0.2025$ | $0.0901$ | $0.0901$ |
| ${P}_{\mathcal{P}\Omega ,{\mathbb{E}}_{1}\times {\mathbb{E}}_{2}}^{\dagger}$ | $0.0211$ | $6.2\times {10}^{-9}$ | $0.3606$ | $0.0500$ | $0.3606$ | $0.0500$ | $0.1577$ | $2.3\times {10}^{-6}$ |
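The failure of irrelevance can be read off the table by marginalising each row onto ${\mathcal{L}}_{1}$. The sketch below does this with the rounded table entries, assuming that "and so on" means the states are ordered by binary counting, so that consecutive pairs $({\omega}_{1},{\omega}_{2}),({\omega}_{3},{\omega}_{4}),\dots $ differ only in ${A}_{3}$; that ordering, the dictionary keys and the helper name `marginal_on_L1` are assumptions of the example.

```python
# Rounded values copied from the table above (states omega_1 .. omega_8).
rows = {
    "partition, E1":      [0.0142, 0.0142, 0.2071, 0.2071, 0.2071, 0.2071, 0.0715, 0.0715],
    "partition, E1xE2":   [0.0312, 0.0004, 0.3692, 0.0466, 0.3692, 0.0466, 0.1304, 0.0064],
    "proposition, E1":    [0.0050, 0.0050, 0.2025, 0.2025, 0.2025, 0.2025, 0.0901, 0.0901],
    "proposition, E1xE2": [0.0211, 6.2e-9, 0.3606, 0.0500, 0.3606, 0.0500, 0.1577, 2.3e-6],
}

def marginal_on_L1(p):
    """Sum out A_3: add each pair of states that agree on A_1 and A_2."""
    return [p[i] + p[i + 1] for i in range(0, 8, 2)]

marginals = {name: marginal_on_L1(p) for name, p in rows.items()}

for name, m in marginals.items():
    print(name, [round(v, 4) for v in m])
```

Irrelevance would require the two partition rows to share the same marginal on ${\mathcal{L}}_{1}$, and likewise the two proposition rows. On the rounded values, the probability of $\neg {A}_{1}\wedge \neg {A}_{2}$ moves from about 0.0284 to 0.0316 under the partition weighting and from about 0.0100 to 0.0211 under the proposition weighting.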

**Definition 21 (4: Relativisation).** Let $\varnothing \subset F\subset \Omega $, $\mathbb{E}=\{P\in \mathbb{P}\,:\,P(F)=z\}\cap {\mathbb{E}}_{1}\cap {\mathbb{E}}_{2}$ and ${\mathbb{E}}^{\prime}=\{P\in \mathbb{P}\,:\,P(F)=z\}\cap {\mathbb{E}}_{1}\cap {\mathbb{E}}_{2}^{\prime}$, where ${\mathbb{E}}_{1}$ is determined by a set of constraints on the $P(G)$ with $G\subseteq F$, and the ${\mathbb{E}}_{2},{\mathbb{E}}_{2}^{\prime}$ are determined by a set of constraints on the $P(G)$ with $G\subseteq \overline{F}.$ Then, ${P}_{\mathbb{E}}^{\dagger}(G)={P}_{{\mathbb{E}}^{\prime}}^{\dagger}(G)$ for all $G\subseteq F.$

**Proposition 15.** Neither the partition nor the proposition weighting satisfies relativisation.

**Proof:** Let $|\Omega |=8,$ $F=\{{\omega}_{1},{\omega}_{2},{\omega}_{3},{\omega}_{4},{\omega}_{5}\},$ $P(F)=0.5$, and put ${\mathbb{E}}_{1}=\{P\in \mathbb{P}\,:\,P({\omega}_{1})+2P({\omega}_{2})+3P({\omega}_{3})+4P({\omega}_{4})=0.2\},$ ${\mathbb{E}}_{2}=\mathbb{P},$ ${\mathbb{E}}_{2}^{\prime}=\{P\in \mathbb{P}\,:\,P({\omega}_{6})+2P({\omega}_{7})+3P({\omega}_{8})=0.7\}.$ Then, ${P}_{\Pi ,\mathbb{E}}^{\dagger}$ and ${P}_{\Pi ,{\mathbb{E}}^{\prime}}^{\dagger}$ differ substantially on three out of five $\omega \in F$, as do ${P}_{\mathcal{P}\Omega ,\mathbb{E}}^{\dagger}$ and ${P}_{\mathcal{P}\Omega ,{\mathbb{E}}^{\prime}}^{\dagger}$, as can be seen from the following table:

| | ${\omega}_{1}$ | ${\omega}_{2}$ | ${\omega}_{3}$ | ${\omega}_{4}$ | ${\omega}_{5}$ | ${\omega}_{6}$ | ${\omega}_{7}$ | ${\omega}_{8}$ |
|---|---|---|---|---|---|---|---|---|
| ${P}_{\Pi ,\mathbb{E}}^{\dagger}$ | $0.1251$ | $0.0308$ | $0.0041$ | $0.0003$ | $0.3398$ | $0.1667$ | $0.1667$ | $0.1667$ |
| ${P}_{\Pi ,{\mathbb{E}}^{\prime}}^{\dagger}$ | $0.1242$ | $0.0312$ | $0.0041$ | $0.0003$ | $0.3402$ | $0.3356$ | $0.1288$ | $0.0356$ |
| ${P}_{\mathcal{P}\Omega ,\mathbb{E}}^{\dagger}$ | $0.1523$ | $0.0239$ | $5.5\times {10}^{-7}$ | $6.8\times {10}^{-9}$ | $0.3239$ | $0.1667$ | $0.1667$ | $0.1667$ |
| ${P}_{\mathcal{P}\Omega ,{\mathbb{E}}^{\prime}}^{\dagger}$ | $0.1495$ | $0.0252$ | $7.0\times {10}^{-7}$ | $7.6\times {10}^{-9}$ | $0.3252$ | $0.3252$ | $0.1495$ | $0.0252$ |

**Definition 22 (5: Obstinacy).** If ${\mathbb{E}}_{1}$ is a subset of $\mathbb{E}$ such that ${P}_{\mathbb{E}}^{\dagger}\in \left[{\mathbb{E}}_{1}\right],$ then ${P}_{\mathbb{E}}^{\dagger}={P}_{{\mathbb{E}}_{1}}^{\dagger}.$

**Proposition 16.** If g is inclusive, then it satisfies the obstinacy principle.

**Proof:** This follows directly from the definition of ${P}_{\mathbb{E}}^{\dagger}$. ■

**Definition 23 (6: Independence).** If $\mathbb{E}=\{P\in \mathbb{P}\;|\;P({A}_{1}\wedge {A}_{3})=\alpha ,\;P({A}_{2}\wedge {A}_{3})=\beta ,\;P({A}_{3})=\gamma \},$ then for $\gamma >0$, it holds that ${P}^{\dagger}({A}_{1}\wedge {A}_{2}\wedge {A}_{3})=\frac{\alpha \beta}{\gamma}.$
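One way to read this requirement is that ${A}_{1}$ and ${A}_{2}$ are treated as probabilistically independent conditional on ${A}_{3}$, since $\frac{\alpha}{\gamma}\cdot \frac{\beta}{\gamma}\cdot \gamma =\frac{\alpha \beta}{\gamma}$. As a small arithmetic illustration, take the values $\alpha =0.2$, $\beta =0.35$, $\gamma =0.6$ used in the proof of Proposition 17. Independence would then require:

$${P}^{\dagger}({A}_{1}\wedge {A}_{2}\wedge {A}_{3})=\frac{0.2\times 0.35}{0.6}=\frac{0.07}{0.6}\approx 0.1167.$$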

**Proposition 17.** Neither the partition entropy nor the proposition weighting satisfies independence.

**Proof:**Let $\mathcal{L}=\{{A}_{1},{A}_{2},{A}_{3}\},$ $\alpha =0.2,\phantom{\rule{0.166667em}{0ex}}\beta =0.35,\phantom{\rule{0.166667em}{0ex}}\gamma =0.6,$ then:

**Definition 24 (7: Open-mindedness).** A weighting function g is open-minded if and only if for all $\mathbb{E}$ and all $\varnothing \subseteq F\subseteq \Omega $, it holds that ${P}^{\dagger}(F)=0$ if and only if $P(F)=0$ for all $P\in \mathbb{E}.$

**Proposition 18.**Any inclusive g is open-minded.

**Proof:**First, observe that $P(F)=0$ for all $P\in \mathbb{E},$ if and only if $P(F)=0$ for all $P\in \left[\mathbb{E}\right].$

**Definition 25 (8: Continuity).**Let us recall the definition of the Blaschke metric, Δ, between two convex sets, $\mathbb{E},{\mathbb{E}}_{1}\subseteq \mathbb{P}$:

**Proposition 19.**Any inclusive g satisfies the continuity property.

**Proof:**Since the g-entropy is strictly concave (see Proposition 2), we may apply Theorem 7.5 on p. 91 in [21]. Thus, if $\mathbb{E}$ is determined by finitely many linear constraints, then g satisfies continuity. Paris [21] credits I. Maung for the proof of the theorem.

#### B.4. The Topology of g-Entropy

**Proposition 20 (g-entropy is maximised at the boundary).** For inclusive and symmetric g, $\overline{{P}_{=}{P}^{\dagger}}\cap \left[\mathbb{E}\right]=\{{P}^{\dagger}\}.$

**Proof:** If ${P}_{=}\in \left[\mathbb{E}\right],$ then ${P}^{\dagger}={P}_{=},$ by Corollary 6.

**Proposition 21 (Continuity of g-entropy maximisation).**For all $\mathbb{E},$ the function:

**Proof:** Consider a sequence, ${({g}_{t})}_{t\in \mathbb{N}}\subseteq \mathcal{G}$, converging to some $g\in \mathcal{G}.$ We need to show that ${P}_{{g}_{t}}^{\dagger}$ converges to ${P}_{g}^{\dagger}.$

**Proposition 22.** For any $\mathbb{E}$, if $\mathcal{G}\subseteq {\mathcal{G}}_{\mathrm{inc}}$ is path-connected, then the set $\{{P}_{g}^{\dagger}\;:\;g\in \mathcal{G}\}$ is path-connected.

**Proof:** By Proposition 21, the map $\arg {\sup}_{P\in \mathbb{E}}{H}_{(\cdot )}(P)$ is continuous. The image of a path-connected set under a continuous map is path-connected. ■

**Corollary 8.** For all $\mathbb{E}$, the sets $\{{P}_{g}^{\dagger}\;:\;g\in {\mathcal{G}}_{\mathrm{inc}}\}$ and $\{{P}_{g}^{\dagger}\;:\;g\in {\mathcal{G}}_{0}\}$ are path-connected.

**Proof:** ${\mathcal{G}}_{\mathrm{inc}}$ and ${\mathcal{G}}_{0}$ are convex; thus, they are path-connected. Now, apply Proposition 22. ■

**Proposition 23.** For a convex combination of weighting functions, $g=\lambda {g}_{1}+(1-\lambda ){g}_{2}$, in general, it fails to hold that ${P}_{g}^{\dagger}=\lambda {P}_{{g}_{1}}^{\dagger}+(1-\lambda ){P}_{{g}_{2}}^{\dagger}.$ Moreover, in general, ${P}_{g}^{\dagger}\notin \overline{{P}_{{g}_{1}}^{\dagger}\;{P}_{{g}_{2}}^{\dagger}}.$

**Proof:** Let ${g}_{1}={g}_{\Pi},\;{g}_{2}={g}_{\mathcal{P}\Omega}$ and $\lambda =0.3.$ Then, for a language $\mathcal{L}$ with two propositional variables and $\mathbb{E}=\{P\in \mathbb{P}\;:\;P({\omega}_{1})+2P({\omega}_{2})+3P({\omega}_{3})+4P({\omega}_{4})=1.7\}$, we can see from the following table that ${P}_{0.3{g}_{\Pi}+0.7{g}_{\mathcal{P}\Omega}}^{\dagger}\ne 0.3{P}_{\Pi}^{\dagger}+0.7{P}_{\mathcal{P}\Omega}^{\dagger}.$

| | ${\omega}_{1}$ | ${\omega}_{2}$ | ${\omega}_{3}$ | ${\omega}_{4}$ |
|---|---|---|---|---|
| ${P}_{\Pi}^{\dagger}$ | $0.5331$ | $0.2841$ | $0.1324$ | $0.0504$ |
| ${P}_{\mathcal{P}\Omega}^{\dagger}$ | $0.5192$ | $0.3008$ | $0.1408$ | $0.0392$ |
| $0.3{P}_{\Pi}^{\dagger}+0.7{P}_{\mathcal{P}\Omega}^{\dagger}$ | $0.5234$ | $0.2958$ | $0.1383$ | $0.0426$ |
| ${P}_{0.3{g}_{\Pi}+0.7{g}_{\mathcal{P}\Omega}}^{\dagger}$ | $0.5272$ | $0.2915$ | $0.1353$ | $0.0459$ |
| $\frac{{P}_{\mathcal{P}\Omega}^{\dagger}-{P}_{0.3{g}_{\Pi}+0.7{g}_{\mathcal{P}\Omega}}^{\dagger}}{{P}_{\mathcal{P}\Omega}^{\dagger}-{P}_{\Pi}^{\dagger}}$ | $0.5755$ | $0.5569$ | $0.6429$ | $0.6036$ |
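The arithmetic behind the last three rows can be rechecked from the first two. The sketch below works only with the four-decimal values printed in the table, so the recomputed ratios need not match the final row exactly (the printed ratios were presumably computed from unrounded maximisers); the variable names are illustrative.

```python
# Rounded values copied from the table above (states omega_1 .. omega_4).
p_partition   = [0.5331, 0.2841, 0.1324, 0.0504]   # maximiser for g_Pi
p_proposition = [0.5192, 0.3008, 0.1408, 0.0392]   # maximiser for g_{P Omega}
p_mixed_g     = [0.5272, 0.2915, 0.1353, 0.0459]   # maximiser for 0.3 g_Pi + 0.7 g_{P Omega}
lam = 0.3

# Convex combination of the two maximisers (third row of the table).
combination = [lam * a + (1 - lam) * b for a, b in zip(p_partition, p_proposition)]

# Position of the mixed-weighting maximiser relative to the two endpoints, per state.
# If it lay on the line segment between them, all four ratios would coincide.
ratios = [(b - m) / (b - a) for a, b, m in zip(p_partition, p_proposition, p_mixed_g)]

print([round(c, 4) for c in combination])
print([round(r, 4) for r in ratios])
```

The combination differs from the mixed-weighting maximiser in every coordinate, and the ratios are neither equal to 0.3 nor equal to each other, which is what both claims of Proposition 23 require.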

## C. Level of Generalisation

**Definition 26 (γ-entropy).** Given a function $\gamma :\mathcal{P}\Omega \longrightarrow {\mathbb{R}}_{\ge 0}$, the γ-entropy of a normalised belief function is defined as:

**Definition 27 (γ-score).** Given a loss function, L, and a function $\gamma :\mathcal{P}\Omega \longrightarrow {\mathbb{R}}_{\ge 0}$, the γ-expected loss function or γ-scoring rule, or simply γ-score, is ${S}_{\gamma}^{L}:\mathbb{P}\times \langle \mathbb{B}\rangle \longrightarrow [-\infty ,\infty ]$ such that ${S}_{\gamma}^{L}(P,B)={\sum}_{F\subseteq \Omega}\gamma (F)P(F)L(F,B)$.
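The γ-score is straightforward to compute by enumerating propositions. The sketch below assumes the logarithmic loss $L(F,B)=-\log B(F)$, with the convention $0\cdot \log 0=0$; the function names, the uniform P and the value chosen for $\gamma (\varnothing )$ are hypothetical, while the weights by cardinality and the belief function B are the numbers that appear in Example 4 below.

```python
import math
from itertools import combinations

def propositions(n):
    """Yield every subset F of the state indices 0..n-1."""
    for k in range(n + 1):
        for combo in combinations(range(n), k):
            yield frozenset(combo)

def log_gamma_score(P, B, gamma, n):
    """Sum of gamma(F) * P(F) * (-log B(F)) over all propositions F.

    Terms with P(F) = 0 are skipped (0 * log 0 := 0); a term with
    P(F) > 0 and B(F) = 0 makes the score infinite.
    """
    total = 0.0
    for F in propositions(n):
        p = P(F)
        if p == 0:
            continue
        b = B(F)
        if b == 0:
            return math.inf
        total += gamma(F) * p * -math.log(b)
    return total

n = 3
# Weights by cardinality; gamma(empty set) = 1 is an arbitrary choice that never matters here.
gamma = lambda F: {0: 1.0, 1: 1.0, 2: 10.0, 3: 1.0}[len(F)]
P = lambda F: len(F) / n                                   # hypothetical uniform probability function
B = lambda F: {0: 0.0, 1: 0.2, 2: 0.8, 3: 1.0}[len(F)]     # belief function by cardinality

score_at_P = log_gamma_score(P, P, gamma, n)
score_at_B = log_gamma_score(P, B, gamma, n)
print(score_at_P, score_at_B)
```

Under these assumptions the non-probabilistic B receives a lower expected loss than P itself, so with these weights the logarithmic γ-score is not minimised at $B=P$ when B ranges over belief functions.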

**Definition 28 (Equivalent to a weighting of partitions).** A weighting of propositions $\gamma :\mathcal{P}\Omega \longrightarrow {\mathbb{R}}_{\ge 0}$ is equivalent to a weighting of partitions, if there exists a function $g:\Pi \longrightarrow {\mathbb{R}}_{\ge 0}$, such that for all $F\subseteq \Omega $:

**Definition 29 (Inclusive weighting of propositions).** A weighting of propositions $\gamma :\mathcal{P}\Omega \longrightarrow {\mathbb{R}}_{\ge 0}$ is inclusive if $\gamma (F)>0$ for all $F\subseteq \Omega $.

**Definition 30 (Strictly $\mathbb{X}$-proper γ-score).** For $\mathbb{P}\subseteq \mathbb{X}\subseteq \langle \mathbb{B}\rangle $, a γ-score ${S}_{\gamma}^{L}:\mathbb{P}\times \langle \mathbb{B}\rangle \longrightarrow [-\infty ,\infty ]$ is strictly $\mathbb{X}$-proper, if for all $P\in \mathbb{P}$, the restricted function ${S}_{\gamma}^{L}(P,\cdot ):\mathbb{X}\longrightarrow [-\infty ,\infty ]$ has a unique global minimum at $B=P$. A γ-score is strictly proper if it is strictly $\langle \mathbb{B}\rangle $-proper. A γ-score is merely $\mathbb{X}$-proper if for some P, this minimum at $B=P$ is not the only minimum.

**Proposition 24.** The logarithmic γ-score ${S}_{\gamma}^{\log}(P,B)$ is non-negative and convex as a function of $B\in \langle \mathbb{B}\rangle $. For inclusive γ, convexity is strict, i.e., ${S}_{\gamma}^{\log}(P,\lambda {B}_{1}+(1-\lambda ){B}_{2})<\lambda {S}_{\gamma}^{\log}(P,{B}_{1})+(1-\lambda ){S}_{\gamma}^{\log}(P,{B}_{2})$ for $\lambda \in (0,1)$, unless ${B}_{1}$ and ${B}_{2}$ agree everywhere except where $P(F)=0$.

**Proof:** The logarithmic γ-score is non-negative because $B(F),P(F)\in [0,1]$ for all F, so $\log B(F)\le 0$, $\gamma (F)P(F)\ge 0$ and $\gamma (F)P(F)\log B(F)\le 0$.

**Corollary 9.** For inclusive γ and fixed $P\in \mathbb{P}$, $\arg {\inf}_{B\in \langle \mathbb{B}\rangle}{S}_{\gamma}^{\log}(P,B)$ is unique. For ${B}^{\prime}:=\arg {\inf}_{B\in \langle \mathbb{B}\rangle}{S}_{\gamma}^{\log}(P,B)$ and for all $F\subseteq \Omega $, we have ${B}^{\prime}(F)>0$ if and only if $P(F)>0.$ Moreover, ${B}^{\prime}(\Omega )=1$ and ${B}^{\prime}\in \mathbb{B}.$

**Proof:** First of all, suppose that there is an $F\subseteq \Omega $ such that $P(F)>0$ and $B(F)=0.$ Then, ${S}_{\gamma}^{\log}(P,B)=\infty .$ Furthermore, ${S}_{\gamma}^{\log}(P,P)<\infty $ for all $P\in \mathbb{P}.$ Hence, for ${B}^{\prime}\in \arg {\inf}_{B\in \langle \mathbb{B}\rangle}{S}_{\gamma}^{\log}(P,B)$, it holds that $P(F)>0$ implies ${B}^{\prime}(F)>0.$

**Corollary 10.** ${S}_{\gamma}^{\log}$ is strictly $\langle \mathbb{B}\rangle $-proper if and only if ${S}_{\gamma}^{\log}$ is strictly $\mathbb{B}$-proper.

**Proof:** Assume that ${S}_{\gamma}^{\log}$ is strictly $\langle \mathbb{B}\rangle $-proper. Then for all $P\in \mathbb{P}$, we have $P=\arg {\inf}_{B\in \langle \mathbb{B}\rangle}{S}_{\gamma}^{\log}(P,B).$ Since $\mathbb{P}\subset \mathbb{B}\subset \langle \mathbb{B}\rangle $, we hence have $P=\arg {\inf}_{B\in \mathbb{B}}{S}_{\gamma}^{\log}(P,B).$

**Definition 31 (Symmetric weighting of propositions).**A weighting of propositions, γ, is symmetric, if and only if whenever ${F}^{\prime}$ can be obtained from F by permuting the ${\omega}_{i}$ in $F,$ then $\gamma ({F}^{\prime})=\gamma (F).$

**Proposition 25.**For inclusive and symmetric γ, ${S}_{\gamma}^{log}$ is strictly $\mathbb{P}$-proper.

**Proof:** We have that for all $\omega \in \Omega $, $|\{F\subseteq \Omega :|F|=n,\;\omega \in F\}|=|\{G\subseteq \overline{\{\omega \}}:|G|=n-1\}|=\binom{|\Omega |-1}{n-1}.$
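This counting identity is easy to confirm by brute force for small $\Omega $. The sketch below is only a check over a few sizes, with states represented by the integers $0,\dots ,|\Omega |-1$ and a helper name, `count_containing`, chosen for the example.

```python
from itertools import combinations
from math import comb

def count_containing(size_omega, n, omega=0):
    """Count the n-element subsets of {0, ..., size_omega - 1} that contain omega."""
    return sum(1 for F in combinations(range(size_omega), n) if omega in F)

# Compare the brute-force count with the binomial coefficient for small cases.
checks = [
    (size, n, count_containing(size, n), comb(size - 1, n - 1))
    for size in range(1, 7)
    for n in range(1, size + 1)
]
for size, n, counted, formula in checks:
    print(size, n, counted, formula)
```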

**Lemma 10.**If γ is an inclusive weighting of propositions that is equivalent to a weighting of partitions, then ${S}_{\gamma}^{log}$ is strictly $\mathbb{B}$-proper.

**Proof:**While this result follows directly from Corollary 3, we shall give another proof that will provide the groundwork for the proof of the next result, Theorem 10.

**Theorem 10.**For inclusive γ with $\gamma (\Omega )\ge \gamma (\varnothing )$, ${S}_{\gamma}^{log}$ is strictly proper if and only if γ is equivalent to a weighting of partitions.

**Proof:**From Lemma 10, we have that the existence of the ${\lambda}_{\pi}$ ensures propriety.

**Example 4.**Let $\Omega =\{{\omega}_{1},{\omega}_{2},{\omega}_{3}\}$ and $\gamma (1)=\gamma (3)=1,$ and $\gamma (2)=10.$ Now, consider $B\in \mathbb{B}$, defined as $B(\varnothing ):=0,$ $B(F):=0.2$ if $|F|=1$, $B(F):=0.8$ if $|F|=2$, and $B(\Omega ):=1.$ Then:

**Proposition 26.**If ${S}_{\gamma}^{log}$ is not strictly $\mathbb{X}$-proper (with $\mathbb{P}\subseteq \mathbb{X}$), then the worst case γ-expected loss minimisation and γ-entropy maximisation are, in general, achieved by different functions.

**Proof:**If ${S}_{g}$ is not merely proper, then there is a ${P}^{\prime}\in \mathbb{P}$ such that ${S}_{\gamma}^{log}({P}^{\prime},\xb7)$ is not minimised over $\mathbb{X}$ by ${P}^{\prime}.$ In particular, there is some $Q\in \mathbb{X}$ such that ${S}_{\gamma}^{log}({P}^{\prime},Q)<{S}_{\gamma}^{log}({P}^{\prime},{P}^{\prime}).$ Suppose that $\mathbb{E}=\{{P}^{\prime}\}.$ Trivially:

## References

- Williamson, J. In Defence of Objective Bayesianism; Oxford University Press: Oxford, UK, 2010.
- Jaynes, E.T. Information theory and statistical mechanics. Phys. Rev. **1957**, 106, 620–630.
- Grünwald, P.; Dawid, A.P. Game theory, maximum entropy, minimum discrepancy, and robust Bayesian decision theory. Ann. Stat. **2004**, 32, 1367–1433.
- Topsøe, F. Information theoretical optimization techniques. Kybernetika **1979**, 15, 1–27.
- Ramsey, F.P. Truth and Probability. In Studies in Subjective Probability; Kyburg, H.E., Smokler, H.E., Eds.; Robert E. Krieger Publishing Company: Huntington, NY, USA, 1926; pp. 23–52.
- Shannon, C. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423, 623–656, Reprinted with corrections. Available online: http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf (accessed on 1 June 2013).
- Aczél, J.; Daróczy, Z. On Measures of Information and Their Characterizations; Academic Press: New York, NY, USA, 1975.
- Dawid, A.P. Probability Forecasting. In Encyclopedia of Statistical Sciences; Kotz, S., Johnson, N.L., Eds.; Wiley: New York, USA, 1986; Volume 7, pp. 210–218.
- Joyce, J.M. Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief. In Degrees of Belief; Huber, F., Schmidt-Petri, C., Eds.; Synthese Library 342; Springer: Dordrecht, The Netherlands, 2009.
- Pettigrew, R. Epistemic Utility Arguments for Probabilism. In The Stanford Encyclopedia of Philosophy; Zalta, E.N., Ed.; The Metaphysics Research Lab, Center for the Study of Language and Information, Stanford University: Stanford, CA, USA, 2011.
- Predd, J.; Seiringer, R.; Lieb, E.; Osherson, D.; Poor, H.; Kulkarni, S. Probabilistic coherence and proper scoring rules. IEEE Trans. Inf. Theory **2009**, 55, 4786–4792.
- McCarthy, J. Measures of the value of information. Proc. Natl. Acad. Sci. USA **1956**, 42, 654–655.
- Shuford, E.H.; Albert, A.; Massengill, H.E. Admissible probability measurement procedures. Psychometrika **1966**, 31, 125–145.
- Aczél, J.; Pfanzagl, J. Remarks on the measurement of subjective probability and information. Metrika **1967**, 11, 91–105.
- Savage, L.J. Elicitation of personal probabilities and expectations. J. Am. Stat. Assoc. **1971**, 66, 783–801.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley: New York, NY, USA, 1991.
- Ricceri, B. Recent Advances in Minimax Theory and Applications. In Pareto Optimality, Game Theory And Equilibria; Chinchuluun, A., Pardalos, P., Migdalas, A., Pitsoulis, L., Eds.; Optimization and Its Applications; Springer: New York, USA, 2008; Volume 17, pp. 23–52.
- König, H. A general minimax theorem based on connectedness. Arch. Math. **1992**, 59, 55–64.
- Keynes, J.M. A Treatise on Probability; Macmillan: London, UK, 1948.
- Williamson, J. From Bayesian epistemology to inductive logic. J. Appl. Logic **2013**, in press.
- Paris, J.B. The Uncertain Reasoner’s Companion; Cambridge University Press: Cambridge, UK, 1994.
- Seidenfeld, T. Entropy and uncertainty. Philos. Sci. **1986**, 53, 467–491.
- Williamson, J. An Objective Bayesian Account of Confirmation. In Explanation, Prediction, and Confirmation: New Trends and Old Ones Reconsidered; Dieks, D., Gonzalez, W.J., Hartmann, S., Uebel, T., Weber, M., Eds.; Springer: Dordrecht, The Netherlands, 2011; pp. 53–81.
- Kyburg, H.E., Jr. Are there degrees of belief? J. Appl. Logic **2003**, 1, 139–149.
- Smets, P.; Kennes, R. The transferable belief model. Artif. Intell. **1994**, 66, 191–234.
- Keynes, J.M. The general theory of employment. Q. J. Econ. **1937**, 51, 209–223.
- Csiszár, I. Axiomatic characterizations of information measures. Entropy **2008**, 10, 261–273.
- Paris, J.B. Common sense and maximum entropy. Synthese **1998**, 117, 75–93.
- Paris, J.B.; Vencovská, A. A note on the inevitability of maximum entropy. Int. J. Approx. Reason. **1990**, 4, 183–223.
- Paris, J.B.; Vencovská, A. In defense of the maximum entropy inference process. Int. J. Approx. Reason. **1997**, 17, 77–103.

© 2013 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

## Share and Cite

**MDPI and ACS Style**

Landes, J.; Williamson, J.
Objective Bayesianism and the Maximum Entropy Principle. *Entropy* **2013**, *15*, 3528-3591.
https://doi.org/10.3390/e15093528
