Justifying Objective Bayesianism on Predicate Languages

Landes, Jürgen; Williamson, Jon

doi:10.3390/e17042459

by Jürgen Landes * and Jon Williamson

Department of Philosophy, School of European Culture and Languages, University of Kent, Canterbury CT2 7NF, UK

* Author to whom correspondence should be addressed.

Entropy 2015, 17(4), 2459-2543; https://doi.org/10.3390/e17042459

Submission received: 11 February 2015 / Revised: 27 March 2015 / Accepted: 9 April 2015 / Published: 22 April 2015

(This article belongs to the Special Issue Maximum Entropy Applied to Inductive Logic and Reasoning)


Abstract:

Objective Bayesianism says that the strengths of one’s beliefs ought to be probabilities, calibrated to physical probabilities insofar as one has evidence of them, and otherwise sufficiently equivocal. These norms of belief are often explicated using the maximum entropy principle. In this paper we investigate the extent to which one can provide a unified justification of the objective Bayesian norms in the case in which the background language is a first-order predicate language, with a view to applying the resulting formalism to inductive logic. We show that the maximum entropy principle can be motivated largely in terms of minimising worst-case expected loss.

Keywords:

objective Bayesianism; g-entropy; predicate language; scoring rule; minimax

1. Introduction

Objective Bayesianism holds that the strengths of one’s beliefs should satisfy three norms [1,2]:

Probability. The strengths of one’s beliefs should satisfy the axioms of probability: if bel is one’s belief function, which assigns a degree of belief to each sentence of one’s language, then bel ∈ ℙ, the set of probability functions defined on the sentences of one’s language.
Calibration. The strengths of one’s beliefs should fit one’s evidence: bel ∈ $E$ , the set of belief functions compatible with one’s evidence. In particular, the strengths of one’s beliefs should be calibrated with physical probabilities, insofar as one has evidence as to what the physical probabilities are: if one’s evidence determines just that the physical probability function P^* lies in some non-empty set ℙ^* of probability functions, then bel ∈ $E$ = 〈ℙ^*〉, where 〈ℙ^*〉 is the convex hull of ℙ^* [3].
Equivocation. The strengths of one’s beliefs should otherwise equivocate sufficiently between the basic possibilities that one can express: bel is some function in E that is sufficiently equivocal. Note that entropy is often used as a measure of the extent to which a probability function equivocates.

These three norms are usually justified in rather different ways. The Probability norm is usually justified as being required if one is to avoid sure loss—the Dutch book argument. The Calibration norm needs to hold if one is to avoid loss in the long run when one repeatedly bets on similar events. It has also been argued that the Equivocation norm should hold if one is to minimise worst-case expected loss. See Williamson [1] (Chapter 3) for discussion of these justifications. Unfortunately, these justifications do not cohere particularly well, because the betting set-up and the notion of loss differ in each case—for the Probability norm, the notion of loss is sure single-case loss, where losses may be positive or negative; for the Calibration norm it is almost-sure (i.e., probability 1) long-run loss, positive or negative; for the Equivocation norm, it is worst-case expected loss, where the loss is positive and logarithmic. Furthermore, a justification for the order in which the norms are applied is missing. In particular, the justification of the Equivocation norm presumes that belief is probabilistic; for this justification to work, some argument is needed for the claim that avoiding sure loss should be prioritised over minimising worst-case expected loss; but there is as yet no such argument. The question thus arises as to whether a single, unified justification can be given for the three norms, in order to circumvent the above problems.

Landes and Williamson [4] provided a single, unified justification for the situation in which one’s beliefs are defined over propositions, construed as subsets of a finite set Ω of outcomes. It turns out that all three norms must hold if one is to minimise worst-case expected loss: one’s belief function should be a probability function in E = 〈ℙ^*〉 that has sufficiently high entropy. This line of argument will be described in Section 2. Landes and Williamson [4] went on to extend this unified justification to the situation in which beliefs are defined over sentences of a propositional language, formed by recursively applying the usual propositional connectives ¬, ∧, ∨, →, ↔ to a finite set of propositional variables.

In this paper we shall show that a similar justification goes through for the situation in which beliefs are defined over sentences of a first-order predicate language, with the use of predicate, constant and variable symbols as well as the quantifiers ∀, ∃. In Section 3 we shall formulate the norms of objective Bayesianism in the context of a predicate language. In Section 4 we shall provide a justification for maximising entropy when the language in question is a predicate language without quantifier symbols and when the evidence set is finitely generated. In Section 5, we shall extend this line of argument to predicate languages that contain quantifier symbols. In Section 6 we shall investigate the case of evidence which is not finitely generated. Key concepts and notation are collected in Appendix C for ease of reference.

The key technical results in this paper are Theorem 3, Theorem 6, Theorem 7, and Theorem 8. These results all suppose that the available evidence is finitely generated (in the sense of Definition 5). The first two jointly show that, on a quantifier-free predicate language, the belief function with the best loss profile is the calibrated probability function which has maximal entropy. Theorem 7 implies that adding new constant or predicate symbols to the language does not change the inferences one draws which are expressible in the original language. Theorem 8 extends Theorem 3 and Theorem 6 to predicate languages with quantifiers. En route to proving Theorem 8, we improve on Gaifman’s Unique Extension Theorem [5] (Theorem 1) in Proposition 24.

The case of evidence which cannot be finitely generated is more involved. We consider a case in which no belief function has an optimal loss profile in Proposition 28 and Proposition 30. While there are no functions with the best loss profile in that case, we show in Proposition 29 and Proposition 31 that probability functions in a neighbourhood of the calibrated function with maximal entropy have arbitrarily good loss profiles. We also discuss a case in which the belief function with best loss profile does indeed turn out to be the calibrated probability function which has maximal entropy, see Theorem 9.

2. Beliefs over Propositions

Here we will recap the relevant results of Landes and Williamson [4], to which the reader is referred for further details and motivation. In this section we will be concerned solely with a finite set Ω of possible outcomes. We shall suppose that each member ω of Ω is a state ±A₁ ∧ … ∧ ±A_n of a finite propositional language ℒ = {A₁, …, A_n}. A proposition F is a subset of Ω. Let Π be the set of all partitions of Ω. We take {∅, Ω}, {Ω} ∈ Π. In order to limit the proliferation of partitions, we suppose that the only partition in which ∅ occurs is {∅, Ω}.

Given a belief function bel : $\mathcal{P}Ω \to ℝ_{\geq 0}$ that is not zero everywhere, we normalise by dividing each degree of belief by $\max_{π \in Π} \sum_{F \in π} bel(F)$ to form a belief function, B : $\mathcal{P}Ω \to [0, 1]$, with degrees of belief in the unit interval. The set of normalised belief functions is

$𝔹 := \{B : \mathcal{P}Ω \to [0, 1] : \sum_{F \in π} B(F) \leq 1 \text{ for all } π \in Π \text{ and } \sum_{F \in π} B(F) = 1 \text{ for some } π\}.$

On the other hand, the set of probability functions is

$ℙ := \{B : \mathcal{P}Ω \to [0, 1] : \sum_{F \in π} B(F) = 1 \text{ for all } π \in Π\} \subset 𝔹,$

where ⊂ denotes strict subset inclusion. The inclusion is strict since the following normalised belief function B is not in ℙ, B(∅) = 1 and B(F ) = 0 for all ∅ ⊂ F ⊆ Ω. Since {Ω} is a partition we have P (Ω) = 1 and since {Ω, ∅} is a partition it holds that P (∅) = 0 for all P ∈ ℙ.
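The normalisation step and the strict inclusion of ℙ in the set of normalised belief functions can be checked by brute force on a tiny outcome space. The following is a minimal sketch, assuming a two-element Ω and an arbitrary, purely illustrative unnormalised belief function `bel`; the helper names (`set_partitions`, `all_partitions`, `normalise`, `is_probability`) are hypothetical and not taken from the paper.

```python
from fractions import Fraction

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (lists of lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + smaller

def all_partitions(omega):
    """Partitions of omega into non-empty blocks, plus the special partition {∅, Ω}."""
    parts = [frozenset(frozenset(b) for b in p) for p in set_partitions(omega)]
    parts.append(frozenset({frozenset(), frozenset(omega)}))
    return parts

def normalise(bel, omega):
    """Divide each degree of belief by the largest sum over any partition."""
    m = max(sum(bel[F] for F in pi) for pi in all_partitions(omega))
    return {F: Fraction(v) / m for F, v in bel.items()}

def is_probability(B, omega):
    """True if B sums to exactly 1 on every partition."""
    return all(sum(B[F] for F in pi) == 1 for pi in all_partitions(omega))

omega = ("w1", "w2")
# Illustrative unnormalised degrees of belief for the four propositions.
bel = {frozenset(): 0, frozenset({"w1"}): 2,
       frozenset({"w2"}): 1, frozenset(omega): 2}
B = normalise(bel, omega)
print(B[frozenset({"w1"})], is_probability(B, omega))
```

Here the normalised function sums to 1 on the partition of states but gives Ω a degree of belief below 1, so it is a normalised belief function that is not a probability function.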

Let L(F, B) be the loss incurred by adopting belief function B when proposition F turns out to be true. Arguably, in the absence of knowledge of the true loss function, the loss function L should be taken to be logarithmic, as we shall now see. Consider the following four conditions on a default loss function L:

L1. L(F, B) = 0 if B(F ) = 1.
L2. L(F, B) strictly increases as B(F) decreases from 1 towards 0.
L3. L(F, B) depends only on B(F), not on B(F′) for F′ ≠ F.
To express the next condition we need some notation. Suppose $ℒ = ℒ_{1} \cup ℒ_{2}$: say that $ℒ = \{A_{1}, \dots, A_{n}\}$, $ℒ_{1} = \{A_{1}, \dots, A_{m}\}$, $ℒ_{2} = \{A_{m+1}, \dots, A_{n}\}$ for some 1 ≤ m < n. Then ω ∈ Ω takes the form ω₁ ∧ ω₂ where ω₁ ∈ Ω₁ is a state of $ℒ_{1}$, and ω₂ ∈ Ω₂ is a state of $ℒ_{2}$. Given propositions F₁ ⊆ Ω₁ and F₂ ⊆ Ω₂ we can define F₁ × F₂ := {ω = ω₁ ∧ ω₂ : ω₁ ∈ F₁, ω₂ ∈ F₂}, a proposition of $ℒ$. Given a fixed belief function B such that B(Ω) = 1, $ℒ_{1}$ and $ℒ_{2}$ are independent sublanguages, written $ℒ_{1} ╨_{B} ℒ_{2}$, if B(F₁ × F₂) = B(F₁) · B(F₂) for all F₁ ⊆ Ω₁ and F₂ ⊆ Ω₂, where B(F₁) := B(F₁ × Ω₂) and B(F₂) := B(Ω₁ × F₂). The restriction $B_{⇂ℒ_{1}}$ of B to $ℒ_{1}$ is a belief function on $ℒ_{1}$ defined by $B_{⇂ℒ_{1}}(F_{1}) = B(F_{1}) = B(F_{1} \times Ω_{2})$, and similarly for $ℒ_{2}$.
L4. Losses are additive when the language is composed of independent sublanguages: if $ℒ = ℒ_{1} \cup ℒ_{2}$ for $ℒ_{1} ╨_{B} ℒ_{2}$, then $L(F_{1} \times F_{2}, B) = L_{1}(F_{1}, B_{⇂ℒ_{1}}) + L_{2}(F_{2}, B_{⇂ℒ_{2}})$, where L₁, L₂ are loss functions defined on $ℒ_{1}$, $ℒ_{2}$ respectively.

Theorem 1. If a loss function L satisfies L1–4 then L(F, B) = −k log B(F) for some constant k > 0 that does not depend on ℒ.
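Theorem 1 says that L1–4 force the logarithmic form. The easier converse direction, that logarithmic loss does satisfy L1, L2 and L4, can be checked numerically. The sketch below assumes k = 1 and a handful of arbitrary belief values; it illustrates the conditions and is not a proof of the theorem.

```python
import math

def log_loss(b, k=1.0):
    """L(F, B) = -k log B(F), written as a function of b = B(F) in (0, 1]."""
    return -k * math.log(b)

# L1: no loss when the true proposition is fully believed.
loss_at_one = log_loss(1.0)

# L2: loss strictly increases as belief in the true proposition falls.
beliefs = [1.0, 0.75, 0.5, 0.25, 0.05]
losses = [log_loss(b) for b in beliefs]

# L4: for independent sublanguages B(F1 x F2) = B(F1) * B(F2),
# so the loss on the product proposition is the sum of the two losses.
b1, b2 = 0.6, 0.3
joint_loss = log_loss(b1 * b2)
summed_loss = log_loss(b1) + log_loss(b2)
```

L3 holds by construction, since `log_loss` takes only the degree of belief in the true proposition as input.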

When we consider the notion of expected loss, we see that this concept depends on the weight given to the various partitions under consideration. Let g : Π → ℝ_≥₀ be a function that assigns a weight to each partition. Then the g-expected loss or g-score of a belief function B ∈ 𝔹 with respect to a probability function P ∈ ℙ is defined by

S_{g}^{L} (P, B) : = \sum_{π \in Π} g (π) \sum_{F \in π} P (F) L (F, B),

for any weighting function g that is inclusive in the sense that for any proposition F, some partition π containing F is given positive weight. We adopt the usual convention that 0 log 0 = 0. This ensures that $S_{g}^{L}(P, B)$ is well-defined. Theorem 1 allows us to focus attention on logarithmic g-score,

S_{g} (P, B) : = - \sum_{π \in Π} g (π) \sum_{F \in π} P (F) \log B (F) .

(1)

An important property of a scoring rule is that $\arg\inf_{B \in 𝔹} S_{g}^{L}(P, B) = \{P\}$ for all P ∈ ℙ. That is, for fixed P ∈ ℙ, $S_{g}^{L}(P, B)$ is uniquely minimised by B = P. This property is known as strict propriety.

Proposition 1 (Strict Propriety). S_g is strictly proper.
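Strict propriety can be illustrated numerically on a three-state Ω. The sketch below computes the logarithmic g-score (1) under the partition weighting (weight 1 on every partition) and compares the score of P against itself with its score against a few other candidates. It assumes, for simplicity, that the candidates are themselves strictly positive probability functions; the proposition covers all normalised belief functions, which this check does not attempt. All names and numbers are illustrative.

```python
import math

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (lists of lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def prob(weights):
    """Extend state weights {state: p} to propositions (frozensets of states)."""
    return lambda F: sum(weights[w] for w in F)

def g_score(P, B, partitions, g):
    """S_g(P, B) = -sum_pi g(pi) sum_{F in pi} P(F) log B(F)."""
    total = 0.0
    for pi in partitions:
        for F in pi:
            if P(F) > 0:  # convention: 0 log 0 = 0
                total -= g(pi) * P(F) * math.log(B(F))
    return total

omega = ("w1", "w2", "w3")
partitions = [frozenset(frozenset(b) for b in p) for p in set_partitions(omega)]
partitions.append(frozenset({frozenset(), frozenset(omega)}))  # special partition {∅, Ω}
g_partition = lambda pi: 1.0  # partition weighting

P = prob({"w1": 0.5, "w2": 0.3, "w3": 0.2})
candidates = [prob(w) for w in (
    {"w1": 0.4, "w2": 0.4, "w3": 0.2},
    {"w1": 1 / 3, "w2": 1 / 3, "w3": 1 / 3},
    {"w1": 0.6, "w2": 0.3, "w3": 0.1},
)]
own_score = g_score(P, P, partitions, g_partition)
other_scores = [g_score(P, Q, partitions, g_partition) for Q in candidates]
```

In this example every candidate other than P itself incurs a strictly larger expected loss, as strict propriety requires.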

By analogy with the generalised notion of scoring rule, we get a similar generalisation of entropy, g-entropy:

H_{g} (B) : = - \sum_{π \in Π} g (π) \sum_{F \in π} B (F) \log B (F) .

(2)

The standard entropy function corresponds to the special case in which g = g_Ω, the (non-inclusive) weighting function that gives weight 1 to the partition {{ω} : ω ∈ Ω} of states and weight 0 to all other partitions.
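As a quick check of definition (2), the following sketch computes g-entropy on a three-state Ω for two weightings: the standard weighting, which should reproduce Shannon entropy, and the partition weighting. The probability values are arbitrary and the helper names are hypothetical.

```python
import math

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks (lists of lists)."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller

def g_entropy(weights, partitions, g):
    """H_g(P) = -sum_pi g(pi) sum_{F in pi} P(F) log P(F), with 0 log 0 = 0."""
    total = 0.0
    for pi in partitions:
        for F in pi:
            p = sum(weights[w] for w in F)
            if p > 0:
                total -= g(pi) * p * math.log(p)
    return total

omega = ("w1", "w2", "w3")
partitions = [frozenset(frozenset(b) for b in p) for p in set_partitions(omega)]
partitions.append(frozenset({frozenset(), frozenset(omega)}))  # special partition {∅, Ω}
states = frozenset(frozenset({w}) for w in omega)

g_standard = lambda pi: 1.0 if pi == states else 0.0  # weight only on the partition of states
g_partition = lambda pi: 1.0                          # weight 1 on every partition

P = {"w1": 0.5, "w2": 0.3, "w3": 0.2}
uniform = {w: 1 / 3 for w in omega}
shannon = -sum(p * math.log(p) for p in P.values())
```

Under both weightings the uniform function has higher g-entropy than the skewed P in this example, which matches the reading of entropy as a measure of equivocation.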

It turns out that, if there is such a function, the probability function that minimises worst-case g-score, where the worst case is taken over physical probability functions in the set E = 〈ℙ^*〉, is the probability function in E that has maximum g-entropy:

Theorem 2. As noted above, E is taken to be convex and g inclusive. There is a unique member of $\arg\sup_{P \in E} H_{g}(P)$, which we shall denote by $P_{g}^{†}$. Furthermore,

$\arg\sup_{P \in E} H_{g}(P) = \arg\inf_{B \in 𝔹} \sup_{P \in E} S_{g}(P, B) = \{P_{g}^{†}\}.$

Throughout this paper we use $\arg\sup_{P \in E}$ (and $\arg\inf_{P \in E}$) to refer to the points in the closure [E] of E that achieve the supremum (respectively infimum) whether or not these points are in E. (This convention shall also apply mutatis mutandis to suprema and infima over sets of belief functions defined on predicate languages later in this paper.)
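The minimax claim of Theorem 2 can be illustrated on the smallest non-trivial case. The sketch below assumes a two-state Ω = {w1, w2}, an evidence set in which the chance of w1 lies in [0.6, 0.9], and candidate belief functions restricted to probability functions on a grid. Under these assumptions the only partition contributing to the score is the partition of states, so the partition-weighted g-score reduces to the ordinary logarithmic score. This is a grid search for illustration, not a proof.

```python
import math

def score(p, b):
    """Expected logarithmic loss of believing w1 to degree b when its chance is p."""
    return -(p * math.log(b) + (1 - p) * math.log(1 - b))

def entropy(p):
    """Entropy of the two-state probability function with P(w1) = p."""
    return score(p, p)

# Candidate degrees of belief b = B(w1), and the (discretised) evidence set.
grid = [i / 100 for i in range(1, 100)]
evidence = [i / 100 for i in range(60, 91)]

def worst_case(b):
    """Largest expected loss of b over all chance functions in the evidence set."""
    return max(score(p, b) for p in evidence)

minimax_b = min(grid, key=worst_case)   # belief minimising worst-case expected loss
maxent_p = max(evidence, key=entropy)   # calibrated function with maximal entropy
```

On this grid the minimiser of worst-case expected loss coincides with the maximum-entropy member of the evidence set, namely the point of [0.6, 0.9] closest to the uniform value 0.5.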

The above theorem concerns the minimisation of worst-case g-score. If one replaces the minimisation of worst-case g-score by a more fine-grained criterion (which breaks ties between belief functions with the same worst-case g-score), then an analogue of the above theorem holds: there exists a unique belief function which is best with respect to this criterion and this function is $P_{g}^{†}$, which maximises g-entropy in [E]. When we move to predicate languages we will consider such a refinement in Definition 21.

3. Beliefs over Sentences of a Predicate Language

3.1. Norms

In this section we introduce the norms of objective Bayesianism as they apply to strength of belief in sentences formulated in a predicate language. This framework is presented in more detail in Williamson [1] (Chapter 5). It is this set of norms that we seek to justify in terms of the loss that a belief function exposes one to.

We shall take ℒ to be a first-order predicate language with finitely many relation symbols U₁, …, U_s, countably many constant symbols t₁, t₂, …, but no function or equality symbols. We will consider languages with and without the existential quantifier symbol, using the notation $ℒ^{∄}$ and $ℒ^{\exists}$ to disambiguate where needed. We shall assume, as is usual in this setting, that each individual in the domain of discourse is picked out by some constant symbol. The sentences $Sℒ$ of ℒ are formed by recursively applying the usual connectives and the existential quantifier, if present. In $ℒ = ℒ^{\exists}$, universally quantified sentences may be defined in terms of existentially quantified sentences as usual via ∀xθ(x) := ¬∃x¬θ(x). Note that $Sℒ^{∄}$ coincides with the set of quantifier-free sentences of $ℒ^{\exists}$. We shall also be interested in the finite sublanguages $ℒ_{n}$, for n ≥ 1, which are identical to ℒ except that they have only finitely many constant symbols t₁, …, t_n.

We shall list the atomic sentences of ℒ, i.e., sentences of the form Ut where U is a relation symbol and t is a tuple of constant symbols of the corresponding arity, as A₁, A₂, …, ordered in such a way that atomic sentences that can be expressed in $ℒ_{n+1}$ but not in $ℒ_{n}$ occur after the atomic sentences $A_{1}, \dots, A_{r_{n}}$ of $ℒ_{n}$, for each n. Ω_n will denote the set of n-states, i.e., sentences of the form $\pm A_{1} \land \dots \land \pm A_{r_{n}}$. We shall use Greek letters, such as θ, to denote sentences of ℒ, and Roman letters, e.g., F, to denote propositions expressed by such sentences. We shall construe propositions as sets of n-states, F ⊆ Ω_n for some n (see Section 2).
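The ordering of atomic sentences and the resulting n-states can be made concrete with a short enumeration. The sketch below assumes a hypothetical signature with one unary relation symbol U1 and one binary relation symbol U2; constants are represented by their indices, and an n-state is represented as a tuple of (atom, truth value) pairs.

```python
from itertools import product

def ordered_atoms(arities, n):
    """List the atomic sentences of L_n so that atoms new in L_k follow those of L_(k-1)."""
    seen, atoms = set(), []
    for k in range(1, n + 1):
        for name, arity in arities.items():
            # All tuples of constants t_1..t_k of the right length.
            for consts in product(range(1, k + 1), repeat=arity):
                atom = (name, consts)
                if atom not in seen:
                    seen.add(atom)
                    atoms.append(atom)
    return atoms

def n_states(arities, n):
    """Yield every n-state: one sign for each atomic sentence of L_n."""
    atoms = ordered_atoms(arities, n)
    for signs in product([True, False], repeat=len(atoms)):
        yield tuple(zip(atoms, signs))

arities = {"U1": 1, "U2": 2}  # assumed signature, for illustration only
r = [len(ordered_atoms(arities, n)) for n in (1, 2, 3)]  # r_1, r_2, r_3
```

With this signature r_n = n + n², so the number 2^{r_n} of n-states grows very quickly with n, which is one reason the paper works with limits over the finite sublanguages rather than with a single finite state space.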

The norms of objective Bayesianism can then be explicated thus:

Probability. The strengths of one’s beliefs should be representable by a probability function, i.e., a function $P : Sℒ \to ℝ$ that satisfies the properties:

P1. P (τ) = 1 for all tautologies τ.
P2. If ⊨¬(φ ˄ ψ) then P (φ ˅ ψ) = P (φ) + P (ψ).
P3. $P (\exists x θ (x)) = \sup_{m} P (\lor_{i = 1}^{m} θ (t_{i}))$ .

(Clearly P3 is only applicable in the case $ℒ = ℒ^{\exists}$.)

Calibration. One’s degrees of belief should satisfy constraints imposed by one’s evidence. Assuming all evidence is evidence of physical probabilities, P should lie in the set $E_{ℒ} = 〈ℙ^{*}〉$, the convex hull of the set of epistemically possible physical probability functions.

Equivocation. One’s degrees of belief should otherwise be sufficiently equivocal. Again, one can explicate this by saying that one’s belief function should have sufficiently high entropy. Here P has higher entropy than Q if there is some N such that for all n ≥ N, $H_{Ω}^{n}(P) > H_{Ω}^{n}(Q)$, where $H_{Ω}^{n}$ is standard entropy on $ℒ_{n}$, $H_{Ω}^{n}(P) := - \sum_{ω \in Ω_{n}} P(ω) \log P(ω)$.

The key question we attempt to answer here is: can these norms be given a unified justification in terms of avoiding avoidable loss?

3.2. Belief and Probability

A (non-normalised) belief function bel : $Sℒ \to ℝ_{\geq 0}$ is a function that maps any sentence of the language to a non-negative real number. For technical convenience we shall focus our attention on normalised belief functions, which are defined below.

A (countable) set of mutually exclusive sentences $π \subset Sℒ$ is called exhaustive if, for all interpretations ℐ under which the constants exhaust the universe of ℐ, there exists a sentence θ ∈ π such that ℐ ⊨ θ. This means that it is not possible for all θ ∈ π to be false at the same time. In order to control the number of partitions, we shall assume that the only partitions in which contradictions κ occur are the partitions of the form {τ, κ}, for some tautology τ. Let $Π_{ℒ}$ denote the set of partitions of ℒ.

Example 1 (Infinite partitions). Even though $ℒ^{\exists}$ does not contain a symbol for equality and every element of a partition is a sentence of $ℒ^{\exists}$, which is of finite length, infinite partitions such as the following do exist:

$π_{\infty} := \{\forall x \neg U_{1} x\} \cup \bigcup_{k=1}^{\infty} \{U_{1} t_{k} \land \bigwedge_{l=1}^{k-1} \neg U_{1} t_{l}\}.$

(Here it is presupposed that $ℒ^{\exists}$ contains a unary predicate symbol U₁.) On the other hand, it turns out that there are no infinite partitions in $ℒ^{∄}$ [6] (§2.5).

We take it that it is a matter of convention on which scale beliefs are measured. For convenience, we want to normalise this scale to the unit interval, [0, 1], so that all belief functions are considered on the same scale.

Definition 1 (Normalised belief function). Let $M := \sup_{π \in Π_{ℒ}} \sum_{φ \in π} bel(φ)$. Then define the normalisation of bel as $B(φ) := \frac{bel(φ)}{M}$, if M > 0. For a function f assigning every $φ \in Sℒ$ the same value v ∈ ℝ_≥0 we write f ≡ v. We shall consider bel ≡ 0 as normalised. The set of normalised belief functions on $Sℒ$ then is

$𝔹_{ℒ} := \{B : Sℒ \to [0, 1] : \sum_{φ \in π} B(φ) \leq 1 \text{ for all } π \in Π_{ℒ} \text{ and } \sum_{φ \in π} B(φ) = 1 \text{ for some } π \in Π_{ℒ}\} \cup \{B \equiv 0\}.$

For the normalisation of bel, B, it holds that B ≡ 0, if and only if M = +∞ or bel ≡ 0.

We will be particularly interested in the following subset of functions:

ℙ_{ℒ} : = {P : S ℒ \to [0, 1] : \sum_{φ \in π} P (φ) = 1 for all π \in Π_{ℒ}} .

These are the probability functions:

Proposition 2. $P \in ℙ_{ℒ}$, if and only if $P : Sℒ \to [0, 1]$ satisfies the axioms of probability:

P1. P (τ) = 1 for all tautologies $τ \in S ℒ$ .
P2. If ⊨ ¬(φ ˄ ψ) then P (φ ˅ ψ) = P (φ) + P (ψ).
P3. $P (\exists x θ (x)) = \sup_{m} P (\lor_{i = 1}^{m} θ (t_{i}))$ .

Proof. First we shall see that $P \in ℙ_{ℒ}$ satisfies the axioms of probability.

P1. For any tautology τ ∈ Sℒ it holds that P (τ) = 1 because {τ} is a partition in $Π_{ℒ}$. P (κ) = 0 for all contradictions κ because {τ, κ} is a partition in $Π_{ℒ}$ and P (τ) = 1.
P2. Suppose that φ, ψ ∈ Sℒ are such that ⊨ ¬(φ ∧ ψ). We shall proceed by cases to show that P (φ ∨ ψ) = P (φ) + P (ψ). In the first three cases one of the sentences is a contradiction; in the last two cases there are no contradictions.
- ⊨ φ and ⊨ ¬ψ, then ⊨ φ ∨ ψ. Thus by the above P (φ) = 1 and P (ψ) = 0 and hence P (φ ∨ ψ) = 1 = P (φ) + P (ψ).
- ⊨ ¬φ and ⊨ ¬ψ, then ⊨ ¬φ ∧ ¬ψ. Thus P (φ ∨ ψ) = 0 = P (φ) + P (ψ).
- ⊭ ¬φ, ⊭ φ, and ⊨ ¬ψ, then {φ ∨ ψ, ¬φ ∨ ψ} and {φ, ¬φ ∨ ψ} are both partitions in $Π_{ℒ}$. Thus P (φ ∨ ψ) + P (¬φ ∨ ψ) = 1 = P (φ) + P (¬φ ∨ ψ). Putting these observations together we now find P (φ ∨ ψ) = P (φ) = P (φ) + P (ψ).
- ⊭ ¬φ, ⊭ ¬ψ and ⊨ φ ↔ ¬ψ, then {φ, ψ} is a partition and φ ∨ ψ is a tautology. Hence, P (φ) + P (ψ) = 1 and P (φ ∨ ψ) = 1. This now yields P (φ) + P (ψ) = P (φ ∨ ψ).
- ⊭ ¬φ, ⊭ ¬ψ and ⊭ φ ↔ ¬ψ, then none of the sentences φ, ψ, φ ∨ ψ, ¬(φ ∨ ψ) is a tautology or a contradiction. Since {φ, ψ, ¬(φ ∨ ψ)} and {φ ∨ ψ, ¬(φ ∨ ψ)} are both partitions in $Π_{ℒ}$ we obtain P (φ) + P (ψ) = 1 − P (¬(φ ∨ ψ)) = P (φ ∨ ψ). So P (φ) + P (ψ) = P (φ ∨ ψ).
P3. For the rest of this proof we only have to consider $ℒ = ℒ^{\exists}$.

If ⊨ ∃xθ(x), then P (∃xθ(x)) = 1.

Furthermore, the set {θ_n : n ∈ ℕ} with $θ_{n} := θ(t_{n}) \land \bigwedge_{j=1}^{n-1} \neg θ(t_{j})$ is exhaustive. Note that $⊨ \bigvee_{i=1}^{n} θ(t_{i}) \leftrightarrow \bigvee_{i=1}^{n} θ_{i}$. P1 and P2 are well-known to imply that logically equivalent sentences are assigned the same probability; see [7] (Proposition 2.1.c). Hence, $P(\bigvee_{i=1}^{n} θ(t_{i})) = P(\bigvee_{i=1}^{n} θ_{i})$.

The θ_i are mutually exclusive. We obtain from P2 that $P(\bigvee_{i=1}^{n} θ_{i}) = \sum_{i=1}^{n} P(θ_{i})$. Next, define a set Θ := {θ_n : θ_n satisfiable} which consists of exhaustive, satisfiable and mutually exclusive sentences. Hence Θ is a partition in $Π_{ℒ}$. We finally obtain

$1 = \sum_{θ \in Θ} P(θ) \leq \lim_{n \to \infty} \sum_{i=1}^{n} P(θ_{i}) = \lim_{n \to \infty} P(\bigvee_{i=1}^{n} θ_{i}) = \lim_{n \to \infty} P(\bigvee_{i=1}^{n} θ(t_{i})) \leq 1.$

P1 and P2 are also well-known to imply that if ⊨ χ → ψ then P (χ) ≤ P (ψ), see [7] (Proposition 2.1.c). Since $⊨ \bigvee_{i=1}^{n} θ(t_{i}) \to \bigvee_{i=1}^{n+1} θ(t_{i})$ we obtain $P(\bigvee_{i=1}^{n} θ(t_{i})) \leq P(\bigvee_{i=1}^{n+1} θ(t_{i}))$. So $(P(\bigvee_{i=1}^{n} θ_{i}))_{n \in ℕ}$ is a (not necessarily strictly) increasing sequence. Then

$1 = \lim_{n \to \infty} P(\bigvee_{i=1}^{n} θ(t_{i})) = \sup_{n \in ℕ} P(\bigvee_{i=1}^{n} θ(t_{i})).$

(3)

The second equality holds also when $1 > \lim_{n \to \infty} P(\bigvee_{i=1}^{n} θ(t_{i}))$.

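If ⊨ ¬∃xθ(x), then every θ(t_i) is a contradiction, because ⊨ θ(t_i) → ∃xθ(x); hence both sides of P3 are equal to 0.

If neither ⊨ ∃xθ(x) nor ⊨ ¬∃xθ(x), then {∀x¬θ(x), ∃xθ(x)} is a partition. We consider two cases.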

In the first case the set

$\{\forall x \neg θ(x), θ(t_{1}), θ(t_{2}) \land \neg θ(t_{1}), \dots, θ(t_{k}) \land \neg \bigvee_{j=1}^{k-1} θ(t_{j}), \dots\}$

is not a partition.

For example, this set fails to be a partition for θ(x) = ¬Ut₂ ∧ Ux: the sentence θ(t₂) ∧ ¬θ(t₁) = ¬Ut₂ ∧ Ut₂ ∧ ¬(¬Ut₂ ∧ Ut₁) is a contradiction and hence it cannot be contained in a partition π consisting of infinitely many sentences.

$\neg \bigvee_{i=1}^{m} θ(t_{i})$ cannot be a contradiction since ¬∃xθ(x) is satisfiable and $⊨ \neg \exists x θ(x) \to \neg \bigvee_{i=1}^{m} θ(t_{i})$. If $\neg \bigvee_{i=1}^{m} θ(t_{i})$ is a tautology, then all θ_n with n ≤ m are contradictions. Hence, for all m ∈ ℕ the set $\{\neg \bigvee_{i=1}^{m} θ(t_{i})\} \cup \{θ_{n} : n \leq m \text{ and } θ_{n} \text{ is satisfiable}\}$ is a partition, as is $\{\neg \bigvee_{i=1}^{m} θ(t_{i}), \bigvee_{i=1}^{m} θ(t_{i})\}$. Furthermore, {∀x¬θ(x)} ∪ {θ_n : θ_n is satisfiable} is a partition.

Recalling that P(κ) = 0 for all contradictions κ we obtain $\sum_{k=1}^{m} P(θ_{k}) = P(\bigvee_{i=1}^{m} θ(t_{i}))$ and

$P(\exists x θ(x)) = \lim_{m \to \infty} P(\bigvee_{i=1}^{m} θ(t_{i})).$

It remains to show that

$\lim_{m \to \infty} P(\bigvee_{i=1}^{m} θ(t_{i})) = \sup_{m} P(\bigvee_{i=1}^{m} θ(t_{i})).$

This follows as we saw above in (3).

In the second case the set $\{\forall x \neg θ(x), θ(t_{1}), θ(t_{2}) \land \neg θ(t_{1}), \dots, θ(t_{k}) \land \neg \bigvee_{j=1}^{k-1} θ(t_{j}), \dots\}$ is a partition. Recall that {∀x¬θ(x), ∃xθ(x)} is also a partition. We obtain as in the first case that

$P(\exists x θ(x)) = \sum_{k=1}^{\infty} P(θ(t_{k}) \land \neg \bigvee_{j=1}^{k-1} θ(t_{j})) = \lim_{m \to \infty} \sum_{k=1}^{m} P(θ_{k}) = \sup_{m} P(\bigvee_{i=1}^{m} θ(t_{i})).$

For the converse, note that P1–3 imply that P is a probability measure on $Sℒ$, and so additive over countable partitions (§2 in [8]; §2.5 in [6]). Hence $P \in ℙ_{ℒ}$. □

Another key feature of probability functions is that they respect logical equivalence:

Definition 2 (Respecting logical equivalence). For a sublanguage ℒ′ of ℒ we say that a function f : $Sℒ \to [0, 1]$ respects logical equivalence on ℒ′, if and only if for all φ, ψ ∈ Sℒ′ with ⊨ φ ↔ ψ it holds that f(φ) = f(ψ). For ℒ′ = ℒ we simply say that f respects logical equivalence.

Proposition 3. The probability functions $P \in ℙ_{ℒ}$ respect logical equivalence.

Proof. Suppose $P \in ℙ_{ℒ}$ and assume that φ, ψ ∈ Sℒ are logically equivalent. Observe that {φ, ¬φ} and {ψ, ¬φ} are partitions in $Π_{ℒ}$. Hence,

$P(φ) + P(\neg φ) = 1 = P(ψ) + P(\neg φ).$

But then P (φ) = P (ψ). Thus, the $P \in ℙ_{ℒ}$ assign logically equivalent sentences the same probability. □

If a belief function $B : Sℒ \to [0, 1]$ respects logical equivalence, it gives sentences which express the same proposition the same degree of belief. Hence, for any n ∈ ℕ, B induces a function °B defined over the propositions F ⊆ Ω_n (cf. Section 2). °B is defined by:

$°B(F) := B(\bigvee F) = B(\bigvee_{\substack{ω \in Ω_{n} \\ ω \in F}} ω).$

We will use the notation °ⁿB to avoid ambiguity in cases where n varies.

The notion of a dominated belief function will prove useful in what follows:

Definition 3 (Dominated belief function). $B \in 𝔹_{ℒ} \setminus ℙ_{ℒ}$ is dominated by a probability function $P \in ℙ_{ℒ}$, if and only if for all $φ \in Sℒ$ it holds that B(φ) ≤ P (φ).

Note that if B is dominated by P, then B ≠ P, and thus B(φ) < P (φ) has to hold for at least one sentence φ.

Proposition 4. There exist $B \in 𝔹_{ℒ} \setminus ℙ_{ℒ}$ which are not dominated.

Proof. Let U be a relation symbol in ℒ of arity a ≥ 1, say. Let Ut₁t be a well-formed formula of $ℒ_{2}$, i.e., t is an (a − 1)-tuple consisting only of t₁ and t₂. Let O₄ := {Ut₁t ∧ Ut₂t, Ut₁t ∧ ¬Ut₂t, ¬Ut₁t ∧ Ut₂t, ¬Ut₁t ∧ ¬Ut₂t}.

Let $B : Sℒ \to [0, 1]$ be such that

$B(φ) := \begin{cases} \frac{1}{100} & \text{iff } ⊨ φ \leftrightarrow ω \text{ for some } ω \in O_{4} \\ \frac{50}{100} & \text{iff } ⊨ φ \leftrightarrow ω \lor ν \text{ for different } ω, ν \in O_{4} \\ \frac{99}{100} & \text{iff } ⊨ φ \leftrightarrow \neg ω \text{ for some } ω \in O_{4} \\ 1 & \text{iff } φ \text{ is a tautology} \\ 0 & \text{otherwise} \end{cases}$

Clearly, $B \in 𝔹_{ℒ}$. We now show that there does not exist a $P \in ℙ_{ℒ}$ such that B(φ) ≤ P (φ) for all φ ∈ Sℒ.

Note that $\sum_{ω \in O_{4}} B(\neg ω) = 3 + \frac{96}{100}$ and that for all $P \in ℙ_{ℒ}$ it holds that

$\sum_{ω \in O_{4}} P(\neg ω) = \sum_{ω \in O_{4}} (P(\neg ω) + P(ω)) - \sum_{ω \in O_{4}} P(ω) = 4 - 1 = 3.$

(4)

Hence B(¬ω) > P (¬ω) for at least one ω ∈ O₄, so no probability function dominates B.
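The arithmetic behind (4) can be verified with exact fractions. In the sketch below the list `P` is one hypothetical assignment of probabilities to the four sentences in O₄; any other assignment summing to 1 would give the same total for the negations.

```python
from fractions import Fraction

# Degrees of belief that B assigns to the four negations ¬ω, for ω in O_4.
B_neg = [Fraction(99, 100)] * 4
sum_B_neg = sum(B_neg)

# The four sentences in O_4 are mutually exclusive and exhaustive, so any
# probability function gives them total probability 1. Hypothetical example:
P = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
sum_P_neg = sum(1 - p for p in P)

# Because the B-sum exceeds the P-sum, B must exceed P on at least one ¬ω.
violations = [i for i, p in enumerate(P) if B_neg[i] > 1 - p]
```

The comparison of the two sums is what rules out domination: a dominating P would need P(¬ω) ≥ 99/100 for all four ω, which would push the sum to at least 3 + 96/100.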

Note for later reference that for all n ≥ 3 and ω ∈ O₄, {¬ω} ∪ {ν ∈ Ω_n : ν ⊨ ω} is a partition. So, $\sum_{\substack{ν \in Ω_{n} \\ ν ⊨ ω}} B(ν) \leq \frac{1}{100}$ has to hold. Hence, $\sum_{ν \in Ω_{n}} B(ν) \leq \frac{4}{100}$.

Thus far we have considered partitions of sentences. We shall also need to consider partitions of propositions:

Definition 4 (Partitions of propositions). Let Π_n be the set of partitions on Ω_n. As in Section 2, we take {Ω_n} and {Ω_n, ∅} to be partitions and we suppose that there is no further partition containing ∅.

We then define the set of partitions: $Π := \bigcup_{n=1}^{\infty} Π_{n}$.

We use πⁿ to denote the partition of n-states {{ω} : ω ∈ Ω_n}.

Note that F₁ := {ω ∈ Ω₁ : ω ⊨ U₁t₁} and F₂ := {ω ∈ Ω₂ : ω ⊨ U₁t₁} are different propositions, where U₁ is a unary predicate symbol. F₁ is a member of {F₁, $\bar{F}_{1}$} ∈ Π₁ and F₂ is a member of {F₂, $\bar{F}_{2}$} ∈ Π₂, but not vice versa. So {F₁, $\bar{F}_{1}$} and {F₂, $\bar{F}_{2}$} are different partitions, even if these partitions are intuitively equivalent.

3.3. Application to Inductive Logic

We shall be particularly interested in the use of objective Bayesianism over predicate languages to provide semantics for inductive logic.

Inductive logic typically seeks to answer questions of the following form [9] (§1.1):

$φ_{1}^{X_{1}}, \dots, φ_{k}^{X_{k}} \mathrel{|\!\approx} ψ^{?}$

This asks, if premiss sentences φ₁, …, φ_k of ℒ have probabilities in sets X₁, …, X_k ⊆ [0, 1] respectively, which probability or set of probabilities should attach to the conclusion sentence ψ?

The answer to this question will depend on the semantics given to the inductive entailment relation |≈ [9] (Part I). One natural option is to give the entailment relation objective Bayesian semantics, denoted by |≈°. Here the premisses are construed as statements about chance, i.e., P^*(φ₁) ∈ X₁, …, P^*(φ_k) ∈ X_k, and the question concerns rational belief: if one’s total evidence is captured by the premisses, to what extent should one believe the conclusion sentence ψ? Applying the norms of objective Bayesianism, $φ_{1}^{X_{1}}, \dots, φ_{k}^{X_{k}} \mathrel{|\!\approx}° ψ^{Y}$ holds just in case P (ψ) ∈ Y for every $P \in E_{ℒ}$ that has maximal entropy, where

$E_{ℒ} = 〈φ_{1}^{X_{1}}, \dots, φ_{k}^{X_{k}}〉 := 〈\{P^{*} \in ℙ_{ℒ} : P^{*}(φ_{1}) \in X_{1}, \dots, P^{*}(φ_{k}) \in X_{k}\}〉.$

This application of objective Bayesian epistemology to inductive logic is an example in which $E_{ℒ}$ is generated by constraints involving only sentences of some finite sublanguage $ℒ_{n}$. We will be particularly interested in the case where φ₁, …, φ_k are quantifier-free sentences, i.e., sentences of $ℒ_{n}^{∄}$ for some n.
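A very small instance of such an inductive inference can be worked out exactly. The sketch below assumes a hypothetical language with a single unary relation symbol U and two constants, so that the atomic sentences are A1 = Ut₁ and A2 = Ut₂, and a single point-valued premiss (Ut₁ ∨ Ut₂)^{0.8}. For one constraint of this kind, standard entropy on the finite sublanguage is maximised by spreading the probability evenly over the states that satisfy the premiss and evenly over those that do not. Richer premiss sets generally require numerical optimisation, which is not attempted here.

```python
from fractions import Fraction
from itertools import product

atoms = ("A1", "A2")  # A1 = U t1, A2 = U t2 in the assumed language
states = [dict(zip(atoms, vals)) for vals in product([True, False], repeat=len(atoms))]

def maxent_given(premise, x):
    """Maximum-entropy state probabilities subject to P(premise) = x.

    Assumes the premise is true in some states and false in others."""
    inside = [s for s in states if premise(s)]
    outside = [s for s in states if not premise(s)]
    probs = []
    for s in states:
        if premise(s):
            probs.append(Fraction(x) / len(inside))
        else:
            probs.append((1 - Fraction(x)) / len(outside))
    return probs

def probability(probs, sentence):
    """Probability of a quantifier-free sentence: sum over the states satisfying it."""
    return sum(p for s, p in zip(states, probs) if sentence(s))

premise = lambda s: s["A1"] or s["A2"]
P = maxent_given(premise, Fraction(4, 5))
answer = probability(P, lambda s: s["A1"])  # degree of belief in U t1
```

Under these assumptions the conclusion Ut₁ receives probability 8/15: two of the three states satisfying the premiss make Ut₁ true, and each carries probability 4/15.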

Let $ℙ_{ℒ_{n}^{∄}}$ be the set of probability functions on $ℒ_{n}^{∄}$, and let

$E_{n} := \{P_{n} \in ℙ_{ℒ_{n}^{∄}} : P_{n} = P_{⇂n}, P \in E_{ℒ}\}$

where $P_{⇂n}$ is the restriction of P to $Sℒ_{n}^{∄}$. Note that

$P_{⇂n}(θ) := \sum_{\substack{ω \in Ω_{n} \\ ω ⊨ θ}} P(ω)$

for all $θ \in Sℒ_{n}^{∄}$.
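Computationally, restricting a probability function to a smaller sublanguage amounts to marginalising over the atomic sentences that the smaller language cannot express. The sketch below assumes states are represented as tuples of truth values in the agreed ordering of atomic sentences, so that a state of the smaller language is a prefix of a state of the larger one. The product measure used as input is purely illustrative.

```python
from fractions import Fraction
from itertools import product

def product_measure(qs):
    """Hypothetical P on states of len(qs) atoms: atom i is true with probability qs[i]."""
    P = {}
    for state in product([True, False], repeat=len(qs)):
        p = Fraction(1)
        for value, q in zip(state, qs):
            p *= q if value else 1 - q
        P[state] = p
    return P

def restrict(P, r_n):
    """Restriction to the first r_n atoms: sum over all states extending each prefix."""
    out = {}
    for state, p in P.items():
        out[state[:r_n]] = out.get(state[:r_n], 0) + p
    return out

def prob_of(P, theta):
    """Probability of a quantifier-free sentence theta, given as a predicate on states."""
    return sum(p for state, p in P.items() if theta(state))

P_m = product_measure([Fraction(1, 2), Fraction(1, 4), Fraction(1, 3)])
P_n = restrict(P_m, 2)
theta = lambda s: s[0] or s[1]  # A1 ∨ A2, expressible in the smaller language
```

A sentence of the smaller language receives the same probability whether it is evaluated with `P_m` or with its restriction `P_n`, which is the sense in which the restriction agrees with P on $Sℒ_{n}^{∄}$.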

To ease the reading we also let $ℙ_{n} := \{P_{n} \in ℙ_{ℒ_{n}^{∄}}\}$.

Definition 5 (Finitely generated evidence set). $E_{ℒ}$ is finitely generated if it takes the form $E_{ℒ} = \{P \in ℙ_{ℒ} : P_{⇂n} \in E_{n}\}$ for some n ∈ ℕ, where $E_{n} \subseteq ℙ_{ℒ_{n}^{∄}}$. Thus, $E_{ℒ}$ is generated by constraints involving only some $φ_{1}, φ_{2}, \dots \in Sℒ_{n}^{∄}$ and no other sentences.

From now on, for finitely generated $E_{ℒ}$, the letter K is used to denote the smallest number n such that $E_{ℒ}$ is generated by constraints on $ℒ_{n}^{∄}$.

Note that an evidence set $E_{ℒ}$ which is not finitely generated may not be recapturable from $\{E_{1}, E_{2}, \dots\}$. For instance, for

$E_{ℒ} = \{P \in ℙ_{ℒ} : \lim_{n \to \infty} P(\bigwedge_{i=1}^{n} U t_{i}) = 0\}$

the following two facts hold simultaneously:

$E_{ℒ} \subset ℙ_{ℒ}$
$E_{n} = ℙ_{n}$ for all n ∈ ℕ.

4. Quantifier-Free Languages

We would like to develop an analogue of Theorem 2 for beliefs defined over the sentences of a predicate language: we would like to show that belief functions which minimise worst-case expected loss are probability functions in E that maximise entropy. The main difficulty in moving from the finite domain of propositions to countably many sentences of a predicate language is to ensure that worst-case expected loss is finite where possible, so that these losses can be compared and a belief function can be chosen that minimises worst-case expected loss. For this reason we proceed in two steps. First, in this section, we shall consider the case in which the predicate language has no quantifier symbol, i.e., $ℒ = ℒ^{∄}$; comparing worst-case expected loss is more straightforward in this case. Then, in Section 5, we shall examine how far our approach can be extended to handle predicate languages with quantifiers.

First, in Section 4.1 we define the notion of a weighting function. This allows us to define and analyse the concept of entropy of a probability function on $ℒ = ℒ^{∄}$ in Section 4.2. In Section 4.3 we introduce the idea of the loss profile of a belief function. Finally in Section 4.4 we show that, in various natural scenarios, the belief function that has the best loss profile is the probability function, from all those calibrated with evidence, that has maximal standard entropy.

4.1. Weighting Functions

Definition 6 (Weighting function). A weighting function on

ℒ_{n}

, g_n : Π_n → ℝ_≥₀, maps partitions π ∈ Π_n to non-negative real numbers. A weighting function on

ℒ

,

g^{ℒ}

: Π → ℝ_≥₀, is defined over partitions of propositions of all finite sublanguages. A weighting function on

ℒ

can be thought of as a family of weighting functions g_n on

ℒ_{n}

, where n ranges over the natural numbers. Given a fixed weighting function

g^{ℒ}

on

ℒ

, we shall take

g_{n}^{ℒ} : = g^{ℒ}_{⇂ Π_{n}}

for each n ∈ ℕ. A (general) weighting function g is taken to be defined over each predicate language

ℒ = ℒ^{∄}

. Different languages

ℒ = ℒ^{∄}

,

ℒ^{'} = {ℒ^{'}}^{∄}

have different sets of relation symbols.

A weighting function g is atomic if for each

ℒ

and each n, g_n depends only on the number of atomic propositions in

ℒ_{n}

, not on the structure of those atomic propositions. Thus if

ℒ

and

ℒ^{'}

are such that

ℒ_{n}

and

{ℒ^{'}}_{m}

have the same number of atomic propositions, then

g_{n}^{ℒ} = g_{m}^{ℒ^{'}}

. In this paper we shall suppose that all weighting functions are atomic; hence there will be no need to superscript a weighting function on

ℒ

or

ℒ_{n}

by the particular language

ℒ

.

We call g inclusive, if and only if it attaches positive weight to each proposition, i.e., if and only if for all n and all F ⊆ Ω_n it holds that

\sum_{\begin{matrix} π \in Π_{n} \\ F \in π \end{matrix}} g (π) > 0 .

As in Section 2, g is symmetric if for each n it is invariant under permutations of the states of

ℒ_{n}

. It is refined if for each n it gives no less weight to a refinement π′ ∈ Π_n of a partition π ∈ Π_n than to π itself. For example, the partition weighting g_Π gives weight 1 to each partition, g_Π(π) = 1 for all π ∈ Π. The proposition weighting

g_{𝒫 Ω}

gives weight 1 to each partition of size 2 and weight 0 to all other partitions; this amounts to giving weight 1 to each proposition. The standard weighting g_Ω gives weight 1 to the partition πⁿ of n-states, for each n, and weight 0 to all other partitions. These weighting functions are all symmetric. The partition and proposition weightings are inclusive, but the standard weighting is not. The partition and standard weightings are refined, but the proposition weighting is not.
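The three example weightings can be made concrete on a very small state space. The following Python sketch is an illustration only, under assumptions that are not in the text: the state labels `w1` to `w4` and all helper names are hypothetical, and partitions are taken to consist of non-empty blocks, so inclusiveness is checked only for non-empty proper subsets F of the state space.

```python
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or give it a new block of its own.
        yield [[first]] + smaller


# Hypothetical four-state space (e.g. two atomic propositions).
OMEGA = ["w1", "w2", "w3", "w4"]
PARTITIONS = [frozenset(frozenset(b) for b in p) for p in set_partitions(OMEGA)]
STATE_PARTITION = frozenset(frozenset([w]) for w in OMEGA)


def g_partition(pi):
    """Partition weighting: weight 1 for every partition."""
    return 1.0


def g_proposition(pi):
    """Proposition weighting: weight 1 for partitions of size 2."""
    return 1.0 if len(pi) == 2 else 0.0


def g_standard(pi):
    """Standard weighting: weight 1 for the partition of states only."""
    return 1.0 if pi == STATE_PARTITION else 0.0


def proper_subsets(states):
    """Yield all non-empty proper subsets of `states` as frozensets."""
    for k in range(1, len(states)):
        for combo in combinations(states, k):
            yield frozenset(combo)


def is_inclusive(g):
    """Check that every non-empty proper subset receives positive total weight."""
    return all(
        sum(g(pi) for pi in PARTITIONS if F in pi) > 0
        for F in proper_subsets(OMEGA)
    )


for name, g in [("partition", g_partition),
                ("proposition", g_proposition),
                ("standard", g_standard)]:
    print(name, "inclusive:", is_inclusive(g))
```

On this toy space the check agrees with the classification in the text: the partition and proposition weightings pass, while the standard weighting fails because a two-state proposition belongs to no partition of positive weight.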

Definition 7 (Strongly refined weighting function). g is strongly refined if and only if it satisfies the following properties:

g is refined: in each finite sublanguage, if partition π′ is a refinement of partition π, then g(π′) ≥ g(π).
Each finite sublanguage receives the same total weight: for all n, $\sum_{π \in Π_{n}} g (π)$ is constant.
A state partition on a richer language should not receive less weight than one on a less rich language: if m < n then g(π^m) ≤ g(πⁿ).
Non-state-partitions receive finite total weight: the following limit exists (i.e., is finite),

\lim_{k \to \infty} \sum_{n = 1}^{k} \sum_{π \in Π_{n} \setminus \{π^{n}\}} g (π) .

Throughout this paper we will be particularly interested in the following weighting functions:

Definition 8 (Regular weighting function). g is regular if it is atomic, inclusive, symmetric and strongly refined.

4.2. Entropy

Definition 9 (n-entropy). Given a weighting function g and n ∈ ℕ, we define the n-entropy

H_{g}^{n} : ℙ_{ℒ} \to [0, \infty]

by:

H_{g}^{n} (P) : = - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} ° P (F) \log ° P (F) .

(5)

Recall that, for a probability function P (or indeed any belief function that respects logical equivalence) defined on sentences, °P is the function induced by P over the domain of propositions. Note that by our convention, −0 log 0 = 0 = −1 log 1. Thus, for all n ∈ ℕ,

g ({Ω_{n}}) P (Ω_{n}) \log P (Ω_{n}) = 0 = g ({Ω_{n}, \emptyset}) (P (\emptyset) \log P (\emptyset) + P (Ω_{n}) \log P (Ω_{n})) .

In calculating n-entropy we may thus ignore all partitions which contain Ω_n.

Definition 10 (Standard entropy). For the standard weighting g_Ω we denote the corresponding n-entropy by

H_{Ω}^{n}

. We refer to

H_{Ω}^{n}

as standard entropy (on ℒ_n).

H_{Ω}^{n} (P)

is the well-known Shannon Entropy of the n-states of P :

H_{Ω}^{n} (P) = - \sum_{ω \in Ω_{n}} P (ω) \log P (ω) .
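Definition 9 and Definition 10 can be checked numerically on a small example. The sketch below is a minimal illustration, assuming a four-state space with hypothetical state labels and representing a weighting as a Python function of a partition's blocks. It computes the n-entropy by brute-force enumeration of partitions and confirms that the standard weighting reproduces Shannon entropy.

```python
import math


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        yield [[first]] + smaller


def xlogx(x):
    """x * log(x) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def g_entropy(P, g):
    """n-entropy of P (a dict state -> probability) for weighting g.

    g takes (blocks, number_of_states) and returns a non-negative weight.
    """
    states = list(P)
    total = 0.0
    for blocks in set_partitions(states):
        weight = g(blocks, len(states))
        if weight == 0:
            continue
        # Probability of a block is the sum of the probabilities of its states.
        total -= weight * sum(xlogx(sum(P[w] for w in block)) for block in blocks)
    return total


def g_standard(blocks, n_states):
    """Weight 1 on the partition into single states, 0 elsewhere."""
    return 1.0 if len(blocks) == n_states else 0.0


def g_partition(blocks, n_states):
    """Weight 1 on every partition."""
    return 1.0


def shannon(P):
    """Shannon entropy of the states of P."""
    return -sum(xlogx(p) for p in P.values())


P = {"w1": 0.4, "w2": 0.3, "w3": 0.2, "w4": 0.1}   # hypothetical example
U = {w: 0.25 for w in P}                            # uniform on the four states

print(g_entropy(P, g_standard), shannon(P))
print(g_entropy(U, g_partition), g_entropy(P, g_partition))
```

The second printed line compares the uniform function with `P` under the partition weighting; since that weighting is symmetric and inclusive, the uniform function should score higher.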

For a fixed weighting function g, we say that

P \in ℙ_{ℒ}

has greater entropy than

Q \in ℙ_{ℒ}

, written P ≫ Q, if the n-entropy of P eventually dominates that of Q, i.e., if there is some N ∈ ℕ such that for all n ≥ N,

H_{g}^{n} (P) > H_{g}^{n} (Q)

.

This relation ≫ for comparing entropy is preferable to an alternative notion posed in terms of the limiting behaviour of the n-entropy of P and Q, which says that P has greater entropy than Q just when

\lim_{n \to \infty} H_{g}^{n} (P) > \lim_{n \to \infty} H_{g}^{n} (Q)

. This is because the limiting behaviour is not fine-grained enough to distinguish greater from lesser entropy: n-entropy will often tend to infinity for both P and Q, and, even where the limiting n-entropy of P and Q are both finite, these limits may be equal even though the entropy of P is intuitively greater than that of Q, insofar as the n-entropy of P eventually dominates that of Q. See Williamson [1] (§5.5) for further discussion of these comparative notions of entropy.
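The difference between eventual dominance and a comparison of limits can be seen in a small numerical sketch. The example assumes a language with a single unary predicate, so that the number of n-states doubles with each n, and it assumes both functions spread probability evenly beyond the first atomic proposition. The helper names are hypothetical, and the check covers only a finite horizon, so it illustrates the relation rather than proving it.

```python
import math


def binary_entropy(p):
    """Shannon entropy of a two-point distribution (p, 1 - p)."""
    return -sum(x * math.log(x) for x in (p, 1 - p) if x > 0)


def h_equivocator(n):
    """Standard n-entropy of the equivocator when there are 2**n n-states."""
    return n * math.log(2)


def h_skewed(n):
    """Standard n-entropy of a function giving the first atom probability 0.8
    and spreading probability evenly over all further atoms."""
    return binary_entropy(0.8) + (n - 1) * math.log(2)


def first_dominance_index(h_p, h_q, horizon):
    """Smallest N with h_p(n) > h_q(n) for all N <= n <= horizon, else None."""
    candidate = None
    for n in range(horizon, 0, -1):
        if h_p(n) > h_q(n):
            candidate = n
        else:
            break
    return candidate


print(first_dominance_index(h_equivocator, h_skewed, 50))
print(first_dominance_index(h_skewed, h_equivocator, 50))
```

Both entropy sequences grow without bound, so a comparison of limits cannot separate them, yet the first dominates the second at every n up to the horizon.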

We will be particularly interested in the probability functions in [

E_{ℒ}

] with maximal entropy:

maxent E_{ℒ} : = \{P \in [E_{ℒ}] : \text{there is no } Q \in [E_{ℒ}] \text{ such that } Q ≫ P\} .

We shall also consider entropy maximisers on finite sublanguages. We shall use the notation:

ℙ_{n}^{†} : = \arg \sup_{P \in E_{n}} H_{g}^{n} (P) .

(The members of this set are defined only on the sentences of

ℒ_{n}

, not on the sentences of the language

ℒ

as a whole.) Note that for convex

E_{ℒ}

,

E_{n}

is convex for all n ∈ ℕ and that

H_{g}^{n}

is a strictly concave function on

E_{n}

for inclusive g. If g is inclusive, then

H_{g}^{n}

is strictly concave on

ℙ_{n}

. Hence

ℙ_{n}^{†}

contains a unique element, which we will denote by

P_{n}^{†}

.

Let us consider the set of limit points of the entropy maximisers on finite sublanguages:

Definition 11 (Entropy limit). A probability function is a limit point of the entropy maximisers on finite sublanguages if it is arbitrarily close to infinitely many such maximisers. We denote the set of such limit points by:

ℙ^{†} : = {P \in ℙ_{ℒ} : \forall ϵ > 0, \exists infinite I \subseteq ℕ, \forall n \in I, \forall φ \in S ℒ_{n}, | P (φ) - P_{n}^{†} (φ) | < ϵ} .

Whenever ℙ^† consists only of a single function we shall denote that function by P^† and refer to P^† as the entropy limit.

One important desideratum for a procedure for choosing a rational belief function, particularly in the context of inductive logic, is language invariance. We shall consider two notions of language invariance: the following notion defined in terms of finite sublanguages, and a second form of language invariance, introduced in Definition 23, which we term infinite-language invariance.

Definition 12 (Finite-language invariant weighting function). A weighting function g : Π → ℝ_≥₀ is finite-language invariant, if and only if the following holds: for all

E_{ℒ}

finitely generated by constraints on

ℒ_{K}

, if

ℒ_{n}

and

ℒ_{m}

are such that

ℒ_{K} \subseteq ℒ_{n} \subseteq ℒ_{m}

, then for all

Q \in \arg \sup_{P \in E_{ℒ}} H_{g}^{n} (P)

there exists some

R \in \arg \sup_{P \in E_{ℒ}} H_{g}^{m} (P)

such that Q_{⇂ n} = R_{⇂ n}.

4.2.1. The Standard Entropy Limit

Standard entropy, i.e., entropy with respect to the standard weighting g_Ω, is the subject of a substantial literature. We here collect the features of standard entropy most relevant for our purposes.

Firstly, g_Ω is finite-language invariant; see, e.g., [7]. If

E_{ℒ}

is finitely generated and g = g_Ω, then

ℙ_{n}^{†}

contains a unique element. Furthermore, there exists a unique function P ∈ [

E_{ℒ}

] such that for all n ≥ K, P_{⇂ n} ∈

ℙ_{n}^{†}

holds. This function P is the entropy limit with respect to the standard weighting g_Ω; it will be called the standard entropy limit and denoted by

P_{Ω}^{†}

. Henceforth we use

P_{Ω}^{†}

to denote the standard entropy limit on

ℒ

, rather than on Ω as in Section 2.

Definition 13 (Open-minded belief function). We say that a belief function

B \in B_{ℒ}

is open-minded on

ℒ^{'} \subseteq ℒ

, if and only if for all

φ \in S ℒ^{'}

for which there exists some

P \in [E_{ℒ}]

such that P (φ) > 0 it holds that B(φ) > 0. For

ℒ^{'} = ℒ

we say that the belief function

B \in B_{ℒ}

is open-minded.

The following proposition lists further important properties of

P_{Ω}^{†}

which we shall make frequent use of in the following; see [7] (p. 95) for a proof of the first property.

Proposition 5.

P_{Ω}^{†}

satisfies the following properties:

$P_{Ω}^{†}$ is open-minded.
For a finitely generated $E_{ℒ}$ , for all n ≥ K and all ν ∈ Ω_n, ω ∈ Ω_K with ν ⊨ ω it holds that $P_{Ω}^{†} (ν) = P_{Ω}^{†} (ω) \frac{| Ω_{K} |}{| Ω_{n} |}$ .

The second property will follow from Proposition 9 and from the fact that g_Ω is language invariant. Let ν be a consistent conjunction of pairwise different literals such that ν ⊨ ω for some n-state ω with n ≥ K. Denoting by |ν|, |ω| the number of literals in ν, respectively, ω, it follows from the second property in Proposition 5 that

P_{Ω}^{†} (ν) = P_{Ω}^{†} (ω) \frac{2^{| ω |}}{2^{| ν |}}

.

4.2.2. General Entropies

The question remains as to how the functions on

ℒ

with maximal entropy, i.e., the members of maxent

E_{ℒ}

, relate to the entropy maximisers

P_{n}^{†} \in ℙ_{n}^{†}

on the finite sublanguages

ℒ_{n}

. We shall explore this question here.

Proposition 6.

ℙ^{†} \subseteq [E_{ℒ}]

.

Proof. Let P ^† ∈ ℙ^†. Thus, for all sentences

φ \in S ℒ

, P^†(φ) is the limit of a sequence

{(P_{n}^{†} (φ))}_{n \in I}

such that

P_{n}^{†} \in [E_{n}]

and I ⊆ ℕ is infinite. Since [

E_{ℒ}

] and all the [

E_{n}

] are closed, P ^† ∈ [

E_{ℒ}

].

Of particular interest is the most equivocal probability function of

ℙ_{ℒ}

, which is called the equivocator and denoted by P=. P= is uniquely defined by the requirement that for all n ∈ ℕ it assigns all n-states ω ∈ Ω_n the same probability,

P_{=} (ω) = \frac{1}{| Ω_{n} |} .

The restriction of P= to the sentences of ℒ_n is denoted by P_{= ⇂ n}.
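Because the equivocator gives every n-state the same probability, the probability it assigns to a quantifier-free sentence is the proportion of n-states that satisfy the sentence. The Python sketch below illustrates this under assumptions made here: states are modelled as truth-value tuples over a given number of atomic propositions, and a sentence is represented by a hypothetical Python predicate on such tuples.

```python
from fractions import Fraction
from itertools import product


def n_states(num_atoms):
    """All truth assignments to `num_atoms` atomic propositions."""
    return list(product([False, True], repeat=num_atoms))


def equivocator_prob(sentence, num_atoms):
    """Probability the equivocator gives to `sentence`, computed on the
    sublanguage with `num_atoms` atomic propositions."""
    states = n_states(num_atoms)
    satisfying = sum(1 for s in states if sentence(s))
    return Fraction(satisfying, len(states))


def phi(state):
    """Hypothetical sentence: the first or the second atomic proposition holds."""
    return state[0] or state[1]


print(equivocator_prob(phi, 2))  # evaluated on the smallest sublanguage containing phi
print(equivocator_prob(phi, 3))  # evaluated on a richer sublanguage
```

The value is the same on both sublanguages, as it must be for a single probability function defined on the whole language.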

In certain cases ℙ^† will only contain a single limit point P^†.

Definition 14. [4] (Definition 16, p. 3573.) A weighting function g_n on

ℒ_{n}

is called equivocator-preserving, if and only if

ℙ_{n}^{†} = {Q_{⇂ n} : Q \in \arg \sup_{P \in ℙ_{ℒ}} H_{g}^{n} (P)} = {P_{= ⇂ n}} .

g is called equivocator-preserving, if and only if g_n is equivocator-preserving for all n ∈ ℕ.

Proposition 7. If P= ∈ [

E_{ℒ}

] and if g is symmetric and inclusive, then ℙ^† = {P=}.

Proof. By Landes and Williamson [4] (Corollary 6, p. 3574) we have

\arg \sup_{P \in E_{ℒ}} H_{g}^{n} (P) = {P \in ℙ_{ℒ} : P_{⇂ n} = P_{= ⇂ n}} .

It follows that

\lim_{n \to \infty} \arg \sup_{P \in E_{ℒ}} H_{g}^{n} (P) = {P_{=}}

and hence ℙ^† = {P=}. □

So, if g is symmetric and inclusive, then g is equivocator-preserving. In Appendix B we shall show that there exist non-symmetric g which are equivocator-preserving.

Definition 15 (State-inclusive weighting function). Given

ℒ

, we call a weighting function g : Π → [0, 1] state-inclusive on

ℒ_{n}

, if and only if for each state ω ∈ Ω_n there exists a π ∈ Π_n such that {ω} ∈ π and g(π) > 0. A weighting function g : Π → [0, 1] is state-inclusive, if and only if it is state-inclusive on each

ℒ_{n}

. It is eventually state-inclusive, if and only if there exists a J ∈ ℕ such that for all n ≥ J, g is state-inclusive on

ℒ_{n}

.

For example, if g(πⁿ) > 0 for all n ∈ ℕ, then g is state-inclusive. Moreover, inclusive implies state-inclusive.

Lemma 1. If g is state-inclusive on

ℒ_{n}

, then

H_{g}^{n}

is strictly concave on ℙ_n.

Proof. Let P, Q ∈ ℙ_n be different and λ ∈ (0, 1). Since for all π ∈ Π_n we have

\sum_{F \in π} ° P (F) = 1 = \sum_{F \in π} ° Q (F)

we find using the strict concavity of −x · log x on [0, 1]

\begin{array}{l} H_{g}^{n} (λ P + (1 - λ) Q) = \sum_{π \in Π_{n}} - g (π) \sum_{F \in π} (λ ° P (F) + (1 - λ) ° Q (F)) \cdot \log (λ ° P (F) + (1 - λ) ° Q (F)) \\ \geq \sum_{π \in Π_{n}} - g (π) \sum_{F \in π} (λ ° P (F) \log ° P (F) + (1 - λ) ° Q (F) \log ° Q (F)) \\ = λ H_{g}^{n} (P) + (1 - λ) H_{g}^{n} (Q) . \end{array}

The inequality is strict, if and only if there exists some π ∈ Π_n with g(π) > 0 such that there is some F ∈ π with °P (F ) ≠ °Q(F). Since P, Q are different probability functions, there exists some ω ∈ Ω_n such that P (ω) ≠ Q(ω). Since g is state-inclusive, g(π) > 0 for some π ∈ Π_n with {ω} ∈ π. Hence, the inequality is strict. □
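Lemma 1 can be spot-checked numerically for the standard weighting, where n-entropy reduces to Shannon entropy on the n-states. The sketch below uses two hypothetical probability functions on four states and a hypothetical mixing coefficient; it is a sanity check of the concavity claim on one example, not a substitute for the proof.

```python
import math


def shannon(p):
    """Shannon entropy of a list of state probabilities."""
    return -sum(x * math.log(x) for x in p if x > 0)


def mix(p, q, lam):
    """Convex combination lam * p + (1 - lam) * q, taken state by state."""
    return [lam * a + (1 - lam) * b for a, b in zip(p, q)]


P = [0.7, 0.1, 0.1, 0.1]   # hypothetical probability function on four states
Q = [0.1, 0.2, 0.3, 0.4]   # a different probability function
lam = 0.3

entropy_of_mixture = shannon(mix(P, Q, lam))
mixture_of_entropies = lam * shannon(P) + (1 - lam) * shannon(Q)
print(entropy_of_mixture, mixture_of_entropies)
```

For distinct `P` and `Q` the entropy of the mixture should exceed the mixture of the entropies, which is what strict concavity asserts.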

Proposition 8. If

E_{ℒ}

is finitely generated, and g is eventually state-inclusive and language invariant, then ℙ^† consists of a single probability function P^† and for all

φ \in S ℒ

it holds that

\lim_{n \to \infty} P_{n}^{†} (φ) = P^{†} (φ)

.

Proof. Recall that

E_{ℒ}

is expressible by constraints in

ℒ_{K}

and let J be as in Definition 15. Let n ≥ max{J, K}.

By the above Lemma 1,

H_{g}^{n}

is strictly concave on ℙ_n. Since

E_{n}

is convex,

\arg \sup_{P \in E_{n}} H_{g}^{n} (P)

contains a single element. Hence, Q, R ∈

\arg \sup_{P \in E_{ℒ}} H_{g}^{n} (P)

agree on

S ℒ_{n}

.

Since g is language invariant, we have

\arg \sup_{P \in E_{ℒ}} H_{g}^{m} (P)

\subseteq \arg \sup_{P \in E_{ℒ}} H_{g}^{l} (P)

for all n ≤ l ≤ m.

For all

φ \in S ℒ

, there exists an s ∈ ℕ such that

φ \in S ℒ_{s}

. Hence, for l, m ≥ max{J, K,s} it holds that for

R \in \arg \sup_{P \in E_{ℒ}} H_{g}^{m} (P)

and

Q \in \arg \sup_{P \in E_{ℒ}} H_{g}^{l} (P)

that R(φ) = Q(φ). □

For instance, standard entropy [4] (Equation 80), the substate weighting and other examples generated by Landes and Williamson [4] (Lemma 8) are eventually state-inclusive and language invariant. Note that these weighting functions are not inclusive.

Definition 16. We say that H_g is strictly concave, if and only if for all n ∈ ℕ,

H_{g}^{n}

is strictly concave on ℙ_n.

Proposition 9 (Equivocation beyond

ℒ_{n}

). Let

E_{ℒ}

be finitely generated and let g be symmetric. If H_g is strictly concave, then for all n ≥ K and all ν, μ ∈ Ω_n such that there exists an ω ∈ Ω_K with ν ⊨ ω and μ ⊨ ω it holds that

P_{n}^{†} (ν) = P_{n}^{†} (μ) = P_{n}^{†} (ω) \cdot \frac{| Ω_{K} |}{| Ω_{n} |}

for all

P_{n}^{†} \in ℙ_{n}^{†}

.

We call such ν, μ ∈ Ω_n extensions of ω ∈ Ω_K and say that

P_{n}^{†}

equivocates beyond

ℒ_{K}

. In particular,

P_{n}^{†}

equivocates beyond

ℒ_{K}

up to

ℒ_{n}

.

Proof. Let n > K and let P ∈ [

E_{ℒ}

] be such that there exist ν, μ ∈ Ω_n with P (ν) ≠ P(μ) such that there exists an ω ∈ Ω_K with ν ⊨ ω and μ ⊨ ω. Assume for contradiction that

P \in \arg \sup_{R \in E_{ℒ}} H_{g}^{n} (R)

.

Now define a probability function

Q \in ℙ_{ℒ}

by first specifying Q on the n-states. Let

\begin{array}{l} Q (ν) : = P (μ) \\ Q (μ) : = P (ν) \\ Q (η) : = P (η) \text{ for all } η \in Ω_{n} \setminus \{ν, μ\} . \end{array}

For a λ ∈ Ω_r with r ≥ n we let

Q (λ) : = Q (ξ) \frac{| Ω_{n} |}{| Ω_{r} |}

where ξ ∈ Ω_n is the unique n-state such that λ ⊨ ξ.

By construction, Q and P agree on

S ℒ_{K}

. Since

E_{ℒ}

is finitely generated, it follows that Q ∈ [

E_{ℒ}

]. Furthermore, Q_{⇂ n} can be obtained from P_{⇂ n} by a renaming of n-states and it holds that Q_{⇂ n} ≠ P_{⇂ n}. Since g_n is symmetric it holds that

H_{g}^{n} (P) = H_{g}^{n} (Q)

. Since [

E_{ℒ}

] is convex and

H_{g}^{n}

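is strictly concave, the mixture ½ P_{⇂ n} + ½ Q_{⇂ n} has strictly greater n-entropy than both P_{⇂ n} and Q_{⇂ n}, so neither P_{⇂ n} nor Q_{⇂ n} can maximise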

H_{g}^{n}

over [

E_{n}

].

This contradicts P maximising

H_{g}^{n}

over [

E_{n}

].

Corollary 1. Let

E_{ℒ}

be finitely generated. If

H_{g}^{n}

is strictly concave on ℙ_n for n ≥ K and if g is symmetric, then for n ≥ K the following maximisation problem

\begin{matrix} \text{maximise:} & H_{g}^{n} (P) \\ \text{subject to:} & P \in [E_{ℒ}] \end{matrix}

can be understood as an optimisation problem in the variables P (ω) with ω ∈ Ω_K. In particular, the number of variables does not grow as n tends to infinity.

Proof. Follows immediately from the above proposition by noting that

P_{n}^{†} \in \arg \sup_{P \in E_{ℒ}} H_{g}^{n} (P)

equivocates beyond

ℒ_{K}

up to

ℒ_{n}

.

This corollary shows that in order to compute

P_{n}^{†}

for n ≥ K one needs to solve an optimisation problem on Ω_K. If g is not language invariant, then, in general, the objective function of the optimisation problem changes as n changes. So, in general, (

P_{n}^{†}

)_{⇂ K} varies with n.
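The reduction described by Corollary 1 can be illustrated for the standard weighting with a toy evidence set. The sketch assumes a single unary predicate, K = 1, and the hypothetical constraint that the first atomic proposition has probability 0.8; the function names are introduced here for illustration. Once the probabilities of the K-states are fixed, the maximiser on a larger sublanguage is obtained by spreading each K-state's probability evenly over its extensions.

```python
import math
from itertools import product


def extend_by_equivocation(p_K, k_atoms, n_atoms):
    """Spread each K-state's probability evenly over its extensions,
    i.e. P(nu) = P(omega_nu) * |Omega_K| / |Omega_n|."""
    factor = 2 ** k_atoms / 2 ** n_atoms
    return {nu: p_K[nu[:k_atoms]] * factor
            for nu in product([0, 1], repeat=n_atoms)}


def shannon(p):
    """Shannon entropy of a dict state -> probability."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)


# Hypothetical evidence: the first atomic proposition has probability 0.8.
p_K = {(1,): 0.8, (0,): 0.2}
p_n = extend_by_equivocation(p_K, k_atoms=1, n_atoms=3)

# A competitor that satisfies the same constraint but shifts probability
# between two extensions of the same K-state.
skewed = dict(p_n)
skewed[(1, 0, 0)] += 0.05
skewed[(1, 1, 1)] -= 0.05

print(shannon(p_n), shannon(skewed))
```

Only the two numbers in `p_K` are free variables; the eight probabilities on the larger sublanguage are determined by them, and the even spread has higher standard entropy than the skewed competitor.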

Corollary 2. Under the assumptions of Proposition 9 it holds that for F ⊆ Ω_n and ν, μ ∈ Ω_n,

° P_{n}^{†} (F) = ° P_{n}^{†} (F_{ν, μ})

, where F_{ν,μ} is the result we obtain by replacing ν by μ and vice versa in F.

Proof. For an η ∈ Ω_n denote by ω_η ∈ Ω_K the unique K-state such that η ⊨ ω_η. Now simply note that by Proposition 9

° P_{n}^{†} (F) = \sum_{\begin{matrix} η \in Ω_{n} \\ η \in F \end{matrix}} P_{n}^{†} (η) = \sum_{\begin{matrix} η \in Ω_{n} \\ η \in F \end{matrix}} P_{n}^{†} (ω_{η}) \frac{| Ω_{K} |}{| Ω_{n} |} = \sum_{\begin{matrix} η \in Ω_{n} \\ η \in F_{ν, μ} \end{matrix}} P_{n}^{†} (ω_{η}) \frac{| Ω_{K} |}{| Ω_{n} |} = ° P_{n}^{†} (F_{ν, μ}) .

Corollary 3. Let

E_{ℒ}

be finitely generated. For all n ≥ K and all

P \in ℙ_{ℒ}

equivocating beyond

ℒ_{K}

up to

ℒ_{n}

it holds for all K ≤ k ≤ n − 1 that

H_{Ω}^{k + 1} (P) = H_{Ω}^{k} (P) - \log \frac{| Ω_{k} |}{| Ω_{k + 1} |} .

If g is symmetric and H_g is strictly concave, then

H_{Ω}^{K} (P_{k}^{†}) - H_{Ω}^{K} (P_{Ω}^{†}) = H_{Ω}^{k + 1} (P_{k + 1}^{†}) - H_{Ω}^{k + 1} (P_{Ω}^{†}) .

Proof. For ν ∈ Ω_{k+1} let ω_ν ∈ Ω_k be the unique k-state such that ν ⊨ ω_ν. For K ≤ k ≤ n − 1 we now find for a

P \in ℙ_{ℒ}

equivocating beyond

ℒ_{K}

up to

ℒ_{n}

\begin{array}{l} H_{Ω}^{k + 1} (P) = - \sum_{ν \in Ω_{k + 1}} P (ν) \log P (ν) \\ = - \sum_{ν \in Ω_{k + 1}} P (ν) \log (P (ω_{ν}) \cdot \frac{| Ω_{k} |}{| Ω_{k + 1} |}) \\ = - \log \frac{| Ω_{k} |}{| Ω_{k + 1} |} - \sum_{ν \in Ω_{k + 1}} P (ν) \log P (ω_{ν}) \\ = - \log \frac{| Ω_{k} |}{| Ω_{k + 1} |} - \sum_{ω \in Ω_{k}} \sum_{\begin{matrix} ν \in Ω_{k + 1} \\ ν ⊨ ω \end{matrix}} P (ν) \log P (ω) \\ = - \log \frac{| Ω_{k} |}{| Ω_{k + 1} |} - \sum_{ω \in Ω_{k}} \log P (ω) \cdot (\sum_{\begin{matrix} ν \in Ω_{k + 1} \\ ν ⊨ ω \end{matrix}} P (ν)) \\ = - \log \frac{| Ω_{k} |}{| Ω_{k + 1} |} - \sum_{ω \in Ω_{k}} P (ω) \log P (ω) \\ = - \log \frac{| Ω_{k} |}{| Ω_{k + 1} |} + H_{Ω}^{k} (P) . \end{array}

The second part of the proof follows directly by observing that

P_{Ω}^{†}

and

P_{n}^{†}

equivocate beyond

ℒ_{K}

up to

ℒ_{n}

by Proposition 9. □
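The first identity of Corollary 3 lends itself to a quick numerical check. The sketch assumes one new atomic proposition per step, so that the number of states doubles from k to k + 1 and the correction term equals log 2; the probabilities on the K-states and the helper names are hypothetical.

```python
import math
from itertools import product


def equivocating(p_K, k_atoms, n_atoms):
    """Probability function on n-states that equivocates beyond the K-states."""
    factor = 2 ** k_atoms / 2 ** n_atoms
    return {nu: p_K[nu[:k_atoms]] * factor
            for nu in product([0, 1], repeat=n_atoms)}


def shannon(p):
    """Shannon entropy of a dict state -> probability."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)


K = 2
# Hypothetical probabilities on the four K-states.
p_K = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Standard entropy on the sublanguages with K, K+1, ..., 5 atomic propositions.
entropies = {k: shannon(equivocating(p_K, K, k)) for k in range(K, 6)}

# Corollary 3 predicts each step equals -log(|Omega_k| / |Omega_{k+1}|) = log 2.
steps = [entropies[k + 1] - entropies[k] for k in range(K, 5)]
print(steps, math.log(2))
```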

Corollary 4. Let

E_{ℒ}

be finitely generated. For all n ≥ K and all

P \in ℙ_{ℒ}

not equivocating beyond

ℒ_{K}

up to

ℒ_{n}

it holds that

H_{Ω}^{n} (P) < H_{Ω}^{K} (P) - \log \frac{| Ω_{K} |}{| Ω_{n} |}

.

Proof. There has to exist at least one ξ ∈ Ω_K such that there exist ν, λ ∈ Ω_n with ν ⊨ ξ and λ ⊨ ξ such that P (ν) ≠ P (λ). Since P is a probability function it holds that

P (ξ) = \sum_{\begin{matrix} ν \in Ω_{n} \\ ν ⊨ ξ \end{matrix}} P (ν)

. We thus find using the log-sum inequality (see, e.g., Theorem 2.7.1 in [10])

\begin{array}{l} - P (ξ) \log (\frac{| Ω_{K} |}{| Ω_{n} |}) - P (ξ) \log P (ξ) = - P (ξ) \log (\frac{| Ω_{K} |}{| Ω_{n} |} P (ξ)) \\ = - \sum_{\begin{matrix} ν \in Ω_{n} \\ ν ⊨ ξ \end{matrix}} (\frac{| Ω_{K} |}{| Ω_{n} |} P (ξ)) \log (\frac{| Ω_{K} |}{| Ω_{n} |} P (ξ)) \\ > - \sum_{\begin{matrix} ν \in Ω_{n} \\ ν ⊨ ξ \end{matrix}} P (ν) \log P (ν) . \end{array}

If ξ ∈ Ω_K is such that for all ν, λ ∈ Ω_n with ν⊨ ξ and λ⊨ ξ it holds that P (ν) = P (λ), then the above calculation holds with the exception that the inequality is in fact an equality.

We hence find by summing over all ω ∈ Ω_K

\begin{array}{l} H_{Ω}^{K} (P) - \log (\frac{| Ω_{K} |}{| Ω_{n} |}) = \sum_{ω \in Ω_{K}} - P (ω) (\log (\frac{| Ω_{K} |}{| Ω_{n} |}) + \log P (ω)) \\ > \sum_{ω \in Ω_{K}} \sum_{\begin{matrix} ν \in Ω_{n} \\ ν ⊨ ω \end{matrix}} - P (ν) \log P (ν) \\ = \sum_{ν \in Ω_{n}} - P (ν) \log P (ν) \\ = H_{Ω}^{n} (P) . \end{array}
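The strict inequality of Corollary 4 can be observed on a small example. The sketch assumes K = 1 and n = 3 with one atomic proposition per level, and uses a hypothetical probability function that distributes probability unevenly among the extensions of one K-state; `marginal` is a helper introduced here to recover the probabilities of the K-states.

```python
import math
from itertools import product


def shannon(p):
    """Shannon entropy of a dict state -> probability."""
    return -sum(x * math.log(x) for x in p.values() if x > 0)


def marginal(p_n, k_atoms):
    """Probabilities of the K-states, obtained by summing over extensions."""
    out = {}
    for nu, prob in p_n.items():
        out[nu[:k_atoms]] = out.get(nu[:k_atoms], 0.0) + prob
    return out


K_ATOMS, N_ATOMS = 1, 3

# Hypothetical non-equivocating function: the extensions of the K-state (1,)
# receive unequal probabilities (0.1, 0.1, 0.1 and 0.5).
p_n = {}
for nu in product([0, 1], repeat=N_ATOMS):
    p_n[nu] = 0.05 if nu[0] == 0 else 0.1
p_n[(1, 1, 1)] = 0.5

# Right-hand side of Corollary 4: H^K(P) - log(|Omega_K| / |Omega_n|).
bound = shannon(marginal(p_n, K_ATOMS)) - math.log(2 ** K_ATOMS / 2 ** N_ATOMS)
print(shannon(p_n), bound)
```

The entropy on the richer sublanguage falls strictly below the bound, whereas an equivocating function with the same K-state probabilities would meet it exactly.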

Corollary 5. Let E_ℒ be finitely generated. If g is symmetric and if for all n ≥ K

H_{g}^{n}

is strictly concave on ℙ_n, then

ℙ^{†} \neq \emptyset .

Proof. By Corollary 1,

P_{n}^{†} \in [E_{n}]

is uniquely determined by

P_{n}^{†} (ω)

for ω ∈ Ω_K. That is, we can understand

{(P_{n}^{†})}_{n \in ℕ}

as a sequence taking values in

{[0, 1]}^{| Ω_{K} |} \subset ℝ^{| Ω_{K} |}

and

{[0, 1]}^{| Ω_{K} |}

is compact. Hence, the sequence

{({(P_{n}^{†})}_{⇂ K})}_{n \in ℕ}

has a point of accumulation, Q, with Q ∈ [

E_{ℒ}

]. Let I ⊆ ℕ be infinite such that

\lim_{i \in I, i \to \infty} P_{n_{i}}^{†} (ω) = Q (ω)

for all ω ∈ Ω_K.

Recall that for n > K,

P_{n}^{†}

equivocates beyond

ℒ_{K}

up to

ℒ_{n}

. We now extend Q to a probability function in [

E_{ℒ}

] by defining it on the n-states ν ∈ Ω_n for n > K as follows:

Q (ν) : = \frac{| Ω_{K} |}{| Ω_{n} |} \cdot Q (ω_{ν}) = \frac{| Ω_{K} |}{| Ω_{n} |} \cdot \lim_{i \in I, i \to \infty} P_{n_{i}}^{†} (ω_{ν})

. Hence, Q equivocates beyond

ℒ_{K}

.

Consider some

φ \in S ℒ

. It follows that there is some r ≥ K such that

φ \in S ℒ_{r}

. For ν ∈ Ω_r denote by ω_ν the unique element of Ω_K such that ν ⊨ ω_ν.

We thus find

\begin{array}{l} \lim_{\begin{matrix} i \to \infty \\ i \in I \end{matrix}} P_{n_{i}}^{†} (φ) = \lim_{\begin{matrix} i \to \infty \\ i \in I \end{matrix}} \sum_{\begin{matrix} ν \in Ω_{r} \\ ν ⊨ φ \end{matrix}} P_{n_{i}}^{†} (ν) \\ = \sum_{\begin{matrix} ν \in Ω_{r} \\ ν ⊨ φ \end{matrix}} \lim_{\begin{matrix} i \to \infty \\ i \in I \end{matrix}} P_{n_{i}}^{†} (ν) \\ = \sum_{\begin{matrix} ν \in Ω_{r} \\ ν ⊨ φ \end{matrix}} \frac{| Ω_{K} |}{| Ω_{r} |} \cdot \lim_{\begin{matrix} i \to \infty \\ i \in I \end{matrix}} P_{n_{i}}^{†} (ω_{ν}) \\ = \sum_{\begin{matrix} ν \in Ω_{r} \\ ν ⊨ φ \end{matrix}} \frac{| Ω_{K} |}{| Ω_{r} |} \cdot Q (ω_{ν}) \\ = \sum_{\begin{matrix} ν \in Ω_{r} \\ ν ⊨ φ \end{matrix}} Q (ν) \\ = Q (φ) . \end{array}

We now turn our attention to the calibrated functions with maximal entropy, maxent

E_{ℒ}

. Our aim is to show that maxent

E_{ℒ} = ℙ^{†} = {P_{Ω}^{†}}

holds for regular g.

Lemma 2. If g is regular, then

\lim_{n \to \infty} \log (| Ω_{n} |) \cdot \sum_{π \in Π_{n} \ {π^{n}}} g (π) = 0.

Proof. Since g is total, it is in particular defined for the language

ℒ^{U}

which contains only a single relation symbol, which is unary. When needed, we shall add a superscript U to express that we consider

ℒ^{U}

.

Now define a sequence (a_n)_{n ∈ ℕ} by

a_{n} : = \sum_{π \in Π_{n}^{U} \ {π^{n}}} g (π) .

By the Cauchy condensation test [11] (p. 61, Theorem 3.27) for (not necessarily strictly) decreasing sequences we have that

\sum_{n = 1}^{\infty} a_{n} < \infty \Leftrightarrow \sum_{k = 0}^{\infty} 2^{k} a_{2^{k}} < \infty .

(6)

Since the series on the left converges by the assumption on finite weights, so does the right, and that implies that

\lim_{k \to \infty} 2^{k} a_{2^{k}} = 0

.

For n ∈ ℕ let k ∈ ℕ be such that 2^k ≤ n < 2^k⁺¹. Since a_n is (not necessarily strictly) decreasing

a_{n} \leq a_{2^{k}}

. Hence,

0 \leq n a_{n} \leq 2^{k + 1} a_{n} \leq 2^{k + 1} a_{2^{k}} = 2 (2^{k} a_{2^{k}}) .

The right hand side converges to 0 by Cauchy’s condensation test (6). Thus,

\begin{array}{l} 0 = \lim_{n \to \infty} n \cdot a_{n} \\ = \lim_{n \to \infty} n \cdot \log_{2} (2) \cdot a_{n} \\ = \lim_{n \to \infty} \log_{2} (| Ω_{n}^{U} |) \cdot a_{n} \\ = \lim_{n \to \infty} \log (| Ω_{n}^{U} |) \cdot a_{n} \\ = \lim_{n \to \infty} \log (| Ω_{n}^{U} |) \cdot \sum_{π \in Π_{n}^{U} \setminus \{π^{n}\}} g (π) . \end{array}

Now if

ℒ

is some other language in our sense different from

ℒ^{U}

, then for all n ∈ ℕ there exists an m_n > n such that

| Ω_{n} | = | Ω_{m_{n}}^{U} |

. This in turn implies the existence of canonical bijections f_n identifying Π_n with

Π_{m_{n}}^{U}

which respect the structure of partitions.

Because g is atomic it follows that for all π ∈ Π_n that g(π) = g(f_n(π)) holds. Thus,

a_{n} = \sum_{π \in Π_{n} \ {π^{n}}} g (π) = \sum_{π \in Π_{m_{n}}^{U} \ {π^{m_{n}}}} g (π) .

We then observe that the sequence

{(\log (| Ω_{n} |) \cdot \sum_{π \in Π_{n} \setminus \{π^{n}\}} g (π))}_{n \in ℕ}

is a subsequence of

{(\log (| Ω_{n}^{U} |) \cdot \sum_{π \in Π_{n}^{U} \ {π^{n}}} g (π))}_{n \in ℕ}

Hence,

\begin{array}{l} 0 = \lim_{n \to \infty} \log (| Ω_{m_{n}}^{U} |) \cdot \sum_{π \in Π_{m_{n}}^{U} \ {π^{m_{n}}}} g (π) \\ = \lim_{n \to \infty} \log (| Ω_{n} |) \cdot \sum_{π \in Π_{n} \ {π^{n}}} g (π) . \end{array}
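The sandwich argument in the proof of Lemma 2 can be illustrated with a concrete sequence. The sketch assumes the hypothetical choice a_n = 1/n², which is decreasing and summable; it is not derived from any particular weighting function. It checks the bound n·a_n ≤ 2·(2^k·a_{2^k}) for 2^k ≤ n < 2^{k+1} and shows n·a_n becoming small.

```python
def a(n):
    """Hypothetical decreasing, summable sequence standing in for the total
    weight of the non-state partitions on the n-th sublanguage."""
    return 1.0 / n ** 2


def condensation_bound(n):
    """Upper bound 2 * (2**k * a(2**k)), where 2**k <= n < 2**(k + 1)."""
    k = n.bit_length() - 1   # largest k with 2**k <= n
    return 2 * (2 ** k * a(2 ** k))


for n in (1, 10, 100, 1000):
    print(n, n * a(n), condensation_bound(n))
```

Since the upper bound is a constant multiple of the condensed terms 2^k·a_{2^k}, which must tend to zero when the condensed series converges, n·a_n is forced to zero as well.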

Lemma 3. If g is strongly refined and state-inclusive, then there exist 0 < a ≤ b < +∞ such that for all n ∈ ℕ, g(πⁿ) ∈ [a, b].

Proof. For every ω ∈ Ω₁ there exists some π ∈ Π₁ which contains {ω} with g(π) > 0. π¹ refines all these partitions (or π¹ is that partition). Hence, g(π¹) > 0.

Since state partitions on richer languages are assigned no less weight, it follows that g(πⁿ) ≥ g(π¹) > 0 for all n ∈ ℕ.

Trivially,

g (π^{n}) \leq \sum_{π \in Π_{n}} g (π)

. The latter is constant for all n. Hence, the sequence g(πⁿ) is bounded from above by

\sum_{π \in Π_{n}} g (π)

.

We can thus choose a, b as follows: a := g(π¹) and

b : = \sum_{π \in Π_{1}} g (π)

.

Following [4] (p. 3556) we define:

Definition 17 (Spectrum of π). The spectrum of a partition π is defined as the multi-set of sizes of the members of π. We write σ(π) to denote the spectrum of π.

In other words, if π′ can be obtained from π by permuting the states in the members of π, then σ(π) = σ(π′). If g is symmetric, then g(π) only depends on the spectrum of π.
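The spectrum is straightforward to compute. In the sketch below a partition is represented as a list of Python sets and the multi-set of block sizes as a `collections.Counter`; this representation and the state labels are choices made here for illustration.

```python
from collections import Counter


def spectrum(partition):
    """Multi-set of the sizes of the members of a partition."""
    return Counter(len(block) for block in partition)


# Hypothetical partitions of a four-state space.
pi_1 = [{"w1", "w2"}, {"w3"}, {"w4"}]
pi_2 = [{"w3", "w4"}, {"w1"}, {"w2"}]   # obtained from pi_1 by permuting states
pi_3 = [{"w1", "w2"}, {"w3", "w4"}]

print(spectrum(pi_1) == spectrum(pi_2))
print(spectrum(pi_1) == spectrum(pi_3))
```

A symmetric weighting must give `pi_1` and `pi_2` the same weight, since they share a spectrum, but is free to weight `pi_3` differently.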

Lemma 4. If g is symmetric, then for all n and all spectra s

P_{=} \in \arg \sup_{P \in ℙ_{ℒ}} \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F) .

Proof. First note that

\sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F)

is a concave function, since −x log x is a concave function for x ∈ [0, 1].

If P, P′ ∈

ℙ_{ℒ}

are such that one can be obtained from the other by a permutation of n-states, then for all spectra s

\sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F) = \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P^{'} (F) \log ° P^{'} (F) .

Hence, for all fixed spectra s, P_{= ⇂ n} lies inside the contour lines of the function

\sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F)

in ℙ_n. It follows that

P_{=} \in \arg \sup_{P \in ℙ_{ℒ}} \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F) .

Corollary 6. If g is symmetric and such that

\lim_{n \to \infty} \log | Ω_{n} | \sum_{\begin{matrix} π \in Π_{n} \\ π \neq π^{n} \end{matrix}} g (π) = 0,

then for all P ∈ ℙ_ℒ

\lim_{n \to \infty} \sum_{\begin{matrix} π \in Π_{n} \\ π \neq π^{n} \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F) = 0.

Proof. For a fixed spectrum s we have

\begin{array}{l} \sup_{P \in ℙ_{ℒ}} \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F) = \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P_{=} (F) \log ° P_{=} (F) \\ = \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} \frac{| F |}{| Ω_{n} |} \cdot \log \frac{| F |}{| Ω_{n} |} \\ = \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - \frac{g (π)}{| Ω_{n} |} \sum_{F \in π} | F | \cdot (\log | F | - \log | Ω_{n} |) . \end{array}

Thus,

\begin{array}{l} | \sup_{P \in ℙ_{ℒ}} \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F) | \leq \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} \frac{g (π)}{| Ω_{n} |} \sum_{F \in π} | F | \cdot \log | Ω_{n} | \\ = \sum_{\begin{matrix} π \in Π_{n} \\ σ (π) = s \end{matrix}} g (π) \cdot \log | Ω_{n} | . \end{array}

Summing over all spectra now yields for all

P \in ℙ_{ℒ}

\sum_{\begin{matrix} π \in Π_{n} \\ π \neq π^{n} \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F) \leq \log | Ω_{n} | \sum_{\begin{matrix} π \in Π_{n} \\ π \neq π^{n} \end{matrix}} g (π) .

The claimed result follows.

In particular, if g is regular then the above Corollary applies, by Lemma 2.

Let us consider the application of objective Bayesianism to inductive logic (Section 3.3). It turns out that if g is regular and

E_{ℒ}

is finitely generated then the functions in

[E_{ℒ}]

with maximal entropy coincide with the entropy limits (Definition 11), and moreover there is a unique such function, the standard entropy limit:

Theorem 3. Let g be symmetric, atomic, state-inclusive and strongly refined, and

E_{ℒ}

be finitely generated. Then

maxent E_{ℒ} = ℙ^{†} = \{P_{Ω}^{†}\} .

(7)

Note that if g is also inclusive, then g is regular.

Proof. By Lemma 3 there exist 0 < a ≤ b < +∞ such that g(πⁿ) ∈ [a, b] for all n ∈ ℕ and by Corollary 6 the combined weight given to all other partitions in Π_n tends to zero, as n increases, fast enough that, for all

P \in ℙ_{ℒ}

,

\lim_{n \to \infty} \sum_{\begin{matrix} π \in Π_{n} \\ π \neq π^{n} \end{matrix}} - g (π) \sum_{F \in π} ° P (F) \log ° P (F) = 0.

For

Q \in [E_{ℒ}] \ {P_{Ω}^{†}}

there exists a minimal

n \in ℕ

with n ≥ K such that

{(P_{Ω}^{†})}_{⇂ n} \neq Q_{⇂ n}

. Since

H_{Ω}^{n}

is strictly concave on

E_{n}

and

P_{Ω}^{†}

maximises

H_{Ω}^{n}

over

[E_{n}]

it holds that

H_{Ω}^{n} (P_{Ω}^{†}) > H_{Ω}^{n} (Q)

. Using Corollary 3 and Corollary 4 we obtain

H_{Ω}^{r} (P_{Ω}^{†}) - H_{Ω}^{r} (Q) \geq H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q)

for r ≥ n. Thus,

\begin{array}{l} H_{g}^{r} (P_{Ω}^{†}) - H_{g}^{r} (Q) = g (π^{r}) H_{Ω}^{r} (P_{Ω}^{†}) + \sum_{\underset{π \neq π^{r}}{π \in Π_{r}}} - g (π) \sum_{F \in π} ° P_{Ω}^{†} (F) \log ° P_{Ω}^{†} (F) \\ - g (π^{r}) H_{Ω}^{r} (Q) + \sum_{\underset{π \neq π^{r}}{π \in Π_{r}}} g (π) \sum_{F \in π} ° Q (F) \log ° Q (F) \\ \geq g (π^{r}) (H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q)) \\ + \sum_{\underset{π \neq π^{r}}{π \in Π_{r}}} - g (π) \sum_{F \in π} ° P_{Ω}^{†} (F) \log ° P_{Ω}^{†} (F) + \sum_{\underset{π \neq π^{r}}{π \in Π_{r}}} g (π) \sum_{F \in π} ° Q (F) \log ° Q (F) . \end{array}

For large enough r the sums over the π ≠ π^r become negligible. Since g(π^r) is bounded below by a > 0 (Lemma 3), there has to exist some R ∈ ℕ with R ≥ max{K, n} such that for all r ≥ R it holds that

g (π^{r}) (H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q)) > \sum_{\underset{π \neq π^{r}}{π \in Π_{r}}} - g (π) \sum_{F \in π} ° P_{Ω}^{†} (F) \log ° P_{Ω}^{†} (F) + ° Q (F) \log ° Q (F) .

Hence, for all large enough r it holds that

H_{g}^{r} (P_{Ω}^{†}) - H_{g}^{r} (Q) > 0

.

Thus, maxent

E_{ℒ} = {P_{Ω}^{†}}

.

For the second part of the proof we show that for all r ∈ ℕ and all F ⊆ Ω_r it holds that

\lim_{n \to \infty} ° P_{n}^{†} (F) - ° P_{Ω}^{†} (F) = 0.

(8)

Observe that for all n ∈ ℕ

\begin{array}{r} | H_{Ω}^{n} (P_{n}^{†}) - H_{Ω}^{n} (P_{Ω}^{†}) | = | H_{Ω}^{n} (P_{n}^{†}) - \frac{1}{g (π^{n})} H_{g}^{n} (P_{n}^{†}) + \frac{1}{g (π^{n})} H_{g}^{n} (P_{n}^{†}) - H_{Ω}^{n} (P_{Ω}^{†}) | \\ \leq \sum_{\underset{π \neq π^{n}}{π \in Π_{n}}} - \frac{g (π)}{g (π^{n})} \sum_{F \in π} ° P_{n}^{†} (F) \log ° P_{n}^{†} (F) + | \frac{1}{g (π^{n})} H_{g}^{n} (P_{n}^{†}) - H_{Ω}^{n} (P_{Ω}^{†}) | . \end{array}

The first sum tends to zero as n goes to infinity by our assumptions on g.

For the second sum observe that for all ϵ > 0 there exists an N ∈ ℕ such that for all n ≥ max{N, K} and all P ∈ [

E_{ℒ}

] it holds that

| \frac{1}{g (π^{n})} H_{g}^{n} (P) - H_{Ω}^{n} (P) |

< ϵ. Hence, ϵ >

| \sup_{P \in E_{ℒ}} \frac{1}{g (π^{n})} H_{g}^{n} (P) - \sup_{P \in E_{ℒ}} H_{Ω}^{n} (P) | = | \frac{1}{g (π^{n})} H_{g}^{n} (P_{n}^{†}) - H_{Ω}^{n} (P_{Ω}^{†}) |

. So,

\lim_{n \to \infty} H_{Ω}^{n} (P_{n}^{†}) - H_{Ω}^{n} (P_{Ω}^{†}) = 0.

For all n ≥ K,

P_{n}^{†}

and

P_{Ω}^{†}

equivocate beyond

ℒ_{K}

up to

ℒ_{n}

(Proposition 9). Hence, it holds that

H_{Ω}^{n} (P_{n}^{†}) - H_{Ω}^{n} (P_{Ω}^{†}) = H_{Ω}^{K} (P_{n}^{†}) - H_{Ω}^{K} (P_{Ω}^{†})

(Corollary 3). So,

\lim_{n \to \infty} H_{Ω}^{K} (P_{n}^{†}) - H_{Ω}^{K} (P_{Ω}^{†}) = \lim_{n \to \infty} H_{Ω}^{n} (P_{n}^{†}) - H_{Ω}^{n} (P_{Ω}^{†}) = 0.

H_{Ω}^{K}

is a strictly concave and continuous function on ℙ_K. Hence, lim_n→∞

P_{n}^{†}

(ω) =

P_{Ω}^{†}

(ω) for all ω ∈ Ω_K. So, lim_n→(

P_{n}^{†}

)_⇂_K = (

P_{Ω}^{†}

)_⇂_K.

For an arbitrary n ≥ K and an F ⊆ Ω_n we find, using that $P_{Ω}^{†}$ equivocates beyond ℒ_K,

\begin{array}{l} \lim_{k \to \infty} ° P_{k}^{†} (F) = \lim_{k \to \infty} \sum_{\underset{ν \in F}{ν \in Ω_{n}}} P_{k}^{†} (ν) = \lim_{k \to \infty} \sum_{\underset{ν \in F}{ν \in Ω_{n}}} \frac{| Ω_{K} |}{| Ω_{n} |} \cdot P_{k}^{†} (ω_{ν}) \\ = \sum_{\underset{ν \in F}{ν \in Ω_{n}}} \frac{| Ω_{K} |}{| Ω_{n} |} \cdot \lim_{k \to \infty} P_{k}^{†} (ω_{ν}) \\ = \sum_{\underset{ν \in F}{ν \in Ω_{n}}} \frac{| Ω_{K} |}{| Ω_{n} |} \cdot P_{Ω}^{†} (ω_{ν}) \\ = \sum_{\underset{ν \in F}{ν \in Ω_{n}}} P_{Ω}^{†} (ν) \\ = ° P_{Ω}^{†} (F) . \end{array}

The result for F ⊆ Ω_r with r < K follows similarly. □

4.3. Loss and Expected Loss

We shall now analyse the notion of the loss incurred by an agent with belief function B ∈

B_{ℒ}

. In Section 5 we shall be interested in how degrees of belief in quantified sentences affect losses. The following definition, axioms L1–4, Theorem 4 and Proposition 12 apply within our current, quantifier-free framework, i.e., ℒ = ℒ^∄, but they also apply to quantified sentences, i.e., ℒ = ℒ^∃.

Definition 18 (Independent Sublanguages). Let B ∈

B_{ℒ}

be a fixed belief function such that B(τ) = 1 for any tautology τ, and

ℒ

=

ℒ

₁ ∪

ℒ

₂ where

ℒ

₁ and

ℒ

₂ are disjoint:

ℒ

₁ and

ℒ

₂ contain the same constants, they do not have a relation symbol in common and the union of the relation symbols in

ℒ

₁ and

ℒ

₂ equals {U₁,…, U_s}, the set of relation symbols in

ℒ

. We say that

ℒ

₁ and

ℒ

₂ are independent sublanguages, written

ℒ

₁⫫_B

ℒ

₂, if and only if B(ϕ₁ ˄ ϕ₂) = B(ϕ₁) · B(ϕ₂) for all ϕ₁ ∈ S

ℒ

₁ and ϕ₂ ∈ S

ℒ

₂. Let B_{⇂ $ℒ$ 1}(ϕ₁) := B(ϕ₁), B_{⇂ $ℒ$ 2} (ϕ₂) := B(ϕ₂).

By analogy with the line of argument of Section 2, we shall suppose that a default loss function L : S

ℒ

×

B_{ℒ}

→ (− ∞, ∞] satisfies the following requirements. Here L(φ, B) is to be interpreted as the loss specific to φ turning out to be true, when one adopts belief function B:

L1. L(φ, B) = 0, if B(φ) = 1.
L2. L(φ, B) strictly increases as B(φ) decreases from 1 towards 0.
L3. L(φ, B) only depends on B(φ).
L4. Losses are additive when the language is composed of independent sublanguages: if $ℒ$ = $ℒ$ ₁ ∪ $ℒ$ ₂ for $ℒ$ ₁⫫_B $ℒ$ ₂, then L(ϕ₁ ˄ ϕ₂, B) = L₁(ϕ₁, B_{⇂ $ℒ$ 1}) + L₂(ϕ₂, B_{⇂ $ℒ$ 2}), where L₁, L₂ are loss functions defined on $ℒ$ ₁, $ℒ$ ₂ respectively.

Theorem 4. If a loss function L on S

ℒ

×

B_{ℒ}

satisfies L1–4, then L(φ, B) = −k log B(φ), where the constant k > 0 does not depend on the language

ℒ

.

Proof. The proof is exactly analogous to that of Landes and Williamson [4] (Theorem 4), which gives the result in the case in which

ℒ

is a finite propositional language. □

Since multiplication by a constant is equivalent to change of base, we can take log to be the natural logarithm. Since we will be interested in the belief functions that minimise loss, rather than in the absolute value of any particular losses, we can take k = 1 without loss of generality. Theorem 4 thus allows us to focus on the logarithmic loss function:

L^{\log} (φ, B) : = - \log B (φ) .
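As an informal check, the following minimal sketch evaluates the logarithmic loss with k = 1 and the natural logarithm, and tests requirements L1, L2 and L4 numerically. The function name `log_loss` and the particular degrees of belief are illustrative assumptions, not part of the formal development.

```python
import math

def log_loss(belief: float) -> float:
    """Logarithmic loss -log B(phi) incurred when phi turns out true (k = 1)."""
    if belief == 0.0:
        return math.inf  # zero belief in a true sentence costs infinite loss
    return -math.log(belief)

# L1: full belief in a sentence that turns out true costs nothing.
loss_at_one = log_loss(1.0)

# L2: loss strictly increases as the degree of belief falls from 1 towards 0.
beliefs = [1.0, 0.75, 0.5, 0.25, 0.01]
losses = [log_loss(b) for b in beliefs]
strictly_increasing = all(x < y for x, y in zip(losses, losses[1:]))

# L4: on independent sublanguages B(phi1 and phi2) = B(phi1) * B(phi2),
# so the loss on the conjunction is the sum of the two separate losses.
b1, b2 = 0.8, 0.3  # hypothetical degrees of belief
additive_gap = abs(log_loss(b1 * b2) - (log_loss(b1) + log_loss(b2)))

print(loss_at_one, strictly_increasing, additive_gap)
```

L3 holds by construction here, since `log_loss` takes only the number B(φ) as its argument.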

Next we define our notion of expected loss. The expectation is taken with respect to a probability function P, and we consider the expectation taken over each partition of propositions. Each partition is weighted by the given weighting function g. Attention is restricted to inclusive weighting functions, so that each belief is evaluated; if the weighting function were not inclusive then degrees of belief in some propositions would fail to contribute to the expectation.

Definition 19 (n-representation). A sentence θ ∈ S

ℒ

_n n-represents a proposition F ⊆ Ω_n, if and only if F = {ω ∈ Ω_n: ω ⊨ θ}. Let

ℱ

⊆

P

Ω_n be a set of pairwise distinct propositions. We say that Θ ⊆ S

ℒ

_n is a set of n-representatives of

ℱ

, if and only if each sentence θ ∈ Θ n-represents a unique proposition in

ℱ

and each proposition in

ℱ

is n-represented by a unique sentence θ ∈ Θ.

A set ρ of n-representatives of

P

Ω_n will be called an n-representation. We shall use ρF to denote the sentence in ρ which n-represents F. We denote by ϱ_n the set of all n-representations.

Note that if belief function B respects logical equivalence, then for all n ∈ ℕ, all F ⊆ Ω_n and all l-representations ρ with l ≥ n it holds that B(ρF) = °B(F). Otherwise there exist an n ∈ ℕ, a proposition F ⊆ Ω_n and n-representations ρ, ρ′ such that B(ρF) ≠ B(ρ′F).

Definition 20 (n-score). Given a loss function L, an inclusive weighting function g: Π → ℝ_≥₀, n ∈ ℕ, and an n-representation ρ ∈ ϱ_n we define the representation-relative n-score

S_{g, ρ}^{L, n}

: ℙ_$ℒ$ ×

B_{ℒ}

→ [−∞, ∞] by:

S_{g, ρ}^{L, n} (P, B) : = \sum_{π \in Π_{n}} g (π) \sum_{F \in π} P (ρ F) L (ρ F, B) .

Define the (representation-independent) n-score

S_{g}^{L, n} : ℙ_{ℒ} \times B_{ℒ} \to [- \infty, \infty]

by

S_{g}^{L, n} (P, B) : = \sup_{ρ \in ϱ_{n}} S_{g, ρ}^{L, n} (P, B) .

(As a technical convenience, we shall consider loss functions and n-scores to be defined more generally, taking arguments P, B: S

ℒ

→ [0, 1], although we will primarily be concerned with the case above where P is a probability function and B is a belief function.)

In the light of Theorem 4, we will focus exclusively on the logarithmic loss function in this paper:

\begin{matrix} S_{g, ρ}^{n} (P, B) : = - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} P (ρ F) \log B (ρ F), \\ S_{g}^{n} (P, B) : = \sup_{ρ \in ϱ_{n}} S_{g, ρ}^{n} (P, B) . \end{matrix}

For P ∈ ℙ_$ℒ$ we have that P (ρF) = P (ρ′F ) for all ρ, ρ′ ∈ ϱ_n, since P respects logical equivalence. Hence for P, Q ∈ ℙ_$ℒ$ we have

\begin{array}{l} S_{g}^{n} (P, Q) = \sup_{ρ \in ϱ_{n}} S_{g, ρ}^{n} (P, Q) \\ = - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} ° P (F) \log ° Q (F) \\ = S_{g} (° P, ° Q), \end{array}

where S_g is the propositional scoring rule introduced in Section 2, in the case Ω = Ω_n. There are also connections with g-entropy

H_{g}^{n}

, defined in (5), and the propositional notion of entropy H_g, defined in Section 2:

S_{g}^{n} (P, P) = H_{g}^{n} (P) = H_{g} (° P) .
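To make the propositional score concrete, the sketch below enumerates the partitions of a three-state space and evaluates S_g(°P, °Q) and the g-entropy S_g(°P, °P). It is only an illustration under stated assumptions: partitions are taken to consist of non-empty blocks, the weighting `uniform_g` is a hypothetical inclusive weighting that gives every partition weight 1, and the numbers in `P` and `Q` are made up.

```python
import math
from itertools import combinations

def partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest) + 1):
        for companions in combinations(rest, k):
            block = frozenset((first,) + companions)
            remaining = [x for x in rest if x not in companions]
            for tail in partitions(remaining):
                yield [block] + tail

def prob(P, F):
    """Probability of proposition F: the sum of P over the states in F."""
    return sum(P[w] for w in F)

def g_score(P, Q, g, omega):
    """S_g(P, Q) = -sum_pi g(pi) sum_{F in pi} P(F) log Q(F), with 0 log 0 := 0."""
    total = 0.0
    for pi in partitions(omega):
        weight = g(pi)
        if weight == 0:
            continue
        for F in pi:
            p, q = prob(P, F), prob(Q, F)
            if p == 0:
                continue
            if q == 0:
                return math.inf  # positive probability on a proposition believed to degree 0
            total -= weight * p * math.log(q)
    return total

omega = ("w1", "w2", "w3")
uniform_g = lambda pi: 1.0                  # hypothetical inclusive weighting
P = {"w1": 0.5, "w2": 0.3, "w3": 0.2}       # made-up probability function
Q = {"w1": 0.2, "w2": 0.3, "w3": 0.5}       # made-up rival probability function

g_entropy = g_score(P, P, uniform_g, omega)  # S_g(P, P) = H_g(P)
cross = g_score(P, Q, uniform_g, omega)      # expected loss of adopting Q when P is the truth
print(g_entropy, cross)
```

In this example the score of Q under P exceeds the g-entropy of P, as one would expect from the strict propriety of the logarithmic score on each partition.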

If g = g_Ω, we call the resulting function the standard logarithmic n-score:

\begin{array}{l} S_{Ω}^{n} (P, B) = \sup_{ρ \in ϱ_{n}} - \sum_{ω \in Ω_{n}} P (ρ \{ω\}) \log B (ρ \{ω\}) \\ = - \sum_{ω \in Ω_{n}} P (ω) \log B (ω), \end{array}

where the latter equality applies if B respects logical equivalence.

The question arises as to how $S_{g}^{n}$, the notion of expected loss defined on a finite sublanguage ℒ_n, relates to loss on ℒ, the language as a whole. One particularly natural suggestion is that B has a better overall loss profile than B′ if the latter’s n-scores eventually dominate those of B or if the worst-case n-score incurred by B′ is eventually greater than that of B:

1. If B has lower worst-case expected loss than B′ for all sufficiently large n, then B has a better loss profile than B′.
2. If for all P ∈ ℙ_$ℒ$, B has an expected loss which is less than or equal to that of B′, and if for some P ∈ [ $E_{ℒ}$ ], B has strictly lower expected loss than B′ for sufficiently large n, then B has a better loss profile than B′.

We make this precise as follows:

Definition 21 (Better loss profile). B has a better loss profile than B′ if and only if:

1. There exists some N ∈ ℕ such that for all n ≥ N, $\sup_{P \in E_{ℒ}} S_{g}^{n}$ (P, B) < $\sup_{P \in E_{ℒ}} S_{g}^{n}$ (P, B′), or
2. $S_{g}^{n}$ (P, B) ≤ $S_{g}^{n}$ (P, B′) < +∞ for all P ∈ ℙ_$ℒ$ and all n ∈ ℕ, and there exist at least one function Q ∈ [ $E_{ℒ}$ ] and some N_Q ∈ ℕ such that $S_{g}^{n}$ (Q, B) < $S_{g}^{n}$ (Q, B′) for all n ≥ N_Q.

We write B ≺ B′ to denote that B has a better loss profile than B′. We will be interested in those belief functions that have the best loss profile, i.e., the minimal elements of ≺, and define:

\text{minloss } B_{ℒ} : = \{B \in B_{ℒ} : \text{there is no } B^{'} \in B_{ℒ} \text{ such that } B^{'} ≺ B\} .

(9)

Proposition 10 (Properties of ≺). The binary relation ≺ is asymmetric, partial, irreflexive and transitive.

Proof. Note that if for all P ∈ ℙ_$ℒ$ and all n ∈ ℕ it holds that

S_{g}^{n}

(P, B) ≤

S_{g}^{n}

(P, B′), then

\sup_{P \in E_{ℒ}} S_{g}^{n}

(P, B) ≤

\sup_{P \in E_{ℒ}} S_{g}^{n}

(P, B′) follows trivially. Hence, conditions 1 and 2 of Definition 21 are consistent, in the sense that the induced relation ≺ is asymmetric.

There exist different B, B′ ∈

B_{ℒ}

which are not open-minded on

ℒ

₁ and thus have infinite loss on

ℒ

_n for all n ≥ 1 (cf., Proposition 13). For example, if B(τ′) = B′(τ′) = 0 where τ′ is a tautology in S

ℒ

₁, then B and B′ have infinite expected loss for all n ∈ ℕ and all P ∈ ℙ_$ℒ$. Thus, ≺ is only partial.

That ≺ is irreflexive follows directly from the definition.

Now consider B₁, B₂, B₃ ∈

B_{ℒ}

such that B₁ ≺ B₂ ≺ B₃. We will consider cases to prove that B₁ ≺ B₃.

If there exist N_1,2, N_2,3 such that

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B_{1}) < \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B_{2}) \text{ for all } n \geq N_{1, 2}, \\ \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B_{2}) < \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B_{3}) \text{ for all } n \geq N_{2, 3}, \end{array}

then

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, B_{1}) < \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B_{3}) \text{ for all } n \geq \max \{N_{1, 2}, N_{2, 3}\} .

Thus, B₁ ≺ B₃.

Now assume that there exists a number N_1,2 such that

\sup_{P \in E_{ℒ}} S_{g}^{n}

(P, B₁) <

\sup_{P \in E_{ℒ}} S_{g}^{n}

(P, B₂) for all n ≥ N_1,2 and assume that the pair (B₂, B₃) satisfies the second condition of Definition 21. Then,

\sup_{P \in E_{ℒ}} S_{g}^{n}

(P, B₁) <

\sup_{P \in E_{ℒ}} S_{g}^{n}

(P, B₃) for all n ≥ N_1,2. Thus, B₁ ≺ B₃.

The same argument shows that if the pair (B₁, B₂) satisfies the second condition of Definition 21 and the pair (B₂, B₃) satisfies the first condition, then B₁ ≺ B₃.

Finally, suppose that the pairs (B₁, B₂) and (B₂, B₃) both satisfy the second condition of Definition 21. Then for all P ∈ ℙ_$ℒ$ and all n ∈ ℕ it holds that

S_{g}^{n}

(P, B₁) ≤

S_{g}^{n}

(P, B₃). Furthermore, there has to exist a Q ∈ [

E_{ℒ}

] and an N_Q ∈ ℕ such that for all n ≥ N_Q it holds that

S_{g}^{n}

(Q, B₁) <

S_{g}^{n}

(Q, B₂). But then

S_{g}^{n}

(Q, B₁) <

S_{g}^{n}

(Q, B₃) for all n ≥ N_Q. Thus, B₁ ≺ B₃.

Since ≺ is irreflexive and transitive it cannot contain a cycle.

One main theme of the rest of this paper will be the search for belief functions with the best loss profile. Since the loss function L we are interested in is − log B(φ), and these values monotonically decrease as B(φ) increases from 0 to 1, it follows that, ceteris paribus, the belief functions with better loss profiles assign greater degrees of belief to sentences.

It might appear then that the normalisation (see Definition 1) would directly imply that no B ∈

B_{ℒ}

\ℙ_$ℒ$ could have the best loss profile. Intuitively, this might be thought to hold since the belief functions B ∈

B_{ℒ}

\ℙ_$ℒ$ assign smaller degrees of belief than the probability functions P ∈ ℙ_$ℒ$. However, Equation (4) shows that some B ∈

B_{ℒ}

\ℙ_$ℒ$ assign greater degrees of belief than a probability function P ∈ ℙ_$ℒ$ to certain sentences in the following sense: there exists a set of sentences Φ ⊂ S

ℒ

such that for all P ∈ ℙ_$ℒ$ it holds that ∑_φ_∈Φ B(φ)> ∑_φ_∈Φ P(φ).

While Condition 1 of Definition 21 deals with worst-case expected loss, Condition 2 deals with dominance of expected loss. Now, dominance is often used on its own to justify the Probability norm; see, e.g., de Finetti [12] (Chapter 3) and, more recently, Joyce [13,14]. So, one might think that Condition 2 is strong enough on its own to imply the Probability norm. However, this is not the case:

Proposition 11. For

E_{ℒ}

= ℙ_$ℒ$ there exist a weighting function g and a non-probabilistic belief function B ∈

B_{ℒ}

\ℙ_$ℒ$ such that no probability function P ∈ ℙ_$ℒ$ has a loss which dominates that of B in the sense of Condition 2.

Proof. It suffices to show that there exist a weighting g and a B ∈

B_{ℒ}

\ℙ_$ℒ$ such that for all Q ∈ ℙ_$ℒ$ there exist a P ∈ ℙ_$ℒ$ and infinitely many n ∈ ℕ such that

S_{g}^{n}

(P, B) <

S_{g}^{n}

(P, Q).

Consider a B ∈

B_{ℒ}

\ℙ_$ℒ$ from Proposition 4 and consider an arbitrary Q ∈ ℙ_$ℒ$. Then there has to exist a ν ∈ O₄ such that Q(ν) ≠ B(ν). Next note that Q(¬ν) ≠ B(¬ν) follows. Then, $- \frac{1}{100} \log \frac{1}{100} - \frac{99}{100} \log \frac{99}{100} < - \frac{1}{100} \log Q (ν) - \frac{99}{100} \log Q (\neg ν)$ since the logarithmic scoring rule is strictly proper.

So, for P ∈ ℙ_$ℒ$ with P (ν) =

\frac{1}{100}

and g({ν, ¬ν}) > 0 it holds that

\begin{array}{l} g (\{ν, \neg ν\}) (- P (ν) \log (B (ν)) - P (\neg ν) \log (B (\neg ν))) \\ < g (\{ν, \neg ν\}) (- P (ν) \log (Q (ν)) - P (\neg ν) \log (Q (\neg ν))) . \end{array}

Next let ν₁ := ¬Ut₁t ˄ ¬Ut₂t, ν₂ := Ut₁t ˄ ¬Ut₂t, ν₃ := ¬Ut₁t ˄ Ut₂t, and ν₄ := Ut₁t ˄ Ut₂t. For n ≥ 4 let

F_{n}^{i}

⊂ Ω_n be the unique proposition which is equivalent to ν_i,

F_{n}^{i}

= {ω ∈ Ω_n : ω ⊨ ν_i}.

Now define g_n for n ≥ 4 as follows:

\begin{matrix} g_{n} (\{F_{n}^{i}, {\bar{F}}_{n}^{i}\}) : = 1, & \text{if } n \equiv i \mod 4, \\ g_{n} (π) : = 0, & \text{otherwise} . \end{matrix}

So, for this B and this g we have found that for all Q ∈ ℙ_$ℒ$ there exist a P ∈ ℙ_$ℒ$ and infinitely many n ∈ ℕ (every fourth n) such that

\begin{array}{l} S_{g}^{n} (P, B) = - \frac{1}{100} \log \frac{1}{100} - \frac{99}{100} \log \frac{99}{100} \\ < - \frac{1}{100} \log Q (ν) - \frac{99}{100} \log Q (\neg ν) \\ = S_{g}^{n} (P, Q) . \end{array}

□
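The strict inequality used in this proof is an instance of the strict propriety of the logarithmic score on a two-cell partition. A minimal numerical sketch, assuming a grid of rival values q = Q(ν) with Q(¬ν) = 1 − q (the grid and the function name `binary_score` are illustrative choices only):

```python
import math

def binary_score(p, q):
    """Expected log loss -p log q - (1 - p) log(1 - q) on the partition {nu, not nu}."""
    return -p * math.log(q) - (1 - p) * math.log(1 - q)

p = 1 / 100                      # P(nu), matching the value used in the proof
own_score = binary_score(p, p)   # -1/100 log 1/100 - 99/100 log 99/100

# Rival values q = 0.001, 0.002, ..., 0.999, skipping q = 0.01 itself.
rival_scores = [binary_score(p, q / 1000) for q in range(1, 1000) if q != 10]

print(own_score, min(rival_scores))
```

Every rival on the grid scores strictly worse than q = 1/100, which is the behaviour the proof relies on for any Q with Q(ν) ≠ 1/100.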

In general, determining the functions comprising minloss

B_{ℒ}

is a challenging problem, which we shall tackle in due course. However, there is one general property we can prove directly: assigning zero degree of belief to an epistemically possible sentence is irrational, in the sense that it exposes one to avoidable losses. To see this, first note that:

Proposition 12. For any

E_{ℒ}

, there exists a probability function P ∈

E_{ℒ}

which is open-minded.

Proof. The set of consistent sentences in

ℒ

is countable. The set

Φ : = \{φ \in S ℒ : \text{there exists a } P \in E_{ℒ} \text{ with } P (φ) > 0\}

is a subset of the set of consistent sentences and is thus countable, too. We can hence enumerate Φ by some countable index set, I, say. Note that |I| ≥ 2 since P (τ) = 1 for all P ∈ ℙ_$ℒ$ and all tautologies τ.

For all φ ∈ Φ choose some P_φ ∈

E_{ℒ}

such that P_φ(φ) > 0. Next, for all i ∈ I pick an α_i ∈ (0, 1) ⊂ ℝ such that ∑_i_∈_I α_i = 1. Since |I| ≥ 2 such α_i exist.

We shall now define an open-minded function P ∈

E_{ℒ}

by putting

P = \sum_{i \in I} α_{i} P_{φ_{i}} .

Note that P is in

E_{ℒ}

since it is a convex combination of probability functions in the convex set

E_{ℒ}

.

We next show that P is indeed open-minded. Let φ ∈ Φ be at the j-th position in the enumeration I of Φ. We now obtain P (φ) ≥ α_jP_φ(φ) > 0. So, P (φ) > 0 for all φ ∈ Φ. □
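The construction can be illustrated on a finite stand-in. In the sketch below the three states, the three probability functions in `family` and the geometric weights are all hypothetical; in the countably infinite case one could take, for instance, α_i = 2^{-i} for i = 1, 2, …, which is an assumption of this illustration rather than a choice made in the proof.

```python
from fractions import Fraction

# Hypothetical finite stand-in: three states and three probability functions,
# none of which is open-minded on its own.
states = ["w1", "w2", "w3"]
family = [
    {"w1": Fraction(1), "w2": Fraction(0), "w3": Fraction(0)},
    {"w1": Fraction(1, 2), "w2": Fraction(1, 2), "w3": Fraction(0)},
    {"w1": Fraction(0), "w2": Fraction(0), "w3": Fraction(1)},
]

def geometric_weights(count):
    """Weights 1/2, 1/4, ..., with the last weight absorbing the remainder so they sum to 1."""
    weights = [Fraction(1, 2 ** (i + 1)) for i in range(count - 1)]
    weights.append(1 - sum(weights))
    return weights

alphas = geometric_weights(len(family))

# The convex combination P = sum_i alpha_i * P_i, computed state by state.
mixture = {w: sum(a * P[w] for a, P in zip(alphas, family)) for w in states}

print(alphas, mixture)
```

Each weight lies strictly between 0 and 1, so any state that receives positive probability from at least one member of the family also receives positive probability from the mixture.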

Proposition 13. B ∈ minloss

B_{ℒ}

implies that B is open-minded.

Proof. If B is not open-minded, then there exists a k ∈ ℕ and a φ ∈ S

ℒ

_k such that B(φ) = 0 and there exists a P ∈ [

E_{ℒ}

] such that P (φ) > 0. Since φ ∈ S

ℒ

_r for all r ≥ k, it holds for all r ≥ k that

\sup_{P \in E_{ℒ}} S_{g}^{r}

(P,B)=+∞.

By Proposition 12 there exists an open-minded Q ∈ [

E_{ℒ}

]. Thus,

\sup_{P \in E_{ℒ}} S_{g}^{r}

(P, Q) < ∞ for all r. □

Note that the above proposition does not imply that minloss

B_{ℒ}

is non-empty.

4.4. Minimax Theorems

In this section we shall relate the belief functions that have best loss profile to the probability functions that have maximal g-entropy.

It turns out that an improvement in loss profile is not necessarily accompanied by an increase in entropy (Appendix A). Nevertheless, we shall see that given appropriate conditions on g, there is a close relationship between the belief function that has the best loss profile and the probability function which has maximum entropy. On a finite sublanguage, the unique belief function with minimum worst-case expected loss is the probability function with maximum entropy (Section 4.4.1). Moreover, on the language ℒ as a whole, if the evidence set $E_{ℒ}$ is finitely generated then the unique belief function with the best loss profile (i.e., the belief function that is minimal with respect to ≺) is the probability function in $E_{ℒ}$ with maximal entropy (Section 4.4.2). However, this is not necessarily so when $E_{ℒ}$ is not finitely generated (Section 6.1).

4.4.1. Minimax on Finite Sublanguages

Lemma 5. For all n ∈ ℕ, all P ∈ ℙ_$ℒ$ and all B ∈

B_{ℒ}

respecting logical equivalence on

ℒ

_n it holds that

S_{g}^{n}

(P, B) =

S_{g, ρ}^{n}

(P, B) for all ρ ∈ ϱ_n.

Proof. Simply note that $S_{g, ρ}^{n} (P, B) = - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} ° P (F) \log B (ρ F)$ does not depend on ρ ∈ ϱ_n. □

Lemma 6. For all inclusive g, for all n ∈ ℕ and each belief function

B^{†} \in \arg \inf_{B \in B_{ℒ}} \sup_{P \in E_{ℒ}} \sup_{ρ \in ϱ_{_{n}}} S_{g, ρ}^{n} (P, B),

B^† respects logical equivalence on

ℒ

_n. Furthermore, for all such B^† there exists a partition π ∈ Π_n such that ∑_F_∈π B^†(ρF)=1 for all ρ ∈ϱ_n.

Proof. Firstly, B^† cannot assign all φ ∈ Sℒ_n degree of belief 0, since this would incur an infinite worst-case expected loss; and as we saw in Proposition 13, there are functions which have finite worst-case expected loss.

Assume for contradiction that a B^† ∈

B_{ℒ}

does not respect logical equivalence on

ℒ

_n. Then define a function B^inf : S

ℒ

→ [0, 1] which respects logical equivalence on

ℒ

_n by

B^{\inf} (φ) : = \begin{cases} \inf_{\underset{⊨ φ \leftrightarrow ψ}{ψ \in S ℒ_{n}}} B^{†} (ψ), & \text{if } φ \in S ℒ_{n} \\ B^{†} (φ), & \text{otherwise} . \end{cases}

The next step in this proof is to show that

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{\inf}) = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{†}) .

In the second part of the proof we shall see that there is a belief function which has a strictly better worst case expected loss than B^inf. This then contradicts the assumption that the belief function B^† has best worst case expected loss, i.e., B^† ∈ arg

\inf_{B \in B_{ℒ}}

\sup_{P \in E_{ℒ}}

\sup_{ρ \in ϱ_{n}}

S_{g, ρ}^{n} (P, B)

.

Since B^† does not respect logical equivalence on $ℒ_{n}$, there are logically equivalent φ, ψ ∈ $S ℒ_{n}$ such that B^†(φ) ≠ B^†(ψ). Thus, B^inf(φ) < max{B^†(φ), B^†(ψ)} and hence B^inf(φ) + B^inf(¬φ) < max{B^†(φ), B^†(ψ)} + B^†(¬φ) ≤ 1. The last inequality holds since B^† ∈

B_{ℒ}

. So,

B^{\inf}_{⇂ n} \notin ℙ_{n}

.

Recall that we extended the definition of scoring rules allowing the belief function to be any function defined on

S ℒ

taking values in [0, 1]. We shall be careful not to appeal to results that assume a normalised belief function in this situation.

We now find for P ∈

ℙ_{ℒ}

\begin{array}{l} S_{g}^{n} (P, B^{†}) = \sup_{ρ \in ϱ_{n}} S_{g, ρ}^{n} (P, B^{†}) \\ = \sup_{ρ \in ϱ_{n}} - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} P (ρ F) \log B^{†} (ρ F) \\ = - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} ° P (F) \inf_{ρ \in ϱ_{n}} \log B^{†} (ρ F) \\ = - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} P (ρ F) \log B^{\inf} (ρ F) \text{ for all } ρ \in ϱ_{n} \\ = S_{g, ρ}^{n} (P, B^{\inf}) \text{ for all } ρ \in ϱ_{n} \\ = \sup_{ρ \in ϱ_{n}} S_{g, ρ}^{n} (P, B^{\inf}) \\ = S_{g}^{n} (P, B^{\inf}) . \end{array}

Hence $\sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{†}) = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{\inf})$, as claimed above.

Let us now consider cases to derive a contradiction.

Case i There exists a π ∈ Π_n such that ∑_{F∈π} B^inf(ρF) = 1.

Since B^inf respects logical equivalence this fact is independent of the particular ρ ∈ ϱ_n. Recall that we use the notation °B^inf = °ⁿB^inf to denote the function that B^inf induces over propositions in Ω_n, defined by °B^inf(F) = B^inf(∨F).

With this convention we then note that °B^inf ∈

B

\

ℙ

. Let

E

be the set of probability functions on Ω_n which are in the canonical one-to-one correspondence with the probability functions on

E_{n}

, i.e.,

E : = {° P : P \in E_{ℒ}}

. We thus find, using Theorem 2 to obtain the strict inequality, that:

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{†}) = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{\inf}) \\ = \sup_{P \in E} S_{g} (° P, ° B^{\inf}) \\ = \sup_{P \in E} - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} ° P (F) \log ° B^{\inf} (F) \\ > \sup_{P \in E} - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} ° P (F) \log ° P_{n}^{†} (F) \\ = \sup_{P \in E} S_{g} (° P, ° P_{n}^{†}) \\ = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{n}^{†}) . \end{array}

Case ii For all π ∈ Π_n and all ρ ∈ ϱ_n it holds that ∑_F_∈_π B^inf(ρF) < 1.

Since B^inf respects logical equivalence on

ℒ_{n}

we may consider the induced function °B^inf defined over propositions of Ω_n. Since Π_n is finite, so is the set {∑_F_∈_π °B^inf(F)}. Thus, sup_π_∈Π_n ∑_F_∈_π °B^inf (F) = 1 − ϵ for some ϵ ∈ (0, 1].

Let us now define a function

B^{'} : S ℒ \to [0, 1]

. Denote by μ ∈ (0, 1] the unique number such that for all π ∈ Π_n and all ρ ∈ ϱ_n it holds that ∑_F_∈_π μ+B^inf (ρF) = ∑_F_∈_π μ + °B^inf(F) ≤ 1 and for at least one π ∈ Π_n and one ρ ∈ ϱ_n we have ∑_F_∈_π μ + B^inf (ρF) = ∑_F_∈_π μ + °B^inf(F) = 1

Put B′(φ) := μ + B^inf(φ) > B^inf(φ) for all φ ∈

S ℒ_{n}

and B′(φ) := 0 otherwise. Observe that B′ ∈

B_{ℒ}

and that B′(¬τ) ≥ μ > 0 for the tautologies τ of

ℒ_{n}

. But then °B′ ∈

B \ ℙ

. Then for all π ∈ Π_n and all P ∈

[E_{n}]

we have −∑ _F_∈_π P (ρF) log B′ (ρF) < −∑ _F_∈_π P(ρF) log B^inf (ρF). We now apply Theorem 2 to find the strict inequality below

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{†}) = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{\inf}) \\ \geq \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{'}) \\ = \sup_{P \in E} S_{g} (° P, ° B^{'}) \\ > S_{g} (° P, ° P_{n}^{†}) \\ = \sup_{P \in E} S_{g} (° P, ° P_{n}^{†}) \\ = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{n}^{†}) . \end{array}

So, in Case i and in Case ii we have found that

P_{n}^{†}

has strictly better worst-case expected loss than B^† contradicting B^† ∈ arg

\inf_{B \in B_{ℒ}}

\sup_{P \in E_{ℒ}}

\sup_{ρ \in ϱ_{n}}

S_{g, ρ}^{n} (P, B)

.

Finally, we need to show that for all such belief functions B^† there exists a π ∈ Π_n such that ∑_F_∈_π °B^†(F) = 1. Suppose for contradiction that is not the case. Note that B^† respects logical equivalence on

ℒ_{n}

. Hence, we can define a belief function B′ ∈

B_{ℒ}

by adding a strictly positive number μ as in Case ii. B′ has a worst-case expected loss that is less than or equal to the worst-case expected loss of B^†. Again, we find that °B′ ∈

B \ ℙ

and hence B′ does not have minimal worst-case expected loss. Clearly then, B^† cannot have minimal worst-case expected loss. Contradiction. □

Theorem 5 (Finite sublanguage minimax). For all inclusive

g

, all n ∈

ℕ

, all C ∈ arg

\inf_{B \in B_{ℒ}}

\sup_{P \in E_{ℒ}}

S_{g}^{n} (P, B)

and all Q ∈ arg

\sup_{P \in E_{ℒ}}

H_{g}^{n} (P)

it holds that

C_{⇂ n} = Q_{⇂ n} = P_{n}^{†} .

Proof. From Lemma 6 we know that for every C ∈ arg

\inf_{B \in B_{ℒ}}

\sup_{P \in E_{ℒ}}

S_{g}^{n} (P, B)

it holds that C_⇂_n respects logical equivalence on

ℒ_{n}

and that °C := °ⁿC ∈

B

(since C is normalised). Every probability function P ∈

ℙ_{ℒ}

respects logical equivalence (Proposition 3).

Thus,

S_{g}^{n} (P, C)

and

S_{g}^{n} (P, P)

collapse to

S_{g} (° P, ° C)

, respectively

S_{g} (° P, ° P)

, the logarithmic scoring rule for propositions (1).

However, for the propositional case we know from Theorem 2 that the unique $g$-entropy maximiser on $ℙ$ is the unique worst-case expected loss minimiser on $B$: $\arg \inf_{B \in B} \sup_{P \in E} S_{g} (P, B) = \arg \sup_{P \in E} H_{g} (P) = \{P_{g}^{†}\}$, where $P_{g}^{†} = ° P_{n}^{†}$.

Thus, for all F ⊆ Ω_n it holds that

C (ρ F) = P_{g}^{†} (F)

for all ρ ∈ ϱ_n. Hence,

C_{⇂ n} = Q_{⇂ n} = P_{n}^{†}

. □
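A small numerical illustration of the finite minimax result can be given on a two-state space with the standard weighting. The sketch makes several simplifying assumptions that are not part of Theorem 5: the evidence set is a hypothetical interval constraint P(ω₁) ∈ [0.6, 0.9], candidate belief functions are restricted to normalised pairs (b, 1 − b), and both the supremum and the infimum are approximated on a grid with step 0.01.

```python
import math

def score(p, b):
    """Standard log score -sum_w P(w) log B(w) on a two-state space."""
    return -p * math.log(b) - (1 - p) * math.log(1 - b)

def entropy(p):
    """Standard entropy, obtained as the score of p against itself."""
    return score(p, p)

# Hypothetical evidence set: P(w1) is known to lie in [0.6, 0.9].
grid = [i / 100 for i in range(1, 100)]
evidence = [p for p in grid if 0.6 <= p <= 0.9]

def worst_case(b):
    """Worst-case expected loss of the belief function (b, 1 - b) over the evidence set."""
    return max(score(p, b) for p in evidence)

maxent_p = max(evidence, key=entropy)    # entropy maximiser within the evidence set
minimax_b = min(grid, key=worst_case)    # minimiser of worst-case expected loss

print(maxent_p, minimax_b, worst_case(minimax_b), entropy(maxent_p))
```

On this grid the two optimisers coincide, and the minimal worst-case expected loss equals the maximal entropy, in line with the statement of the theorem.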

4.4.2. Minimax for Inductive Logic

We shall now consider the language

ℒ

as a whole. We shall assume in this section that $E_{ℒ}$ is finitely generated by constraints on

ℒ_{K}

. As noted in Section 3.3, this is the scenario that is of key relevance to inductive logic. Our goal is to justify the norms of objective Bayesianism by showing that the belief functions with the best loss profile are the probability functions in

E_{ℒ}

with maximum entropy.

First we shall see that this is the case if

g

is language invariant:

Proposition 14 (Language invariance minimax). If

g

is inclusive and language invariant and if

E_{ℒ}

is finitely generated, then

\text{minloss } B_{ℒ} = \text{maxent } E_{ℒ} = ℙ^{†} = \{P^{†}\} .

Proof. Note that we have

ℙ^{†} = {P^{†}}

from Proposition 8, in particular

P_{n}^{†} = P_{⇂ n}^{†}

for all n ≥ K.

Since

g

is inclusive,

H_{g}^{n}

is strictly concave on

ℙ_{n}

(Lemma 1). Hence,

P_{n}^{†}

is uniquely determined. By language invariance we obtain P^† ∈ arg

\sup_{P \in E_{ℒ}}

S_{g}^{n} (P, P)

for all n ≥ K. Thus, P^† ∈ maxent

E_{ℒ}

.

For Q ∈

[E_{ℒ}]

\ {P^†} there has to exist some N ∈

ℕ

such that Q_⇂_n ≠ P^†_⇂_n for all n ≥ N. Since

H_{g}^{n}

is a strictly concave function on

ℙ_{n}

and since P^† maximises

H_{g}^{n}

for all n ≥ K it follows that

H_{g}^{n} (P^{†}) > H_{g}^{n} (Q)

for all n ≥ max{K, N}. Thus, Q ∉ maxent

E_{ℒ}

.

From Theorem 5 we have that

P_{n}^{†}

∈ arg

\inf_{B \in B_{ℒ}}

\sup_{P \in E_{ℒ}}

S_{g}^{n} (P, B)

for all n ≥ K. Since

E_{ℒ}

is finitely generated and g is language invariant we have that P^† ∈ arg

\inf_{B \in B_{ℒ}}

\sup_{P \in E_{ℒ}}

S_{g}^{n} (P, B)

for all n ≥ K. Thus, P^† ∈ minloss

B_{ℒ}

.

For every C ∈

B_{ℒ}

\{P^†} there has to exist an N ∈

ℕ

such that for all n ≥ N it holds that

$C_{⇂ n} \neq P_{⇂ n}^{†}$. For all n ≥ max{K, N} we now apply Theorem 5 to obtain

\sup_{P \in E_{ℒ}}

S_{g}^{n} (P, C) > \sup_{P \in E_{ℒ}}

S_{g}^{n} (P, P^{†})

. Hence, C ∉ minloss

B_{ℒ}

. □

This result is not entirely satisfactory, because we cannot say anything yet about whether such weighting functions exist. Indeed, it was conjectured in Landes and Williamson [4] (p. 3564) that no inclusive, symmetric and refined weighting function

g

is language invariant. This conjecture remains open.

Our next result says that, for the standard weighting

g_{Ω}

, the probability function with the best loss profile is the standard entropy maximiser:

Proposition 15 (Standard entropy minimax). If

E_{ℒ}

is finitely generated and

g = g_{Ω}

, then

\text{minloss } ℙ_{ℒ} = \text{maxent } E_{ℒ} = \{P_{Ω}^{†}\} .

Proof.

\{P_{Ω}^{†}\} = \text{maxent } E_{ℒ}

follows directly, since

g_{Ω}

is language-invariant and state-inclusive (Proposition 8).

It is well-known that

\arg \inf_{Q \in ℙ_{n}} \sup_{P \in E_{n}} S_{g_{Ω}} (P, Q) = \arg \sup_{P \in E_{n}} S_{g_{Ω}} (P, P) = \{P_{g_{Ω}}^{†}\},

see for instance [15]. Hence,

\text{minloss } ℙ_{ℒ} = \text{maxent } E_{ℒ} = \{P_{Ω}^{†}\} .

□

Because it only identifies probability functions with the best loss profile, rather than normalised belief functions with the best loss profile, Proposition 15 provides a justification for only two norms of objective Bayesianism, the Calibration Norm and the Equivocation Norm, under the supposition that

g = g_{Ω}

. This is a useful result if there is some independent reason—such as the Dutch book argument—for taking belief functions to be probability functions. But our goal in this paper is to investigate the extent to which the notion of loss profile developed above can be used to justify all three norms at once.

We know that there are weighting functions that are regular, i.e., which are atomic, inclusive, symmetric and strongly refined. The plan of the rest of this section is to prove the following analogous minimax theorem for regular weighting functions. This says that, for any regular weighting function, the belief function with the best loss profile is the probability function in

E_{ℒ}

which has maximal standard entropy. This theorem thus justifies all three norms at once.

Theorem 6 (Regularity minimax). If

g

is regular and

E_{ℒ}

is finitely generated, then

\text{minloss } B_{ℒ} = \text{maxent } E_{ℒ} = ℙ^{†} = \{P_{Ω}^{†}\} .

In order to prove this theorem we give a number of lemmata. We shall state these lemmata under more minimal conditions on

g

. The reader not interested in the details might always replace the stated conditions on

g

by: “

g

is regular”.

To begin with, we shall consider only belief functions B which respect logical equivalence. (Later we shall relax this restriction.) Hence,

S_{g, ρ}^{n} (P, B)

does not depend on ρ and we can ignore the particular representation ρ. This will allow us to focus on propositions.

Lemma 7. If n ≥ K, Q ∈

ℙ_{ℒ}

and if

\sup_{P \in E_{ℒ}}

S_{Ω}^{n} (P, Q)

is finite, then it holds that

\sup_{P \in E_{ℒ}} S_{Ω}^{n + 1} (P, Q) \geq \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) + \log \frac{| Ω_{n + 1} |}{| Ω_{n} |} .

Proof. Let P′ ∈ arg $\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q)$. Then define P″ on Ω_n₊₁ by

P^{″} (ν) : = P^{'} (ω_{ν}) \frac{| Ω_{n} |}{| Ω_{n + 1} |}

for all ν ∈ Ω_n₊₁ and ω_ν ∈ Ω_n with ν ⊨ ω_ν. Now extend P″ arbitrarily to a function in $[E_{ℒ}]$. Note that ${P^{″}}_{⇂ n + 1} \in [E_{n + 1}]$ since $E_{ℒ}$ is finitely generated and n ≥ K.

Since − log(x) is a strictly convex function on (0, 1] and since $Q (ω) = \sum_{\underset{ν ⊨ ω}{ν \in Ω_{n + 1}}} Q (ν)$ for all ω ∈ Ω_n it holds for all fixed ω ∈ Ω_n that

\sum_{\underset{ν ⊨ ω}{ν \in Ω_{n + 1}}} - \log Q (ν) \geq - \frac{| Ω_{n + 1} |}{| Ω_{n} |} \log (\frac{| Ω_{n} |}{| Ω_{n + 1} |} Q (ω)) .

We now find

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{Ω}^{n + 1} (P, Q) \geq S_{Ω}^{n + 1} (P^{″}, Q) \\ = - \sum_{ν \in Ω_{n + 1}} P^{″} (ν) \log Q (ν) \\ = - \sum_{ν \in Ω_{n + 1}} P^{'} (ω_{ν}) \frac{| Ω_{n} |}{| Ω_{n + 1} |} \log Q (ν) \\ \geq - \sum_{ω \in Ω_{n}} P^{'} (ω) \cdot \log \frac{| Ω_{n} | \cdot Q (ω)}{| Ω_{n + 1} |} \\ = - \log \frac{| Ω_{n} |}{| Ω_{n + 1} |} - \sum_{ω \in Ω_{n}} P^{'} (ω) \cdot \log Q (ω) \\ = \log \frac{| Ω_{n + 1} |}{| Ω_{n} |} + \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) . \end{array}
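The central step of this argument, from the score of the equivocating extension P″ to the coarse score plus log(|Ω_n₊₁|/|Ω_n|), can be checked numerically. The sketch below is only an illustration under assumptions: the coarse space has two states, each of which splits into two finer states, and the distributions are drawn at random. It checks the inequality for a fixed P′ and does not compute the suprema over the evidence set.

```python
import math
import random

random.seed(0)
coarse = [0, 1]                                     # stand-in for Omega_n
fine = [(w, j) for w in coarse for j in range(2)]   # stand-in for Omega_{n+1}
ratio = len(fine) / len(coarse)                     # |Omega_{n+1}| / |Omega_n|

def random_distribution(keys):
    """A strictly positive random probability distribution over `keys`."""
    raw = [random.random() + 0.01 for _ in keys]
    total = sum(raw)
    return {k: r / total for k, r in zip(keys, raw)}

def check_once():
    Q_fine = random_distribution(fine)
    # Q on the coarse space is the marginal of Q on the fine space.
    Q_coarse = {w: sum(Q_fine[(w, j)] for j in range(2)) for w in coarse}
    P_coarse = random_distribution(coarse)           # plays the role of P'
    # P'' spreads the mass of each coarse state evenly over its refinements.
    P_fine = {(w, j): P_coarse[w] / ratio for (w, j) in fine}
    lhs = -sum(P_fine[v] * math.log(Q_fine[v]) for v in fine)
    rhs = -sum(P_coarse[w] * math.log(Q_coarse[w]) for w in coarse) + math.log(ratio)
    return lhs, rhs

results = [check_once() for _ in range(1000)]
print(min(lhs - rhs for lhs, rhs in results))
```

The smallest observed gap is non-negative up to rounding, as Jensen's inequality for the convex function − log requires.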

Definition 22 (γ-weighting). To simplify notation we define for n ∈ ℕ and F ⊆ Ω_n

γ_{n} (F) : = \sum_{\underset{F \in π}{π \in Π_{n}}} g_{n} (π) .

If g is symmetric, then γ_n(F ) only depends on |F | := |{ω ∈ Ω_n : ω ∈ F }| and we write γ_n(|F |).

In particular, since the belief function B is assumed to respect logical equivalence, we can write

\begin{array}{l} S_{g}^{n} (P, B) = \sup_{ρ \in ϱ_{n}} \sum_{F \subseteq Ω_{n}} - γ_{n} (F) P (ρ F) \log B (ρ F) \\ = \sum_{F \subseteq Ω_{n}} - γ_{n} (F) ° P (F) \log ° B (F) . \end{array}

Furthermore, we can easily characterise the set of inclusive g: g is inclusive if and only if γ_n(F) > 0 for all $n \in ℕ$ and all F ⊆ Ω_n.
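The γ-weighting can be computed directly on a small state space. The following sketch assumes that partitions consist of non-empty blocks and uses a hypothetical constant weighting `constant_g`, which is symmetric and inclusive; neither choice is taken from the definitions above.

```python
from itertools import combinations

def partitions(items):
    """Yield every partition of `items` into non-empty blocks."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k in range(len(rest) + 1):
        for companions in combinations(rest, k):
            block = frozenset((first,) + companions)
            remaining = [x for x in rest if x not in companions]
            for tail in partitions(remaining):
                yield [block] + tail

def gamma(F, g, omega):
    """gamma(F): the sum of g(pi) over the partitions pi of omega that contain F as a block."""
    F = frozenset(F)
    return sum(g(pi) for pi in partitions(omega) if F in pi)

omega = ("w1", "w2", "w3")
constant_g = lambda pi: 1.0   # hypothetical symmetric, inclusive weighting

# Collect the distinct gamma values attained by propositions of each size.
gamma_by_size = {}
for size in range(1, len(omega) + 1):
    gamma_by_size[size] = {gamma(F, constant_g, omega) for F in combinations(omega, size)}

print(gamma_by_size)
```

For this symmetric weighting each size class yields a single γ value, so γ depends only on |F|, and every value is strictly positive, as inclusiveness requires.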

Lemma 8. Let g be inclusive and such that there exist 0 < a ≤ b < +∞ such that g(πⁿ) ∈ [a, b] for all

n \in ℕ

and such that

\lim_{n \to \infty} \log | Ω_{n} | \sum_{π \in Π_{n} \setminus \{π^{n}\}} g (π) = 0.

Then

\text{Rest}_{n} : = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) - g (π^{n}) S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) \to 0 \text{ as } n \to \infty .

Proof. Let us first note that

S_{g}^{n} (P, P_{Ω}^{†}) - g (π^{n}) S_{Ω}^{n} (P, P_{Ω}^{†}) = \sum_{π \in Π_{n} \setminus \{π^{n}\}} - g (π) \sum_{F \in π} ° P (F) \log ° P_{Ω}^{†} (F) .

(10)

Recall that

P_{Ω}^{†}

is open-minded (Proposition 5). Thus,

P \in [E_{ℒ}]

, F ⊆ Ω_n and

° P (F) > 0

imply

° P_{Ω}^{†} (F) > 0

. Let

m : = \min \{P_{Ω}^{†} (ω) : ω \in Ω_{K} \ \&\ P_{Ω}^{†} (ω) > 0\} \in (0, 1] .

Then, for F ⊆ Ω_n such that

° P_{Ω}^{†} (F) > 0

it holds that

\begin{matrix} ° P_{Ω}^{†} (F) \geq \min \{P_{Ω}^{†} (ν) : ν \in Ω_{n} \ \&\ P_{Ω}^{†} (ν) > 0\} \\ = m \cdot \frac{| Ω_{K} |}{| Ω_{n} |} \geq \frac{m}{| Ω_{n} |}, \end{matrix}

since

P_{Ω}^{†}

equivocates beyond

ℒ_{K}

.

Hence,

P \in [E_{ℒ}]

, F ⊆ Ω_n and

° P (F) > 0

imply that

° P_{Ω}^{†} (F) \geq \frac{m}{| Ω_{n} |}

. Since

\sum_{F \in π} ° P (F) = 1

we now find

\begin{array}{l} 0 \leq \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) - g (π^{n}) S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) \\ \leq \sup_{P \in E_{ℒ}} g (π^{n}) S_{Ω}^{n} (P, P_{Ω}^{†}) + \sup_{P \in E_{ℒ}} \sum_{π \in Π_{n} \ {π^{n}}} - g (π) \sum_{F \in π} ° P (F) \log ° P_{Ω}^{†} (F) - \sup_{P \in E_{ℒ}} g (π^{n}) S_{Ω}^{n} (P, P_{Ω}^{†}) \\ \leq \sup_{P \in E_{ℒ}} \sum_{π \in Π_{n} \ {π^{n}}} - g (π) \sum_{F \in π} ° P (F) \log \frac{m}{| Ω_{n} |} \\ = \log \frac{m}{| Ω_{n} |} \sum_{π \in Π_{n} \ {π^{n}}} - g (π) \\ = (\log (| Ω_{n} |) - \log (m)) \cdot \sum_{π \in Π_{n} \ {π^{n}}} g (π) . \end{array}

To complete the proof, it suffices to note that this sum is eventually positive and converges in

n \in ℕ

to zero by our assumption on g and the fact that m is constant.

Proposition 16. Let g be inclusive and such that there exist 0 < a ≤ b < +∞ such that g(πⁿ) ∈ [a, b] for all

n \in ℕ

and such that

\lim_{n \to \infty} \log | Ω_{n} | \sum_{π \in Π_{n} \ {π^{n}}} g (π) = 0.

Then for all

B \in B_{ℒ} \ {P_{Ω}^{†}}

that respect logical equivalence,

P_{Ω}^{†} ≺ B

.

Proof. We shall proceed by considering cases.

Case 1

B \in ℙ_{ℒ} \ {P_{Ω}^{†}}

.

There exists an N ≥ K such that for all n ≥ N it holds that

B_{⇂ n} \neq {(P_{Ω}^{†})}_{⇂ n}

. It is well-known that for all

P \in ℙ

\arg \inf_{Q \in ℙ} - \sum_{ω \in Ω} P (ω) \log Q (ω) = {P} .

(11)

That is, the usual logarithmic scoring rule, when applied to probability functions

P \in ℙ

and

Q \in ℙ

, is strictly proper. Savage [16] showed that this scoring rule is not only strictly proper but also unique under the further assumption of locality, which is requirement L3 in our framework. Thus,

S_{Ω}^{n} (P_{Ω}^{†}, B) - S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) > 0

.
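As a numerical illustration of (11), and not part of the original argument, the following Python sketch evaluates the logarithmic score for one fixed P against a few candidate Q. The helper name `log_score` and the example distributions are illustrative assumptions; the sketch only checks that, among these candidates, the score is smallest when Q = P.

```python
import math

def log_score(p, q):
    """Expected logarithmic loss -sum_w p(w) * log q(w).

    States with p(w) == 0 contribute nothing; a state with p(w) > 0
    and q(w) == 0 makes the score infinite.
    """
    total = 0.0
    for pw, qw in zip(p, q):
        if pw == 0.0:
            continue
        if qw == 0.0:
            return math.inf
        total -= pw * math.log(qw)
    return total

# A hypothetical probability function on three states.
p = [0.5, 0.3, 0.2]

# Hypothetical candidate belief functions (all probability functions here).
candidates = {
    "p itself": [0.5, 0.3, 0.2],
    "uniform": [1 / 3, 1 / 3, 1 / 3],
    "skewed": [0.7, 0.2, 0.1],
}

scores = {name: log_score(p, q) for name, q in candidates.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

A finite list of candidates can of course only illustrate strict propriety, not establish it.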

We then find by the first part of Corollary 3 and Lemma 7 for all n ≥ N that

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) \\ = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - g (π^{n}) S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) - Rest_{n} \\ \geq g (π^{n}) \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, B) - g (π^{n}) S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) - Rest_{n} \\ = g (π^{n}) \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, B) - g (π^{n}) (S_{Ω}^{N} (P_{Ω}^{†}, P_{Ω}^{†}) + \log \frac{| Ω_{n} |}{| Ω_{N} |}) - Rest_{n} \\ \geq g (π^{n}) (\sup_{P \in E_{ℒ}} S_{Ω}^{N} (P, B) + \log \frac{| Ω_{n} |}{| Ω_{N} |}) \\ - g (π^{n}) (S_{Ω}^{N} (P_{Ω}^{†}, P_{Ω}^{†}) + \log \frac{| Ω_{n} |}{| Ω_{N} |}) - Rest_{n} \\ \geq g (π^{n}) (S_{Ω}^{N} (P_{Ω}^{†}, B) - S_{Ω}^{N} (P_{Ω}^{†}, P_{Ω}^{†})) - Rest_{n} . \end{array}

Recall from Lemma 8 that Rest_n converges to zero. Furthermore, the sequence

{(g (π^{n}))}_{n \in ℕ}

is bounded in [a, b] with a > 0. Thus, for all large enough n ∈ ℕ it holds that

\begin{matrix} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) \geq g (π^{n}) (S_{Ω}^{N} (P_{Ω}^{†}, B) - S_{Ω}^{N} (P_{Ω}^{†}, P_{Ω}^{†})) - Rest_{n} \\ > 0. \end{matrix}

Case 2

B \in B_{ℒ} \ ℙ_{ℒ}

.

Case 2A There exists a

P_{B} \in ℙ_{ℒ}

such that for all

n \in ℕ

and all F ⊆ Ω_n it holds that

° B (F) \leq ° P_{B} (F)

, i.e., P_B dominates B.

Case 2Ai

P_{B} = P_{Ω}^{†}

and no other

P \in ℙ_{ℒ}

is such that

° B (F) \leq ° P (F)

for all n and all F ⊆ Ω_n. Then for all

P \in ℙ_{ℒ}

and all propositions F it holds that

γ_{n} (F) ° P (F) (- \log ° B (F) + \log ° P_{Ω}^{†} (F)) \geq 0.

Thus, for all

P \in ℙ_{ℒ}

and

n \in ℕ

it holds that

S_{g}^{n} (P, B) \geq S_{g}^{n} (P, P_{Ω}^{†})

.

Since

B \neq P_{Ω}^{†}

there exists some

N \in ℕ

and a ∅ ⊂ F ⊆ Ω_N such that

° B (F) < ° P_{Ω}^{†} (F)

. For n > N let ∅ ⊂ F_n ⊆ Ω_n be such that F_n = {ω ∈ Ω_n : ω ∈ F }. Hence, for all n > N it holds that

- \log ° B (F_{n}) + \log ° P_{Ω}^{†} (F_{n}) > 0

. Thus,

° P_{Ω}^{†} (F_{n}) γ_{n} (F_{n}) (- \log ° B (F_{n}) + \log ° P_{Ω}^{†} (F_{n})) > 0

. Since g is inclusive (γ_n(F ) > 0 for all

n \in ℕ

and all F ⊆ Ω_n) it holds that

S_{g}^{n} (P_{Ω}^{†}, B) > S_{g}^{n} (P_{Ω}^{†}, P_{Ω}^{†})

for all n ≥ N.

Applying the second condition of Definition 21 yields

P_{Ω}^{†} ≺ B

.

Case 2Aii There exists a

P_{B} \in ℙ_{ℒ}

dominating B such that

P_{B} \neq P_{Ω}^{†}

.

Then for all n ≥ K and all

P \in E_{ℒ}

it holds that

S_{g}^{n} (P, B) - S_{g}^{n} (P, P_{B}) \geq 0

. For all large enough

n \in ℕ

it holds by Case 1 that

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{B}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) > 0

. Thus, we find for all large enough n

\begin{matrix} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) \geq \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{B}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) \\ > 0. \end{matrix}

Case 2B There does not exist a

P_{B} \in ℙ_{ℒ}

such that for all

n \in ℕ

and all F ⊆ Ω_n it holds that

° B (F) \leq ° P_{B} (F)

.

For example, the belief functions constructed in Proposition 4 are of this form, i.e., not dominated by a probability function.

Let us assume for contradiction that there exists an infinite set

J : = {j_{1}, j_{2}, \dots} \subseteq ℕ

such that

\lim_{i \to \infty} \sum_{ω \in Ω_{j_{i}}} B (ω) = 1

. Now define a function Q on

S ℒ

by requiring that Q respects logical equivalence and that

° Q (F) : = \lim_{i \to \infty} \sum_{\underset{ω \in F}{ω \in Ω_{j_{i}}}} B (ω) .

Next we show

Q \in ℙ_{ℒ}

and

° B (F) \leq ° Q (F)

for all F which will allow us to derive the required contradiction.

First note that for all

n \in ℕ

it holds that

\begin{array}{l} \sum_{ν \in Ω_{n}} Q (ν) = \lim_{i \to \infty} \sum_{ν \in Ω_{n}} \sum_{\underset{ω ⊨ ν}{ω \in Ω_{j_{i}}}} B (ω) \\ = \lim_{i \to \infty} \sum_{ω \in Ω_{j_{i}}} B (ω) \\ = 1. \end{array}

Furthermore, we have for all

n \in ℕ

and all F ⊆ Ω_n

\begin{array}{l} ° Q (F) = \lim_{i \to \infty} \sum_{\underset{ω \in F}{ω \in Ω_{j_{i}}}} B (ω) \\ = \lim_{i \to \infty} \sum_{\underset{ν \in F}{ν \in Ω_{n}}} \sum_{\underset{ω | = ν}{ω \in Ω_{j_{i}}}} B (ω) \\ = \sum_{\underset{ν \in F}{ν \in Ω_{n}}} \lim_{i \to \infty} \sum_{\underset{ω | = ν}{ω \in Ω_{j_{i}}}} B (ω) \end{array}

So,

Q \in ℙ_{ℒ}

.

Now assume that there exists a proposition F ⊆ Ω_n such that

° B (F) > ° Q (F)

. Since

Q \in ℙ_{ℒ}

it holds that

° Q (F) + ° Q (\bar{F}) = 1

. Note that

{{ω \in Ω_{j_{i}} : ω \in F}} \cup \underset{\underset{ω \in \bar{F}}{ω \in Ω_{j_{i}}}}{\cup} {{ω}}

is a partition in

Π_{j_{i}}

. Since we assumed that B respects logical equivalence it holds that

B (\lor_{ω \in Ω_{j_{i}} : ω \in F} ω) = ° B (F)

. Thus,

° B (F) + \sum_{\underset{ω \in \bar{F}}{ω \in Ω_{j_{i}}}} B (ω) \leq 1

has to hold for all large i. We now obtain the required contradiction as follows:

\begin{array}{l} 1 \geq \lim_{i \to \infty} (° B (F) + \sum_{\underset{ω \in \bar{F}}{ω \in Ω_{j_{i}}}} B (ω)) \\ = ° B (F) + ° Q (\bar{F}) \\ > ° Q (F) + ° Q (\bar{F}) \\ = 1 \end{array}

Thus, there has to exist an α > 0 and an

N \in ℕ

with N ≥ K such that for all n ≥ N it holds that

\sum_{ω \in Ω_{n}} B (ω) \leq 1 - α

. We have for n ≥ N that

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - g (π^{n}) S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) - Rest_{n} \\ \geq g (π^{n}) (\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, B) - S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†})) - Rest_{n} \\ \geq g (π^{n}) (S_{Ω}^{n} (P_{Ω}^{†}, B) - S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†})) - Rest_{n} . \end{array}

To complete the proof we will now show that there exists some β > 0, which depends on

E_{ℒ}

and g but does not depend on the particular n ≥ N, such that

S_{Ω}^{n} (P_{Ω}^{†}, B) - S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) > β

. Since g(πⁿ) is bounded, we then obtain that

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) > 0

for all large enough n.

We need to show that for all large enough n,

- \sum_{ω \in Ω_{n}} P_{Ω}^{†} (ω) \log f (ω) - S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) \geq β > 0

for all functions f : Ω_n → [0, 1] such that

\sum_{ω \in Ω_{n}} f (ω) \leq 1 - α

.

Suppose

f^{'} \in \arg \min_{f} \sum_{ω \in Ω_{n}} - P_{Ω}^{†} (ω) \log f (ω)

. If

P_{Ω}^{†} (ω) > 0

and f′(ω) = 0, then

\sum_{ω \in Ω_{n}} - P_{Ω}^{†} (ω) \log f^{'} (ω) = \infty

. Hence, the minimum cannot obtain for such an f′. On the other hand, if f′(ω) > 0 and

P_{Ω}^{†} (ω) = 0

, then there has to exist a μ ∈ Ω_n \ {ω} such that

P_{Ω}^{†} (μ) > 0

. Then define a function f″ such that f″ (ω) := 0, f″ (μ) := f′ (μ) + f′ (ω) > f′ (μ) and f″ (λ) := f′ (λ) for all λ ∈ Ω_n \ {ω, μ}. Then

\sum_{ν \in Ω_{n}} - P_{Ω}^{†} (ν) \log f^{'} (ν) > \sum_{ν \in Ω_{n}} - P_{Ω}^{†} (ν) \log f^{″} (ν)

. Again, the minimum cannot obtain for such an f′.

We may thus assume in the following that any f′ minimising the above sum satisfies:

P_{Ω}^{†} (ω) > 0

, if and only if f′(ω) > 0. In particular, the function f′(ω) = 0 for all ω ∈ Ω_n cannot be optimal.

Let

a_{f} : = \sum_{ω \in Ω_{n}} f (ω) \in (0, 1 - α]

. Then

\begin{array}{r} - \sum_{ω \in Ω_{n}} P_{Ω}^{†} (ω) \log f (ω) = - \sum_{ω \in Ω_{n}} P_{Ω}^{†} (ω) (\log \frac{f (ω)}{a_{f}} + \log a_{f}) \\ = - \log (a_{f}) - \sum_{ω \in Ω_{n}} P_{Ω}^{†} (ω) \log \frac{f (ω)}{a_{f}} . \end{array}

By definition,

\sum_{ω \in Ω_{n}} \frac{f (ω)}{a_{f}} = 1

. The sum in the above equation is thus the standard logarithmic scoring rule on

B_{n}

,

S_{Ω}^{n} (P, \frac{f}{a_{f}})

. For fixed P ∈ ℙ_ℒ the minimum under this scoring rule obtains for a function which agrees with P on the states ω ∈ Ω_n.

Thus, for fixed a_f the function f minimising

- \sum_{ω \in Ω_{n}} P_{Ω}^{†} (ω) \log f (ω)

is the a_f multiple of

P_{Ω}^{†}

. In order to minimize

- \sum_{ω \in Ω_{n}} P_{Ω}^{†} (ω) \log f (ω)

, −log a_f has to be minimal. This minimum obtains for a_f = 1 − α. We hence find the value of the minimum as

\inf_{\underset{\sum_{ω \in Ω_{n}} f (ω) \leq 1 - α}{f : Ω_{n} \to [0, 1]}} - \sum_{ω \in Ω_{n}} P_{Ω}^{†} (ω) \log f (ω) = - \log (1 - α) + S_{Ω}^{n} (P_{Ω}^{†}, P_{Ω}^{†}) .

β may thus be chosen as β = − log(1 − α) > 0. □
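The last step of the proof can be checked numerically. The Python sketch below is an illustration under stated assumptions: the distribution `p`, the value of `alpha`, and the helper names `log_score` and `random_subnormalised` are all hypothetical. It samples functions f with total mass at most 1 − α and compares their excess logarithmic loss over the entropy of p with the bound β = −log(1 − α), which is attained by f = (1 − α)·p.

```python
import math
import random

def log_score(p, f):
    """-sum_w p(w) * log f(w), restricted to states with p(w) > 0."""
    return -sum(pw * math.log(fw) for pw, fw in zip(p, f) if pw > 0)

def random_subnormalised(k, max_mass, rng):
    """A random strictly positive function on k states with total mass <= max_mass."""
    weights = [rng.uniform(0.01, 1.0) for _ in range(k)]
    mass = max_mass * rng.uniform(0.05, 1.0)
    total = sum(weights)
    return [mass * w / total for w in weights]

# Hypothetical example values.
p = [0.4, 0.4, 0.1, 0.1]
alpha = 0.25
beta = -math.log(1 - alpha)

entropy = log_score(p, p)  # S(p, p), the entropy of p

# The scaled copy (1 - alpha) * p attains the bound exactly.
optimum_gap = log_score(p, [(1 - alpha) * pw for pw in p]) - entropy

# Random sub-normalised functions never do better than the bound.
rng = random.Random(0)
min_gap = min(
    log_score(p, random_subnormalised(len(p), 1 - alpha, rng)) - entropy
    for _ in range(2000)
)
print(beta, optimum_gap, min_gap)
```

Random sampling only probes the inequality; the argument in the text is what establishes it for all admissible f.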

We now drop the assumption that belief functions respect logical equivalence.

Proposition 17. If g is inclusive and such that there exist 0 < a ≤ b < +∞ such that g(πⁿ) ∈ [a, b] for all n ∈ ℕ and such that

\lim_{n \to \infty} \log | Ω_{n} | \sum_{π \in Π_{n} \ {π^{n}}} g (π) = 0,

then

minloss B_{ℒ} = {P_{Ω}^{†}} .

(12)

Proof. We shall consider cases for

B \in B_{ℒ} \ {P_{Ω}^{†}}

. We will show that

P_{Ω}^{†} ≺ B

holds for all cases. Then minloss

B_{ℒ} = {P_{Ω}^{†}}

follows.

Case 1 B respects logical equivalence.

By Proposition 16 we obtain

P_{Ω}^{†} ≺ B

.

Case 2 B does not respect logical equivalence.

Since B does not respect logical equivalence, there exists a minimal N ∈ ℕ such that two different logically equivalent sentences φ, ψ ∈ S

ℒ

_N are assigned different degrees of belief, i.e., B(φ) ≠ B(ψ).

We now inductively define functions B_n : S

ℒ

→ [0, 1] for n ≥ N. First, let

B_{N} (χ) : = {\begin{array}{l} \inf {B (θ) : θ \in S ℒ_{N} & ⊨ χ \leftrightarrow θ} & if χ \in S ℒ_{N} \\ B (χ) & if χ \in S ℒ \ S ℒ_{N} \end{array} .

Now assume n > N. For all χ ∈ S

ℒ

_n such that no θ ∈ S

ℒ

_n−₁ is logically equivalent to χ let

B_{n} (χ) : = \inf {B (θ) : θ \in S ℒ_{n} & ⊨ χ \leftrightarrow θ}

and otherwise let

B_{n} (χ) : = {\begin{array}{l} B_{n - 1} (θ) & if χ \in S ℒ_{n} and there exists a θ \in S ℒ_{n - 1} with ⊨ χ \leftrightarrow θ \\ B (χ) & if χ \in S ℒ \ S ℒ_{n} \end{array} .

Note that B_n is well-defined: B_n−₁ respects logical equivalence on

ℒ

_n−₁ and thus B_n−₁(θ) does not depend on the particular sentence θ ∈ S

ℒ

_n−₁ which is logically equivalent to χ.

By construction, B_n₊₁ agrees with B_n on S

ℒ

_n.

Finally, let B^I(χ) := lim_n→∞ B_n(χ). Trivially, B^I_⇂_N = B_N_⇂_N.

Since for all n ≥ N the B_n respect logical equivalence on

ℒ

_n, B^I respects logical equivalence on

ℒ

.

Furthermore, B^I agrees with B_n on the sentences of

ℒ

_n.

Now consider a χ ∈ S

ℒ

and let k ∈ ℕ be minimal such that χ ∈ S

ℒ

_k and consider the corresponding proposition F ⊆ Ω_k. For all n ≥ max{N, k} we shall show that

\inf_{ρ \in ϱ_{n}} B (ρ F) \leq B^{I} (χ) .

If k ≤ N, then for all n ≥ N it holds that B_n(χ) = inf{B(θ) : θ ∈ S

ℒ

_N & ⊨χ ↔ θ} = B_N(χ). Hence, B^I(χ) = B_N(χ). For n ≥ N there exist ρ ∈ ϱ_n such that ρF = χ. Thus,

\inf_{ρ \in ϱ_{n}} B (ρ F) \leq B_{N} (χ) = B^{I} (χ)

.

If k > N, then there are two cases. If no θ ∈ S

ℒ

_k−₁ is logically equivalent to χ, then B_k(χ) = inf{B(θ) : θ ∈ S

ℒ

_k \ S

ℒ

_k−₁ & ⊨ χ ↔ θ}. In which case, we find for all n ≥ k > N

\begin{array}{l} \inf_{ρ \in ϱ_{n}} B (ρ F) \leq \inf_{ρ \in ϱ_{k}} B (ρ F) \\ = \inf {B (θ) : θ \in S ℒ_{k} \ S ℒ_{k - 1} & ⊨ χ \leftrightarrow θ} \\ = B^{I} (χ) . \end{array}

In the other case there does exist some θ ∈ S

ℒ

_k−₁ which is logically equivalent to χ. Then B_n(χ) = B_k−₁(θ) for all n ≥ k. So B^I(χ) = B_k−₁(θ). Thus, for all n ≥ max{N, k} ≥ k − 1 it is true that

\begin{array}{l} \inf_{ρ \in ϱ_{n}} B (ρ F) \leq \inf_{ρ \in ϱ_{k}} B (ρ F) \\ \leq \inf_{ρ \in ϱ_{\max {N, k}}} B (ρ F) \\ = \inf {B (θ) : θ \in S ℒ_{k - 1} & ⊨ χ \leftrightarrow θ} \\ = B^{I} (χ) . \end{array}

It thus follows for all P ∈ ℙ_ℒ and all n ≥ N that

\begin{array}{l} S_{g}^{n} (P, B) = \sup_{ρ \in ϱ_{n}} S_{g, ρ}^{n} (P, B) \\ = - \sum_{F \subseteq Ω_{n}} γ_{n} (F) ° P (F) \inf_{ρ \in ϱ_{n}} \log B (ρ F) \\ \geq - \sum_{F \subseteq Ω_{n}} γ_{n} (F) ° P (F) \log B^{I} (ρ F) for all ρ \in ϱ_{n} \\ = S_{g}^{n} (P, B^{I}) . \end{array}

(13)

Let us now note that B^I(φ) < max{B(φ), B(ψ)}. Thus, B^I(φ) + B^I(¬φ) < max{B(φ), B(ψ)} + B^I(¬φ). Also observe that B^I(χ) ≤ B(χ) for all χ ∈ S

ℒ

_N. Thus, B^I(¬φ) ≤ B(¬φ). Hence,

\begin{array}{l} B^{I} (φ) + B^{I} (\neg φ) < \max {B (φ), B (ψ)} + B^{I} (\neg φ) \\ \leq \max {B (φ), B (ψ)} + B (\neg φ) \\ \leq 1. \end{array}

We infer B^I(φ) + B^I(¬φ) < 1 and thus B^I ∉ ℙ_ℒ.

Case 2A

B^{I} \in B_{ℒ} \ ℙ_{ℒ}

.

Since B^I respects logical equivalence, we obtain by Proposition 16 that

P_{Ω}^{†} ≺ B^{I}

. Applying (13) we obtain

P_{Ω}^{†} ≺ B

.

Case 2B

B^{I} \notin B_{ℒ}

.

We shall now define a function B^J assigning every proposition a value in [0, 1] as follows. Let τ ∈ S

ℒ

be some tautology. {τ} is a partition. Since

B^{I} \notin B_{ℒ}

it follows that B^I(τ) < 1. Now put B^J(κ) := 1 − B^I(τ) for all contradictions κ ∈ S

ℒ

. Clearly, B^J(κ) > 0. For all satisfiable χ ∈ S

ℒ

let B^J (χ) := B^I(χ).

Note that

B^{J} \in B_{ℒ}

and since B^J(¬τ) > 0 it follows that

B^{J} \in B_{ℒ} \ ℙ_{ℒ}

. Also note that for all n ∈ ℕ and all P ∈ ℙ_ℒ it holds that

S_{g}^{n} (P, B^{I}) = S_{g}^{n} (P, B^{J})

and so

S_{g}^{n} (P, B) \geq S_{g}^{n} (P, B^{I}) = S_{g}^{n} (P, B^{J}) .

Since B^J respects logical equivalence we can apply Case 2A to obtain

P_{Ω}^{†} ≺ B^{J}

. But then

P_{Ω}^{†} ≺ B

. □

Our main minimax theorem (already stated above on Page 2492) then follows immediately from Proposition 17 by applying Lemma 2 and Theorem 3:

Theorem 6 (Regularity minimax). If g is regular and

E_{ℒ}

is finitely generated, then

minloss B_{ℒ} = maxent E_{ℒ} = ℙ^{†} = {P_{Ω}^{†}} .

If

E_{ℒ} = ℙ_{ℒ}

, then the unique function with greatest entropy is the equivocator (Proposition 7). Thus by Theorem 6,

minloss B_{ℒ} = maxent ℙ_{ℒ} = {P_{Ω}^{†}} = {P_{=}} .

Recall that P= assigns all n-states ω ∈ Ω_n the same probability,

P_{=} (ω) = \frac{1}{| Ω_{n} |}

. So, if the agent does not possess any evidence, then all n-states ω ∈ Ω_n are believed to the same degree. Absence of evidence entails symmetric degrees of belief. In other words, the three norms of objective Bayesianism entail an instance of the Principle of Indifference.

Surprisingly, perhaps, symmetry of the weighting function is not necessary to guarantee this instance of the Principle of Indifference on finite sublanguages—see Appendix B.
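To make the equivocator concrete, here is a minimal Python sketch. It assumes, as is standard for such languages but not restated in this section, that an n-state settles the truth value of every atomic sentence built from the relation symbols and the first n constants, so that a relation symbol of arity r contributes n^r atomic sentences. The function names `num_n_states` and `equivocator` are hypothetical.

```python
from fractions import Fraction

def num_n_states(arities, n):
    """|Omega_n| for a language whose relation symbols have the given arities.

    Assumption: an n-state decides each atomic sentence over the first n
    constants, and a relation symbol of arity r yields n**r such sentences.
    """
    atoms = sum(n ** r for r in arities)
    return 2 ** atoms

def equivocator(arities, n):
    """Probability that P_= gives to each single n-state: 1 / |Omega_n|."""
    return Fraction(1, num_n_states(arities, n))

# One unary relation symbol, three constants: 2**3 = 8 states.
print(num_n_states([1], 3), equivocator([1], 3))

# One unary and one binary relation symbol, two constants: 2**(2 + 4) = 64 states.
print(num_n_states([1, 2], 2), equivocator([1, 2], 2))
```

Exact fractions are used so that the uniform values remain exact even when the number of states is large.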

4.5. Infinite-Language Invariance

So far, we have been working over a fixed predicate language

ℒ

(without quantifiers). One might wonder what would have happened if one had started out with a different such language.

We will investigate this question by considering predicate languages which contain finitely many further relation symbols and/or finitely many further constant symbols than does

ℒ

.

For all languages we consider here, we shall suppose that the ways the constant symbols are ordered are consistent. Furthermore, we suppose that the order types of the constant symbols are ω, the first infinite ordinal. That is, for

ℒ

⊂

ℒ

¹ let t₁, t₂, … be the constant symbols in

ℒ

and let

T^{new} : = {t_{1}^{new}, \dots, t_{m}^{new}}

be the set of constant symbols in

ℒ

¹ which are not in

ℒ

. Then we require that the constant symbols of

ℒ

¹ are ordered such that

for all n ∈ ℕ, t_n appears before t_n₊₁ (consistency),
for all t ∈ T ^new there exists some n ∈ ℕ such that t appears before t_n (order type ω).

The way the constant symbols of

ℒ

¹ are ordered can be thought of as inserting the t ∈ T^new into the ordering of the constant symbols of

ℒ

.

From now on, superscripts are used to refer to such predicate languages, while subscripts continue to refer to their respective finite sublanguages. For example,

ℒ_{n}^{1}

is the finite sublanguage of

ℒ

¹ which contains only the first n constants of

ℒ

¹. For

ℒ

⊂

ℒ

¹, in general, the set of the first n constants of

ℒ

may be different from the set of the first n constants of

ℒ

¹.

Definition 23 (Infinite-Language Invariance). A weighting function g is infinite-language invariant, if and only if the following holds: for all

ℒ

and for all

E_{ℒ}

finitely generated by constraints on the finite sublanguage

ℒ

_K of

ℒ

, if

ℒ

¹ and

ℒ

² are such that

ℒ

⊆

ℒ

¹ ⊆

ℒ

², then for all B ∈ minloss

B_{ℒ^{1}}

there exists a C ∈ minloss

B_{ℒ^{2}}

such that

C_{⇂ ℒ^{1}} = B

.

Infinite-language invariance is motivated by the thought that simply adding new constant or predicate symbols to the language

ℒ

should not change the inferences which are expressible in the original language

ℒ

. Note the following qualification: since each element of the domain is picked out by some member of

ℒ

, one can infer that in

ℒ

′ formed by adding constants to

ℒ

, there must be some constants which name the same individual.

We shall now proceed to show that the weighting functions which we focus on in this paper—the regular weighting functions—are infinite-language invariant.

Lemma 9. If ε, ε′ are non-empty and convex sets of the following form

\begin{array}{l} ε \subseteq {(x_{1}, \dots, x_{n}) \in ℝ^{n} : \sum_{i = 1}^{n} x_{i} = 1 & x_{i} \geq 0} \\ ε^{'} \subseteq {(y_{1}, z_{1}, y_{2}, z_{2}, \dots, y_{n}, z_{n}) \in ℝ^{2 n} : y_{i}, z_{i} \geq 0 & (y_{1} + z_{1}, \dots, y_{n} + z_{n}) \in ε}, \end{array}

then for

\begin{array}{l} {(x_{1}^{†}, \dots, x_{n}^{†})} = \arg \sup_{(x_{1}, \dots, x_{n}) \in ε} - \sum_{i = 1}^{n} x_{i} \log x_{i} \\ {(y_{1}^{†}, z_{1}^{†}, \dots, y_{n}^{†}, z_{n}^{†})} = \arg \sup_{(y_{1}, z_{1}, \dots, y_{n}, z_{n}) \in ε^{'}} - \sum_{i = 1}^{n} y_{i} \log y_{i} + z_{i} \log z_{i} \end{array}

it holds that

y_{i}^{†} = z_{i}^{†} = \frac{x_{i}^{†}}{2}

for all 1 ≤ i ≤ n.

Proof. That the suprema are unique follows from the convexity of the sets ε, ε′ and the fact that

H_{Ω}^{n}, H_{Ω}^{2 n}

are strictly concave functions on ℙ_n and ℙ_{2n}, respectively.

Recall that

ℒ

^U is the language introduced in Lemma 2.

y_{i}^{†} = z_{i}^{†} = \frac{x_{i}^{†}}{2}

is a direct consequence of

P_{Ω}^{†}

equivocating beyond

ℒ_{k}^{U}

(Proposition 9). □
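The even split in Lemma 9 can be illustrated for a fixed vector x. The following Python sketch is an illustration, not the paper's proof: the vector `x`, the grid resolution, and the helper names `entropy` and `pair_term` are assumptions. For each coordinate it searches over ways of dividing x_i into y_i + z_i and checks that the entropy contribution is largest at y_i = z_i = x_i / 2, where the total entropy equals the entropy of x plus log 2.

```python
import math

def entropy(xs):
    """Shannon entropy -sum x * log x, with the convention 0 * log 0 = 0."""
    return -sum(x * math.log(x) for x in xs if x > 0)

def pair_term(x, t):
    """Entropy contribution of splitting x into y = t * x and z = (1 - t) * x."""
    return entropy([t * x, (1 - t) * x])

# A hypothetical point of the simplex.
x = [0.5, 0.3, 0.2]

# Grid of split proportions t in [0, 1]; 0.5 is a grid point.
grid = [k / 100 for k in range(101)]

# For each coordinate, the proportion that maximises the contribution.
best_t = [max(grid, key=lambda t, xi=xi: pair_term(xi, t)) for xi in x]

# Entropy of the evenly split vector (y_1, z_1, ..., y_n, z_n).
even_split = [half for xi in x for half in (xi / 2, xi / 2)]
print(best_t, entropy(even_split), entropy(x) + math.log(2))
```

Since the even split adds the same constant log 2 for every x, maximising over the split vectors in ε′ reduces to maximising the entropy of x over ε, which is the content of the lemma.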

Theorem 7. If g is regular, then g is infinite-language invariant.

Proof. Let

E_{ℒ}

be finitely generated by constraints expressible in

ℒ

_K. Let

ℒ

⊆

ℒ

¹ ⊆

ℒ

². By Theorem 6 we obtain minloss

B_{ℒ^{1}} = maxent E_{ℒ^{1}} = {P_{Ω}^{†^{1}}}

and minloss

B_{ℒ^{2}} = maxent E_{ℒ^{2}} = {P_{Ω}^{†^{2}}}

, where

P_{Ω}^{†^{1}}

and

P_{Ω}^{†^{2}}

are the standard entropy limits on

ℒ

¹, respectively,

ℒ

².

Let K₂ ∈ ℕ be minimal such that

ℒ_{K} \subseteq ℒ_{K_{2}}^{2}

, i.e., the set of the first K₂ constant symbols of

ℒ

² contains the constant symbols {t₁, …, t_K} of

ℒ

. It suffices to show that for all n ≥ K₂ and all

ν \in Ω_{n}^{1}

it holds that

P_{Ω}^{†^{1}} (ν) = P_{Ω}^{†^{2}} (ν)

, where

Ω_{n}^{1}

is the set of n-states of

ℒ

¹. Note that the constants in t₁, …, t_K are in

ℒ_{K_{2}}^{1}

.

Since the standard entropy limit is finite-language invariant (Section 4.2.1) it follows for n ≥ K₂ that

P_{Ω}^{†^{1}} (ν) = P_{Ω n}^{†^{1}} (ν)

, where

{P_{Ω n}^{†^{1}}} = \arg \sup_{P \in E_{n}^{1}} S_{Ω}^{n} (P)

, and

{P_{Ω}^{†^{2}}} (ν) = {P_{Ω n}^{†^{2}}} (ν)

, where

{P_{Ω n}^{†^{2}}} = \arg \sup_{P \in E_{n}^{2}} S_{Ω}^{n} (P)

.

We now obtain from Lemma 9 and Proposition 5 that

P_{Ω_{n}}^{†^{i}} (ν) = P_{Ω}^{†} (ω_{ν}) \frac{1}{2^{| ν | - | ω_{ν} |}}

where ω_ν is the unique maximal state of

ℒ

such that ν ⊨ ω_ν. Thus,

P_{Ω}^{†^{1}} (ν) = P_{Ω}^{†^{2}} (ν)

. □

So, neither adding new redundant names for individuals in the domain to

ℒ

nor adding relation symbols which are not constrained by the agent’s evidence on

ℒ

changes one’s rational beliefs in the sentences φ ∈ S

ℒ

.

Language invariance is an important desideratum for reasoning under uncertainty. We have seen that focussing on regular weighting functions ensures language invariance. We conjecture that, if one imposes the desiderata that g be atomic, inclusive, symmetric, refined and infinite-language invariant, then the standard entropy maximiser will be the belief function with the best loss profile. If this is the case then our results for regular weighting functions, which are strongly refined, are symptomatic of a more general phenomenon.

5. Handling Quantifiers

Thus far, we have shown that, on a language

ℒ

^∄ without quantifiers, if the evidence is finitely generated and the weighting function is regular, then the belief function that has the best loss profile is the probability function in

[E_{ℒ}]

that maximises standard entropy. This provides a justification for all the norms of objective Bayesianism on a language without quantifiers.

As we shall see in Section 5.1, that the language is quantifier free was key here: on a language

ℒ

^∃ with quantifiers, the n-scores become infinite, which makes the comparison of loss profiles impossible. That the evidence is finitely generated is also key: we shall see in Section 6.1 that the minimax result need not hold true if the evidence is not finitely generated.

While the use of scoring rules cannot be readily adapted to a quantified language

ℒ

^∃, we shall see in Section 5.2 that we can nevertheless justify the norms of objective Bayesianism on

ℒ

^∃ if we extend our notion of loss profile and add two further desiderata motivated by the application of objective Bayesianism to inductive logic: that inferences should be language invariant, and that, ceteris paribus, universal hypotheses should be afforded substantial credence.

5.1. Limits to the Minimax Approach

Here we explain why the minimax analysis adopted in Section 4 cannot be applied to the case of a language with quantifier symbols. The problem is that n-score becomes infinite, making it impossible to compare the scores of different belief functions.

There are two ways in which n-score becomes infinite. The first is through a failure of super-regularity. A probability function is super-regular, if it gives every contingent sentence positive probability. Now, many probability functions that seem eminently rational are not super-regular. For example, if one has no evidence,

E_{ℒ} = ℙ_{ℒ}

, then it is plausible that one is rationally entitled (even if not rationally compelled) to adopt the equivocator function P=, which gives each n-state the same probability, as one’s belief function. However, this probability function will give zero probability to a universally quantified sentence such as ∀xUx. More generally, if evidence is finitely generated then no inclusive, symmetric entropy maximiser will be super-regular:

Proposition 18. Let

E_{ℒ}

be finitely generated and let g be symmetric and inclusive. If the sequence

{(P_{n}^{†})}_{n \in ℕ}

has a point of accumulation Q ∈ ℙ_ℒ, then Q is not super-regular.

Proof. Let U be a relation symbol in

ℒ

of arity r, say. For all n ∈ ℕ let

φ^{n} : = \underset{_{ω ⊨ \land_{i = 1}^{n} U t_{i}}^{ω \in Ω_{n}}}{\lor} ω,

where t_i denotes the tuple of r repetitions of t_i.

If

P_{K}^{†} (φ^{K}) = 0

, then by the open-mindedness of entropy maximisers

P_{n}^{†} (φ^{K}) = 0

for all n ≥ K. Thus, for all points of accumulation Q ∈ ℙ_ℒ it holds that Q(φ^K) = 0. Hence, Q is not super-regular.

If

P_{K}^{†} (φ^{K}) > 0

, then we apply Proposition 9 to find that for all l ≥ n

\begin{array}{l} P_{l}^{†} (φ^{n}) = P_{l}^{†} (φ^{K}) \frac{| Ω_{K}^{U} |}{| Ω_{n}^{U} |} \\ \leq P_{l}^{†} (φ^{K}) 2^{K - n} \\ \leq 2^{K - n}, \end{array}

Let Q be a point of accumulation of

{(P_{n}^{†})}_{n \in ℕ}

and let

{(P_{n}^{†})}_{n_{j}}

be a subsequence which converges to Q. Since K is fixed we now find

\begin{array}{l} 0 \leq Q (\forall x U x) \\ \overset{P3}{=} \lim_{j \to \infty} Q (\land_{i = 1}^{n_{j}} U t_{i}) \\ = \lim_{j \to \infty} \lim_{m \to \infty} P_{n_{m}}^{†} (\land_{i = 1}^{n_{j}} U t_{i}) \\ = \lim_{j \to \infty} \lim_{m \to \infty} P_{n_{m}}^{†} (φ^{n_{j}}) \\ \leq \lim_{j \to \infty} 2^{K - n_{j}} \\ = 0. \end{array}

Q is not super-regular. □

Now, a failure of super-regularity is not normally problematic—it is simply a well accepted fact that probability theory forces probability 0 (respectively 1) on many sentences which might be true (respectively false). For example, the strong law of large numbers and the various zero-one laws force extreme probabilities. Moreover, the issue of super-regularity did not arise on

ℒ

^∄, where no contingent sentences are given probability 0 by the entropy maximisers considered above. However, a problem does emerge if we try to apply the scoring rule approach to

ℒ

^∃, where super-regularity becomes pertinent. If θ is possible yet is given zero belief by belief function B then the logarithmic loss, −log B(θ), is infinite if θ turns out to be true. Hence, as long as some epistemically possible physical probability function gives positive probability to θ, belief function B will have infinite score. When scores become infinite, they cannot be readily used to compare belief functions. It is clear, for example, that some non-super-regular belief functions will have better loss profiles than others, but this will not be apparent if we define loss profiles in terms of scores. This problem appears to limit the scope of scoring rules to languages without quantifiers.

One might suggest here that the fact that non-super-regular functions lead to infinite scores merely serves to show that one should adopt a super-regular function as one’s belief function. However, there are good grounds for questioning such a conclusion. In particular, consider again the case of a total absence of evidence. As mentioned above, imposing super-regularity rules out the equivocation function P= as a viable belief function. This means that any super-regular function must, in the total absence of evidence, force a skewed distribution on the n-states, for some n. Thus, one is forced to believe some states to a greater degree than others, despite the fact that one has no evidence to distinguish any such state from any other. So super-regularity leads to very counter-intuitive consequences and the infinite score problem suggests that the scoring rule approach breaks down on languages with quantifiers.

There is a second way in which the scores become infinite when quantifiers are admitted into the language. When one admits quantifiers into the language, one introduces the possibility of infinite partitions (Example 1) and it is natural, when defining a scoring rule on such a language, to consider scores on these infinite partitions. If a weighting function is inclusive then for any sentence

θ \in S ℒ^{\exists}

, some partition containing θ will be given positive weight. If it is refined, then any partition that refines this partition will be given positive weight, including any infinite partition which refines this partition. The problem is that, even in the total absence of evidence, every belief function has infinite worst-case expected loss over such a partition:

Proposition 19. If there exists a partition

π_{\infty} \in Π_{ℒ}

consisting of infinitely many sentences such that g(π_∞) > 0, then for all

B \in B_{ℒ}

it holds that

\sup_{P \in ℙ_{ℒ}} - \sum_{φ \in π_{\infty}} g (π_{\infty}) P (φ) \log B (φ) = + \infty .

Proof. Let π_∞ = {φ₁, φ₂, … }. Let

B \in B_{ℒ}

be arbitrary but fixed.

If there exists a φ ∈ π_∞ such that B(φ) = 0, then any

P \in ℙ_{ℒ}

with P (φ) > 0 satisfies

\sum_{φ \in π} - g (π_{\infty}) P (φ) \log B (φ) = + \infty

.

Now assume that B(φ_n) > 0 for all n ∈ ℕ.

Since

B \in B_{ℒ}

it holds that

\sum_{φ \in π_{\infty}} B (φ) \leq 1

. Thus, there has to exist an infinite set ℕ_B ⊆ ℕ \ {1} such that n ∈ ℕ_B implies

0 < B (φ_{n}) < \frac{1}{n} \leq \frac{1}{2}

. Let

{n_{1}^{B}, n_{2}^{B}, \dots}

be an enumeration of ℕ_B. Let

{m_{2}^{B}, m_{3}^{B}, \dots}

be an enumeration of an infinite subset of ℕ_B such that

0 < B (φ_{m_{k}^{B}}) \leq \frac{1}{e^{(k^{2})}} < 1

and

m_{k}^{B} < m_{k + 1}^{B}

for all k ∈ ℕ \ {1}. Since the

n_{k}^{B}

tend to infinity, such a sequence

{(m_{k}^{B})}_{k \in ℕ \ {1}}

has to exist.

Recall that

\sum_{n \in ℕ} \frac{1}{n^{2}} = \frac{π^{2}}{6}

. Let

P \in ℙ_{ℒ}

be such that for k ≥ 2 it holds that

\begin{array}{l} P (φ_{m_{k}^{B}}) : = \frac{6}{π^{2}} \cdot \frac{1}{k^{2}} \\ P (φ_{1}) : = 1 - \sum_{k = 2}^{\infty} P (φ_{m_{k}^{B}}) = \frac{6}{π^{2}} \\ P (φ_{n}) : = 0 \text{ for all } n \in ℕ \ {1, m_{2}^{B}, m_{3}^{B}, \dots} . \end{array}

We now explain why such a probability function

P \in ℙ_{ℒ}

exists.

The idea is to define a measure which assigns the set of term structures which are a model of

φ_{m_{k}^{B}}

the value

\frac{6}{π^{2}} \frac{1}{k^{2}}

and assigns value zero to all other term structures which do not model any of the

φ_{m_{k}^{B}}

. The probability of an arbitrary sentence

χ \in S ℒ

is then the measure assigned to the set of term structures in which χ holds. One has to be careful about how to set up this measure. Fortunately, the recipe for doing so is well-known.

We follow [7] (p. 164) and define a term structure

ℳ

of

ℒ

as a structure with domain {t_n : n ∈ ℕ} and each constant symbol t_n of

ℒ

is interpreted in

ℳ

as itself. We use T

ℒ

to denote the set of term structures of

ℒ

.

Now let

P (T ℒ)

denote the power set of

T ℒ

and put

\begin{matrix} T (θ) : = {ℳ \in T ℒ : ℳ | = θ} \\ R : = {T (θ) : θ \in S ℒ^{∄}} \subseteq P (T ℒ) . \end{matrix}

For a quantified sentence θ = ∃xθ(x) let T(θ) := ∪_i_∈ℕT(θ(t_i)), similarly for the universal quantifier ∀.

Now let μ* be any (finitely additive and normalised to one) outer measure on

P (T ℒ)

such that

μ^{*} (T (φ_{m_{k}^{B}})) = \frac{6}{π^{2}} \frac{1}{k^{2}}

. Particularly simple such outer measures μ* are measures which for all m_k assign a single particular term structure

ℳ

in which

φ_{m_{k}^{B}}

holds the value

\frac{6}{π^{2}} \frac{1}{k^{2}}

.

Next, define R^∞ to be the smallest subset of

P (T ℒ)

which contains R and is closed under complements and countable unions. We now define a countably additive measure μ^∞ on R^∞ as follows: μ^∞ : R^∞ → [0, 1] such that μ^∞(A) = μ*(A) for all A ∈ R^∞.

Letting P(θ) := μ^∞(T(θ)) defines a probability function as shown in [7] (pp. 168–171). Furthermore, by construction

μ * (φ_{i}) = \frac{6}{i^{2} π^{2}} = P (φ_{i})

.

Having demonstrated the existence of the required probability function P, we now show that, for this function P, B incurs an infinite loss. Intuitively, P(φ_n) can be obtained from the sequence

{(\frac{1}{k^{2}})}_{k \in ℕ}

by inserting zeros and normalising by multiplying with

\frac{6}{π^{2}}

. The idea behind this definition is to ensure that for all k ∈ ℕ there exists a unique n ∈ ℕ_B such that

P (φ_{n}) = \frac{6}{π^{2}} \cdot \frac{1}{k^{2}}

. Furthermore, for these n ∈ ℕ_B it holds that

B (φ_{n}) \leq \frac{1}{e^{(k^{2})}}

. For all other n > 1 we ensure that P(φ_n) vanishes; P(φ₁) is defined such that Σ_{φ∈π} P(φ) = 1 holds.

So, when P(φ_n) > 0 and

n \in {m_{2}^{B}, m_{3}^{B}, \dots,}

we have

\begin{array}{l} - P (φ_{n}) \log B (φ_{n}) \geq \frac{6}{π^{2}} \frac{1}{k^{2}} \log e^{(k^{2})} \\ = \frac{6}{π^{2}} \frac{1}{k^{2}} k^{2} \log e \\ = \frac{6}{π^{2}} . \end{array}

Finally, we obtain

\begin{array}{l} - \sum_{φ \in π} g (π_{\infty}) P (φ) \log B (φ) \geq g (π_{\infty}) \sum_{m_{2}^{B}, m_{3}^{B}, \dots} \frac{6}{π^{2}} \\ = + \infty . \end{array}

In particular, even the super-regular belief functions have infinite score on any such partition, so one cannot say that any super-regular function has lower overall score than a non-super-regular function. This result, then, casts further doubt on the suggestion that it might be preferable to adopt a super-regular function as one’s belief function. Moreover, it clearly suggests that an attempt to extend the minimax approach, which is based on scoring rules, to languages with quantifiers will be fraught with difficulty.
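The divergence argument above can be checked numerically. The following is a minimal sketch, not part of the original proof: it assumes the extreme admissible case B(φ_n) = e^{−k²} (the text only states the bound B(φ_n) ≤ e^{−k²}), it drops the constant factor g(π_∞), and all function names are hypothetical. Under these assumptions every selected sentence contributes exactly 6/π² to the loss, so the partial sums grow linearly and without bound, while the weights themselves sum to one.

```python
import math

BASEL = 6.0 / math.pi ** 2  # normalising constant 6/pi^2


def weight(k: int) -> float:
    """Probability (6/pi^2) * 1/k^2 attached to the k-th selected sentence."""
    return BASEL / k ** 2


def log_belief_bound(k: int) -> float:
    """log of the assumed belief e^(-k^2); kept in log form to avoid underflow."""
    return -float(k ** 2)


def partial_loss(K: int) -> float:
    """Sum of -P(phi) * log B(phi) over k = 2, ..., K (factor g omitted)."""
    return sum(-weight(k) * log_belief_bound(k) for k in range(2, K + 1))


def total_mass(K: int) -> float:
    """Sum of the weights for k = 1, ..., K; k = 1 plays the role of P(phi_1)."""
    return sum(weight(k) for k in range(1, K + 1))
```

Calling `partial_loss(K)` for increasing `K` gives values close to (K − 1) · 6/π², which illustrates why the score cannot be finite.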

5.2. The Probability Norm

We have argued that there is little scope for straightforwardly extending the minimax analysis to languages with quantifiers because scores quickly become infinite and thus incomparable. So we need another approach if we are to show that the probability axioms P1-P3, as well as the Calibration and Equivocation norms, apply to languages with quantifiers.

Our plan of attack is as follows. First, as noted in Section 4.5, language invariance is an important desideratum. In particular, one would not want one’s degrees of belief on the sentences of a quantifier-free language

ℒ^{∄}

to change if one were to introduce quantifiers into the language. That is, if evidence determines that one should adopt B₁ as one’s belief function on

ℒ^{∄}

and B₂ as one’s belief function on

ℒ^{\exists}

, where both languages contain the same individuals and relation symbols, then one would want B₁ and B₂ to agree on quantifier-free sentences of

ℒ

, i.e., one would want that B₁(θ) = B₂(θ) for each

θ \in S ℒ^{∄}

.

Thus far, we have argued that a belief function on

ℒ^{∄}

, given finitely generated

E

, ought to satisfy the axioms of probability P1 and P2 on

ℒ^{∄}

, as well as the Calibration and Equivocation norms. Given the language invariance desideratum, this implies that the appropriate belief function on

ℒ^{\exists}

, should, when restricted to quantifier-free sentences, satisfy P1, P2 and the Calibration and Equivocation norms. If we can show that the probability axioms P1-3 should also be satisfied on the language

ℒ^{\exists}

as a whole, then degrees of belief in the quantified sentences are uniquely determined by those on the quantifier-free sentences [7] (Theorem 11.2): there is no further role that Calibration or Equivocation can play on the quantified sentences. Thus it suffices to argue for the probability axioms on

ℒ^{\exists}

. As usual, we restrict attention to evidence sets that are finitely generated in the sense of Definition 5, i.e.,

E_{ℒ}

generated by constraints involving sentences of some

ℒ_{K}^{∄}

and regular weighting functions g.

In Theorem 4 we showed that the default loss incurred by adopting belief function B when φ is true is such that L(φ, B) = − log B(φ), modulo some multiplicative constant. This penalises smaller degrees of belief more than larger degrees of belief. As discussed above, there is little scope for using this to measure the overall expected loss incurred by B on

ℒ^{\exists}

, and so we cannot directly extend the notion of loss profile developed in Definition 21 to

ℒ^{\exists}

. However, this default loss function does suggest the following constraint:

(*) Suppose that for all

θ \in S ℒ^{\exists}

, B(θ) ≥ B′(θ), and there is some

φ \in S ℒ^{\exists}

such that B(φ) > B′(φ). Then B has a better loss profile than B′.

In other words, if the default loss incurred by B′ dominates that incurred by B then B has a better loss profile than B′. We can use (*) to extend our notion of loss profile: the two conditions in Definition 21 apply to quantifier-free sentences in

ℒ^{\exists}

, and we add the further condition (*) to constrain the quantified sentences. We shall show that the addition of (*) goes some way towards demonstrating P1-3 on

ℒ^{\exists}

, although we shall have to add a further desideratum in order to complete the derivation.

Definition 24 (Better loss profile on

ℒ^{\exists}

). B has a better loss profile on

ℒ^{\exists}

than B′ if and only if:

B ≺ B′ (as defined in Definition 21), or
B dominates B′ on $ℒ^{\exists}$ and there exists some $φ \in S ℒ^{\exists}$ such that B(φ) > B′(φ).

We write B ≺* B′ to denote that B has a better loss profile on

ℒ^{\exists}

than B′. Clearly, ≺* is asymmetric. We will be interested in those belief functions on

ℒ^{\exists}

that have the best loss profile on

ℒ^{\exists}

, i.e., the minimal elements of ≺*, and define:

minloss * B_{ℒ} : = {B \in B_{ℒ} : \text{there is no } B^{'} \in B_{ℒ} \text{ such that } B^{'} ≺ * B} .

(14)

Note that if B dominates B′ on

ℒ^{\exists}

, then B ≺ B′ cannot hold. ≺ and ≺* are thus consistent.
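The second clause of Definition 24 is a plain dominance test, which can be sketched on a finite sample of sentences. This is an illustration under stated assumptions only: genuine belief functions are defined on all of Sℒ^∃, whereas the hypothetical `Belief` dictionary below covers a finite set of sentence labels, and the first clause (B ≺ B′ from Definition 21) is not modelled.

```python
from typing import Dict

Belief = Dict[str, float]  # sentence label -> degree of belief (finite sample only)


def dominates_strictly(b: Belief, b_prime: Belief) -> bool:
    """Second clause of Definition 24, restricted to the sampled sentences."""
    if b.keys() != b_prime.keys():
        raise ValueError("both belief functions must be sampled on the same sentences")
    weakly = all(b[s] >= b_prime[s] for s in b)       # B(theta) >= B'(theta) everywhere
    strictly = any(b[s] > b_prime[s] for s in b)      # strict for at least one sentence
    return weakly and strictly


b = {"exists x Ux": 0.6, "forall x Ux": 0.2}
b_prime = {"exists x Ux": 0.5, "forall x Ux": 0.2}
```

Here `dominates_strictly(b, b_prime)` holds while `dominates_strictly(b_prime, b)` does not, in line with the asymmetry of ≺*.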

Proposition 20. All B ∈ minloss*

B_{ℒ}

agree with

P_{Ω}^{†}

on

ℒ^{∄}

.

Proof. Since we assume that g is regular and that

E_{ℒ}

is finitely generated we can apply Theorem 6 to obtain that all B ∈ minloss

B_{ℒ}

agree with

P_{Ω}^{†}

on

ℒ^{∄}

.

The claim now follows, since B ≺ B′ implies B ≺* B′. □

Proposition 21. If minloss

B_{ℒ} = \emptyset

, then minloss*

B_{ℒ} = \emptyset

.

Proof. By Proposition 10, ≺ is asymmetric, irreflexive and transitive, and thus free of cycles. Hence, for all fixed

B^{'} \in B_{ℒ}

there exists some

B \in B_{ℒ}

such that B ≺ B′. This implies B ≺* B′.

Hence, for all

B^{'} \in B_{ℒ}

there exists some

B \in B_{ℒ}

such that B ≺* B′. We obtain minloss*

B_{ℒ} = \emptyset

. □

We shall use

B_{†} \in B_{ℒ}

to denote an arbitrary but fixed belief function in minloss*

B_{ℒ}

. A priori, it is not clear that such a function B_† exists.

The rest of this section does not depend on

E_{ℒ}

, the weighting function g, or the particular probability function that the B ∈ minloss

B_{ℒ}

agree with on

ℒ^{∄}

. All that matters is that there exists some probability function

P \in ℙ_{ℒ}

that the B ∈ minloss

B_{ℒ}

agree with on

ℒ^{∄}

. As we know, this is the case if

E_{ℒ}

is finitely generated and g is regular.

Definition 25. A sentence

φ \in S ℒ^{\exists}

is called contingent, if and only if φ and ¬φ are satisfiable.

Lemma 10. For all θ,

φ \in S ℒ^{\exists}

such that θ |= φ it holds that B_†(φ) ≥ B_†(θ). In particular, B_†(ψ) = 0 for all contradictions

ψ \in S ℒ^{\exists}

and B_†(χ) = 1 for all tautologies

χ \in S ℒ^{\exists}

.

For θ,

φ \in S ℒ^{∄}

with θ |= φ we have already seen that B_†(φ) ≥ B_†(θ); this followed from B_† satisfying P1 and P2 on

ℒ^{∄}

.

Proof. Case 1. θ is a contradiction.

For a tautology

τ \in S ℒ^{∄}

, {τ, θ} is a partition. Since B_†(τ) = 1 and B_†(τ) + B_†(θ) ≤ 1 it follows that B_†(θ) = 0. Hence, B_†(φ) ≥ 0 = B_†(θ).

Case 2. θ is a tautology.

Let

χ \in S ℒ^{\exists}

be a contradiction. We just proved that B_†(χ) = 0. The only constraints applying to B_†(θ) are of the form B_†(θ) + B_†(χ) ≤ 1 where χ is a contradiction and of the form B_†(θ) ≤ 1. Thus, the only meaningful constraint on B_†(θ) is B_†(θ) ≤ 1. By (*) we have B_†(θ) = 1.

Since θ implies φ, φ has to be a tautology, too. Hence, B_†(φ) = 1 = B_†(θ).

Case 3. θ is contingent.

If φ is a tautology, then B_†(φ) = 1 by the above and we are done.

Note that φ cannot be a contradiction since θ is satisfiable.

Assume from now on that φ is contingent.

Case 3A |= θ ↔ φ.

For all index sets I and all sentences

φ_{i} \in S ℒ^{\exists}

the following are equivalent

${φ} \cup \cup_{i \in I} {φ_{i}} ϵ \prod_{ℒ}$ ,
${θ} \cup \cup_{i \in I} {φ_{i}} ϵ \prod_{ℒ}$

(*) implies that B_†(φ) = B_†(θ).

Case 3B θ, φ and φ ∧ ¬θ are contingent.

Let I be any countable index set and let

φ_{i} \in S ℒ^{\exists}

for i ∈ I be contingent such that

{φ} \cup \underset{i \in I}{\cup} {φ_{i}} ϵ \prod_{ℒ} .

Then by the consistency of θ and φ ∧ ¬θ

{θ \land φ} \cup {φ \land \neg θ} \cup \underset{i \in I}{\cup} {φ_{i}} ϵ \prod_{ℒ} .

And since θ |= φ

{θ} \cup {φ \land \neg θ} \cup \underset{i \in I}{\cup} {φ_{i}} ϵ \prod_{ℒ} .

From normalisation (Definition 1) we now obtain

B_{†} (φ) + \sum_{i \in I} B_{†} (φ_{i}) \leq 1

(15)

B_{†} (θ) + B_{†} (φ \land \neg θ) + \sum_{i \in I} B_{†} (φ_{i}) \leq 1.

(16)

Note that the inequalities of the form (15) are the only constraints which constrain B_†(φ). In particular, B_†(φ) = B_†(θ) will not violate any constraint in (15).

The question arises whether B_†(φ) = B_†(θ) imposes any further constraints.

B_†(φ) only imposes constraints on the B_†(φ_i) for i ∈ I. Let i ∈ I be fixed and let J be an index set and

{(ψ_{j})}_{j \in J} \in S ℒ^{\exists}

be such that

{φ_{i}} \cup {φ} \cup \cup_{j \in J} {ψ_{j}} \in \prod_{ℒ}

. Then

{φ_{i}} \cup {θ} \cup {φ \land \neg θ} \cup \cup_{j \in J} {ψ_{j}} \in \prod_{ℒ}

. Thus, B_†(φ) = B_†(θ) does not impose any further constraint on B_†(φ_i) which is not already imposed by B_†(θ).

By (*) we now find B_†(θ) ≤ B_†(φ). □

Corollary 7. B_† respects logical equivalence on

ℒ^{\exists}

.

Proof. If φ,

θ \in S ℒ^{\exists}

are logically equivalent, then B_†(φ) ≤ B_†(θ) ≤ B_†(φ) and thus B_†(φ) = B_†(θ). □

Corollary 8. For all

\exists x θ (x) ϵ S ℒ^{\exists}

it holds that

\lim_{n \to \infty} B_{†} (\lor_{i = 1}^{n} θ (t_{i})) \leq B_{†} (\exists x θ (x)) .

Proof. First note that

\lor_{i = 1}^{n} (θ (t_{i}))

implies

\lor_{i = 1}^{n + 1} (θ (t_{i}))

. Thus,

B_{†} (\lor_{i = 1}^{n} (θ (t_{i})))

is a (not necessarily strictly) increasing sequence in [0, 1] which has a limit. Finally, note that for all

n \in ℕ \lor_{i = 1}^{n} (θ (t_{i}))

implies ∃xθ(x). Hence, B_†(∃xθ(x)) has to be greater than or equal to the limit. □

Corollary 9 (Superadditivity of B_† on

ℒ^{\exists}

). If |= ¬ (θ ∧ φ), then B_†(θ) + B_†(φ) ≤ B_†(θ ∨ φ).

Proof. If either θ or φ is a contradiction or a tautology, then the Corollary follows trivially.

If θ ∨ φ is a tautology, then the corollary follows trivially, too.

It remains to consider the case of contingent θ ∨ φ. By the above we may assume that θ and φ are contingent. Let I be any countable index set and let

φ_{i} \in S ℒ^{\exists}

for i ∈ I be satisfiable such that

{θ} \cup {φ} \cup \underset{i \in I}{\cup} {φ_{i}} \in \prod_{ℒ} .

Then,

{θ \lor φ} \cup \underset{i \in I}{\cup} {φ_{i}} \in \prod_{ℒ} .

From normalisation (Definition 1) we now obtain

\begin{array}{r} B_{†} (θ) + B_{†} (φ) + \sum_{i \in I} B_{†} (φ_{i}) \leq 1 \\ B_{†} (θ \lor φ) + \sum_{i \in I} B_{†} (φ_{i}) \leq 1. \end{array}

The same reasoning as in Lemma 10 about constraints now yields: B_†(θ) + B_†(φ) ≤ B_†(θ ∨ φ). □

Lemma 11. For all

θ \in S ℒ^{\exists}

it holds that B_†(θ) + B_†(¬θ) = 1.

In particular, this means that

B_{†} (\exists x θ (x)) + B_{†} (\forall x \neg θ (x)) = 1 for all \exists x θ (x) ϵ S ℒ^{\exists}

.

Proof. If θ is not contingent, then the lemma holds trivially.

Now assume that θ is contingent and B_†(θ) + B_†(¬θ) < 1.

Case 1 There exist contingent

{(φ_{i})}_{i \in I}, {(ψ_{j})}_{j \in J} \in S ℒ^{\exists}

such that

\begin{array}{r} {θ} \cup \underset{i \in I}{\cup} {φ_{i}} \in \prod_{ℒ} \\ {\neg θ} \cup \underset{j \in J}{\cup} {ψ_{j}} ϵ \prod_{ℒ} \end{array}

with

\begin{array}{r} B_{†} (θ) + \sum_{i \in I} B_{†} (φ_{i}) = 1 \\ B_{†} (\neg θ) + \sum_{j \in J} B_{†} (ψ_{j}) = 1. \end{array}

Note that

\cup_{i \in I} {φ_{i}} \cup \cup_{j \in J} {ψ_{j}} ϵ \prod_{ℒ}

and thus

\sum_{i \in I} B_{†} (φ_{i}) + \sum_{j \in J} B_{†} (ψ_{j}) \leq 1

. Adding the above equations we now obtain

\begin{array}{l} 2 = B_{†} (θ) + \sum_{i \in I} B_{†} (φ_{i}) + B_{†} (\neg θ) + \sum_{j \in J} B_{†} (ψ_{j}) \\ \leq B_{†} (θ) + B_{†} (\neg θ) + 1. \end{array}

B_†(θ) + B_†(¬θ) ≥ 1 follows. Contradiction.

Case 2 For all

π \in \prod_{ℒ}

with θ ∈ π and all

π^{'} \in \prod_{ℒ}

with

\neg θ \in π^{'}

it holds that

\sum_{φ \in π} B_{†} (φ) < 1

and

\sum_{ψ \in π^{'}} B_{†} (ψ) < 1

.

Applying (*) we obtain a contradiction since B_†(θ) or B_†(¬θ) could have been set to a greater number.

Case 3 For all

π \in \prod_{ℒ}

with θ ∈ π it holds that

\sum_{ψ \in π} B_{†} (ψ) < 1

and there exists a partition

π^{'} \in \prod_{ℒ}

with

\neg θ \in π^{'}

such that ∑_{φ∈π′} B_†(φ) = 1.

Let π′ consist of contingent (φ_i)_{i∈I} and ¬θ. For

π \in \prod_{ℒ}

with θ ∈ π we have for all finite J ⊆ I that

\underset{j \in J}{\cup} {φ_{j}} \cup {θ \land \neg \underset{j \in J}{\lor} φ_{j}} \cup {ψ \in π : ψ \neq θ} ϵ \prod_{ℒ} .

In the same manner as in the proof of Lemma 10 it follows that B_†(θ) ≥ ∑_{j∈J} B_†(φ_j). Since this holds for all finite J ⊆ I and I can be at most countable, it follows that B_†(θ) ≥ ∑_{i∈I} B_†(φ_i).

From B_†(¬θ) + ∑_{i∈I} B_†(φ_i) = ∑_{φ∈π′} B_†(φ) = 1 the required contradiction follows:

\begin{array}{l} B_{†} (θ) + B_{†} (\neg θ) \geq \sum_{i \in I} B_{†} (φ_{i}) + B_{†} (\neg θ) \\ = 1. \end{array}

□

(*) is not strong enough to uniquely determine B_† on

ℒ^{\exists}

. We invoke the following further desideratum to pin down B_†: ceteris paribus, prefer belief function B to belief function B′ if B gives greater degree of belief to some universally quantified sentence than does B′. One has to be a bit careful about how one formulates such a principle, in order to specify it in such a way that it can be applied consistently. One can appeal to the concept of prenex normal form in order to formulate this desideratum:

(∀*) Suppose that neither of B, B′ has a better loss profile on

ℒ^{\exists}

than the other. Furthermore, suppose there exists a minimal quantifier rank q such that the following hold: For all

φ \in S ℒ^{\exists}

in prenex normal form with a quantifier rank of q−1 or less it holds that B(φ) = B′(φ) and for all universally quantified

θ \in S ℒ^{\exists}

in prenex normal form of quantifier rank q it holds that B(θ) ≥ B′(θ) and the inequality is strict at least once. Then B is to be preferred to B′.

The motivation behind (∀*) is not in terms of loss. Rather, the motivation stems from the application to inductive logic (see Section 3.3). The use of probability in inductive logic has been roundly criticised for tending to give non-tautological universal laws probability zero, when such laws are widely, and seemingly rationally, believed in science and beyond; see, e.g., Popper [17] (Appendix *vii). Thus there seems good reason to prefer, ceteris paribus, those probability functions which give more credence to universal hypotheses. (There is a flip-side to (∀*). The more credence one gives to a universal statement ∀xθ(x), the less credence one must give to ∃x¬θ(x). One might motivate the latter policy by appeal to Ockham’s Razor, which demands scepticism with respect to the existence of entities, particularly new kinds of entity.)

This leaves us with some desiderata that stem from considerations to do with loss, namely the criteria that make up Definition 21—appealing to dominance of loss, dominance of expected loss, and worst-case expected loss—and some desiderata that stem from the application to inductive logic, namely language invariance and (∀*). These desiderata taken together are enough to justify the norms of objective Bayesianism on

ℒ^{\exists}

, as we shall proceed to show in the remainder of this section.

We shall see first that (∀*) is responsible for ensuring that the degree of belief B(∀xθ(x)), which is already constrained to

[0, \inf_{n \in ℕ} B (\land_{i = 1}^{n} θ (t_{i}))]

, is equal to the upper bound. On the other hand, B(∃xθ(x)) comes out to be

\sup_{n \in ℕ} B (\lor_{i = 1}^{n} θ (t_{i}))

. An arbitrary belief function B_† ∈ minloss*

B_{ℒ}

which is also optimal according to (∀*) will be denoted by

B_{†}^{\forall}

.

Proposition 22. For all universally quantified sentences

\forall x θ (x) \in S ℒ^{\exists}

it holds that

B_{†}^{\forall} (\forall x θ (x)) = \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} θ (t_{i}))

.

Proof. First note that

\forall x θ (x) | = \land_{i = 1}^{n} θ (t_{i})

for all n ∈ ℕ and we thus obtain from Lemma 10 that

B_{†}^{\forall} (\forall x θ (x)) \leq \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} θ (t_{i}))

.

We now prove by an argument on quantifier ranks that

B_{†}^{\forall} (\forall x θ (x)) = \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} θ (t_{i})) .

Assume for contradiction that there exists a minimal quantifier rank q ≥ 1 and a sentence ∀xψ(x) in prenex normal form of quantifier rank q such that

B_{†}^{\forall} (\forall x ψ (x)) < \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} ψ (t_{i}))

.

We now define a function B′ which will be preferred to

B_{†}^{\forall}

which contradicts our standing assumption that no function is preferred to

B_{†}^{\forall}

. Let

B^{'} (χ) : = B_{†}^{\forall} (χ)

for all sentences

χ \in S ℒ^{\exists}

which are in prenex normal form and have a quantifier rank of q − 1 or less. In particular,

B_{†}^{\forall}

and B′ agree on

ℒ^{∄}

.

For all

φ (x) ϵ ℒ^{\exists}

in prenex normal form of quantifier rank q − 1 we let

B^{'} (\forall x φ (x)) : = \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} φ (t_{i}))

and

B^{'} (\exists x \neg φ (x)) : = \lim_{n \to \infty} B_{†}^{\forall} (\lor_{i = 1}^{n} \neg φ (t_{i})) .

Now arbitrarily extend B′ to a function in

B_{ℒ}

.

Note that

B^{'} (\forall x ψ (x)) > B_{†}^{\forall} (\forall x ψ (x))

and

B^{'} (\exists x \neg ψ (x)) < B_{†}^{\forall} (\exists x \neg ψ (x))

. So, (*) does not discriminate between

B_{†}^{\forall}

and B′. Hence,

B_{†}^{\forall}

and B′ are equally preferable according to ≺*.

B_{†}^{\forall}

and B′ agree on all sentences in prenex normal form of quantifier rank q−1. Since

B_{†}^{\forall} (\forall x φ (x)) \leq \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} φ (t_{i}))

has to hold for all

φ (x) ϵ ℒ^{\exists}

it follows that for φ(x) in prenex normal form of quantifier rank q − 1 that

B_{†}^{\forall} (\forall x φ (x)) \leq B^{'} (\forall x φ (x))

and for φ = ψ the inequality is strict. (∀*) now implies that B′ is preferred to

B_{†}^{\forall}

.

Finally, every sentence of the form ∀xθ(x) is logically equivalent to a universally quantified sentence ∀xφ(x) in prenex normal form. Note that θ(t) is logically equivalent to φ(t) for all constants t. Hence,

\begin{array}{l} B_{†}^{\forall} (\forall x θ (x)) = B_{†}^{\forall} (\forall x φ (x)) \\ = \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} φ (t_{i})) \\ = \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} θ (t_{i})) . \end{array}

□

Proposition 23.

B_{†}^{\forall}

satisfies the axiom P3.

Proof. Applying Lemma 11, Proposition 22 and applying Lemma 11 a second time we find

\begin{array}{l} B_{†}^{\forall} (\exists x θ (x)) = 1 - B_{†}^{\forall} (\forall x \neg θ (x)) \\ = 1 - \lim_{n \to \infty} B_{†}^{\forall} (\land_{i = 1}^{n} \neg θ (t_{i})) \\ = 1 - \lim_{n \to \infty} (1 - B_{†}^{\forall} (\lor_{i = 1}^{n} θ (t_{i}))) \\ = \lim_{n \to \infty} B_{†}^{\forall} (\lor_{i = 1}^{n} θ (t_{i})) . \end{array}

□
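The duality used in this proof can be illustrated with a concrete probability function. The sketch below is an assumption-laden example and not taken from the paper: it supposes a single unary predicate U, treats the atoms U t_i as independent, and picks P(U t_i) = 1/(i+1)² so that the limits have closed forms. Under these assumptions P(∧_{i≤n} ¬U t_i) = (n+2)/(2(n+1)), which tends to 1/2, so the universal sentence ∀x¬Ux receives the non-zero limit value 1/2 and ∃xUx receives 1 − 1/2.

```python
import math


def p_atom(i: int) -> float:
    """Assumed probability of U t_i, chosen as 1/(i+1)^2 for illustration."""
    return 1.0 / (i + 1) ** 2


def prob_all_negated(n: int) -> float:
    """P(not U t_1 and ... and not U t_n) under the assumed independence."""
    return math.prod(1.0 - p_atom(i) for i in range(1, n + 1))


def prob_some(n: int) -> float:
    """P(U t_1 or ... or U t_n), obtained by complementation."""
    return 1.0 - prob_all_negated(n)
```

The sequence `prob_some(n)` is increasing in `n`, matching the monotonicity noted in Corollary 8, and its limit is the value P3 assigns to the existential sentence.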

The following might be of interest outside the context of this paper since it generalises Gaifman’s Theorem, [5] (Theorem 1).

Proposition 24. If

f : S ℒ^{\exists} \to [0, 1]

satisfies

f(θ) = 1 for all tautologies $θ \in S ℒ^{∄}$ – [P1 on $ℒ^{∄}$],
for all mutually exclusive θ, $φ \in S ℒ^{∄}$ it holds that $f (θ \lor φ) = f (θ) + f (φ)$ – [P2 on $ℒ^{∄}$],
$f (\exists x θ (x)) = \sup_{m} f (\lor_{i = 1}^{m} θ (t_{i}))$ for all $\exists x θ (x) \in S ℒ^{\exists}$ – [P3], and
f respects logical equivalence on $ℒ^{\exists}$ – [P4],

then f is a probability function, i.e.,

f \in ℙ_{ℒ}

.

Clearly, P1 on

ℒ^{∄}

and P4 jointly imply P1.

Proof. First note that f agrees with some probability function on the quantifier free sentences of

ℒ

. By Gaifman’s Theorem, this probability function is unique on

ℒ^{\exists}

; it shall be denoted by P_f.

We now show that f = P_f. We need to show that for all

φ \in S ℒ^{\exists}

that f(φ) = P_f(φ).

First, write φ in prenex normal form, φ_pre. Note that f(φ) = f(φ_pre).

Next, we do a proof by induction on the quantifier-block rank of φ_pre to show that f(φ_pre) = P_f(φ_pre). The quantifier-block rank of φ_pre is the number of alternating quantifier blocks in φ_pre.

Base case φ_pre is of quantifier block rank zero, i.e., φ_pre does not contain quantifiers. Then

\begin{array}{l} f (φ) = f (φ_{pre}) \\ = P_{f} (φ_{pre}) \\ = P_{f} (φ), \end{array}

where the second equation holds since f and P_f agree on all sentences of

ℒ^{∄}

. The first and the last equation hold since f and P_f respect logical equivalence on

ℒ^{∄}

. This fact will be used without further mention.

Inductive step φ_pre is of quantifier block rank q ≥ 1.

Let us first suppose that

φ_{pre} = \exists \bar{x} χ (\bar{x}) .

For q ≥ 2 the first symbol of χ is a universal quantifier, ∀; for q = 1, the first symbol of χ is a relation symbol, a negation symbol or an opening bracket. We find for q = 1

\begin{array}{l} f (φ) = f (φ_{pre}) \\ = f (\exists \bar{x} χ (\bar{x})) \\ \overset{P3}{=} \lim_{n_{1} \to \infty} ... \lim_{n_{k} \to \infty} f (\lor_{i_{1} = 1}^{n_{1}} ... \lor_{i_{k} = 1}^{n_{k}} χ (t_{i_{1}}, ..., t_{i_{k}})) \\ = \lim_{n_{1} \to \infty} ... \lim_{n_{k} \to \infty} P_{f} (\lor_{i_{1} = 1}^{n_{1}} ... \lor_{i_{k} = 1}^{n_{k}} χ (t_{i_{1}}, ..., t_{i_{k}})) \\ \overset{P3}{=} P_{f} (\exists \bar{x} χ (\bar{x})) \\ = P_{f} (φ_{pre}) \\ = P_{f} (φ), \end{array}

where we may substitute P_f for f since χ is quantifier-free and we can thus apply the induction hypothesis.

For q ≥ 2

φ_{pre} = \exists {\bar{x}}_{1} \forall {\bar{x}}_{2} ... Q {\bar{x}}_{q} χ ({\bar{x}}_{1}, {\bar{x}}_{2}, \dots)

, where Q = ∃ for odd q and Q = ∀ for even q.

First, here is an example of two logically equivalent sentences:

⊨ (\lor_{i = 1}^{2} \forall x_{1} \exists x_{2} U x_{1} x_{2} t_{i}) \leftrightarrow (\forall y_{1}^{1} \forall y_{2}^{1} \exists y_{1}^{2} \exists y_{2}^{2} \lor_{i = 1}^{2} U y_{i}^{1} y_{i}^{2} t_{i}) .

Note that the quantifier block rank of the sentence on the right of “↔” is two. The quantifier block rank has been kept low at the price of larger blocks of quantifiers. Since we are giving a proof by induction on the quantifier block rank, we do not have to worry about paying this price. To denote the larger blocks we will use

\bar{y}

. In general, the greater the number of variables on the left of an

{\bar{x}}_{i}

, the greater the number of variables in

{\bar{y}}_{i}

.

Now let us compute

\begin{array}{l} f (φ) = f (φ_{pre}) \\ = f (\exists {\bar{x}}_{1} \forall {\bar{x}}_{2} ... Q {\bar{x}}_{q} χ ({\bar{x}}_{1}, {\bar{x}}_{2}, ..., {\bar{x}}_{q})) \\ \overset{P3}{=} \lim_{n_{1} \to \infty} ... \lim_{n_{k} \to \infty} f (\lor_{i_{1} = 1}^{n_{1}} ... \lor_{i_{k} = 1}^{n_{k}} \forall {\bar{x}}_{2} ... Q {\bar{x}}_{q} χ (t_{i_{1}}, ..., t_{i_{k}}, {\bar{x}}_{2}, ..., {\bar{x}}_{q})) \\ = \lim_{n_{1} \to \infty} ... \lim_{n_{k} \to \infty} f (\forall {\bar{y}}_{2} ... Q {\bar{y}}_{q} \lor_{i_{1} = 1}^{n_{1}} ... \lor_{i_{k} = 1}^{n_{k}} χ (t_{i_{1}}, ..., t_{i_{k}}, {\bar{y}}_{2}, ..., {\bar{y}}_{q})) \\ \overset{IH}{=} \lim_{n_{1} \to \infty} ... \lim_{n_{k} \to \infty} P_{f} (\forall {\bar{y}}_{2} ... Q {\bar{y}}_{q} \lor_{i_{1} = 1}^{n_{1}} ... \lor_{i_{k} = 1}^{n_{k}} χ (t_{i_{1}}, ..., t_{i_{k}}, {\bar{y}}_{2}, ..., {\bar{y}}_{q})) \\ = \lim_{n_{1} \to \infty} ... \lim_{n_{k} \to \infty} P_{f} (\lor_{i_{1} = 1}^{n_{1}} ... \lor_{i_{k} = 1}^{n_{k}} \forall {\bar{x}}_{2} ... Q {\bar{x}}_{q} χ (t_{i_{1}}, ..., t_{i_{k}}, {\bar{x}}_{2}, ..., {\bar{x}}_{q})) \\ \overset{P3}{=} P_{f} (\exists {\bar{x}}_{1} \forall {\bar{x}}_{2} ... Q {\bar{x}}_{q} χ ({\bar{x}}_{1}, {\bar{x}}_{2}, ..., {\bar{x}}_{q})) \\ = P_{f} (φ_{pre}) \\ = P_{f} (φ) . \end{array}

“IH” indicates that we used the induction hypothesis on a sentence of quantifier rank q − 1.

The case of

φ_{pre}

= ∀xχ(x) is analogous: simply replace the disjunctions by conjunctions. □

Theorem 8. If

E_{ℒ}

is finitely generated and g is regular, then

{B_{†}^{\forall}} = maxent E_{ℒ} = {P_{Ω}^{†}} .

Proof. By Proposition 24 we only need to convince ourselves that

B_{†}^{\forall}

satisfies P1 on

ℒ^{∄}

, P2 on

ℒ^{∄}

, P3 and P4 in order to conclude that

B_{†}^{\forall} \in ℙ_{ℒ}

. Note that we have done so in Theorem 6, Proposition 23 and Corollary 7. So all

B_{†}^{\forall}

are probability functions.

All

B_{†}^{\forall}

agree on

ℒ^{∄}

with

P_{Ω}^{†}

. Two different probability functions have to disagree on a quantifier-free sentence (Gaifman’s theorem). Hence,

B_{†}^{\forall}

is unique and equal to

P_{Ω}^{†}

.

We should point out that (∀*) was only used in Proposition 23. We showed that (*) alone is enough to force that

B_{†}

satisfies P1, P2 on

ℒ^{∄}

,

B_{†} (\exists x θ (x)) \geq \lim_{n \to \infty} B_{†} (\lor_{i = 1}^{n} θ (t_{i}))

and P4.

In sum, then, by invoking two new considerations, (*) and (∀*), one can show that the Probability norm must hold on a predicate language with quantifiers. Since the Calibration and Equivocation norms are already forced on the quantifier-free sentences, and probabilities on these quantifier-free sentences determine those of the quantified sentences, all the norms of objective Bayesianism hold on

ℒ^{\exists}

, assuming that the weighting function is regular and the evidence is finitely generated.

6. More Complex Evidence

The question arises as to which functions have an optimal loss profile when

E_{ℒ}

is not finitely generated. In Section 6.2 we shall present a tractable case and show that in that example the function with maximal standard entropy has the best loss profile. First, in Section 6.1, we shall see that not all examples admit of such an analysis. In particular, we shall analyse an example in some depth in which

{P^{†}} = maxent E_{ℒ} but P^{†} \notin minloss B_{ℒ}

. Thus, when evidence is not finitely generated, the optimal loss profile may not be achievable by maximising entropy.

6.1. When Losses Cannot Be Minimised

We shall now develop an example in which the minimax theorem fails:

P_{Ω}^{†} \notin minloss B_{ℒ},

as we shall see in Proposition 27. However, the entropy identity,

ℙ^{†} = {P_{Ω}^{†}} = maxent E_{ℒ}

, does hold (Proposition 25 and Proposition 26). The connection with optimal loss fails to obtain since minloss

B_{ℒ} = \emptyset

(Proposition 30). Thus, there is no belief function with an optimal loss profile in this sort of example. Nevertheless, certain equivocal functions

{\bar{P}}_{N}^{†}

derived from the maximal entropy function come arbitrarily close to having the best loss profile (Proposition 29 and Proposition 31). So, while there is no unique function with the best loss profile, the functions

{\bar{P}}_{N}^{†}

have a very good loss profile.

In the following discussion we shall focus on the simplest possible language,

ℒ = ℒ^{U}

, which contains only one relation symbol, U, which is unary. We focus on this simple language since the minimax results already fail here and considering more expressive languages does not lead to new insights while creating more notational issues. As a technical convenience, we extend the notion of a loss profile to arbitrary functions

f : S ℒ \to [0, 1]

, not merely normalised belief functions.

The example that we shall consider is generated by the following evidence:

E = {\neg U_{1} t_{i} \to \neg U_{1} t_{1} : i = 1, 2, \dots} .

Let

ω_{k}^{n} \in Ω_{n}

be the k-th n-state of

ℒ = ℒ^{U}

, i.e.,

ω_{1}^{n} : = \land_{l = 1}^{n} \neg U t_{l}, ω_{2}^{n} : = \land_{l = 1}^{n - 1} \neg U t_{l} \land U t_{n}, ..., ω_{2^{n}}^{n} : = \land_{l = 1}^{n} U t_{l}

. The set of calibrated probability functions can be characterized in various ways:

\begin{array}{l} E_{ℒ} = {P \in ℙ_{ℒ} : P (\neg U_{1} t_{i} \to \neg U_{1} t_{1}) = 1, i = 1, 2, \dots} \\ = {P \in ℙ_{ℒ} : P (ω_{1}^{n + 1}) = P (ω_{1}^{n}) for all n \geq 1} \\ = {P \in ℙ_{ℒ} : P (ω_{i}^{n}) = 0 for 2 \leq i \leq 2^{n - 1}, n = 1, 2, ...} \\ = {P \in ℙ_{ℒ} : P (ω_{2}^{n}) = 0 for all n \geq 2} \\ = {P \in ℙ_{ℒ} : P (\neg U t_{1} \land U t_{n}) = 0 for all n \geq 2} \\ = {P \in ℙ_{ℒ} : P (ω_{1}^{n + 1} | ω_{1}^{n}) = 1 for all n \in ℕ} \\ = {P \in ℙ_{ℒ} : P (ω_{2}^{n + 1} | ω_{1}^{n}) = 0 for all n \in ℕ} \\ = {P \in ℙ_{ℒ} : P (\forall x (U t_{1} \lor \neg U x)) = 1} \\ = {P \in ℙ_{ℒ} : P (\exists x (\neg U t_{1} \land U x)) = 0} . \end{array}

The last two characterisations employ quantifiers; adding quantifiers to the language enables a finite representation of what is essentially an infinitely generated evidence set. Hence in Definition 5, we specified that an evidence set is finitely generated just if it is generated by quantifier-free sentences of some finite sublanguage.
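The effect of the evidence on n-states can be made concrete by enumeration. The sketch below is illustrative and uses only the characterisation P(¬U t₁ ∧ U t_n) = 0 for all n ≥ 2 from the list above; the function name and the tuple encoding of n-states are hypothetical. It keeps exactly those n-states that are not forced to probability zero: the state in which every U t_i fails, and the 2ⁿ⁻¹ states in which U t₁ holds.

```python
from itertools import product
from typing import List, Tuple


def admissible_states(n: int) -> List[Tuple[bool, ...]]:
    """n-states (truth values of U t_1, ..., U t_n) not excluded by the evidence."""
    states = []
    for state in product([False, True], repeat=n):
        # Excluded: U t_1 is false while some U t_i with i >= 2 is true.
        if not state[0] and any(state[1:]):
            continue
        states.append(state)
    return states
```

For each n this leaves 2ⁿ⁻¹ + 1 admissible n-states, which is the denominator that appears for the n-level entropy maximiser in the proof of Proposition 25.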

We now begin our analysis of this example:

Proposition 25. If g = g_Ω or if g is symmetric and inclusive, then

ℙ^{†} = {P_{Ω}^{†}}

and

P_{Ω}^{†}

is not open-minded.

Proof. For all

n \in ℕ

E_{n} = {P \in ℙ_{ℒ_{n}^{∄}} : P (ω_{i}^{n}) = 0 for all 2 \leq i \leq 2^{n - 1}} .

Then, by Landes and Williamson [4] (Corollary 6, p. 3574) for symmetric and inclusive g

P_{n}^{†} (ω_{1}^{n}) = P_{n}^{†} (ω_{i}^{n}) = \frac{1}{2^{n - 1} + 1} for all 2^{n - 1} + 1 \leq i \leq 2^{n}

and so for all

n \in ℕ

and all 1 ≤ i ≤ 2ⁿ⁻¹

\begin{array}{l} \lim_{n \to \infty} P_{n}^{†} (ω_{1}^{n}) = \lim_{n \to \infty} P_{n}^{†} (ω_{1}^{1}) \\ = \lim_{n \to \infty} \frac{1}{2^{n - 1} + 1} \\ = 0. \end{array}

For all

n \in ℕ

and i ∈ {1, 2ⁿ⁻¹+1,…,2ⁿ}

P^{†} (ω_{1}^{n}) = \frac{1}{2^{n - 1}} .

The result for g = g_Ω follows in the same way as above. □

We shall note for later reference that for all n ≥ 2

H_{Ω}^{n} (P_{n}^{†}) = - \log \frac{1}{2^{n - 1} + 1} > - \log \frac{1}{2^{n - 1}} = H_{Ω}^{n} (P_{Ω}^{†}) .

Proposition 26. If g = g_Ω or if g is regular, then

maxent E_{ℒ} = {P_{Ω}^{†}} .

Proof. First note that

[E_{ℒ}] = E_{ℒ}

.

We shall show that for all

Q \in E_{ℒ} \ {P_{Ω}^{†}}

there exists an

N \in ℕ

such that for all n ≥ N we have

H_{Ω}^{n} (P_{Ω}^{†}) > H_{Ω}^{n} (Q)

and

H_{g}^{n} (P_{Ω}^{†}) > H_{g}^{n} (Q)

.

Since

Q \neq P_{Ω}^{†}

there exists a minimal

k \in ℕ

and a k-state ν ∈ Ω_k such that Q(ν) > P_Ω^†(ν) ≥ 0.

Case 1

ν = ω_{1}^{k}

.

To simplify notation let

α : = P_{k}^{†} (ν) = P_{k}^{†} (ω_{1}^{n}) > 0 for all n \geq 1

Let us now define a function

Q^{'} \in E_{ℒ} \ {P_{Ω}^{†}}

. Note that since we want

Q^{'}

to be a member of

E_{ℒ}

we need to let

Q^{'} (ω_{1}^{n}) : = Q^{'} (ω_{1}^{1}) for all n \in ℕ

. Now let for all

n \in ℕ

\begin{array}{l} Q^{'} (ω_{1}^{n}) : = α > 0 \\ Q^{'} (ω_{i}^{n}) : = 0 for 2 \leq i \leq 2^{n - 1} \\ Q^{'} (ω_{i}^{n}) : = \frac{1 - α}{2^{n - 1}} for 2^{n - 1} + 1 \leq i \leq 2^{n} \end{array}

The restriction operator

_{⇂ n}

applied to some belief function B continues to refer to the restriction of B to

ℒ_{n}^{∄}

, rather than to the restriction to

ℒ_{n}

.

Note that for all n ≥ 1

{Q^{'}_{⇂ n}} = \arg \sup_{\underset{P (ω_{1}^{n}) = α}{P \in E_{n}}} H_{g}^{n} (P)

since entropy maximisers assign n-states the same degree of belief whenever possible [4] (Corollary 7, p. 3577). Thus,

H_{g}^{n} (Q^{'}) \geq H_{g}^{n} (Q) for all n \in ℕ . Also, H_{Ω}^{n} (Q^{'}) \geq H_{Ω}^{n} (Q) for all n \in ℕ

.

Let us compute for n ≥ k

\begin{array}{l} H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q^{'}) = - \log (\frac{1}{2^{n - 1}}) - [- α \log (α) - (1 - α) \log (\frac{1 - α}{2^{n - 1}})] \\ = \log (2^{n - 1}) + α \log (α) + (1 - α) (\log (1 - α) - \log (2^{n - 1})) \\ = \log (2^{n - 1}) - (1 - α) \log (2^{n - 1}) + α \log (α) + (1 - α) \log (1 - α) \\ = α (n - 1) \log (2) + α \log (α) + (1 - α) \log (1 - α) . \end{array}

It follows for all large enough n ∈ ℕ that

H_{Ω}^{n} (P_{Ω}^{†}) > H_{Ω}^{n} (Q^{'}) \geq H_{Ω}^{n} (Q)

.
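The closed form derived above can be checked numerically. The following is a minimal sketch, not part of the original proof: it assumes the unary setting of this example (2^n n-states), takes 0 < α < 1, and uses the hypothetical helper names `entropy`, `p_omega`, `q_prime`, `closed_form` and `first_positive_n`. It computes the n-state entropies of P_Ω^† and Q′ directly and compares their difference with α(n − 1) log 2 + α log α + (1 − α) log(1 − α).

```python
import math

def entropy(probs):
    """Standard (n-state) entropy, with the convention 0 * log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def p_omega(n):
    """P_Omega^dagger on n-states: 0 on the first 2^(n-1) states, uniform on the rest."""
    half = 2 ** (n - 1)
    return [0.0] * half + [1.0 / half] * half

def q_prime(n, alpha):
    """Q' on n-states: alpha on the first state, 0 on states 2..2^(n-1), uniform on the rest."""
    half = 2 ** (n - 1)
    return [alpha] + [0.0] * (half - 1) + [(1 - alpha) / half] * half

def closed_form(n, alpha):
    """Closed form of H(P_Omega^dagger) - H(Q') from the derivation in the text."""
    return (alpha * (n - 1) * math.log(2)
            + alpha * math.log(alpha)
            + (1 - alpha) * math.log(1 - alpha))

def first_positive_n(alpha):
    """Smallest n at which the entropy difference becomes strictly positive."""
    n = 1
    while closed_form(n, alpha) <= 0:
        n += 1
    return n
```

Because the first term grows linearly in n while the remaining terms are constant, `first_positive_n` terminates for every α strictly between 0 and 1, which is the "large enough n" of the argument.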

For regular g we now find

\begin{array}{l} H_{g}^{n} (P_{Ω}^{†}) - H_{g}^{n} (Q^{'}) = g (π^{n}) \cdot [α (n - 1) \log (2) + α \log (α) + (1 - α) \log (1 - α)] \\ - \sum_{π \in Π_{n} \ {π^{n}}} g (π) \sum_{F \in π} [° P_{Ω}^{†} (F) \log (° P_{Ω}^{†} (F)) - ° Q^{'} (F) \log (° Q^{'} (F))] . \end{array}

So, as long as

\sum_{π \in Π_{n} \ {π^{n}}} g (π)

goes to zero quickly enough it follows that

H_{g}^{n} (P_{Ω}^{†}) > H_{g}^{n} (Q^{'}) \geq H_{g}^{n} (Q)

for large enough n. Corollary 6 shows that this is indeed the case for regular g.

Case 2

ν \in {ω_{2}^{k}, \dots, ω_{2^{k - 1}}^{k}}

Since Q is assumed to be calibrated,

Q \in E_{ℒ}

, this case cannot occur.

Case 3

ν \in {ω_{2^{k - 1} + 1}^{k}, \dots, ω_{2^{k}}^{k}}

.

Case 3A

Q (ω_{1}^{k}) = 0

.

Then

Q (ω_{1}^{1}) = 0.

But for all n ∈ ℕ

\arg \sup_{\underset{P (ω_{1}^{1}) = 0}{P \in E_{n}}} H_{Ω}^{n} (P) = {P_{Ω ⇂ n}^{†}} = \arg \sup_{\underset{P (ω_{1}^{1}) = 0}{P \in E_{n}}} H_{g}^{n} (P)

Since Q ≠ P_Ω^† it follows that there exists some N ∈ ℕ such that

Q_{⇂ n} \neq P_{Ω ⇂ n}^{†} for all n \geq N

. But then

H_{g}^{n} (P_{Ω}^{†}) > H_{g}^{n} (Q) and H_{Ω}^{n} (P_{Ω}^{†}) > H_{Ω}^{n} (Q) for all n \geq N

.

Case 3B

Q (ω_{1}^{k}) > 0

.

Then

Q (ω_{1}^{k}) > 0 = P_{Ω}^{†} (ω_{1}^{k})

. Proceed as in Case 1. □

Proposition 27. If g = g_Ω or if g is regular, then

P_{Ω}^{†} \notin {minloss B}_{ℒ}

.

Proof. We here show that there exists an

R \in E_{ℒ}

such that for all n ∈ ℕ it holds that

S_{g}^{n} (R, P^{†}) = S_{Ω}^{n} (R, P^{†}) = \infty

and that there exists an open-minded

Q \in E_{ℒ}

such that for all n ∈ ℕ we have

\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) < \infty

.

Consider the probability function

R \in ℙ_{ℒ}

with

R (ω_{1}^{n}) : = 1

. Then

S_{g}^{n} (R, P_{Ω}^{†}) = S_{Ω}^{n} (R, P_{Ω}^{†}) = \infty for all n \in ℕ

.

We shall now construct an open-minded

Q \in E_{ℒ}

as advertised. For all n ∈ ℕ let

\begin{array}{l} Q (ω_{1}^{n}) : = \frac{1}{2} \\ Q (ω_{i}^{n}) : = 0 for all 2 \leq i \leq 2^{n - 1} \\ Q (ω_{i}^{n}) : = \frac{1}{2^{n}} for all 2^{n - 1} + 1 \leq i \leq 2^{n} . \end{array}

Thus, Q is open-minded and hence

\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) \leq \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q) < + \infty

for all n∈ ℕ. □

Note that Condition 1 of Definition 21 is solely responsible for the fact that

P_{Ω}^{†} \notin {minloss B}_{ℒ}

. Condition 2 has played no role here.

So far, we have established that

P^{†} = P_{Ω}^{†}

does not have the best loss profile. The question arises whether there exists a belief function

B \in B_{ℒ}

which is a minimal element of ≺, i.e.,

B \in {minloss B}_{ℒ}

.

Proposition 28. If g = g_Ω, then

{minloss B}_{ℒ} = \emptyset = minloss ℙ_{ℒ} = {minloss E}_{ℒ}

Initially, one might suspect that

{minloss B}_{ℒ} = \emptyset

would be somehow due to the fact that the

S_{Ω}^{n}

do not take beliefs in all sentences into account. This is not the case. As we will see,

minloss ℙ_{ℒ} = \emptyset = {minloss E}_{ℒ}

holds. That is, even when restricting attention to probability functions, whose values on the n-states completely determine degrees of belief in all other sentences, we cannot find a function with an optimal loss profile.

Proof. Suppose for contradiction that

Q \in minloss ℙ_{ℒ} \ {P_{Ω}^{†}}

.

If Q is not open-minded, then there exists an N ∈ ℕ, an F ⊆ Ω_N and an

P \in E_{ℒ}

such that °P(F) > 0 and °Q(F) = 0. But then there has to exist some ω ∈ Ω_N with ω ∈ F such that P(ω) > 0 = Q(ω), since Q and P are probability functions. Thus, for all n ≥ N there exists some ν ∈ Ω_n such that ν ⊨ ω with P(ν) > 0 = Q(ν). But then

S_{Ω}^{n} (P, Q) = + \infty

for all n ≥ N.

In the proof of Proposition 27 we constructed an open-minded function Q⁺ ∈ E_ℒ. For Q⁺ we have for all n that

\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q^{+}) < + \infty

. So, any

Q \in minloss ℙ_{ℒ}

has to be open-minded.

Case 1

Q \in minloss ℙ_{ℒ} and Q \notin E_{ℒ}

.

Since

Q \in ℙ_{ℒ} \ E_{ℒ}

there has to exist a minimal k ≥ 2 such that

Q (ω_{2}^{k}) > 0

.

We next define a probability function

Q^{'} \in E_{ℒ}

with the following construction for all n ≥ 2

\begin{array}{l} Q^{'} (ω_{i}^{l}) : = Q (ω_{i}^{l}) for all 1 \leq l \leq k - 1 and all i \\ Q^{'} (ω_{1}^{n}) : = Q (ω_{1}^{k}) for all n \in ℕ \\ Q^{'} (ω_{i}^{n}) : = 0 for all n \geq k and all 2 \leq i \leq 2^{n - 1} \\ Q^{'} (ω_{i}^{n}) : = Q (ω_{i}^{n}) for all 2^{n - 1} + 1 \leq i \leq 2^{n} . \end{array}

It follows that for all n ≥ k and all

ω \in Ω_{n} \ {ω_{2}^{n}, \dots, ω_{2^{n - 1}}^{n}}

and all

P \in E_{n}

such that P(ω) > 0 it holds that

Q (ω) \leq Q^{'} (ω) .

For all large enough n ∈ ℕ we then find

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) \geq - \log \min_{2^{n - 1} + 1 \leq i \leq 2^{n}} {Q (ω_{i}^{n})} \\ = - \log \min_{2^{n - 1} + 1 \leq i \leq 2^{n}} {Q^{'} (ω_{i}^{n})} \\ = \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q^{'}) . \end{array}

Hence, there has to exist a

Q^{'} \in E_{ℒ} \cap minloss ℙ_{ℒ} with Q^{'} \neq P_{Ω}^{†}

.

Case 2

Q \in minloss ℙ_{ℒ} and Q \in E_{ℒ} \ {P_{Ω}^{†}}

.

Thus,

0 < Q (ω_{1}^{1}) = Q (ω_{1}^{n}) for all n \geq 2.

Let N ≥ 3 be such that

Q (ω_{1}^{N}) > \min {Q (ω_{2^{N - 1} + 1}^{N}), ..., Q (ω_{2^{N}}^{N})}

For n ≥ N let

Ω_{n}^{-} : = \arg \min {Q (ω_{2^{n - 1} + 1}^{n}), ..., Q (ω_{2^{n}}^{n})} \subset Ω_{n}

We now find for all fixed n ≥ N that

\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) = - \log Q (ω_{-}^{n}) for all ω_{-}^{n} \in Ω_{n}^{-} .

We shall now define a function

R \in E_{ℒ} \ {P_{Ω}^{†}}

by letting for all n ≥ 2:

\begin{array}{l} R (ω_{1}^{n}) : = \frac{Q (ω_{1}^{n})}{2} = \frac{Q (ω_{1}^{1})}{2} \\ R (ω_{i}^{n}) : = 0 for all 2 \leq i \leq 2^{n - 1} \\ R (ω_{i}^{n}) : = Q (ω_{i}^{n}) + \frac{Q (ω_{1}^{1})}{2} \frac{2}{| Ω_{n} |} > Q (ω_{i}^{n}) for all 2^{n - 1} + 1 \leq i \leq 2^{n} \end{array}

That is,

R = \frac{Q + P_{Ω}^{†}}{2}

.

For large enough M ∈ ℕ it holds for all n ≥ M that

R (ω_{1}^{n}) > \min {R (ω_{2^{n - 1} + 1}^{n}), ..., R (ω_{2^{n}}^{n})}

Furthermore, for all n ≥ max{M, N} it holds that

\arg \min {Q (ω_{2^{n - 1} + 1}^{n}), \dots, Q (ω_{2^{n}}^{n})} = \arg \min {R (ω_{2^{n - 1} + 1}^{n}), \dots, R (ω_{2^{n}}^{n})}

and hence for all large enough fixed n ∈ ℕ and all

ω_{-}^{n} \in Ω_{n}^{-}

\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, R) = - \log R (ω_{-}^{n}) < - \log Q (ω_{-}^{n}) = \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) .

Thus, R has a better loss profile than Q. Hence,

Q \notin minloss ℙ_{ℒ} and Q \notin {minloss E}_{ℒ}

.

Finally, let us consider loss profiles for

B \in B_{ℒ} \ ℙ_{ℒ}

.

Case 3

B \in {minloss B}_{ℒ} and B \notin ℙ_{ℒ}

.

For all

P \in ℙ_{ℒ}

, the expression

S_{Ω}^{n} (P, B)

only depends on the degrees of belief B assigns to sentences which represent an n-state. So, the degree of belief in a sentence φ ∈ Sℒ which does not n-represent an n-state is ignored by

S_{Ω}^{n} (P, B)

for all n and all

P \in ℙ_{ℒ}

. If B agrees with some probability function

P \in ℙ_{ℒ}

on all sentences of S

ℒ

^∄ which n-represent an n-state, then B and P are equally preferable according to ≺. As we saw above, for all

P \in ℙ_{ℒ}

there exists some

Q \in ℙ_{ℒ}

with Q ≺ P. Thus, B cannot be a minimal element of ≺.

We can hence assume that for all

P \in ℙ_{ℒ}

there exists some sentence

φ \in S ℒ_{n}^{∄}

which n-represents an n-state such that B(φ) ≠ P(φ). Since no

P \in ℙ_{ℒ}

is dominated, it follows that B(φ) < P(φ).

First define a function B₀ as follows:

\begin{array}{l} B_{0} (φ) : = \inf_{\underset{ρ ω = φ}{ρ \in ϱ_{n}}} B (ρ ω), if such an n \in ℕ and such a ρ \in ϱ_{n} exist, \\ B_{0} (φ) : = 0 otherwise . \end{array}

B₀, which does not agree with any probability function on

ℒ^{∄}

has been constructed in such a way that B and B₀ are equally preferred according to ≺.

Next define a function B⁺ by first letting for all fixed N ∈ ℕ

B^{+} (φ) : = \sup_{n \geq N} \sum_{\begin{matrix} v \in Ω_{n} \\ v ⊨ φ \end{matrix}} B_{0} (v)

for all sentences

φ \in S ℒ_{N}^{\exists}

which are logically equivalent to an N-state. Put B⁺(ψ) := 0 for all other

ψ \in S ℒ^{\exists}

.

Since B⁺ dominates B₀ the loss profile of B⁺ cannot be worse than that of B₀. Furthermore, note that for all N ∈ ℕ, all ω ∈ Ω_N and all n > N it holds that

B^{+} (ω) \geq \sum_{\begin{matrix} v \in Ω_{n} \\ v ⊨ ω \end{matrix}} B^{+} (v) .

Let

α : = \lim_{n \to \infty} \sum_{ω \in Ω_{n}} B^{+} (ω)

. For α = 0 it follows by the usual reasoning that B⁺ cannot have an ideal loss profile. This leads to a contradiction in the usual way.

For 1 ≥ α > 0 define a function B^∞ by first letting for all sentences

φ \in S ℒ^{\exists}

which are logically equivalent to some n-state ω

B^{\infty} (φ) : = \frac{1}{α} \lim_{n \to \infty} \sum_{\begin{matrix} v \in Ω_{n} \\ v ⊨ φ \end{matrix}} B^{+} (v) .

For all other sentences

φ \in S ℒ^{\exists}

let B^∞(φ) := 0.

Observe that for all k ∈ ℕ and all ω ∈ Ω_k

\begin{array}{l} B^{\infty} (ω) = \frac{1}{α} \lim_{n \to \infty} \sum_{\begin{matrix} v \in Ω_{n} \\ v ⊨ ω \end{matrix}} B^{+} (v) \\ = \frac{1}{α} \lim_{n \to \infty} \sum_{\begin{matrix} λ \in Ω_{k + 1} \\ λ ⊨ ω \end{matrix}} \sum_{\begin{matrix} v \in Ω_{n} \\ v ⊨ λ \end{matrix}} B^{+} (v) \\ = \sum_{\begin{matrix} λ \in Ω_{k + 1} \\ λ ⊨ ω \end{matrix}} B^{\infty} (λ) . \end{array}

Finally, we note that B^∞ agrees with some

P \in ℙ_{ℒ}

on all sentences in

S ℒ^{\exists}

which represent a state. Then B cannot have a better loss profile than P. As we saw in Case 1 and Case 2, for all

P \in ℙ_{ℒ}

there exists a

Q \in ℙ_{ℒ}

which has a strictly better loss profile than P. This contradicts B ∈ minloss

B_{ℒ}

. □

Denote by

{\bar{P}}_{N}^{†}

the unique probability function in

E_{ℒ}

satisfying for all n ∈ ℕ

\begin{array}{l} {\bar{P}}_{N}^{†} (ω_{1}^{n}) = P_{N}^{†} (ω_{1}^{n}) = P_{N}^{†} (ω_{1}^{1}) = \frac{1}{2^{N - 1} + 1} \\ {\bar{P}}_{N}^{†} (ω_{i}^{n}) = 0 for all 2 \leq i \leq 2^{n - 1} \\ {\bar{P}}_{N}^{†} (ω_{i}^{n}) = (1 - \frac{1}{2^{N - 1} + 1}) \cdot \frac{2}{| Ω_{n} |} = \frac{1}{\frac{| Ω_{N} |}{2} + 1} \frac{| Ω_{N} |}{| Ω_{n} |} for all 2^{n - 1} + 1 \leq i \leq 2^{n} . \end{array}

That is,

{\bar{P}}_{N}^{†}

agrees with

P_{N}^{†}

on

ℒ_{N}

and equivocates beyond

ℒ_{N}

as much as possible while satisfying

{\bar{P}}_{N}^{†} \in E_{ℒ}

Proposition 29. For all ϵ > 0 there exists an N ∈

ℕ

such that for all n ≥ N

\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, {\bar{P}}_{N}^{†}) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{n}^{†}) \leq ϵ .

Proof. For all large enough N ∈

ℕ

and even larger n ∈

ℕ

we find

\begin{array}{l} 0 \leq \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, {\bar{P}}_{N}^{†}) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{n}^{†}) \\ = - \log (\frac{\frac{| Ω_{N} |}{2}}{\frac{| Ω_{N} |}{2} + 1} \frac{2}{| Ω_{n} |}) + \log (\frac{1}{\frac{2^{n}}{2} + 1}) \\ = - \log (\frac{2^{N - 1}}{2^{N - 1} + 1}) + \log (\frac{2^{n}}{2}) + \log (\frac{1}{2^{n - 1} + 1}) \\ = - \log (\frac{2^{N - 1}}{2^{N - 1} + 1}) + \log (\frac{2^{n - 1}}{2^{n - 1} + 1}) . \end{array}

For ϵ > 0 let N > 2 be such that

0 < - \log \frac{2^{N - 1}}{2^{N - 1} + 1} < ϵ

. Then for all n ≥ N it holds that

0 > \log \frac{2^{n - 1}}{2^{n - 1} + 1} > \log \frac{2^{N - 1}}{2^{N - 1} + 1}

. For n ≥ N large enough we now obtain

\begin{array}{l} 0 \leq \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, {\bar{P}}_{N}^{†}) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{n}^{†}) \\ = - \log (\frac{2^{N - 1}}{2^{N - 1} + 1}) + \log (\frac{2^{n - 1}}{2^{n - 1} + 1}) \\ < ϵ . \end{array}

□
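The bound in this proof is easy to explore numerically. The sketch below is only an illustration under the closed form derived above: `worst_case_gap` evaluates −log(2^{N−1}/(2^{N−1}+1)) + log(2^{n−1}/(2^{n−1}+1)), and `smallest_N` searches for the first N > 2 with −log(2^{N−1}/(2^{N−1}+1)) < ϵ. Both function names are hypothetical and do not appear in the paper.

```python
import math

def worst_case_gap(N, n):
    """Difference of worst-case losses, using the closed form from the proof."""
    a = 2 ** (N - 1)
    b = 2 ** (n - 1)
    return -math.log(a / (a + 1)) + math.log(b / (b + 1))

def smallest_N(eps):
    """First N > 2 such that -log(2^(N-1) / (2^(N-1) + 1)) < eps."""
    N = 3
    while -math.log(2 ** (N - 1) / (2 ** (N - 1) + 1)) >= eps:
        N += 1
    return N

if __name__ == "__main__":
    eps = 0.01
    N = smallest_N(eps)
    for n in (N, N + 1, N + 5, N + 20):
        print(N, n, worst_case_gap(N, n))
```

For n = N the gap vanishes, and as n grows it increases towards −log(2^{N−1}/(2^{N−1}+1)), which is the quantity that N was chosen to push below ϵ.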

Having considered loss for

g = g_{Ω}

we now investigate loss for regular

g

.

Proposition 30. If

g

is regular, then minloss

B_{ℒ} = \emptyset

.

Proof. We will show that ≺ has no minimal element. Suppose for contradiction that B ∈

B_{ℒ}

is such a minimal element.

Define a function

B^{'} : S ℒ \to [0, 1]

by

\begin{array}{l} B' (φ) : = 0 for all φ \in S ℒ for which there exists an n \in ℕ with \lor_{i = 2}^{2^{n - 1}} ω_{i}^{n} = φ \\ B' (ψ) : = B (ψ) else . \end{array}

B′ and B are equally preferable according to ≺ since P (φ) = 0 for all P ∈

E_{ℒ}

and all such φ.

For all φ ∈

S ℒ^{∄}

let n_φ be the minimal n such that φ ∈

S ℒ_{n_{φ}}^{∄}

. Now define a function B^inf by first letting

B^{\inf} (φ) = \inf_{\begin{array}{l} ψ \in S ℒ_{n_{φ}}^{∄} \\ ⊨ φ \leftrightarrow ψ \end{array}} B^{'} (ψ) .

Put B^inf(φ) := B′(φ) for all other φ ∈

S ℒ^{\exists}

. For all φ ∈

S ℒ

it holds that B^inf(φ) ≤ B(φ). Furthermore, B^inf is equally preferable to B′ according to ≺. We now consider cases to show that there is a function with a strictly better loss profile than B^inf, which contradicts our assumption that B ∈ minloss

B_{ℒ}

.

Case A There exists some N ∈

ℕ

such that for all n ≥ N, B^inf and

P_{Ω}^{†}

agree on all n-states. Since

B \neq P_{Ω}^{†}

it holds that

B^{\inf} \neq P_{Ω}^{†}

and hence

B_{\frac{1}{2}} : = \frac{B^{\inf} + P_{Ω}^{†}}{2} \neq P_{Ω}^{†}

. Thus, for all n ≥ N

B_{\frac{1}{2}}

and

P_{Ω}^{†}

agree on all n-states. But then for all n ≥ N all F ⊆ Ω_n and all ρ ∈ ϱ_n

B^{\inf} (ρ F) \leq B_{\frac{1}{2}} (ρ F)

. Hence, for all P ∈

ℙ_{ℒ}

it holds that

S_{g}^{n} (P, B_{\frac{1}{2}}) \leq S_{g}^{n} (P, B^{\inf})

.

From the above we have that for all n ≥ N there exists an F ⊆ Ω_n such that

F \ {ω_{2}^{N}, \dots, ω_{2^{N - 1}}^{N}} = \emptyset

and such that

B^{\inf} (ρ F) < B_{\frac{1}{2}} (ρ F)

for some ρ. Thus, there exists some P ∈

E_{ℒ}

with °P(F) > 0. Then

S_{g}^{n} (P, B_{\frac{1}{2}}) < S_{g}^{n} (P, B^{\inf})

for this P ∈

ℙ_{ℒ}

and all n ≥ N.

Thus,

B_{\frac{1}{2}} ≺ B^{\inf}

by Condition 2 of Definition 21.

Case B There exist infinitely many n ∈

ℕ

where B^inf and

P_{Ω}^{†}

agree on all n-states and infinitely many n ∈

ℕ

where they do not agree on all n-states.

Since

P_{Ω}^{†}

is a probability function it follows that for all n ∈

ℕ

, all F ⊆ Ω_n and all ρ ∈ ϱ_n

B^{\inf} (ρ F) \leq P_{Ω}^{†} (ρ F)

has to hold. Now proceed as in Case A.

Case C The number of n ∈

ℕ

for which B^inf and

P_{Ω}^{†}

agree on all n-states is finite (possibly zero).

Case C1 There exists an infinite set J ⊆

ℕ

, J = {j₁, j₂, … }, such that lim_{i→∞}

\sum_{ω \in Ω_{j_{i}}} B^{\inf} (ω) = 1

.

If

P_{Ω}^{†}

dominates B^inf, we are done.

If

P_{Ω}^{†}

does not dominate B^inf, then define a function B₁ ∈

ℙ_{ℒ}

by letting for all n ∈

ℕ

and all F ⊆ Ω_n

° B_{1} (F) : = \lim_{i \to \infty} \sum_{\begin{array}{l} ω \in Ω_{j_{i}} \\ ω \in F \end{array}} B^{\inf} (ω)

and requiring that B₁ satisfies logical equivalence on ℒ^∄. For all φ ∈

S ℒ^{\exists} \ S ℒ^{∄}

use Gaifman’s condition to ensure that B₁ is a probability function.

Since we assumed that

P_{Ω}^{†}

does not dominate B^inf

B_{1} \neq P_{Ω}^{†}

holds. Furthermore, B₁ dominates B^inf. So, the loss profile of B₁ ∈

ℙ_{ℒ}

is at least equally good as that of B.

We complete this proof by showing that

ℙ_{ℒ}

∩ minloss

B_{ℒ} = \emptyset

.

Now suppose for contradiction that there exists a function Q ∈

ℙ_{ℒ}

∩ minloss

B_{ℒ}

such that

Q (ω_{2}^{n}) > 0

for some n ≥ 2, i.e., Q ∉

E_{ℒ}

. It needs to hold that

Q (ω_{1}^{n}) > 0

for all n ∈

ℕ

(open-mindedness).

Let k ≥ 2 be minimal such that

Q (ω_{2}^{k}) > 0

. Now define a function R ∈

ℙ_{ℒ}

by letting for all n > k

\begin{array}{l} R (ω_{i}^{k}) : = \frac{Q (ω_{i}^{k}) + P_{Ω}^{†} (ω_{i}^{k})}{2} for all 1 \leq i \leq 2^{k} \\ R (ω_{1}^{n}) : = R (ω_{1}^{k}) = \frac{Q (ω_{1}^{k})}{2} = \frac{Q (ω_{1}^{k}) + P_{Ω}^{†} (ω_{1}^{k})}{2} for all n > k \\ R (ω_{2^{n - k + 1}}^{n}) : = \frac{Q (ω_{2}^{k}) + P_{Ω}^{†} (ω_{2}^{k})}{2} = \frac{Q (ω_{2}^{k})}{2} > 0 \\ R (ω_{i}^{n}) : = \frac{Q (v) + P_{Ω}^{†} (v)}{2} for all 2^{n - 1} + 1 \leq i \leq 2^{n} where v \in Ω_{k} with ω_{i}^{n} ⊨ v \\ R (ω_{i}^{n}) : = 0 otherwise . \end{array}

That is, R is the arithmetic mean of Q and

P_{Ω}^{†}

on

ℒ_{k}

. Beyond

ℒ_{k}

, R equivocates under the k-states which imply Ut₁. For such n-states

R (ω_{i}^{n}) = \frac{Q (v) + \frac{1}{2^{n - 1}}}{2}

holds. Beyond

ℒ_{k}

, there are only two n-states which imply ¬Ut₁ which are assigned non-zero probability,

ω_{1}^{n}

and

ω_{2^{n - k + 1}}^{n}

.

We now show that R has a strictly better loss profile than Q, which contradicts Q ∈ minloss

B_{ℒ}

.

Let

v_{k}^{-}

∈ arg

\min_{ω \in {ω_{2^{k - 1} + 1}^{k}, \dots, ω_{2^{k}}^{k}}} Q (ω)

. Trivially,

0 < Q (v_{k}^{-}) < \frac{1}{2^{k - 1}}

. Next note that for all n ≥ k which are large enough it holds that

\min_{ω \in {ω_{1}^{n}, ω_{2^{n - 1} + 1}^{n}, \dots, ω_{2^{n}}^{n}}} R (ω) = \frac{\frac{1}{2^{k - 1}} + Q (v_{k}^{-})}{2} \cdot \frac{| Ω_{k} |}{| Ω_{n} |}

and that

\min_{ω \in {ω_{2^{n - 1} + 1}^{n}, \dots, ω_{2^{n}}^{n}}} Q (ω) \leq Q (v_{k}^{-}) \cdot \frac{| Ω_{k} |}{| Ω_{n} |} .

We now find for all large enough n > k that

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, R) \geq - g (π^{n}) \log (Q (v_{k}^{-}) \cdot \frac{| Ω_{k} |}{| Ω_{n} |}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, R) \\ \geq g (π^{n}) (- \log (Q (v_{k}^{-}) \cdot \frac{| Ω_{k} |}{| Ω_{n} |}) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, R)) \\ - \sup_{P \in E_{ℒ}} - \sum_{π \in Π_{n} \ {π^{n}}} g (π) \sum_{F \in π} ° P (F) \log ° R (F) . \end{array}

Whenever °P (F) > 0 with F ⊆ Ω_n, then °R(F) is bounded from below by

\frac{1}{2^{n}}

. Hence, the last term in the above sum converges to zero, since g is regular.

We now obtain the contradiction as follows: there exists some ϵ > 0 such that for all large enough n ≥ k it holds that

\begin{array}{l} - \log (Q (v_{k}^{-}) \cdot \frac{| Ω_{k} |}{| Ω_{n} |}) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, R) \\ = - \log (Q (v_{k}^{-}) \cdot \frac{| Ω_{k} |}{| Ω_{n} |}) + \log (\frac{\frac{1}{2^{k - 1}} + Q (v_{k}^{-})}{2} \cdot \frac{| Ω_{k} |}{| Ω_{n} |}) \\ = \log (\frac{\frac{1}{2^{k - 1}} + Q (v_{k}^{-})}{2}) - \log Q (v_{k}^{-}) \\ \geq ϵ . \end{array}

We have thus shown that if

ℙ_{ℒ}

∩ minloss

B_{ℒ} \neq \emptyset

, then there exists some Q ∈

E_{ℒ}

∩ minloss

B_{ℒ}

.

Case C1A

Q (ω_{1}^{1}) = 0

. Then Q has infinite worst-case expected loss for all n ∈

ℕ

and we are done.

Case C1B

Q (ω_{1}^{1}) > 0

.

By open-mindedness,

Q (ω_{1}^{1}) < 1

has to hold.

For all n ∈

ℕ

let

ω_{-}^{n}

∈ arg

\min_{ω \in {ω_{2^{n - 1} + 1}^{n}, \dots, ω_{2^{n}}^{n}}} Q (ω)

From Q ∈

E_{ℒ}

we now obtain that for all large enough n there exists a probability function R ∈ arg

\sup_{P \in E_{ℒ}}

S_{Ω}^{n} (P, Q)

such that

R (ω_{-}^{n}) = 1

.

Next, define a probability function Q′ ∈

E_{ℒ}

where

Q^{'} (ω_{1}^{n}) : = Q^{'} (ω_{1}^{1}) : = Q (ω_{1}^{1})

and Q′ equivocates over Ut₁,

Q^{'} (ω_{i}^{n}) : = Q (ω_{2}^{1}) \frac{| Ω_{1} |}{| Ω_{n} |}

for all n ∈

ℕ

and for all 2ⁿ⁻¹ + 1 ≤ i ≤ 2ⁿ. Assume for contradiction that Q ≠ Q′.

We next show that Q′ ≺ Q. This contradicts Q ∈ minloss

B_{ℒ}

. To this end let us note that for all large enough n

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q^{'}) \leq \sup_{P \in E_{ℒ}} g (π^{n}) S_{Ω}^{n} (P, Q^{'}) + \sup_{P \in E_{ℒ}} - \sum_{π \in \prod_{n} \ {π^{n}}} g (π) \sum_{F \in π} ° P (F) \log ° Q^{'} (F) \\ \leq - g (π^{n}) \log Q^{'} (ω_{2^{n}}^{n}) + \sup_{P \in E_{ℒ}} - \sum_{π \in \prod_{n} \ {π^{n}}} g (π) \sum_{F \in π} ° P (F) \log Q^{'} (ω_{2^{n}}^{n}) \\ = - g (π^{n}) \log Q^{'} (ω_{2^{n}}^{n}) - \log (Q^{'} (ω_{2}^{1}) \frac{| Ω_{1} |}{| Ω_{n} |}) \cdot \sum_{π \in \prod_{n} \ {π^{n}}} g (π) . \end{array}

Since whenever °P (F) > 0, then °Q′(F) is bounded from below by

Q^{'} (ω_{2^{n}}^{n})

.

Thus, for all large enough n we have

\begin{array}{l} 0 \leq \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q^{'}) - g (π^{n}) \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q^{'}) \\ \leq - \log (Q^{'} (ω_{2}^{1}) \frac{| Ω_{1} |}{| Ω_{n} |}) \cdot \sum_{π \in \prod_{n} \ {π^{n}}} g (π) . \end{array}

g is regular, hence, this last term converges to zero. We thus obtain

\lim_{n \to \infty} \sup_{P \in E_{ℒ}} g (π^{n}) S_{Ω}^{n} (P, Q^{'}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q^{'}) = 0.

(17)

Since Q ≠ Q′, Q, Q′ ∈

E_{ℒ}

and

Q (ω_{1}^{1}) = Q^{'} (ω_{1}^{1})

, there has to exist some minimal k ∈ ℕ and a minimal

i ≥ 2^{k−1} + 1 such that

Q (ω_{i}^{k}) < Q^{'} (ω_{i}^{k})

. We now find for all large enough n that

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q) - g (π^{n}) \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q^{'}) \geq g (π^{n}) \cdot (\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q^{'})) \\ \geq g (π^{n}) \cdot (- \log Q (ω_{-}^{n}) + \log Q^{'} (ω_{2^{n}}^{n})) \\ \geq g (π^{n}) \cdot (- \log (Q (ω_{i}^{k}) \frac{| Ω_{k} |}{| Ω_{n} |}) + \log (Q^{'} (ω_{i}^{k}) \frac{| Ω_{k} |}{| Ω_{n} |})) \\ \geq g (π^{n}) \cdot (- \log Q (ω_{i}^{k}) + \log Q^{'} (ω_{i}^{k})) \\ > 0. \end{array}

Recall that there exists 0 < a ≤ b such that for all n ∈ ℕ a ≤ g(πⁿ) ≤ b holds. Hence, there exists some constant c > 0 such that

g (π^{n}) (- \log Q (ω_{i}^{k}) + \log Q^{'} (ω_{i}^{k})) \geq c > 0

. From (17) we conclude that for all large enough n

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q^{'}) > 0

holds. Thus, Q′ ≺ Q. So, Q ∉ minloss

B_{ℒ}

.

To complete the proof of Case C1B we show that there exists some N ∈ ℕ such that

{\bar{P}}_{N}^{†}

has a strictly better loss profile than Q′.

Let N ∈ ℕ be such that

{\bar{P}}_{N}^{†} (ω_{1}^{1}) < Q^{'} (ω_{1}^{1})

. Analogous to the above it holds that

\lim_{n \to \infty} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, {\bar{P}}_{N}^{†}) - \sup_{P \in E_{ℒ}} g (π^{n}) S_{Ω}^{n} (P, {\bar{P}}_{N}^{†}) = 0.

(18)

It hence suffices to show that there exists some ε > 0 such that for large enough N ∈ ℕ and all n ≥ N

g (π^{n}) \cdot (\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q^{'}) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, {\bar{P}}_{N}^{†})) > ϵ .

We now recall that

Q^{'} (ω_{1}^{1}) > {\bar{P}}_{N}^{†} (ω_{1}^{1})

. The required inequality follows for large enough n ∈ ℕ

\begin{matrix} \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q^{'}) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, {\bar{P}}_{N}^{†}) = - \log (\frac{1 - Q^{'} (ω_{1}^{1})}{2^{n - 1}}) + \log (\frac{1 - {\bar{P}}_{N}^{†} (ω_{1}^{1})}{2^{n - 1}}) \\ > ϵ . \end{matrix}

Hence,

{\bar{P}}_{N}^{†} ≺ Q^{'}

.

Case C2 There exist an α > 0 and a minimal N₁ such that for all

n \geq N_{1} \sum_{ω \in Ω_{n}} B^{\inf} (ω) \leq 1 - α

holds.

We may assume that B^inf is open-minded on

ℒ

^∄. Thus there has to exist some minimal N ≥ N₁ such that

0 < P_{n}^{†} (ω_{1}^{1}) < B^{\inf} (ω_{1}^{1})

for all n ≥ N. For all large enough n ≥ N we now find

\begin{array}{l} \frac{1}{g (π^{n})} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{\inf}) \geq \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, B^{\inf}) \\ = \max_{ω \in Ω_{n} \ {ω_{2^{n - 1} + 1}^{n}, \dots, ω_{2^{n}}^{n}}} - \log B^{\inf} (ω) \\ = - \log (\min_{ω \in Ω_{n} \ {ω_{2^{n - 1} + 1}^{n}, \dots, ω_{2^{n}}^{n}}} B^{\inf} (ω)) \\ \geq - \log \frac{1 - α - B^{\inf} (ω_{1}^{1})}{2^{n - 1}} . \end{array}

Using (18) we find for all large enough n ∈ ℕ

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B^{\inf}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, {\bar{P}}_{N}^{†}) \geq g (π^{n}) \cdot (- \log \frac{1 - α - B^{\inf} (ω_{1}^{1})}{2^{n - 1}} + \log \frac{1 - P_{N}^{†} (ω_{1}^{1})}{2^{n - 1}}) \\ + (\log (| Ω_{n} |) - \log (| Ω_{n} |) - \log ({\bar{P}}_{N}^{†} (ω_{2^{N}}^{N}))) \cdot \sum_{^{π \in Π_{n} \ {π^{n}}}} g (π) \\ > 0. \end{array}

□

Proposition 31. For all regular g and all ϵ > 0 there exists an N ∈ ℕ such that for all n ≥ N

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, {\bar{P}}_{N}^{†}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{n}^{†}) \leq ϵ .

Proof. Let ϵ > 0 be fixed. By (18) it suffices to show that there exists some N ∈ ℕ such that for all n ≥ N it holds that

\begin{array}{l} 0 \leq \sup_{P \in E_{ℒ}} S_{g}^{n} (P, {\bar{P}}_{N}^{†}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{n}^{†}) \\ \leq g (π^{n}) \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, {\bar{P}}_{N}^{†}) - g (π^{n}) \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{n}^{†}) \\ \leq ϵ . \end{array}

Now simply note that we have proved this already in Proposition 29. □

Hence, for all ϵ > 0 there exists some N ∈ ℕ such that for all n ≥ N and all Q ∈

B_{ℒ}

\begin{matrix} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, {\bar{P}}_{N}^{†}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q) \leq \sup_{P \in E_{ℒ}} S_{g}^{n} (P, {\bar{P}}_{N}^{†}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{n}^{†}) \\ \leq ϵ . \end{matrix}

Although

{\bar{P}}_{N}^{†}

is not a minimal element of ≺, the losses incurred by adopting any other B ∈

B_{ℒ}

can only be marginally better, eventually.

Thus, for fixed k and δ > 0 there exists an N ∈ ℕ such that for all

φ \in S ℒ_{k} | {\bar{P}}_{N}^{†} (φ) - P_{Ω}^{†} (φ) | < δ

. Hence, belief functions with an arbitrarily good loss can be found within a (Euclidean) neighbourhood of

P_{Ω}^{†}

.

Since the

{\bar{P}}_{N}^{†}

are probability functions, there does not exist a B ∈

B_{ℒ}

which dominates

{\bar{P}}_{N}^{†}

on

ℒ

^∃ or on

ℒ

^∄. Furthermore, the

{\bar{P}}_{N}^{†}

are optimal according to (∀*). The

{\bar{P}}_{N}^{†}

thus are almost optimal in all the senses we here considered.

In essence, the phenomenon of minloss

B_{ℒ} = \emptyset

arises from

{\bar{P}}_{N + 1}^{†}

having a strictly better loss profile than

{\bar{P}}_{N}^{†}

but the limit of the sequence

{({\bar{P}}_{N}^{†})}_{N \in ℕ}

is

P_{Ω}^{†}

, which is not open-minded. This phenomenon is reminiscent of min{x ∈ ℝ : 0 < x < 1} = ∅, where it is possible to get ever closer to zero but it is impossible to reach it.
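The analogy with an unattained infimum can be made concrete for a fixed n. The sketch below rests on stated assumptions rather than on the paper's own code: it uses the values of the barred function given above, treats ω_1^n and the states ω_i^n with i > 2^{n−1} as the only n-states to which calibrated functions can give positive probability, and takes the worst-case logarithmic loss on n-states to be −log of the smallest probability assigned to such a state (as in the proofs above). The helper names are hypothetical.

```python
import math
from fractions import Fraction

def permitted_probs_bar(N, n):
    """Values of the barred function for index N on the permitted n-states (N <= n)."""
    a = 2 ** (N - 1)
    first = Fraction(1, a + 1)                        # value on omega_1^n
    rest = Fraction(a, a + 1) * Fraction(2, 2 ** n)   # value on each omega_i^n, i > 2^(n-1)
    return first, rest

def permitted_probs_limit(n):
    """Values of P_Omega^dagger on the same states: zero on omega_1^n."""
    return Fraction(0), Fraction(2, 2 ** n)

def worst_case_log_loss(probs):
    """-log of the smallest permitted-state probability; infinite if that probability is 0."""
    m = min(probs)
    return math.inf if m == 0 else -math.log(float(m))

if __name__ == "__main__":
    n = 20
    for N in range(2, 10):
        print(N, worst_case_log_loss(permitted_probs_bar(N, n)))
    print("limit", worst_case_log_loss(permitted_probs_limit(n)))
```

Under these assumptions the worst-case loss decreases strictly as N grows, yet the pointwise limit of the sequence incurs infinite worst-case loss because it assigns probability zero to ω_1^n.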

6.2. When Losses Can Be Minimised

The analysis of Section 6.1 shows that there can be no general minimax theorem which covers any evidence that is not finitely generated. On the other hand, we shall see in this section that for certain natural cases of evidence which cannot be finitely generated, minimax theorems do obtain.

Let

ℒ

contain only one m-ary relation symbol, U, and c ∈ [0, 1]. Let

ν_{1}^{n} : = \land_{1 \leq i_{1}, \dots, i_{m} \leq n} \neg U t_{i_{1}} t_{i_{2}} \dots t_{i_{m}} \in Ω_{n}

and let

ν_{2}^{n}, \dots, ν_{| Ω_{n} |}^{n}

be an enumeration of the remaining n-states. We shall consider the following example:

\begin{array}{l} E_{ℒ} = {P \in ℙ_{ℒ} : \lim_{n \to \infty} P (ν_{1}^{n}) = c} \\ = {P \in ℙ_{ℒ} : P (\forall x_{1} x_{2} \dots x_{m} \neg U x_{1} x_{2} \dots x_{m}) = c} . \end{array}

Slightly less general versions of

E_{ℒ}

have attracted recent interest in the literature [18] (Example 3, p. 95), [19] (Example 3.5, p. 172) and [1] (Example 5.7, p. 99). We here consider relation symbols U of arbitrary arity, while previously U was taken to be unary.

First of all, if c = 0 and g is symmetric and inclusive, then P_= ∈

E_{ℒ}

and we immediately obtain that

P_{=} = P_{Ω}^{†}

and

{P_{=}} = {maxent E}_{ℒ} = {minloss B}_{ℒ}

.

We shall assume from now on that c > 0.

Proposition 32. For symmetric and inclusive g it holds that

ℙ^{†} = {P^{†}} = {P_{Ω}^{†}}

and

P_{Ω}^{†} (ν_{1}^{n}) = c + \frac{1 - c}{| Ω_{n} |}

and

P_{Ω}^{†} (ν_{i}^{n}) = \frac{1 - c}{| Ω_{n} |}

for all n ∈ ℕ and all 2 ≤ i ≤ |Ω_n|.

Proof. For all n ≥ 2 and symmetric and inclusive g_n it holds that

P_{n}^{†} (ν_{2}^{n}) = P_{n}^{†} (ν_{2 + i}^{n})

for all 1 ≤ i ≤ |Ω_n| − 2 by [4] (Corollary 7, p. 3577). Thus, there exists some λ_n ≥ 0 such that

P_{n}^{†} (ν_{1}^{n}) = λ_{n}

and

P_{n}^{†} (ν_{k}^{n}) = \frac{1 - λ_{n}}{| Ω_{n} | - 1}

for all 2 ≤ k ≤ |Ω_n|.

For all n ∈ ℕ, now define a function P₁ ∈

[E_{n}]

by

P_{1} (ν_{1}^{n}) : = 1

. Then, define a convex combination of the equivocator on

E_{n}

and P₁ by

P_{λ_{n}} : = λ_{n} P_{1} + (1 - λ_{n}) P_{= ⇂_{n}}

. Recall that g_n is equivocator-preserving (Proposition 7) and that

H_{g}^{n}

is strictly concave on ℙ_n (Lemma 1). Thus,

H_{g}^{n} (P_{λ_{n}}) > H_{g}^{n} (P_{{λ^{'}}_{n}})

for all

0 \leq λ_{n} < {λ^{'}}_{n} \leq 1

.

On the one hand, g-entropy strictly increases with decreasing λ_n; on the other hand,

P_{n}^{†} \in [E_{ℒ}]

imposes the constraint

P_{n}^{†} (ν_{1}^{n}) \geq c

. Let N ∈ ℕ be minimal with

| Ω_{N} | > \frac{1}{c} .

Then for all n ≥ N it holds that

P_{n}^{†} (ν_{1}^{n}) \geq c

and

P_{n}^{†} (ν_{2}^{n}) = P_{n}^{†} (ν_{2 + i}^{n}) = \frac{1 - c}{| Ω_{n} | - 1}

for all 1 ≤ i ≤ |Ω_n| −2.

For all r ≥ N it follows that

\begin{array}{l} P_{r}^{†} (ν_{2}^{N}) = P_{r}^{†} (ν_{2 + i}^{r}) \frac{| Ω_{r} |}{| Ω_{N} |} = \frac{1 - c}{| Ω_{r} | - 1} \frac{| Ω_{r} |}{| Ω_{N} |} \\ P_{r}^{†} (ν_{1}^{N}) = 1 - (| Ω_{N} | - 1) \cdot P_{r}^{†} (ν_{2}^{N}) . \end{array}

Thus, for all r ≥ N we find

\begin{array}{l} P^{†} (ν_{2}^{r}) = \lim_{n \to \infty} P_{n}^{†} (ν_{2}^{r}) = \frac{1 - c}{| Ω_{r} |} \\ P^{†} (ν_{1}^{r}) = \lim_{n \to \infty} P_{n}^{†} (ν_{1}^{r}) \\ = 1 - (| Ω_{r} | - 1) P^{†} (ν_{2}^{r}) \\ = \frac{| Ω_{r} | - (| Ω_{r} | - 1) (1 - c)}{| Ω_{r} |} \\ = c + \frac{1 - c}{| Ω_{r} |} \\ = c + P^{†} (ν_{2}^{r}) . \end{array}

Thus, for all n ∈ ℕ

P^{†} (ν_{1}^{n}) = c + \frac{1 - c}{| Ω_{n} |}

and

P^{†} (ν_{2}^{n}) = \frac{1 - c}{| Ω_{n} |}

.

We now show that P^† is indeed a probability function. We need to show that

\sum_{\underset{ν ⊨ ω}{ν \in Ω_{n + 1}}} P^{†} (ν) = P^{†} (ω)

for all n ∈ ℕ and all ω ∈ Ω_n:

\begin{array}{l} P^{†} (ν_{i}^{n}) = \frac{1 - c}{| Ω_{n} |} = \frac{| Ω_{n + 1} |}{| Ω_{n} |} \frac{1 - c}{| Ω_{n + 1} |} = \frac{| Ω_{n + 1} |}{| Ω_{n} |} P^{†} (ν_{i}^{n + 1}) for all 2 \leq i \leq | Ω_{n} | \\ P^{†} (ν_{1}^{n}) = c + \frac{1 - c}{| Ω_{n} |} \\ = c + \frac{1 - c}{| Ω_{n + 1} |} + (\frac{| Ω_{n + 1} |}{| Ω_{n} |} - 1) \frac{1 - c}{| Ω_{n + 1} |} \\ = P^{†} (ν_{1}^{n + 1}) + (\frac{| Ω_{n + 1} |}{| Ω_{n} |} - 1) P^{†} (ν_{2}^{n + 1}) . \end{array}

Finally, observe that

ℙ_{n}^{†} = \arg \sup_{P \in E_{n}} H_{Ω}^{n} (P)

. Hence,

ℙ^{†} = {P_{Ω}^{†}}

. □
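The marginalisation identities used in this proof can be verified with exact rational arithmetic. The following sketch makes two assumptions that are not stated in this form in the text: that |Ω_n| = 2^(n^m) for a language with a single m-ary relation symbol (one atomic sentence per m-tuple of the first n constants), and that every n-state has |Ω_{n+1}|/|Ω_n| extensions in Ω_{n+1}, exactly one of which is ν_1^{n+1} in the case of ν_1^n. The function names are hypothetical, and c should be passed as a `Fraction` so that the equality tests are exact.

```python
from fractions import Fraction

def num_states(n, m=1):
    """Assumed number of n-states for one m-ary relation symbol: 2^(n^m)."""
    return 2 ** (n ** m)

def p_dagger(n, c, m=1, first=False):
    """P^dagger on an n-state: c + (1-c)/|Omega_n| on nu_1^n, (1-c)/|Omega_n| elsewhere."""
    base = (1 - c) / num_states(n, m)
    return c + base if first else base

def is_consistent(n, c, m=1):
    """Check normalisation on level n and marginalisation from level n + 1 to level n."""
    size_n = num_states(n, m)
    ratio = num_states(n + 1, m) // size_n  # extensions of each n-state
    sums_to_one = (p_dagger(n, c, m, first=True)
                   + (size_n - 1) * p_dagger(n, c, m)) == 1
    first_ok = p_dagger(n, c, m, first=True) == (
        p_dagger(n + 1, c, m, first=True) + (ratio - 1) * p_dagger(n + 1, c, m))
    rest_ok = p_dagger(n, c, m) == ratio * p_dagger(n + 1, c, m)
    return sums_to_one and first_ok and rest_ok
```

For example, `is_consistent(3, Fraction(1, 3), m=2)` checks the three identities for a binary relation symbol at level n = 3.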

Proposition 33. If g = g_Ω or if g is regular, then maxent

E_{ℒ} = {P_{Ω}^{†}}

.

Proof. Let Q ∈

E_{ℒ} \ {P_{Ω}^{†}}

. For regular g, it suffices to show that there exists an N ∈ ℕ such that for all

n \geq N H_{g}^{n} (Q) < H_{g}^{n} (P_{Ω}^{†})

holds.

Since

Q \neq P_{Ω}^{†}

there has to exist a minimal N ∈ ℕ and an N-state ω′ ∈

Ω_{N} \ {ν_{1}^{N}}

such that

Q (ω^{'}) \neq P_{Ω}^{†} (ω^{'})

.

Now define a function Q′: S

ℒ

→ [0, 1] by requiring that Q′ respects logical equivalence, Q and Q′ agree on S

ℒ

_N,

$Q^{'} (ν^{'}) : = \frac{| Ω_{N} |}{| Ω_{n} |} Q (ω^{'})$ for all n > N and all ν′ ∈ Ω_n with ν′ ⊨ ω′,
$Q^{'} (ν_{1}^{n}) : = Q (ν_{1}^{n})$ for all n > N and
$Q^{'} (ν) : = \frac{1 - Q (ν_{1}^{n}) - Q (ω^{'})}{| Ω_{n} | - 1 - \frac{| Ω_{n} |}{| Ω_{N} |}}$ for all n > N and all ν ∈ Ω_n \ ${ν_{1}^{n}}$ with ν ⊭ ω′

In general, Q′ is not a probability function because

Q^{'} (ν_{1}^{n}) < \sum_{\underset{ω ⊨ ν_{1}^{n}}{ω \in Ω_{n + 1}}} Q^{'} (ω) .

Note that for all

n \geq N H_{Ω}^{n} (Q) \leq H_{Ω}^{n} (Q^{'})

holds.

We now show that for all large enough n

H_{Ω}^{n} (Q^{'}) < H_{Ω}^{n} (P_{Ω}^{†})

holds. Let us first compute

\begin{matrix} - H_{Ω}^{n} (Q^{'}) = Q (ν_{1}^{n}) \log (Q (ν_{1}^{n})) + (\frac{| Ω_{n} |}{| Ω_{N} |} \frac{| Ω_{N} |}{| Ω_{n} |} Q (ω^{'})) \log \frac{Q (ω^{'}) \cdot | Ω_{N} |}{| Ω_{n} |} \\ + (1 - Q (ν_{1}^{n}) - Q (ω^{'})) \log \frac{1 - Q (ν_{1}^{n}) - Q (ω^{'})}{| Ω_{n} | - 1 - \frac{| Ω_{n} |}{| Ω_{N} |}} \\ = Q (ν_{1}^{n}) \log (Q (ν_{1}^{n})) + Q (ω^{'}) \cdot (\log (Q (ω^{'})) + \log (\frac{| Ω_{N} |}{| Ω_{n} |})) \\ + (1 - Q (ν_{1}^{n}) - Q (ω^{'})) \cdot (\log (\frac{1 - Q (ν_{1}^{n}) - Q (ω^{'})}{| Ω_{N} | - \frac{| Ω_{N} |}{| Ω_{n} |} - 1}) + \log (\frac{| Ω_{N} |}{| Ω_{n} |})) \\ = Q (ν_{1}^{n}) \log (Q (ν_{1}^{n})) + Q (ω^{'}) \log (Q (ω^{'})) + (1 - Q (ν_{1}^{n})) \log (\frac{| Ω_{N} |}{| Ω_{n} |}) \\ + (1 - Q (ν_{1}^{n}) - Q (ω^{'})) \cdot \log (\frac{1 - Q (ν_{1}^{n}) - Q (ω^{'})}{| Ω_{N} | - \frac{| Ω_{N} |}{| Ω_{n} |} - 1}) . \end{matrix}

Since

H_{Ω}^{n} (P_{Ω}^{†}) = - (c + \frac{1 - c}{| Ω_{n} |}) \log (c + \frac{1 - c}{| Ω_{n} |}) - (| Ω_{n} | - 1) \frac{1 - c}{| Ω_{n} |} \log (\frac{1 - c}{| Ω_{n} |})

we now find with

\lim_{n \to \infty} Q^{'} (ν_{1}^{n}) = c

that

\begin{array}{l} \lim_{n \to \infty} H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q^{'}) = - c \log (c) + \lim_{n \to \infty} (- (| Ω_{n} | - 1) \frac{1 - c}{| Ω_{n} |} \log (\frac{1 - c}{| Ω_{n} |}) \\ \quad + Q (ν_{1}^{n}) \log (Q (ν_{1}^{n})) + Q (ω^{'}) \log (Q (ω^{'})) + (1 - Q (ν_{1}^{n})) \log (\frac{| Ω_{N} |}{| Ω_{n} |}) \\ \quad + (1 - Q (ν_{1}^{n}) - Q (ω^{'})) \cdot \log (\frac{1 - Q (ν_{1}^{n}) - Q (ω^{'})}{| Ω_{N} | - \frac{| Ω_{N} |}{| Ω_{n} |} - 1})) \\ = - c \log (c) - (1 - c) \log (1 - c) + \lim_{n \to \infty} (\frac{(1 - c) (| Ω_{n} | - 1)}{| Ω_{n} |} \log (| Ω_{n} |) \\ \quad + c \log (c) + Q (ω^{'}) \log (Q (ω^{'})) + (1 - Q (ν_{1}^{n})) \log (\frac{| Ω_{N} |}{| Ω_{n} |})) \\ \quad + (1 - c - Q (ω^{'})) \cdot \log (\frac{1 - c - Q (ω^{'})}{| Ω_{N} | - 1}) \\ = - (1 - c) \log (1 - c) + Q (ω^{'}) \log (Q (ω^{'})) \\ \quad + (1 - c - Q (ω^{'})) \cdot \log (\frac{1 - c - Q (ω^{'})}{| Ω_{N} | - 1}) \\ \quad + \lim_{n \to \infty} (\frac{(1 - c) (| Ω_{n} | - 1)}{| Ω_{n} |} \log (| Ω_{n} |) + (1 - Q (ν_{1}^{n})) (\log (| Ω_{N} |) - \log (| Ω_{n} |))) \\ = - (1 - c) \log (\frac{1 - c}{| Ω_{N} |}) + Q (ω^{'}) \log (Q (ω^{'})) \\ \quad + (1 - c - Q (ω^{'})) \cdot \log (\frac{1 - c - Q (ω^{'})}{| Ω_{N} | - 1}) \\ \quad + \lim_{n \to \infty} (\frac{(1 - c) (| Ω_{n} | - 1)}{| Ω_{n} |} \log (| Ω_{n} |) - (1 - Q (ν_{1}^{n})) \log (| Ω_{n} |)) \\ \overset{Q (ν_{1}^{n}) \geq c}{\geq} - (1 - c) \log (\frac{1 - c}{| Ω_{N} |}) + Q (ω^{'}) \log (Q (ω^{'})) \\ \quad + (1 - c - Q (ω^{'})) \cdot \log (\frac{1 - c - Q (ω^{'})}{| Ω_{N} | - 1}) \\ \quad + \lim_{n \to \infty} (\frac{(1 - c) (| Ω_{n} | - 1)}{| Ω_{n} |} \log (| Ω_{n} |) - (1 - c) \log (| Ω_{n} |)) \\ = - (1 - c) \log (\frac{1 - c}{| Ω_{N} |}) + Q (ω^{'}) \log (Q (ω^{'})) \\ \quad + (1 - c - Q (ω^{'})) \cdot \log (\frac{1 - c - Q (ω^{'})}{| Ω_{N} | - 1}) \\ \quad + (1 - c) \lim_{n \to \infty} (\frac{| Ω_{n} | - 1}{| Ω_{n} |} - 1) \log (| Ω_{n} |) \\ = - (1 - c) \log (\frac{1 - c}{| Ω_{N} |}) + Q (ω^{'}) \log (Q (ω^{'})) + (1 - c - Q (ω^{'})) \cdot \log (\frac{1 - c - Q (ω^{'})}{| Ω_{N} | - 1}) . \end{array}

Since

Q (ω^{'}) \neq \frac{1 - c}{| Ω_{N} |}

there exists some ϵ > 0 such that for all large enough n

H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q^{'}) > ϵ > 0.

This establishes the result for g = g_Ω.
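The limit computed in this first part can be checked numerically. The following is a minimal sketch, not part of the original proof: it assumes that Q(ν₁ⁿ) = c for every n and that |Ω_n| is a multiple of |Ω_N|, and the function names `entropy_gap` and `limit_gap` are hypothetical. It compares the finite-level difference of standard entropies with the closed-form limit, which vanishes exactly when Q(ω′) = (1 − c)/|Ω_N|.

```python
import math


def entropy_gap(c, q, size_N, size_n):
    """H(P_dagger) - H(Q') on a level with size_n states.

    Assumptions (illustrative only): Q(nu_1^n) = c, q = Q(omega'),
    and size_n is a multiple of size_N.
    """
    # P_dagger puts c + (1-c)/size_n on nu_1 and (1-c)/size_n on every other state.
    p_first = c + (1 - c) / size_n
    p_rest = (1 - c) / size_n
    h_p = -p_first * math.log(p_first) - (size_n - 1) * p_rest * math.log(p_rest)

    # Q' puts c on nu_1, spreads q evenly over the extensions of omega',
    # and spreads the remaining mass evenly over all other states.
    ext = size_n // size_N
    others = size_n - 1 - ext
    a = q / ext
    b = (1 - c - q) / others
    h_q = -c * math.log(c) - ext * a * math.log(a) - others * b * math.log(b)
    return h_p - h_q


def limit_gap(c, q, size_N):
    """Closed-form limit of the gap as derived in the text (with Q(nu_1^n) = c)."""
    return (-(1 - c) * math.log((1 - c) / size_N)
            + q * math.log(q)
            + (1 - c - q) * math.log((1 - c - q) / (size_N - 1)))
```

With, say, c = 0.3, |Ω_N| = 4 and Q(ω′) = 0.2, the finite-level gap approaches a strictly positive limit, in line with the existence of some ϵ > 0 claimed above.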

We now turn to regular g.

\begin{array}{r} H_{g}^{n} (P_{Ω}^{†}) - H_{g}^{n} (Q) \geq H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q) - \sum_{π \in Π_{n} \ {π^{n}}} g (π) \sum_{F \in π} ° Q (F) \log ° Q (F) \\ \geq H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q^{'}) - \sum_{π \in Π_{n} \ {π^{n}}} g (π) \sum_{F \in π} ° Q (F) \log ° Q (F) . \end{array}

The last sum goes to zero since g is regular (Corollary 6). Eventually,

H_{Ω}^{n} (P_{Ω}^{†}) - H_{Ω}^{n} (Q^{'})

is greater than some ϵ > 0, as we established in the first part of the proof. Thus, for all large enough n ∈ ℕ and all

Q \in E_{ℒ} \ {P^{†}}

we have

H_{g}^{n} (P_{Ω}^{†}) - H_{g}^{n} (Q) > 0.

□

Lemma 12. The following three conditions are equivalent for all large enough n ∈ ℕ and inclusive and symmetric g

$P^{'} \in \arg \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†})$
$P^{'} (ν_{1}^{n}) = c$
$P^{'} \in \arg \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{Ω}^{†})$

Proof. Note that for all P ∈ ℙ_$ℒ$

\begin{array}{l} S_{g}^{n} (P, P_{Ω}^{†}) = - \sum_{π \in Π_{n}} g (π) \sum_{F \in π} ° P (F) \log ° P_{Ω}^{†} (F) \\ = - \sum_{ν \in Ω_{n}} P (ν) \sum_{\substack{F \subseteq Ω_{n} \\ ν \in F}} γ_{n} (F) \log ° P_{Ω}^{†} (F) \\ = - P (ν_{1}) (γ_{n} (ν_{1}) \log P_{Ω}^{†} (ν_{1}) + \sum_{\substack{F \subseteq Ω_{n} \\ ν_{1} \in F \\ | F | \geq 2}} γ_{n} (F) \log ° P_{Ω}^{†} (F)) \\ \quad + \sum_{i = 2}^{| Ω_{n} |} - P (ν_{i}) (γ_{n} (ν_{i}) \log P_{Ω}^{†} (ν_{i}) + \sum_{\substack{F \subseteq Ω_{n} \\ ν_{i} \in F \\ | F | \geq 2}} γ_{n} (F) \log ° P_{Ω}^{†} (F)) . \end{array}

The term between the last set of brackets () does not depend on i. So,

S_{g}^{n} (P, P_{Ω}^{†})

only depends on P(ν₁) but not on how P distributes probabilities among the other n-states.

For large enough n ∈ ℕ it holds that

P_{Ω}^{†} (ν_{1}) > P_{Ω}^{†} (ν_{2}) = P_{Ω}^{†} (ν_{i})

for all 3 ≤ i ≤ |Ω_n|.

Since g is symmetric, γ_n(F) is a function only of the size |F| of F. It follows that every

P^{'} \in \arg \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†})

assigns as little probability as possible to ν₁. Since we require that

P \in E_{ℒ}

it follows that P′(ν₁) = c.

The result for

S_{Ω}^{n}

follows as above by noting that for g = g_Ω it holds that γ_n(ν) = 1 for all n-states ν ∈ Ω_n and γ_n(F) = 0 otherwise. □

Adapting Joyce’s notion of truth-directedness [14] we define:

Definition 26 (Chance-directed scoring rule). A function F^f: [0, 1] × [0, 1] → [0, +∞] of the form F^f (x, y) = x · f(y) + (1 − x) · f(1 − y) is called chance-directed, if and only if for all x ∈ [0, 1], all 0 ≤ λ < 1 and all y ∈ [0, 1] \ {x}

\begin{array}{l} F^{f} (x, y) = x \cdot f (y) + (1 - x) \cdot f (1 - y) \\ > x \cdot f ((1 - λ) x + λ y) + (1 - x) \cdot f (1 - (1 - λ) x - λ y) \\ = F^{f} (x, (1 - λ) x + λ y) \end{array}

holds. For a scoring rule F^f this formalises the idea that beliefs which are closer to the chances on two mutually exclusive and exhaustive events are strictly better scored.

In particular, F^f(x, y) = −x log y − (1 − x) log(1 − y) is chance-directed. The score improves by simultaneously moving y closer to x and 1 − y closer to 1 − x.
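The claim that the logarithmic rule is chance-directed can be spot-checked on a finite grid. The sketch below is illustrative only: the grid resolution and the helper names `log_score`, `linear_score` and `is_chance_directed_on_grid` are assumptions introduced here, and a grid check is evidence rather than a proof. The linear rule f(y) = 1 − y is included as a contrast case that fails the condition.

```python
import math


def log_score(x, y):
    """F^f(x, y) = -x log y - (1 - x) log(1 - y), i.e. f(y) = -log y."""
    total = 0.0
    for weight, belief in ((x, y), (1.0 - x, 1.0 - y)):
        if weight > 0.0:
            # A zero belief in an event of positive chance incurs infinite loss.
            total += math.inf if belief <= 0.0 else -weight * math.log(belief)
    return total


def linear_score(x, y):
    """Contrast case with f(y) = 1 - y; not chance-directed."""
    return x * (1.0 - y) + (1.0 - x) * y


def is_chance_directed_on_grid(score, steps=20):
    """Check F(x, y) > F(x, (1 - lam) x + lam y) for grid points y != x, 0 <= lam < 1."""
    for i in range(steps + 1):
        x = i / steps
        for j in range(steps + 1):
            if j == i:
                continue  # the definition excludes y = x
            y = j / steps
            for k in range(steps):  # lam ranges over [0, 1), excluding 1
                lam = k / steps
                moved = (1.0 - lam) * x + lam * y
                if not score(x, y) > score(x, moved):
                    return False
    return True
```

Here `is_chance_directed_on_grid(log_score)` holds on the grid, whereas the linear rule already fails at x = 0.05, y = 0, where moving the belief towards the chance increases the score.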

Proposition 34. If g is regular, then all B ∈ minloss

B_{ℒ}

agree with

P_{Ω}^{†}

on

ℒ

^∄.

Proof. If c = 1, then

| E_{ℒ} | = 1

and maxent

E_{ℒ} = {P_{Ω}^{†}}

follows trivially. By Theorem 5 we have that for every function

B^{'} \in \arg \inf_{B \in B_{ℒ}} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B)

it holds that

{B^{'}}_{⇂ n} = P_{Ω ⇂ n}^{†}

. Thus, all B ∈ minloss

B_{ℒ}

agree with

P_{Ω}^{†}

on

ℒ

^∄.

We now focus on 0 < c < 1.

From the above lemma we obtain

\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{Ω}^{†}) = - c \log (c + \frac{1 - c}{| Ω_{n} |}) - (1 - c) \log (\frac{1 - c}{| Ω_{n} |}) .

We now follow the structure of the proof of Proposition 16 for fixed 0 < c < 1. Let B ∈ minloss

B_{ℒ}

.

Case1

B \in ℙ_{ℒ} \ {P_{Ω}^{†}}

.

Case1A

B \in [E_{ℒ}] \ {P_{Ω}^{†}}

.

If there exists an n ∈ ℕ such that

B (ν_{1}^{n}) > P_{Ω}^{†} (ν_{1}^{n})

, then

\sum_{ν \in Ω_{n} \ {ν_{1}^{n}}} B (ν) < \sum_{ν \in Ω_{n} \ {ν_{1}^{n}}} P_{Ω}^{†} (ν)

. If there exists an m ∈ ℕ such that

B (ν_{1}^{m}) < P_{Ω}^{†} (ν_{1}^{m})

, then there has to exist some k > m such that

\sum_{\substack{ν \in Ω_{k} \ {ν_{1}^{k}} \\ ν ⊨ ν_{1}^{k - 1}}} B (ν) < \sum_{\substack{ν \in Ω_{k} \ {ν_{1}^{k}} \\ ν ⊨ ν_{1}^{k - 1}}} P_{Ω}^{†} (ν) .

Since

B \neq P_{Ω}^{†}

either such an n ∈ ℕ or such a k ∈ ℕ has to exist, possibly both exist. Overall, there has to exist some N ∈ ℕ, a

ν^{N} \in Ω_{N} \ {ν_{1}^{N}}

and an ϵ > 0 such that

B (ν^{N}) + ϵ = P_{Ω}^{†} (ν^{N})

.

For large enough n ∈ ℕ, depending on B,

ℒ

and c, it holds that

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, B) \geq - c \log B (ν_{1}^{n}) - (1 - c) \log (B (ν^{N}) \frac{| Ω_{N} |}{| Ω_{n} |}) \\ > - c \log B (ν_{1}^{n}) - (1 - c) \log ((B (ν^{N}) + \frac{ϵ}{2}) \frac{| Ω_{N} |}{| Ω_{n} |}) \\ \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{Ω}^{†}) = - c \log (c + \frac{1 - c}{| Ω_{n} |}) - (1 - c) \log (P_{Ω}^{†} (ν^{N}) \frac{| Ω_{N} |}{| Ω_{n} |}) . \end{array}

Since we may assume that

B (ν_{1}^{n})

converges in n to c

B \in E_{ℒ}

we now find

\begin{array}{l} \lim_{n \to \infty} \frac{\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{Ω}^{†})}{1 - c} \\ \geq \lim_{n \to \infty} - \log (B (ν^{N}) \frac{| Ω_{N} |}{| Ω_{n} |}) + \log (P_{Ω}^{†} (ν^{N}) \frac{| Ω_{N} |}{| Ω_{n} |}) \\ > - \log (B (ν^{N}) + \frac{ϵ}{2}) + \log P_{Ω}^{†} (ν^{N}) \\ > 0. \end{array}

Whether this limit exists or not, we have thus established that for large enough n ∈ ℕ there exists a lower bound of the sequence

{(\sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{Ω}^{†}))}_{n \in ℕ}

which is strictly positive, since we take N ∈ ℕ to be fixed here.

For all fixed n ∈ ℕ let

{P^{'}}_{n} \in E_{ℒ}

be such that

{P^{'}}_{n} (ω_{1}^{n}) : = c

and

{P^{'}}_{n} (ω_{2}^{n}) : = 1 - c

. Note that

{P^{'}}_{n} \in \arg \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†})

for all large enough n and

{P^{'}}_{n} \in \arg \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, P_{Ω}^{†})

for all large enough n, Lemma 12.

To simplify notation let

R_{n} : = \sum_{π \in Π_{n} \ {π^{n}}} - g (π) \sum_{F \in π} ° {P^{'}}_{n} (F) \log ° P_{Ω}^{†} (F)

. With this notation we have for all large enough n ∈ ℕ

\begin{array}{l} 0 \leq R_{n} = \sum_{π \in Π_{n} \ {π^{n}}} - g (π) \sum_{F \in π} ° {P^{'}}_{n} (F) \log ° P_{Ω}^{†} (F) \\ \leq \sum_{π \in Π_{n} \ {π^{n}}} - g (π) \sum_{F \in π} ° {P^{'}}_{n} (F) \log \frac{1 - c}{| Ω_{n} |} \\ = \sum_{π \in Π_{n} \ {π^{n}}} - g (π) \log \frac{1 - c}{| Ω_{n} |} \\ = (\log (| Ω_{n} |) - \log (1 - c)) \cdot \sum_{π \in Π_{n} \ {π^{n}}} g (π) . \end{array}

By our standing assumption on g (regularity), we obtain that R_n converges to zero. We now find

\begin{array}{r} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - g (π^{n}) S_{Ω}^{n} ({P^{'}}_{n}, P_{Ω}^{†}) - R_{n} \\ \geq g (π^{n}) (S_{Ω}^{n} ({P^{'}}_{n}, B) - S_{Ω}^{n} ({P^{'}}_{n}, P_{Ω}^{†})) - R_{n} . \end{array}

Because g(πⁿ) is bounded and R_n converges to zero, we obtain for all large enough n ∈ ℕ that

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) > 0.

Case1B

B \in ℙ_{ℒ} \ [E_{ℒ}]

.

Case1Bi

\lim_{n \to \infty} B (ν_{1}^{n}) > c

.

Let us first note that this limit has to exist, because

B (ν_{1}^{n})

is a (not necessarily strictly) decreasing sequence bounded from below by c. Let

b_{1} : = \lim_{n \to \infty} B (ν_{1}^{n}) > c

.

Note that there has to exist some N ∈ ℕ such that for all n ≥ N it holds that

B (ν_{1}^{n}) > P_{Ω}^{†} (ν_{1}^{n})

. For all n ≥ N there has to exist some

ν \in Ω_{n} \ {ν_{1}^{n}}

such that

B (ν) < P_{Ω}^{†} (ν)

. Then, for all n ≥ N

\begin{array}{l} \frac{1}{g (π^{n})} \cdot \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) \geq - c \log B (ν_{1}^{n}) - (1 - c) \log \frac{1 - B (ν_{1}^{n})}{| Ω_{n} | - 1} \\ = - c \log B (ν_{1}^{n}) - (1 - c) (\log (1 - B (ν_{1}^{n})) + \log \frac{1}{| Ω_{n} | - 1}) \\ > - c \log (c + \frac{b_{1} - c}{2}) - (1 - c) \log (1 - c - \frac{b_{1} - c}{2}) - (1 - c) \log \frac{1}{| Ω_{n} | - 1} \\ = - c \log (c + \frac{b_{1} - c}{2}) - (1 - c) \log \frac{1 - c - \frac{b_{1} - c}{2}}{| Ω_{n} | - 1} \\ = - c \log (\frac{b_{1} + c}{2}) - (1 - c) \log \frac{1 - \frac{b_{1} + c}{2}}{| Ω_{n} | - 1}, \end{array}

where the strict inequality follows from chance-directedness. We now find

\begin{array}{l} \lim_{n \to \infty} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) \\ > \lim_{n \to \infty} g (π^{n}) (- c \log (\frac{b_{1} + c}{2}) - (1 - c) \log (\frac{1 - \frac{b_{1} + c}{2}}{| Ω_{n} | - 1}) \\ + c \log (c + \frac{1 - c}{| Ω_{n} |}) + (1 - c) \log (\frac{1 - c}{| Ω_{n} |})) - R_{n} \\ = \lim_{n \to \infty} g (π^{n}) (- c \log (\frac{b_{1} + c}{2}) - (1 - c) \log (\frac{1 - \frac{b_{1} + c}{2}}{| Ω_{n} | - 1} \cdot | Ω_{n} |) \\ + c \log (c + \frac{1 - c}{| Ω_{n} |}) + (1 - c) \log (1 - c)) \\ = \lim_{n \to \infty} g (π^{n}) (- c \log (\frac{b_{1} + c}{2}) - (1 - c) \log (1 - \frac{b_{1} + c}{2}) \\ + c \log (c + \frac{1 - c}{| Ω_{n} |}) + (1 - c) \log (1 - c)) \\ = (\lim_{n \to \infty} g (π^{n})) \cdot (- c \log (\frac{b_{1} + c}{2}) - (1 - c) \log (1 - \frac{b_{1} + c}{2}) \\ + c \log (c) + (1 - c) \log (1 - c)) \\ > 0, \end{array}

where the last line follows from the fact that the standard logarithmic scoring rule is strictly proper, i.e., Equation (11) holds.

Case1Bii

\lim_{n \to \infty} B (ν_{1}^{n}) < c

.

Let

b_{2} : = \lim_{n \to \infty} B (ν_{1}^{n}) < c

, b₂ exists for the same reasons b₁ exists. Note that there has to exist some N ∈ ℕ such that for all n ≥ N it holds that

B (ν_{1}^{n}) < b_{2} + \frac{c - b_{2}}{2} < c < P_{Ω}^{†} (ν_{1}^{n})

. Using chance-directedness we find for all n ≥ N

\begin{array}{l} \frac{1}{g (π^{n})} \cdot \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) \geq - c \log B (ν_{1}^{n}) - (1 - c) \log \frac{1 - B (ν_{1}^{n})}{| Ω_{n} | - 1} \\ > - c \log (c + \frac{b_{2} - c}{2}) - (1 - c) \log \frac{1 - c - \frac{b_{2} - c}{2}}{| Ω_{n} | - 1} \\ = - c \log (\frac{b_{2} + c}{2}) - (1 - c) \log \frac{1 - \frac{b_{2} + c}{2}}{| Ω_{n} | - 1} . \end{array}

Now proceed as in Case1Bi.

Case2

B \in B_{ℒ}

\

ℙ_{ℒ}

and B respects logical equivalence on

ℒ^{∄}

.

Case2A There exists a

P_{B} \in ℙ_{ℒ}

such that for all n ∈ ℕ and all F ⊆ Ω_n it holds that °B(F) ≤ °P_B(F).

Since

B \notin ℙ_{ℒ}

there has to exist an N ∈ ℕ and an F′ ⊆ Ω_N such that °B(F′) < °P_B(F′).

Case2Ai

P_{B} = P_{Ω}^{†}

and no other

P \in ℙ_{ℒ}

is such that °B(F) ≤ °P (F) for all n and all F ⊆ Ω_n. Follows as does Case2Ai in Proposition 16.

Case2Aii There exists a

P_{B} \in ℙ_{ℒ}

such that

P_{B} \neq P_{Ω}^{†}

.

Then for all n ≥ N and all P ∈ [

E_{ℒ}

] it holds that

S_{g}^{n} (P, B) - S_{g}^{n} (P, P_{B}) \geq 0

. For all large enough n ∈ ℕ it holds by Case1 that

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{B}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) > 0.

Thus,

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) \geq \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{B}) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) \\ > 0 . \end{array}

Case2B There does not exist a

P_{B} \in ℙ_{ℒ}

such that for all n ∈ ℕ and all F ⊆ Ω_n it holds that °B(F) ≤ °P_B(F).

As in Case2B in Proposition 16 we obtain that there has to exist an α > 0 and a N ∈ ℕ such that for all n ≥ N it holds that

\sum_{ω \in Ω_{n}} B (ω) \leq 1 - α

.

We have for n ≥ N that

\begin{array}{r} \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) = \sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - g (π^{n}) S_{Ω}^{n} ({P^{'}}_{n}, P_{Ω}^{†}) - R_{n} \\ \geq g (π^{n}) (S_{Ω}^{n} ({P^{'}}_{n}, B) - S_{Ω}^{n} ({P^{'}}_{n}, P_{Ω}^{†})) - R_{n} . \end{array}

To complete the proof we will now show that there exists some β > 0, which depends on

E_{ℒ}

and g but does not depend on the particular n ≥ N, such that

S_{Ω}^{n} ({P^{'}}_{n}, B) - S_{Ω}^{n} ({P^{'}}_{n}, P_{Ω}^{†}) > β > 0

. Since g(πⁿ) is bounded, we then obtain that

\sup_{P \in E_{ℒ}} S_{g}^{n} (P, B) - \sup_{P \in E_{ℒ}} S_{g}^{n} (P, P_{Ω}^{†}) > 0

for all large enough n ∈ ℕ.

We show that for all large enough n ∈ ℕ

- \sum_{ω \in Ω_{n}} {P^{'}}_{n} (ω) \log f (ω) + c \log (c + \frac{1 - c}{| Ω_{n} |}) + (1 - c) \log (\frac{1 - c}{| Ω_{n} |}) \geq β

for all functions f : Ω_n→ [0, 1] such that

\sum_{ω \in Ω_{n}} f (ω) \leq 1 - α

.

The minimum obtains, if and only if

f (ω) = (1 - α) {P^{'}}_{n} (ω)

for all ω ∈ Ω_n as we saw in Proposition 16. Thus, the minimum obtains for

f (ν_{1}^{n}) = (1 - α) (c + \frac{1 - c}{| Ω_{n} |})

and

f (ν_{i}^{n}) = (1 - α) \frac{1 - c}{| Ω_{n} |}

for all other

ν_{i}^{n} \in Ω_{n}

. Let us now compute

\begin{array}{l} - \sum_{ω \in Ω_{n}} {P^{'}}_{n} (ω) \log f (ω) = - c \log ((1 - α) (c + \frac{1 - c}{| Ω_{n} |})) - (1 - c) \log (\frac{(1 - c) (1 - α)}{| Ω_{n} | - 1}) \\ = - c (\log (c + \frac{1 - c}{| Ω_{n} |}) + \log (1 - α)) - (1 - c) (\log (\frac{1 - c}{| Ω_{n} | - 1}) + \log (1 - α)) \\ = - c \log (c + \frac{1 - c}{| Ω_{n} |}) - (1 - c) \log (\frac{1 - c}{| Ω_{n} | - 1}) - \log (1 - α) . \end{array}

For n approaching infinity we find

\begin{array}{l} \lim_{n \to \infty} - \sum_{ω \in Ω_{n}} {P^{'}}_{n} (ω) \log f (ω) + c \log (c + \frac{1 - c}{| Ω_{n} |}) + (1 - c) \log (\frac{1 - c}{| Ω_{n} |}) \\ = - \log (1 - α) \end{array}

which is strictly greater than some β > 0, as required.
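The key step in Case2B is that a belief function whose values on the n-states sum to at most 1 − α pays an extra expected logarithmic loss of at least −log(1 − α), with the minimum attained by scaling the scoring distribution by 1 − α. The sketch below illustrates this on a small finite example. It is a hedged illustration rather than the proof itself: the example distribution `p`, the value of `alpha` used in the usage note, and the helper names are all assumptions introduced here.

```python
import math


def expected_log_loss(p, f):
    """-sum_w p(w) log f(w): expected logarithmic loss of beliefs f under p."""
    return -sum(pw * math.log(fw) for pw, fw in zip(p, f) if pw > 0)


def scaled_beliefs(p, alpha):
    """Candidate minimiser f = (1 - alpha) * p, whose total mass is 1 - alpha."""
    return [(1 - alpha) * pw for pw in p]


def excess_loss(p, f):
    """Loss of f minus the loss incurred by adopting p itself as belief function."""
    return expected_log_loss(p, f) - expected_log_loss(p, p)
```

For instance, with p = (0.4, 0.35, 0.25) and α = 0.1, the excess loss of the scaled beliefs equals −log(0.9), and no other f with total mass at most 0.9 does better; this follows from Gibbs' inequality applied to f normalised by its total mass.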

Case3

B \in B_{ℒ} \ ℙ_{ℒ}

and B does not respect logical equivalence on

ℒ^{∄}

.

Simply proceed as in Case2 in Theorem 17. □

Theorem 9.

{maxent E}_{ℒ} = {P_{Ω}^{†}} = {B_{†}^{\forall}} .

Proof. Since all

B \in {minloss B}_{ℒ}

agree with

P_{Ω}^{†}

on

ℒ^{∄}

, all

B_{†} \in {minloss B}_{ℒ}

agree with

{P_{Ω}^{†}}

on

ℒ^{∄}

; as we noted in Proposition 20.

Recall that Theorem 8 does not depend on the particular probability function, as we stated on Page 2508. We can thus apply Theorem 8 to infer that

{maxent E}_{ℒ} = {P_{Ω}^{†}} = {B_{†}^{\forall}} .

□

7. Conclusion

In this paper we have set out to provide a unified justification of the three norms of objective Bayesianism in the setting in which the underlying language is a first-order predicate language. We have seen that an approach based on scoring rules can be used to justify the norms on sentences without quantifiers: if the evidence is finitely generated, then the belief function with the best loss profile is a probability function in the set of those calibrated with evidence which has maximum standard entropy, as long as the scoring rule used in the definition of loss profile is defined in terms of a regular weighting function. One can extend this line of argument to handle sentences with quantifiers if one extends the notion of loss profile and imposes two extra desiderata: (i) language invariance and (ii) that one should not give universal hypotheses less credence than the maximum forced by the evidence.

Finally, we saw that this line of justification also applies in some cases in which evidence is not finitely generated. However, we investigated another case in which the justification does not apply because the evidence is such that there is no belief function with the best loss profile. The most one can ask in such a situation is for a belief function that has a sufficiently good loss profile. We saw that in this case one can use standard entropy maximisers to determine belief functions which are arbitrarily close to optimal.

We would identify two main questions for further research. First, it remains an open question as to whether, when the evidence is not finitely generated, a construction appealing to standard entropy maximisers always leads to belief functions that are arbitrarily close to optimal. Second, it would be interesting to investigate the extent to which one can relax the condition that a weighting function should be regular. We speculated that it may be the case that language invariance can be used in place of the condition that the weighting function be strongly refined, but we have little evidence, at this stage, to warrant apportioning a high degree of belief to this claim.

Appendix

A. Non-maximal entropies and non-minimal losses

In Section 4.4 we gave a number of minimax theorems for finitely generated evidence. As we saw in Section 6.1, the case of evidence which is not finitely generated is more complex. Entropy limits incur, in certain cases, infinite worst-case expected loss.

While the minimax theorems relate entropy maximisers (respectively entropy limits) to loss minimisers (respectively belief functions with the best loss profile), these theorems do not tell us much about the general relation between entropy and loss. In particular, the minimax theorems leave open the question as to whether an improvement in loss profile is always accompanied by greater entropy. In this section we will show that this is not the case, by appealing to an example involving a set of calibrated probability functions

E_{ℒ} \subset ℙ_{ℒ}

which is finitely generated and two probability functions, Q,

R \in E_{ℒ}

, such that Q has a better loss profile than R but has lower entropy than R.

In contrast to Section 6.1, our functions Q and R are open-minded. So, all losses we consider are finite. The fact that R has greater entropy than Q but also incurs a greater loss is thus not due to taking logarithms of zero.

For the sake of simplicity, we shall consider

ℒ = ℒ^{U}

.

Proposition 35. There exist regular weightings g, a finitely generated set

E_{ℒ} \subset ℙ_{ℒ}

and probability functions Q,

R \in E_{ℒ}

such that for all

n \in ℕ

\begin{array}{l} H_{g}^{n} (Q) < H_{g}^{n} (R) \\ \sup_{P \in E_{ℒ}} S_{g}^{n} (P, Q) < \sup_{P \in E_{ℒ}} S_{g}^{n} (P, R) . \end{array}

The standard weighting g_Ω is another such weighting.

Thus, Q has a better loss profile than R, Q ≺ R, but Q also has lower entropy than R, R ≫ Q.

Proof. Let

E_{ℒ} = {P \in ℙ_{ℒ} : P (ω_{2}^{2}) = P (ω_{3}^{2}) = 0 & P (ω_{2}^{1}) \geq 0.495} .

We now define R and Q as follows for n ≥ 3:

\begin{array}{l} R (ω_{1}^{1}) : = 0.505 = : R (ω_{1}^{2}) \text{ and } R (ω_{2}^{1}) : = 0.495 = : R (ω_{4}^{2}) \\ R (ω_{i}^{n}) : = 0.505 \cdot \frac{4}{| Ω_{n} |} \text{ for all } 1 \leq i \leq \frac{1}{4} | Ω_{n} | \\ R (ω_{i}^{n}) : = 0 \text{ for all } \frac{1}{4} | Ω_{n} | + 1 \leq i \leq \frac{3}{4} | Ω_{n} | \\ R (ω_{i}^{n}) : = 0.495 \cdot \frac{4}{| Ω_{n} |} \text{ for all } \frac{3}{4} | Ω_{n} | + 1 \leq i \leq | Ω_{n} | \\ Q (ω_{1}^{1}) : = 0.490 = : Q (ω_{1}^{2}) \text{ and } Q (ω_{2}^{1}) : = 0.510 = : Q (ω_{4}^{2}) \\ Q (ω_{i}^{n}) : = 0.490 \cdot \frac{4}{| Ω_{n} |} \text{ for all } 1 \leq i \leq \frac{1}{4} | Ω_{n} | \\ Q (ω_{i}^{n}) : = 0 \text{ for all } \frac{1}{4} | Ω_{n} | + 1 \leq i \leq \frac{3}{4} | Ω_{n} | \\ Q (ω_{i}^{n}) : = 0.510 \cdot \frac{4}{| Ω_{n} |} \text{ for all } \frac{3}{4} | Ω_{n} | + 1 \leq i \leq | Ω_{n} | . \end{array}

That is, Q and R equivocate beyond

ℒ_{2}

.

We find for n = 1

\begin{array}{l} g (π^{1}) \cdot H_{Ω}^{1} (Q) = H_{g}^{1} (Q) = - g (π^{1}) (0.49 \log (0.49) + 0.51 \log (0.51)) \\ \approx 0.6929 \cdot g (π^{1}) \\ < 0.6931 \cdot g (π^{1}) \\ \approx - g (π^{1}) (0.505 \log (0.505) + 0.495 \log (0.495)) \\ = H_{g}^{1} (R) = g (π^{1}) \cdot H_{Ω}^{1} (R) \end{array}

and

\begin{array}{l} g (π^{1}) \cdot \sup_{P \in E_{ℒ}} S_{Ω}^{1} (P, Q) = \sup_{P \in E_{ℒ}} S_{g}^{1} (P, Q) = - g (π^{1}) (0.495 \log (0.51) + 0.505 \log (0.49)) \\ \approx 0.6935 \cdot g (π^{1}) \\ < 0.7032 \cdot g (π^{1}) \\ \approx - g (π^{1}) \log (0.495) \\ = \sup_{P \in E_{ℒ}} S_{g}^{1} (P, R) = g (π^{1}) \cdot \sup_{P \in E_{ℒ}} S_{Ω}^{1} (P, R) . \end{array}
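The four numerical values for n = 1 can be reproduced with a few lines of Python. This is a minimal sketch using natural logarithms and omitting the common factor g(π¹); the helper names `entropy` and `worst_case_loss` are hypothetical. Since the expected logarithmic loss is linear in P, the supremum over the calibrated functions (those with P(ω₂¹) ≥ 0.495) is attained at an endpoint of the admissible interval, which is what the code exploits.

```python
import math


def entropy(dist):
    """Standard (Shannon) entropy with natural logarithms."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def worst_case_loss(belief, lower_bound_second=0.495):
    """sup of -sum_w P(w) log belief(w) over P = (1 - m, m) with m >= lower_bound_second.

    The loss is linear in m, so only the endpoints m = lower_bound_second
    and m = 1 need to be compared.
    """
    b_first, b_second = belief
    losses = []
    for m in (lower_bound_second, 1.0):
        loss = -m * math.log(b_second)
        if m < 1.0:
            loss -= (1.0 - m) * math.log(b_first)
        losses.append(loss)
    return max(losses)


Q = (0.49, 0.51)
R = (0.505, 0.495)
```

Evaluating `entropy` and `worst_case_loss` on `Q` and `R` gives values that agree with the four rounded figures above, so Q has both the lower entropy and the lower worst-case loss.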

Having established the result for n = 1, we shall now turn to the general case n ≥ 2.

For n = 2 note that

\begin{array}{l} H_{Ω}^{2} (Q) = H_{Ω}^{1} (Q) < H_{Ω}^{1} (R) = H_{Ω}^{2} (R) \\ \sup_{P \in E_{ℒ}} S_{Ω}^{2} (P, Q) = \sup_{P \in E_{ℒ}} S_{Ω}^{1} (P, Q) < \sup_{P \in E_{ℒ}} S_{Ω}^{1} (P, R) = \sup_{P \in E_{ℒ}} S_{Ω}^{2} (P, R) . \end{array}

For n ≥ 3 we have

\begin{array}{l} H_{Ω}^{n} (Q) = - (0.49 \cdot \frac{4}{| Ω_{n} |}) \frac{| Ω_{n} |}{4} \log (0.49 \cdot \frac{4}{| Ω_{n} |}) - (0.51 \cdot \frac{4}{| Ω_{n} |}) \frac{| Ω_{n} |}{4} \log (0.51 \cdot \frac{4}{| Ω_{n} |}) \\ = - 0.49 \log (0.49) - 0.51 \log (0.51) + \log \frac{| Ω_{n} |}{4} \\ = H_{Ω}^{1} (Q) + \log \frac{| Ω_{n} |}{4} \end{array}

and in the same way we find

H_{Ω}^{n} (R) = H_{Ω}^{1} (R) + \log \frac{| Ω_{n} |}{4}

Furthermore,

\begin{array}{l} \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, Q) = - 0.505 \log (0.49 \cdot \frac{4}{| Ω_{n} |}) - 0.495 \log (0.51 \cdot \frac{4}{| Ω_{n} |}) \\ = - 0.505 \log (0.49) - 0.495 \log (0.51) + \log \frac{| Ω_{n} |}{4} \\ = \sup_{P \in E_{ℒ}} S_{Ω}^{1} (P, Q) + \log \frac{| Ω_{n} |}{4} \\ \sup_{P \in E_{ℒ}} S_{Ω}^{n} (P, R) = - \log (0.495 \cdot \frac{4}{| Ω_{n} |}) \\ = \sup_{P \in E_{ℒ}} S_{Ω}^{1} (P, R) + \log \frac{| Ω_{n} |}{4} . \end{array}

This establishes the result for g = g_Ω.
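The shift by log(|Ω_n|/4) can also be checked by building Q and R explicitly on a level with |Ω_n| states. The sketch below is illustrative only: `size` stands in for |Ω_n| and is assumed to be a multiple of 4, the states are assumed to be ordered so that the first quarter extends ω₁² and the last quarter extends ω₄², and the helper names are hypothetical.

```python
import math


def level_distribution(first, last, size):
    """Spread `first` evenly over the first quarter of the states and `last`
    over the last quarter; the middle half receives probability zero."""
    quarter = size // 4
    return ([first / quarter] * quarter
            + [0.0] * (2 * quarter)
            + [last / quarter] * quarter)


def entropy(dist):
    """Standard (Shannon) entropy with natural logarithms."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def worst_case_loss(belief, max_first=0.505):
    """sup of -sum_w P(w) log belief(w) over calibrated P.

    Calibrated P vanish on the middle half and put at most `max_first`
    on the first quarter (equivalently, at least 0.495 on the last quarter).
    The loss is linear in P, so the worst case concentrates mass on the
    worst-scored state of each quarter and sits at an endpoint for the
    mass m assigned to the first quarter.
    """
    quarter = len(belief) // 4
    worst_first = max(-math.log(b) for b in belief[:quarter])
    worst_last = max(-math.log(b) for b in belief[-quarter:])
    return max(m * worst_first + (1.0 - m) * worst_last for m in (0.0, max_first))
```

For any admissible `size`, the entropies and worst-case losses of both functions exceed their level-1 values by the same amount log(size/4), so the two strict inequalities carry over unchanged from n = 1.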

The result follows for such general weightings g which converge quickly enough to the standard weighting g_Ω so that all further terms are negligible.

B. Symmetry and equivocator preservation

Recall Definition 14: g is called equivocator-preserving, if and only if g_n is equivocator-preserving for all

n \in ℕ

, i.e., if and only if

ℙ_{n}^{†} = {Q_{⇂ n} : Q \in \arg \sup_{P \in ℙ_{ℒ}} H_{g}^{n} (P)} = {P_{= ⇂ n}} .

So, if g is equivocator-preserving and if

P_{=} \in [E_{ℒ}]

, then

P_{=} \text{ attains } \sup_{P \in E_{ℒ}} H_{g}^{n} (P)

and thus

ℙ^{†} = {P_{=}} = maxent E_{ℒ} = {minloss B}_{ℒ}

. We know from Proposition 7 that inclusive and symmetric g are equivocator-preserving.

Interestingly, we shall see that there exist non-symmetric g_n which are equivocator-preserving. This answers the question posed at the bottom of Landes and Williamson [4] (p. 3574) in the negative.

Proposition 36 (Non-symmetric equivocator preservation). For all

n \in ℕ

such that |Ω_n| ≥ 4 there exist inclusive, equivocator-preserving and non-symmetric weighting functions g_n. The set of such weighting functions g_n is convex.

Proof. By Landes and Williamson [4] (Lemma 9 p. 3573) it holds that g_n is inclusive and equivocator-preserving, if and only if

y (ω) : = \sum_{\substack{F \subseteq Ω_{n} \\ ω \in F}} \sum_{\substack{π \in Π_{n} \\ F \in π}} - g_{n} (π) (1 - \log | Ω_{n} | + \log | F |) = c

for some constant, c.

Note that we can simplify this expression as follows

\begin{array}{l} \sum_{\substack{F \subseteq Ω_{n} \\ ω \in F}} \sum_{\substack{π \in Π_{n} \\ F \in π}} - g_{n} (π) (1 - \log | Ω_{n} | + \log | F |) \\ = (1 - \log | Ω_{n} |) \sum_{\substack{F \subseteq Ω_{n} \\ ω \in F}} \sum_{\substack{π \in Π_{n} \\ F \in π}} - g_{n} (π) + \sum_{\substack{F \subseteq Ω_{n} \\ ω \in F}} \sum_{\substack{π \in Π_{n} \\ F \in π}} - g_{n} (π) \log | F | \\ = (1 - \log | Ω_{n} |) \sum_{π \in Π_{n}} - g_{n} (π) + \sum_{\substack{F \subseteq Ω_{n} \\ ω \in F}} \log (| F |) \cdot \sum_{\substack{π \in Π_{n} \\ F \in π}} - g_{n} (π) \\ = (1 - \log | Ω_{n} |) \sum_{π \in Π_{n}} - g_{n} (π) + \sum_{\substack{F \subseteq Ω_{n} \\ ω \in F}} - γ_{n} (F) \log | F | . \end{array}

The first sum does not depend on ω. Thus, y(ω) is constant, if and only if

z (ω) : = \sum_{\substack{F \subseteq Ω_{n} \\ ω \in F}} - γ_{n} (F) \log | F |

is constant.

Let us now define an inclusive and non-symmetric weighting function

{g^{'}}_{n}

which satisfies this condition. Let κ be such that

Ω_{n} = {ω_{1}, \dots, ω_{2^{κ}}}

and put

\begin{array}{l} {g^{'}}_{n} ({ω_{1}, \dots, ω_{2^{κ - 1}}}, {ω_{2^{κ - 1} + 1}, \dots, ω_{2^{κ}}}) : = \frac{1}{2} \\ {g^{'}}_{n} (π) : = 1 \text{ for all other } π \in Π_{n} . \end{array}

Clearly,

{g^{'}}_{n}

is inclusive (g′_n(π) > 0 for all π ∈ Π_n), non-symmetric (there are two partitions π, π′ such that the classes of π and π′ have the same number of elements but

{g^{'}}_{n} (π) \neq {g^{'}}_{n} (π^{'})

) and z(ω) is constant (since

\sum_{\substack{F \subseteq Ω_{n} \\ ω \in F}} \log (| F |) \cdot \sum_{\substack{π \in Π_{n} \\ F \in π}} - g_{n} (π)

is invariant under permutations of n-states).
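For the smallest admissible case |Ω_n| = 4 the construction can be verified by brute force. The following sketch is an illustration under stated assumptions, not part of the original argument: the states are labelled 1 to 4, the helper names are hypothetical, and z(ω) is computed via the equivalent form z(ω) = Σ_π −g(π) log|F_π(ω)|, where F_π(ω) is the block of π containing ω (each block F contributes −γ_n(F) log|F| once the sum over partitions containing F is collected).

```python
import math


def set_partitions(items):
    """Yield every partition of `items` as a list of frozensets."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either add `first` to one of the existing blocks ...
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        # ... or give it a block of its own.
        yield partition + [frozenset([first])]


def weight(partition, special, special_weight=0.5):
    """g'(pi): `special_weight` on the distinguished partition, 1 on all others."""
    return special_weight if set(partition) == special else 1.0


def z_values(states, special, special_weight=0.5):
    """z(w) = sum over partitions pi of -g'(pi) * log |block of pi containing w|."""
    z = {w: 0.0 for w in states}
    for partition in set_partitions(states):
        g = weight(partition, special, special_weight)
        for block in partition:
            for w in block:
                z[w] -= g * math.log(len(block))
    return z


STATES = (1, 2, 3, 4)
HALVES = {frozenset({1, 2}), frozenset({3, 4})}      # distinguished two-block partition
UNEVEN = {frozenset({1}), frozenset({2, 3, 4})}      # contrast case with unequal blocks
```

With `HALVES` as the distinguished partition, z takes the same value on all four states although {{1, 3}, {2, 4}} has the same block sizes and a different weight. Down-weighting `UNEVEN` instead breaks constancy, which suggests that the equal block sizes of the distinguished partition are what make the example work.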

Addressing the second part of the proof: For inclusive and equivocator-preserving g_n it holds that

H_{g}^{n}

is a strictly concave function on

ℙ_{n}

and

\sup_{P \in ℙ_{n}} H_{g}^{n} (P)

always obtains for P_=⇂_n.

ℙ_{n}

is convex. Hence, the unique maximum of every convex combination of such g_n obtains for P_=⇂_n.

In general, computing a function which maximises

H_{g}^{n} (P)

for

P \in [E_{ℒ}]

is a non-trivial computational problem, even for g = g_Ω. The only widely shared intuition is that P= ought to be the function in

ℙ_{ℒ}

which has greatest entropy. Imposing symmetry is sufficient—but, as we have just seen, not necessary—to ensure that this constraint is satisfied. Imposing symmetry has further structural consequences such as: if

E_{ℒ}

is invariant under renaming of states, then so is

P_{n}^{†}

; see Landes and Williamson [4] (Appendix B.3) for details.

C. Key notation

Here we summarise key notation, for ease of reference.


Symbol	Reference	Meaning
⟨ ⟩	Page 2460	Convex hull
[ ]	Page 2463	Closure
$ℒ^{\exists}$	Page 2463	Predicate language with quantifiers
$ℒ^{^{∄}}$	Page 2463	Predicate language without quantifiers
$B_{ℒ}$	Definition 1	(Normalised) belief functions on sentences $S ℒ$
$ℙ_{ℒ}$	Page 2465	Probability functions on $S ℒ$
$ℙ_{n}$	Page 2469	Probability functions on $S ℒ_{n}$
Ω_n	Page 2463	n-states
P=	Page 2473	Equivocator function in $ℙ_{ℒ}$, P=(ω) = 1/|Ω_n| for each ω ∈ Ω_n
$Π_{ℒ}$	Page 2464	Partitions of sentences $S ℒ$
Π_n	Definition 4	Partitions of propositions $P Ω_{n}$
Π	Definition 4	All partitions of propositions, $\cup_{n = 1}^{\infty} Π_{n}$
πⁿ	Page 2470	{{ω} : ω ∈ Ω_n}, the finest partition of Ω_n
$E_{ℒ}$	Page 2464	Calibrated belief functions on $S ℒ$
$E_{n}$	Page 2469	Restrictions of these functions to $S ℒ_{n}^{∄}$
°B	Page 2467	Belief function on propositions induced by B defined on sentences
g	Definition 6	Weighting function
g_Ω	Page 2470	Standard weighting function
$H_{g}^{n}$	Definition 9	n-entropy
$H_{Ω}^{n}$	Definition 10	Standard (Shannon) entropy
maxent $E_{ℒ}$	Page 2472	Calibrated functions on $ℒ$ with maximal entropy
$ℙ_{n}^{†}$	Page 2472	Calibrated functions on $ℒ_{n}$ with maximum n-entropy
$P_{n}^{†}$	Page 2472	Unique such function
$ℙ^{†}$	Definition 11	Limit points of maximum n-entropy functions
P^†	Page 2472	Unique such entropy limit
$P_{Ω}^{†}$	Page 2472	Standard entropy limit
ϱ_n	Definition 19	n-representations
$S_{g}^{n}$	Page 2484	Logarithmic n-score of a belief function wrt a probability function
$S_{g, ρ}^{n}$	Page 2484	Representation-relative n-score
minloss $B_{ℒ}$	Definition 21	Belief functions on $ℒ^{^{∄}}$ with the best loss profile
Minloss^* $B_{ℒ}$	Definition 24	Belief functions on $ℒ^{\exists}$ with the best loss profile

Acknowledgments

This research was conducted as a part of the project, From objective Bayesian epistemology to inductive logic. We are grateful to the UK Arts and Humanities Research Council for funding this research and to Bas Lemmens for helpful comments.

Author Contributions

Both authors conceived the idea, did the analysis and wrote the paper. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References and Notes

1. Williamson, J. In Defence of Objective Bayesianism; Oxford University Press: Oxford, UK, 2010.
2. There are several alternatives to the objective Bayesian account of strength of belief, including subjective Bayesianism, imprecise probability, the theory of Dempster-Shafer belief functions and related theories. Here we only have the space to motivate objective Bayesianism, not to assess these other views.
3. Taking the convex hull may mean that a calibrated belief function does not satisfy the known constraints on physical probability. For example, if θ is known to be a statement about the past then it is known that its physical probability is 0 or 1; bel is not constrained to be 0 or 1, however, unless it is also known whether or not θ is true. Similarly, it may be known that two propositions are probabilistically independent with respect to physical probability; this need not imply that they are probabilistically independent with respect to epistemic probability. See Williamson [1] (pp. 44–45) for further discussion of this point.
4. Landes, J.; Williamson, J. Objective Bayesianism and the Maximum Entropy Principle. Entropy 2013, 15, 3528–3591.
5. Gaifman, H. Concerning Measures in First Order Calculi. Isr. J. Math. 1964, 2, 1–18.
6. Williamson, J. Lectures on Inductive Logic; Oxford University Press: Oxford, UK, 2015.
7. Paris, J.B. The Uncertain Reasoner’s Companion; Cambridge University Press: Cambridge, UK, 1994.
8. Williamson, J. Probability logic. In Handbook of the Logic of Argument and Inference: the Turn toward the Practical; Gabbay, D., Johnson, R., Ohlbach, H.J., Woods, J., Eds.; Elsevier: Amsterdam, The Netherlands, 2002; pp. 397–424.
9. Haenni, R.; Romeijn, J.W.; Wheeler, G.; Williamson, J. Probabilistic Logics and Probabilistic Networks; Synthese Library, Springer: Dordrecht, The Netherlands, 2011.
10. Cover, T.M.; Thomas, J.A. Elements of Information Theory; John Wiley and Sons: New York, NY, USA, 1991.
11. Rudin, W. Principles of Mathematical Analysis, 3rd ed.; McGraw-Hill: New York, NY, USA, 1973.
12. de Finetti, B. Theory of Probability; Wiley: London, UK, 1974.
13. Joyce, J.M. A Nonpragmatic Vindication of Probabilism. Philos. Sci. 1998, 65, 575–603.
14. Joyce, J.M. Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief. In Degrees of Belief; Huber, F., Schmidt-Petri, C., Eds.; Synthese Library 342; Springer: New York, NY, USA, 2009.
15. Grünwald, P.; Dawid, A.P. Game Theory, Maximum Entropy, Minimum Discrepancy, and Robust Bayesian Decision Theory. Ann. Stat. 2004, 32, 1367–1433.
16. Savage, L.J. Elicitation of Personal Probabilities and Expectations. J. Am. Stat. Assoc. 1971, 66, 783–801.
17. Popper, K.R. The Logic of Scientific Discovery; Routledge: London, UK, 1999.
18. Barnett, O.; Paris, J.B. Maximum Entropy Inference with Quantified Knowledge. Logic J. IGPL 2008, 16, 85–98.
19. Williamson, J. Objective Bayesian Probabilistic Logic. J. Algorithm. 2008, 63, 167–183.

© 2015 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
