Entropy 2015, 17(4), 2459-2543; doi:10.3390/e17042459

Article
Justifying Objective Bayesianism on Predicate Languages
Department of Philosophy, School of European Culture and Languages, University of Kent, Canterbury CT2 7NF, UK
*
Author to whom correspondence should be addressed.
Academic Editor: Kevin H. Knuth
Received: 11 February 2015 / Accepted: 9 April 2015 / Published: 22 April 2015

Abstract

Objective Bayesianism says that the strengths of one’s beliefs ought to be probabilities, calibrated to physical probabilities insofar as one has evidence of them, and otherwise sufficiently equivocal. These norms of belief are often explicated using the maximum entropy principle. In this paper we investigate the extent to which one can provide a unified justification of the objective Bayesian norms in the case in which the background language is a first-order predicate language, with a view to applying the resulting formalism to inductive logic. We show that the maximum entropy principle can be motivated largely in terms of minimising worst-case expected loss.
Keywords:
objective Bayesianism; g-entropy; predicate language; scoring rule; minimax

1. Introduction

Objective Bayesianism holds that the strengths of one’s beliefs should satisfy three norms [1,2]:

  • Probability. The strengths of one’s beliefs should satisfy the axioms of probability: if bel is one’s belief function, which assigns a degree of belief to each sentence of one’s language, then bel ∈ ℙ, the set of probability functions defined on the sentences of one’s language.

  • Calibration. The strengths of one’s beliefs should fit one’s evidence: bel ∈ E, the set of belief functions compatible with one’s evidence. In particular, the strengths of one’s beliefs should be calibrated with physical probabilities, insofar as one has evidence as to what the physical probabilities are: if one’s evidence determines just that the physical probability function P* lies in some non-empty set ℙ* of probability functions, then bel ∈ E = 〈ℙ*〉, where 〈ℙ*〉 is the convex hull of ℙ* [3].

  • Equivocation. The strengths of one’s beliefs should otherwise equivocate sufficiently between the basic possibilities that one can express: bel is some function in E that is sufficiently equivocal. Note that entropy is often used as a measure of the extent to which a probability function equivocates.

These three norms are usually justified in rather different ways. The Probability norm is usually justified as being required if one is to avoid sure loss—the Dutch book argument. The Calibration norm needs to hold if one is to avoid loss in the long run when one repeatedly bets on similar events. It has also been argued that the Equivocation norm should hold if one is to minimise worst-case expected loss. See Williamson [1] (Chapter 3) for discussion of these justifications. Unfortunately, these justifications do not cohere particularly well, because the betting set-up and the notion of loss differ in each case—for the Probability norm, the notion of loss is sure single-case loss, where losses may be positive or negative; for the Calibration norm it is almost-sure (i.e., probability 1) long-run loss, positive or negative; for the Equivocation norm, it is worst-case expected loss, where the loss is positive and logarithmic. Furthermore, a justification for the order in which the norms are applied is missing. In particular, the justification of the Equivocation norm presumes that belief is probabilistic; for this justification to work, some argument is needed for the claim that avoiding sure loss should be prioritised over minimising worst-case expected loss; but there is as yet no such argument. The question thus arises as to whether a single, unified justification can be given for the three norms, in order to circumvent the above problems.

Landes and Williamson [4] provided a single, unified justification for the situation in which one’s beliefs are defined over propositions, construed as subsets of a finite set Ω of outcomes. It turns out that all three norms must hold if one is to minimise worst-case expected loss: one’s belief function should be a probability function in E = 〈ℙ*〉 that has sufficiently high entropy. This line of argument will be described in Section 2. Landes and Williamson [4] went on to extend this unified justification to the situation in which beliefs are defined over sentences of a propositional language, formed by recursively applying the usual propositional connectives ¬, ˄, ˅, →, ↔ to a finite set of propositional variables.

In this paper we shall show that a similar justification goes through for the situation in which beliefs are defined over sentences of a first-order predicate language, with the use of predicate, constant and variable symbols as well as the quantifiers ∀, ∃. In Section 3 we shall formulate the norms of objective Bayesianism in the context of a predicate language. In Section 4 we shall provide a justification for maximising entropy when the language in question is a predicate language without quantifier symbols and when the evidence set is finitely generated. In Section 5, we shall extend this line of argument to predicate languages that contain quantifier symbols. In Section 6 we shall investigate the case of evidence which is not finitely generated. Key concepts and notation are collected in Appendix C for ease of reference.

The key technical results in this paper are Theorem 3, Theorem 6, Theorem 7, and Theorem 8. These results all suppose that the available evidence is finitely generated (in the sense of Definition 5). The first two jointly show that, on a quantifier-free predicate language, the belief function with the best loss profile is the calibrated probability function which has maximal entropy. Theorem 7 implies that adding new constant or predicate symbols to the language does not change the inferences one draws which are expressible in the original language. Theorem 8 extends Theorem 3 and Theorem 6 to predicate languages with quantifiers. En route to proving Theorem 8, we improve on Gaifman’s Unique Extension Theorem [5] (Theorem 1) in Proposition 24.

The case of evidence which cannot be finitely generated is more involved. We consider a case in which no belief function has an optimal loss profile in Proposition 28 and Proposition 30. While there are no functions with the best loss profile in that case, we show in Proposition 29 and Proposition 31 that probability functions in a neighbourhood of the calibrated function with maximal entropy have arbitrarily good loss profiles. We also discuss a case in which the belief function with best loss profile does indeed turn out to be the calibrated probability function which has maximal entropy, see Theorem 9.

2. Beliefs over Propositions

Here we will recap the relevant results of Landes and Williamson [4], to which the reader is referred for further details and motivation. In this section we will be concerned solely with a finite set Ω of possible outcomes. We shall suppose that each member ω of Ω is a state $\pm A_1 \wedge \cdots \wedge \pm A_n$ of a finite propositional language $\mathcal{L} = \{A_1, \ldots, A_n\}$. A proposition F is a subset of Ω. Let Π be the set of all partitions of Ω. We take {∅, Ω}, {Ω} ∈ Π. In order to limit the proliferation of partitions, we suppose that the only partition in which ∅ occurs is {∅, Ω}.

Given a belief function $bel : \mathcal{P}\Omega \to \mathbb{R}_{\geq 0}$ that is not zero everywhere, we normalise by dividing each degree of belief by $\max_{\pi \in \Pi} \sum_{F \in \pi} bel(F)$ to form a belief function, $B : \mathcal{P}\Omega \to [0, 1]$, with degrees of belief in the unit interval. The set of normalised belief functions is

$$\mathbb{B} := \Big\{ B : \mathcal{P}\Omega \to [0,1] \;:\; \sum_{F \in \pi} B(F) \leq 1 \text{ for all } \pi \in \Pi \text{ and } \sum_{F \in \pi} B(F) = 1 \text{ for some } \pi \Big\}.$$

On the other hand, the set of probability functions is

$$\mathbb{P} := \Big\{ B : \mathcal{P}\Omega \to [0,1] \;:\; \sum_{F \in \pi} B(F) = 1 \text{ for all } \pi \in \Pi \Big\} \subset \mathbb{B},$$
where ⊂ denotes strict subset inclusion. The inclusion is strict since the following normalised belief function B is not in ℙ: B(∅) = 1 and B(F) = 0 for all ∅ ⊂ F ⊆ Ω. Since {Ω} is a partition we have P(Ω) = 1 and since {Ω, ∅} is a partition it holds that P(∅) = 0 for all P ∈ ℙ.
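The definitions above can be checked mechanically on a small outcome space. The following Python sketch enumerates Π for a four-element Ω, normalises a belief function, and tests membership in the sets of normalised belief functions and of probability functions. All helper names (`set_partitions`, `all_partitions`, `normalise`, `in_B`, `in_P`) are illustrative assumptions rather than notation from the paper, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction
from itertools import combinations


def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Either add `first` to an existing block ...
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        # ... or give it a block of its own.
        yield partition + [frozenset({first})]


def all_partitions(omega):
    """The set Pi: all partitions into non-empty blocks, plus {empty set, Omega}."""
    parts = [frozenset(p) for p in set_partitions(sorted(omega))]
    parts.append(frozenset({frozenset(), frozenset(omega)}))
    return parts


def powerset(omega):
    elems = sorted(omega)
    return [frozenset(c) for r in range(len(elems) + 1)
            for c in combinations(elems, r)]


def normalise(bel, partitions):
    """Divide by the largest partition sum so that beliefs lie in [0, 1]."""
    m = max(sum(bel[f] for f in pi) for pi in partitions)
    return {f: Fraction(v) / m for f, v in bel.items()}


def in_B(b, partitions):
    sums = [sum(b[f] for f in pi) for pi in partitions]
    return all(s <= 1 for s in sums) and any(s == 1 for s in sums)


def in_P(b, partitions):
    return all(sum(b[f] for f in pi) == 1 for pi in partitions)


omega = {"w1", "w2", "w3", "w4"}
PI = all_partitions(omega)

# The counterexample from the text: B(empty set) = 1 and B(F) = 0 otherwise.
counterexample = {f: Fraction(int(not f)) for f in powerset(omega)}
# The uniform probability function P(F) = |F| / |Omega|.
uniform = {f: Fraction(len(f), len(omega)) for f in powerset(omega)}
# An arbitrary non-normalised belief function (hypothetical): bel(F) = |F| + 1.
bel = {f: len(f) + 1 for f in powerset(omega)}
B = normalise(bel, PI)
```

Here `counterexample` and `B` pass `in_B` but fail `in_P`, while `uniform` passes both, in line with the strict inclusion stated above.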

Let L(F, B) be the loss incurred by adopting belief function B when proposition F turns out to be true. Arguably, in the absence of knowledge of the true loss function, the loss function L should be taken to be logarithmic, as we shall now see. Consider the following four conditions on a default loss function L:

  • L1. L(F, B) = 0 if B(F ) = 1.

  • L2. L(F, B) strictly increases as B(F) decreases from 1 towards 0.

  • L3. L(F, B) depends only on B(F), not on B(F′) for F′ ≠ F.

    To express the next condition we need some notation. Suppose $\mathcal{L} = \mathcal{L}_1 \cup \mathcal{L}_2$: say that $\mathcal{L} = \{A_1, \ldots, A_n\}$, $\mathcal{L}_1 = \{A_1, \ldots, A_m\}$, $\mathcal{L}_2 = \{A_{m+1}, \ldots, A_n\}$ for some 1 ≤ m < n. Then ω ∈ Ω takes the form ω1 ∧ ω2 where ω1 ∈ Ω1 is a state of $\mathcal{L}_1$, and ω2 ∈ Ω2 is a state of $\mathcal{L}_2$. Given propositions F1 ⊆ Ω1 and F2 ⊆ Ω2 we can define F1 × F2 := {ω = ω1 ∧ ω2 : ω1 ∈ F1, ω2 ∈ F2}, a proposition of $\mathcal{L}$. Given a fixed belief function B such that B(Ω) = 1, $\mathcal{L}_1$ and $\mathcal{L}_2$ are independent sublanguages, written $\mathcal{L}_1 \perp\!\!\!\perp_B \mathcal{L}_2$, if B(F1 × F2) = B(F1) · B(F2) for all F1 ⊆ Ω1 and F2 ⊆ Ω2, where B(F1) := B(F1 × Ω2) and B(F2) := B(Ω1 × F2). The restriction $B{\restriction_{\mathcal{L}_1}}$ of B to $\mathcal{L}_1$ is a belief function on $\mathcal{L}_1$ defined by $B{\restriction_{\mathcal{L}_1}}(F_1) = B(F_1) = B(F_1 \times \Omega_2)$, and similarly for $\mathcal{L}_2$.

  • L4. Losses are additive when the language is composed of independent sublanguages: if $\mathcal{L} = \mathcal{L}_1 \cup \mathcal{L}_2$ for $\mathcal{L}_1 \perp\!\!\!\perp_B \mathcal{L}_2$, then $L(F_1 \times F_2, B) = L_1(F_1, B{\restriction_{\mathcal{L}_1}}) + L_2(F_2, B{\restriction_{\mathcal{L}_2}})$, where L1, L2 are loss functions defined on $\mathcal{L}_1$, $\mathcal{L}_2$ respectively.

Theorem 1. If a loss function L satisfies L1–4 then L(F, B) = −k log B(F) for some constant k > 0 that does not depend on $\mathcal{L}$.
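The easy direction of Theorem 1, that the logarithmic loss satisfies the four conditions, can be checked numerically. The sketch below is only an illustration: the function name `log_loss` and the sample degrees of belief are assumptions, and L3 holds by construction because the function takes B(F) as its only belief argument.

```python
import math


def log_loss(degree_of_belief, k=1.0):
    """L(F, B) = -k log B(F); the loss is infinite if B(F) = 0."""
    if degree_of_belief == 0:
        return math.inf
    return -k * math.log(degree_of_belief)


# L1: full belief in the true proposition incurs no loss.
assert log_loss(1.0) == 0

# L2: loss strictly increases as B(F) decreases from 1 towards 0.
beliefs = [1.0, 0.9, 0.5, 0.1, 0.01]
losses = [log_loss(b) for b in beliefs]
assert all(x < y for x, y in zip(losses, losses[1:]))

# L4: for independent sublanguages B(F1 x F2) = B(F1) * B(F2), so the loss
# on the product proposition is the sum of the losses on the factors.
b1, b2 = 0.3, 0.6
assert math.isclose(log_loss(b1 * b2), log_loss(b1) + log_loss(b2))
```

The converse direction, that no other loss function satisfies L1–4, is the substantive content of the theorem and is not something a numerical check can establish.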

When we consider the notion of expected loss, we see that this concept depends on the weight given to the various partitions under consideration. Let $g : \Pi \to \mathbb{R}_{\geq 0}$ be a function that assigns a weight to each partition. Then the g-expected loss or g-score of a belief function $B \in \mathbb{B}$ with respect to a probability function P ∈ ℙ is defined by

$$S_g^L(P, B) := \sum_{\pi \in \Pi} g(\pi) \sum_{F \in \pi} P(F)\, L(F, B),$$
for any weighting function g that is inclusive in the sense that for any proposition F, some partition π containing F is given positive weight. We adopt the usual convention that 0 log 0 = 0. This ensures that $S_g^L(P, B)$ is well-defined. Theorem 1 allows us to focus attention on logarithmic g-score,
$$S_g(P, B) := -\sum_{\pi \in \Pi} g(\pi) \sum_{F \in \pi} P(F) \log B(F).$$

An important property of a scoring rule is that $\arg\inf_{B \in \mathbb{B}} S_g^L(P, B) = \{P\}$ for all P ∈ ℙ. That is, for fixed P ∈ ℙ, $S_g^L(P, B)$ is uniquely minimised by B = P. This property is known as strict propriety.

Proposition 1 (Strict Propriety). Sg is strictly proper.
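A small numerical illustration of strict propriety, under stated assumptions: Ω has two states, so Π consists of the three partitions {{a}, {¬a}}, {Ω} and {∅, Ω}, and every partition receives weight 1. The function names and the candidate belief functions are hypothetical, and the check covers only the listed candidates rather than all normalised belief functions.

```python
import math

EMPTY, A, NOT_A, OMEGA = "empty", "a", "not_a", "omega"

# The three partitions of a two-state Omega admitted in Pi.
PARTITIONS = [(A, NOT_A), (OMEGA,), (EMPTY, OMEGA)]


def g_score(P, B, g=lambda pi: 1.0):
    """Logarithmic g-score: -sum_pi g(pi) sum_{F in pi} P(F) log B(F)."""
    total = 0.0
    for pi in PARTITIONS:
        for F in pi:
            if P[F] == 0:
                continue          # convention: 0 log 0 = 0
            if B[F] == 0:
                return math.inf   # positive probability, zero belief
            total -= g(pi) * P[F] * math.log(B[F])
    return total


def probability(p):
    """The probability function with P({a}) = p."""
    return {EMPTY: 0.0, A: p, NOT_A: 1.0 - p, OMEGA: 1.0}


P = probability(0.7)

# Three normalised belief functions other than P (illustrative choices).
candidates = {
    "other probability": probability(0.5),
    "sub-additive belief": {EMPTY: 0.0, A: 0.6, NOT_A: 0.2, OMEGA: 1.0},
    "belief in the empty set": {EMPTY: 0.5, A: 0.7, NOT_A: 0.3, OMEGA: 0.5},
}

own_score = g_score(P, P)
for name, B in candidates.items():
    assert g_score(P, B) > own_score, name
```

Each candidate, including the two that are not probability functions, scores strictly worse than P itself when P is the physical probability function.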

By analogy with the generalised notion of scoring rule, we get a similar generalisation of entropy, g-entropy:

$$H_g(B) := -\sum_{\pi \in \Pi} g(\pi) \sum_{F \in \pi} B(F) \log B(F).$$

The standard entropy function corresponds to the special case in which g = gΩ, the (non-inclusive) weighting function that gives weight 1 to the partition {{ω} : ω ∈ Ω} of states and weight 0 to all other partitions.
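To make the relationship between g-entropy and standard entropy concrete, the sketch below computes both on a three-state Ω, whose six partitions are listed by hand. The names `g_entropy`, `partition_weighting` and `standard_weighting`, and the two sample distributions, are assumptions made for this example.

```python
import math

OMEGA = frozenset({"w1", "w2", "w3"})
EMPTY = frozenset()


def block(*states):
    return frozenset(states)


STATE_PARTITION = frozenset({block("w1"), block("w2"), block("w3")})
PARTITIONS = [
    STATE_PARTITION,
    frozenset({block("w1", "w2"), block("w3")}),
    frozenset({block("w1", "w3"), block("w2")}),
    frozenset({block("w2", "w3"), block("w1")}),
    frozenset({OMEGA}),
    frozenset({EMPTY, OMEGA}),
]


def prob(P, F):
    """Extend a distribution over states to a proposition F."""
    return sum(P[w] for w in F)


def g_entropy(P, g):
    """H_g(P) = -sum_pi g(pi) sum_{F in pi} P(F) log P(F), with 0 log 0 = 0."""
    total = 0.0
    for pi in PARTITIONS:
        for F in pi:
            p = prob(P, F)
            if p > 0:
                total -= g(pi) * p * math.log(p)
    return total


def partition_weighting(pi):      # weight 1 on every partition (inclusive)
    return 1.0


def standard_weighting(pi):       # weight 1 on the partition of states only
    return 1.0 if pi == STATE_PARTITION else 0.0


uniform = {"w1": 1 / 3, "w2": 1 / 3, "w3": 1 / 3}
skewed = {"w1": 0.7, "w2": 0.2, "w3": 0.1}

# Shannon entropy computed directly, for comparison with the standard weighting.
shannon = -sum(p * math.log(p) for p in skewed.values())
```

With the standard weighting, `g_entropy` reduces to Shannon entropy over the states. With the inclusive weighting that gives every partition weight 1, the uniform distribution still has the higher g-entropy of the two samples.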

It turns out that, if there is such a function, the probability function that minimises worst-case g-score, where the worst case is taken over physical probability functions in the set E = 〈ℙ*〉, is the probability function in E that has maximum g-entropy:

Theorem 2. As noted above, E is taken to be convex and g inclusive. There is a unique member of $\arg\sup_{P \in E} H_g(P)$, which we shall denote by $P_g^{\dagger}$. Furthermore,

$$\arg\sup_{P \in E} H_g(P) = \arg\inf_{B \in \mathbb{B}} \sup_{P \in E} S_g(P, B) = \{P_g^{\dagger}\}.$$

Throughout this paper we use $\arg\sup_{P \in E}$ (and $\arg\inf_{P \in E}$) to refer to the points in the closure [E] of E that achieve the supremum (respectively infimum) whether or not these points are in E. (This convention shall also apply mutatis mutandis to suprema and infima over sets of belief functions defined on predicate languages later in this paper.)
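The content of Theorem 2 can be illustrated on a two-state Ω with a grid search. The sketch below rests on several simplifying assumptions that are not part of the theorem: the evidence is the hypothetical constraint that the chance of state a lies in [0.6, 0.9], candidate belief functions are restricted to probability functions on a grid, and, since both arguments are then probability functions, only the partition of states contributes to the g-score (its weight is taken to be 1).

```python
import math


def score(p, b):
    """Expected log loss of believing state a to degree b when its chance is p."""
    return -(p * math.log(b) + (1 - p) * math.log(1 - b))


def entropy(p):
    return score(p, p)


# Hypothetical evidence: the chance of state a lies in [0.6, 0.9].
evidence = [i / 100 for i in range(60, 91)]
# Candidate belief functions: probability functions with B({a}) on a grid.
beliefs = [i / 100 for i in range(1, 100)]


def worst_case(b):
    """Worst-case expected loss of b over the evidence set."""
    return max(score(p, b) for p in evidence)


minimax_belief = min(beliefs, key=worst_case)
maxent_chance = max(evidence, key=entropy)
```

On this grid the belief that minimises worst-case expected loss coincides with the member of the evidence set that has maximal entropy, namely the one closest to the uniform distribution.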

The above theorem concerns the minimisation of worst-case g-score. If one replaces the minimisation of worst-case g-score by a more fine-grained criterion (which breaks ties between belief functions with the same worst-case g-score), then an analogue of the above theorem holds: there exists a unique belief function which is best with respect to this criterion and this function is $P_g^{\dagger}$, which maximises g-entropy in [E]. When we move to predicate languages we will consider such a refinement in Definition 21.

3. Beliefs over Sentences of a Predicate Language

3.1. Norms

In this section we introduce the norms of objective Bayesianism as they apply to strength of belief in sentences formulated in a predicate language. This framework is presented in more detail in Williamson [1] (Chapter 5). It is this set of norms that we seek to justify in terms of the loss that a belief function exposes one to.

We shall take $\mathcal{L}$ to be a first-order predicate language with finitely many relation symbols U1, …, Us, countably many constant symbols t1, t2, …, but no function or equality symbols. We will consider languages with and without the existential quantifier symbol, using the notation $\mathcal{L}^{\exists}$ and $\mathcal{L}^{\not\exists}$ to disambiguate where needed. We shall assume, as is usual in this setting, that each individual in the domain of discourse is picked out by some constant symbol. The sentences $S\mathcal{L}$ of $\mathcal{L}$ are formed by recursively applying the usual connectives and the existential quantifier, if present. In $\mathcal{L} = \mathcal{L}^{\exists}$, universally quantified sentences may be defined in terms of existentially quantified sentences as usual via ∀xθ(x) := ¬∃x¬θ(x). Note that $S\mathcal{L}^{\not\exists}$ coincides with the set of quantifier-free sentences of $\mathcal{L}^{\exists}$. We shall also be interested in the finite sublanguages $\mathcal{L}_n$, for n ≥ 1, which are identical to $\mathcal{L}^{\not\exists}$ except that they have only finitely many constant symbols t1, …, tn.

We shall list the atomic sentences of $\mathcal{L}$, i.e., sentences of the form Ut where U is a relation symbol and t is a tuple of constant symbols of the corresponding arity, as A1, A2, …, ordered in such a way that atomic sentences that can be expressed in $\mathcal{L}_{n+1}$ but not in $\mathcal{L}_n$ occur after the atomic sentences $A_1, \ldots, A_{r_n}$ of $\mathcal{L}_n$, for each n. Ωn will denote the set of n-states, i.e., sentences of the form $\pm A_1 \wedge \cdots \wedge \pm A_{r_n}$. We shall use Greek letters, such as θ, to denote sentences of $\mathcal{L}$, and Roman letters, e.g., F, to denote propositions expressed by such sentences. We shall construe propositions as sets of n-states, F ⊆ Ωn for some n (see Section 2).

The norms of objective Bayesianism can then be explicated thus:

Probability. The strengths of one’s beliefs should be representable by a probability function, i.e., a function $P : S\mathcal{L} \to \mathbb{R}_{\geq 0}$ that satisfies the properties:

  • P1. P (τ) = 1 for all tautologies τ.

  • P2. If ⊨¬(φ ˄ ψ) then P (φ ˅ ψ) = P (φ) + P (ψ).

  • P3. $P(\exists x\, \theta(x)) = \sup_m P\big(\bigvee_{i=1}^{m} \theta(t_i)\big)$.

(Clearly P3 is only applicable in the case $\mathcal{L} = \mathcal{L}^{\exists}$.)
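As a worked instance of P3, suppose (purely for illustration, this is not assumed anywhere in the paper) that the instances θ(t1), θ(t2), … are probabilistically independent and each has probability q. Then the probability of the finite disjunction of the first m instances is 1 − (1 − q)^m, and P3 fixes the probability of the existential sentence as the supremum of these values.

```python
def prob_some_instance(q, m):
    """P(theta(t_1) or ... or theta(t_m)) for m independent instances,
    each of probability q (an assumption made for this example)."""
    return 1 - (1 - q) ** m


q = 0.1
finite_disjunctions = [prob_some_instance(q, m) for m in range(1, 201)]

# The sequence is non-decreasing in m, so its supremum is its limit.
assert all(x <= y for x, y in zip(finite_disjunctions, finite_disjunctions[1:]))

# For any q > 0 the values approach 1, so P3 forces P(exists x theta(x)) = 1.
assert 1 - finite_disjunctions[-1] < 1e-6
```

Under the independence assumption, any positive q therefore yields probability 1 for the existential sentence.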

Calibration. One’s degrees of belief should satisfy constraints imposed by one’s evidence. Assuming all evidence is evidence of physical probabilities, P should lie in the set E = 〈ℙ*〉, the convex hull of the set of epistemically possible physical probability functions.

Equivocation. One’s degrees of belief should otherwise be sufficiently equivocal. Again, one can explicate this by saying that one’s belief function should have sufficiently high entropy. Here P has higher entropy than Q if there is some N such that for all n ≥ N, $H_\Omega^n(P) > H_\Omega^n(Q)$, where $H_\Omega^n$ is standard entropy on $\mathcal{L}_n$, $H_\Omega^n(P) := -\sum_{\omega \in \Omega_n} P(\omega) \log P(\omega)$.

The key question we attempt to answer here is: can these norms be given a unified justification in terms of avoiding avoidable loss?

3.2. Belief and Probability

A (non-normalised) belief function $bel : S\mathcal{L} \to \mathbb{R}_{\geq 0}$ is a function that maps any sentence of the language to a non-negative real number. For technical convenience we shall focus our attention on normalised belief functions, which are defined below.

A (countable) set of mutually exclusive sentences π ⊆ $S\mathcal{L}$ is called exhaustive if, for every interpretation under which the constants exhaust the universe, there exists a sentence θ ∈ π that is true under that interpretation. This means that it is not possible for all θ ∈ π to be false at the same time. In order to control the number of partitions, we shall assume that the only partitions in which contradictions κ occur are the partitions of the form {τ, κ}, for some tautology τ. Let $\Pi_{\mathcal{L}}$ denote the set of partitions of $\mathcal{L}$.

Example 1 (Infinite partitions). Even though $\mathcal{L}$ does not contain a symbol for equality and every element of a partition is a sentence of $\mathcal{L}$, which is of finite length, infinite partitions such as the following do exist:

$$\pi := \{\forall x\, \neg U_1 x\} \cup \bigcup_{k=1}^{\infty} \Big\{ U_1 t_k \wedge \bigwedge_{l=1}^{k-1} \neg U_1 t_l \Big\}.$$

(Here it is presupposed that $\mathcal{L}$ contains a unary predicate symbol U1.) On the other hand, it turns out that there are no infinite partitions in $\mathcal{L}^{\not\exists}$ [6] (§2.5).

We take it that it is a matter of convention on which scale beliefs are measured. For convenience, we want to normalise this scale to the unit interval, [0, 1], so that all belief functions are considered on the same scale.

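Definition 1 (Normalised belief function). Let $M := \sup_{\pi \in \Pi_{\mathcal{L}}} \sum_{\varphi \in \pi} bel(\varphi)$. Then define the normalisation of bel as $B(\varphi) := \frac{bel(\varphi)}{M}$. For a function f assigning every φ ∈ $S\mathcal{L}$ the same value v ∈ ℝ≥0 we write f ≡ v. We shall consider bel ≡ 0 as normalised. The set of normalised belief functions on $S\mathcal{L}$ then is

$$\mathbb{B} := \Big\{ B : S\mathcal{L} \to [0,1] \;:\; \sum_{\varphi \in \pi} B(\varphi) \leq 1 \text{ for all } \pi \in \Pi_{\mathcal{L}} \text{ and } \sum_{\varphi \in \pi} B(\varphi) = 1 \text{ for some } \pi \in \Pi_{\mathcal{L}} \Big\} \cup \{B \equiv 0\}.$$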

For the normalisation of bel, B, it holds that B ≡ 0, if and only if M = +∞ or bel ≡ 0.

We will be particularly interested in the following subset of functions:

$$\mathbb{P} := \Big\{ P : S\mathcal{L} \to [0,1] \;:\; \sum_{\varphi \in \pi} P(\varphi) = 1 \text{ for all } \pi \in \Pi_{\mathcal{L}} \Big\}.$$

These are the probability functions:

Proposition 2. P ∈ ℙ, if and only if $P : S\mathcal{L} \to [0,1]$ satisfies the axioms of probability:

  • P1. P(τ) = 1 for all tautologies τ ∈ $S\mathcal{L}$.

  • P2. If ⊨ ¬(φ ∧ ψ) then P(φ ∨ ψ) = P(φ) + P(ψ).

  • P3. $P(\exists x\, \theta(x)) = \sup_m P\big(\bigvee_{i=1}^{m} \theta(t_i)\big)$.

Proof. First we shall see that P ∈ ℙ satisfies the axioms of probability.

  • P1. For any tautology τ ∈ $S\mathcal{L}$ it holds that P(τ) = 1 because {τ} is a partition in $\Pi_{\mathcal{L}}$. P(κ) = 0 for all contradictions κ because {τ, κ} is a partition in $\Pi_{\mathcal{L}}$ and P(τ) = 1.

  • P2. Suppose that φ, ψ ∈ $S\mathcal{L}$ are such that ⊨ ¬(φ ∧ ψ). We shall proceed by cases to show that P(φ ∨ ψ) = P(φ) + P(ψ). In the first three cases one of the sentences is a contradiction, in the last two cases there are no contradictions.

    • If ⊨ φ and ⊨ ¬ψ, then ⊨ φ ∨ ψ. Thus by the above P(φ) = 1 and P(ψ) = 0 and hence P(φ ∨ ψ) = 1 = P(φ) + P(ψ).

    • If ⊨ ¬φ and ⊨ ¬ψ, then ⊨ ¬φ ∧ ¬ψ. Thus P(φ ∨ ψ) = 0 = P(φ) + P(ψ).

    • If ⊭ ¬φ, ⊭ φ, and ⊨ ¬ψ, then {φ ∨ ψ, ¬φ ∨ ψ} and {φ, ¬φ ∨ ψ} are both partitions in $\Pi_{\mathcal{L}}$. Thus P(φ ∨ ψ) + P(¬φ ∨ ψ) = 1 = P(φ) + P(¬φ ∨ ψ). Putting these observations together we now find P(φ ∨ ψ) = P(φ) = P(φ) + P(ψ).

    • If ⊭ ¬φ, ⊭ ¬ψ and ⊨ φ ↔ ¬ψ, then {φ, ψ} is a partition and φ ∨ ψ is a tautology. Hence, P(φ) + P(ψ) = 1 and P(φ ∨ ψ) = 1. This now yields P(φ) + P(ψ) = P(φ ∨ ψ).

    • If ⊭ ¬φ, ⊭ ¬ψ and ⊭ φ ↔ ¬ψ, then none of the following sentences is a tautology or a contradiction: φ, ψ, φ ∨ ψ, ¬(φ ∨ ψ). Since {φ, ψ, ¬(φ ∨ ψ)} and {φ ∨ ψ, ¬(φ ∨ ψ)} are both partitions in $\Pi_{\mathcal{L}}$ we obtain P(φ) + P(ψ) = 1 − P(¬(φ ∨ ψ)) = P(φ ∨ ψ). So P(φ) + P(ψ) = P(φ ∨ ψ).

  • P3. For the rest of this proof we only have to consider $\mathcal{L} = \mathcal{L}^{\exists}$.

If ⊨ ∃xθ(x), then P(∃xθ(x)) = 1.

Furthermore, the set {θn : n ∈ ℕ} with $\theta_n := \theta(t_n) \wedge \bigwedge_{j=1}^{n-1} \neg\theta(t_j)$ is exhaustive. Note that $\bigvee_{i=1}^{n} \theta(t_i) \equiv \bigvee_{i=1}^{n} \theta_i$. P1 and P2 are well-known to imply that logically equivalent sentences are assigned the same probability; see [7] (Proposition 2.1.c). Hence, $P\big(\bigvee_{i=1}^{n} \theta(t_i)\big) = P\big(\bigvee_{i=1}^{n} \theta_i\big)$.

The θi are mutually exclusive. We obtain from P2 that $P\big(\bigvee_{i=1}^{n} \theta_i\big) = \sum_{i=1}^{n} P(\theta_i)$. Next, define a set Θ := {θn : θn satisfiable} which consists of exhaustive, satisfiable and mutually exclusive sentences. Hence Θ is a partition in $\Pi_{\mathcal{L}}$. We finally obtain

$$1 = \sum_{\theta \in \Theta} P(\theta) \leq \lim_{n \to \infty} \sum_{i=1}^{n} P(\theta_i) = \lim_{n \to \infty} P\Big(\bigvee_{i=1}^{n} \theta_i\Big) = \lim_{n \to \infty} P\Big(\bigvee_{i=1}^{n} \theta(t_i)\Big) \leq 1.$$

P1 and P2 are also well-known to imply that if ⊨ χ → ψ then P(χ) ≤ P(ψ), see [7] (Proposition 2.1.c). Since $\models \bigvee_{i=1}^{n} \theta(t_i) \to \bigvee_{i=1}^{n+1} \theta(t_i)$ we obtain $P\big(\bigvee_{i=1}^{n} \theta(t_i)\big) \leq P\big(\bigvee_{i=1}^{n+1} \theta(t_i)\big)$. Thus $\big(P\big(\bigvee_{i=1}^{n} \theta_i\big)\big)_{n \in \mathbb{N}}$ is a (not necessarily strictly) increasing sequence. Then

$$1 = \lim_{n \to \infty} P\Big(\bigvee_{i=1}^{n} \theta(t_i)\Big) = \sup_{n} P\Big(\bigvee_{i=1}^{n} \theta(t_i)\Big).$$

The second equality holds also when $1 > \lim_{n \to \infty} P\big(\bigvee_{i=1}^{n} \theta(t_i)\big)$.

If neither ⊨ ∃xθ(x) nor ⊨ ¬∃xθ(x), then {∀x¬θ(x), ∃xθ(x)} is a partition. We consider two cases.

In the first case the set $\{\forall x\, \neg\theta(x),\ \theta(t_1),\ \theta(t_2) \wedge \neg\theta(t_1),\ \ldots,\ \theta(t_k) \wedge \neg\bigvee_{j=1}^{k-1} \theta(t_j),\ \ldots\}$ is not a partition.

For example, this set fails to be a partition for θ(x) = ¬Ut2 ∧ Ux: the sentence θ(t2) ∧ ¬θ(t1) = ¬Ut2 ∧ Ut2 ∧ ¬(¬Ut2 ∧ Ut1) is a contradiction and hence it cannot be contained in a partition π consisting of infinitely many sentences.

$\neg\bigvee_{i=1}^{m} \theta(t_i)$ cannot be a contradiction since ∀x¬θ(x) is satisfiable and $\forall x\, \neg\theta(x) \models \neg\bigvee_{i=1}^{m} \theta(t_i)$. If $\neg\bigvee_{i=1}^{m} \theta(t_i)$ is a tautology, then all θn with n ≤ m are contradictions. Hence, for all m ∈ ℕ the set $\{\neg\bigvee_{i=1}^{m} \theta(t_i)\} \cup \{\theta_n : n \leq m \text{ and } \theta_n \text{ is satisfiable}\}$ is a partition, as is $\{\neg\bigvee_{i=1}^{m} \theta(t_i),\ \bigvee_{i=1}^{m} \theta(t_i)\}$. Furthermore, {∀x¬θ(x)} ∪ {θn : θn is satisfiable} is a partition.

Recalling that P(κ) = 0 for all contradictions κ we obtain $\sum_{k=1}^{m} P(\theta_k) = P\big(\bigvee_{i=1}^{m} \theta(t_i)\big)$ and

$$P(\exists x\, \theta(x)) = \lim_{m \to \infty} P\Big(\bigvee_{i=1}^{m} \theta(t_i)\Big).$$

It remains to show that

$$\lim_{m \to \infty} P\Big(\bigvee_{i=1}^{m} \theta(t_i)\Big) = \sup_{m} P\Big(\bigvee_{i=1}^{m} \theta(t_i)\Big).$$

This follows as in the argument above: the sequence $\big(P\big(\bigvee_{i=1}^{m} \theta(t_i)\big)\big)_{m \in \mathbb{N}}$ is non-decreasing, so its limit equals its supremum.

In the second case the set $\{\forall x\, \neg\theta(x),\ \theta(t_1),\ \theta(t_2) \wedge \neg\theta(t_1),\ \ldots,\ \theta(t_k) \wedge \neg\bigvee_{j=1}^{k-1} \theta(t_j),\ \ldots\}$ is a partition. Recall that {∀x¬θ(x), ∃xθ(x)} is also a partition. We obtain as in the first case that

$$P(\exists x\, \theta(x)) = \sum_{k=1}^{\infty} P\Big(\theta(t_k) \wedge \neg\bigvee_{j=1}^{k-1} \theta(t_j)\Big) = \lim_{m \to \infty} \sum_{k=1}^{m} P(\theta_k) = \sup_{m} P\Big(\bigvee_{i=1}^{m} \theta(t_i)\Big).$$

For the converse, note that P1–3 imply that P is a probability measure on $S\mathcal{L}$, and so additive over countable partitions (§2 in [8]; §2.5 in [6]). Hence P ∈ ℙ. □

Another key feature of probability functions is that they respect logical equivalence:

Definition 2 (Respecting logical equivalence). For a sublanguage $\mathcal{L}'$ of $\mathcal{L}$ we say that a function $f : S\mathcal{L} \to [0,1]$ respects logical equivalence on $\mathcal{L}'$, if and only if for all φ, ψ ∈ $S\mathcal{L}'$ with ⊨ φ ↔ ψ it holds that f(φ) = f(ψ). For $\mathcal{L}' = \mathcal{L}$ we simply say that f respects logical equivalence.

Proposition 3. The probability functions P ∈ ℙ respect logical equivalence.

Proof. Suppose P ∈ ℙ and assume that φ, ψ ∈ $S\mathcal{L}$ are logically equivalent. Observe that {φ, ¬φ} and {ψ, ¬φ} are partitions in $\Pi_{\mathcal{L}}$. Hence,

$$P(\varphi) + P(\neg\varphi) = 1 = P(\psi) + P(\neg\varphi).$$

But then P(φ) = P(ψ).

Thus, the P ∈ ℙ assign logically equivalent sentences the same probability. □

If a belief function $B : S\mathcal{L} \to [0,1]$ respects logical equivalence, it gives sentences which express the same proposition the same degree of belief. Hence, for any n ∈ ℕ, B induces a function °B defined over the propositions F ⊆ Ωn (cf. Section 2). °B is defined by:

$${}^{\circ}B(F) := B\Big(\bigvee F\Big) = B\Big(\bigvee_{\omega \in \Omega_n,\ \omega \in F} \omega\Big).$$

We will use the notation °nB to avoid ambiguity in cases where n varies.

The notion of a dominated belief function will prove useful in what follows:

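Definition 3 (Dominated belief function). $B \in \mathbb{B} \setminus \mathbb{P}$ is dominated by a probability function P ∈ ℙ, if and only if for all φ ∈ $S\mathcal{L}$ it holds that B(φ) ≤ P(φ).

Note that if B is dominated by P, then B ≠ P, and thus B(φ) < P(φ) has to hold at least for one sentence φ.

Proposition 4. There exist $B \in \mathbb{B} \setminus \mathbb{P}$ which are not dominated.

Proof. Let U be a relation symbol in $\mathcal{L}$ of arity a ≥ 1, say. Let Ut1t be a well-formed formula of $\mathcal{L}_2$, i.e., t is an (a − 1)-tuple consisting only of t1 and t2. Let O4 := {Ut1t ∧ Ut2t, Ut1t ∧ ¬Ut2t, ¬Ut1t ∧ Ut2t, ¬Ut1t ∧ ¬Ut2t}.

Let $B : S\mathcal{L} \to [0,1]$ be such that

$$B(\varphi) := \begin{cases} \frac{1}{100} & \text{iff } \models \varphi \leftrightarrow \omega \text{ for some } \omega \in O_4 \\ \frac{50}{100} & \text{iff } \models \varphi \leftrightarrow \omega \vee \nu \text{ for different } \omega, \nu \in O_4 \\ \frac{99}{100} & \text{iff } \models \varphi \leftrightarrow \neg\omega \text{ for some } \omega \in O_4 \\ 1 & \text{iff } \varphi \text{ is a tautology} \\ 0 & \text{otherwise.} \end{cases}$$

Clearly, $B \in \mathbb{B}$. We now show that there does not exist a P ∈ ℙ such that B(φ) ≤ P(φ) for all φ ∈ $S\mathcal{L}$.

Note that

$$\sum_{\omega \in O_4} B(\neg\omega) = 3 + \frac{96}{100}$$
and that for all P ∈ ℙ it holds that
$$\sum_{\omega \in O_4} P(\neg\omega) = \sum_{\omega \in O_4} \big(P(\neg\omega) + P(\omega)\big) - \sum_{\omega \in O_4} P(\omega) = 4 - 1 = 3.$$
Hence B(¬ω) > P(¬ω) for at least one ω ∈ O4, so no P ∈ ℙ dominates B.

Note for later reference that for all n ≥ 3 and ω ∈ O4, {¬ω} ∪ {ν ∈ Ωn : ν ⊨ ω} is a partition. So, $\sum_{\nu \in \Omega_n,\ \nu \models \omega} B(\nu) \leq \frac{1}{100}$ has to hold. Hence, $\sum_{\nu \in \Omega_n} B(\nu) \leq \frac{4}{100}$.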
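The arithmetic behind the proof of Proposition 4 is short enough to verify directly. The sketch below uses exact fractions; the variable names are illustrative, and the uniform P at the end is just one concrete probability function chosen for the example.

```python
from fractions import Fraction

# Degrees of belief that B assigns to the negations of the four sentences in O4.
belief_in_negations = [Fraction(99, 100)] * 4
total_belief = sum(belief_in_negations)

# For any probability function the four sentences in O4 form a partition,
# so the P(w) sum to 1 and the P(not w) = 1 - P(w) sum to 4 - 1 = 3.
total_probability = 4 - 1

assert total_belief == 3 + Fraction(96, 100)
assert total_belief > total_probability

# If B(not w) <= P(not w) held for every w in O4, the totals would satisfy
# total_belief <= total_probability, which they do not.
# A concrete instance with the uniform P(w) = 1/4:
P_neg = [1 - Fraction(1, 4)] * 4
assert any(b > p for b, p in zip(belief_in_negations, P_neg))
```

The same comparison of totals works for every probability function, because the total on the probabilistic side is always exactly 3.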

Thus far we have considered partitions of sentences. We shall also need to consider partitions of propositions:

Definition 4 (Partitions of propositions). Let Πn be the set of partitions on Ωn. As in Section 2, we take {Ωn} and {Ωn, ∅} to be partitions and we suppose that there is no further partition containing ∅.

We then define the set of partitions: $\Pi := \bigcup_{n=1}^{\infty} \Pi_n$.

We use πn to denote the partition of n-states {{ω} : ω ∈ Ωn}.

Note that F1 := {ω ∈ Ω1 : ω ⊨ U1t1} and F2 := {ω ∈ Ω2 : ω ⊨ U1t1} are different propositions, where U1 is a unary predicate symbol. F1 is a member of $\{F_1, \bar{F}_1\} \in \Pi_1$ and F2 is a member of $\{F_2, \bar{F}_2\} \in \Pi_2$, but not vice versa. So $\{F_1, \bar{F}_1\}$ and $\{F_2, \bar{F}_2\}$ are different partitions, even if these partitions are intuitively equivalent.

3.3. Application to Inductive Logic

We shall be particularly interested in the use of objective Bayesianism over predicate languages to provide semantics for inductive logic.

Inductive logic typically seeks to answer questions of the following form [9] (§1.1):

$$\varphi_1^{X_1}, \ldots, \varphi_k^{X_k} \mathrel{|\!\approx} \psi^{?}$$

This asks, if premiss sentences φ1, …, φk of $\mathcal{L}$ have probabilities in sets X1, …, Xk ⊆ [0, 1] respectively, which probability or set of probabilities should attach to the conclusion sentence ψ?

The answer to this question will depend on the semantics given to the inductive entailment relation |≈ [9] (Part I). One natural option is to give the entailment relation objective Bayesian semantics, denoted by |≈°. Here the premisses are construed as statements about chance, i.e., P*(φ1) ∈ X1, …, P*(φk) ∈ Xk, and the question concerns rational belief: if one’s total evidence is captured by the premisses, to what extent should one believe the conclusion sentence ψ? Applying the norms of objective Bayesianism,

$$\varphi_1^{X_1}, \ldots, \varphi_k^{X_k} \mathrel{|\!\approx^{\circ}} \psi^{Y}$$
holds just in case P(ψ) ∈ Y for every P ∈ E that has maximal entropy, where
$$E = \langle \varphi_1^{X_1}, \ldots, \varphi_k^{X_k} \rangle := \langle \{ P^* \in \mathbb{P} : P^*(\varphi_1) \in X_1, \ldots, P^*(\varphi_k) \in X_k \} \rangle.$$

This application of objective Bayesian epistemology to inductive logic is an example in which E is generated by constraints involving only sentences of some finite sublanguage $\mathcal{L}_n$. We will be particularly interested in the case where φ1, …, φk are quantifier-free sentences, i.e., sentences of $\mathcal{L}_n$ for some n.
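A toy instance of such an inference can be computed by brute force. The sketch below makes several assumptions that go beyond the text: the language has a single unary predicate U, attention is restricted to the finite sublanguage with constants t1 and t2, the hypothetical premiss is that Ut1 has chance in X = [0.8, 1], and entropy is maximised over a grid of candidate probability functions rather than analytically. How the maximal entropy function behaves on the full language is not addressed here.

```python
import math
from itertools import product

STEPS = 20  # grid resolution: probabilities are multiples of 1/20

# States of the sublanguage with atoms Ut1, Ut2: truth values of (Ut1, Ut2).
STATES = list(product([True, False], repeat=2))


def entropy(P):
    """Standard (Shannon) entropy over the four states."""
    return -sum(p * math.log(p) for p in P.values() if p > 0)


def grid_distributions():
    """All distributions over the four states with entries k / STEPS."""
    for a in range(STEPS + 1):
        for b in range(STEPS + 1 - a):
            for c in range(STEPS + 1 - a - b):
                d = STEPS - a - b - c
                yield dict(zip(STATES, (a / STEPS, b / STEPS, c / STEPS, d / STEPS)))


def prob(P, atom_index):
    """Probability that the atom with this index (0: Ut1, 1: Ut2) is true."""
    return sum(p for state, p in P.items() if state[atom_index])


# Hypothetical premiss: Ut1 has chance in X = [0.8, 1] (tolerance for rounding).
evidence = [P for P in grid_distributions() if prob(P, 0) >= 0.8 - 1e-12]
maxent = max(evidence, key=entropy)
```

In this sketch the maximal entropy function on the grid gives Ut1 the least extreme value compatible with the premiss and remains equivocal about Ut2, so the conclusion sentence Ut2 would receive Y = {0.5}.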

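Let $\mathbb{P}_{\mathcal{L}_n}$ be the set of probability functions on $\mathcal{L}_n$, and let

$$E_n := \{ P_n \in \mathbb{P}_{\mathcal{L}_n} : P_n = P{\restriction_{\mathcal{L}_n}},\ P \in E \}$$
where $P{\restriction_{\mathcal{L}_n}}$ is the restriction of P to $S\mathcal{L}_n$. Note that,
$$P{\restriction_{\mathcal{L}_n}}(\theta) := \sum_{\omega \in \Omega_n,\ \omega \models \theta} P(\omega)$$
for all θ ∈ $S\mathcal{L}_n$.

To ease the reading we also let $\mathbb{P}_n := \{ P_n \in \mathbb{P}_{\mathcal{L}_n} \}$.

Definition 5 (Finitely generated evidence set). E is finitely generated if it takes the form $E = \{ P \in \mathbb{P} : P{\restriction_{\mathcal{L}_n}} \in E_n \}$ for some n ∈ ℕ, where $E_n \subseteq \mathbb{P}_n$. Thus, E is generated by constraints involving only some φ1, φ2, … ∈ $S\mathcal{L}_n$ and no other sentences.

From now on, for finitely generated E, the letter K is used to denote the smallest number n such that E is generated by constraints on $\mathcal{L}_n$.

Note that an evidence set E which is not finitely generated may not be recapturable from {E1, E2, …}. For instance, for

$$E = \Big\{ P \in \mathbb{P} : \lim_{n \to \infty} P\Big(\bigwedge_{i=1}^{n} U t_i\Big) = 0 \Big\}$$
the following two facts hold simultaneously:
  • E ⊂ ℙ

  • $E_n = \mathbb{P}_n$ for all n ∈ ℕ.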

4. Quantifier-Free Languages

We would like to develop an analogue of Theorem 2 for beliefs defined over the sentences of a predicate language: we would like to show that belief functions which minimise worst-case expected loss are probability functions in E that maximise entropy. The main difficulty in moving from the finite domain of propositions to countably many sentences of a predicate language is to ensure that worst-case expected loss is finite where possible, so that these losses can be compared and a belief function can be chosen that minimises worst-case expected loss. For this reason we proceed in two steps. First, in this section, we shall consider the case in which the predicate language has no quantifier symbol, i.e., $\mathcal{L} = \mathcal{L}^{\not\exists}$; comparing worst-case expected loss is more straightforward in this case. Then, in Section 5, we shall examine how far our approach can be extended to handle predicate languages with quantifiers.

First, in Section 4.1 we define the notion of a weighting function. This allows us to define and analyse the concept of entropy of a probability function on $\mathcal{L} = \mathcal{L}^{\not\exists}$ in Section 4.2. In Section 4.3 we introduce the idea of the loss profile of a belief function. Finally in Section 4.4 we show that, in various natural scenarios, the belief function that has the best loss profile is the probability function, from all those calibrated with evidence, that has maximal standard entropy.

4.1. Weighting Functions

Definition 6 (Weighting function). A weighting function on $\mathcal{L}_n$, $g_n : \Pi_n \to \mathbb{R}_{\geq 0}$, maps partitions π ∈ Πn to non-negative real numbers. A weighting function on $\mathcal{L}$, $g : \Pi \to \mathbb{R}_{\geq 0}$, is defined over partitions of propositions of all finite sublanguages. A weighting function on $\mathcal{L}$ can be thought of as a family of weighting functions gn on $\mathcal{L}_n$, where n ranges over the natural numbers. Given a fixed weighting function g on $\mathcal{L}$, we shall take $g_n := g{\restriction_{\Pi_n}}$ for each n ∈ ℕ. A (general) weighting function g is taken to be defined over each predicate language $\mathcal{L} = \mathcal{L}^{\not\exists}$. Different languages $\mathcal{L} = \mathcal{L}^{\not\exists}$, $\mathcal{L}' = \mathcal{L}'^{\not\exists}$ have different sets of relation symbols.

A weighting function g is atomic if for each $\mathcal{L}$ and each n, gn depends only on the number of atomic propositions in $\mathcal{L}_n$, not on the structure of those atomic propositions. Thus if $\mathcal{L}$ and $\mathcal{L}'$ are such that $\mathcal{L}_n$ and $\mathcal{L}'_m$ have the same number of atomic propositions, then $g_n^{\mathcal{L}} = g_m^{\mathcal{L}'}$. In this paper we shall suppose that all weighting functions are atomic; hence there will be no need to superscript a weighting function on $\mathcal{L}$ or $\mathcal{L}_n$ by the particular language $\mathcal{L}$.

We call g inclusive, if and only if it attaches positive weight to each proposition, i.e., if and only if for all n and all F ⊆ Ωn it holds that

$$\sum_{\pi \in \Pi_n,\ F \in \pi} g(\pi) > 0.$$

As in Section 2, g is symmetric if for each n it is invariant under permutations of the states of $\mathcal{L}_n$. It is refined if for each n it gives no less weight to a refinement π′ ∈ Πn of a partition π ∈ Πn than to π itself. For example, the partition weighting gΠ gives weight 1 to each partition, gΠ(π) = 1 for all π ∈ Π. The proposition weighting $g_{\mathcal{P}\Omega}$ gives weight 1 to each partition of size 2 and weight 0 to all other partitions; this amounts to giving weight 1 to each proposition. The standard weighting gΩ gives weight 1 to the partition πn of n-states, for each n, and weight 0 to all other partitions. These weighting functions are all symmetric. The partition and proposition weightings are inclusive, but the standard weighting is not. The partition and standard weightings are refined, but the proposition weighting is not.
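The claims about the three example weightings can be checked on a single finite sublanguage. The sketch below assumes a sublanguage with four states and uses the usual notion of refinement (every block of the finer partition is contained in a block of the coarser one); the function names are illustrative, and the check covers only this one sublanguage.

```python
def set_partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i, block in enumerate(partition):
            yield partition[:i] + [block | {first}] + partition[i + 1:]
        yield partition + [frozenset({first})]


STATES = ["s1", "s2", "s3", "s4"]   # e.g. the states of a language with two atoms
OMEGA = frozenset(STATES)
STATE_PARTITION = frozenset(frozenset({s}) for s in STATES)
PARTITIONS = [frozenset(p) for p in set_partitions(STATES)]
PARTITIONS.append(frozenset({frozenset(), OMEGA}))   # the partition {empty set, Omega}


def partition_weighting(pi):
    return 1


def proposition_weighting(pi):
    return 1 if len(pi) == 2 else 0


def standard_weighting(pi):
    return 1 if pi == STATE_PARTITION else 0


def is_inclusive(g):
    """Every proposition lies in some partition with positive weight."""
    propositions = {F for pi in PARTITIONS for F in pi}
    return all(any(g(pi) > 0 for pi in PARTITIONS if F in pi)
               for F in propositions)


def refines(fine, coarse):
    """Every block of `fine` is contained in some block of `coarse`."""
    return all(any(f <= c for c in coarse) for f in fine)


def is_refined(g):
    return all(g(fine) >= g(coarse)
               for fine in PARTITIONS for coarse in PARTITIONS
               if refines(fine, coarse))


assert is_inclusive(partition_weighting) and is_inclusive(proposition_weighting)
assert not is_inclusive(standard_weighting)
assert is_refined(partition_weighting) and is_refined(standard_weighting)
assert not is_refined(proposition_weighting)
```

The proposition weighting fails to be refined because, for instance, a three-block partition refines a two-block partition but receives weight 0 rather than 1.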

Definition 7 (Strongly refined weighting function). g is strongly refined if and only if it satisfies the following properties:

  • g is refined: in each finite sublanguage, if partition π′ is a refinement of partition π, then g(π′) ≥ g(π).

  • Each finite sublanguage receives the same total weight: for all n, π Π n g ( π ) is constant.

  • A state partition on a richer language should not receive less weight than one one a less rich language: if m < n then g(πm) ≤ g(πn)

  • Non-state-partitions receive finite total weight: the following limit exists (i.e., is finite),

$$\lim_{k\to\infty}\sum_{n=1}^{k}\ \sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi).$$

Throughout this paper we will be particularly interested in the following weighting functions:

Definition 8 (Regular weighting function). g is regular if it is atomic, inclusive, symmetric and strongly refined.

4.2. Entropy

Definition 9 (n-entropy). Given a weighting function g and n ∈ ℕ, we define the n-entropy $H_g^n : \mathbb{P} \to [0,\infty]$ by:

$$H_g^n(P) := -\sum_{\pi\in\Pi_n} g(\pi)\sum_{F\in\pi} {}^{\circ}P(F)\log {}^{\circ}P(F).$$

Recall that, for a probability function P (or indeed any belief function that respects logical equivalence) defined on sentences, °P is the function induced by P over the domain of propositions. Note that by our convention, 0 log 0 = 0 = 1 log 1. Thus, for all n ∈ ℕ,

$$g(\{\Omega_n\})\,P(\Omega_n)\log P(\Omega_n) = 0 = g(\{\Omega_n,\emptyset\})\,\big(P(\emptyset)\log P(\emptyset) + P(\Omega_n)\log P(\Omega_n)\big).$$

In calculating n-entropy we may thus ignore all partitions which contain Ωn.

Definition 10 (Standard entropy). For the standard weighting gΩ we denote the corresponding n-entropy by $H_\Omega^n$. We refer to $H_\Omega^n$ as standard entropy (on $\mathcal{L}_n$). $H_\Omega^n(P)$ is the well-known Shannon entropy of the n-states of P:

$$H_\Omega^n(P) = -\sum_{\omega\in\Omega_n} P(\omega)\log P(\omega).$$
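To make Definitions 9 and 10 concrete, the following is a minimal Python sketch that evaluates the n-entropy by brute force on a small state space. It is an illustration under stated assumptions, not the authors' implementation: the names `set_partitions`, `g_entropy`, `g_standard`, `g_partition` and `g_proposition` are hypothetical, states are plain labels, and partitions containing the empty set are omitted because, as noted above, they contribute nothing.

```python
import math

def set_partitions(items):
    """Yield every partition of the list `items` as a list of non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + partition

def xlogx(x):
    """x log x with the convention 0 log 0 = 0."""
    return 0.0 if x == 0 else x * math.log(x)

def g_entropy(p, g):
    """-sum_pi g(pi) sum_{F in pi} P(F) log P(F); `p` maps states to probabilities."""
    states = list(p)
    total = 0.0
    for partition in set_partitions(states):
        weight = g(partition, len(states))
        if weight:
            total -= weight * sum(xlogx(sum(p[w] for w in block)) for block in partition)
    return total

def g_standard(partition, n_states):
    """Weight 1 on the partition into single states, 0 elsewhere."""
    return 1.0 if len(partition) == n_states else 0.0

def g_partition(partition, n_states):
    """Weight 1 on every partition."""
    return 1.0

def g_proposition(partition, n_states):
    """Weight 1 on every partition of size 2, 0 elsewhere."""
    return 1.0 if len(partition) == 2 else 0.0

# Hypothetical probability functions on four states.
uniform = {"w1": 0.25, "w2": 0.25, "w3": 0.25, "w4": 0.25}
skewed = {"w1": 0.7, "w2": 0.1, "w3": 0.1, "w4": 0.1}
print(g_entropy(uniform, g_standard), g_entropy(skewed, g_standard))
print(g_entropy(uniform, g_partition), g_entropy(skewed, g_partition))
```

With the standard weighting the sketch reduces to Shannon entropy; with the partition or proposition weighting the uniform function still scores higher than the skewed one, in line with the symmetry of these weightings.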

For a fixed weighting function g, we say that P has greater entropy than Q, written $P \gg Q$, if the n-entropy of P eventually dominates that of Q, i.e., if there is some N ∈ ℕ such that for all n ≥ N, $H_g^n(P) > H_g^n(Q)$.

This relation ≫ for comparing entropy is preferable to an alternative notion posed in terms of the limiting behaviour of the n-entropy of P and Q, which says that P has greater entropy than Q just when $\lim_{n\to\infty} H_g^n(P) > \lim_{n\to\infty} H_g^n(Q)$. This is because the limiting behaviour is not fine-grained enough to distinguish greater from lesser entropy: n-entropy will often tend to infinity for both P and Q, and, even where the limiting n-entropies of P and Q are both finite, these limits may be equal even though the entropy of P is intuitively greater than that of Q, insofar as the n-entropy of P eventually dominates that of Q. See Williamson [1] (§5.5) for further discussion of these comparative notions of entropy.

We will be particularly interested in the probability functions in $[\mathbb{E}]$ with maximal entropy:

$$\operatorname{maxent}\mathbb{E} := \{P \in [\mathbb{E}] : \text{there is no } Q \in [\mathbb{E}] \text{ such that } Q \gg P\}.$$

We shall also consider entropy maximisers on finite sublanguages. We shall use the notation:

$$\mathbb{P}_n^{\dagger} := \arg\sup_{P\in\mathbb{E}{\upharpoonright}_n} H_g^n(P).$$

(The members of this set are defined only on the sentences of $\mathcal{L}_n$, not on the sentences of the language $\mathcal{L}$ as a whole.) Note that for convex $\mathbb{E}$, $\mathbb{E}{\upharpoonright}_n$ is convex for all n ∈ ℕ, and that if g is inclusive then $H_g^n$ is strictly concave on $\mathbb{P}_n$ and hence on $\mathbb{E}{\upharpoonright}_n$. Hence $\mathbb{P}_n^{\dagger}$ contains a unique element, which we will denote by $P_n^{\dagger}$.

Let us consider the set of limit points of the entropy maximisers on finite sublanguages:

Definition 11 (Entropy limit). A probability function is a limit point of the entropy maximisers on finite sublanguages if it is arbitrarily close to infinitely many such maximisers. We denote the set of such limit points by:

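$$\mathbb{P}^{\infty} := \{P \in \mathbb{P} : \forall \epsilon > 0,\ \exists \text{ infinite } I \subseteq \mathbb{N},\ \forall n \in I,\ \forall \varphi \in S\mathcal{L}_n,\ |P(\varphi) - P_n^{\dagger}(\varphi)| < \epsilon\}.$$

Whenever $\mathbb{P}^{\infty}$ consists of only a single function we shall denote that function by $P^{\infty}$ and refer to it as the entropy limit.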

One important desideratum for a procedure for choosing a rational belief function, particularly in the context of inductive logic, is language invariance. We shall consider two notions of language invariance: the following notion defined in terms of finite sublanguages, and a second form of language invariance, introduced in Definition 23, which we term infinite-language invariance.

Definition 12 (Finite-language invariant weighting function). A weighting function $g : \Pi \to \mathbb{R}_{\geq 0}$ is finite-language invariant, if and only if the following holds: for all $\mathbb{E}$ finitely generated by constraints on $\mathcal{L}_K$, if n and m are such that K ≤ n ≤ m, then for all $Q \in \arg\sup_{P\in\mathbb{E}} H_g^n(P)$ there exists some $R \in \arg\sup_{P\in\mathbb{E}} H_g^m(P)$ such that $Q{\upharpoonright}_n = R{\upharpoonright}_n$.

4.2.1. The Standard Entropy Limit

Standard entropy, i.e., entropy with respect to the standard weighting gΩ, is the subject of a substantial literature. We here collect the features of standard entropy most relevant for our purposes.

Firstly, gΩ is finite-language invariant; see, e.g., [7]. If $\mathbb{E}$ is finitely generated and g = gΩ, then $\mathbb{P}_n^{\dagger}$ contains a unique element. Furthermore, there exists a unique function $P \in [\mathbb{E}]$ such that for all n ≥ K, $P{\upharpoonright}_n \in \mathbb{P}_n^{\dagger}$ holds. This function P is the entropy limit with respect to the standard weighting gΩ; it will be called the standard entropy limit and denoted by $P_\Omega^{\dagger}$. Henceforth we use $P_\Omega^{\dagger}$ to denote the standard entropy limit on $\mathcal{L}$, rather than on Ω as in Section 2.

Definition 13 (Open-minded belief function). We say that a belief function B B is open-minded on , if and only if for all φ S for which there exists some P [ E ] such that P (φ) > 0 it holds that B(φ) > 0. For = we say that the belief function B B is open-minded.

The following proposition lists two further important properties of $P_\Omega^{\dagger}$ of which we shall make frequent use; see [7] (p. 95) for a proof of the first property.

Proposition 5. $P_\Omega^{\dagger}$ satisfies the following properties:

  • $P_\Omega^{\dagger}$ is open-minded.

  • For a finitely generated $\mathbb{E}$, for all n ≥ K and all ν ∈ Ωn, ω ∈ ΩK with ν ⊨ ω it holds that $P_\Omega^{\dagger}(\nu) = P_\Omega^{\dagger}(\omega)\,\frac{|\Omega_K|}{|\Omega_n|}$.

The second property will follow from Proposition 9 and from the fact that gΩ is language invariant. Let ν be a consistent conjunction of pairwise different literals such that ν ⊨ ω for some n-state ω with n ≥ K. Denoting by |ν| and |ω| the number of literals in ν and ω respectively, it follows from the second property in Proposition 5 that $P_\Omega^{\dagger}(\nu) = P_\Omega^{\dagger}(\omega)\,\frac{2^{|\omega|}}{2^{|\nu|}}$.

4.2.2. General Entropies

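The question remains as to how the functions on $\mathcal{L}$ with maximal entropy, i.e., the members of maxent $\mathbb{E}$, relate to the entropy maximisers $P_n^{\dagger} \in \mathbb{P}_n^{\dagger}$ on the finite sublanguages $\mathcal{L}_n$. We shall explore this question here.

Proposition 6. $\mathbb{P}^{\infty} \subseteq [\mathbb{E}]$.

Proof. Let $P \in \mathbb{P}^{\infty}$. Thus, for all sentences $\varphi \in S\mathcal{L}$, P(φ) is the limit of a sequence $(P_n^{\dagger}(\varphi))_{n\in I}$ such that $P_n^{\dagger} \in [\mathbb{E}{\upharpoonright}_n]$ and I ⊆ ℕ is infinite. Since $[\mathbb{E}]$ and all the $[\mathbb{E}{\upharpoonright}_n]$ are closed, $P \in [\mathbb{E}]$. □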

Of particular interest is the most equivocal probability function in ℙ, which is called the equivocator and denoted by P=. P= is uniquely defined by the requirement that for all n ∈ ℕ it assigns all n-states ω ∈ Ωn the same probability, $P_=(\omega) = \frac{1}{|\Omega_n|}$. The restriction of P= to ℙn is denoted by P=⇂n.

In certain cases $\mathbb{P}^{\infty}$ will contain only a single limit point $P^{\infty}$.

Definition 14. [4] (Definition 16, p. 3573.) A weighting function $g_n$ on $\mathcal{L}_n$ is called equivocator-preserving, if and only if

$$\mathbb{P}_n^{\dagger} = \{Q{\upharpoonright}_n : Q \in \arg\sup_{P\in\mathbb{P}} H_g^n(P)\} = \{P_={\upharpoonright}_n\}.$$
g is called equivocator-preserving, if and only if gn is equivocator-preserving for all n ∈ ℕ.

Proposition 7. If $P_= \in [\mathbb{E}]$ and if g is symmetric and inclusive, then $\mathbb{P}^{\infty} = \{P_=\}$.

Proof. By Landes and Williamson [4] (Corollary 6, p. 3574) we have

$$\arg\sup_{P\in\mathbb{E}} H_g^n(P) = \{P : P{\upharpoonright}_n = P_={\upharpoonright}_n\}.$$

It follows that

$$\lim_{n\to\infty}\arg\sup_{P\in\mathbb{E}} H_g^n(P) = \{P_=\}$$
and hence $\mathbb{P}^{\infty} = \{P_=\}$. □

So, if g is symmetric and inclusive, then g is equivocator-preserving. In Appendix B we shall show that there exist non-symmetric g which are equivocator-preserving.

Definition 15 (State-inclusive weighting function). Given $\mathcal{L}$, we call a weighting function g : Π → [0, 1] state-inclusive on $\mathcal{L}_n$, if and only if for each state ω ∈ Ωn there exists a π ∈ Πn such that {ω} ∈ π and g(π) > 0. A weighting function g : Π → [0, 1] is state-inclusive, if and only if it is state-inclusive on each $\mathcal{L}_n$. It is eventually state-inclusive, if and only if there exists a J ∈ ℕ such that for all n ≥ J, g is state-inclusive on $\mathcal{L}_n$.

For example, if g(πn) > 0 for all n ∈ ℕ, then g is state-inclusive. Moreover, inclusive implies state-inclusive.

Lemma 1. If g is state-inclusive on $\mathcal{L}_n$, then $H_g^n$ is strictly concave on $\mathbb{P}_n$.

Proof. Let P, Q ∈ ℙn be different and λ ∈ (0, 1). Since for all π ∈ Πn we have $\sum_{F\in\pi}{}^{\circ}P(F) = 1 = \sum_{F\in\pi}{}^{\circ}Q(F)$, we find using the strict concavity of −x · log x on [0, 1]

$$\begin{aligned} H_g^n(\lambda P + (1-\lambda)Q) &= -\sum_{\pi\in\Pi_n} g(\pi)\sum_{F\in\pi}\big(\lambda\,{}^{\circ}P(F) + (1-\lambda)\,{}^{\circ}Q(F)\big)\log\big(\lambda\,{}^{\circ}P(F) + (1-\lambda)\,{}^{\circ}Q(F)\big) \\ &\geq -\sum_{\pi\in\Pi_n} g(\pi)\sum_{F\in\pi}\big(\lambda\,{}^{\circ}P(F)\log{}^{\circ}P(F) + (1-\lambda)\,{}^{\circ}Q(F)\log{}^{\circ}Q(F)\big) \\ &= \lambda H_g^n(P) + (1-\lambda)H_g^n(Q). \end{aligned}$$

The inequality is strict, if and only if there exists some π ∈ Πn with g(π) > 0 such that there is some F ∈ π with °P (F ) ≠ °Q(F). Since P, Q are different probability functions, there exists some ω ∈ Ωn such that P (ω) ≠ Q(ω). Since g is state-inclusive, g(π) > 0 for some π ∈ Πn with {ω} ∈ π. Hence, the inequality is strict. □

Proposition 8. If $\mathbb{E}$ is finitely generated, and g is eventually state-inclusive and language invariant, then $\mathbb{P}^{\infty}$ consists of a single probability function $P^{\infty}$, and for all $\varphi\in S\mathcal{L}$ it holds that $\lim_{n\to\infty}P_n^{\dagger}(\varphi) = P^{\infty}(\varphi)$.

Proof. Recall that $\mathbb{E}$ is expressible by constraints in $\mathcal{L}_K$ and let J be as in Definition 15. Let n ≥ max{J, K}.

By Lemma 1, $H_g^n$ is strictly concave on ℙn. Since $\mathbb{E}{\upharpoonright}_n$ is convex, $\arg\sup_{P\in\mathbb{E}{\upharpoonright}_n} H_g^n(P)$ contains a single element. Hence, any $Q, R \in \arg\sup_{P\in\mathbb{E}} H_g^n(P)$ agree on $S\mathcal{L}_n$.

Since g is language invariant, we have $\arg\sup_{P\in\mathbb{E}} H_g^m(P) \subseteq \arg\sup_{P\in\mathbb{E}} H_g^l(P)$ for all n ≤ l ≤ m.

For all $\varphi\in S\mathcal{L}$, there exists an s ∈ ℕ such that $\varphi\in S\mathcal{L}_s$. Hence, for l, m ≥ max{J, K, s}, $R\in\arg\sup_{P\in\mathbb{E}} H_g^m(P)$ and $Q\in\arg\sup_{P\in\mathbb{E}} H_g^l(P)$ it holds that R(φ) = Q(φ). □

For instance, standard entropy [4] (Equation 80), the substate weighting and other examples generated by Landes and Williamson [4] (Lemma 8) are eventually state-inclusive and language invariant. Note that these weighting functions are not inclusive.

Definition 16. We say that Hg is strictly concave, if and only if for all n ∈ ℕ, $H_g^n$ is strictly concave on $\mathbb{P}_n$.

Proposition 9 (Equivocation beyond n). Let $\mathbb{E}$ be finitely generated and let g be symmetric. If Hg is strictly concave, then for all n ≥ K and all ν, μ ∈ Ωn such that there exists an ω ∈ ΩK with ν ⊨ ω and μ ⊨ ω it holds that

$$P_n^{\dagger}(\nu) = P_n^{\dagger}(\mu) = P_n^{\dagger}(\omega)\cdot\frac{|\Omega_K|}{|\Omega_n|}$$
for all $P_n^{\dagger}\in\mathbb{P}_n^{\dagger}$.

We call such ν, μ ∈ Ωn extensions of ω ∈ ΩK and say that $P_n^{\dagger}$ equivocates beyond K. In particular, $P_n^{\dagger}$ equivocates beyond K up to n.

Proof. Let n > K and let $P\in[\mathbb{E}]$ be such that there exist ν, μ ∈ Ωn with P(ν) ≠ P(μ) such that there exists an ω ∈ ΩK with ν ⊨ ω and μ ⊨ ω. Assume for contradiction that $P\in\arg\sup_{R\in\mathbb{E}} H_g^n(R)$.

Now define a probability function Q by first specifying Q on the n-states. Let

$$Q(\nu) := P(\mu), \qquad Q(\mu) := P(\nu), \qquad Q(\eta) := P(\eta) \ \text{ for all } \eta\in\Omega_n\setminus\{\nu,\mu\}.$$

For a λ ∈ Ωr with r ≥ n we let $Q(\lambda) := Q(\xi)\,\frac{|\Omega_n|}{|\Omega_r|}$, where ξ ∈ Ωn is the unique n-state such that λ ⊨ ξ.

By construction, Q and P agree on $S\mathcal{L}_K$. Since $\mathbb{E}$ is finitely generated, it follows that $Q\in[\mathbb{E}]$. Furthermore, $Q{\upharpoonright}_n$ can be obtained from $P{\upharpoonright}_n$ by a renaming of n-states and it holds that $Q{\upharpoonright}_n \neq P{\upharpoonright}_n$. Since gn is symmetric it holds that $H_g^n(P) = H_g^n(Q)$. Since $[\mathbb{E}]$ is convex and $H_g^n$ is strictly concave, the mixture $\frac{1}{2}(P+Q)$ lies in $[\mathbb{E}]$ and has strictly greater n-entropy than both P and Q, so neither $P{\upharpoonright}_n$ nor $Q{\upharpoonright}_n$ can maximise $H_g^n$ over $[\mathbb{E}{\upharpoonright}_n]$.

This contradicts P maximising $H_g^n$ over $[\mathbb{E}{\upharpoonright}_n]$. □

Corollary 1. Let $\mathbb{E}$ be finitely generated. If $H_g^n$ is strictly concave on $\mathbb{P}_n$ for n ≥ K and if g is symmetric, then for n ≥ K the following maximisation problem

$$\text{maximise: } H_g^n(P) \qquad \text{subject to: } P\in[\mathbb{E}]$$
can be understood as an optimisation problem in the variables P(ω) with ω ∈ ΩK. In particular, the number of variables does not grow as n tends to infinity.

Proof. Follows immediately from the above proposition by noting that $P_n^{\dagger}\in\arg\sup_{P\in\mathbb{E}} H_g^n(P)$ equivocates beyond K up to n. □

This corollary shows that in order to compute $P_n^{\dagger}$ for n ≥ K one needs to solve an optimisation problem on ΩK. If g is not language invariant, then, in general, the objective function of the optimisation problem changes as n changes. So, in general, $P_n^{\dagger}{\upharpoonright}_K$ varies with n.

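Corollary 2. Under the assumptions of Proposition 9 it holds that for F ⊆ Ωn and ν, μ ∈ Ωn, ${}^{\circ}P_n^{\dagger}(F) = {}^{\circ}P_n^{\dagger}(F_{\nu,\mu})$, where $F_{\nu,\mu}$ is the result we obtain by replacing ν by μ and vice versa in F.

Proof. For an η ∈ Ωn denote by ωη ∈ ΩK the unique K-state such that η ⊨ ωη. Now simply note that by Proposition 9

$${}^{\circ}P_n^{\dagger}(F) = \sum_{\substack{\eta\in\Omega_n\\ \eta\in F}} P_n^{\dagger}(\eta) = \sum_{\substack{\eta\in\Omega_n\\ \eta\in F}} P_n^{\dagger}(\omega_\eta)\,\frac{|\Omega_K|}{|\Omega_n|} = \sum_{\substack{\eta\in\Omega_n\\ \eta\in F_{\nu,\mu}}} P_n^{\dagger}(\omega_\eta)\,\frac{|\Omega_K|}{|\Omega_n|} = {}^{\circ}P_n^{\dagger}(F_{\nu,\mu}).$$ □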

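Corollary 3. Let $\mathbb{E}$ be finitely generated. For all n ≥ K and all P equivocating beyond K up to n it holds for all K ≤ k ≤ n − 1 that

$$H_\Omega^{k+1}(P) = H_\Omega^{k}(P) - \log\frac{|\Omega_k|}{|\Omega_{k+1}|}.$$

If g is symmetric and Hg is strictly concave, then

$$H_\Omega^{K}(P_k^{\dagger}) - H_\Omega^{K}(P_\Omega^{\dagger}) = H_\Omega^{k+1}(P_{k+1}^{\dagger}) - H_\Omega^{k+1}(P_\Omega^{\dagger}).$$

Proof. For ν ∈ Ωk+1 let ων ∈ Ωk be the unique k-state such that ν ⊨ ων. For K ≤ k ≤ n − 1 we now find for a P equivocating beyond K up to n

$$\begin{aligned} H_\Omega^{k+1}(P) &= -\sum_{\nu\in\Omega_{k+1}} P(\nu)\log P(\nu) \\ &= -\sum_{\nu\in\Omega_{k+1}} P(\nu)\log\Big(P(\omega_\nu)\,\frac{|\Omega_k|}{|\Omega_{k+1}|}\Big) \\ &= -\log\frac{|\Omega_k|}{|\Omega_{k+1}|} - \sum_{\nu\in\Omega_{k+1}} P(\nu)\log P(\omega_\nu) \\ &= -\log\frac{|\Omega_k|}{|\Omega_{k+1}|} - \sum_{\omega\in\Omega_k}\ \sum_{\substack{\nu\in\Omega_{k+1}\\ \nu\models\omega}} P(\nu)\log P(\omega_\nu) \\ &= -\log\frac{|\Omega_k|}{|\Omega_{k+1}|} - \sum_{\omega\in\Omega_k} \log P(\omega)\Big(\sum_{\substack{\nu\in\Omega_{k+1}\\ \nu\models\omega}} P(\nu)\Big) \\ &= -\log\frac{|\Omega_k|}{|\Omega_{k+1}|} - \sum_{\omega\in\Omega_k} P(\omega)\log P(\omega) \\ &= -\log\frac{|\Omega_k|}{|\Omega_{k+1}|} + H_\Omega^{k}(P). \end{aligned}$$

The second part of the proof follows directly by observing that $P_\Omega^{\dagger}$ and $P_n^{\dagger}$ equivocate beyond K up to n by Proposition 9. □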

Corollary 4. Let $\mathbb{E}$ be finitely generated. For all n ≥ K and all P not equivocating beyond K up to n it holds that $H_\Omega^{n}(P) < H_\Omega^{K}(P) - \log\frac{|\Omega_K|}{|\Omega_n|}$.

Proof. There has to exist at least one ξ ∈ ΩK such that there exist ν, λ ∈ Ωn with ν ⊨ ξ and λ ⊨ ξ such that P(ν) ≠ P(λ). Since P is a probability function it holds that $P(\xi) = \sum_{\nu\in\Omega_n,\ \nu\models\xi} P(\nu)$. We thus find using the log-sum inequality (see, e.g., Theorem 2.7.1 in [10])

$$\begin{aligned} -P(\xi)\log\Big(\frac{|\Omega_K|}{|\Omega_n|}\Big) - P(\xi)\log P(\xi) &= -P(\xi)\log\Big(\frac{|\Omega_K|}{|\Omega_n|}P(\xi)\Big) \\ &= -\sum_{\substack{\nu\in\Omega_n\\ \nu\models\xi}} \Big(\frac{|\Omega_K|}{|\Omega_n|}P(\xi)\Big)\log\Big(\frac{|\Omega_K|}{|\Omega_n|}P(\xi)\Big) \\ &> -\sum_{\substack{\nu\in\Omega_n\\ \nu\models\xi}} P(\nu)\log P(\nu). \end{aligned}$$

If ξ ∈ ΩK is such that for all ν, λ ∈ Ωn with ν ⊨ ξ and λ ⊨ ξ it holds that P(ν) = P(λ), then the above calculation holds with the exception that the inequality is in fact an equality.

We hence find by summing over all ω ∈ ΩK

$$\begin{aligned} H_\Omega^{K}(P) - \log\Big(\frac{|\Omega_K|}{|\Omega_n|}\Big) &= -\sum_{\omega\in\Omega_K} P(\omega)\Big(\log\Big(\frac{|\Omega_K|}{|\Omega_n|}\Big) + \log P(\omega)\Big) \\ &> -\sum_{\omega\in\Omega_K}\ \sum_{\substack{\nu\in\Omega_n\\ \nu\models\omega}} P(\nu)\log P(\nu) \\ &= -\sum_{\nu\in\Omega_n} P(\nu)\log P(\nu) = H_\Omega^{n}(P). \end{aligned}$$ □
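The entropy bookkeeping in Corollaries 3 and 4 can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical probability function on four K-states, each of which has two extensions on the finer sublanguage; the helper names `shannon` and `equivocate` are illustrative and not taken from the paper.

```python
import math

def shannon(probs):
    """Shannon entropy -sum p log p (natural log), with 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def equivocate(probs, factor):
    """Split every state into `factor` extensions of equal probability."""
    return [p / factor for p in probs for _ in range(factor)]

# Hypothetical probabilities of the four K-states (an assumption for illustration).
coarse = [0.4, 0.3, 0.2, 0.1]
factor = 2  # plays the role of |Omega_n| / |Omega_K|

# Corollary 3: equivocating beyond K adds exactly log(|Omega_n| / |Omega_K|).
even = equivocate(coarse, factor)
gain_even = shannon(even) - shannon(coarse)

# Corollary 4: an uneven split of the first K-state falls strictly short of that.
uneven = [0.3, 0.1] + even[2:]
gain_uneven = shannon(uneven) - shannon(coarse)

print(gain_even, gain_uneven, math.log(factor))
```

The even split gains exactly log 2 in entropy, while the uneven split, which has the same marginals on the K-states, gains strictly less.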

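Corollary 5. Let $\mathbb{E}$ be finitely generated. If g is symmetric and if for all n ≥ K $H_g^n$ is strictly concave on $\mathbb{P}_n$, then

$$\mathbb{P}^{\infty} \neq \emptyset.$$

Proof. By Corollary 1, $P_n^{\dagger}\in[\mathbb{E}{\upharpoonright}_n]$ is uniquely determined by $P_n^{\dagger}(\omega)$ for ω ∈ ΩK. That is, we can understand $(P_n^{\dagger})_{n\in\mathbb{N}}$ as a sequence taking values in $[0,1]^{|\Omega_K|}$, and $[0,1]^{|\Omega_K|}$ is compact. Hence, the sequence $(P_n^{\dagger}{\upharpoonright}_K)_{n\in\mathbb{N}}$ has a point of accumulation, Q, with $Q\in[\mathbb{E}]$. Let I ⊆ ℕ be infinite such that $\lim_{i\in I,\ i\to\infty} P_{n_i}^{\dagger}(\omega) = Q(\omega)$ for all ω ∈ ΩK.

Recall that for n > K, $P_n^{\dagger}$ equivocates beyond K up to n. We now extend Q to a probability function in $[\mathbb{E}]$ by defining it on the n-states ν ∈ Ωn for n > K as follows: $Q(\nu) := \frac{|\Omega_K|}{|\Omega_n|}\,Q(\omega_\nu) = \frac{|\Omega_K|}{|\Omega_n|}\lim_{i\in I,\ i\to\infty} P_{n_i}^{\dagger}(\omega_\nu)$. Hence, Q equivocates beyond K.

Consider some $\varphi\in S\mathcal{L}$. It follows that there is some r ≥ K such that $\varphi\in S\mathcal{L}_r$. For ν ∈ Ωr denote by ων the unique element of ΩK such that ν ⊨ ων.

We thus find

$$\begin{aligned} \lim_{\substack{i\to\infty\\ i\in I}} P_{n_i}^{\dagger}(\varphi) &= \lim_{\substack{i\to\infty\\ i\in I}} \sum_{\substack{\nu\in\Omega_r\\ \nu\models\varphi}} P_{n_i}^{\dagger}(\nu) = \sum_{\substack{\nu\in\Omega_r\\ \nu\models\varphi}} \lim_{\substack{i\to\infty\\ i\in I}} P_{n_i}^{\dagger}(\nu) \\ &= \sum_{\substack{\nu\in\Omega_r\\ \nu\models\varphi}} \frac{|\Omega_K|}{|\Omega_r|}\lim_{\substack{i\to\infty\\ i\in I}} P_{n_i}^{\dagger}(\omega_\nu) = \sum_{\substack{\nu\in\Omega_r\\ \nu\models\varphi}} \frac{|\Omega_K|}{|\Omega_r|}\,Q(\omega_\nu) = \sum_{\substack{\nu\in\Omega_r\\ \nu\models\varphi}} Q(\nu) = Q(\varphi). \end{aligned}$$ □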

We now turn our attention to the calibrated functions with maximal entropy, maxent $\mathbb{E}$. Our aim is to show that $\operatorname{maxent}\mathbb{E} = \mathbb{P}^{\infty} = \{P_\Omega^{\dagger}\}$ holds for regular g.

Lemma 2. If g is regular, then

$$\lim_{n\to\infty}\log(|\Omega_n|)\sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi) = 0.$$

Proof. Since g is total, it is in particular defined for the language $\mathcal{L}^U$ which contains only a single relation symbol, which is unary. When needed, we shall add a superscript U to express that we consider $\mathcal{L}^U$.

Now define a sequence $(a_n)_{n\in\mathbb{N}}$ by

$$a_n := \sum_{\pi\in\Pi_n^U\setminus\{\pi_n\}} g(\pi).$$

By the Cauchy condensation test [11] (p. 61, Theorem 3.27) for (not necessarily strictly) decreasing sequences we have that

$$\sum_{n=1}^{\infty} a_n < \infty \iff \sum_{k=0}^{\infty} 2^k a_{2^k} < \infty.$$

Since the series on the left converges by the assumption on finite weights, so does the series on the right, and that implies that $\lim_{k\to\infty} 2^k a_{2^k} = 0$.

For n ∈ ℕ let k ∈ ℕ be such that $2^k \leq n < 2^{k+1}$. Since $a_n$ is (not necessarily strictly) decreasing, $a_n \leq a_{2^k}$. Hence,

$$0 \leq n\,a_n \leq 2^{k+1} a_n \leq 2^{k+1} a_{2^k} = 2\,(2^k a_{2^k}).$$

The right hand side converges to 0 by Cauchy’s condensation test (6). Thus,

$$0 = \lim_{n\to\infty} n\,a_n = \lim_{n\to\infty} n\log_2(2)\,a_n = \lim_{n\to\infty} \log_2(2^n)\,a_n = \lim_{n\to\infty} \log_2(|\Omega_n^U|)\,a_n = \lim_{n\to\infty} \log(|\Omega_n^U|)\,a_n = \lim_{n\to\infty} \log(|\Omega_n^U|)\sum_{\pi\in\Pi_n^U\setminus\{\pi_n\}} g(\pi).$$

Now if $\mathcal{L}$ is some other language in our sense different from $\mathcal{L}^U$, then for all n ∈ ℕ there exists an $m_n > n$ such that $|\Omega_n| = |\Omega_{m_n}^U|$. This in turn implies the existence of a canonical bijection $f_n$ identifying Πn with $\Pi_{m_n}^U$ which respects the structure of partitions.

Because g is atomic it follows that for all π ∈ Πn, g(π) = g(fn(π)) holds. Thus,

$$\sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi) = \sum_{\pi\in\Pi_{m_n}^U\setminus\{\pi_{m_n}\}} g(\pi) = a_{m_n}.$$

We then observe that the sequence $\big(\log(|\Omega_n|)\sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi)\big)_{n\in\mathbb{N}}$ is a subsequence of $\big(\log(|\Omega_n^U|)\sum_{\pi\in\Pi_n^U\setminus\{\pi_n\}} g(\pi)\big)_{n\in\mathbb{N}}$. Hence,

$$0 = \lim_{n\to\infty} \log(|\Omega_{m_n}^U|)\sum_{\pi\in\Pi_{m_n}^U\setminus\{\pi_{m_n}\}} g(\pi) = \lim_{n\to\infty} \log(|\Omega_n|)\sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi).$$ □

Lemma 3. If g is strongly refined and state-inclusive, then there exist 0 < a ≤ b < +∞ such that for all n ∈ ℕ, g(πn) ∈ [a, b].

Proof. For every ω ∈ Ω1 there exists some π ∈ Π1 which contains {ω} with g(π) > 0. π1 refines all these partitions (or π1 is that partition). Hence, g(π1) > 0.

Since state partitions on richer languages are assigned more weight it follows that g(πn) ≥ g(π1) > 0 for all n ∈ ℕ.

Trivially, $g(\pi_n) \leq \sum_{\pi\in\Pi_n} g(\pi)$. The latter is constant for all n. Hence, the sequence g(πn) is bounded from above by $\sum_{\pi\in\Pi_n} g(\pi)$.

We can thus choose a, b as follows: a := g(π1) and $b := \sum_{\pi\in\Pi_1} g(\pi)$. □

Following [4] (p. 3556) we define:

Definition 17 (Spectrum of π). The spectrum of a partition π is defined as the multi-set of sizes of the members of π. We write σ(π) to denote the spectrum of π.

In other words, if π′ can be obtained from π by permuting the states in the members of π, then σ(π) = σ(π′). If g is symmetric, then g(π) only depends on the spectrum of π.

Lemma 4. If g is symmetric, then for all n and all spectra s

$$P_= \in \arg\sup_{P\in\mathbb{P}}\ -\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F).$$

Proof. First note that $-\sum_{\pi\in\Pi_n,\ \sigma(\pi)=s} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F)$ is a concave function of P, since −x log x is a concave function for x ∈ [0, 1].

If P, P′ ∈ ℙ are such that one can be obtained from the other by a permutation of n-states, then for all spectra s

$$-\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F) = -\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\sum_{F\in\pi}{}^{\circ}P'(F)\log{}^{\circ}P'(F).$$

Hence, for all fixed spectra s, P=⇂n lies inside the contour lines of the function $-\sum_{\pi\in\Pi_n,\ \sigma(\pi)=s} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F)$ in ℙn. It follows that

$$P_= \in \arg\sup_{P\in\mathbb{P}}\ -\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F).$$ □

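Corollary 6. If g is symmetric and such that

$$\lim_{n\to\infty}\log|\Omega_n|\sum_{\substack{\pi\in\Pi_n\\ \pi\neq\pi_n}} g(\pi) = 0,$$
then for all P ∈ ℙ
$$\lim_{n\to\infty}\ -\sum_{\substack{\pi\in\Pi_n\\ \pi\neq\pi_n}} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F) = 0.$$

Proof. For a fixed spectrum s we have

$$\begin{aligned} \sup_{P\in\mathbb{E}}\ -\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F) &= -\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\sum_{F\in\pi}{}^{\circ}P_=(F)\log{}^{\circ}P_=(F) \\ &= -\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\sum_{F\in\pi}\frac{|F|}{|\Omega_n|}\log\frac{|F|}{|\Omega_n|} \\ &= -\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} \frac{g(\pi)}{|\Omega_n|}\sum_{F\in\pi}|F|\,\big(\log|F| - \log|\Omega_n|\big). \end{aligned}$$

Thus,

$$\Big|\sup_{P\in\mathbb{E}}\ -\sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F)\Big| \leq \sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} \frac{g(\pi)}{|\Omega_n|}\sum_{F\in\pi}|F|\log|\Omega_n| = \sum_{\substack{\pi\in\Pi_n\\ \sigma(\pi)=s}} g(\pi)\log|\Omega_n|.$$

Summing over all spectra now yields for all P

$$-\sum_{\substack{\pi\in\Pi_n\\ \pi\neq\pi_n}} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F) \leq \log|\Omega_n|\sum_{\substack{\pi\in\Pi_n\\ \pi\neq\pi_n}} g(\pi).$$

The claimed result follows. □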

In particular, if g is regular then the above Corollary applies, by Lemma 2.

Let us consider the application of objective Bayesianism to inductive logic (Section 3.3). It turns out that if g is regular and E is finitely generated then the functions in [ E ] with maximal entropy coincide with the entropy limits (Definition 11), and moreover there is a unique such function, the standard entropy limit:

Theorem 3. Let g be symmetric, atomic, state-inclusive and strongly refined, and $\mathbb{E}$ be finitely generated. Then

$$\operatorname{maxent}\mathbb{E} = \mathbb{P}^{\infty} = \{P_\Omega^{\dagger}\}.$$

Note that if g is also inclusive, then g is regular.

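Proof. By Lemma 3 there exist 0 < a ≤ b < +∞ such that g(πn) ∈ [a, b] for all n ∈ ℕ, and by Corollary 6 the combined weight given to all other partitions in Πn tends to zero, as n increases, fast enough that, for all P ∈ ℙ,

$$\lim_{n\to\infty}\ -\sum_{\substack{\pi\in\Pi_n\\ \pi\neq\pi_n}} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}P(F) = 0.$$

For $Q\in[\mathbb{E}]\setminus\{P_\Omega^{\dagger}\}$ there exists a minimal n with n ≥ K such that $P_\Omega^{\dagger}{\upharpoonright}_n \neq Q{\upharpoonright}_n$. Since $H_\Omega^n$ is strictly concave on $\mathbb{E}{\upharpoonright}_n$ and $P_\Omega^{\dagger}$ maximises $H_\Omega^n$ over $[\mathbb{E}{\upharpoonright}_n]$, it holds that $H_\Omega^n(P_\Omega^{\dagger}) > H_\Omega^n(Q)$. Using Corollary 3 and Corollary 4 we obtain $H_\Omega^r(P_\Omega^{\dagger}) - H_\Omega^r(Q) \geq H_\Omega^n(P_\Omega^{\dagger}) - H_\Omega^n(Q)$ for r ≥ n. Thus,

$$\begin{aligned} H_g^r(P_\Omega^{\dagger}) - H_g^r(Q) ={}& g(\pi_r)H_\Omega^r(P_\Omega^{\dagger}) - \sum_{\substack{\pi\in\Pi_r\\ \pi\neq\pi_r}} g(\pi)\sum_{F\in\pi}{}^{\circ}P_\Omega^{\dagger}(F)\log{}^{\circ}P_\Omega^{\dagger}(F) \\ & - g(\pi_r)H_\Omega^r(Q) + \sum_{\substack{\pi\in\Pi_r\\ \pi\neq\pi_r}} g(\pi)\sum_{F\in\pi}{}^{\circ}Q(F)\log{}^{\circ}Q(F) \\ \geq{}& g(\pi_r)\big(H_\Omega^n(P_\Omega^{\dagger}) - H_\Omega^n(Q)\big) - \sum_{\substack{\pi\in\Pi_r\\ \pi\neq\pi_r}} g(\pi)\sum_{F\in\pi}{}^{\circ}P_\Omega^{\dagger}(F)\log{}^{\circ}P_\Omega^{\dagger}(F) \\ & + \sum_{\substack{\pi\in\Pi_r\\ \pi\neq\pi_r}} g(\pi)\sum_{F\in\pi}{}^{\circ}Q(F)\log{}^{\circ}Q(F). \end{aligned}$$

For large enough r the sums over the π ≠ πr become negligible. Since g(πr) is bounded away from zero, there has to exist some R ∈ ℕ with R ≥ max{K, n} such that for all r ≥ R it holds that

$$g(\pi_r)\big(H_\Omega^n(P_\Omega^{\dagger}) - H_\Omega^n(Q)\big) > -\sum_{\substack{\pi\in\Pi_r\\ \pi\neq\pi_r}} g(\pi)\sum_{F\in\pi}\big({}^{\circ}P_\Omega^{\dagger}(F)\log{}^{\circ}P_\Omega^{\dagger}(F) + {}^{\circ}Q(F)\log{}^{\circ}Q(F)\big).$$

Hence, for all large enough r it holds that $H_g^r(P_\Omega^{\dagger}) - H_g^r(Q) > 0$.

Thus, $\operatorname{maxent}\mathbb{E} = \{P_\Omega^{\dagger}\}$.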

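For the second part of the proof we show that for all r ∈ ℕ and all F ⊆ Ωr it holds that

$$\lim_{n\to\infty}\big({}^{\circ}P_n^{\dagger}(F) - {}^{\circ}P_\Omega^{\dagger}(F)\big) = 0.$$

Observe that for all n ∈ ℕ

$$\begin{aligned} \big|H_\Omega^n(P_n^{\dagger}) - H_\Omega^n(P_\Omega^{\dagger})\big| &= \Big|H_\Omega^n(P_n^{\dagger}) - \frac{1}{g(\pi_n)}H_g^n(P_n^{\dagger}) + \frac{1}{g(\pi_n)}H_g^n(P_n^{\dagger}) - H_\Omega^n(P_\Omega^{\dagger})\Big| \\ &\leq -\sum_{\substack{\pi\in\Pi_n\\ \pi\neq\pi_n}} \frac{g(\pi)}{g(\pi_n)}\sum_{F\in\pi}{}^{\circ}P_n^{\dagger}(F)\log{}^{\circ}P_n^{\dagger}(F) + \Big|\frac{1}{g(\pi_n)}H_g^n(P_n^{\dagger}) - H_\Omega^n(P_\Omega^{\dagger})\Big|. \end{aligned}$$

The first sum tends to zero as n goes to infinity by our assumptions on g.

For the second term observe that for all ϵ > 0 there exists an N ∈ ℕ such that for all n ≥ max{N, K} and all $P\in[\mathbb{E}]$ it holds that $\big|\frac{1}{g(\pi_n)}H_g^n(P) - H_\Omega^n(P)\big| < \epsilon$. Hence, $\epsilon > \big|\sup_{P\in\mathbb{E}}\frac{1}{g(\pi_n)}H_g^n(P) - \sup_{P\in\mathbb{E}}H_\Omega^n(P)\big| = \big|\frac{1}{g(\pi_n)}H_g^n(P_n^{\dagger}) - H_\Omega^n(P_\Omega^{\dagger})\big|$. So,

$$\lim_{n\to\infty}\big(H_\Omega^n(P_n^{\dagger}) - H_\Omega^n(P_\Omega^{\dagger})\big) = 0.$$

For all n ≥ K, $P_n^{\dagger}$ and $P_\Omega^{\dagger}$ equivocate beyond K up to n (Proposition 9). Hence, it holds that $H_\Omega^n(P_n^{\dagger}) - H_\Omega^n(P_\Omega^{\dagger}) = H_\Omega^K(P_n^{\dagger}) - H_\Omega^K(P_\Omega^{\dagger})$ (Corollary 3). So,

$$\lim_{n\to\infty}\big(H_\Omega^K(P_n^{\dagger}) - H_\Omega^K(P_\Omega^{\dagger})\big) = \lim_{n\to\infty}\big(H_\Omega^n(P_n^{\dagger}) - H_\Omega^n(P_\Omega^{\dagger})\big) = 0.$$

$H_\Omega^K$ is a strictly concave and continuous function on ℙK. Hence, $\lim_{n\to\infty} P_n^{\dagger}(\omega) = P_\Omega^{\dagger}(\omega)$ for all ω ∈ ΩK. So, $\lim_{n\to\infty} P_n^{\dagger}{\upharpoonright}_K = P_\Omega^{\dagger}{\upharpoonright}_K$.

For an arbitrary n ≥ K and an F ⊆ Ωn we find, using that $P_\Omega^{\dagger}$ equivocates beyond K,

$$\begin{aligned} \lim_{k\to\infty}{}^{\circ}P_k^{\dagger}(F) &= \lim_{k\to\infty}\sum_{\substack{\nu\in\Omega_n\\ \nu\in F}} P_k^{\dagger}(\nu) = \lim_{k\to\infty}\sum_{\substack{\nu\in\Omega_n\\ \nu\in F}} \frac{|\Omega_K|}{|\Omega_n|}P_k^{\dagger}(\omega_\nu) = \sum_{\substack{\nu\in\Omega_n\\ \nu\in F}} \frac{|\Omega_K|}{|\Omega_n|}\lim_{k\to\infty}P_k^{\dagger}(\omega_\nu) \\ &= \sum_{\substack{\nu\in\Omega_n\\ \nu\in F}} \frac{|\Omega_K|}{|\Omega_n|}P_\Omega^{\dagger}(\omega_\nu) = \sum_{\substack{\nu\in\Omega_n\\ \nu\in F}} P_\Omega^{\dagger}(\nu) = {}^{\circ}P_\Omega^{\dagger}(F). \end{aligned}$$

The result for F ⊆ Ωr with r < K follows similarly. □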

4.3. Loss and Expected Loss

We shall now analyse the notion of the loss incurred by an agent with belief function $B\in\mathbb{B}$. In Section 5 we shall be interested in how degrees of belief in quantified sentences affect losses. The following definition, axioms L1–4, Theorem 4 and Proposition 12 apply within our current, quantifier-free framework, but they also apply to quantified sentences.

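Definition 18 (Independent Sublanguages). Let $B\in\mathbb{B}$ be a fixed belief function such that B(τ) = 1 for any tautology τ, and let $\mathcal{L} = \mathcal{L}_1\cup\mathcal{L}_2$ where $\mathcal{L}_1$ and $\mathcal{L}_2$ are disjoint: $\mathcal{L}_1$ and $\mathcal{L}_2$ contain the same constants, they do not have a relation symbol in common, and the union of the relation symbols in $\mathcal{L}_1$ and $\mathcal{L}_2$ equals {U1,…, Us}, the set of relation symbols in $\mathcal{L}$. We say that $\mathcal{L}_1$ and $\mathcal{L}_2$ are independent sublanguages, written $\mathcal{L}_1 \perp\!\!\!\perp_B \mathcal{L}_2$, if and only if B(ϕ1 ∧ ϕ2) = B(ϕ1) · B(ϕ2) for all $\phi_1\in S\mathcal{L}_1$ and $\phi_2\in S\mathcal{L}_2$. Let B1(ϕ1) := B(ϕ1), B2(ϕ2) := B(ϕ2).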

By analogy with the line of argument of Section 2, we shall suppose that a default loss function $L : S\mathcal{L}\times\mathbb{B}\to(-\infty,\infty]$ satisfies the following requirements. Here L(φ, B) is to be interpreted as the loss specific to φ turning out to be true, when one adopts belief function B:

  • L1. L(φ, B) = 0, if B(φ) = 1.

  • L2. L(φ, B) strictly increases as B(φ) decreases from 1 towards 0.

  • L3. L(φ, B) only depends on B(φ).

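  • L4. Losses are additive when the language is composed of independent sublanguages: if $\mathcal{L} = \mathcal{L}_1\cup\mathcal{L}_2$ for $\mathcal{L}_1 \perp\!\!\!\perp_B \mathcal{L}_2$, then L(ϕ1 ∧ ϕ2, B) = L1(ϕ1, B1) + L2(ϕ2, B2), where L1, L2 are loss functions defined on $\mathcal{L}_1$, $\mathcal{L}_2$ respectively.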

Theorem 4. If a loss function L on $S\mathcal{L}\times\mathbb{B}$ satisfies L1–4, then L(φ, B) = −k log B(φ), where the constant k > 0 does not depend on the language $\mathcal{L}$.

Proof. The proof is exactly analogous to that of Landes and Williamson [4] (Theorem 4), which gives the result in the case in which $\mathcal{L}$ is a finite propositional language. □

Since multiplication by a constant is equivalent to change of base, we can take log to be the natural logarithm. Since we will be interested in the belief functions that minimise loss, rather than in the absolute value of any particular losses, we can take k = 1 without loss of generality. Theorem 4 thus allows us to focus on the logarithmic loss function:

$$L_{\log}(\varphi, B) := -\log B(\varphi).$$
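As a quick sanity check on how the logarithmic loss relates to requirements L1, L2 and L4, the following minimal Python sketch evaluates it on a few hypothetical degrees of belief. The function name `log_loss` and the numbers are illustrative assumptions; L3 holds by construction, since the function takes only the degree of belief as its argument.

```python
import math

def log_loss(belief):
    """Logarithmic loss -log B(phi), taking k = 1; infinite when B(phi) = 0."""
    return math.inf if belief == 0 else -math.log(belief)

# L1: full belief in a sentence that turns out true incurs no loss.
loss_at_certainty = log_loss(1.0)

# L2: loss strictly increases as belief decreases from 1 towards 0.
decreasing_beliefs = [1.0, 0.75, 0.5, 0.25, 0.0]
losses = [log_loss(b) for b in decreasing_beliefs]

# L4: for independent sublanguages B(phi_1 & phi_2) = B(phi_1) * B(phi_2),
# so the loss on the conjunction is the sum of the separate losses.
b1, b2 = 0.6, 0.25  # hypothetical degrees of belief
joint_loss = log_loss(b1 * b2)
summed_loss = log_loss(b1) + log_loss(b2)

print(loss_at_certainty, losses, joint_loss, summed_loss)
```

The additivity in L4 is just the identity log(xy) = log x + log y, which is why the logarithm is singled out by Theorem 4.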

Next we define our notion of expected loss. The expectation is taken with respect to a probability function P, and we consider the expectation taken over each partition of propositions. Each partition is weighted by the given weighting function g. Attention is restricted to inclusive weighting functions, so that each belief is evaluated; if the weighting function were not inclusive then degrees of belief in some propositions would fail to contribute to the expectation.

Definition 19 (n-representation). A sentence $\theta\in S\mathcal{L}_n$ n-represents a proposition F ⊆ Ωn, if and only if F = {ω ∈ Ωn : ω ⊨ θ}. Let $\mathbb{F}\subseteq\mathcal{P}\Omega_n$ be a set of pairwise distinct propositions. We say that $\Theta\subseteq S\mathcal{L}_n$ is a set of n-representatives of $\mathbb{F}$, if and only if each sentence θ ∈ Θ n-represents a unique proposition in $\mathbb{F}$ and each proposition in $\mathbb{F}$ is n-represented by a unique sentence θ ∈ Θ.

A set ρ of n-representatives of $\mathcal{P}\Omega_n$ will be called an n-representation. We shall use ρF to denote the sentence in ρ which n-represents F. We denote by ϱn the set of all n-representations.

Note that if belief function B respects logical equivalence, then for all n ∈ ℕ, all F ⊆ Ωn and all l-representations ρ with l ≥ n it holds that B(ρF) = °B(F). Otherwise there exist an n ∈ ℕ, a proposition F ⊆ Ωn and n-representations ρ, ρ′ such that $B(\rho_F) \neq B(\rho'_F)$.

Definition 20 (n-score). Given a loss function L, an inclusive weighting function $g : \Pi \to \mathbb{R}_{\geq 0}$, n ∈ ℕ, and an n-representation ρ ∈ ϱn we define the representation-relative n-score $S_{g,\rho}^{L,n} : \mathbb{P}\times\mathbb{B}\to[-\infty,\infty]$ by:

$$S_{g,\rho}^{L,n}(P,B) := \sum_{\pi\in\Pi_n} g(\pi)\sum_{F\in\pi} P(\rho_F)\,L(\rho_F,B).$$

Define the (representation-independent) n-score $S_g^{L,n} : \mathbb{P}\times\mathbb{B}\to[-\infty,\infty]$ by

$$S_g^{L,n}(P,B) := \sup_{\rho\in\varrho_n} S_{g,\rho}^{L,n}(P,B).$$

(As a technical convenience, we shall consider loss functions and n-scores to be defined more generally, taking arguments P, B: S → [0, 1], although we will primarily be concerned with the case above where P is a probability function and B is a belief function.)

In the light of Theorem 4, we will focus exclusively on the logarithmic loss function in this paper:

$$S_{g,\rho}^{n}(P,B) := -\sum_{\pi\in\Pi_n} g(\pi)\sum_{F\in\pi} P(\rho_F)\log B(\rho_F), \qquad S_g^{n}(P,B) := \sup_{\rho\in\varrho_n} S_{g,\rho}^{n}(P,B).$$

For P ∈ ℙ we have that $P(\rho_F) = P(\rho'_F)$ for all ρ, ρ′ ∈ ϱn, since P respects logical equivalence. Hence for P, Q ∈ ℙ we have

$$S_g^{n}(P,Q) = \sup_{\rho\in\varrho_n} S_{g,\rho}^{n}(P,Q) = -\sum_{\pi\in\Pi_n} g(\pi)\sum_{F\in\pi}{}^{\circ}P(F)\log{}^{\circ}Q(F) = S_g({}^{\circ}P,{}^{\circ}Q),$$
where Sg is the propositional scoring rule introduced in Section 2, in the case Ω = Ωn. There are also connections with g-entropy $H_g^n$, defined in (5), and the propositional notion of entropy Hg, defined in Section 2:
$$S_g^{n}(P,P) = H_g^{n}(P) = H_g({}^{\circ}P).$$

If g = gΩ, we call the resulting function the standard logarithmic n-score:

$$S_\Omega^{n}(P,B) = \sup_{\rho\in\varrho_n}\ -\sum_{\omega\in\Omega_n} P(\rho_{\{\omega\}})\log B(\rho_{\{\omega\}}) = -\sum_{\omega\in\Omega_n} P(\omega)\log B(\omega),$$
where the latter equality applies if B respects logical equivalence.
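The role of the supremum over representations can be illustrated with a small Python sketch. It is a simplification under stated assumptions: only the state partition is scored (the standard weighting), each state has a short finite list of hypothetical representing sentences rather than infinitely many, and the name `standard_n_score` is illustrative. Because each state's representative can be chosen independently, the supremum is reached by picking, for each state, the sentence in which B has the lowest degree of belief.

```python
import math

def standard_n_score(p, beliefs):
    """sup over representations of -sum_w P(w) log B(rho_w), state partition only.

    `beliefs[w]` lists B's degrees of belief in several sentences that all
    represent the state w; the supremum picks the smallest one for each state.
    """
    total = 0.0
    for w, pw in p.items():
        if pw == 0:
            continue
        worst = min(beliefs[w])
        if worst == 0:
            return math.inf
        total -= pw * math.log(worst)
    return total

# Hypothetical probability function on two states.
p = {"w1": 0.5, "w2": 0.5}

# A belief function that respects logical equivalence gives equivalent
# sentences the same degree of belief ...
coherent = {"w1": [0.5, 0.5], "w2": [0.5, 0.5]}
# ... whereas this one believes one representative of w1 less than another.
incoherent = {"w1": [0.5, 0.4], "w2": [0.5, 0.5]}

print(standard_n_score(p, coherent), standard_n_score(p, incoherent))
```

For the coherent belief function the choice of representation is immaterial and the score is the familiar expected logarithmic loss; the incoherent one is penalised for its least favourable representative.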

The question arises as to how $S_g^n$, the notion of expected loss defined on a finite sublanguage $\mathcal{L}_n$, relates to loss on $\mathcal{L}$, the language as a whole. One particularly natural suggestion is that B has a better overall loss profile than B′ if the latter’s n-scores eventually dominate those of B or if the worst-case n-score incurred by B′ is eventually greater than that of B:

  • If B has lower worst-case expected loss than B′ for all sufficiently large n, then B has a better loss profile than B′.

  • If for all P ∈ ℙ, B has an expected loss which is less than or equal to that of B′, and if for some $P\in[\mathbb{E}]$, B has strictly lower expected loss than B′ for sufficiently large n, then B has a better loss profile than B′.

We make this precise as follows:

Definition 21 (Better loss profile). B has a better loss profile than B′ if and only if:

  • There exists some N ∈ ℕ such that for all n ≥ N, $\sup_{P\in\mathbb{E}} S_g^n(P,B) < \sup_{P\in\mathbb{E}} S_g^n(P,B')$, or

  • $S_g^n(P,B) \leq S_g^n(P,B') < +\infty$ for all P ∈ ℙ and all n ∈ ℕ, and there exist at least one function $Q\in[\mathbb{E}]$ and some $N_Q\in\mathbb{N}$ such that $S_g^n(Q,B) < S_g^n(Q,B')$ for all $n \geq N_Q$.

We write B ≺ B′ to denote that B has a better loss profile than B′. We will be interested in those belief functions that have the best loss profile, i.e., the minimal elements of ≺, and define:

$$\operatorname{minloss}\mathbb{B} := \{B\in\mathbb{B} : \text{there is no } B'\in\mathbb{B} \text{ such that } B'\prec B\}.$$
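The first condition of Definition 21 compares worst-case expected losses. A minimal Python sketch of that comparison on a single finite sublanguage follows. It rests on several assumptions made only for illustration: two states, a hypothetical constraint P(w1) ≥ 0.6, scoring by the state partition alone rather than by an inclusive weighting, and the helper names `expected_log_loss` and `worst_case_loss`. Since expected loss is linear in P, its supremum over a polytope of probability functions is attained at a vertex, so only the vertices are checked.

```python
import math

def expected_log_loss(p, b):
    """Expected logarithmic loss -sum_w P(w) log B(w) over states."""
    total = 0.0
    for pw, bw in zip(p, b):
        if pw > 0:
            total += math.inf if bw == 0 else -pw * math.log(bw)
    return total

def worst_case_loss(b, extreme_points):
    """Supremum of expected loss over a polytope; linear in P, so attained at a vertex."""
    return max(expected_log_loss(p, b) for p in extreme_points)

# Hypothetical calibration constraint on two states: P(w1) >= 0.6.
vertices = [(1.0, 0.0), (0.6, 0.4)]
maxent = (0.6, 0.4)  # the entropy maximiser within this constraint set
rival = (0.8, 0.2)   # a less equivocal belief function

print(worst_case_loss(maxent, vertices), worst_case_loss(rival, vertices))
```

In this toy case the entropy maximiser has the lower worst-case expected loss, and that worst case equals its own entropy; this is only a finite, single-n illustration of the comparison, not of the full relation ≺.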

Proposition 10 (Properties of ≺). The binary relation ≺ is asymmetric, partial, irreflexive and transitive.

Proof. Note that if for all P ∈ ℙ and all n ∈ ℕ it holds that $S_g^n(P,B) \leq S_g^n(P,B')$, then $\sup_{P\in\mathbb{E}} S_g^n(P,B) \leq \sup_{P\in\mathbb{E}} S_g^n(P,B')$ follows trivially. Hence, conditions 1 and 2 of Definition 21 are consistent, in the sense that the induced relation ≺ is asymmetric.

There exist different $B, B'\in\mathbb{B}$ which are not open-minded on $\mathcal{L}_1$ and thus have infinite loss on $\mathcal{L}_n$ for all n ≥ 1 (cf. Proposition 13). For example, if B(τ′) = B′(τ′) = 0 where τ′ is a tautology in $S\mathcal{L}_1$, then B and B′ have infinite expected loss for all n ∈ ℕ and all P ∈ ℙ. Thus, ≺ is only partial.

That ≺ is irreflexive follows directly from the definition.

Now consider $B_1, B_2, B_3\in\mathbb{B}$ such that $B_1\prec B_2\prec B_3$. We will consider cases to prove that $B_1\prec B_3$.

If there exist $N_{1,2}$, $N_{2,3}$ such that

$$\begin{aligned} \sup_{P\in\mathbb{E}} S_g^n(P,B_1) &< \sup_{P\in\mathbb{E}} S_g^n(P,B_2) \ \text{ for all } n\geq N_{1,2}, \\ \sup_{P\in\mathbb{E}} S_g^n(P,B_2) &< \sup_{P\in\mathbb{E}} S_g^n(P,B_3) \ \text{ for all } n\geq N_{2,3}, \end{aligned}$$
then
$$\sup_{P\in\mathbb{E}} S_g^n(P,B_1) < \sup_{P\in\mathbb{E}} S_g^n(P,B_3) \ \text{ for all } n\geq\max\{N_{1,2},N_{2,3}\}.$$

Thus, $B_1\prec B_3$.

Now assume that there exists a number $N_{1,2}$ such that $\sup_{P\in\mathbb{E}} S_g^n(P,B_1) < \sup_{P\in\mathbb{E}} S_g^n(P,B_2)$ for all $n\geq N_{1,2}$ and assume that the pair (B2, B3) satisfies the second condition of Definition 21. Then, $\sup_{P\in\mathbb{E}} S_g^n(P,B_1) < \sup_{P\in\mathbb{E}} S_g^n(P,B_3)$ for all $n\geq N_{1,2}$. Thus, $B_1\prec B_3$.

The same argument shows that if the pair (B1, B2) satisfies the second condition of Definition 21 and the pair (B2, B3) satisfies the first condition, then $B_1\prec B_3$.

Finally, suppose that the pairs (B1, B2) and (B2, B3) both satisfy the second condition of Definition 21. Then for all P ∈ ℙ and all n ∈ ℕ it holds that $S_g^n(P,B_1)\leq S_g^n(P,B_3)$. Furthermore, there has to exist a $Q\in[\mathbb{E}]$ and an $N_Q\in\mathbb{N}$ such that for all $n\geq N_Q$ it holds that $S_g^n(Q,B_1) < S_g^n(Q,B_2)$. But then $S_g^n(Q,B_1) < S_g^n(Q,B_3)$ for all $n\geq N_Q$. Thus, $B_1\prec B_3$.

Since ≺ is irreflexive and transitive it cannot contain a cycle. □

One main theme of the rest of this paper will be the search for belief functions with the best loss profile. Since the loss function L we are interested in is −log B(φ), and these values decrease monotonically as B(φ) increases from 0 to 1, it follows that, ceteris paribus, the belief functions with better loss profiles assign greater degrees of belief to sentences.

It might appear then that the normalisation (see Definition 1) would directly imply that no B ∈ 𝔹 \ ℙ could have the best loss profile. Intuitively, this might be thought to hold since the belief functions B ∈ 𝔹 \ ℙ assign smaller degrees of belief than the probability functions P ∈ ℙ. However, Equation (4) shows that some B ∈ 𝔹 \ ℙ assign greater degrees of belief than a probability function P ∈ ℙ to certain sentences in the following sense: there exists a set of sentences Φ ⊂ $S\mathcal{L}$ such that for all P ∈ ℙ it holds that ∑φ∈Φ B(φ) > ∑φ∈Φ P(φ).

While Condition 1 of Definition 21 deals with worst-case expected loss, Condition 2 deals with dominance of expected loss. Now, dominance is often used on its own to justify the Probability norm; see, e.g., de Finetti [12] (Chapter 3) and, more recently, Joyce [13,14]. So, one might think that Condition 2 is strong enough on its own to imply the Probability norm. However, this is not the case:

Proposition 11. For 𝔼 = ℙ there exist a weighting function g and a non-probabilistic belief function B ∈ 𝔹 \ ℙ such that no probability function P ∈ ℙ has a loss which dominates that of B in the sense of Condition 2.

Proof. It suffices to show that there exist a weighting g and a B ∈ 𝔹 \ ℙ such that for all Q ∈ ℙ there exist a P ∈ ℙ and infinitely many n ∈ ℕ such that $S_g^n(P, B) < S_g^n(P, Q)$.

Consider a B ∈ 𝔹 \ ℙ from Proposition 4 and consider an arbitrary Q ∈ ℙ. Then there has to exist a ν ∈ O4 such that Q(ν) ≠ B(ν). Next note that Q(¬ν) ≠ B(¬ν) follows. Then, $-\tfrac{1}{100}\log\tfrac{1}{100} - \tfrac{99}{100}\log\tfrac{99}{100} < -\tfrac{1}{100}\log Q(\nu) - \tfrac{99}{100}\log Q(\neg\nu)$ since the logarithmic scoring rule is strictly proper.

So, for P ∈ ℙ with P(ν) = 1/100 and g({ν, ¬ν}) > 0 it holds that

$$g(\{\nu,\neg\nu\})\bigl(-P(\nu)\log B(\nu) - P(\neg\nu)\log B(\neg\nu)\bigr) < g(\{\nu,\neg\nu\})\bigl(-P(\nu)\log Q(\nu) - P(\neg\nu)\log Q(\neg\nu)\bigr).$$

Next let ν1 := ¬Ut1t ∧ ¬Ut2t, ν2 := Ut1t ∧ ¬Ut2t, ν3 := ¬Ut1t ∧ Ut2t, and ν4 := Ut1t ∧ Ut2t. For n ≥ 4 let $F_n^i \subset \Omega_n$ be the unique proposition which is equivalent to νi, $F_n^i = \{\omega \in \Omega_n : \omega \models \nu_i\}$.

Now define gn for n ≥ 4 as follows:

$$g_n(\{F_n^i, \bar{F}_n^i\}) := 1 \ \text{ if } n \equiv i \bmod 4, \qquad g_n(\pi) := 0 \ \text{ else}.$$

So, for this B and this g we have found that for all Q ∈ ℙ there exist a P ∈ ℙ and infinitely many n ∈ ℕ (every fourth n) such that

$$S_g^n(P,B) = -\tfrac{1}{100}\log\tfrac{1}{100} - \tfrac{99}{100}\log\tfrac{99}{100} < -\tfrac{1}{100}\log Q(\nu) - \tfrac{99}{100}\log Q(\neg\nu) = S_g^n(P,Q).$$
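The strict inequality in this proof rests only on the strict propriety of the logarithmic scoring rule on the two-cell partition {ν, ¬ν}. A minimal numerical sketch of that fact follows; the helper name `expected_log_loss` and the grid of rival values q are illustrative assumptions, not part of the paper.

```python
import math

def expected_log_loss(p, q):
    """Expected logarithmic loss -p*log(q) - (1-p)*log(1-q) on the partition {v, not v}."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

p = 1 / 100  # P(v) = 1/100, the value used in the proof

# Loss incurred when the degrees of belief coincide with P on {v, not v}.
own_loss = expected_log_loss(p, p)

# Hypothetical rivals Q(v) = q on a grid, excluding q = p itself (k = 10).
rivals = [k / 1000 for k in range(1, 1000) if k != 10]
gaps = [expected_log_loss(p, q) - own_loss for q in rivals]
min_gap = min(gaps)  # strictly positive: every rival does strictly worse
```

Each gap is the Kullback-Leibler divergence between (p, 1 − p) and (q, 1 − q), which is why it is positive whenever q ≠ p.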

In general, determining the functions comprising minloss 𝔹 is a challenging problem, which we shall tackle in due course. However, there is one general property we can prove directly: assigning zero degree of belief to an epistemically possible sentence is irrational, in the sense that it exposes one to avoidable losses. To see this, first note that:

Proposition 12. For any 𝔼, there exists a probability function P ∈ 𝔼 which is open-minded.

Proof. The set of consistent sentences in ℒ is countable. The set

$$\Phi := \{\varphi \in S\mathcal{L} : \text{there exists a } P \in \mathbb{E} \text{ with } P(\varphi) > 0\}$$
is a subset of the set of consistent sentences and is thus countable, too. We can hence enumerate Φ by some countable index set, I, say. Note that |I| ≥ 2 since P(τ) = 1 for all P ∈ ℙ and all tautologies τ.

For all φ ∈ Φ choose some Pφ ∈ 𝔼 such that Pφ(φ) > 0. Next, for all i ∈ I pick an αi ∈ (0, 1) ⊂ ℝ such that ∑i∈I αi = 1. Since |I| ≥ 2 such αi exist.

We shall now define an open-minded function P ∈ 𝔼 by putting

$$P := \sum_{i\in I} \alpha_i P_{\varphi_i}.$$

Note that P is in 𝔼 since it is a convex combination of probability functions in the convex set 𝔼.

We next show that P is indeed open-minded. Let φ ∈ Φ be at the j-th position in the enumeration I of Φ. We now obtain P(φ) ≥ αj Pφ(φ) > 0. So, P(φ) > 0 for all φ ∈ Φ. □
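The construction in this proof can be illustrated on a finite toy case. The sketch below assumes a hypothetical four-state space and three hypothetical members of the evidence set, and uses geometric weights renormalised to sum to one; it only checks that the mixture gives positive probability to every state that some member gives positive probability.

```python
from fractions import Fraction

def mixture(members, weights):
    """Convex combination sum_i weights[i] * members[i], computed state by state."""
    states = members[0].keys()
    return {s: sum(a * P[s] for a, P in zip(weights, members)) for s in states}

# Hypothetical members of the evidence set, given by their masses on four states.
members = [
    {"w1": Fraction(1), "w2": Fraction(0), "w3": Fraction(0), "w4": Fraction(0)},
    {"w1": Fraction(1, 2), "w2": Fraction(1, 2), "w3": Fraction(0), "w4": Fraction(0)},
    {"w1": Fraction(0), "w2": Fraction(0), "w3": Fraction(1), "w4": Fraction(0)},
]

# Strictly positive weights alpha_i summing to 1 (geometric weights, renormalised).
raw = [Fraction(1, 2 ** (i + 1)) for i in range(len(members))]
weights = [r / sum(raw) for r in raw]

P = mixture(members, weights)

# States that at least one member considers possible.
possible = {s for Q in members for s, mass in Q.items() if mass > 0}
```

Here `P` is positive exactly on `possible`: it does not rule out anything that the evidence leaves open, and it stays at zero on the state that every member excludes.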

Proposition 13. B ∈ minloss 𝔹 implies that B is open-minded.

Proof. If B is not open-minded, then there exists a k ∈ ℕ and a φ ∈ $S\mathcal{L}_k$ such that B(φ) = 0 and there exists a P ∈ [𝔼] such that P(φ) > 0. Since φ ∈ $S\mathcal{L}_r$ for all r ≥ k, it holds for all r ≥ k that $\sup_{P\in\mathbb{E}} S_g^r(P,B) = +\infty$.

By Proposition 12 there exists an open-minded Q ∈ [𝔼]. Thus, $\sup_{P\in\mathbb{E}} S_g^r(P, Q) < +\infty$ for all r. Hence Q ≺ B by the first condition of Definition 21, and so B ∉ minloss 𝔹. □

Note that the above proposition does not imply that minloss 𝔹 is non-empty.

4.4. Minimax Theorems

In this section we shall relate the belief functions that have best loss profile to the probability functions that have maximal g-entropy.

It turns out that an improvement in loss profile is not necessarily accompanied by an increase in entropy (Appendix A). Nevertheless, we shall see that given appropriate conditions on g, there is a close relationship between the belief function that has the best loss profile and the probability function which has maximum entropy. On a finite sublanguage, the unique belief function with minimum worst-case expected loss is the probability function with maximum entropy (Section 4.4.1). Moreover, on the language as a whole, if the evidence set 𝔼 is finitely generated then the unique belief function with the best loss profile (i.e., the belief function that is minimal with respect to ≺) is the probability function in 𝔼 with maximal entropy (Section 4.4.2). However, this is not necessarily so when 𝔼 is not finitely generated (Section 6.1).

4.4.1. Minimax on Finite Sublanguages

Lemma 5. For all n ∈ ℕ, all P ∈ ℙ and all B ∈ 𝔹 respecting logical equivalence on $\mathcal{L}_n$ it holds that $S_g^n(P, B) = S_{g,\rho}^n(P, B)$ for all ρ ∈ ϱn.

Proof. Simply note that $S_{g,\rho}^n(P, B) = -\sum_{\pi\in\Pi_n} g(\pi) \sum_{F\in\pi} {}^{\circ}P(F) \log B(\rho F)$ does not depend on ρ ∈ ϱn. □

Lemma 6. For all inclusive g, for all n ∈ ℕ and each belief function

$$B \in \arg\inf_{B\in\mathbb{B}} \sup_{P\in\mathbb{E}} \sup_{\rho\in\varrho_n} S_{g,\rho}^n(P,B),$$

B respects logical equivalence on $\mathcal{L}_n$. Furthermore, for all such B there exists a partition π ∈ Πn such that $\sum_{F\in\pi} B(\rho F) = 1$ for all ρ ∈ ϱn.

Proof. Firstly, B cannot assign all φ ∈ $S\mathcal{L}_n$ degree of belief 0, since this would incur an infinite worst-case expected loss; and as we saw in Proposition 13, there are functions which have finite worst-case expected loss.

Assume for contradiction that a B ∈ 𝔹 does not respect logical equivalence on $\mathcal{L}_n$. Then define a function $B_{\inf} : S\mathcal{L} \to [0, 1]$ which respects logical equivalence on $\mathcal{L}_n$ by

$$B_{\inf}(\varphi) := \begin{cases} \inf\limits_{\psi \in S\mathcal{L}_n,\ \models \varphi \leftrightarrow \psi} B(\psi), & \text{if } \varphi \in S\mathcal{L}_n \\ B(\varphi), & \text{otherwise.} \end{cases}$$

The next step in this proof is to show that

$$\sup_{P\in\mathbb{E}} S_g^n(P, B_{\inf}) = \sup_{P\in\mathbb{E}} S_g^n(P, B).$$

In the second part of the proof we shall see that there is a belief function which has a strictly better worst-case expected loss than $B_{\inf}$. This then contradicts the assumption that the belief function B has best worst-case expected loss, i.e., $B \in \arg\inf_{B\in\mathbb{B}} \sup_{P\in\mathbb{E}} \sup_{\rho\in\varrho_n} S_{g,\rho}^n(P,B)$.

Since B does not respect logical equivalence on $\mathcal{L}_n$, there are logically equivalent φ, ψ ∈ $S\mathcal{L}_n$ such that B(φ) ≠ B(ψ). Thus, $B_{\inf}(\varphi) < \max\{B(\varphi), B(\psi)\}$ and hence $B_{\inf}(\varphi) + B_{\inf}(\neg\varphi) < \max\{B(\varphi), B(\psi)\} + B(\neg\varphi) \le 1$. The last inequality holds since B ∈ 𝔹. So, $B_{\inf}{\upharpoonright}_{\mathcal{L}_n} \notin \mathbb{P}_n$.

Recall that we extended the definition of scoring rules allowing the belief function to be any function defined on S taking values in [0, 1]. We shall be careful not to appeal to results that assume a normalised belief function in this situation.

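We now find for P ∈ ℙ

$$\begin{aligned} S_g^n(P,B) &= \sup_{\rho\in\varrho_n} S_{g,\rho}^n(P,B) \\ &= \sup_{\rho\in\varrho_n} -\sum_{\pi\in\Pi_n} g(\pi) \sum_{F\in\pi} P(\rho F)\log B(\rho F) \\ &= -\sum_{\pi\in\Pi_n} g(\pi) \sum_{F\in\pi} {}^{\circ}P(F) \inf_{\rho\in\varrho_n}\log B(\rho F) \\ &= -\sum_{\pi\in\Pi_n} g(\pi) \sum_{F\in\pi} P(\rho F)\log B_{\inf}(\rho F) \quad \text{for all } \rho\in\varrho_n \\ &= S_{g,\rho}^n(P,B_{\inf}) \quad \text{for all } \rho\in\varrho_n \\ &= \sup_{\rho\in\varrho_n} S_{g,\rho}^n(P,B_{\inf}) \\ &= S_g^n(P,B_{\inf}). \end{aligned}$$

Hence $\sup_{P\in\mathbb{E}} S_g^n(P,B) = \sup_{P\in\mathbb{E}} S_g^n(P,B_{\inf})$, as claimed above.

Let us now consider cases to derive a contradiction.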

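Case i There exists a π ∈ Πn such that $\sum_{F\in\pi} B_{\inf}(\rho F) = 1$.

Since $B_{\inf}$ respects logical equivalence this fact is independent of the particular ρ ∈ ϱn. Recall that we use the notation ${}^{\circ}B_{\inf} = {}^{\circ}_n B_{\inf}$ to denote the function that $B_{\inf}$ induces over propositions in Ωn, defined by ${}^{\circ}B_{\inf}(F) = B_{\inf}(\vee F)$.

With this convention we then note that ${}^{\circ}B_{\inf} \in \mathbb{B} \setminus \mathbb{P}$. Let ${}^{\circ}\mathbb{E}$ be the set of probability functions on Ωn which are in the canonical one-to-one correspondence with the probability functions in $\mathbb{E}_n$, i.e., ${}^{\circ}\mathbb{E} := \{{}^{\circ}P : P \in \mathbb{E}\}$. We thus find, using Theorem 2 to obtain the strict inequality, that:

$$\begin{aligned} \sup_{P\in\mathbb{E}} S_g^n(P,B) &= \sup_{P\in\mathbb{E}} S_g^n(P,B_{\inf}) = \sup_{P\in\mathbb{E}} S_g({}^{\circ}P, {}^{\circ}B_{\inf}) \\ &= \sup_{P\in\mathbb{E}} -\sum_{\pi\in\Pi_n} g(\pi) \sum_{F\in\pi} {}^{\circ}P(F)\log {}^{\circ}B_{\inf}(F) \\ &> \sup_{P\in\mathbb{E}} -\sum_{\pi\in\Pi_n} g(\pi) \sum_{F\in\pi} {}^{\circ}P(F)\log {}^{\circ}P_n^{\dagger}(F) \\ &= \sup_{P\in\mathbb{E}} S_g({}^{\circ}P, {}^{\circ}P_n^{\dagger}) = \sup_{P\in\mathbb{E}} S_g^n(P, P_n^{\dagger}). \end{aligned}$$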

Case ii For all π ∈ Πn and all ρ ∈ ϱn it holds that $\sum_{F\in\pi} B_{\inf}(\rho F) < 1$.

Since $B_{\inf}$ respects logical equivalence on $\mathcal{L}_n$ we may consider the induced function ${}^{\circ}B_{\inf}$ defined over propositions of Ωn. Since Πn is finite, so is the set $\{\sum_{F\in\pi} {}^{\circ}B_{\inf}(F) : \pi \in \Pi_n\}$. Thus, $\sup_{\pi\in\Pi_n} \sum_{F\in\pi} {}^{\circ}B_{\inf}(F) = 1 - \epsilon$ for some ϵ ∈ (0, 1].

Let us now define a function $B' : S\mathcal{L} \to [0, 1]$. Denote by μ ∈ (0, 1] the unique number such that for all π ∈ Πn and all ρ ∈ ϱn it holds that $\sum_{F\in\pi} \bigl(\mu + B_{\inf}(\rho F)\bigr) = \sum_{F\in\pi} \bigl(\mu + {}^{\circ}B_{\inf}(F)\bigr) \le 1$ and for at least one π ∈ Πn and one ρ ∈ ϱn we have $\sum_{F\in\pi} \bigl(\mu + B_{\inf}(\rho F)\bigr) = \sum_{F\in\pi} \bigl(\mu + {}^{\circ}B_{\inf}(F)\bigr) = 1$.

Put $B'(\varphi) := \mu + B_{\inf}(\varphi) > B_{\inf}(\varphi)$ for all φ ∈ $S\mathcal{L}_n$ and B′(φ) := 0 otherwise. Observe that B′ ∈ 𝔹 and that B′(¬τ) ≥ μ > 0 for the tautologies τ of $\mathcal{L}_n$. But then °B′ ∈ 𝔹 \ ℙ. Then for all π ∈ Πn and all P ∈ $[\mathbb{E}_n]$ we have $-\sum_{F\in\pi} P(\rho F)\log B'(\rho F) < -\sum_{F\in\pi} P(\rho F)\log B_{\inf}(\rho F)$. We now apply Theorem 2 to find the strict inequality below:

$$\sup_{P\in\mathbb{E}} S_g^n(P,B) = \sup_{P\in\mathbb{E}} S_g^n(P,B_{\inf}) \ge \sup_{P\in\mathbb{E}} S_g^n(P,B') = \sup_{P\in\mathbb{E}} S_g({}^{\circ}P, {}^{\circ}B') > \sup_{P\in\mathbb{E}} S_g({}^{\circ}P, {}^{\circ}P_n^{\dagger}) = \sup_{P\in\mathbb{E}} S_g^n(P, P_n^{\dagger}).$$

So, in Case i and in Case ii we have found that $P_n^{\dagger}$ has strictly better worst-case expected loss than B, contradicting $B \in \arg\inf_{B\in\mathbb{B}} \sup_{P\in\mathbb{E}} \sup_{\rho\in\varrho_n} S_{g,\rho}^n(P,B)$.

Finally, we need to show that for all such belief functions B there exists a π ∈ Πn such that $\sum_{F\in\pi} {}^{\circ}B(F) = 1$. Suppose for contradiction that this is not the case. Note that B respects logical equivalence on $\mathcal{L}_n$. Hence, we can define a belief function B′ ∈ 𝔹 by adding a strictly positive number μ as in Case ii. B′ has a worst-case expected loss that is less than or equal to the worst-case expected loss of B. Again, we find that °B′ ∈ 𝔹 \ ℙ and hence B′ does not have minimal worst-case expected loss. Clearly then, B cannot have minimal worst-case expected loss. Contradiction. □

Theorem 5 (Finite sublanguage minimax). For all inclusive g, all n ∈ ℕ, all $C \in \arg\inf_{B\in\mathbb{B}} \sup_{P\in\mathbb{E}} S_g^n(P,B)$ and all $Q \in \arg\sup_{P\in\mathbb{E}} H_g^n(P)$ it holds that

$$C{\upharpoonright}_{\mathcal{L}_n} = Q{\upharpoonright}_{\mathcal{L}_n} = P_n^{\dagger}.$$

Proof. From Lemma 6 we know that for every $C \in \arg\inf_{B\in\mathbb{B}} \sup_{P\in\mathbb{E}} S_g^n(P,B)$ it holds that $C{\upharpoonright}_{\mathcal{L}_n}$ respects logical equivalence on $\mathcal{L}_n$ and that ${}^{\circ}C := {}^{\circ}_n C \in \mathbb{B}$ (since C is normalised). Every probability function in ℙ respects logical equivalence (Proposition 3).

Thus, $S_g^n(P, C)$ and $S_g^n(P, P)$ collapse to $S_g({}^{\circ}P, {}^{\circ}C)$, respectively $S_g({}^{\circ}P, {}^{\circ}P)$, the logarithmic scoring rule for propositions (1).

However, for the propositional case we know from Theorem 2 that the unique g-entropy maximiser in 𝔼 is the unique worst-case expected loss minimiser in 𝔹, $P_g^{\dagger} = {}^{\circ}P_n^{\dagger}$:

$$\arg\inf_{B\in\mathbb{B}} \sup_{P\in\mathbb{E}} S_g(P,B) = \arg\sup_{P\in\mathbb{E}} H_g(P) = \{P_g^{\dagger}\}.$$

Thus, for all F ⊆ Ωn it holds that $C(\rho F) = P_g^{\dagger}(F)$ for all ρ ∈ ϱn. Hence, $C{\upharpoonright}_{\mathcal{L}_n} = Q{\upharpoonright}_{\mathcal{L}_n} = P_n^{\dagger}$. □
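The coincidence of the worst-case loss minimiser and the entropy maximiser can be checked numerically in a very small case. The sketch below makes several simplifying assumptions that are not in the paper: two states only, the standard logarithmic loss on the partition of states, a hypothetical evidence set requiring P(ω1) ∈ [0.6, 0.9], and a search restricted to probability functions on a grid.

```python
import math

def log_loss(p, q):
    """Expected log loss of believing (q, 1-q) when the chances are (p, 1-p)."""
    return -(p * math.log(q) + (1 - p) * math.log(1 - q))

def entropy(p):
    """Standard entropy of (p, 1-p), i.e. the loss of believing the chances themselves."""
    return log_loss(p, p)

LOW, HIGH = 0.6, 0.9  # hypothetical evidence: P(w1) lies in [0.6, 0.9]

def worst_case_loss(q):
    # The expected loss is linear in p, so the supremum over the interval
    # [LOW, HIGH] is attained at one of its two endpoints.
    return max(log_loss(LOW, q), log_loss(HIGH, q))

grid = [k / 100 for k in range(1, 100)]

# Belief with the smallest worst-case expected loss (searched on the grid).
minimax_q = min(grid, key=worst_case_loss)

# Probability function in the evidence set with the greatest entropy.
maxent_p = max((p for p in grid if LOW <= p <= HIGH), key=entropy)
```

Both searches return 0.6, the point of the evidence set closest to the equivocator, and the minimal worst-case loss equals the maximal entropy.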

4.4.2. Minimax for Inductive Logic

We shall now consider the language as a whole. We shall assume in this section that 𝔼 is finitely generated by constraints on $\mathcal{L}_K$. As noted in Section 3.3, this is the scenario that is of key relevance to inductive logic. Our goal is to justify the norms of objective Bayesianism by showing that the belief functions with the best loss profile are the probability functions in 𝔼 with maximum entropy.

First we shall see that this is the case if g is language invariant:

Proposition 14 (Language invariance minimax). If g is inclusive and language invariant and if 𝔼 is finitely generated, then

min loss B = max ent E = = { P } .

Proof. Note that we have = { P } from Proposition 8, in particular P n = P n for all n ≥ K.

Since g is inclusive, $H_g^n$ is strictly concave on $\mathbb{P}_n$ (Lemma 1). Hence, $P_n^{\dagger}$ is uniquely determined. By language invariance we obtain $P^{\dagger} \in \arg\sup_{P\in\mathbb{E}} S_g^n(P,P)$ for all n ≥ K. Thus, $P^{\dagger}$ ∈ maxent 𝔼.

For $Q \in [\mathbb{E}] \setminus \{P^{\dagger}\}$ there has to exist some N such that $Q{\upharpoonright}_{\mathcal{L}_n} \ne P^{\dagger}{\upharpoonright}_{\mathcal{L}_n}$ for all n ≥ N. Since $H_g^n$ is a strictly concave function on $\mathbb{P}_n$ and since $P^{\dagger}$ maximises $H_g^n$ for all n ≥ K it follows that $H_g^n(P^{\dagger}) > H_g^n(Q)$ for all n ≥ max{K, N}. Thus, Q ∉ maxent 𝔼.

From Theorem 5 we have that $P_n^{\dagger} \in \arg\inf_{B\in\mathbb{B}} \sup_{P\in\mathbb{E}} S_g^n(P,B)$ for all n ≥ K. Since 𝔼 is finitely generated and g is language invariant we have that $P^{\dagger} \in \arg\inf_{B\in\mathbb{B}} \sup_{P\in\mathbb{E}} S_g^n(P,B)$ for all n ≥ K. Thus, $P^{\dagger}$ ∈ minloss 𝔹.

For every $C \in \mathbb{B} \setminus \{P^{\dagger}\}$ there has to exist an N such that for all n ≥ N it holds that $C{\upharpoonright}_{\mathcal{L}_n} \ne P^{\dagger}{\upharpoonright}_{\mathcal{L}_n}$. For all n ≥ max{K, N} we now apply Theorem 5 to obtain $\sup_{P\in\mathbb{E}} S_g^n(P,C) > \sup_{P\in\mathbb{E}} S_g^n(P,P^{\dagger})$. Hence, C ∉ minloss 𝔹. □

This result is not entirely satisfactory, because we cannot say anything yet about whether such weighting functions exist. Indeed, it was conjectured in Landes and Williamson [4] (p. 3564) that no inclusive, symmetric and refined weighting function g is language invariant. This conjecture remains open.

Our next result says that, for the standard weighting $g_\Omega$, the probability function with the best loss profile is the standard entropy maximiser:

Proposition 15 (Standard entropy minimax). If 𝔼 is finitely generated and $g = g_\Omega$, then

$$\operatorname{minloss}\mathbb{P} = \operatorname{maxent}\mathbb{E} = \{P_\Omega^{\dagger}\}.$$

Proof. $\{P_\Omega^{\dagger}\} = \operatorname{maxent}\mathbb{E}$ follows directly from Proposition 8, since $g_\Omega$ is language-invariant and state-inclusive.

It is well-known that

$$\arg\inf_{Q\in\mathbb{P}_n} \sup_{P\in\mathbb{E}_n} S_{g_\Omega}(P,Q) = \arg\sup_{P\in\mathbb{E}_n} S_{g_\Omega}(P,P) = \{P_{g_\Omega}^{\dagger}\},$$
see for instance [15]. Hence,
$$\operatorname{minloss}\mathbb{P} = \operatorname{maxent}\mathbb{E} = \{P_\Omega^{\dagger}\}.$$

Because it only identifies probability functions with the best loss profile, rather than normalised belief functions with the best loss profile, Proposition 15 provides a justification for only two norms of objective Bayesianism, the Calibration Norm and the Equivocation Norm, under the supposition that $g = g_\Omega$. This is a useful result if there is some independent reason (such as the Dutch book argument) for taking belief functions to be probability functions. But our goal in this paper is to investigate the extent to which the notion of loss profile developed above can be used to justify all three norms at once.

We know that there are weighting functions that are regular, i.e., which are atomic, inclusive, symmetric and strongly refined. The plan of the rest of this section is to prove the following analogous minimax theorem for regular weighting functions. This says that, for any regular weighting function, the belief function with the best loss profile is the probability function in 𝔼 which has maximal standard entropy. This theorem thus justifies all three norms at once.

Theorem 6 (Regularity minimax). If g is regular and 𝔼 is finitely generated, then

minloss B = max ent E = = { P Ω } .

In order to prove this theorem we give a number of lemmata. We shall state these lemmata under more minimal conditions on g. The reader not interested in the details might always replace the stated conditions on g by: “ g is regular”.

To begin with, we shall consider only belief functions B which respect logical equivalence. (Later we shall relax this restriction.) Hence, $S_{g,\rho}^n(P,B)$ does not depend on ρ and we can ignore the particular representation ρ. This will allow us to focus on propositions.

Lemma 7. If n ≥ K, Q ∈ ℙ and if $\sup_{P\in\mathbb{E}} S_\Omega^n(P,Q)$ is finite, then it holds that

$$\sup_{P\in\mathbb{E}} S_\Omega^{n+1}(P,Q) \ge \sup_{P\in\mathbb{E}} S_\Omega^n(P,Q) + \log\frac{|\Omega_{n+1}|}{|\Omega_n|}.$$

Proof. Let $P' \in \arg\sup_{P\in\mathbb{E}} S_\Omega^n(P,Q)$. Then define P″ on Ωn+1 by $P''(\nu) := P'(\omega_\nu)\,\frac{|\Omega_n|}{|\Omega_{n+1}|}$ for all ν ∈ Ωn+1 and ων ∈ Ωn with $\nu \models \omega_\nu$. Now extend P″ arbitrarily to a function in [𝔼]. Note that $P''{\upharpoonright}_{\mathcal{L}_{n+1}} \in [\mathbb{E}_{n+1}]$ since 𝔼 is finitely generated and n ≥ K.

Since −log(x) is a strictly convex function on (0, 1] and since $Q(\omega) = \sum_{\nu\in\Omega_{n+1},\ \nu\models\omega} Q(\nu)$ for all ω ∈ Ωn, it holds for all fixed ω ∈ Ωn that $-\sum_{\nu\in\Omega_{n+1},\ \nu\models\omega} \log Q(\nu) \ge -\frac{|\Omega_{n+1}|}{|\Omega_n|}\log\Bigl(\frac{|\Omega_n|}{|\Omega_{n+1}|}Q(\omega)\Bigr)$. We now find

$$\begin{aligned} \sup_{P\in\mathbb{E}} S_\Omega^{n+1}(P,Q) &\ge S_\Omega^{n+1}(P'',Q) = -\sum_{\nu\in\Omega_{n+1}} P''(\nu)\log Q(\nu) \\ &= -\sum_{\nu\in\Omega_{n+1}} P'(\omega_\nu)\frac{|\Omega_n|}{|\Omega_{n+1}|}\log Q(\nu) \\ &\ge -\sum_{\omega\in\Omega_n} P'(\omega)\cdot\log\frac{|\Omega_n|\cdot Q(\omega)}{|\Omega_{n+1}|} \\ &= -\log\frac{|\Omega_n|}{|\Omega_{n+1}|} - \sum_{\omega\in\Omega_n} P'(\omega)\cdot\log Q(\omega) \\ &= \log\frac{|\Omega_{n+1}|}{|\Omega_n|} + \sup_{P\in\mathbb{E}} S_\Omega^n(P,Q). \end{aligned}$$
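The inequality of Lemma 7 can be sanity-checked numerically in the special case 𝔼 = ℙ, where the supremum of the expected logarithmic loss over all probability functions is attained at a point mass and equals the largest value of −log Q on a state. The sketch below is only an illustration under that assumption; the function names and the choice of r = 2 refinements per state are hypothetical.

```python
import math
import random

def worst_case_loss(q):
    """sup over all probability functions P of -sum_w P(w) log q(w); attained at a point mass."""
    return max(-math.log(x) for x in q)

def coarsen(q_fine, r):
    """Marginalise to the coarser level: each coarse state is refined by r consecutive fine states."""
    return [sum(q_fine[i:i + r]) for i in range(0, len(q_fine), r)]

def lemma7_gap(q_fine, r):
    """Left-hand side minus right-hand side of the Lemma 7 inequality (should be >= 0)."""
    q_coarse = coarsen(q_fine, r)
    return worst_case_loss(q_fine) - worst_case_loss(q_coarse) - math.log(r)

random.seed(0)
gaps = []
for _ in range(1000):
    raw = [random.random() + 1e-9 for _ in range(8)]  # strictly positive masses
    total = sum(raw)
    gaps.append(lemma7_gap([x / total for x in raw], r=2))

# Equality case: Q spreads the mass of each coarse state evenly over its refinements.
equality_gap = lemma7_gap([0.1, 0.1, 0.4, 0.4], r=2)
```

The gap vanishes exactly when Q equivocates over the refinements of its least probable coarse state, which mirrors the construction of P″ in the proof.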

Definition 22 (γ-weighting). To simplify notation we define for n ∈ ℕ and F ⊆ Ωn

$$\gamma_n(F) := \sum_{\pi\in\Pi_n,\ F\in\pi} g_n(\pi).$$

If g is symmetric, then γn(F) only depends on |F| := |{ω ∈ Ωn : ω ∈ F}| and we write γn(|F|).

In particular, since the belief function B is assumed to respect logical equivalence, we can write

$$S_g^n(P,B) = \sup_{\rho\in\varrho_n} -\sum_{F\subseteq\Omega_n} \gamma_n(F)P(\rho F)\log B(\rho F) = -\sum_{F\subseteq\Omega_n} \gamma_n(F)P(\rho F)\log B(\rho F).$$

Furthermore, we can easily characterise the set of inclusive g: g is inclusive if and only if γn(F) > 0 for all n and all F ⊆ Ωn.
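For a small set of states the γ-weighting can be computed by brute force. The sketch below enumerates all partitions of a hypothetical three-element Ω into non-empty blocks and sums a weighting over the partitions that contain a given proposition F. Both the restriction to non-empty blocks and the constant weighting `g_const` are simplifying assumptions made for illustration only.

```python
from itertools import combinations

def set_partitions(items):
    """Yield every partition of `items` into non-empty blocks, each block a frozenset."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Choose the block containing `first`, then partition what is left.
    for k in range(len(rest) + 1):
        for others in combinations(rest, k):
            block = frozenset((first,) + others)
            remaining = [x for x in rest if x not in others]
            for tail in set_partitions(remaining):
                yield [block] + tail

def gamma(F, omega, g):
    """Sum of g(pi) over all partitions pi of omega that contain the proposition F."""
    F = frozenset(F)
    return sum(g(pi) for pi in set_partitions(omega) if F in pi)

omega = ("w1", "w2", "w3")

def g_const(pi):
    """Hypothetical weighting: every partition gets weight 1 (symmetric and inclusive)."""
    return 1.0

gamma_by_size = {
    1: gamma({"w1"}, omega, g_const),
    2: gamma({"w1", "w2"}, omega, g_const),
    3: gamma(omega, omega, g_const),
}
```

With this symmetric weighting the value depends only on the size of F, as stated above, and every non-empty proposition receives a positive γ-weight, so the weighting is inclusive.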

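Lemma 8. Let g be inclusive and such that there exist 0 < a ≤ b < +∞ such that g(πn) ∈ [a, b] for all n and such that

$$\lim_{n\to\infty} \log|\Omega_n| \sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi) = 0.$$

Then

$$\mathit{Rest}_n := \sup_{P\in\mathbb{E}} S_g^n(P, P_\Omega^{\dagger}) - g(\pi_n)\,S_\Omega^n(P_\Omega^{\dagger}, P_\Omega^{\dagger}) \to 0 \quad \text{as } n \to \infty.$$

Proof. Let us first note that

$$S_g^n(P, P_\Omega^{\dagger}) - g(\pi_n)\,S_\Omega^n(P, P_\Omega^{\dagger}) = -\sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi) \sum_{F\in\pi} {}^{\circ}P(F)\log {}^{\circ}P_\Omega^{\dagger}(F).$$

Recall that $P_\Omega^{\dagger}$ is open-minded (Proposition 5). Thus, P ∈ [𝔼], F ⊆ Ωn and ${}^{\circ}P(F) > 0$ imply ${}^{\circ}P_\Omega^{\dagger}(F) > 0$. Let

$$m := \min\{P_\Omega^{\dagger}(\omega) : \omega\in\Omega_K \ \&\ P_\Omega^{\dagger}(\omega) > 0\} \in (0,1].$$

Then, for F ⊆ Ωn such that ${}^{\circ}P_\Omega^{\dagger}(F) > 0$ it holds that

$${}^{\circ}P_\Omega^{\dagger}(F) \ge \min\{P_\Omega^{\dagger}(\nu) : \nu\in\Omega_n \ \&\ P_\Omega^{\dagger}(\nu) > 0\} = m\cdot\frac{|\Omega_K|}{|\Omega_n|} \ge \frac{m}{|\Omega_n|},$$
since $P_\Omega^{\dagger}$ equivocates beyond K.

Hence, P ∈ [𝔼], F ⊆ Ωn and ${}^{\circ}P(F) > 0$ imply that ${}^{\circ}P_\Omega^{\dagger}(F) \ge \frac{m}{|\Omega_n|}$. Since $\sum_{F\in\pi} {}^{\circ}P(F) = 1$ we now find

$$\begin{aligned} 0 &\le \sup_{P\in\mathbb{E}} S_g^n(P,P_\Omega^{\dagger}) - g(\pi_n)\,S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger}) \\ &\le \sup_{P\in\mathbb{E}} g(\pi_n)\,S_\Omega^n(P,P_\Omega^{\dagger}) + \sup_{P\in\mathbb{E}} -\sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi)\sum_{F\in\pi} {}^{\circ}P(F)\log {}^{\circ}P_\Omega^{\dagger}(F) - \sup_{P\in\mathbb{E}} g(\pi_n)\,S_\Omega^n(P,P_\Omega^{\dagger}) \\ &\le \sup_{P\in\mathbb{E}} -\sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi)\sum_{F\in\pi} {}^{\circ}P(F)\log\frac{m}{|\Omega_n|} \\ &= -\log\frac{m}{|\Omega_n|} \sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi) \\ &= \bigl(\log(|\Omega_n|) - \log(m)\bigr)\cdot\sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi). \end{aligned}$$

To complete the proof, it suffices to note that this sum is eventually positive and converges to zero as n → ∞, by our assumption on g and the fact that m is constant.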

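Proposition 16. Let g be inclusive and such that there exist 0 < a ≤ b < +∞ such that g(πn) ∈ [a, b] for all n and such that

$$\lim_{n\to\infty} \log|\Omega_n| \sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi) = 0.$$

Then for all $B \in \mathbb{B} \setminus \{P_\Omega^{\dagger}\}$ that respect logical equivalence, $P_\Omega^{\dagger} \prec B$.

Proof. We shall proceed by considering cases.

Case 1 $B \in \mathbb{P} \setminus \{P_\Omega^{\dagger}\}$.

There exists an N ≥ K such that for all n ≥ N it holds that $B{\upharpoonright}_{\mathcal{L}_n} \ne P_\Omega^{\dagger}{\upharpoonright}_{\mathcal{L}_n}$. It is well-known that for all P ∈ ℙ

$$\arg\inf_{Q\in\mathbb{P}} -\sum_{\omega\in\Omega} P(\omega)\log Q(\omega) = \{P\}.$$

That is, the usual logarithmic scoring rule, when applied to probability functions P and Q, is strictly proper. Savage [16] showed that this scoring rule is not only strictly proper but also unique under the further assumption of locality, which is requirement L3 in our framework. Thus, $S_\Omega^n(P_\Omega^{\dagger}, B) - S_\Omega^n(P_\Omega^{\dagger}, P_\Omega^{\dagger}) > 0$ for all n ≥ N.

We then find by the first part of Corollary 3 and Lemma 7 for all n ≥ N that

$$\begin{aligned} \sup_{P\in\mathbb{E}} S_g^n(P,B) - \sup_{P\in\mathbb{E}} S_g^n(P,P_\Omega^{\dagger}) &= \sup_{P\in\mathbb{E}} S_g^n(P,B) - g(\pi_n)\,S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger}) - \mathit{Rest}_n \\ &\ge g(\pi_n)\sup_{P\in\mathbb{E}} S_\Omega^n(P,B) - g(\pi_n)\,S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger}) - \mathit{Rest}_n \\ &= g(\pi_n)\sup_{P\in\mathbb{E}} S_\Omega^n(P,B) - g(\pi_n)\Bigl(S_\Omega^N(P_\Omega^{\dagger},P_\Omega^{\dagger}) + \log\frac{|\Omega_n|}{|\Omega_N|}\Bigr) - \mathit{Rest}_n \\ &\ge g(\pi_n)\Bigl(\sup_{P\in\mathbb{E}} S_\Omega^N(P,B) + \log\frac{|\Omega_n|}{|\Omega_N|}\Bigr) - g(\pi_n)\Bigl(S_\Omega^N(P_\Omega^{\dagger},P_\Omega^{\dagger}) + \log\frac{|\Omega_n|}{|\Omega_N|}\Bigr) - \mathit{Rest}_n \\ &\ge g(\pi_n)\bigl(S_\Omega^N(P_\Omega^{\dagger},B) - S_\Omega^N(P_\Omega^{\dagger},P_\Omega^{\dagger})\bigr) - \mathit{Rest}_n. \end{aligned}$$

Recall from Lemma 8 that $\mathit{Rest}_n$ converges to zero. Furthermore, the sequence $(g(\pi_n))_{n\in\mathbb{N}}$ is bounded in [a, b] with a > 0. Thus, for all large enough n ∈ ℕ it holds that

$$\sup_{P\in\mathbb{E}} S_g^n(P,B) - \sup_{P\in\mathbb{E}} S_g^n(P,P_\Omega^{\dagger}) \ge g(\pi_n)\bigl(S_\Omega^N(P_\Omega^{\dagger},B) - S_\Omega^N(P_\Omega^{\dagger},P_\Omega^{\dagger})\bigr) - \mathit{Rest}_n > 0.$$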

Case 2 B ∈ 𝔹 \ ℙ.

Case 2A There exists a $P_B \in \mathbb{P}$ such that for all n and all F ⊆ Ωn it holds that ${}^{\circ}B(F) \le {}^{\circ}P_B(F)$, i.e., $P_B$ dominates B.

Case 2Ai $P_B = P_\Omega^{\dagger}$ and no other P ∈ ℙ is such that ${}^{\circ}B(F) \le {}^{\circ}P(F)$ for all n and all F ⊆ Ωn. Then for all P ∈ ℙ and all propositions F it holds that

$$\gamma_n(F)\,{}^{\circ}P(F)\bigl(-\log {}^{\circ}B(F) + \log {}^{\circ}P_\Omega^{\dagger}(F)\bigr) \ge 0.$$

Thus, for all P ∈ ℙ and all n it holds that $S_g^n(P,B) \ge S_g^n(P,P_\Omega^{\dagger})$.

Since $B \ne P_\Omega^{\dagger}$ there exists some N and a ∅ ⊂ F ⊆ ΩN such that ${}^{\circ}B(F) < {}^{\circ}P_\Omega^{\dagger}(F)$. For n > N let ∅ ⊂ Fn ⊆ Ωn be such that Fn = {ω ∈ Ωn : ω ∈ F}. Hence, for all n > N it holds that $-\log {}^{\circ}B(F_n) + \log {}^{\circ}P_\Omega^{\dagger}(F_n) > 0$. Thus, ${}^{\circ}P_\Omega^{\dagger}(F_n)\,\gamma_n(F_n)\bigl(-\log {}^{\circ}B(F_n) + \log {}^{\circ}P_\Omega^{\dagger}(F_n)\bigr) > 0$. Since g is inclusive (γn(F) > 0 for all n and all F ⊆ Ωn) it holds that $S_g^n(P_\Omega^{\dagger},B) > S_g^n(P_\Omega^{\dagger},P_\Omega^{\dagger})$ for all n ≥ N.

Applying the second condition of Definition 21 yields $P_\Omega^{\dagger} \prec B$.

Case 2Aii There exists a $P_B \in \mathbb{P}$ dominating B such that $P_B \ne P_\Omega^{\dagger}$.

Then for all n ≥ K and all P ∈ 𝔼 it holds that $S_g^n(P,B) - S_g^n(P,P_B) \ge 0$. For all large enough n it holds by Case 1 that $\sup_{P\in\mathbb{E}} S_g^n(P,P_B) - \sup_{P\in\mathbb{E}} S_g^n(P,P_\Omega^{\dagger}) > 0$. Thus, we find for all large enough n

$$\sup_{P\in\mathbb{E}} S_g^n(P,B) - \sup_{P\in\mathbb{E}} S_g^n(P,P_\Omega^{\dagger}) \ge \sup_{P\in\mathbb{E}} S_g^n(P,P_B) - \sup_{P\in\mathbb{E}} S_g^n(P,P_\Omega^{\dagger}) > 0.$$

Case 2B There does not exist a $P_B \in \mathbb{P}$ such that for all n and all F ⊆ Ωn it holds that ${}^{\circ}B(F) \le {}^{\circ}P_B(F)$.

For example, the belief functions constructed in Proposition 4 are of this form, i.e., not dominated by a probability function.

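Let us assume for contradiction that there exists an infinite set $J := \{j_1, j_2, \ldots\} \subseteq \mathbb{N}$ such that $\lim_{i\to\infty} \sum_{\omega\in\Omega_{j_i}} B(\omega) = 1$. Now define a function Q on $S\mathcal{L}$ by requiring that Q respects logical equivalence and that

$${}^{\circ}Q(F) := \lim_{i\to\infty} \sum_{\omega\in\Omega_{j_i},\ \omega\in F} B(\omega).$$

Next we show Q ∈ ℙ and ${}^{\circ}B(F) \le {}^{\circ}Q(F)$ for all F, which will allow us to derive the required contradiction.

First note that for all n it holds that

$$\sum_{\nu\in\Omega_n} Q(\nu) = \lim_{i\to\infty} \sum_{\nu\in\Omega_n}\ \sum_{\omega\in\Omega_{j_i},\ \omega\models\nu} B(\omega) = \lim_{i\to\infty} \sum_{\omega\in\Omega_{j_i}} B(\omega) = 1.$$

Furthermore, we have for all n and all F ⊆ Ωn

$${}^{\circ}Q(F) = \lim_{i\to\infty} \sum_{\omega\in\Omega_{j_i},\ \omega\in F} B(\omega) = \lim_{i\to\infty} \sum_{\nu\in\Omega_n,\ \nu\in F}\ \sum_{\omega\in\Omega_{j_i},\ \omega\models\nu} B(\omega) = \sum_{\nu\in\Omega_n,\ \nu\in F} \lim_{i\to\infty} \sum_{\omega\in\Omega_{j_i},\ \omega\models\nu} B(\omega).$$

So, Q ∈ ℙ.

Now assume that there exists a proposition F ⊆ Ωn such that ${}^{\circ}B(F) > {}^{\circ}Q(F)$. Since Q ∈ ℙ it holds that ${}^{\circ}Q(F) + {}^{\circ}Q(\bar{F}) = 1$. Note that

$$\bigl\{\{\omega\in\Omega_{j_i} : \omega\in F\}\bigr\} \cup \bigcup_{\omega\in\Omega_{j_i},\ \omega\in\bar{F}} \bigl\{\{\omega\}\bigr\}$$
is a partition in $\Pi_{j_i}$. Since we assumed that B respects logical equivalence it holds that ${}^{\circ}B(F) = B\bigl(\bigvee_{\omega\in\Omega_{j_i} :\ \omega\in F} \omega\bigr)$. Thus,
$${}^{\circ}B(F) + \sum_{\omega\in\Omega_{j_i},\ \omega\in\bar{F}} B(\omega) \le 1$$
has to hold for all large i. We now obtain the required contradiction as follows:
$$1 \ge \lim_{i\to\infty}\Bigl({}^{\circ}B(F) + \sum_{\omega\in\Omega_{j_i},\ \omega\in\bar{F}} B(\omega)\Bigr) = {}^{\circ}B(F) + {}^{\circ}Q(\bar{F}) > {}^{\circ}Q(F) + {}^{\circ}Q(\bar{F}) = 1.$$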

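Thus, there has to exist an α > 0 and an N with N ≥ K such that for all n ≥ N it holds that $\sum_{\omega\in\Omega_n} B(\omega) \le 1 - \alpha$. We have for n ≥ N that

$$\begin{aligned} \sup_{P\in\mathbb{E}} S_g^n(P,B) - \sup_{P\in\mathbb{E}} S_g^n(P,P_\Omega^{\dagger}) &= \sup_{P\in\mathbb{E}} S_g^n(P,B) - g(\pi_n)\,S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger}) - \mathit{Rest}_n \\ &\ge g(\pi_n)\Bigl(\sup_{P\in\mathbb{E}} S_\Omega^n(P,B) - S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger})\Bigr) - \mathit{Rest}_n \\ &\ge g(\pi_n)\bigl(S_\Omega^n(P_\Omega^{\dagger},B) - S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger})\bigr) - \mathit{Rest}_n. \end{aligned}$$

To complete the proof we will now show that there exists some β > 0, which depends on 𝔼 and g but does not depend on the particular n ≥ N, such that $S_\Omega^n(P_\Omega^{\dagger},B) - S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger}) > \beta$. Since g(πn) is bounded below by a > 0 and $\mathit{Rest}_n$ converges to zero, we then obtain that $\sup_{P\in\mathbb{E}} S_g^n(P,B) - \sup_{P\in\mathbb{E}} S_g^n(P,P_\Omega^{\dagger}) > 0$ for all large enough n.

We need to show that for all large enough n,

$$-\sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\log f(\omega) - S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger}) \ge \beta > 0$$
for all functions f : Ωn → [0, 1] such that $\sum_{\omega\in\Omega_n} f(\omega) \le 1 - \alpha$.

Suppose $f' \in \arg\min_f -\sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\log f(\omega)$. If $P_\Omega^{\dagger}(\omega) > 0$ and f′(ω) = 0, then $-\sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\log f'(\omega) = +\infty$. Hence, the minimum cannot obtain for such an f′. On the other hand, if f′(ω) > 0 and $P_\Omega^{\dagger}(\omega) = 0$, then there has to exist a μ ∈ Ωn \ {ω} such that $P_\Omega^{\dagger}(\mu) > 0$. Then define a function f″ such that f″(ω) := 0, f″(μ) := f′(μ) + f′(ω) > f′(μ) and f″(λ) := f′(λ) for all λ ∈ Ωn \ {ω, μ}. Then $-\sum_{\nu\in\Omega_n} P_\Omega^{\dagger}(\nu)\log f'(\nu) > -\sum_{\nu\in\Omega_n} P_\Omega^{\dagger}(\nu)\log f''(\nu)$. Again, the minimum cannot obtain for such an f′.

We may thus assume in the following that any f′ minimising the above sum satisfies: $P_\Omega^{\dagger}(\omega) > 0$ if and only if f′(ω) > 0. In particular, the function f′(ω) = 0 for all ω ∈ Ωn cannot be optimal.

Let $a_f := \sum_{\omega\in\Omega_n} f(\omega) \in (0, 1-\alpha]$. Then

$$-\sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\log f(\omega) = -\sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\Bigl(\log\frac{f(\omega)}{a_f} + \log a_f\Bigr) = -\log(a_f) - \sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\log\frac{f(\omega)}{a_f}.$$

By definition, $\sum_{\omega\in\Omega_n} \frac{f(\omega)}{a_f} = 1$. The sum in the above equation is thus the standard logarithmic scoring rule applied to the normalised function $f/a_f$, $S_\Omega^n(P, \frac{f}{a_f})$. For fixed P ∈ ℙ the minimum under this scoring rule obtains for a function which agrees with P on the states ω ∈ Ωn.

Thus, for fixed $a_f$ the function f minimising $-\sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\log f(\omega)$ is the $a_f$ multiple of $P_\Omega^{\dagger}$. In order to minimise $-\sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\log f(\omega)$, $-\log a_f$ has to be minimal. This minimum obtains for $a_f = 1 - \alpha$. We hence find the value of the minimum as

$$\inf_{\substack{f:\Omega_n\to[0,1] \\ \sum_{\omega\in\Omega_n} f(\omega)\le 1-\alpha}} -\sum_{\omega\in\Omega_n} P_\Omega^{\dagger}(\omega)\log f(\omega) = -\log(1-\alpha) + S_\Omega^n(P_\Omega^{\dagger},P_\Omega^{\dagger}).$$

β may thus be chosen as β = −log(1 − α) > 0. □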
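The last step of this proof, that a sub-normalised function with total mass at most 1 − α incurs at least −log(1 − α) more expected logarithmic loss than the probability function itself, can be checked numerically. The sketch below uses a hypothetical three-state probability function `p` and α = 0.1, and compares the claimed minimum with randomly drawn competitors.

```python
import math
import random

def log_score(p, f):
    """Expected logarithmic loss -sum_w p(w) log f(w), skipping states with p(w) = 0."""
    return -sum(pw * math.log(fw) for pw, fw in zip(p, f) if pw > 0)

def entropy(p):
    return log_score(p, p)

p = [0.5, 0.3, 0.2]  # hypothetical stand-in for the entropy maximiser on the states
alpha = 0.1

# Candidate minimiser from the proof: the (1 - alpha) multiple of p.
best = [(1 - alpha) * pw for pw in p]
claimed_min = -math.log(1 - alpha) + entropy(p)

# Random competitors f with total mass at most 1 - alpha.
random.seed(1)
competitor_scores = []
for _ in range(2000):
    raw = [random.random() + 1e-9 for _ in p]
    mass = random.uniform(0.05, 1 - alpha)
    f = [mass * x / sum(raw) for x in raw]
    competitor_scores.append(log_score(p, f))
```

The scaled function `best` attains the claimed minimum and no sampled competitor falls below it, so the excess over the entropy of `p` is at least β = −log(1 − α), independently of the number of states.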

We now drop the assumption that belief functions respect logical equivalence.

Proposition 17. If g is inclusive and such that there exist 0 < a ≤ b < +∞ such that g(πn) ∈ [a, b] for all n ∈ ℕ and such that

$$\lim_{n\to\infty} \log|\Omega_n| \sum_{\pi\in\Pi_n\setminus\{\pi_n\}} g(\pi) = 0,$$
then
$$\operatorname{minloss}\mathbb{B} = \{P_\Omega^{\dagger}\}.$$

Proof. We shall consider cases for $B \in \mathbb{B} \setminus \{P_\Omega^{\dagger}\}$. We will show that $P_\Omega^{\dagger} \prec B$ holds in all cases. Then $\operatorname{minloss}\mathbb{B} = \{P_\Omega^{\dagger}\}$ follows.

Case 1 B respects logical equivalence.

By Proposition 16 we obtain $P_\Omega^{\dagger} \prec B$.

Case 2 B does not respect logical equivalence.

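Since B does not respect logical equivalence, there exists a minimal N ∈ ℕ such that two different logically equivalent sentences φ, ψ ∈ $S\mathcal{L}_N$ are assigned different degrees of belief, i.e., B(φ) ≠ B(ψ).

We now inductively define functions $B_n : S\mathcal{L} \to [0, 1]$ for n ≥ N. First, let

$$B_N(\chi) := \begin{cases} \inf\{B(\theta) : \theta\in S\mathcal{L}_N \ \&\ \models \chi\leftrightarrow\theta\}, & \text{if } \chi\in S\mathcal{L}_N \\ B(\chi), & \text{if } \chi\in S\mathcal{L}\setminus S\mathcal{L}_N. \end{cases}$$

Now assume n > N. For all χ ∈ $S\mathcal{L}_n$ such that no θ ∈ $S\mathcal{L}_{n-1}$ is logically equivalent to χ let

$$B_n(\chi) := \inf\{B(\theta) : \theta\in S\mathcal{L}_n \ \&\ \models \chi\leftrightarrow\theta\}$$
and otherwise let
$$B_n(\chi) := \begin{cases} B_{n-1}(\theta), & \text{if } \chi\in S\mathcal{L}_n \text{ and there exists a } \theta\in S\mathcal{L}_{n-1} \text{ with } \models \chi\leftrightarrow\theta \\ B(\chi), & \text{if } \chi\in S\mathcal{L}\setminus S\mathcal{L}_n. \end{cases}$$

Note that $B_n$ is well-defined: $B_{n-1}$ respects logical equivalence on $\mathcal{L}_{n-1}$ and thus $B_{n-1}(\theta)$ does not depend on the particular sentence θ ∈ $S\mathcal{L}_{n-1}$ which is logically equivalent to χ.

By construction, $B_{n+1}$ agrees with $B_n$ on $S\mathcal{L}_n$.

Finally, let $B_I(\chi) := \lim_{n\to\infty} B_n(\chi)$. Trivially, $B_I{\upharpoonright}_{\mathcal{L}_N} = B_N{\upharpoonright}_{\mathcal{L}_N}$.

Since for all n ≥ N the $B_n$ respect logical equivalence on $\mathcal{L}_n$, $B_I$ respects logical equivalence on ℒ.

Furthermore, $B_I$ agrees with $B_n$ on the sentences of $\mathcal{L}_n$.

Now consider a χ ∈ $S\mathcal{L}$ and let k ∈ ℕ be minimal such that χ ∈ $S\mathcal{L}_k$ and consider the corresponding proposition F ⊆ Ωk. For all n ≥ max{N, k} we shall show that

$$\inf_{\rho\in\varrho_n} B(\rho F) \le B_I(\chi).$$

If k ≤ N, then for all n ≥ N it holds that $B_n(\chi) = \inf\{B(\theta) : \theta\in S\mathcal{L}_N \ \&\ \models \chi\leftrightarrow\theta\} = B_N(\chi)$. Hence, $B_I(\chi) = B_N(\chi)$. For n ≥ N there exist ρ ∈ ϱn such that ρF = χ. Thus, $\inf_{\rho\in\varrho_n} B(\rho F) \le B_N(\chi) = B_I(\chi)$.

If k > N, then there are two cases. If no θ ∈ $S\mathcal{L}_{k-1}$ is logically equivalent to χ, then $B_k(\chi) = \inf\{B(\theta) : \theta\in S\mathcal{L}_k\setminus S\mathcal{L}_{k-1} \ \&\ \models \chi\leftrightarrow\theta\}$. In which case, we find for all n ≥ k > N

$$\inf_{\rho\in\varrho_n} B(\rho F) \le \inf_{\rho\in\varrho_k} B(\rho F) = \inf\{B(\theta) : \theta\in S\mathcal{L}_k\setminus S\mathcal{L}_{k-1} \ \&\ \models \chi\leftrightarrow\theta\} = B_I(\chi).$$

In the other case there does exist some θ ∈ $S\mathcal{L}_{k-1}$ which is logically equivalent to χ. Then $B_n(\chi) = B_{k-1}(\theta)$ for all n ≥ k. So $B_I(\chi) = B_{k-1}(\theta)$. Thus, for all n ≥ max{N, k} ≥ k − 1 it is true that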

inf ρ ϱ n B ( ρ F ) inf ρ ϱ k B ( ρ F ) inf ρ ϱ max { N , k } B ( ρ F ) = inf { B ( θ ) : θ S k 1 & χ θ } = B I ( χ ) .

It thus follows for all P ∈ ℙ and all n ≥ N that

$$\begin{aligned} S_g^n(P,B) &= \sup_{\rho\in\varrho_n} S_{g,\rho}^n(P,B) = -\sum_{F\subseteq\Omega_n} \gamma_n(F)\,{}^{\circ}P(F) \inf_{\rho\in\varrho_n}\log B(\rho F) \\ &\ge -\sum_{F\subseteq\Omega_n} \gamma_n(F)\,{}^{\circ}P(F)\log B_I(\rho F) \quad \text{for all } \rho\in\varrho_n \\ &= S_g^n(P,B_I). \end{aligned}$$

Let us now note that $B_I(\varphi) < \max\{B(\varphi), B(\psi)\}$. Thus, $B_I(\varphi) + B_I(\neg\varphi) < \max\{B(\varphi), B(\psi)\} + B_I(\neg\varphi)$. Also observe that $B_I(\chi) \le B(\chi)$ for all χ ∈ $S\mathcal{L}_N$. Thus, $B_I(\neg\varphi) \le B(\neg\varphi)$. Hence,

$$B_I(\varphi) + B_I(\neg\varphi) < \max\{B(\varphi), B(\psi)\} + B_I(\neg\varphi) \le \max\{B(\varphi), B(\psi)\} + B(\neg\varphi) \le 1.$$

We infer $B_I(\varphi) + B_I(\neg\varphi) < 1$ and thus $B_I \notin \mathbb{P}$.

Case 2A $B_I \in \mathbb{B} \setminus \mathbb{P}$.

Since $B_I$ respects logical equivalence, we obtain by Proposition 16 that $P_\Omega^{\dagger} \prec B_I$. Applying (13) we obtain $P_\Omega^{\dagger} \prec B$.

Case 2B $B_I \notin \mathbb{B}$.

We shall now define a function $B_J$ assigning every proposition a value in [0, 1] as follows. Let τ ∈ $S\mathcal{L}$ be some tautology. {τ} is a partition. Since $B_I \notin \mathbb{B}$ it follows that $B_I(\tau) < 1$. Now put $B_J(\kappa) := 1 - B_I(\tau)$ for all contradictions κ ∈ $S\mathcal{L}$. Clearly, $B_J(\kappa) > 0$. For all satisfiable χ ∈ $S\mathcal{L}$ let $B_J(\chi) := B_I(\chi)$.

Note that $B_J \in \mathbb{B}$ and since $B_J(\neg\tau) > 0$ it follows that $B_J \in \mathbb{B} \setminus \mathbb{P}$. Also note that for all n ∈ ℕ and all P ∈ ℙ it holds that $S_g^n(P,B_I) = S_g^n(P,B_J)$ and so

$$S_g^n(P,B) \ge S_g^n(P,B_I) = S_g^n(P,B_J).$$

Since $B_J$ respects logical equivalence we can apply Case 2A to obtain $P_\Omega^{\dagger} \prec B_J$. But then $P_\Omega^{\dagger} \prec B$. □

Our main minimax theorem (already stated above) then follows immediately from Proposition 17 by applying Lemma 2 and Theorem 3:

Theorem 6 (Regularity minimax). If g is regular and 𝔼 is finitely generated, then

minloss B = maxent E = = { P Ω } .

If 𝔼 = ℙ, then the unique function with greatest entropy is the equivocator (Proposition 7). Thus by Theorem 6,

$$\operatorname{minloss}\mathbb{B} = \operatorname{maxent}\mathbb{P} = \{P_\Omega^{\dagger}\} = \{P_=\}.$$

Recall that P= assigns all n-states ω ∈ Ωn the same probability, $P_=(\omega) = \frac{1}{|\Omega_n|}$. So, if the agent does not possess any evidence then all n-states ω ∈ Ωn are believed to the same degree. Absence of evidence entails symmetric degrees of belief. In other words, the three norms of objective Bayesianism entail an instance of the Principle of Indifference.

Surprisingly, perhaps, symmetry of the weighting function is not necessary to guarantee this instance of the Principle of Indifference on finite sublanguages—see Appendix B.

4.5. Infinite-Language Invariance

So far, we have been working over a fixed predicate language (without quantifiers). One might wonder what would have happened if one had started out with a different such language.

We will investigate this question by considering predicate languages which contain finitely many further relation symbols and/or finitely many further constant symbols than does ℒ.

For all languages we consider here, we shall suppose that the ways the constant symbols are ordered are consistent. Furthermore, we suppose that the order types of the constant symbols are ω, the first infinite ordinal. That is, for $\mathcal{L} \subseteq \mathcal{L}^1$ let t1, t2, … be the constant symbols in ℒ and let $T^{new} := \{t_1^{new}, \ldots, t_m^{new}\}$ be the set of constant symbols in $\mathcal{L}^1$ which are not in ℒ. Then we require that the constant symbols of $\mathcal{L}^1$ are ordered such that

  • for all n ∈ ℕ, tn appears before tn+1 (consistency),

  • for all t ∈ $T^{new}$ there exists some n ∈ ℕ such that t appears before tn (order type ω).

The way the constant symbols of $\mathcal{L}^1$ are ordered can be thought of as inserting the t ∈ $T^{new}$ into the ordering of the constant symbols of ℒ.

From now on, superscripts are used to refer to such predicate languages, while subscripts continue to refer to their respective finite sublanguages. For example, $\mathcal{L}_n^1$ is the finite sublanguage of $\mathcal{L}^1$ which contains only the first n constants of $\mathcal{L}^1$. For $\mathcal{L} \subseteq \mathcal{L}^1$, in general, the set of the first n constants of ℒ may be different from the set of the first n constants of $\mathcal{L}^1$.

Definition 23 (Infinite-Language Invariance). A weighting function g is infinite-language invariant if and only if the following holds: for all ℒ and for all 𝔼 finitely generated by constraints on the finite sublanguage $\mathcal{L}_K$ of ℒ, if $\mathcal{L}^1$ and $\mathcal{L}^2$ are such that $\mathcal{L} \subseteq \mathcal{L}^1 \subseteq \mathcal{L}^2$, then for all $B \in \operatorname{minloss}\mathbb{B}^1$ there exists a $C \in \operatorname{minloss}\mathbb{B}^2$ such that $C{\upharpoonright}_{\mathcal{L}^1} = B$.

Infinite-language invariance is motivated by the thought that simply adding new constant or predicate symbols to the language should not change the inferences which are expressible in the original language ℒ. Note the following qualification: since each element of the domain is picked out by some constant symbol of ℒ, one can infer that in a language ℒ′ formed by adding constants to ℒ, there must be some constants which name the same individual.

We shall now proceed to show that the weighting functions which we focus on in this paper—the regular weighting functions—are infinite-language invariant.

Lemma 9. If $\mathbb{E}$, $\mathbb{E}'$ are non-empty and convex sets of the following form

$$\mathbb{E} \subseteq \Big\{(x_1,\ldots,x_n) \in \mathbb{R}^n : \sum_{i=1}^{n} x_i = 1 \ \&\ x_i \ge 0\Big\}$$
$$\mathbb{E}' = \big\{(y_1,z_1,y_2,z_2,\ldots,y_n,z_n) \in \mathbb{R}^{2n} : y_i, z_i \ge 0 \ \&\ (y_1+z_1,\ldots,y_n+z_n) \in \mathbb{E}\big\},$$
then for
$$\{(x_1,\ldots,x_n)\} = \arg\sup_{(x_1,\ldots,x_n)\in\mathbb{E}} \; -\sum_{i=1}^{n} x_i\log x_i$$
$$\{(y_1,z_1,\ldots,y_n,z_n)\} = \arg\sup_{(y_1,z_1,\ldots,y_n,z_n)\in\mathbb{E}'} \; -\sum_{i=1}^{n} \big(y_i\log y_i + z_i\log z_i\big)$$
it holds that $y_i = z_i = \frac{x_i}{2}$ for all 1 ≤ i ≤ n.

Proof. That the suprema are unique follows from the convexity of the sets $\mathbb{E}$, $\mathbb{E}'$ and the fact that $H_{\Omega_n}$, $H_{\Omega_{2n}}$ are strictly concave functions on ℙn, respectively, ℙ2n.

Recall that $\mathcal{L}^U$ is the language introduced in Lemma 2. $y_i = z_i = \frac{x_i}{2}$ is a direct consequence of $P_\Omega$ equivocating beyond $\mathcal{L}^U_k$ (Proposition 9). □
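The halving claim of Lemma 9 can be illustrated numerically. The following is a minimal sketch, not part of the proof: it fixes a hypothetical point (x_1, ..., x_n), splits every coordinate into a pair (y_i, z_i) with y_i + z_i = x_i, and compares the entropy of the even split with that of an uneven one. The names `entropy`, `split`, `xs` and the chosen numbers are assumptions made for illustration only.

```python
import math

def entropy(probs):
    """Shannon entropy -sum p log p, reading 0 log 0 as 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def split(xs, shares):
    """Split each x_i into y_i = s_i * x_i and z_i = (1 - s_i) * x_i."""
    out = []
    for x, s in zip(xs, shares):
        out.extend([s * x, (1 - s) * x])
    return out

xs = [0.5, 0.3, 0.2]                 # hypothetical point (x_1, ..., x_n)
even = split(xs, [0.5] * len(xs))    # y_i = z_i = x_i / 2
skewed = split(xs, [0.5, 0.4, 0.9])  # some other admissible split

h_even = entropy(even)
h_skewed = entropy(skewed)
print(h_even, h_skewed)
```

For a fixed (x_1, ..., x_n) the even split adds exactly log 2 to the entropy, and any other split adds less, which is why the maximiser on the larger set halves each coordinate of the maximiser on the smaller set.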

Theorem 7. If g is regular, then g is infinite-language invariant.

Proof. Let $\mathbb{E}$ be finitely generated by constraints expressible in $\mathcal{L}_K$. Let $\mathcal{L} \subseteq \mathcal{L}^1 \subseteq \mathcal{L}^2$. By Theorem 6 we obtain $\operatorname{minloss}\,\mathbb{B}^1 = \operatorname{maxent}\,\mathbb{E}^1 = \{P^1_\Omega\}$ and $\operatorname{minloss}\,\mathbb{B}^2 = \operatorname{maxent}\,\mathbb{E}^2 = \{P^2_\Omega\}$, where $P^1_\Omega$ and $P^2_\Omega$ are the standard entropy limits on $\mathcal{L}^1$, respectively, $\mathcal{L}^2$.

Let K2 ∈ ℕ be minimal such that $\mathcal{L}_K \subseteq \mathcal{L}^2_{K_2}$, i.e., the set of the first K2 constant symbols of $\mathcal{L}^2$ contains the constant symbols {t1, …, tK} of $\mathcal{L}$. It suffices to show that for all n ≥ K2 and all $\nu \in \Omega^1_n$ it holds that $P^1_\Omega(\nu) = P^2_\Omega(\nu)$, where $\Omega^1_n$ is the set of n-states of $\mathcal{L}^1$. Note that the constants t1, …, tK are in $\mathcal{L}^1_{K_2}$.

Since the standard entropy limit is finite-language invariant (Section 4.2.1) it follows for n ≥ K2 that $P^1_\Omega(\nu) = P^1_{\Omega_n}(\nu)$, where $\{P^1_{\Omega_n}\} = \arg\sup_{P \in \mathbb{E}^1_n} S_{\Omega_n}(P)$, and $P^2_\Omega(\nu) = P^2_{\Omega_n}(\nu)$, where $\{P^2_{\Omega_n}\} = \arg\sup_{P \in \mathbb{E}^2_n} S_{\Omega_n}(P)$.

We now obtain from Lemma 9 and Proposition 5 that, for i = 1, 2,

$$P^i_{\Omega_n}(\nu) = P_\Omega(\omega_\nu)\cdot\frac{1}{2^{|\nu|-|\omega_\nu|}}$$
where $\omega_\nu$ is the unique maximal state of $\mathcal{L}$ such that $\nu \models \omega_\nu$. Thus, $P^1_\Omega(\nu) = P^2_\Omega(\nu)$. □

So, neither adding new redundant names for individuals in the domain to $\mathcal{L}$ nor adding relation symbols which are not constrained by the agent’s evidence on $\mathcal{L}$ changes one’s rational beliefs in the sentences $\varphi \in S\mathcal{L}$.

Language invariance is an important desideratum for reasoning under uncertainty. We have seen that focussing on regular weighting functions ensures language invariance. We conjecture that, if one imposes the desiderata that g be atomic, inclusive, symmetric, refined and infinite-language invariant, then the standard entropy maximiser will be the belief function with the best loss profile. If this is the case then our results for regular weighting functions, which are strongly refined, are symptomatic of a more general phenomenon.

5. Handling Quantifiers

Thus far, we have shown that, on a language without quantifiers, if the evidence is finitely generated and the weighting function is regular, then the belief function that has the best loss profile is the probability function in $[\mathbb{E}]$ that maximises standard entropy. This provides a justification for all the norms of objective Bayesianism on a language without quantifiers.

As we shall see in Section 5.1, that the language is quantifier free was key here: on a language with quantifiers, the n-scores become infinite, which makes the comparison of loss profiles impossible. That the evidence is finitely generated is also key: we shall see in Section 6.1 that the minimax result need not hold true if the evidence is not finitely generated.

While the use of scoring rules cannot be readily adapted to a language with quantifiers, we shall see in Section 5.2 that we can nevertheless justify the norms of objective Bayesianism on such a language if we extend our notion of loss profile and add two further desiderata motivated by the application of objective Bayesianism to inductive logic: that inferences should be language invariant, and that, ceteris paribus, universal hypotheses should be afforded substantial credence.

5.1. Limits to the Minimax Approach

Here we explain why the minimax analysis adopted in Section 4 cannot be applied to the case of a language with quantifier symbols. The problem is that n-score becomes infinite, making it impossible to compare the scores of different belief functions.

There are two ways in which n-score becomes infinite. The first is through a failure of super-regularity. A probability function is super-regular, if it gives every contingent sentence positive probability. Now, many probability functions that seem eminently rational are not super-regular. For example, if one has no evidence, E = , then it is plausible that one is rationally entitled (even if not rationally compelled) to adopt the equivocator function P=, which gives each n-state the same probability, as one’s belief function. However, this probability function will give zero probability to a universally quantified sentence such as ∀xUx. More generally, if evidence is finitely generated then no inclusive, symmetric entropy maximiser will be super-regular:

Proposition 18. Let $\mathbb{E}$ be finitely generated and let g be symmetric and inclusive. If the sequence $(P_n)_{n\in\mathbb{N}}$ has a point of accumulation Q ∈ ℙ, then Q is not super-regular.

Proof. Let U be a relation symbol in the language of arity r, say. For all n ∈ ℕ let

$$\varphi_n := \bigvee_{\substack{\omega \in \Omega_n \\ \omega \models \bigwedge_{i=1}^{n} U\mathbf{t}_i}} \omega,$$
where $\mathbf{t}_i$ denotes the tuple of r repetitions of $t_i$.

If $P_K(\varphi_K) = 0$, then by the open-mindedness of entropy maximisers $P_n(\varphi_K) = 0$ for all n ≥ K. Thus, for all points of accumulation Q ∈ ℙ it holds that $Q(\varphi_K) = 0$. Hence, Q is not super-regular.

If $P_K(\varphi_K) > 0$, then we apply Proposition 9 to find that for all $l \ge n$

$$P_l(\varphi_n) = P_l(\varphi_K)\cdot\frac{|\Omega^U_K|}{|\Omega^U_n|} \le P_l(\varphi_K)\cdot 2^{K-n} \le 2^{K-n}.$$

Let Q be a point of accumulation of $(P_n)_{n\in\mathbb{N}}$ and let $(P_{n_j})_{j\in\mathbb{N}}$ be a subsequence which converges to Q. Since K is fixed we now find

$$0 \le Q(\forall x\,Ux) \overset{P3}{=} \lim_{j\to\infty} Q\Big(\bigwedge_{i=1}^{n_j} U\mathbf{t}_i\Big) = \lim_{j\to\infty}\lim_{m\to\infty} P_{n_m}\Big(\bigwedge_{i=1}^{n_j} U\mathbf{t}_i\Big) = \lim_{j\to\infty}\lim_{m\to\infty} P_{n_m}(\varphi_{n_j}) \le \lim_{j\to\infty} 2^{K-j} = 0.$$

Hence, Q is not super-regular. □
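The simplest instance of this failure of super-regularity is the equivocator in the total absence of evidence. The following minimal sketch assumes a single unary relation symbol U and counts n-states directly; the function name `equivocator_prob_all_u` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def equivocator_prob_all_u(n):
    """P_=(U t_1 & ... & U t_n) for one unary relation symbol U.

    The n-states are the 2**n truth-value assignments to U t_1, ..., U t_n,
    and the equivocator gives each of them the same probability.
    """
    states = list(product([False, True], repeat=n))
    favourable = [s for s in states if all(s)]
    return Fraction(len(favourable), len(states))

probs = [equivocator_prob_all_u(n) for n in range(1, 11)]
for n, p in enumerate(probs, start=1):
    print(n, p)
```

The values halve with every further constant, so under P3 the universally quantified sentence ∀xUx, whose probability is the limit of these values, receives probability zero although it is satisfiable.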

Now, a failure of super-regularity is not normally problematic: it is simply a well accepted fact that probability theory forces probability 0 (respectively 1) on many sentences which might be true (respectively false). For example, the strong law of large numbers and the various zero-one laws force extreme probabilities. Moreover, the issue of super-regularity did not arise on the quantifier-free language, where no contingent sentences are given probability 0 by the entropy maximisers considered above. However, a problem does emerge if we try to apply the scoring rule approach to a language with quantifiers, where super-regularity becomes pertinent. If θ is possible yet is given zero belief by belief function B then the logarithmic loss, −log B(θ), is infinite if θ turns out to be true. Hence, as long as some epistemically possible physical probability function gives positive probability to θ, belief function B will have infinite score. When scores become infinite, they cannot be readily used to compare belief functions. It is clear, for example, that some non-super-regular belief functions will have better loss profiles than others, but this will not be apparent if we define loss profiles in terms of scores. This problem appears to limit the scope of scoring rules to languages without quantifiers.

One might suggest here that the fact that non-super-regular functions lead to infinite scores merely serves to show that one should adopt a super-regular function as one’s belief function. However, there are good grounds for questioning such a conclusion. In particular, consider again the case of a total absence of evidence. As mentioned above, imposing super-regularity rules out the equivocation function P= as a viable belief function. This means that any super-regular function must, in the total absence of evidence, force a skewed distribution on the n-states, for some n. Thus, one is forced to believe some states to a greater degree than others, despite the fact that one has no evidence to distinguish any such state from any other. So super-regularity leads to very counter-intuitive consequences and the infinite score problem suggests that the scoring rule approach breaks down on languages with quantifiers.

There is a second way in which the scores become infinite when quantifiers are admitted into the language. When one admits quantifiers into the language, one introduces the possibility of infinite partitions (Example 1) and it is natural, when defining a scoring rule on such a language, to consider scores on these infinite partitions. If a weighting function is inclusive then for any sentence θ of the language, some partition containing θ will be given positive weight. If it is refined, then any partition that refines this partition will be given positive weight, including any infinite partition which refines this partition. The problem is that, even in the total absence of evidence, every belief function has infinite worst-case expected loss over such a partition:

Proposition 19. If there exists a partition π consisting of infinitely many sentences such that g(π) > 0, then for all $B \in \mathbb{B}$ it holds that

$$\sup_{P\in\mathbb{P}} \; -\sum_{\varphi\in\pi} g(\pi)\,P(\varphi)\log B(\varphi) = +\infty.$$

Proof. Let π = {φ1, φ2, … }. Let $B \in \mathbb{B}$ be arbitrary but fixed.

If there exists a φ ∈ π such that B(φ) = 0, then any P with P(φ) > 0 satisfies $-\sum_{\varphi\in\pi} g(\pi)\,P(\varphi)\log B(\varphi) = +\infty$.

Now assume that B(φn) > 0 for all n ∈ ℕ.

Since $B \in \mathbb{B}$ it holds that $\sum_{\varphi\in\pi} B(\varphi) \le 1$. Thus, there has to exist an infinite set $\mathbb{N}_B \subseteq \mathbb{N}\setminus\{1\}$ such that $n \in \mathbb{N}_B$ implies $0 < B(\varphi_n) < \frac{1}{n^{1/2}}$. Let $\{n_1^B, n_2^B, \ldots\}$ be an enumeration of $\mathbb{N}_B$. Let $\{m_2^B, m_3^B, \ldots\}$ be an enumeration of an infinite subset of $\mathbb{N}_B$ such that $0 < B(\varphi_{m_k^B}) \le \frac{1}{e^{(k^2)}} < 1$ and $m_k^B < m_{k+1}^B$ for all $k \in \mathbb{N}\setminus\{1\}$. Since the $n_k^B$ tend to infinity, such a sequence $(m_k^B)_{k\in\mathbb{N}\setminus\{1\}}$ has to exist.

Recall that $\sum_{n\in\mathbb{N}} \frac{1}{n^2} = \frac{\pi^2}{6}$. Let P be such that for k ≥ 2 it holds that

$$P(\varphi_{m_k^B}) := \frac{6}{\pi^2}\frac{1}{k^2}, \qquad P(\varphi_1) := 1 - \sum_{k=2}^{\infty} P(\varphi_{m_k^B}) = \frac{6}{\pi^2}, \qquad P(\varphi_n) := 0 \ \text{ for all } n \in \mathbb{N}\setminus\{1, m_2^B, m_3^B, \ldots\}.$$

We now explain why such a probability function P exists.

The idea is to define a measure which assigns the set of term structures which are a model of $\varphi_{m_k^B}$ the value $\frac{6}{\pi^2}\frac{1}{k^2}$ and assigns value zero to all other term structures which do not model any of the $\varphi_{m_k^B}$. The probability of an arbitrary sentence χ is then the measure assigned to all term structures in which χ holds. One has to be careful of how to set up this measure. Fortunately, the recipe for doing so is well-known.

We follow [7] (p. 164) and define a term structure of the language as a structure with domain {tn : n ∈ ℕ} in which each constant symbol tn is interpreted as itself. We use T to denote the set of term structures of the language.

Now let $\mathcal{P}(T)$ denote the power set of T and put

$$T(\theta) := \{\mathfrak{M} \in T : \mathfrak{M} \models \theta\}, \qquad R := \{T(\theta) : \theta \text{ a sentence}\} \subseteq \mathcal{P}(T).$$

For a quantified sentence ∃xθ(x) let $T(\exists x\,\theta(x)) := \bigcup_{i\in\mathbb{N}} T(\theta(t_i))$, similarly for the universal quantifier ∀.

Now let μ* be any (finitely additive and normalised to one) outer measure on $\mathcal{P}(T)$ such that $\mu^*(\varphi_{m_k^B}) = \frac{6}{\pi^2}\frac{1}{k^2}$. Particularly simple such outer measures μ* are measures which, for all $m_k$, assign the value $\frac{6}{\pi^2}\frac{1}{k^2}$ to a single particular term structure in which $\varphi_{m_k^B}$ holds.

Next, define $\overline{R}$ to be the smallest subset of $\mathcal{P}(T)$ which contains R and is closed under complements and countable unions. We now define a countably additive measure μ on $\overline{R}$ as follows: $\mu : \overline{R} \to [0, 1]$ such that μ(A) = μ*(A) for all $A \in \overline{R}$.

Letting P(θ) := μ(T(θ)) defines a probability function as shown in [7] (pp. 168–171). Furthermore, by construction $\mu^*(\varphi_i) = \frac{6}{i^2\pi^2} = P(\varphi_i)$.

Having demonstrated the existence of the required probability function P, we now show that, for this function P, B incurs an infinite loss. Intuitively, P(φn) can be obtained from the sequence $(\frac{1}{k^2})_{k\in\mathbb{N}}$ by inserting zeros and normalising by multiplying with $\frac{6}{\pi^2}$. The idea behind this definition is to ensure that for all k ∈ ℕ there exists a unique $n \in \mathbb{N}_B$ such that $P(\varphi_n) = \frac{6}{\pi^2}\frac{1}{k^2}$. Furthermore, for these $n \in \mathbb{N}_B$ it holds that $B(\varphi_n) \le \frac{1}{e^{(k^2)}}$. For all other n > 1 we ensure that P(φn) vanishes; P(φ1) is defined such that $\sum_{\varphi\in\pi} P(\varphi) = 1$ holds.

So, when P(φn) > 0 and $n \in \{m_2^B, m_3^B, \ldots\}$ we have

$$-P(\varphi_n)\log B(\varphi_n) \ge \frac{6}{\pi^2}\frac{1}{k^2}\log e^{(k^2)} = \frac{6}{\pi^2}\frac{1}{k^2}\,k^2\log e = \frac{6}{\pi^2}.$$

Finally, we obtain

$$-\sum_{\varphi\in\pi} g(\pi)\,P(\varphi)\log B(\varphi) \ge g(\pi)\sum_{m_2^B, m_3^B, \ldots} \frac{6}{\pi^2} = +\infty.$$

In particular, even the super-regular belief functions have infinite score on any such partition, so one cannot say that any super-regular function has lower overall score than a non-super-regular function. This result, then, casts further doubt on the suggestion that it might be preferable to adopt a super-regular function as one’s belief function. Moreover, it clearly suggests that an attempt to extend the minimax approach, which is based on scoring rules, to languages with quantifiers will be fraught with difficulty.
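The construction in the proof of Proposition 19 can be traced numerically. The sketch below is only an illustration under stated assumptions: it takes B at the bound exp(−k²) on the selected sentences, uses natural logarithms, omits the positive constant g(π), and works with the logarithm of the bound directly to avoid floating-point underflow. The names `p`, `log_b_bound` and `partial_loss` are hypothetical.

```python
import math

C = 6 / math.pi ** 2  # normalising constant, since sum_{n>=1} 1/n^2 = pi^2/6

def p(k):
    """P(phi_{m_k}) = (6/pi^2) * (1/k^2) for k >= 2."""
    return C / k ** 2

def log_b_bound(k):
    """Logarithm of the bound exp(-k^2) on B(phi_{m_k})."""
    return -float(k ** 2)

def partial_loss(K):
    """Sum over k = 2..K of -P(phi_{m_k}) * log B(phi_{m_k}), with B at its bound."""
    return sum(-p(k) * log_b_bound(k) for k in range(2, K + 1))

# Mass left for phi_1: 1 - sum_{k>=2} P(phi_{m_k}), approximated by a long partial sum.
mass_on_mk = sum(p(k) for k in range(2, 200001))
p_phi_1 = 1 - mass_on_mk

losses = [partial_loss(K) for K in (11, 101, 1001)]
print(p_phi_1, C)
print(losses)
```

Each selected sentence contributes the same amount 6/π² to the expected loss, so the partial sums grow linearly in the number of terms and the full sum diverges, while the probabilities themselves still sum to one.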

5.2. The Probability Norm

We have argued that there is little scope for straightforwardly extending the minimax analysis to languages with quantifiers because of the problem that scores will quickly become infinite and thus incomparable. So we need another approach, if we are to show that the Probability axioms P1-P3, as well as the Calibration and Equivocation norms, are to apply to languages with quantifiers.

Our plan of attack is as follows. First, as noted in Section 4.5, language invariance is an important desideratum. In particular, one would not want one’s degrees of belief on the sentences of a quantifier-free language to change if one were to introduce quantifiers into the language. That is, if evidence determines that one should adopt B1 as one’s belief function on the quantifier-free language and B2 as one’s belief function on the language with quantifiers, where both languages contain the same individuals and relation symbols, then one would want B1 and B2 to agree on the quantifier-free sentences, i.e., one would want that B1(θ) = B2(θ) for each quantifier-free sentence θ.

Thus far, we have argued that a belief function on the quantifier-free language, given finitely generated $\mathbb{E}$, ought to satisfy the axioms of probability P1 and P2 on that language, as well as the Calibration and Equivocation norms. Given the language invariance desideratum, this implies that the appropriate belief function on the language with quantifiers should, when restricted to quantifier-free sentences, satisfy P1, P2 and the Calibration and Equivocation norms. If we can show that the probability axioms P1-3 should also be satisfied on the language with quantifiers as a whole, then degrees of belief in the quantified sentences are uniquely determined by those on the quantifier-free sentences [7] (Theorem 11.2): there is no further role that Calibration or Equivocation can play on the quantified sentences. Thus it suffices to argue for the probability axioms on the language with quantifiers. As usual, we restrict attention to evidence sets that are finitely generated in the sense of Definition 5, i.e., $\mathbb{E}$ generated by constraints involving sentences of some finite sublanguage $\mathcal{L}_K$, and to regular weighting functions g.

In Theorem 4 we showed that the default loss incurred by adopting belief function B when φ is true is such that L(φ, B) = − log B(φ), modulo some multiplicative constant. This penalises smaller degrees of belief more than larger degrees of belief. As discussed above, there is little scope for using this to measure the overall expected loss incurred by B on the language with quantifiers, and so we cannot directly extend the notion of loss profile developed in Definition 21 to that language. However, this default loss function does suggest the following constraint:

(*) Suppose that for all sentences θ of the language with quantifiers, B(θ) ≥ B′(θ), and there is some sentence φ such that B(φ) > B′(φ). Then B has a better loss profile than B′.

In other words, if the default loss incurred by B′ dominates that incurred by B then B has a better loss profile than B′. We can use (*) to extend our notion of loss profile: the two conditions in Definition 21 apply to the quantifier-free sentences, and we add the further condition (*) to constrain the quantified sentences. We shall show that the addition of (*) goes some way towards demonstrating P1-3 on the language with quantifiers, although we shall have to add a further desideratum in order to complete the derivation.

Definition 24 (Better loss profile on the language with quantifiers). B has a better loss profile on the language with quantifiers than B′ if and only if:

  • B ≺ B′ (as defined in Definition 21), or

  • B dominates B′ on the language with quantifiers and there exists some sentence φ such that B(φ) > B′(φ).

We write B ≺* B′ to denote that B has a better loss profile on the language with quantifiers than B′. Clearly, ≺* is asymmetric. We will be interested in those belief functions that have the best loss profile on the language with quantifiers, i.e., the minimal elements of ≺*, and define:

$$\operatorname{minloss}^*\,\mathbb{B} := \{B \in \mathbb{B} : \text{there is no } B' \in \mathbb{B} \text{ such that } B' \prec^* B\}.$$

Note that if B dominates B′ on the language with quantifiers, then B ≺ B′ cannot hold. ≺ and ≺* are thus consistent.

Proposition 20. All $B \in \operatorname{minloss}^*\,\mathbb{B}$ agree with $P_\Omega$ on the quantifier-free language.

Proof. Since we assume that g is regular and that $\mathbb{E}$ is finitely generated we can apply Theorem 6 to obtain that all $B \in \operatorname{minloss}\,\mathbb{B}$ agree with $P_\Omega$ on the quantifier-free language.

The claim now follows, since B ≺ B′ implies B ≺* B′. □

Proposition 21. If $\operatorname{minloss}\,\mathbb{B} = \emptyset$, then $\operatorname{minloss}^*\,\mathbb{B} = \emptyset$.

Proof. ≺ is asymmetric, irreflexive and transitive (Proposition 10), and thus free of cycles. Hence, for all fixed $B' \in \mathbb{B}$ there exists some $B \in \mathbb{B}$ such that B ≺ B′. This implies B ≺* B′.

Hence, for all $B' \in \mathbb{B}$ there exists some $B \in \mathbb{B}$ such that B ≺* B′. We obtain $\operatorname{minloss}^*\,\mathbb{B} = \emptyset$. □

We shall use $B \in \mathbb{B}$ to denote an arbitrary but fixed belief function in $\operatorname{minloss}^*\,\mathbb{B}$. A priori, it is not clear that such a function B exists.

The rest of this section does not depend on $\mathbb{E}$, on the weighting function g, or on the particular probability function with which the $B \in \operatorname{minloss}\,\mathbb{B}$ agree on the quantifier-free language. All that matters is that there exists some probability function P with which the $B \in \operatorname{minloss}\,\mathbb{B}$ agree on the quantifier-free language. As we know, this is the case if $\mathbb{E}$ is finitely generated and g is regular.

Definition 25. A sentence φ of the language with quantifiers is called contingent, if and only if φ and ¬φ are satisfiable.

Lemma 10. For all sentences θ, φ of the language with quantifiers such that θ ⊨ φ it holds that B(φ) ≥ B(θ). In particular, B(ψ) = 0 for all contradictions ψ and B(χ) = 1 for all tautologies χ.

For quantifier-free θ, φ we have already seen that B(φ) ≥ B(θ); this followed from B satisfying P1 and P2 on the quantifier-free language.

Proof. Case 1. θ is a contradiction.

For a tautology τ, {τ, θ} is a partition. Since B(τ) = 1 and B(τ) + B(θ) ≤ 1 it follows that B(θ) = 0. Hence, B(φ) ≥ 0 = B(θ).

Case 2. θ is a tautology.

Let χ be a contradiction. We just proved that B(χ) = 0. The only constraints applying to B(θ) are of the form B(θ) + B(χ) ≤ 1 where χ is a contradiction and of the form B(θ) ≤ 1. Thus, the only meaningful constraint on B(θ) is B(θ) ≤ 1. By (*) we have B(θ) = 1.

Since θ implies φ, φ has to be a tautology, too. Hence, B(φ) = 1 = B(θ).

Case 3. θ is contingent.

If φ is a tautology, then B(φ) = 1 by the above and we are done.

Note that φ cannot be a contradiction since θ is satisfiable.

Assume from now on that φ is contingent.

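Case 3A ⊨ θ ↔ φ.

For all index sets I and all sentences $\varphi_i$ the following are equivalent:

  • $\{\varphi\} \cup \bigcup_{i\in I}\{\varphi_i\}$ is a partition,

  • $\{\theta\} \cup \bigcup_{i\in I}\{\varphi_i\}$ is a partition.

(*) implies that B(φ) = B(θ).

Case 3B θ, φ and φ ∧ ¬θ are contingent.

Let I be any countable index set and let $\varphi_i$ for i ∈ I be contingent such that

$$\{\varphi\} \cup \bigcup_{i\in I}\{\varphi_i\} \quad\text{is a partition.}$$

Then by the consistency of θ and φ ∧ ¬θ

$$\{\theta\wedge\varphi\} \cup \{\varphi\wedge\neg\theta\} \cup \bigcup_{i\in I}\{\varphi_i\} \quad\text{is a partition.}$$

And since θ ⊨ φ

$$\{\theta\} \cup \{\varphi\wedge\neg\theta\} \cup \bigcup_{i\in I}\{\varphi_i\} \quad\text{is a partition.}$$

From normalisation (Definition 1) we now obtain

$$B(\varphi) + \sum_{i\in I} B(\varphi_i) \le 1, \qquad B(\theta) + B(\varphi\wedge\neg\theta) + \sum_{i\in I} B(\varphi_i) \le 1. \qquad (15)$$

Note that the inequalities in (15) are the only constraints which constrain B(φ). In particular, B(φ) = B(θ) will not violate any constraint in (15).

The question arises whether B(φ) = B(θ) imposes any further constraints.

B(φ) only imposes constraints on the $B(\varphi_i)$ for i ∈ I. Let i ∈ I be fixed and let J be an index set and $(\psi_j)_{j\in J}$ be sentences such that $\{\varphi_i\} \cup \{\varphi\} \cup \bigcup_{j\in J}\{\psi_j\}$ is a partition. Then $\{\varphi_i\} \cup \{\theta\} \cup \{\varphi\wedge\neg\theta\} \cup \bigcup_{j\in J}\{\psi_j\}$ is a partition. Thus, B(φ) = B(θ) does not impose any further constraint on $B(\varphi_i)$ which is not already imposed by B(θ).

By (*) we now find B(θ) ≤ B(φ). □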

Corollary 7. B respects logical equivalence on the language with quantifiers.

Proof. If the sentences φ, θ are logically equivalent, then B(φ) ≤ B(θ) ≤ B(φ) and thus B(φ) = B(θ). □

Corollary 8. For all sentences ∃xθ(x) it holds that

$$\lim_{n\to\infty} B\Big(\bigvee_{i=1}^{n}\theta(t_i)\Big) \le B(\exists x\,\theta(x)).$$

Proof. First note that $\bigvee_{i=1}^{n}\theta(t_i)$ implies $\bigvee_{i=1}^{n+1}\theta(t_i)$. Thus, by Lemma 10, $B(\bigvee_{i=1}^{n}\theta(t_i))$ is a (not necessarily strictly) increasing sequence in [0, 1] which has a limit. Finally, note that for all n, $\bigvee_{i=1}^{n}\theta(t_i)$ implies ∃xθ(x). Hence, B(∃xθ(x)) has to be greater than or equal to the limit. □

Corollary 9 (Superadditivity of B on the language with quantifiers). If ⊨ ¬(θ ∧ φ), then B(θ) + B(φ) ≤ B(θ ∨ φ).

Proof. If either θ or φ is a contradiction or a tautology, then the corollary follows trivially.

If θ ∨ φ is a tautology, then the corollary follows trivially, too.

It remains to consider the case of contingent θ ∨ φ. By the above we may assume that θ and φ are contingent. Let I be any countable index set and let $\varphi_i$ for i ∈ I be satisfiable such that

$$\{\theta\} \cup \{\varphi\} \cup \bigcup_{i\in I}\{\varphi_i\} \quad\text{is a partition.}$$

Then,

$$\{\theta\vee\varphi\} \cup \bigcup_{i\in I}\{\varphi_i\} \quad\text{is a partition.}$$

From normalisation (Definition 1) we now obtain

$$B(\theta) + B(\varphi) + \sum_{i\in I} B(\varphi_i) \le 1, \qquad B(\theta\vee\varphi) + \sum_{i\in I} B(\varphi_i) \le 1.$$

The same reasoning as in Lemma 10 about constraints now yields: B(θ) + B(φ) ≤ B(θ ∨ φ). □

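Lemma 11. For all sentences θ of the language with quantifiers it holds that B(θ) + B(¬θ) = 1.

In particular, this means that $B(\forall x\,\theta(x)) + B(\exists x\,\neg\theta(x)) = 1$ for all sentences ∀xθ(x).

Proof. If θ is not contingent, then the lemma holds trivially.

Now assume that θ is contingent and B(θ) + B(¬θ) < 1.

Case 1 There exist contingent $(\varphi_i)_{i\in I}$, $(\psi_j)_{j\in J}$ such that

$$\{\theta\} \cup \bigcup_{i\in I}\{\varphi_i\} \quad\text{and}\quad \{\neg\theta\} \cup \bigcup_{j\in J}\{\psi_j\} \quad\text{are partitions}$$
with
$$B(\theta) + \sum_{i\in I} B(\varphi_i) = 1, \qquad B(\neg\theta) + \sum_{j\in J} B(\psi_j) = 1.$$

Note that $\bigcup_{i\in I}\{\varphi_i\} \cup \bigcup_{j\in J}\{\psi_j\}$ is a partition and thus $\sum_{i\in I} B(\varphi_i) + \sum_{j\in J} B(\psi_j) \le 1$. Adding the above equations we now obtain

$$2 = B(\theta) + \sum_{i\in I} B(\varphi_i) + B(\neg\theta) + \sum_{j\in J} B(\psi_j) \le B(\theta) + B(\neg\theta) + 1.$$

B(θ) + B(¬θ) ≥ 1 follows. Contradiction.

Case 2 For all π with θ ∈ π and all π′ with ¬θ ∈ π′ it holds that $\sum_{\varphi\in\pi} B(\varphi) < 1$ and $\sum_{\psi\in\pi'} B(\psi) < 1$.

Applying (*) we obtain a contradiction since B(θ) or B(¬θ) could have been set to a greater number.

Case 3 For all π with θ ∈ π it holds that $\sum_{\psi\in\pi} B(\psi) < 1$ and there exists a partition π′ with ¬θ ∈ π′ such that $\sum_{\varphi\in\pi'} B(\varphi) = 1$.

Let π′ consist of contingent $(\varphi_i)_{i\in I}$ and ¬θ. For π with θ ∈ π we have for all finite J ⊆ I that

$$\bigcup_{j\in J}\{\varphi_j\} \cup \Big\{\theta\wedge\neg\bigvee_{j\in J}\varphi_j\Big\} \cup \{\psi\in\pi : \psi\ne\theta\} \quad\text{is a partition.}$$

In the same manner as in the proof of Lemma 10 it follows that $B(\theta) \ge \sum_{j\in J} B(\varphi_j)$. Since this holds for all finite J ⊆ I and I can be at most countable, it follows that $B(\theta) \ge \sum_{i\in I} B(\varphi_i)$.

From $B(\neg\theta) + \sum_{i\in I} B(\varphi_i) = \sum_{\varphi\in\pi'} B(\varphi) = 1$ the required contradiction follows:

$$B(\theta) + B(\neg\theta) \ge \sum_{i\in I} B(\varphi_i) + B(\neg\theta) = 1.$$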

(*) is not strong enough to uniquely determine B on the language with quantifiers. We invoke the following further desideratum to pin down B: ceteris paribus, prefer belief function B to belief function B′ if B gives greater degree of belief to some universally quantified sentence than does B′. One has to be a bit careful about how one formulates such a principle, in order to specify it in such a way that it can be applied consistently. One can appeal to the concept of prenex normal form in order to formulate this desideratum:

(∀*) Suppose that neither of B, B′ has a better loss profile on the language with quantifiers than the other. Furthermore, suppose there exists a minimal quantifier rank q such that the following hold: for all sentences φ in prenex normal form with a quantifier rank of q−1 or less it holds that B(φ) = B′(φ), and for all universally quantified sentences θ in prenex normal form of quantifier rank q it holds that B(θ) ≥ B′(θ) and the inequality is strict at least once. Then B is to be preferred to B′.

The motivation behind (∀*) is not in terms of loss. Rather, the motivation stems from the application to inductive logic (see Section 3.3). The use of probability in inductive logic has been roundly criticised for tending to give non-tautological universal laws probability zero, when such laws are widely (and seemingly rationally) believed in science and beyond; see, e.g., Popper [17] (Appendix *vii). Thus there seems good reason to prefer, ceteris paribus, those probability functions which give more credence to universal hypotheses. (There is a flip-side to (∀*). The more credence one gives to a universal statement ∀xθ(x), the less credence one must give to ∃x¬θ(x). One might motivate the latter policy by appeal to Ockham’s Razor, which demands scepticism with respect to the existence of entities, particularly new kinds of entity.)

This leaves us with some desiderata that stem from considerations to do with loss, namely the criteria that make up Definition 21 (appealing to dominance of loss, dominance of expected loss, and worst-case expected loss), and some desiderata that stem from the application to inductive logic, namely language invariance and (∀*). These desiderata taken together are enough to justify the norms of objective Bayesianism on the language with quantifiers, as we shall proceed to show in the remainder of this section.

We shall see first that (∀*) is responsible for ensuring that the degree of belief B(∀xθ(x)), which is already constrained to $[0, \inf_{n} B(\bigwedge_{i=1}^{n}\theta(t_i))]$, is equal to the upper bound. On the other hand, B(∃xθ(x)) comes out to be $\sup_{n} B(\bigvee_{i=1}^{n}\theta(t_i))$. From now on, B denotes an arbitrary belief function in $\operatorname{minloss}^*\,\mathbb{B}$ which is also optimal according to (∀*).

Proposition 22. For all universally quantified sentences ∀xθ(x) it holds that $B(\forall x\,\theta(x)) = \lim_{n\to\infty} B(\bigwedge_{i=1}^{n}\theta(t_i))$.

Proof. First note that $\forall x\,\theta(x) \models \bigwedge_{i=1}^{n}\theta(t_i)$ for all n ∈ ℕ and we thus obtain from Lemma 10 that $B(\forall x\,\theta(x)) \le \lim_{n\to\infty} B(\bigwedge_{i=1}^{n}\theta(t_i))$.

We now prove by an argument on quantifier ranks that

$$B(\forall x\,\theta(x)) = \lim_{n\to\infty} B\Big(\bigwedge_{i=1}^{n}\theta(t_i)\Big).$$

Assume for contradiction that there exists a minimal quantifier rank q ≥ 1 and a sentence ∀xψ(x) in prenex normal form of quantifier rank q such that $B(\forall x\,\psi(x)) < \lim_{n\to\infty} B(\bigwedge_{i=1}^{n}\psi(t_i))$.

We now define a function B′ which will be preferred to B, which contradicts our standing assumption that no function is preferred to B. Let B′(χ) := B(χ) for all sentences χ which are in prenex normal form and have a quantifier rank of q − 1 or less. In particular, B and B′ agree on the quantifier-free language.

For all φ(x) in prenex normal form of quantifier rank q − 1 we let

$$B'(\forall x\,\varphi(x)) := \lim_{n\to\infty} B\Big(\bigwedge_{i=1}^{n}\varphi(t_i)\Big)$$
and
$$B'(\exists x\,\neg\varphi(x)) := \lim_{n\to\infty} B\Big(\bigvee_{i=1}^{n}\neg\varphi(t_i)\Big).$$

Now arbitrarily extend B′ to a function in $\mathbb{B}$.

Note that $B'(\forall x\,\psi(x)) > B(\forall x\,\psi(x))$ and $B'(\exists x\,\neg\psi(x)) < B(\exists x\,\neg\psi(x))$. So, (*) does not discriminate between B and B′. Hence, B and B′ are equally preferable according to ≺*.

B and B′ agree on all sentences in prenex normal form of quantifier rank q−1. Since $B(\forall x\,\varphi(x)) \le \lim_{n\to\infty} B(\bigwedge_{i=1}^{n}\varphi(t_i))$ has to hold for all φ(x), it follows for φ(x) in prenex normal form of quantifier rank q − 1 that $B(\forall x\,\varphi(x)) \le B'(\forall x\,\varphi(x))$, and for φ = ψ the inequality is strict. (∀*) now implies that B′ is preferred to B.

Finally, every sentence of the form ∀xθ(x) is logically equivalent to a universally quantified sentence ∀xφ(x) in prenex normal form. Note that θ(t) is logically equivalent to φ(t) for all constants t. Hence,

$$B(\forall x\,\theta(x)) = B(\forall x\,\varphi(x)) = \lim_{n\to\infty} B\Big(\bigwedge_{i=1}^{n}\varphi(t_i)\Big) = \lim_{n\to\infty} B\Big(\bigwedge_{i=1}^{n}\theta(t_i)\Big).$$

Proposition 23. B satisfies the axiom P3.

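Proof. Applying Lemma 11, Proposition 22 and applying Lemma 11 a second time we find

$$B(\exists x\,\theta(x)) = 1 - B(\forall x\,\neg\theta(x)) = 1 - \lim_{n\to\infty} B\Big(\bigwedge_{i=1}^{n}\neg\theta(t_i)\Big) = 1 - \lim_{n\to\infty}\Big(1 - B\Big(\bigvee_{i=1}^{n}\theta(t_i)\Big)\Big) = \lim_{n\to\infty} B\Big(\bigvee_{i=1}^{n}\theta(t_i)\Big).$$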

The following might be of interest outside the context of this paper since it generalises Gaifman’s Theorem, [5] (Theorem 1).

Proposition 24. If a function f from the sentences of the language with quantifiers to [0, 1] satisfies

  • f(θ) = 1 for all quantifier-free tautologies θ [P1 on the quantifier-free language],

  • for all mutually exclusive quantifier-free sentences θ, φ it holds that f(θ ∨ φ) = f(θ) + f(φ) [P2 on the quantifier-free language],

  • $f(\exists x\,\theta(x)) = \sup_{m} f(\bigvee_{i=1}^{m}\theta(t_i))$ for all sentences ∃xθ(x) [P3], and

  • f respects logical equivalence on the language with quantifiers [P4],

then f is a probability function, i.e., f ∈ ℙ.

Clearly, P1 on the quantifier-free language and P4 jointly imply P1.

Proof. First note that f agrees with some probability function on the quantifier-free sentences. By Gaifman’s Theorem, this probability function is unique on the language with quantifiers; it shall be denoted by $P_f$.

We now show that f = $P_f$. We need to show for all sentences φ that f(φ) = $P_f$(φ).

First, write φ in prenex normal form, $\varphi_{pre}$. Note that f(φ) = f($\varphi_{pre}$).

Next, we do a proof by induction on the quantifier-block rank of $\varphi_{pre}$ to show that f($\varphi_{pre}$) = $P_f$($\varphi_{pre}$). The quantifier-block rank of $\varphi_{pre}$ is the number of alternating quantifier blocks in $\varphi_{pre}$.

Base case $\varphi_{pre}$ is of quantifier block rank zero, i.e., $\varphi_{pre}$ does not contain quantifiers. Then

$$f(\varphi) = f(\varphi_{pre}) = P_f(\varphi_{pre}) = P_f(\varphi),$$
where the second equation holds since f and $P_f$ agree on all quantifier-free sentences. The first and the last equation hold since f and $P_f$ respect logical equivalence on the language with quantifiers. This fact will be used without further mention.

Inductive step φpre is of quantifier block rank q ≥ 1.

Let us first suppose that $\varphi_{pre} = \exists\bar{x}\,\chi(\bar{x})$. For q ≥ 2 the first symbol of χ is a universal quantifier, ∀; for q = 1, the first symbol of χ is a relation symbol, a negation symbol or an opening bracket. We find for q = 1

$$\begin{aligned} f(\varphi) &= f(\varphi_{pre}) = f(\exists\bar{x}\,\chi(\bar{x})) \\ &\overset{P3}{=} \lim_{n_1\to\infty}\cdots\lim_{n_k\to\infty} f\Big(\bigvee_{i_1=1}^{n_1}\cdots\bigvee_{i_k=1}^{n_k}\chi(t_{i_1},\ldots,t_{i_k})\Big) \\ &= \lim_{n_1\to\infty}\cdots\lim_{n_k\to\infty} P_f\Big(\bigvee_{i_1=1}^{n_1}\cdots\bigvee_{i_k=1}^{n_k}\chi(t_{i_1},\ldots,t_{i_k})\Big) \\ &\overset{P3}{=} P_f(\exists\bar{x}\,\chi(\bar{x})) = P_f(\varphi_{pre}) = P_f(\varphi), \end{aligned}$$
where we may substitute $P_f$ for f since χ is quantifier-free and we can thus apply the induction hypothesis.

For q ≥ 2, $\varphi_{pre} = \exists\bar{x}_1\forall\bar{x}_2\ldots Q\bar{x}_q\,\chi(\bar{x}_1,\bar{x}_2,\ldots,\bar{x}_q)$, where Q = ∃ for odd q and Q = ∀ for even q.

First, here is an example of two logically equivalent sentences:

$$\Big(\bigvee_{i=1}^{2}\exists x_1\forall x_2\,U x_1 x_2 t_i\Big) \leftrightarrow \Big(\exists y_1^1\exists y_2^1\forall y_1^2\forall y_2^2\,\bigvee_{i=1}^{2} U y_i^1 y_i^2 t_i\Big).$$

Note that the quantifier block rank of the sentence on the right of “↔” is two. The quantifier block rank has been kept low at the price of larger blocks of quantifiers. Since we are giving a proof by induction on the quantifier block rank, we do not have to worry about paying this price. To denote the larger blocks we will use $\bar{y}$. In general, the greater the number of variables and disjuncts on the left of an $\bar{x}_i$, the greater the number of variables in $\bar{y}_i$.
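The example equivalence can be sanity-checked by brute force on a small finite domain. The sketch below reads the example as stating that the disjunction over i = 1, 2 of ∃x1 ∀x2 U x1 x2 t_i is equivalent to ∃y^1_1 ∃y^1_2 ∀y^2_1 ∀y^2_2 of the disjunction over i of U y^1_i y^2_i t_i. It assumes a hypothetical two-element domain and enumerates every interpretation of the ternary relation U and of the constants t_1, t_2. A finite check of this kind is only an illustration, not a proof of the equivalence in general.

```python
from itertools import product

D = (0, 1)  # hypothetical two-element domain

def lhs(U, t):
    # OR over i of: EXISTS x1 FORALL x2 U(x1, x2, t_i)
    return any(any(all(U[(x1, x2, ti)] for x2 in D) for x1 in D) for ti in t)

def rhs(U, t):
    # EXISTS y1_1, y1_2 FORALL y2_1, y2_2: OR over i of U(y1_i, y2_i, t_i)
    return any(
        all(
            any(U[(y1[i], y2[i], t[i])] for i in range(2))
            for y2 in product(D, repeat=2)
        )
        for y1 in product(D, repeat=2)
    )

triples = list(product(D, repeat=3))
mismatches = 0
for values in product([False, True], repeat=len(triples)):
    U = dict(zip(triples, values))      # one interpretation of U
    for t in product(D, repeat=2):      # interpretations of t_1, t_2
        if lhs(U, t) != rhs(U, t):
            mismatches += 1
print(mismatches)
```

The right-hand side needs one copy of each quantified variable per disjunct, which is the price in block size mentioned above.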

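Now let us compute

$$\begin{aligned} f(\varphi) &= f(\varphi_{pre}) = f\big(\exists\bar{x}_1\forall\bar{x}_2\ldots Q\bar{x}_q\,\chi(\bar{x}_1,\bar{x}_2,\ldots,\bar{x}_q)\big) \\ &\overset{P3}{=} \lim_{n_1\to\infty}\cdots\lim_{n_k\to\infty} f\Big(\bigvee_{i_1=1}^{n_1}\cdots\bigvee_{i_k=1}^{n_k}\forall\bar{x}_2\ldots Q\bar{x}_q\,\chi(t_{i_1},\ldots,t_{i_k},\bar{x}_2,\ldots,\bar{x}_q)\Big) \\ &= \lim_{n_1\to\infty}\cdots\lim_{n_k\to\infty} f\Big(\forall\bar{y}_2\ldots Q\bar{y}_q\,\bigvee_{i_1=1}^{n_1}\cdots\bigvee_{i_k=1}^{n_k}\chi(t_{i_1},\ldots,t_{i_k},\bar{y}_2,\ldots,\bar{y}_q)\Big) \\ &\overset{IH}{=} \lim_{n_1\to\infty}\cdots\lim_{n_k\to\infty} P_f\Big(\forall\bar{y}_2\ldots Q\bar{y}_q\,\bigvee_{i_1=1}^{n_1}\cdots\bigvee_{i_k=1}^{n_k}\chi(t_{i_1},\ldots,t_{i_k},\bar{y}_2,\ldots,\bar{y}_q)\Big) \\ &= \lim_{n_1\to\infty}\cdots\lim_{n_k\to\infty} P_f\Big(\bigvee_{i_1=1}^{n_1}\cdots\bigvee_{i_k=1}^{n_k}\forall\bar{x}_2\ldots Q\bar{x}_q\,\chi(t_{i_1},\ldots,t_{i_k},\bar{x}_2,\ldots,\bar{x}_q)\Big) \\ &\overset{P3}{=} P_f\big(\exists\bar{x}_1\forall\bar{x}_2\ldots Q\bar{x}_q\,\chi(\bar{x}_1,\bar{x}_2,\ldots,\bar{x}_q)\big) = P_f(\varphi_{pre}) = P_f(\varphi). \end{aligned}$$

“IH” indicates that we used the induction hypothesis on a sentence of quantifier block rank q − 1.

The case of $\varphi_{pre} = \forall\bar{x}\,\chi(\bar{x})$ is analogous: simply replace the disjunctions by conjunctions. □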

Theorem 8. If E is finitely generated and g is regular, then

{ B } = maxent E = { P Ω } .

Proof. By Proposition 24 we only need to convince ourselves that B satisfies P1 on , P2 on , P3 and P4 in order to conclude that B P L. Note that we have done so in Theorem 6, Proposition 23 and Corollary 7. So all B are probability functions.

All such B agree with $P_\Omega$ on the quantifier-free sentences. Two different probability functions have to disagree on some quantifier-free sentence (Gaifman’s theorem). Hence, B is unique and equal to $P_\Omega$.

We should point out that (∀*) was only used in Proposition 23. We showed that (*) alone is enough to force that B satisfies P1, P2 on , B ( x θ ( x ) ) lim n B ( i = 1 n θ ( t i ) ) and P4.

In sum, then, by invoking two new considerations, (*) and (∀*), one can show that the Probability norm must hold on a predicate language with quantifiers. Since the Calibration and Equivocation norms are already forced on the quantifier-free sentences, and probabilities on these quantifier-free sentences determine those of the quantified sentences, all the norms of objective Bayesianism hold on the full predicate language, assuming that the weighting function is regular and the evidence is finitely generated.

6. More Complex Evidence

The question arises as to which functions have an optimal loss profile when E L is not finitely generated. In Section 6.2 we shall present a tractable case and show that in that example the function with maximal standard entropy has the best loss profile. First, in Section 6.1, we shall see that not all examples admit of such an analysis. In particular, we shall analyse an example in some depth in which { P } = maxent E but P ∉ minloss B. Thus, when evidence is not finitely generated, the optimal loss profile may not be achievable by maximising entropy.

6.1. When Losses Cannot Be Minimised

We shall now develop an example in which the minimax theorem fails: $P_\Omega \notin$ minloss B, as we shall see in Proposition 27. However, the entropy identity, P = { P Ω } = maxent E , does hold (Proposition 25 and Proposition 26). The connection with optimal loss fails to obtain since minloss B = ∅ (Proposition 30). Thus, there is no belief function with an optimal loss profile in this sort of example. Nevertheless, certain equivocal functions $\bar{P}_N$ derived from the maximal entropy function come arbitrarily close to having the best loss profile (Proposition 29 and Proposition 31). So, while there is no unique function with the best loss profile, the functions $\bar{P}_N$ have a very good loss profile.

In the following discussion we shall focus on the simplest possible language, = U, which contains only one relation symbol, U, which is unary. We focus on this simple language since the minimax results already fail here, and considering more expressive languages does not lead to new insights while creating more notational issues. As a technical convenience, we extend the notion of a loss profile to arbitrary functions f : S → [0, 1], not merely normalised belief functions.

The example that we shall consider is generated by the following evidence:

$$\mathcal{E} = \{\neg U t_1 \to \neg U t_i : i = 1, 2, \ldots\}.$$

Let $\omega_k^n \in \Omega_n$ be the k-th n-state of this language, i.e., $\omega_1^n := \bigwedge_{l=1}^{n} \neg U t_l$, $\omega_2^n := \bigwedge_{l=1}^{n-1} \neg U t_l \wedge U t_n$, …, $\omega_{2^n}^n := \bigwedge_{l=1}^{n} U t_l$. The set of calibrated probability functions can be characterized in various ways:

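$$\begin{aligned}
\mathbb{E}_{\mathcal{L}} &= \{P \in \mathbb{P} : P(\neg U t_1 \to \neg U t_i) = 1,\ i = 1, 2, \ldots\} \\
&= \{P \in \mathbb{P} : P(\omega_1^{n+1}) = P(\omega_1^n) \text{ for all } n \ge 1\} \\
&= \{P \in \mathbb{P} : P(\omega_i^n) = 0 \text{ for } 2 \le i \le 2^{n-1},\ n = 1, 2, \ldots\} \\
&= \{P \in \mathbb{P} : P(\omega_2^n) = 0 \text{ for all } n \ge 2\} \\
&= \{P \in \mathbb{P} : P(\neg U t_1 \wedge U t_n) = 0 \text{ for all } n \ge 2\} \\
&= \{P \in \mathbb{P} : P(\omega_1^{n+1} \mid \omega_1^n) = 1 \text{ for all } n\} \\
&= \{P \in \mathbb{P} : P(\omega_2^{n+1} \mid \omega_1^n) = 0 \text{ for all } n\} \\
&= \{P \in \mathbb{P} : P(\forall x (U t_1 \vee \neg U x)) = 1\} \\
&= \{P \in \mathbb{P} : P(\exists x (\neg U t_1 \wedge U x)) = 0\}.
\end{aligned}$$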

The last two characterisations employ quantifiers; adding quantifiers to the language enables a finite representation of what is essentially an infinitely generated evidence set. Hence, in Definition 5, we specified that an evidence set is finitely generated just if it is generated by quantifier-free sentences of some finite sublanguage.
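The counting behind these characterisations can be checked by brute force. The following Python sketch is only an illustration: it assumes that an n-state is encoded as a tuple of truth values for U t_1, …, U t_n, ordered so that the first tuple is the all-false state, and the helper names `n_states`, `consistent_with_evidence` and `permitted_indices` are hypothetical. It suggests that the evidence leaves exactly the first n-state and the 2^(n−1) states implying U t_1, i.e., 2^(n−1) + 1 states in total.

```python
from itertools import product


def n_states(n):
    """All n-states for one unary predicate U, as tuples (U t_1, ..., U t_n).

    itertools.product yields them in lexicographic order, so index 1 is the
    all-False state and index 2**n is the all-True state.
    """
    return list(product([False, True], repeat=n))


def consistent_with_evidence(state):
    """True iff the state satisfies: not U t_1 implies not U t_i for every i."""
    return state[0] or not any(state)


def permitted_indices(n):
    """1-based indices i such that the i-th n-state is allowed by the evidence."""
    return [i for i, s in enumerate(n_states(n), start=1)
            if consistent_with_evidence(s)]
```

For n = 3, for instance, `permitted_indices(3)` lists the first state together with states 5 to 8.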

We now begin our analysis of this example:

Proposition 25. If g = gΩ or if g is symmetric and inclusive, then = { P Ω } and P Ω is not open-minded.

Proof. For all n

$$\mathbb{E}_n = \{P \in \mathbb{P}_n : P(\omega_i^n) = 0 \text{ for all } 2 \le i \le 2^{n-1}\}.$$

Then, by Landes and Williamson [4] (Corollary 6, p. 3574) for symmetric and inclusive g

$$P^n(\omega_1^n) = P^n(\omega_i^n) = \frac{1}{2^{n-1}+1} \quad \text{for all } 2^{n-1}+1 \le i \le 2^n$$
and so
$$\lim_{n \to \infty} P^n(\omega_1^n) = \lim_{n \to \infty} P^n(\omega_1^1) = \lim_{n \to \infty} \frac{1}{2^{n-1}+1} = 0.$$

For all n and i ∈ {1, 2n−1+1,…,2n}

P ( ω 1 n ) = 1 2 n 1 .

The result for g = gΩ follows in the same way as above. □

We shall note for later reference that for all n ≥ 2

$$H_\Omega^n(P^n) = -\log \frac{1}{2^{n-1}+1} > -\log \frac{1}{2^{n-1}} = H_\Omega^n(P_\Omega).$$
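As a quick numerical illustration of this comparison, the sketch below computes the standard (Shannon) entropy of the two uniform distributions involved. It assumes natural logarithms and that the n-entropy maximiser is uniform over the 2^(n−1) + 1 permitted n-states while the limit function is uniform over the 2^(n−1) states implying U t_1; the function names are hypothetical.

```python
import math


def shannon_entropy(probs):
    """Standard entropy -sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_of_n_maximiser(n):
    """Entropy of the uniform distribution over the 2**(n-1) + 1 permitted n-states."""
    k = 2 ** (n - 1) + 1
    return shannon_entropy([1 / k] * k)


def entropy_of_limit_function(n):
    """Entropy of the uniform distribution over the 2**(n-1) states implying U t_1."""
    k = 2 ** (n - 1)
    return shannon_entropy([1 / k] * k)
```

On every level n ≥ 2 the first value exceeds the second, although the gap shrinks as n grows.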

Proposition 26. If g = gΩ or if g is regular, then

maxent E = { P Ω } .

Proof. First note that [ E ] = E .

We shall show that for all $Q \in \mathbb{E} \setminus \{P_\Omega\}$ there exists an N such that for all n ≥ N we have $H_\Omega^n(P_\Omega) > H_\Omega^n(Q)$ and $H_g^n(P_\Omega) > H_g^n(Q)$.

Since $Q \ne P_\Omega$ there exists a minimal k and a k-state ν ∈ Ωk such that Q(ν) > PΩ(ν) ≥ 0.

Case 1 ν = ω 1 k.

To simplify notation let α := Pk(ν) = Pk() α : = P k ( ν ) = P k ( ω 1 n ) > 0 for all n 1 Let us now define a function Q E \ { P Ω }. Note that since we want Q to be a member of E we need to let Q ( ω 1 n ) : = Q ( ω 1 1 ) for all n . Now let for all n

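$$\begin{aligned}
Q'(\omega_1^n) &:= \alpha > 0 \\
Q'(\omega_i^n) &:= 0 && \text{for } 2 \le i \le 2^{n-1} \\
Q'(\omega_i^n) &:= \frac{1-\alpha}{2^{n-1}} && \text{for } 2^{n-1}+1 \le i \le 2^n.
\end{aligned}$$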

The restriction operator n applied to some belief function B continues to refer to the restriction of B to n , rather than to the restriction to n.

Note that for all n ≥ 1

$$\{Q'{\restriction_n}\} = \arg\sup_{P \in \mathbb{E}_n,\ P(\omega_1^n) = \alpha} H_g^n(P)$$
since entropy maximisers assign n-states the same degree of belief whenever possible [4] (Corollary 7, p. 3577). Thus, $H_g^n(Q') \ge H_g^n(Q)$ for all n ∈ ℕ. Also, $H_\Omega^n(Q') \ge H_\Omega^n(Q)$ for all n ∈ ℕ.

Let us compute for n ≥ k

$$\begin{aligned}
H_\Omega^n(P_\Omega) - H_\Omega^n(Q') &= -\log\Big(\frac{1}{2^{n-1}}\Big) - \Big[-\alpha \log(\alpha) - (1-\alpha) \log\Big(\frac{1-\alpha}{2^{n-1}}\Big)\Big] \\
&= \log(2^{n-1}) + \alpha \log(\alpha) + (1-\alpha)\big(\log(1-\alpha) - \log(2^{n-1})\big) \\
&= \log(2^{n-1}) - (1-\alpha)\log(2^{n-1}) + \alpha \log(\alpha) + (1-\alpha)\log(1-\alpha) \\
&= \alpha (n-1) \log(2) + \alpha \log(\alpha) + (1-\alpha) \log(1-\alpha).
\end{aligned}$$

It follows that for all large enough n, $H_\Omega^n(P_\Omega) > H_\Omega^n(Q') \ge H_\Omega^n(Q)$.
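The closed form of this entropy difference can be checked numerically. The sketch below is a hedged illustration, assuming natural logarithms, that the limit function is uniform over the 2^(n−1) states implying U t_1, and that the comparison function puts mass α on the first n-state and spreads 1 − α evenly over those 2^(n−1) states; the function names are hypothetical.

```python
import math


def entropy(probs):
    """Standard entropy -sum p log p, with the convention 0 log 0 = 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def entropy_gap_direct(alpha, n):
    """Entropy of the limit function minus entropy of the comparison function."""
    k = 2 ** (n - 1)
    limit_function = [1 / k] * k
    comparison = [alpha] + [(1 - alpha) / k] * k
    return entropy(limit_function) - entropy(comparison)


def entropy_gap_closed_form(alpha, n):
    """alpha (n-1) log 2 + alpha log alpha + (1-alpha) log(1-alpha)."""
    return (alpha * (n - 1) * math.log(2)
            + alpha * math.log(alpha)
            + (1 - alpha) * math.log(1 - alpha))
```

The term α (n − 1) log 2 grows linearly in n, so the gap is eventually positive for every fixed α in (0, 1), even though it can be negative for small n.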

For regular g we now find

$$\begin{aligned}
H_g^n(P_\Omega) - H_g^n(Q') ={}& g(\pi_n) \cdot \big[\alpha (n-1) \log(2) + \alpha \log(\alpha) + (1-\alpha) \log(1-\alpha)\big] \\
& - \sum_{\pi \in \Pi_n \setminus \{\pi_n\}} g(\pi) \sum_{F \in \pi} \big({}^{\circ}P_\Omega(F) \log({}^{\circ}P_\Omega(F)) - {}^{\circ}Q'(F) \log({}^{\circ}Q'(F))\big).
\end{aligned}$$

So, as long as $\sum_{\pi \in \Pi_n \setminus \{\pi_n\}} g(\pi)$ goes to zero quickly enough, it follows that $H_g^n(P_\Omega) > H_g^n(Q') \ge H_g^n(Q)$ for large enough n. Corollary 6 shows that this is indeed the case for regular g.

Case 2 $\nu \in \{\omega_2^k, \ldots, \omega_{2^{k-1}}^k\}$. Since Q is assumed to be calibrated, $Q \in \mathbb{E}$, this case cannot occur.

Case 3 $\nu \in \{\omega_{2^{k-1}+1}^k, \ldots, \omega_{2^k}^k\}$.

Case 3A $Q(\omega_1^k) = 0$.

Then $Q(\omega_1^1) = 0$. But for all n ∈ ℕ

$$\arg\sup_{P \in \mathbb{E}_n,\ P(\omega_1^1) = 0} H_\Omega^n(P) = \{P_\Omega{\restriction_n}\} = \arg\sup_{P \in \mathbb{E}_n,\ P(\omega_1^1) = 0} H_g^n(P).$$

Since $Q \ne P_\Omega$ it follows that there exists some N ∈ ℕ such that $Q{\restriction_n} \ne P_\Omega{\restriction_n}$ for all n ≥ N. But then $H_g^n(P_\Omega) > H_g^n(Q)$ and $H_\Omega^n(P_\Omega) > H_\Omega^n(Q)$ for all n ≥ N.

Case 3B $Q(\omega_1^k) > 0$.

Then $Q(\omega_1^k) > 0 = P_\Omega(\omega_1^k)$. Proceed as in Case 1. □

Proposition 27. If g = gΩ or if g is regular, then P Ω minloss B .

Proof. We here show that there exists an $R \in \mathbb{E}$ such that for all n ∈ ℕ it holds that $S_g^n(R, P_\Omega) = S_\Omega^n(R, P_\Omega) = +\infty$, and that there exists an open-minded $Q \in \mathbb{E}$ such that for all n ∈ ℕ we have $\sup_{P \in \mathbb{E}} S_\Omega^n(P, Q) < +\infty$.

Let R be the probability function with $R(\omega_1^n) := 1$ for all n ∈ ℕ. Then $S_g^n(R, P_\Omega) = S_\Omega^n(R, P_\Omega) = +\infty$ for all n ∈ ℕ.

We shall now construct an open-minded $Q \in \mathbb{E}$ as advertised. For all n ∈ ℕ let

$$\begin{aligned}
Q(\omega_1^n) &:= \frac{1}{2} \\
Q(\omega_i^n) &:= 0 && \text{for all } 2 \le i \le 2^{n-1} \\
Q(\omega_i^n) &:= \frac{1}{2^n} && \text{for all } 2^{n-1}+1 \le i \le 2^n.
\end{aligned}$$

Thus, Q is open-minded and hence $\sup_{P \in \mathbb{E}} S_\Omega^n(P, Q) < +\infty$ and $\sup_{P \in \mathbb{E}} S_g^n(P, Q) < +\infty$ for all n ∈ ℕ. □
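The contrast between the two loss profiles can be made concrete for the standard logarithmic loss. The following sketch rests on assumptions made here for illustration: n-states are indexed 1 to 2^n as in the text, the calibrated functions on level n are all mixtures of point masses on the permitted n-states (so the worst-case expected loss is attained at a point mass), and the function names are hypothetical.

```python
import math


def permitted_states(n):
    """Indices of the n-states allowed by the evidence: state 1 and states implying U t_1."""
    return [1] + list(range(2 ** (n - 1) + 1, 2 ** n + 1))


def worst_case_log_loss(belief, permitted):
    """Largest value of -log belief(w) over permitted states w.

    Expected log loss is linear in the chance function, so its supremum over
    mixtures of point masses on permitted states is attained at a point mass.
    """
    return max(math.inf if belief[w] == 0 else -math.log(belief[w])
               for w in permitted)


def limit_function(n):
    """Uniform over the states implying U t_1, zero on every other n-state."""
    size = 2 ** n
    return {i: (0.0 if i <= size // 2 else 2 / size) for i in range(1, size + 1)}


def open_minded_q(n):
    """Mass 1/2 on the first n-state and 1/2**n on each state implying U t_1."""
    size = 2 ** n
    q = {i: 0.0 for i in range(1, size + 1)}
    q[1] = 0.5
    for i in range(size // 2 + 1, size + 1):
        q[i] = 1 / size
    return q
```

Under these assumptions the limit function has infinite worst-case loss on every level, whereas the open-minded function has worst-case loss n log 2.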

Note that Condition 1 of Definition 21 is solely responsible for the fact that P Ω minloss B . Condition 2 has played no role here.

So far, we have established that P = P Ω does not have the best loss profile. The question arises whether there exists a belief function B B which is a minimal element of , i.e., B minloss B .

Proposition 28. If g = gΩ, then

$$\operatorname{minloss} \mathbb{B} = \emptyset = \operatorname{minloss} \mathbb{P} = \operatorname{minloss} \mathbb{E}.$$

Initially, one might suspect that minloss 𝔹 = ∅ would be somehow due to the fact that the $S_\Omega^n$ do not take beliefs in all sentences into account. This is not the case. As we will see, minloss ℙ = ∅ = minloss 𝔼 holds. That is, even when restricting attention to probability functions, whose values on the n-states completely determine degrees of belief in all other sentences, we cannot find a function with an optimal loss profile.

Proof. Suppose for contradiction that Q minloss \ { P Ω }.

If Q is not open-minded, then there exists an N ∈ ℕ, an F ⊆ ΩN and a $P \in \mathbb{E}$ such that °P(F) > 0 and °Q(F) = 0. But then there has to exist some ω ∈ ΩN with ω ∈ F such that P(ω) > 0 = Q(ω) since Q and P are probability functions. Thus, for all n ≥ N there exists some ν ∈ Ωn such that ν ⊨ ω with P(ν) > 0 = Q(ν). But then $S_\Omega^n(P, Q) = +\infty$ for all n ≥ N.

In the proof of Proposition 27 we constructed an open-minded function Q+ ∈ E. For Q+ we have for all n that $\sup_{P \in \mathbb{E}} S_\Omega^n(P, Q^+) < +\infty$. So, any Q ∈ minloss ℙ has to be open-minded.

Case 1 Q minloss \ E and Q ∉ E

Since Q \ E there has to exist a minimal k ≥ 2 such that Q ( ω 2 k ) > 0.

We next define a probability function Q E with the following construction for all n ≥ 2

Q ( ω i l ) : = Q ( ω i l ) for all 1 l k 1 and all i Q ( ω 1 n ) : = Q ( ω 1 k ) for all n Q ( ω 1 n ) : = 0 for all n k and all 2 i 2 n 1 Q ( ω i n ) : = Q ( ω i n ) for all 2 n 1 + 1 i 2 n .

It follows that for all n ≥ k and all ω Ω n \ { ω 2 n , , ω 2 n 1 n } and all P E n such that P (ω) > 0 it holds that. Q ( ω ) Q ( ω ) For all large enough n ∈ N we then find

sup p E S Ω n ( P , Q ) log min 2 n 1 + 1 i 2 n { Q ( ω ) } = log min 2 n 1 + 1 i 2 n { Q ( ω ) } = sup p E S Ω n ( P , Q )

Hence, there has to exists a Q E minloss with Q P Ω .

Case 2 Q Q minloss and Q E { P Ω }.

Thus, 0 < Q ( ω 1 1 ) = Q ( ω 1 n ) for all n 2. Let N ≥ 3 be such that Q ( ω 1 N ) > min { Q ( ω 2 N 1 + 1 N ) , ... , Q ( ω 2 N N ) } For n ≥ N let

Ω n : = arg min { Q ( ω 2 n 1 + 1 n ) , ... , Q ( ω 2 n n ) } Ω n

We now find for all fixed n ≥ N that

sup p E S Ω n ( P , Q ) = log Q ( ω n ) for all ω n Ω n .

We shall now define a function R E \ { P Ω } by letting for all n ≥ 2:

R ( ω 1 n ) : = Q ( ω 1 n ) 2 = Q ( ω 1 1 ) 2 R ( ω i n ) : = 0 for all 2 i 2 n 1 R ( ω i n ) : = Q ( ω i n ) + Q ( ω 1 1 ) 2 2 | Ω n | > Q ( ω i n ) for all 2 n 1 + 1 i 2 n

That is, R = Q + P Ω 2.

For large enough M ∈ ℕ it holds for all n ≥ M that

R ( ω 1 n ) > min { R ( ω 2 n 1 + 1 n ) , ... , R ( ω 2 n n ) }

Furthermore, for all n ≥ max{M, N} it holds that

arg min { Q ( ω 2 n 1 + 1 n ) , , Q ( ω 2 n n ) } = arg min { R ( ω 2 n 1 + 1 n ) , , R ( ω 2 n n ) }
and hence for all large enough fixed n ∈ ℕ and all ω n Ω n
sup P E S Ω n ( P , R ) = log R ( ω n ) < log Q ( ω n ) = sup P E S Ω n ( P , Q ) .

Thus, R has a better loss profile than Q. Hence, Q minloss and Q minloss E .

Finally, let us consider loss profiles for B B \ . .

Case 3 B minloss B and B ..

For all P, the expression $S_\Omega^n(P, B)$ only depends on the degrees of belief B assigns to sentences which represent an n-state. So, the degrees of belief in sentences φ ∈ S which do not n-represent an n-state are ignored by $S_\Omega^n(P, B)$ for all n and all P. If B agrees with some probability function P on all sentences of S which n-represent an n-state, then B and P are equally preferable according to ≺. As we saw above, for all P there exists some Q with Q ≺ P. Thus, B cannot be a minimal element of ≺.

We can hence assume that for all P there exists some sentence φ ∈ S n which n-represents an n-state such that B(φ) ≠ P(φ). Since no P is dominated, it follows that B(φ) < P(φ).

First define a function B0 as follows:

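$$B_0(\varphi) := \begin{cases} \inf\limits_{\rho \in \varrho_n,\ \rho_\omega = \varphi} B(\rho_\omega), & \text{if such an } n \text{ and such a } \rho \in \varrho_n \text{ exist,} \\ 0, & \text{otherwise.} \end{cases}$$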

B0, which does not agree with any probability function on has been constructed in such a way that B and B0 are equally preferred according to ≺.

Next define a function B+ by first letting for all fixed N ∈ ℕ

B + ( φ ) : = sup n N v Ω n v = ω B 0 ( v )
for all sentences φ S N which are logically equivalent to an N-state. Put B+(ψ) := 0 for all other ψ S .

Since B+ dominates B0 the loss profile of B+ cannot be worse than that of B0. Furthermore, note that for all N ∈ ℕ, all ω ∈ ΩN and all n > N it holds that

B + ( ω ) v Ω n v | = ω B + ( v ) .

Let α : = lim n ω Ω n B + ( ω ). For α = 0 it follows by the usual reasoning that B+ cannot have an ideal loss profile. This leads to a contradiction in the usual way.

For 1 ≥ α > 0 define a function B by first letting for all sentences φ S which are logically equivalent to some n-state ω

B ( φ ) : = 1 α lim n v Ω n v | = φ B + ( v ) .

For all other sentences φ S let B(φ) := 0.

Observe that for all k ∈ ℕ and all ω ∈ Ωk

B ( ω ) = 1 α lim n v Ω n v | = ω B + ( v ) = 1 α lim n λ Ω k + 1 λ | = ω v Ω n v | = λ B + ( v ) = λ Ω k + 1 λ | = ω B ( λ ) .

Finally, we note that B agrees with some P on all sentences in S which represent a state. Then B cannot have a better loss profile than P. As we saw in Case 1 and Case 2, for all P there exists a Q which has a strictly better loss profile than P. This contradicts B ∈ minloss B . □

Denote by P ¯ N the unique probability function in E satisfying for all n ∈ ℕ

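$$\begin{aligned}
\bar{P}_N(\omega_1^n) &= P^N(\omega_1^n) = P^N(\omega_1^1) = \frac{1}{2^{N-1}+1} \\
\bar{P}_N(\omega_i^n) &= 0 && \text{for all } 2 \le i \le 2^{n-1} \\
\bar{P}_N(\omega_i^n) &= \Big(1 - \frac{1}{2^{N-1}+1}\Big) \frac{2}{|\Omega_n|} = \frac{1}{\frac{|\Omega_N|}{2}+1} \cdot \frac{|\Omega_N|}{|\Omega_n|} && \text{for all } 2^{n-1}+1 \le i \le 2^n.
\end{aligned}$$

That is, $\bar{P}_N$ agrees with $P^N$ on $\mathcal{L}_N$ and equivocates beyond $\mathcal{L}_N$ as much as possible while satisfying $\bar{P}_N \in \mathbb{E}$.

Proposition 29. For all ϵ > 0 there exists an N such that for all n ≥ N

$$\sup_{P \in \mathbb{E}} S_\Omega^n(P, \bar{P}_N) - \sup_{P \in \mathbb{E}} S_\Omega^n(P, P^n) \le \epsilon.$$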

Proof. For all large enough N and even larger n we find

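$$\begin{aligned}
0 &\le \sup_{P \in \mathbb{E}} S_\Omega^n(P, \bar{P}_N) - \sup_{P \in \mathbb{E}} S_\Omega^n(P, P^n) \\
&= -\log\Big(\frac{\frac{|\Omega_N|}{2}}{\frac{|\Omega_N|}{2}+1} \cdot \frac{2}{|\Omega_n|}\Big) + \log\Big(\frac{1}{\frac{2^n}{2}+1}\Big) \\
&= -\log\Big(\frac{2^{N-1}}{2^{N-1}+1}\Big) + \log\Big(\frac{2^n}{2}\Big) + \log\Big(\frac{1}{2^{n-1}+1}\Big) \\
&= -\log\Big(\frac{2^{N-1}}{2^{N-1}+1}\Big) + \log\Big(\frac{2^{n-1}}{2^{n-1}+1}\Big).
\end{aligned}$$

For ϵ > 0 let N > 2 be such that $0 < -\log \frac{2^{N-1}}{2^{N-1}+1} < \epsilon$. Then for all n ≥ N it holds that $0 > \log \frac{2^{n-1}}{2^{n-1}+1} \ge \log \frac{2^{N-1}}{2^{N-1}+1}$. For n ≥ N large enough we now obtain

$$0 \le \sup_{P \in \mathbb{E}} S_\Omega^n(P, \bar{P}_N) - \sup_{P \in \mathbb{E}} S_\Omega^n(P, P^n) = -\log\Big(\frac{2^{N-1}}{2^{N-1}+1}\Big) + \log\Big(\frac{2^{n-1}}{2^{n-1}+1}\Big) < \epsilon.$$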
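The bound in this proof can be explored numerically. The sketch below is an illustration under assumptions made here: the worst-case logarithmic loss of a function is minus the logarithm of the smallest probability it gives to a permitted n-state, the n-entropy maximiser is uniform over the 2^(n−1) + 1 permitted states, and the function names are hypothetical.

```python
import math


def worst_loss_p_bar(N, n):
    """Worst-case log loss on level n of the function that agrees with the
    N-entropy maximiser up to level N and equivocates beyond it."""
    first = 1 / (2 ** (N - 1) + 1)      # mass on the all-negative state
    rest = (1 - first) * 2 / 2 ** n     # mass on each state implying U t_1
    return -math.log(min(first, rest))


def worst_loss_n_maximiser(n):
    """Worst-case log loss of the uniform function over 2**(n-1) + 1 states."""
    return math.log(2 ** (n - 1) + 1)


def gap(N, n):
    """Excess worst-case loss on level n."""
    return worst_loss_p_bar(N, n) - worst_loss_n_maximiser(n)


def gap_closed_form(N, n):
    """-log(a / (a + 1)) + log(b / (b + 1)) with a = 2**(N-1), b = 2**(n-1)."""
    a, b = 2 ** (N - 1), 2 ** (n - 1)
    return -math.log(a / (a + 1)) + math.log(b / (b + 1))
```

For n ≥ N the two computations agree, and the gap stays below log(1 + 2^(−(N−1))), which can be made smaller than any ϵ > 0 by increasing N.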

Having considered loss for g = g Ω we now investigate loss for regular g.

Proposition 30. If g is regular, then minloss 𝔹 = ∅.

Proof. We will show that ≺ has no minimal element. Suppose for contradiction that B B is such a minimal element.

Define a function B : S [ 0 , 1 ] by

B ( φ ) : = 0 for all φ for which there exists an n with i = 2 2 n 1 ω i n = φ B ( ψ ) : = B ( ψ ) else .

B′ and B are equally preferable according to ≺ since P (φ) = 0 for all P E and all such φ.

For all φ S let nφ be the minimal n such that φ S n φ . Now define a function Binf by first letting

B inf ( φ ) = inf ψ S n φ φ ψ B ( ψ ) .

Put Binf(φ) := B′(φ) for all other φ S . For all φ S it holds that Binf(φ) ≤ B(φ). Furthermore, Binf is equally preferable to B′ according to ≺. We now consider cases to show that there is a function with a strictly better loss profile than Binf, which contradicts our assumption that B ∈ minloss B .

Case A There exists some N such that for all n ≥ N, Binf and P Ω agree on all n-states. Since B P Ω it holds that B inf P Ω and hence B 1 2 : = B inf + P Ω 2 P Ω . Thus, for all n ≥ N B 1 2and P Ω agree on all n-states. But then for all n ≥ N all F ⊆ Ωn and all ρ ∈ ϱn B inf ( ρ F ) B 1 2 ( ρ F ). Hence, for all P it holds that S g n ( P , B 1 2 ) S g n ( P , B inf ).

From the above we have that for all n ≥ N there exists an F ⊆ Ωn such that F \ { ω 2 N , , ω 2 N 1 N } = and such that B inf ( ρ F ) < B 1 2 ( ρ F ) for some ρ. Thus, there exists some P E with °P(F) > 0. Then S g n ( P , B 1 2 ) < S g n ( P , B inf ) for this P and all n ≥ N.

Thus, B 1 2 B inf by Condition 2 of Definition 21.

Case B There exist infinitely many n where Binf and P Ω agree on all n-states and infinitely many n where they do not agree on all n-states.

Since P Ω is a probability function it follows that for all n , all F ⊆ Ωn and all ρ ∈ ϱn B inf ( ρ F ) P Ω ( ρ F ) has to hold. Now proceed as in Case A.

Case C The number of n for which Binf and P Ω agree on all n-states is finite (possibly zero).

Case C1 There exists an infinite set J , J = {j1, j2, … }, such that limi→∞ ω Ω j i B inf ( ω ) = 1.

If P Ω dominates Binf, we are done.

If P Ω does not dominate Binf, then define a function B1 by letting for all n and all F ⊆ Ωn

° B 1 ( F ) : = lim i ω Ω j i ω F B inf ( ω )
and requiring that B1 satisfies logical equivalence on L. For all φ S \ S use Gaifman’s condition to ensure that B1 is a probability function.

Since we assumed that P Ω does not dominate Binf, $B_1 \ne P_\Omega$ holds. Furthermore, B1 dominates Binf. So, the loss profile of B1 is at least as good as that of B.

We complete this proof by showing that minloss B = .

Now suppose for contradiction that there exists a function Q ∈ minloss B such that Q ( ω 2 n ) > 0 for some n ≥ 2, i.e., Q E . It needs to hold that Q ( ω 1 n ) > 0 for all n (open-mindedness).

Let k ≥ 2 be minimal such that Q ( ω 2 k ) > 0. Now define a function R by letting for all n > k

R ( ω i k ) : = Q ( ω i k ) + P Ω ( ω i k ) 2 for all 1 i 2 k R ( ω i n ) : = R ( ω 1 k ) = Q ( ω 1 k ) 2 = Q ( ω 1 k ) + P Ω ( ω 1 k ) 2 for all n > k R ( ω 2 n k + 1 n ) : = Q ( ω 2 k ) + P Ω ( ω 2 k ) 2 = Q ( ω 2 k ) 2 > 0 R ( ω i n ) : = Q ( v ) + P Ω ( v ) 2 for all 2 n 1 + 1 i 2 n where v Ω k with ω i n v R ( ω i n ) : = 0 otherwise .

That is, R is the arithmetic mean of Q and P Ω on k. Beyond k, R equivocates under the k-states which imply Ut1. For such n-states R ( ω i n ) = Q ( v ) + 1 2 n 1 2 holds. Beyond k, there are only two n-states which imply ¬Ut1 which are assigned non-zero probability, w 1 n and w 2 n k + 1 n.

We now show that R has a strictly better loss profile than Q, which contradicts Q ∈ minloss B .

Let v k ∈ arg min ω { ω 2 k 1 + 1 , , k ω 2 k k }. Trivially, 0 < Q ( v k ) < 1 2 k 1. Next note that for all n ≥ k which are large enough it holds that

min ω { ω 1 n , ω 2 n 1 + 1 n , , ω 2 n n } R ( ω ) = 1 2 k 1 + Q ( v k ) 2 | Ω k | | Ω n |
and that
min ω { ω 2 n 1 + 1 n , , ω 2 n n } Q ( ω ) Q ( v k ) | Ω k | | Ω n | .

We now find for all large enough n > k that

sup P E S g n ( P , Q ) sup P E S g n ( P , R ) g ( π n ) log ( Q ( v k ) | Ω k | | Ω n | ) sup P E S g n ( P , R ) g ( π n ) ( log ( Q ( v k ) | Ω k | | Ω n | ) sup P E S g n ( P , R ) ) sup P E π Π n \ { π n } g ( π ) F π ° P ( F ) log ° R ( F ) .

Whenever °P (F) > 0 with F ⊆ Ωn, then °R(F) is bounded from below by 1 2 n. Hence, the last term in the above sum converges to zero, since g is regular.

We now obtain the contradiction as follows: there exists some ϵ > 0 such that for all large enough n ≥ k it holds that

log ( Q ( v k ) | Ω k | | Ω n | ) sup P E S Ω n ( P , R ) = log ( Q ( v k ) | Ω k | | Ω n | ) + log ( 1 2 k 1 + Q ( v k ) 2 | Ω k | | Ω n | ) = log ( 1 2 k 1 + Q ( v k ) 2 ) log Q ( v k ) ϵ .

We have thus shown that if minloss B , then there exists some Q E minloss B .

Case C1A $Q(\omega_1^1) = 0$. Then Q has infinite worst-case expected loss for all n and we are done.

Case C1B Q ( ω 1 1 ) > 0.

By open-mindedness, Q ( ω 1 1 ) < 1 has to hold.

For all n let ω n ∈ arg min ω { ω 2 n 1 + 1 , , n ω 2 n n } Q ( ω ) From Q E we now obtain that for all large enough n there exists a probability function R ∈ arg sup P E S Ω n ( P , Q ) such that R ( ω n ) = 1.

Next, define a probability function Q′ ∈ E where $Q'(\omega_1^n) := Q'(\omega_1^1) := Q(\omega_1^1)$ and Q′ equivocates over Ut1, $Q'(\omega_i^n) := Q(\omega_2^1) \frac{|\Omega_1|}{|\Omega_n|}$ for all n ∈ ℕ and for all $2^{n-1}+1 \le i \le 2^n$. Assume for contradiction that Q ≠ Q′.

We next show that Q′ ≺ Q. This contradicts Q ∈ minloss B . To this end let us note that for all large enough n

sup P E S g n ( P , Q ) sup P E g ( π n ) S Ω n ( P , Q ) + sup P E π n \ { π n } g ( π ) F π ° P ( F ) log ° Q ( F ) g ( π n ) log Q ( ω 2 n n ) + sup P E π n \ { π n } g ( π ) F π ° P ( F ) log Q ( ω 2 n n ) = g ( π n ) log Q ( ω 2 n n ) log ( Q ( ω 2 1 ) | Ω 1 | | Ω n | ) π n \ { π n } g ( π ) .

Since whenever °P (F) > 0, then °Q′(F) is bounded from below by Q ( ω 2 n n ).

Thus, for all large enough n we have

0 sup P E S g n ( P , Q ) g ( π n ) sup P E S Ω n ( P , Q ) log ( Q ( ω 2 1 ) | Ω 1 | | Ω n | ) π n \ { π n } g ( π ) .
Since g is regular, this last term converges to zero. We thus obtain
lim n sup P E g ( π n ) S Ω n ( P , Q ) sup P E S g n ( P , Q ) = 0.

Since Q ≠ Q′, Q, Q′ ∈ E and $Q(\omega_1^1) = Q'(\omega_1^1)$, there has to exist some minimal k ∈ ℕ and a minimal $i \ge 2^{k-1}+1$ such that $Q(\omega_i^k) < Q'(\omega_i^k)$. We now find for all large enough n that

sup P E S g n ( P , Q ) g ( π n ) sup P E S Ω n ( P , Q ) g ( π n ) ( sup P E S Ω n ( P , Q ) sup P E S Ω n ( P , Q ) ) g ( π n ) ( log Q ( ω n ) + log Q ( ω 2 n n ) ) g ( π n ) ( log ( Q ( ω i n ) | Ω k | | Ω n | ) + log ( Q ( ω 2 i k ) | Ω k | | Ω n | ) ) g ( π n ) ( log Q ( ω i k ) + log Q ( ω i k ) ) > 0.

Recall that there exists 0 < a ≤ b such that for all n ∈ ℕ a ≤ g(πn) ≤ b holds. Hence, there exists some constant c > 0 such that g ( π n ) ( log Q ( ω i k ) + log Q ( ω i k ) ) c > 0. From (17) we conclude that for all large enough n

sup P E S g n ( P , Q ) sup P E S g n ( P , Q ) > 0
holds. Thus, Q′ ≺ Q. So, Q ∉ minloss B .

To complete the proof of Case C1B we show that there exists some N ∈ ℕ such that P ¯ N has a strictly better loss profile than Q′.

Let N ∈ ℕ be such that P ¯ N ( ω 1 1 ) < Q ( ω 1 1 ). Analogous to the above it holds that

lim n sup P E S g n ( P , P ¯ N ) sup P E g ( π n ) S Ω n ( P , P ¯ N ) = 0.

It hence suffices to show that there exists some ε > 0 such that for large enough N ∈ ℕ and all n ≥ N

g ( π n ) ( sup P E S Ω n ( P , Q ) sup P E S Ω n ( P , P ¯ N ) ) > ϵ .

We now recall that Q ( ω 1 1 ) > P ¯ N ( ω 1 1 ). The required inequality follows for large enough n ∈ ℕ

sup P E S Ω n ( P , Q ) sup P E S Ω n ( P , P ¯ N ) = log ( 1 Q ( ω 1 1 ) 2 n 1 ) + log ( 1 P ¯ N ( ω 1 1 ) 2 n 1 ) > ϵ .

Hence, P ¯ N Q .

Case C2 There exist an α > 0 and a minimal N1 such that for all n ≥ N1, $\sum_{\omega \in \Omega_n} B_{\inf}(\omega) \le 1 - \alpha$ holds.

We may assume that Binf is open-minded on . Thus there has to exist some minimal N ≥ N1 such that 0 < P n ( ω 1 1 ) < B inf ( ω 1 1 ) for all n ≥ N. For all large enough n ≥ N we now find

1 g ( π n ) sup P E S g n ( P , B inf ) sup P E S Ω n ( P , B inf ) = max ω Ω n \ { ω 2 n 1 + 1 n , , ω 2 n n } log B inf ( ω ) = log ( max ω Ω n \ { ω 2 n 1 + 1 n , , ω 2 n n } B inf ( ω ) ) log 1 α B inf ( ω 1 1 ) 2 n 1 .

Using (18) we find for all large enough n ∈ ℕ

sup P E S g n ( P , B inf ) sup P E S g n ( P , P ¯ N ) g ( π n ) ( log 1 α B inf ( ω 1 1 ) 2 n 1 + log 1 P N ( ω 1 1 ) 2 n 1 ) + ( log ( | Ω n | ) log ( | Ω n | ) log ( P ¯ N ( ω 2 N N ) ) ) π Π n \ { π n } g ( π ) > 0.

Proposition 31. For all regular g and all ϵ > 0 there exists an N ∈ ℕ such that for all n ≥ N

sup P E S g n ( P , P ¯ N ) sup P E S g n ( P , P ¯ N ) ϵ .

Proof. Let ϵ > 0 be fixed. By (18) it suffices to show that there exists some N ∈ ℕ such that for all n ≥ N it holds that

0 sup P E S g n ( P , P ¯ N ) sup P E S g n ( P , P ¯ N ) g ( π n ) sup P E S Ω n ( P , P ¯ N ) g ( π n ) sup P E S Ω n ( P , P ¯ N ) ϵ .

Now simply note that we have proved this already in Proposition 29. □

Hence, for all ϵ > 0 there exists some N ∈ ℕ such that for all n ≥ N and all Q B

sup P E S g n ( P , P N ) sup P E S g n ( P , Q ) sup P E S g n ( P , P N ) sup P E S g n ( P , P N ) > ϵ .

Although $\bar{P}_N$ is not a minimal element of ≺, the losses incurred by adopting any other B B can only be marginally better, eventually.

Thus, for fixed k and δ > 0 there exists an N ∈ ℕ such that for all φ S k | P ¯ N ( φ ) P Ω ( φ ) | < δ. Hence, belief functions with an arbitrarily good loss can be found within an (Euclidean) neighbourhood of P Ω .

Since the P ¯ N are probability functions, there does not exist a B B which dominates P ¯ N on or on . Furthermore, the P ¯ N are optimal according to (∀*). The P ¯ N thus are almost optimal in all the senses we here considered.

In essence, the phenomenon of minloss 𝔹 = ∅ arises from $\bar{P}_{N+1}$ having a strictly better loss profile than $\bar{P}_N$ while the limit of the sequence $(\bar{P}_N)_{N \in \mathbb{N}}$ is $P_\Omega$, which is not open-minded. This phenomenon is reminiscent of min{x ∈ ℝ : 0 < x < 1} = ∅, where it is possible to get ever closer to zero but it is impossible to reach it.

6.2. When Losses Can Be Minimised

The analysis of Section 6.1 shows that there can be no general minimax theorem which covers any evidence that is not finitely generated. On the other hand, we shall see in this section that minimax theorems do obtain for certain natural cases of evidence which cannot be finitely generated.

Let the language contain only one m-ary relation symbol, U, and let c ∈ [0, 1]. Let $\nu_1^n := \bigwedge_{1 \le i_1, \ldots, i_m \le n} \neg U t_{i_1} t_{i_2} \ldots t_{i_m} \in \Omega_n$ and let $\nu_2^n, \ldots, \nu_{|\Omega_n|}^n$ be an enumeration of the remaining n-states. We shall consider the following example:

$$\mathbb{E} = \{P \in \mathbb{P} : \lim_{n \to \infty} P(\nu_1^n) = c\} = \{P \in \mathbb{P} : P(\forall x_1 \forall x_2 \ldots \forall x_m \neg U x_1 x_2 \ldots x_m) = c\}.$$

Slightly less general versions of 𝔼 have attracted recent interest in the literature [18] (Example 3, p. 95), [19] (Example 3.5, p. 172) and [1] (Example 5.7, p. 99). We here consider relation symbols U of arbitrary arity, whereas previously U was taken to be unary.

First of all, if c = 0 and g is symmetric and inclusive, then P= ∈ E and we immediately obtain that P = = P Ω and { P = } = maxent E = minloss B .

We shall assume from now on that c > 0.

Proposition 32. For symmetric and inclusive g it holds that = { P } = { P Ω } and $P_\Omega(\nu_1^n) = c + \frac{1-c}{|\Omega_n|}$ and $P_\Omega(\nu_i^n) = \frac{1-c}{|\Omega_n|}$ for all n ∈ ℕ and all 2 ≤ i ≤ |Ωn|.

Proof. For all n ≥ 2 and symmetric and inclusive gn it holds that $P^n(\nu_2^n) = P^n(\nu_{2+i}^n)$ for all 1 ≤ i ≤ |Ωn| − 2 by [4] (Corollary 7, p. 3577). Thus, there exists some λn ≥ 0 such that $P^n(\nu_1^n) = \lambda_n$ and $P^n(\nu_k^n) = \frac{1-\lambda_n}{|\Omega_n|-1}$ for all 2 ≤ k ≤ |Ωn|.

For all n ∈ ℕ, now define a function P1 [ E n ] by P 1 ( ν 1 n ) : = 1. Then, define a convex combination of the equivocator on E n and P1 by P λ n : = λ n P 1 + ( 1 λ n ) P = n.. Recall that gn is equivocator-preserving (Proposition 7) and that H g n is strictly concave on ℙn (Lemma 1). Thus, H g n ( P λ n ) > H g n ( P λ n ) for all 0 λ n < λ n 1.

On the one hand, g-entropy strictly increases with decreasing λn; on the other hand, $P^n \in [\mathbb{E}]$ imposes the constraint $P^n(\nu_1^n) \ge c$. Let N ∈ ℕ be minimal with $|\Omega_N| > \frac{1}{c}$. Then for all n ≥ N it holds that $P^n(\nu_1^n) = c$ and $P^n(\nu_2^n) = P^n(\nu_{2+i}^n) = \frac{1-c}{|\Omega_n|-1}$ for all 1 ≤ i ≤ |Ωn| − 2.

For all r ≥ N it follows that

P r ( ν 2 N ) = P n ( ν 2 + i r ) | Ω n | | Ω N | = 1 c | Ω N | | Ω r | 1 | Ω N | P r ( ν 1 N ) = 1 ( | Ω N | 1 ) P r ( ν 2 N ) .

Thus, for all r ≥ N we find

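$$\begin{aligned}
P(\nu_2^r) &= \lim_{n \to \infty} P^n(\nu_2^r) = \frac{1-c}{|\Omega_r|} \\
P(\nu_1^r) &= \lim_{n \to \infty} P^n(\nu_1^r) = 1 - (|\Omega_r| - 1) P(\nu_2^r) = \frac{|\Omega_r| - (|\Omega_r| - 1)(1-c)}{|\Omega_r|} = c + \frac{1-c}{|\Omega_r|} = c + P(\nu_2^r).
\end{aligned}$$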

Thus, for all n ∈ ℕ P ( ν 1 n ) = c + 1 c | Ω n | and P ( ν 2 n ) = 1 c | Ω n |.

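We now show that P is indeed a probability function. We need to show that $\sum_{\nu \in \Omega_{n+1},\ \nu \models \omega} P(\nu) = P(\omega)$ for all n ∈ ℕ and all ω ∈ Ωn:

$$\begin{aligned}
P(\nu_i^n) &= \frac{1-c}{|\Omega_n|} = |\Omega_{n+1}| \frac{1-c}{|\Omega_n| \, |\Omega_{n+1}|} = \frac{|\Omega_{n+1}|}{|\Omega_n|} P(\nu_i^{n+1}) \quad \text{for all } 2 \le i \le |\Omega_n| \\
P(\nu_1^n) &= c + \frac{1-c}{|\Omega_n|} = c + \frac{1-c}{|\Omega_{n+1}|} + \Big(\frac{|\Omega_{n+1}|}{|\Omega_n|} - 1\Big) \frac{1-c}{|\Omega_{n+1}|} = P(\nu_1^{n+1}) + \Big(\frac{|\Omega_{n+1}|}{|\Omega_n|} - 1\Big) P(\nu_2^{n+1}).
\end{aligned}$$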

Finally, observe that n = arg sup P E n H Ω n ( P ). Hence, = { P Ω }. □
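The consistency check in this proof can be reproduced with exact rational arithmetic. The sketch below is an illustration with hypothetical function names; it assumes that a language with a single m-ary relation symbol has 2^(n^m) n-states (one truth value per atomic sentence over n constants) and that the candidate function assigns c + (1 − c)/|Ωn| to the all-negative n-state and (1 − c)/|Ωn| to every other n-state.

```python
from fractions import Fraction


def num_states(n, m):
    """Number of n-states for one m-ary relation symbol: 2 ** (n ** m)."""
    return 2 ** (n ** m)


def p_first(n, m, c):
    """Probability of the all-negative n-state."""
    return c + (1 - c) / num_states(n, m)


def p_other(n, m, c):
    """Probability of each remaining n-state."""
    return (1 - c) / num_states(n, m)


def is_consistent(n, m, c):
    """Check normalisation on level n and marginalisation from level n+1 to n."""
    size = num_states(n, m)
    ratio = num_states(n + 1, m) // size   # (n+1)-states extending one n-state
    normalised = p_first(n, m, c) + (size - 1) * p_other(n, m, c) == 1
    others_ok = p_other(n, m, c) == ratio * p_other(n + 1, m, c)
    first_ok = p_first(n, m, c) == (p_first(n + 1, m, c)
                                    + (ratio - 1) * p_other(n + 1, m, c))
    return normalised and others_ok and first_ok
```

Passing c as a `Fraction` keeps all comparisons exact, so no rounding tolerance is needed.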

Proposition 33. If g = gΩ or if g is regular, then maxent E = { P Ω }.

Proof. Let $Q \in \mathbb{E} \setminus \{P_\Omega\}$. For regular g, it suffices to show that there exists an N ∈ ℕ such that for all n ≥ N, $H_g^n(Q) < H_g^n(P_\Omega)$ holds.

Since $Q \ne P_\Omega$ there has to exist a minimal N ∈ ℕ and an N-state $\omega' \in \Omega_N \setminus \{\nu_1^N\}$ such that $Q(\omega') \ne P_\Omega(\omega')$.

Now define a function Q′: S → [0, 1] by requiring that Q′ respects logical equivalence, Q and Q′ agree on SN,

  • $Q'(\nu') := \frac{|\Omega_N|}{|\Omega_n|} Q(\omega')$ for all n > N and all ν′ ∈ Ωn with ν′ ⊨ ω′,

  • $Q'(\nu_1^n) := Q(\nu_1^n)$ for all n > N and

  • $Q'(\nu) := \frac{1 - Q(\nu_1^n) - Q(\omega')}{|\Omega_n| - 1 - \frac{|\Omega_n|}{|\Omega_N|}}$ for all n > N and all $\nu \in \Omega_n \setminus \{\nu_1^n\}$ with ν ⊭ ω′.

In general, Q′ is not a probability function because

$$Q'(\nu_1^n) < \sum_{\omega \in \Omega_{n+1},\ \omega \models \nu_1^n} Q'(\omega).$$

Note that for all n ≥ N, $H_\Omega^n(Q') \ge H_\Omega^n(Q)$ holds.

We now show that for all large enough n, $H_\Omega^n(Q') < H_\Omega^n(P_\Omega)$ holds. Let us first compute

H Ω n ( Q ) = Q ( ν 1 n ) log ( Q ( ν 1 n ) ) + ( | Ω n | | Ω N | | Ω N | | Ω n | Q ( ω ) ) log Q ( ω ) | Ω N | | Ω n | + ( 1 Q ( ν 1 n ) Q ( ω ) ) log 1 Q ( ν 1 n ) Q ( ω ) | Ω n | 1 | Ω n | | Ω N | = Q ( ν 1 n ) log ( Q ( ν 1 n ) ) + Q ( ω ) ( log ( Q ( ω ) ) + log ( | Ω N | | Ω n | ) ) = + ( 1 Q ( ν 1 n ) Q ( ω ) ) ( log ( 1 Q ( ν 1 n ) Q ( ω ) | Ω N | | Ω N | | Ω n | 1 ) + log ( | Ω N | | Ω n | ) ) + ( 1 Q ( ν 1 n ) Q ( ω ) ) log ( 1 Q ( ν 1 n ) Q ( ω ) | Ω N | | Ω N | | Ω n | 1 ) .

Since

$$H_\Omega^n(P_\Omega) = -\Big(c + \frac{1-c}{|\Omega_n|}\Big) \log\Big(c + \frac{1-c}{|\Omega_n|}\Big) - (|\Omega_n| - 1) \frac{1-c}{|\Omega_n|} \log\Big(\frac{1-c}{|\Omega_n|}\Big)$$
we now find with lim n Q ( v 1 n ) = c that
lim n H Ω n ( P Ω ) H Ω n ( Q ) = c log ( c ) + lim n ( ( | Ω n | 1 ) 1 c | Ω n | log ( 1 c | Ω n | ) + Q ( ν 1 n ) log ( Q ( ν 1 n ) ) + Q ( ω ) log ( Q ( ω ) ) + ( 1 Q ( ν 1 n ) ) ( log ( | Ω N | | Ω n | ) ) + ( 1 Q ( ν 1 n ) Q ( ω ) ) ( log ( 1 Q ( ν 1 n ) Q ( ω ) | Ω N | | Ω N | | Ω n | 1 ) ) ) = c log ( c ) ( 1 c ) log ( 1 c ) + lim n ( ( 1 c ) ( | Ω N | 1 ) | Ω n | log ( | Ω n | ) + c log ( c ) + Q ( ω ) log ( Q ( ω ) ) + ( 1 Q ( ν 1 n ) ) ( log ( | Ω N | | Ω n | ) ) ) + ( 1 c Q ( ω ) ) ( log ( 1 c Q ( ω ) | Ω N | 1 ) ) = ( 1 c ) log ( 1 c ) + Q ( ω ) log ( Q ( ω ) ) + ( 1 c Q ( ω ) ) ( log ( 1 c Q ( ω ) | Ω N | 1 ) ) + lim n ( ( 1 c ) ( | Ω n | 1 ) | Ω n | log ( | Ω n | ) + ( 1 Q ( ν 1 n ) ) ( log ( | Ω N | ) log ( | Ω n | ) ) ) = ( 1 c ) log ( 1 c | Ω N | ) + Q ( ω ) log ( Q ( ω ) ) + ( 1 c Q ( ω ) ) ( log ( 1 c Q ( ω ) | Ω N | 1 ) ) + lim n ( 1 c ) ( | Ω n | 1 ) | Ω n | log ( | Ω n | ) ( 1 Q ( ν 1 n ) ) log ( | Ω n | ) Q ( ν n 1 ) c ( 1 c ) log ( 1 c | Ω N | ) + Q ( ω ) log ( Q ( ω ) ) + ( 1 c Q ( ω ) ) ( log ( 1 c Q ( ω ) | Ω N | 1 ) ) + lim n ( 1 c ) ( | Ω n | 1 ) | Ω n | log ( | Ω n | ) ( 1 c ) log ( | Ω n | ) = ( 1 c ) log ( 1 c | Ω N | ) + Q ( ω ) log ( Q ( ω ) ) + ( 1 c Q ( ω ) ) ( log 1 c Q ( ω ) | Ω N | 1 ) ) + ( 1 c ) lim n ( | Ω n | 1 | Ω n | 1 ) log ( | Ω n | ) = ( 1 c ) log ( 1 c | Ω N | ) + Q ( ω ) log ( Q ( ω ) ) + ( 1 c Q ( ω ) ) ( log ( 1 c Q ( ω ) | Ω N | 1 ) ) .

Since $Q(\omega') \ne \frac{1-c}{|\Omega_N|}$ there exists some ϵ > 0 such that for all large enough n

$$H_\Omega^n(P_\Omega) - H_\Omega^n(Q') > \epsilon > 0.$$

This establishes the result for g = gΩ.

We now turn to regular g.

H g n ( P Ω ) H g n ( Q ) H Ω n ( P Ω ) H Ω n ( Q ) π Π n \ { π n } g ( π ) f π ° Q ( F ) log ° Q ( F ) H Ω n ( P Ω ) H Ω n ( Q ) π Π n \ { π n } g ( π ) f π ° Q ( F ) log ° Q ( F ) .

The last sum goes to zero since g is regular (Corollary 6). Eventually, $H_\Omega^n(P_\Omega) - H_\Omega^n(Q)$ is greater than some ϵ > 0, as we established in the first part of the proof. Thus, for all large enough n ∈ ℕ and all Q ∈ E \ { P } we have

$$H_g^n(P_\Omega) - H_g^n(Q) > 0.$$

Lemma 12. The following three conditions are equivalent for all large enough n ∈ ℕ and inclusive and symmetric g

  • $P' \in \arg\sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega)$

  • $P'(\nu_1^n) = c$

  • $P' \in \arg\sup_{P \in \mathbb{E}} S_\Omega^n(P, P_\Omega)$

Proof. Note that for all P ∈ ℙ

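$$\begin{aligned}
S_g^n(P, P_\Omega) &= -\sum_{\pi \in \Pi_n} g(\pi) \sum_{F \in \pi} {}^{\circ}P(F) \log {}^{\circ}P_\Omega(F) \\
&= -\sum_{\nu \in \Omega_n} P(\nu) \sum_{F \subseteq \Omega_n,\ \nu \in F} \gamma_n(F) \log {}^{\circ}P_\Omega(F) \\
&= -P(\nu_1) \Big(\gamma_n(\nu_1) \log P_\Omega(\nu_1) + \sum_{F \subseteq \Omega_n,\ \nu_1 \in F,\ |F| \ge 2} \gamma_n(F) \log {}^{\circ}P_\Omega(F)\Big) \\
&\quad - \sum_{i=2}^{|\Omega_n|} P(\nu_i) \Big(\gamma_n(\nu_i) \log P_\Omega(\nu_i) + \sum_{F \subseteq \Omega_n,\ \nu_i \in F,\ |F| \ge 2} \gamma_n(F) \log {}^{\circ}P_\Omega(F)\Big).
\end{aligned}$$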

The term between the last set of brackets () does not depend on i. So, S g n ( P , P Ω ) only depends on P(ν1) but not on how P distributes probabilities among the other n-states.

For large enough N ∈ ℕ it holds that P Ω ( ν 1 ) > P Ω ( ν 2 ) = P Ω ( ν i ) for all 3 ≤ i ≤ |Ωn|.

Since g is symmetric, γn(F) is a function only of the size |F| of F. It follows that every $P' \in \operatorname{arg\,sup}_{P \in \mathbb{E}} S_g^n(P, P_\Omega)$ assigns as little probability as possible to ν1. Since we require that $P' \in \mathbb{E}$, it follows that P′(ν1) = c.

The result for $S_\Omega^n$ follows as above by noting that for g = gΩ it holds that γn(ν) = 1 for all n-states ν ∈ Ωn and γn(F) = 0 otherwise. □

Adapting Joyce’s notion of truth-directedness [14] we define:

Definition 26 (Chance-directed scoring rule). A function Ff: [0, 1] × [0, 1] → [0, +∞] of the form Ff (x, y) = x · f(y) + (1 − x) · f(1 − y) is called chance-directed, if and only if for all x ∈ [0, 1], all 0 ≤ λ < 1 and all y ∈ [0, 1] \ {x}

$$
F_f(x, y) = x f(y) + (1-x) f(1-y) > x f\big((1-\lambda)x + \lambda y\big) + (1-x) f\big(1 - (1-\lambda)x - \lambda y\big) = F_f\big(x, (1-\lambda)x + \lambda y\big)
$$
holds. For a scoring rule Ff this formalises the idea that beliefs which are closer to the chances on two mutually exclusive and exhaustive events are strictly better scored.

In particular, Ff(x, y) = −x log y − (1 − x) log(1 − y) is chance-directed. The score improves by simultaneously moving y closer to x and 1 − y closer to 1 − x.
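The inequality in Definition 26 can be spot-checked numerically. The sketch below is only an illustration on a finite grid, not a proof; the function names `log_score` and `is_chance_directed_on_grid` and the grid resolution are assumptions introduced here.

```python
import math


def log_score(x: float, y: float) -> float:
    """F_f(x, y) = -x*log(y) - (1-x)*log(1-y), with 0*log(0) = 0."""
    total = 0.0
    for weight, belief in ((x, y), (1.0 - x, 1.0 - y)):
        if weight > 0.0:
            if belief <= 0.0:
                return math.inf  # positive chance on an event believed impossible
            total -= weight * math.log(belief)
    return total


def is_chance_directed_on_grid(score, steps: int = 20) -> bool:
    """Check F(x, y) > F(x, (1-lam)*x + lam*y) on a grid of x, y, lam values."""
    grid = [i / steps for i in range(steps + 1)]
    lambdas = [i / steps for i in range(steps)]  # 0 <= lam < 1
    for x in grid:
        for y in grid:
            if y == x:
                continue
            for lam in lambdas:
                moved = (1.0 - lam) * x + lam * y  # belief moved towards x
                if not score(x, y) > score(x, moved):
                    return False
    return True


if __name__ == "__main__":
    print(is_chance_directed_on_grid(log_score))
```

For contrast, the linear function x(1 − y) + (1 − x)y fails the same check, since it is constant in y at x = 1/2.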

Proposition 34. If g is regular, then all B ∈ minloss B agree with P Ω on .

Proof. If c = 1, then | E | = 1 and maxent E = { P Ω } follows trivially. By Theorem 5 we have that for every function B arg inf B B sup P E S g n ( P , B ) it holds that B n = P Ω n . Thus, all B ∈ minloss B agree with P Ω on .

We now focus on 0 < c < 1.

From the above lemma we obtain

$$
\sup_{P \in \mathbb{E}} S_\Omega^n(P, P_\Omega) = -c \log\Big(c + \frac{1-c}{|\Omega_n|}\Big) - (1-c) \log\Big(\frac{1-c}{|\Omega_n|}\Big).
$$
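The closed form for the worst-case logarithmic loss can be compared against a direct computation for a small number of n-states. This is a hedged sketch: it assumes that the only constraint on a calibrated function at level n is P(ν1) ≥ c, and that PΩ gives ν1 probability c + (1 − c)/|Ωn| and every other n-state (1 − c)/|Ωn|. The helper names and the parameter `m` (standing for |Ωn|) are illustrative.

```python
import math
import random


def p_omega(c: float, m: int) -> list:
    """Assumed form of P_Omega on m n-states; index 0 plays the role of nu_1."""
    return [c + (1.0 - c) / m] + [(1.0 - c) / m] * (m - 1)


def log_loss(p: list, b: list) -> float:
    """Expected logarithmic loss -sum_i p_i * log(b_i)."""
    return -sum(pi * math.log(bi) for pi, bi in zip(p, b) if pi > 0.0)


def closed_form_sup(c: float, m: int) -> float:
    """-c*log(c + (1-c)/m) - (1-c)*log((1-c)/m)."""
    return -c * math.log(c + (1.0 - c) / m) - (1.0 - c) * math.log((1.0 - c) / m)


def random_member_of_E(c: float, m: int, rng: random.Random) -> list:
    """Sample a distribution with P(nu_1) >= c (the assumed constraint)."""
    first = c + (1.0 - c) * rng.random()
    weights = [rng.random() for _ in range(m - 1)]
    total = sum(weights)
    return [first] + [(1.0 - first) * w / total for w in weights]


if __name__ == "__main__":
    c, m = 0.3, 8
    rng = random.Random(0)
    sampled = max(log_loss(random_member_of_E(c, m, rng), p_omega(c, m))
                  for _ in range(10000))
    print(sampled, closed_form_sup(c, m))
```

Because the loss is linear in P and PΩ is largest on ν1, the supremum is attained by putting exactly c on ν1, in line with Lemma 12.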

We now follow the structure of the proof of Proposition 16 for fixed 0 < c < 1. Let $B \in \operatorname{minloss} \mathbb{B}$.

Case1 $B \in \mathbb{P} \setminus \{P_\Omega\}$.

Case1A $B \in [\mathbb{E}] \setminus \{P_\Omega\}$.

If there exists an n ∈ ℕ such that B ( ν 1 n ) > P Ω ( ν 1 n ), then ν Ω n \ { ν 1 n } B ( ν ) < ν Ω n \ { ν 1 n } P Ω ( ν ). If there exists an m ∈ ℕ such that B ( ν 1 m ) > P Ω ( ν 1 m ), then there has to exist some k > m such that

ν Ω k \ { ν 1 k } ν ν 1 k 1 B ( ν ) < ν Ω k \ { ν 1 k } ν ν 1 k 1 P Ω ( ν ) .

Since $B \neq P_\Omega$, either such an n ∈ ℕ or such a k ∈ ℕ has to exist; possibly both exist. Overall, there has to exist some N ∈ ℕ, a $\nu^N \in \Omega_N \setminus \{\nu_1^N\}$ and an ϵ > 0 such that $B(\nu^N) + \epsilon = P_\Omega(\nu^N)$.

For large enough n ∈ ℕ, depending on B and c, it holds that

$$
\begin{aligned}
\sup_{P \in \mathbb{E}} S_\Omega^n(P, B) &\ge -c \log B(\nu_1^n) - (1-c) \log\Big(B(\nu^N) \frac{|\Omega_N|}{|\Omega_n|}\Big) \\
&> -c \log B(\nu_1^n) - (1-c) \log\Big(\Big(B(\nu^N) + \frac{\epsilon}{2}\Big) \frac{|\Omega_N|}{|\Omega_n|}\Big), \\
\sup_{P \in \mathbb{E}} S_\Omega^n(P, P_\Omega) &= -c \log\Big(c + \frac{1-c}{|\Omega_n|}\Big) - (1-c) \log\Big(P_\Omega(\nu^N) \frac{|\Omega_N|}{|\Omega_n|}\Big).
\end{aligned}
$$

Since we may assume that B ( ν 1 n ) converges in n to c B E we now find

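$$
\begin{aligned}
\lim_{n\to\infty} \frac{\sup_{P \in \mathbb{E}} S_\Omega^n(P, B) - \sup_{P \in \mathbb{E}} S_\Omega^n(P, P_\Omega)}{1-c} &\ge \lim_{n\to\infty} -\log\Big(B(\nu^N) \frac{|\Omega_N|}{|\Omega_n|}\Big) + \log\Big(P_\Omega(\nu^N) \frac{|\Omega_N|}{|\Omega_n|}\Big) \\
&> -\log\Big(B(\nu^N) + \frac{\epsilon}{2}\Big) + \log P_\Omega(\nu^N) > 0.
\end{aligned}
$$

Whether this limit exists or not, we have thus established that for large enough n ∈ ℕ there exists a lower bound of the sequence

$$
\Big(\sup_{P \in \mathbb{E}} S_\Omega^n(P, B) - \sup_{P \in \mathbb{E}} S_\Omega^n(P, P_\Omega)\Big)_{n \in \mathbb{N}}
$$
which is strictly positive, since we take N ∈ ℕ to be fixed here.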

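For all fixed n ∈ ℕ let $P_n \in \mathbb{E}$ be such that $P_n(\omega_1^n) := c$ and $P_n(\omega_2^n) := 1 - c$. By Lemma 12, $P_n \in \operatorname{arg\,sup}_{P \in \mathbb{E}} S_g^n(P, P_\Omega)$ and $P_n \in \operatorname{arg\,sup}_{P \in \mathbb{E}} S_\Omega^n(P, P_\Omega)$ for all large enough n.

To simplify notation let $R_n := -\sum_{\pi \in \Pi_n \setminus \{\pi_n\}} g(\pi) \sum_{F \in \pi} {}^{\circ}\!P_n(F) \log {}^{\circ}\!P_\Omega(F)$. With this notation we have for all large enough n ∈ ℕ

$$
\begin{aligned}
0 \le R_n &= -\sum_{\pi \in \Pi_n \setminus \{\pi_n\}} g(\pi) \sum_{F \in \pi} {}^{\circ}\!P_n(F) \log {}^{\circ}\!P_\Omega(F) \\
&\le -\sum_{\pi \in \Pi_n \setminus \{\pi_n\}} g(\pi) \sum_{F \in \pi} {}^{\circ}\!P_n(F) \log \frac{1-c}{|\Omega_n|} \\
&= -\sum_{\pi \in \Pi_n \setminus \{\pi_n\}} g(\pi) \log \frac{1-c}{|\Omega_n|} \\
&= \big(\log(|\Omega_n|) - \log(1-c)\big) \sum_{\pi \in \Pi_n \setminus \{\pi_n\}} g(\pi).
\end{aligned}
$$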

By our standing assumption on g (regularity), we obtain that Rn converges to zero. We now find

$$
\begin{aligned}
\sup_{P \in \mathbb{E}} S_g^n(P, B) - \sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega) &= \sup_{P \in \mathbb{E}} S_g^n(P, B) - g(\pi_n) S_\Omega^n(P_n, P_\Omega) - R_n \\
&\ge g(\pi_n)\big(S_\Omega^n(P_n, B) - S_\Omega^n(P_n, P_\Omega)\big) - R_n.
\end{aligned}
$$

Because g(πn) is bounded and Rn converges to zero, we obtain for all large enough n ∈ ℕ that

$$
\sup_{P \in \mathbb{E}} S_g^n(P, B) - \sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega) > 0.
$$

Case1B $B \in [\mathbb{P}] \setminus [\mathbb{E}]$.

Case1Bi $\lim_{n\to\infty} B(\nu_1^n) > c$.

Let us first note that this limit has to exist, because $B(\nu_1^n)$ is a (not necessarily strictly) decreasing sequence bounded from below by c. Let $b_1 := \lim_{n\to\infty} B(\nu_1^n) > c$.

Note that there has to exist some N ∈ ℕ such that for all n ≥ N it holds that $B(\nu_1^n) > P_\Omega(\nu_1^n)$. For all n ≥ N there has to exist some $\nu \in \Omega_n \setminus \{\nu_1^n\}$ such that $B(\nu) < P_\Omega(\nu)$. Then, for all n ≥ N

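$$
\begin{aligned}
\frac{1}{g(\pi_n)} \sup_{P \in \mathbb{E}} S_g^n(P, B) &\ge -c \log B(\nu_1^n) - (1-c) \log \frac{1 - B(\nu_1^n)}{|\Omega_n| - 1} \\
&= -c \log B(\nu_1^n) - (1-c)\Big(\log\big(1 - B(\nu_1^n)\big) + \log \frac{1}{|\Omega_n| - 1}\Big) \\
&> -c \log\Big(c + \frac{b_1 - c}{2}\Big) - (1-c) \log\Big(1 - c - \frac{b_1 - c}{2}\Big) - (1-c) \log \frac{1}{|\Omega_n| - 1} \\
&= -c \log\Big(c + \frac{b_1 - c}{2}\Big) - (1-c) \log \frac{1 - c - \frac{b_1 - c}{2}}{|\Omega_n| - 1} \\
&= -c \log\Big(\frac{b_1 + c}{2}\Big) - (1-c) \log \frac{1 - \frac{b_1 + c}{2}}{|\Omega_n| - 1},
\end{aligned}
$$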
where the strict inequality follows from chance-directedness. We now find

$$
\begin{aligned}
&\lim_{n\to\infty} \sup_{P \in \mathbb{E}} S_g^n(P, B) - \sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega) \\
&> \lim_{n\to\infty} g(\pi_n)\Bigg(-c \log\Big(\frac{b_1 + c}{2}\Big) - (1-c) \log\Bigg(\frac{1 - \frac{b_1 + c}{2}}{|\Omega_n| - 1}\Bigg) + c \log\Big(c + \frac{1-c}{|\Omega_n|}\Big) + (1-c) \log\Big(\frac{1-c}{|\Omega_n|}\Big)\Bigg) - R_n \\
&= \lim_{n\to\infty} g(\pi_n)\Bigg(-c \log\Big(\frac{b_1 + c}{2}\Big) - (1-c) \log\Bigg(\frac{1 - \frac{b_1 + c}{2}}{|\Omega_n| - 1} |\Omega_n|\Bigg) + c \log\Big(c + \frac{1-c}{|\Omega_n|}\Big) + (1-c) \log(1-c)\Bigg) \\
&= \lim_{n\to\infty} g(\pi_n)\Bigg(-c \log\Big(\frac{b_1 + c}{2}\Big) - (1-c) \log\Big(1 - \frac{b_1 + c}{2}\Big) + c \log\Big(c + \frac{1-c}{|\Omega_n|}\Big) + (1-c) \log(1-c)\Bigg) \\
&= \Big(\lim_{n\to\infty} g(\pi_n)\Big)\Bigg(-c \log\Big(\frac{b_1 + c}{2}\Big) - (1-c) \log\Big(1 - \frac{b_1 + c}{2}\Big) + c \log(c) + (1-c) \log(1-c)\Bigg) > 0,
\end{aligned}
$$
where the last line follows from the fact that the standard logarithmic scoring rule is strictly proper, i.e., Equation (11) holds.
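The positivity used in the last step can be illustrated numerically: the bracketed term has the form −c log y − (1 − c) log(1 − y) + c log c + (1 − c) log(1 − c) with y = (b1 + c)/2, which is strictly positive whenever y ≠ c. The sketch below uses the illustrative name `propriety_gap` and sample values of c and b1 chosen here for demonstration only.

```python
import math


def propriety_gap(c: float, y: float) -> float:
    """Excess expected log loss of reporting y when the chance is c.

    Defined for 0 < c < 1 and 0 < y < 1; zero exactly when y == c.
    """
    return (-c * math.log(y) - (1.0 - c) * math.log(1.0 - y)
            + c * math.log(c) + (1.0 - c) * math.log(1.0 - c))


if __name__ == "__main__":
    c = 0.3
    for b1 in (0.31, 0.5, 0.9):  # hypothetical limits b1 > c
        y = (b1 + c) / 2.0
        print(f"b1 = {b1:.2f}  y = {y:.3f}  gap = {propriety_gap(c, y):.6f}")
```

The same function covers Case1Bii by taking y = (b2 + c)/2 with b2 < c.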

Case1Bii $\lim_{n\to\infty} B(\nu_1^n) < c$.

Let $b_2 := \lim_{n\to\infty} B(\nu_1^n) < c$; b2 exists for the same reasons b1 exists. Note that there has to exist some N ∈ ℕ such that for all n ≥ N it holds that $B(\nu_1^n) < b_2 + \frac{c - b_2}{2} < c < P_\Omega(\nu_1^n)$. Using chance-directedness we find for all n ≥ N

$$
\begin{aligned}
\frac{1}{g(\pi_n)} \sup_{P \in \mathbb{E}} S_g^n(P, B) &\ge -c \log B(\nu_1^n) - (1-c) \log \frac{1 - B(\nu_1^n)}{|\Omega_n| - 1} \\
&> -c \log\Big(c + \frac{b_2 - c}{2}\Big) - (1-c) \log \frac{1 - c - \frac{b_2 - c}{2}}{|\Omega_n| - 1} \\
&= -c \log\Big(\frac{b_2 + c}{2}\Big) - (1-c) \log \frac{1 - \frac{b_2 + c}{2}}{|\Omega_n| - 1}.
\end{aligned}
$$

Now proceed as in Case1Bi.

Case2 B B \ and B respects logical equivalence on .

Case2A There exists a $P_B \in \mathbb{P}$ such that for all n ∈ ℕ and all F ⊆ Ωn it holds that °B(F) ≤ °PB(F).

Since $B \notin \mathbb{P}$ there has to exist an N ∈ ℕ and an F′ ∈ ΩN such that °B(F′) < °PB(F′).

Case2Ai $P_B = P_\Omega$ and no other P is such that °B(F) ≤ °P(F) for all n and all F ⊆ Ωn. This follows as does Case2Ai in Proposition 16.

Case2Aii There exists a $P_B$ such that $P_B \neq P_\Omega$.

Then for all n ≥ N and all $P \in [\mathbb{E}]$ it holds that $S_g^n(P, B) - S_g^n(P, P_B) \ge 0$. For all large enough n ∈ ℕ it holds by Case1 that $\sup_{P \in \mathbb{E}} S_g^n(P, P_B) - \sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega) > 0$. Thus,

$$
\sup_{P \in \mathbb{E}} S_g^n(P, B) - \sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega) \ge \sup_{P \in \mathbb{E}} S_g^n(P, P_B) - \sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega) > 0.
$$

Case2B There does not exist a $P_B \in \mathbb{P}$ such that for all n ∈ ℕ and all F ⊆ Ωn it holds that °B(F) ≤ °PB(F).

As in Case2B in Proposition 16 we obtain that there has to exist an α > 0 and an N ∈ ℕ such that for all n ≥ N it holds that $\sum_{\omega \in \Omega_n} B(\omega) \le 1 - \alpha$.

We have for n ≥ N that

$$
\begin{aligned}
\sup_{P \in \mathbb{E}} S_g^n(P, B) - \sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega) &= \sup_{P \in \mathbb{E}} S_g^n(P, B) - g(\pi_n) S_\Omega^n(P_n, P_\Omega) - R_n \\
&\ge g(\pi_n)\big(S_\Omega^n(P_n, B) - S_\Omega^n(P_n, P_\Omega)\big) - R_n.
\end{aligned}
$$

To complete the proof we will now show that there exists some β > 0, which depends on $\mathbb{E}$ and g but does not depend on the particular n ≥ N, such that $S_\Omega^n(P_n, B) - S_\Omega^n(P_n, P_\Omega) > \beta > 0$. Since g(πn) is bounded and Rn converges to zero, we then obtain that $\sup_{P \in \mathbb{E}} S_g^n(P, B) - \sup_{P \in \mathbb{E}} S_g^n(P, P_\Omega) > 0$ for all large enough n ∈ ℕ.

We show that for all large enough n ∈ ℕ that

ω Ω n P n ( ω ) log f ( ω ) c log ( c + 1 c | Ω n | ) ( 1 c ) log ( 1 c | Ω n | ) β
for all functions f : Ωn [0, 1] such that ω Ω n f ( ω ) 1 α.

The minimum obtains, if and only if f ( ω ) = ( 1 α ) P n ( ω ) for all ω ∈ Ωn as we saw in Proposition 16. Thus, the minimum obtains for f ( ν 1 n ) = ( 1 α ) ( c + 1 c | Ω n | ) and f ( ν i n ) = ( 1 α ) 1 c | Ω n | for all other ν i n Ω n. Let us now compute

ω Ω n P n ( ω ) log f ( ω ) = c log ( ( 1 α ) ( c + 1 c | Ω n | ) ) ( 1 c ) log ( ( 1 c ) ( 1 α ) | Ω n