This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
The purpose of this paper is to look at some existing methods of semantic information quantification and suggest some alternatives. It begins with an outline of Bar-Hillel and Carnap's theory of semantic information before going on to look at Floridi's theory of strongly semantic information. The latter then serves to initiate an in-depth investigation into the idea of utilising the notion of truthlikeness to quantify semantic information. Firstly, a couple of approaches to measure truthlikeness are drawn from the literature and explored, with a focus on their applicability to semantic information quantification. Secondly, a similar but new approach to measure truthlikeness/information is presented and some supplementary points are made.
Keywords: information quantification; semantic information; Bar-Hillel; Carnap; Floridi; truthlikeness
Introduction
The term ‘information’ can mean many things. Here information is understood as factual semantic content, where if s is an instance of semantic content then:
s consists of one or more data
the data in s are well-formed
the well-formed data in s are meaningful
‘Factual’ simply means that the semantic content is about some state of affairs associated with a fact. So entries in encyclopedias, road signs and maps are some examples of semantic information. Furthermore, such structured data count as pieces of semantic information relative to systems or languages in which they are well-formed and meaningful. For more on this standard definition see [1].
It is important to emphasise that the term ‘information’ is a polysemantic one, applied to a range of phenomena across a range of disciplines. The semantic conception of information used in this paper is but one of many conceptions, drawn from the philosophy of information field. For some alternative views on and surveys of information, including criticism of the view that information is a kind of data, see [2], [3] and [4].
In this paper we consider only semantic information represented by propositions [5], and the inspected approaches can be seen to fall under the logical approach to semantic information theory [4] (p. 315). Hence, semantic information is understood in terms of propositions and throughout this paper represented using statements in a basic propositional logic (i.e., the information that p). Finally, as will become clearer, a veridicality thesis for semantic information is endorsed, where if i counts as an instance of semantic information then it is semantic content that is also true [6].
So information is carried by statements/propositions and judgements such as ‘statement A yields more information or is more informative than statement B’ are naturally made [7]. The motivations for and the aims of this enterprise to develop ways to quantify semantic information can be straightforwardly appreciated. The following cases and considerations will serve to illustrate this:
Imagine a situation in which a six-sided die is rolled and it lands on 4. Given the following collection of statements describing the outcome of the roll, which is the most informative? Which is the least informative?
The die landed on 1
The die landed on 1 or 2 or 4
The die landed on 4
The die landed on 4 or 5
The die did not land on 4
A formal framework for the quantification of semantic information could assign a numerical measure to each statement and rank accordingly.
Take a simple domain of inquiry, involving the following two questions:
Is Berlin the capital city of Germany? (A)
Is Paris the capital city of France? (B)
Now, there is a coherent sense in which the following ranking of answer statements in terms of highest to lowest informativeness is right:
A ∧ B
A
A ∨ B
¬A ∧ ¬B
A ∧ B precisely describes the truth of the domain, it gives the correct answer to both questions. A is also true but only provides information about one of the items, giving the correct answer to the first question. A ∨ B only tells us that the answer to at least one of the questions is ‘yes’, but not which one. ¬A ∧ ¬B on the other hand is false and would provide an incorrect answer to both questions.
What is a suitable formal framework for the quantification of semantic information that can rigorously capture these intuitions and provide a way to measure the complete range of statements within a domain of inquiry?
Take a more general example of a logical database consisting of a collection of facts. How can the information content of the database be measured? Why is one database more informative than another?
Information is often treated as a commodity and one factor that will determine the value of a piece of information is going to be its quantitative measure. A formal method to measure the information yield of a statement could therefore facilitate its informational valuation.
Thus the motivations and aims for the task of this paper are clear.
Bar-Hillel and Carnap's Theory of Semantic Information
Bar-Hillel and Carnap's seminal account of semantic information [8,9] measures the information yield of a statement within a given language in terms of the set of possible states it rules out and a logical probability space over those states.
The general idea is based on the Inverse Relationship Principle, according to which the amount of information associated with a proposition is inversely related to the probability associated with that proposition. This account will henceforth be referred to as the theory of Classical Semantic Information (CSI) [10]. Using some a priori logical probability measure pr on the space of possible states, two measures of information (cont and inf) are provided, such that:
$$\mathrm{cont}(A) =_{df} 1 - \mathrm{pr}(A)$$ and
$$\mathrm{inf}(A) =_{df} -\log_{2}(\mathrm{pr}(A))$$
In order to demonstrate these definitions, take a very simple propositional logical space consisting of three atoms [11]. The space concerns a weather framework, where the three properties of a situation being considered are whether or not it will be (1) hot (h), (2) rainy (r) and (3) windy (w) [12]. Since there are three variable propositions involved, there are eight logically possible ways things can be; that is, there are eight possible states. Here is a truth table depicting this:
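| State | h | r | w |
|---|---|---|---|
| w_{1} | T | T | T |
| w_{2} | T | T | F |
| w_{3} | T | F | T |
| w_{4} | T | F | F |
| w_{5} | F | T | T |
| w_{6} | F | T | F |
| w_{7} | F | F | T |
| w_{8} | F | F | F |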
Now, each possible state can be represented using a state description, a conjunction of atomic statements consisting of each atom in the logical space or its negation, but never both. For example, the state description for w_{1} is h ∧ r ∧ w and the state description for w_{8} is ¬h ∧ ¬r ∧ ¬w. For the sake of example, the probability distribution used will be a simple uniform one; that is, since there are eight possible states, each state has a probability of 1/8 that it will obtain or be actual [13]. Using these parameters, here are some results:
| Statement (A) | cont(A) | inf(A) |
|---|---|---|
| h ∧ w ∧ r | 0.875 | 3 |
| ¬h ∧ ¬w ∧ ¬r | 0.875 | 3 |
| h ∧ r | 0.75 | 2 |
| h ∨ w | 0.25 | 0.415 |
| h ∨ ¬h | 0 | 0 |
| h ∧ ¬h | 1 | ∞ |
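The figures in this table can be reproduced with a short script. The following is a minimal sketch, assuming the uniform distribution over the eight weather states described above; the helper names (`pr`, `cont`, `inf`, `EXAMPLES`) and the encoding of statements as Python functions over states are illustrative choices, not part of Bar-Hillel and Carnap's presentation.

```python
from itertools import product
from math import log2

ATOMS = ("h", "r", "w")
# The eight states w1..w8, generated in the same order as the truth table (T before F).
STATES = [dict(zip(ATOMS, values)) for values in product([True, False], repeat=3)]


def pr(statement):
    """Logical probability under the uniform distribution: share of states where the statement holds."""
    return sum(1 for state in STATES if statement(state)) / len(STATES)


def cont(statement):
    """cont(A) = 1 - pr(A)."""
    return 1 - pr(statement)


def inf(statement):
    """inf(A) = -log2(pr(A)), written as log2(1/pr(A)); infinite when pr(A) = 0."""
    p = pr(statement)
    return float("inf") if p == 0 else log2(1 / p)


# Statements are encoded (as an assumption of this sketch) as functions from a state to a truth value.
EXAMPLES = {
    "h ∧ w ∧ r": lambda s: s["h"] and s["w"] and s["r"],
    "¬h ∧ ¬w ∧ ¬r": lambda s: not s["h"] and not s["w"] and not s["r"],
    "h ∧ r": lambda s: s["h"] and s["r"],
    "h ∨ w": lambda s: s["h"] or s["w"],
    "h ∨ ¬h": lambda s: s["h"] or not s["h"],
    "h ∧ ¬h": lambda s: s["h"] and not s["h"],
}

for name, statement in EXAMPLES.items():
    print(f"{name}: cont = {cont(statement)}, inf = {round(inf(statement), 3)}")
```

Encoding statements as predicates over states keeps the calculation purely model-theoretic, so any statement of the three-atom language can be measured without first putting it into a normal form.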
Note that with the formulas for cont() and inf(), it is clear that the following relationships hold between logical entailment and semantic information, simply due to the fact that A ⊢ B ⇒ pr(A) ≤ pr(B):
$$A \vdash B \Rightarrow \mathrm{cont}(A) \geq \mathrm{cont}(B)$$
$$A \vdash B \Rightarrow \mathrm{inf}(A) \geq \mathrm{inf}(B)$$
Bar-Hillel and Carnap also provided some formulas to measure relative information, amongst them:
$$\mathrm{cont}(A \mid B) =_{df} \mathrm{cont}(A \wedge B) - \mathrm{cont}(B)$$ and
$$\mathrm{inf}(A \mid B) =_{df} \mathrm{inf}(A \wedge B) - \mathrm{inf}(B)$$
Given these definitions, it so happens that cont(A∣B) = cont(B ⊃ A) and inf(A∣B) = −log_{2}(pr(A∣B)).
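Both identities can be checked directly from the definitions of cont and inf. The following short derivation is a sketch that assumes pr(B) > 0 (so that the conditional probability is defined) and reads B ⊃ A as ¬B ∨ A:

```latex
\begin{align*}
\mathrm{cont}(A \mid B)
  &= \mathrm{cont}(A \wedge B) - \mathrm{cont}(B)
   = \bigl(1 - \mathrm{pr}(A \wedge B)\bigr) - \bigl(1 - \mathrm{pr}(B)\bigr) \\
  &= \mathrm{pr}(B) - \mathrm{pr}(A \wedge B)
   = \mathrm{pr}(B \wedge \neg A) \\
  &= 1 - \mathrm{pr}(\neg B \vee A)
   = \mathrm{cont}(B \supset A), \\[6pt]
\mathrm{inf}(A \mid B)
  &= \mathrm{inf}(A \wedge B) - \mathrm{inf}(B)
   = -\log_{2}\mathrm{pr}(A \wedge B) + \log_{2}\mathrm{pr}(B) \\
  &= -\log_{2}\frac{\mathrm{pr}(A \wedge B)}{\mathrm{pr}(B)}
   = -\log_{2}\mathrm{pr}(A \mid B).
\end{align*}
```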
It is worth briefly noting that Bar-Hillel and Carnap's account was subsequently expanded upon by Jaakko Hintikka. Amongst the contributions made by Hintikka are:
Further investigation into varieties of relative information.
Extension of the CSI account to full polyadic first-order logic, making use of his distributive normal forms.
Introduction of his distinction between surface information and depth information. The latter is defined similarly to the approach used by Bar-Hillel and Carnap. The former is used to address the ‘scandal of deduction’: according to Bar-Hillel and Carnap's account logical truths or proofs of logical validity give no information. In order to accommodate a sense in which they do give information, Hintikka developed a way to measure the information yield of deductive inferences.
Readers further interested in Hintikka's contributions are advised to consult [14–16].
Some Comments on the Theory of Classical Semantic Information
Does the CSI account overall provide an acceptable measure of semantic information, in line with the motivations and aims discussed in the introduction? To a certain extent its results accord with our intuitions and are expected. True state descriptions yield much information; a statement which narrows things down to just one of the possible states yields a lot of information. By contrast, if a statement tells us that it will be hot and rainy but tells us nothing about whether or not it will be windy (h ∧ r), it would yield less information, as the calculations confirm. Further still, a statement which only says that it will be hot or windy (h ∨ w) yields relatively little information. Finally, a tautological statement, which gives no reduction of the original eight possibilities, appropriately yields no information. Despite these results, the CSI account has significant shortcomings: it fails to accommodate certain criteria that a quantitative account of semantic information should meet, given the present motivations and aims.
The most prominent issue concerns its assignment of maximal information yield to contradictions, what has elsewhere been termed the Bar-Hillel-Carnap Paradox [1]. Can no non-contradiction provide more information than a contradiction? Surely this is not the case. Furthermore, do contradictions yield any information at all? These questions will be discussed shortly.
Bar-Hillel and Carnap provide the following commentary on the matter:
It might perhaps, at first, seem strange that a self-contradictory sentence, hence one which no ideal receiver would accept, is regarded as carrying with it the most inclusive information. It should, however, be emphasised that semantic information is here not meant as implying truth. A false sentence which happens to say much is thereby highly informative in our sense. Whether the information it carries is true or false, scientifically valuable or not, and so forth, does not concern us. A self-contradictory sentence asserts too much; it is too informative to be true [8] (p. 229).
No doubt the notion of information expounded in the above passage is at odds with the ordinary sense of information as discussed in the introduction. It is fair to say that the more a sentence says the more informative it is. But when does a sentence ‘say much’? We intuitively judge the statement A ∧ B to say more than the statement A ∨ B not just because it is less probable or excludes more possible states, but also because it does a better, more detailed job of describing how presumably things are. For any two true statements A and B such that pr(A) < pr(B), it is fair to say that A yields more information than B. On the other hand, not only is A ∧ ¬A false, but it does not at all do a good job of describing how things presumably are or could be. It does not discriminate and selectively narrow down on a potential state of affairs (unless a contradiction does actually occur!).
Further to this issue, the CSI account's indifference to truth and falsity means that it cannot distinguish between true and false statements with the same probability. If it actually is hot, rainy and windy (i.e., h ∧ r ∧ w is the true state description), then in the sense of information we are interested in, the statement h ∧ r ∧ w yields more information than the statement ¬h ∧ ¬r ∧ ¬w. Even the statement h ∧ r, which has a higher probability, yields more information than ¬h ∧ ¬r ∧ ¬w, since it contains more truth. These considerations suggest that at the least semantic information is not just about content and should be meant as implying truth.
Floridi's Theory of Strongly Semantic Information
In part prompted by these considerations, Luciano Floridi has developed a Theory of Strongly Semantic Information (TSSI), which differs fundamentally from CSI. It is termed ‘strongly semantic’ in contrast to Bar-Hillel and Carnap's ‘weakly semantic’ theory because, unlike the latter, where truth values do not play a role, in the former semantic information encapsulates truth.
Floridi's basic idea is that the more accurately a statement corresponds to the way things actually are, the more information it yields. Thus information is tied in with the notion of truthlikeness. If a statement perfectly corresponds to the way things actually are, if it completely describes the truth of a domain of inquiry, then it yields the most information. Then there are two extremes. On the one hand, if a statement is necessarily true in virtue of it being a tautology, it yields no information. On the other, if a statement is necessarily false in virtue of it being a contradiction, it also yields no information. Between these two extremes and perfect correspondence there are contingently true statements and contingently false statements, which have varying degrees of information yield.
Using the weather framework introduced above, let the actual state be w_{1}. The following ranking of propositions, from highest information yield to lowest information yield illustrates this idea:
h ∧ r ∧ w
h ∧ r
¬h ∧ ¬r
¬h ∧ ¬r ∧ ¬w
h ∧ ¬h, h ∨ ¬h
It is time to take a look in more detail at Floridi's account as given in [17]. The informativeness of each statement s [18] is evaluated as a function of (1) the truth value of s and (2) the degree of semantic deviation (degree of discrepancy) between s and the actual situation w. This degree of discrepancy from the actual situation is measured by a function f, which takes the statement as input and outputs some value in the interval [−1, 1]. This allows for the expression of both positive (when s is true) and negative (when s is false) degrees of discrepancy. Basically, the more f(s) deviates from 0, the less informative s is.
Floridi stipulates five conditions, that “any feasible and satisfactory metric will have to satisfy” [17] (p. 206):
Condition 1. For a true s that conforms most precisely and accurately to the actual situation w, f(s) = 0.
Condition 2. For an s that is made true by every situation (i.e., a tautology), f(s) = 1.
Condition 3. For an s that is made true in no situation (i.e., a contradiction), f(s) = −1.
Condition 4. For a contingently false s, −1 < f(s) < 0 [19].
Condition 5. For a contingently true s that is also made true by situations other than the actual one (so does not conform to w with the highest degree of precision), 0 < f(s) < 1.
For cases that fall under the fourth condition, f measures degree of inaccuracy and is calculated as the negative ratio between the number of false atomic statements (e) in the given statement s and the length (l) of s.
Inaccuracy: $$f(s) = -\frac{e(s)}{l(s)}$$
To get values for e and l, it seems that some simplifying assumptions need to be made about a statement's form. Statements to be analysed in this way are in conjunctive normal form, with each conjunct being an atomic statement. e is the number of false conjuncts and l is the total number of conjuncts. This point will be discussed in more detail shortly.
For cases that fall under the fifth condition, f measures degree of vacuity and is calculated as the ratio between the number of situations, including the actual situation, with which s is consistent (n) and the total number of possible situations (S) or states.
Vacuity: $$f(s) = \frac{n(s)}{S}$$
Of the inaccuracy metric Floridi writes, “[it] allows us to partition s = s^{l} into l disjoint classes of inaccurate s {Inac_{1}, …, Inac_{l}} and map each class to its corresponding degree of inaccuracy” [17] (p. 208).
The model Floridi uses to illustrate his account, denoted E, has two predicates (G and H) and three objects (a, b and c). So in all there is a set of 64 possible states W = {w_{1}, …, w_{64}}. An application of the inaccuracy metric to the model E is presented in Table 1 (degree of informativeness calculations will be explained below).
For vacuous statements, Floridi writes “[the vacuity metric] and the previous method of semantic weakening allow us to partition s into l − 1 disjoint classes Vac = {Vac_{1}, …, Vac_{l−1}}, and map each class to its corresponding degree of vacuity” [17] (p. 209). The semantic weakening method referred to consists of generating a set of statements by the following process. In this case, the number of all atomic propositions to be used in a statement is 6. Start with a statement consisting of 5 disjunctions, such as
Ga ∨ Gb ∨ Gc ∨ Ha ∨ Hb ∨ Hc
This is the weakest type of statement and corresponds to Vac_{1}. Next, replace one of the disjunctions with a conjunction, to get something like
(Ga ∨ Gb ∨ Gc ∨ Ha ∨ Hb) ∧ Hc
This is the second weakest type of statement and corresponds to Vac_{2}. Continue this generation process until only one disjunction remains. Table 2 summarises an application of this process to the model E:
Say that the actual situation corresponds to the state description Ga ∧ Ha ∧ Gb ∧ Hb ∧ Gc ∧ Hc. Table 3 summarises an example member of each class:
With a way to calculate degrees of vacuity and inaccuracy at hand, Floridi then provides a straightforward way to calculate degrees of informativeness (g), by using the following formula, where once again f stands for the degree of vacuity/inaccuracy function:
$$g(s) = 1 - f(s)^{2}$$
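The three formulas can be combined in a few lines. The sketch below is an illustration only: the function names and the use of exact fractions are assumptions of this example, and the inputs e, l, n and S are supplied by hand rather than derived from a statement. The value 63/64 for the weakest disjunction follows from the vacuity formula by counting states (the six-atom disjunction fails only in the state where every atom is false).

```python
from fractions import Fraction


def inaccuracy(false_conjuncts, length):
    """Degree of inaccuracy f(s) = -e(s)/l(s) for a false conjunction of atomic statements."""
    return -Fraction(false_conjuncts, length)


def vacuity(consistent_states, total_states):
    """Degree of vacuity f(s) = n(s)/S for a contingently true statement."""
    return Fraction(consistent_states, total_states)


def informativeness(f):
    """Degree of informativeness g(s) = 1 - f(s)^2."""
    return 1 - f ** 2


# Six-conjunct statements in the model E with e false conjuncts (e = 1, ..., 6).
for e in range(1, 7):
    f = inaccuracy(e, 6)
    print(f"e = {e}: f = {f}, g = {informativeness(f)}")

# The weakest disjunction Ga ∨ Gb ∨ Gc ∨ Ha ∨ Hb ∨ Hc holds in 63 of the 64 states.
f_weakest = vacuity(63, 64)
print(f"weakest disjunction: f = {f_weakest}, g = {informativeness(f_weakest)}")
```

Because f is squared, a statement with degree of inaccuracy −x and a statement with degree of vacuity x receive the same degree of informativeness.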
Furthermore, this is extended and an extra way to measure amounts of semantic information is provided. As this extension is simply derivative and not essential, I will not go into it here. Suffice it to say, naturally, the higher the informativeness of s, the larger the quantity of semantic information it contains; the lower the informativeness of s, the smaller the quantity of semantic information it contains. To calculate this quantity of semantic information contained in s relative to g(s), Floridi makes use of integrals and the area delimited by the equation given for degree of informativeness.
Some Comments on Floridi's Theory
It will be evident to the reader that the classes of inaccuracy and vacuity presented by Floridi are not comprehensive in that they do not accommodate the full range of statements which could be constructed in the logical space of E. Once again, say that the actual situation corresponds to Ga ∧ Ha ∧ Gb ∧ Hb ∧ Gc ∧ Hc.
Take the false statement Ga ∧ Ha ∧ Gb ∧ ¬Hb ∧ ¬Gc, consisting of 5 conjoined atoms, 2 of which are false. A simple extension of Floridi's method for dealing with inaccuracy would naturally result in several other classes and the degree of inaccuracy of this statement would be −2/5.
Or take the following false statement:
s = (Ga ∨ Ha) ∧ Gb ∧ ¬Hb ∧ (¬Gc ∨ ¬Hc)
How should the formula given for inaccuracy be applied here? There is no clear-cut way to determine the values for e and l going by Floridi's description of the formula.
As for the possible classes of vacuity in E, it is clear that beyond those listed in the table above, for any x such that 2 ≤ x ≤ 63, it is possible to construct a statement such that the degree of vacuity is x/64.
In total the classes given by Floridi deal with 14 different types of propositions: 1 (true state description) + 1 (tautologies) + 1 (contradictions) + 6 (classes of inaccuracy) + 5 (classes of vacuity). Since there are 64 possible states in the space E, there are 2^{64} different propositions (propositions being satisfied by a set of states, so 2^{64} different sets of states). With an aim of developing a system that can deal with the complete range of propositions in a given logical space, we must look further.
Another aspect of Floridi's system which draws consideration is a certain asymmetry between the metrics for false statements and true statements. False statements are dealt with syntactically and true statements are dealt with by analysing their models. Might there be one approach that deals appropriately with both? A consequence of this separation between the false metric and true metric is that the same numerical value might be given to both a false statement and a true statement with otherwise different information measures. For example, take the following two statements:
(1) (A ∧ B) ∨ (B ∧ C) ∨ (C ∧ D) (true)
(2) A ∧ B ∧ ¬C ∧ ¬D (false)
Relative to an actual state corresponding to the state description A ∧ B ∧ C ∧ D, both 1 and 2 are given informativeness measures of 0.75. Yet it would seem that the first, true statement yields more information.
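The coincidence of the two measures can be verified by enumeration. The following is a minimal sketch, assuming a four-atom space with A ∧ B ∧ C ∧ D as the actual state; the true statement is measured by counting the states in which it holds (vacuity), the false one by counting its false conjuncts (inaccuracy). The representation of the conjunction as a list of (atom, asserted value) pairs is an assumption of this example.

```python
from itertools import product

ATOMS = ("A", "B", "C", "D")
STATES = [dict(zip(ATOMS, values)) for values in product([True, False], repeat=4)]
ACTUAL = STATES[0]  # the state described by A ∧ B ∧ C ∧ D


def vacuity(statement):
    """f(s) = n(s)/S: share of the 16 states in which the (true) statement holds."""
    return sum(1 for state in STATES if statement(state)) / len(STATES)


def inaccuracy(literals):
    """f(s) = -e(s)/l(s) for a conjunction given as (atom, asserted value) pairs."""
    false_count = sum(1 for atom, value in literals if ACTUAL[atom] != value)
    return -false_count / len(literals)


def informativeness(f):
    """g(s) = 1 - f(s)^2."""
    return 1 - f ** 2


def statement_1(s):
    # (A ∧ B) ∨ (B ∧ C) ∨ (C ∧ D), true in the actual state
    return (s["A"] and s["B"]) or (s["B"] and s["C"]) or (s["C"] and s["D"])


# A ∧ B ∧ ¬C ∧ ¬D, false in the actual state
statement_2 = [("A", True), ("B", True), ("C", False), ("D", False)]

f1 = vacuity(statement_1)
f2 = inaccuracy(statement_2)
print(f"statement 1: f = {f1}, g = {informativeness(f1)}")
print(f"statement 2: f = {f2}, g = {informativeness(f2)}")
```

The first statement holds in 8 of the 16 states and the second has 2 false conjuncts out of 4, so the two degrees of discrepancy are 0.5 and −0.5 and squaring removes the only difference between them.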
The central idea behind Floridi's theory is right. Both false and true statements can deviate from the precise truth. For false statements, the more falsity and less truth they contain, the greater the deviation hence the less information. For true statements, the less precise they are the greater their deviation, hence the less information. However as we have just seen it falls short of providing rigorous metrics, which can deliver appropriate and relatively consistent measures for the complete class of propositions in a logical space.
Information Quantification via Truthlikeness
As we have seen so far, the CSI approach to quantifying semantic information does not stipulate a veridicality condition for information. Floridi's TSSI on the other hand holds that semantic information should be meant as implying truth. This paves the way for an alternative approach to quantifying semantic information; rather than measuring information in terms of probability, the information given by a statement is to be measured in terms of its truthlikeness. Discussion in Section 3.1 suggested however that Floridi's contribution in [17] is perhaps of more conceptual rather than technical value, as the metrics provided can make way for some more detailed and refined improvements.
At any rate, there already is a mass of formal work on truthlikeness to draw from [20]. Indeed the truthlikeness research enterprise has existed for at least a few decades now, far before any idea of utilising it to quantify semantic information. Whilst there are quite a few approaches to truthlikeness in the literature, to keep things simple and in some cases for certain technical reasons, I will look at two of them. The presentation given here of these two accounts is kept relatively simple and guided by the task at hand, which is to investigate their applicability to semantic information quantification. While seemingly a simple concept in essence, the notion of truthlikeness has resisted a straightforward formal characterisation. Over the last few decades of research, a variety of rival approaches have developed, each with their own pros and cons. In turn it follows that information quantification via truthlikeness measures is also not going to be a straightforward matter with a definitive account.
We start with a prominent and elegant approach to truthlikeness first proposed by Pavel Tichy and expanded upon by Graham Oddie [21] (p. 44). This approach is an example of approaches that measure the truthlikeness of a statement A by firstly calculating its distance from a statement T using some distance metric Δ, where T is a state description of the actual state (so T is in a sense the truth). Actually, the distance metric is ultimately a function that operates on states. Δ(w_{i}, w_{j}) measures the distance between states w_{i} and w_{j}, and we use the notation Δ(A, T) in effect as shorthand for an operation that reduces to Δ operations on the states corresponding to A and T.
The result of this distance calculation is then used to calculate the statement's truthlikeness (Tr); the greater this distance, the less truthlike the statement and vice versa. This inverse relation is simply achieved with the following formula:
$$\mathrm{Tr}(A, T) = 1 - \Delta(A, T)$$
To see this approach at work, consider again the canonical weather framework, with w_{1} being the actual state (throughout the remainder of this paper w_{1} is assumed to be the actual state in all examples):
Before continuing with some examples, the introduction of some terminology is in order. We firstly recall that a state description is a conjunction of atomic statements consisting of each atom in the logical space or its negation, but never both and that each state description corresponds to a state. In our example T = h ∧ r ∧ w.
A statement is in distributive normal form if it consists of a disjunction of the state descriptions of states in which it is true. For example, h ∧ r is true in states w_{1} and w_{2}, so its distributive normal form is (h ∧ r ∧ w) ∨ (h ∧ r ∧ ¬w). For notational convenience throughout the remainder of this paper, when listing a statement in its distributive normal form, state descriptions may be substituted by the states they correspond to. For example, h ∧ r may be represented with the term w_{1} ∨ w_{2}.
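As a small illustration of this terminology, the following sketch computes the states in which a statement is true and prints the corresponding distributive normal form. It assumes the three-atom weather space with states numbered w1 to w8 as in the truth table given earlier; the function names and the encoding of statements as Python functions are hypothetical choices made for the example.

```python
from itertools import product

ATOMS = ("h", "r", "w")
# States keyed by name, in truth-table order (w1 = all true, ..., w8 = all false).
STATES = {
    f"w{i}": dict(zip(ATOMS, values))
    for i, values in enumerate(product([True, False], repeat=3), start=1)
}


def state_description(state):
    """Conjunction containing each atom or its negation, e.g. 'h ∧ r ∧ ¬w'."""
    return " ∧ ".join(atom if state[atom] else f"¬{atom}" for atom in ATOMS)


def true_in(statement):
    """Names of the states in which the statement is true."""
    return [name for name, state in STATES.items() if statement(state)]


def distributive_normal_form(statement):
    """Disjunction of the state descriptions of the states in which the statement is true."""
    return " ∨ ".join(f"({state_description(STATES[name])})" for name in true_in(statement))


def hot_and_rainy(s):
    return s["h"] and s["r"]


print(true_in(hot_and_rainy))                   # abbreviated form: w1 ∨ w2
print(distributive_normal_form(hot_and_rainy))  # full distributive normal form
```

A contradiction is true in no state, so this sketch returns an empty list and an empty string for it.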
Now, take the possible states w_{1}, w_{8} and w_{5}, which correspond to the state descriptions h ∧ r ∧ w, ¬h ∧ ¬r ∧ ¬w and ¬h ∧ r ∧ w respectively. The difference between w_{1} and w_{8} is {h, r, w}; they differ in every atomic assertion. w_{5} and w_{1} on the other hand differ by only {h}. So ¬h ∧ r ∧ w is more truthlike than ¬h ∧ ¬r ∧ ¬w, because the distance between w_{5} and the actual state w_{1} is less than the distance between w_{8} and w_{1}.
The general formula to calculate the distance between A and T is:
$$\Delta(A, T) = \frac{d}{|W_{A}|}$$ where:
Let w_{T} stand for the state that corresponds to T.
W_{A} stands for the set of states in which A is true.
A weight with value 1/n is assigned to every atomic element, where n is the number of propositions in the logical space. So in our example, weight(h) = weight(r) = weight(w) = 1/3. This weight is used to calculate the value of the distance between two states, with each atomic assertion difference adding 1/n. So Δ(w_{5}, w_{1}) = 1/3 and Δ(w_{8}, w_{1}) = 1.
d is the sum of atomic assertion differences between A and T. That is, the sum of Δ(w_{a}, w_{T}) for each w_{a} ∈ W_{A}.
So the statement ¬h ∧ r ∧ w (the state description for w_{5}) has a truthlikeness (information measurement for our purposes) of 2/3 and the statement ¬h ∧ ¬r ∧ ¬w (the state description for w_{8}) has a truthlikeness (information measurement) of 0.
This approach extends to the complete range of statements involving h, r and w. According to Formula 1, the distance of a statement from the truth is defined as the average distance between each of the states in which the statement is true and the actual state. Take the statement h ∧ ¬r, which makes assertions about only 2 of the 3 atomic states. It is true in both w_{3} and w_{4}, or {h, ¬r, w} and {h, ¬r, ¬w} respectively. w_{3} has a distance of 0.33 and w_{4} has a distance of 0.67 from w_{1}, so the average distance is (0.33 + 0.67)/2 = 0.5. Note that henceforth the truthlikeness function (Tr) will be replaced by an information yield function (info), so that
$$\mathrm{info}(A, T) = 1 - \Delta(A, T)$$
Also, given that T is set as w_{1} throughout this paper, info(A, T) will generally be abbreviated to info(A).
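The calculations just described can be sketched in code. The example below assumes the three-atom weather space with w_{1} as the actual state and equal atom weights of 1/3; the function names, the encoding of statements as Python functions, and the use of exact fractions are assumptions of this illustration rather than part of the Tichy/Oddie presentation.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("h", "r", "w")
STATES = [dict(zip(ATOMS, values)) for values in product([True, False], repeat=3)]
ACTUAL = STATES[0]  # w1: hot, rainy and windy


def state_distance(u, v):
    """Δ(u, v): each atom on which the two states differ adds a weight of 1/n."""
    return Fraction(sum(1 for atom in ATOMS if u[atom] != v[atom]), len(ATOMS))


def distance(statement, target=ACTUAL):
    """Δ(A, T) = d / |W_A|: average distance from the target over the states where A is true."""
    models = [state for state in STATES if statement(state)]
    if not models:
        # Contradictions are true in no state, so the average is undefined.
        raise ValueError("distance is undefined for contradictions")
    return sum(state_distance(state, target) for state in models) / len(models)


def info(statement):
    """info(A) = 1 - Δ(A, T), with T fixed as the description of w1."""
    return 1 - distance(statement)


print(info(lambda s: not s["h"] and s["r"] and s["w"]))          # ¬h ∧ r ∧ w
print(info(lambda s: not s["h"] and not s["r"] and not s["w"]))  # ¬h ∧ ¬r ∧ ¬w
print(info(lambda s: s["h"] and not s["r"]))                     # h ∧ ¬r
print(info(lambda s: s["h"] or not s["h"]))                      # h ∨ ¬h
```

Using fractions instead of rounded decimals avoids the small rounding that appears when the distances are written as 0.33 and 0.67.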
Table 5 lists some results using this method. How do these results fare as measures of information yield? Statement #1 (complete truth) clearly yields the most information and #21 (complete falsity) clearly yields the least. #10, #11, #15 and #16 indicate that for a disjunctive statement, the more false constituents it has the less information it yields. In general, the more false constituents contained in a formula the more likely a decrease in information yield, as indicated by the difference between #5 and #9.
Statements #7, #17 and #21 make sense; the greater the number of false conjuncts a statement has, the less information it yields, with the upper bound being a statement consisting of nothing but false conjuncts, which is accordingly assigned a measure of 0. Although #17 and #19 have the same measure, #17 has one true conjunct out of three atomic conjuncts and #19 has zero true atomic statements out of one atomic statement. From this it can be said that the assertion of a false statement detracts more from information yield than the absence of an assertion or denial of that statement. Half of #14 is true, so it is appropriately assigned a measure of 0.5. #20 further shows that falsity proportionally detracts from information yield. The difference between #18 and #16 is perhaps the most interesting. Although #18 has one true conjunct out of two and #16 contains no true atoms, #16 comes out as yielding more information, further suggesting the price paid for asserting falsity in a conjunction. Also, note that some false statements yield more information than some true statements. An interpretation of the Tichy/Oddie method, which will provide a way to further understand some of these results, is given in Section 4.6.
One possible issue with this method is that tautologies are not assigned a maximum distance and hence are assigned a non-zero, positive measure. Without going into detail, it seems that this is a more significant issue for a quantitative account of information than it is for an account of truthlikeness. The implication that tautologies have a middle degree of information yield strongly conflicts with the intuition and widely accepted position that tautologies are not informative. Whilst an explanation of this result will be discussed in Section 4.6, the simplest and perhaps best way to get around this issue would be to just exclude tautologies from this metric and assign them a predefined distance of 1, hence an information yield of 0. Although an expedient, this move would put the tautology alongside its extreme counterpart the contradiction, which is also excluded. In the case of contradictions (#22), these calculations do not apply: since contradictions are true in no states, the calculation would involve division by zero.
Quantifying Misinformation
One task that Floridi leaves for a subsequent stage of research, and to which we shall now turn, is “the extension of the quantitative analysis to the semantic concepts of quantity of misinformation” [17] (p. 217). However, before undertaking this task a clarification is in order. Without getting caught up in the veridicality thesis debate [22,23], in this paper semantic information is defined as true semantic content and semantic misinformation is defined as false semantic content. For example, in Table 5, statement #1 would be considered a piece of semantic information and statement #7 would be considered a piece of semantic misinformation. Given this, here are two issues to consider:
Statement #7, a piece of misinformation, has a non-zero, positive information measure;
Some pieces of misinformation (false statements) yield more information than some pieces of information (true statements).
In order to address these issues, I refer to some points made by Floridi in his objections to reasons to think that false information is a type of semantic information [22] (p. 361):
• Reason 1. Misinformation can include genuine information.
• Objection 1. This just shows that misinformation is a compound in which only the true component qualifies as information.
• Reason 2. Misinformation can entail genuine information.
• Objection 2. Even if one correctly infers only some semantically relevant and true information from misinformation, what now counts as information is the inferred true consequence, not the original misinformation. Besides, ex falso quod libet sequitur, so any contradiction would count as information.
Thus in this approach statements can have both a true, information component and a false, misinformation component.
Continuing on, how might the approach in this section to quantifying information be used in order to quantify misinformation? To begin with, it seems that one straightforward stipulation to make is that true statements should have a measure of 0, for they are not misinformative. So unlike information measures, where both true and false statements can have relevant measures, with misinformation attention is confined to false statements [24]. Of course, the more falsity in a false statement, the more misinformation it yields. As with the information measure just given, the first metric to devise is one which measures the degree of deviation from complete misinformation (Δ_{misinfo}). A deviation of 0 will translate to maximum misinformation and a deviation of 1 will translate to no misinformation. So the metric for this deviation will at least satisfy these conditions:
all true statements have a predefined deviation of 1;
all contingently false statements have a deviation greater than or equal to 0 and less than 1.
Two ways to go about measuring this deviation come to mind. The first is to use the complement of the deviation for information measures: Δ_{misinfo}(A, T) = 1 − Δ_{info}(A, T).
For the second, I'll use some terminology given by Oddie [21](p. 50), who discusses a reversal operation Rew() on states such that:
Rew(U) = the state V such that for any atomic state B, B is true in U if and only if B is false in V.
This reversal operation on states is extended to a reversal operation on propositions. The reversal of a proposition A is the image of A under Rew():
Rev(A) = the proposition B such that A contains state U if and only if B contains Rew(U), for any state U.
Where w_{T} is the actual state, the second way to measure misinformation deviation would be to measure the distance of a statement from Rew(w_{T}). In our example, w_{T} = w_{1} so Rew(w_{T}) = w_{8}.
These two approaches turn out in fact to be equivalent. From here one can go on to calculate a statement's quantity of misinformation simply by subtracting its deviation or distance from 1; the greater the distance, the less misinformation yielded by the statement:
misinfo(A) = 1 − Δ_{misinfo}(A, T)
From this it follows that
info(A) + misinfo(A) = 1
and the following hold:
info (Rev(T)) = 0
info(A) + info(Rev(A)) = 1
misinfo(A) = info(Rev(A))
As this section has shown, unlike the CSI framework, truthlikeness approaches such as this one accommodate semantic misinformation.
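The identities above can be checked numerically. The following is a minimal sketch, assuming the Tichy/Oddie average distance as Δ_{info} and the 3-atom weather space with w_{1} as the actual state. The function names are illustrative only, and the sketch computes the plain complement measure, leaving aside the stipulation that true statements receive a predefined misinformation measure of 0.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("h", "r", "w")
# States w1..w8 as truth-value tuples; w1 = (True, True, True) is the actual state.
STATES = list(product([True, False], repeat=len(ATOMS)))
ACTUAL = STATES[0]


def distance(state, target=ACTUAL):
    """Share of atoms on which two states disagree (each atom weighs 1/n)."""
    diffs = sum(a != b for a, b in zip(state, target))
    return Fraction(diffs, len(ATOMS))


def delta_info(states):
    """Average distance of a non-empty set of states from the actual state."""
    return sum(distance(s) for s in states) / len(states)


def info(states):
    return 1 - delta_info(states)


def misinfo(states):
    # misinfo(A) = 1 - delta_misinfo(A, T), with delta_misinfo = 1 - delta_info
    delta_misinfo = 1 - delta_info(states)
    return 1 - delta_misinfo


def rew(state):
    """Reverse a state: every atom takes the opposite truth value."""
    return tuple(not v for v in state)


def rev(states):
    """Reverse a proposition: the image of its states under rew."""
    return {rew(s) for s in states}


# The false state description h ∧ ¬r ∧ ¬w
a = {(True, False, False)}
print(info(a), misinfo(a), info(rev(a)))  # 1/3 2/3 2/3
```

Exact fractions are used so that the equalities hold without rounding error.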
Adjusting Atom Weights
It is worth briefly mentioning the possibility of adjusting atomic weights in order to reflect differences in the informational value of atomic statements. As we have seen, where n stands for the number of propositional atoms in a logical space, each atom is assigned a standard weight of 1/n for the purposes of Δ calculation. In the 3-proposition weather example being discussed, this results in each atom being assigned a weight of 1/3. As a consequence of this even distribution of weight, the three statements h ∧ r ∧ ¬w, h ∧ ¬r ∧ w and ¬h ∧ r ∧ w are all assigned the same information yield measure.
Beyond this there is the possibility of adjusting the weights so that the resulting assignment is non-uniform. Such a modification could perhaps be used to model cases where different atomic statements (hence respective compound statements containing them) have different informational value. There is much room to interpret just what is meant here by ‘informational value’. Statement A could have a greater informational value than B if an agent prefers the acquisition of A over the acquisition of B, or if the agent can do more with A than B. Variable weights could also perhaps be used to reflect extra-quantitative or qualitative factors. These are some preliminary thoughts.
The minimum requirement is that the sum of the values assigned to the atoms comes to 1. This is to ensure that values for Δ are appropriately distributed between 0 and 1, with 1 being the maximum, reserved for Rew(w_{T}).
With this in mind, take the following weight assignments:
weight(h) = 1/6
weight(r) = 2/6
weight(w) = 3/6
With such an assignment, the information yield measurement distribution changes significantly. Some results are given in Table 6.
It can be seen that although statements 1, 2 and 3 all share the same form (2 true atoms and 1 false atom), none share the same information yield measure. In such a case, 3 yields more information than 1 due to the fact that its two true atoms are of greater informational value than the two true atoms contained in 1.
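The effect of non-uniform weights on state descriptions with two true atoms and one false atom can be sketched as follows. This is a minimal illustration using the weights given above; the function name weighted_info and the dictionary layout are assumptions made for the example, and the printed values are not taken from Table 6.

```python
from fractions import Fraction

WEIGHTS = {"h": Fraction(1, 6), "r": Fraction(2, 6), "w": Fraction(3, 6)}
UNIFORM = {"h": Fraction(1, 3), "r": Fraction(1, 3), "w": Fraction(1, 3)}
ACTUAL = {"h": True, "r": True, "w": True}


def weighted_info(state, weights=WEIGHTS, actual=ACTUAL):
    """Information yield of one state description under per-atom weights."""
    if sum(weights.values()) != 1:
        raise ValueError("atom weights must sum to 1")
    # The distance is the total weight of the atoms the state gets wrong.
    distance = sum(weights[a] for a in weights if state[a] != actual[a])
    return 1 - distance


statements = {
    "h ∧ r ∧ ¬w": {"h": True, "r": True, "w": False},
    "h ∧ ¬r ∧ w": {"h": True, "r": False, "w": True},
    "¬h ∧ r ∧ w": {"h": False, "r": True, "w": True},
}

for label, state in statements.items():
    print(label, weighted_info(state), weighted_info(state, UNIFORM))
```

Under the uniform weights all three statements receive 2/3, whereas under the adjusted weights the statement whose false atom carries the least weight scores highest.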
One potential issue with such a modification is the treatment of misinformation quantification. Although 1 has a lower information yield measure than 3, it does not seem as right to say that conversely it has a greater misinformation yield, since it contains the same amount of falsity as 3. Is there a corresponding misinformational value?
Contradictions
As we have just seen, it is not mathematically possible to calculate a value for contradictions using this truthlikeness method (since contradictions contain no states, this means a division by 0). How then should contradictions be dealt with? One option is to simply exclude contradictions from the metrics and assign them a predefined deviation of 1 and hence an information yield of 0. As we have seen, this is the approach that Floridi takes.
This however is arguably too rash a move and there are good reasons to adopt an approach in which the metrics are adjusted or expanded in order to accommodate contradictions. To begin with, the class of contradictions is not homogeneous with regards to information yield and different contradictions can be treated as having different non-zero information yields.
Take a logical space consisting of 100 atomic propositions, p_{1},p_{2}…p_{100}, all true relative to the actual state. Now the statement p_{1} ∧ p_{2} ∧ p_{3} ∧ … ∧ p_{99} ∧ p_{100} yields maximal information. If we conjoin it with ¬p_{100} to get the contradiction p_{1} ∧ p_{2} ∧ p_{3} ∧ … ∧ p_{99} ∧ p_{100} ∧ ¬p_{100}, the original information remains and the resulting statement should not instantly be assigned an information measure of 0. To put it another way, if one contradictory atom is inserted into a database with much information, whilst this means that there is now some misinformation within the database, surely there is still a lot of information within the database. If the contents of the database were to be represented as a statement (the statement being a contradiction), it would be good to have a way to measure information that can deal with the fact that the database still contains much information and that is also sensitive to the fact that different contradictions have different information yield measures; for example, p_{1} ∧ p_{2} ∧ p_{3} ∧ … ∧ p_{99} ∧ p_{100} ∧ ¬p_{100} clearly yields more information than p_{1} ∧ ¬p_{1}.
Now that the case has been made, in order to accommodate contradictions and have a way to assign them positive, non-zero measures of information, the following framework is suggested. All non-contradictions are dealt with as before, using the standard metrics with classical propositional logic. For contradictions, a many-valued framework such as that associated with the paraconsistent logic LP is employed [25]. In this framework, a third truth value, B (true and false) is introduced. Importantly, since contradictions can hold in states involving B, the denominator in calculations involving contradictions need no longer be 0.
In the classical framework, each atomic element is assigned a weight given by 1/n and the value of the distance between a T and an F is equal to this weight. For this extended framework, the classical distances are still the same and the distance between B and a T or an F is 1/(2n). Semantically speaking, the reason for this is that B consists of both T and F, so it half corresponds to either one of them. Formally, the distance calculation between A and T now becomes:
Δ(A, T) = (d + d_{B}) / |W_{A}|
where:
Let w_{T} stand for the state that corresponds to T. This state will consist of only Ts and Fs.
W_{A} stands for the set of states in which A is true.
Where n is the number of propositions in the logical space:
A weight with value 1/n is assigned to every classical atomic state (T or F). The distance function Δ(w_{a}, w_{T}) deals with calculations involving classical states.
A weight with value 1/(2n) is assigned to every non-classical atomic state (B). The distance function Δ_{B}(w_{a}, w_{T}) deals with calculations involving the non-classical states.
d is the sum of atomic assertion differences between A and T involving the truth values T and F. That is, the sum of Δ(w_{a}, w_{T}), for each w_{a} ∈ W_{A}.
d_{B} is the sum of atomic assertion differences between A and T involving the truth value B. That is, the sum of Δ_{B}(w_{a}, w_{T}), for each w_{a} ∈ W_{A}.
To see this approach at work we again consider the weather framework. The list of 27 possible states looks like this:
Take the statement h ∧ ¬h. It holds in w_{10} – w_{18}. In this set of states there are 6 instances of F and 15 instances of B. Therefore, the distance is:
Δ(h ∧ ¬h, h ∧ r ∧ w) = ((6 × 1/3) + (15 × 1/6)) / 9 = 0.5
Table 8 lists some results, with all statements being classical contradictions. These results are reasonable. #1 is the contradiction which yields the most information and can simply be seen as a conjunction of the maximum information yielding true state description h ∧ r ∧ w and the false contradicting atom ¬h. Conversely, #13 is the contradiction which yields the least information and can simply be seen as a conjunction of the minimum information yielding completely false state description ¬h ∧ ¬r ∧ ¬w and the true contradicting atom h. In between these two extremes, the sensitivity of this approach to various forms of contradictions is clear. The significant number of contradictions with a measure of 0.5 emphasises how the Tichy/Oddie measure can be seen as gauging the balance between the truth and falsity contained in a statement (a detailed interpretation of the Tichy/Oddie approach, related to this balance gauging, is given in Section 4.6). For the basic contradiction #5, half of its atomic assertions are true and half are false; similarly for #8 and #9. Comparing #6 to #8 and #7 to #9, it can be seen that whether a collection of basic contradictions are disjunctively or conjunctively connected makes no difference and the measure remains 0.5; whether h ∧ ¬h becomes (h ∧ ¬h) ∧ (r ∧ ¬r) or (h ∧ ¬h) ∨ (r ∧ ¬r), the balance between truth and falsity remains a constant. However, as indicated by #10 and #12, there can be a difference between the conjunction of a basic contradiction with a non-basic contradiction and the corresponding disjunction.
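The calculation for contradictions can be sketched in code. The following is a minimal illustration, assuming the usual LP truth tables (conjunction as minimum, disjunction as maximum, negation as reversal) and a numeric encoding of the three values that is chosen here purely for convenience. The ordering of the 27 states and all function names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product

# Numeric encoding of the three values (an assumption of this sketch).
T, B, F = Fraction(1), Fraction(1, 2), Fraction(0)
ATOMS = ("h", "r", "w")
STATES = [dict(zip(ATOMS, vals)) for vals in product((T, B, F), repeat=len(ATOMS))]


def neg(x):
    return 1 - x


def conj(*xs):
    return min(xs)


def disj(*xs):
    return max(xs)


def holds(value):
    """A formula holds in a state when its value is designated (T or B)."""
    return value >= B


def distance(state):
    """Distance from the all-true actual state: F costs 1/n, B costs 1/(2n)."""
    n = len(ATOMS)
    return sum((T - v) / n for v in state.values())


def delta(formula):
    """(d + d_B) / |W_A|: average distance over the states where the formula holds."""
    models = [s for s in STATES if holds(formula(s))]
    return sum(distance(s) for s in models) / len(models)


def info(formula):
    return 1 - delta(formula)


def basic(s):            # h ∧ ¬h
    return conj(s["h"], neg(s["h"]))


def basic_r(s):          # r ∧ ¬r
    return conj(s["r"], neg(s["r"]))


def both(s):             # (h ∧ ¬h) ∧ (r ∧ ¬r)
    return conj(basic(s), basic_r(s))


def either(s):           # (h ∧ ¬h) ∨ (r ∧ ¬r)
    return disj(basic(s), basic_r(s))


print(delta(basic))                 # 1/2
print(info(both), info(either))     # 1/2 1/2
```

Only contradictions should be passed to this sketch, since non-contradictions are handled with the classical metrics as before.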
Truthlikeness Adequacy Conditions and Information Conditions
It was seen with the CSI account that if A ⊢ B then cont(A) ≥ cont(B) and inf(A) ≥ inf(B). This is not going to hold in general for a Tichy/Oddie truthlikeness approach to information quantification.
For example, looking back at Table 5, although ¬h ∧ ¬r ∧ ¬w ⊢ ¬h, info(¬h ∧ ¬r ∧ ¬w) = 0 < info(¬h) = 0.334. However since both of these statements are false, this example is not an issue; given two false statements such as these, the logically stronger one is understandably further away from the truth.
The question of interest is whether or not this property holds when A and B are both instances of information, when they are both true. So the pertinent question is this: among true statements, does information as truthlikeness here covary with logical strength? Put formally, does the following condition hold?:
If A and B are true statements and A ⊢ B, then info(B) ≤ info(A)
As it turns out, this condition does not hold. To begin with, it can be seen that although h ∨ ¬r ∨ ¬w ⊢ h ∨ ¬h, info(h ∨ ¬r ∨ ¬w) = 0.48 whereas info(h ∨ ¬h) = 0.5. But leaving aside cases where tautologies are involved, this result also does not hold more generally, in cases where only contingently true statements are involved. For example, although (h ∧ ¬r) ∨ w ⊢ h ∨ w, info((h ∧ ¬r) ∨ w) = 0.6 < info(h ∨ w) = 0.61. This is not an isolated case either.
Remark
In a logical space containing three propositional variables, there are 2187 possible ways to have two true statements A and B such that A ⊢ B. Out of these, 366 are such that info (A) < info (B).
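A brute-force enumeration along the following lines can be used to examine this remark. It is a minimal sketch that represents each true statement as a set of states containing the actual state and treats A ⊢ B as set inclusion; the variable names are illustrative. The script prints the number of violating pairs so that it can be compared with the figure stated in the remark.

```python
from fractions import Fraction
from itertools import combinations, product

STATES = list(product([True, False], repeat=3))  # w1..w8, w1 is the actual state
ACTUAL = STATES[0]


def val(state):
    """Share of atoms on which the state agrees with the actual state."""
    return Fraction(sum(a == b for a, b in zip(state, ACTUAL)), len(ACTUAL))


def info(states):
    """Tichy/Oddie information: 1 minus the average distance, i.e. the average val."""
    return sum(val(s) for s in states) / len(states)


others = STATES[1:]
# Every true statement corresponds to a set of states that contains w1.
true_statements = [
    frozenset((ACTUAL,) + extra)
    for k in range(len(others) + 1)
    for extra in combinations(others, k)
]

# A entails B exactly when the states of A are a subset of the states of B.
pairs = [(a, b) for a in true_statements for b in true_statements if a <= b]
violations = [(a, b) for a, b in pairs if info(a) < info(b)]

print(len(pairs), len(violations))
```

The pair count includes the trivial cases where A and B are the same statement, which never count as violations.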
An interesting example arises in a space with four atoms. Take the 4-proposition logical space obtained by adding the propositional variable d to the 3-atom weather framework, with the actual state corresponding to the state description h∧r∧w∧d. Whilst it is the case that ¬h∨¬r∨w ⊢ ¬h∨¬r∨w∨¬d, info(¬h ∨ ¬r ∨ w) = 0.482 whilst info(¬h ∨ ¬r ∨ w ∨ ¬d) = 0.483.
This result seems quite counter-intuitive; the addition of a false disjunct to an already true statement slightly increases its information yield. Contrary to this result, it is fair to say that information is proportional to accuracy, and that the addition of disjunctions, particularly when false, decreases accuracy. Also, this result seems contrary to the notion of information loss. If the information ¬h ∨ ¬r ∨ w stored in a database were corrupted and the salvaged remains were the weakened ¬h ∨ ¬r ∨ w ∨ ¬d, this would ordinarily be described as a case of information loss. Or if a signal is transmitting the message ¬h ∨ ¬r ∨ w and it became ¬h ∨ ¬r ∨ w ∨ ¬d due to noise, once again it can be said that there is information loss.
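The four-atom example can be reproduced with a short calculation. The sketch below assumes the Tichy/Oddie average-distance measure and writes each statement as a Python predicate over a state; the names stronger and weaker are illustrative.

```python
from fractions import Fraction
from itertools import product

ATOMS = ("h", "r", "w", "d")
# All 16 classical states; the actual state makes every atom true.
STATES = [dict(zip(ATOMS, vals)) for vals in product([True, False], repeat=len(ATOMS))]


def distance(state):
    """Share of atoms that the state gets wrong relative to h ∧ r ∧ w ∧ d."""
    return Fraction(sum(not v for v in state.values()), len(ATOMS))


def info(formula):
    """1 minus the average distance over the states in which the formula holds."""
    models = [s for s in STATES if formula(s)]
    return 1 - sum(distance(s) for s in models) / len(models)


def stronger(s):  # ¬h ∨ ¬r ∨ w
    return (not s["h"]) or (not s["r"]) or s["w"]


def weaker(s):    # ¬h ∨ ¬r ∨ w ∨ ¬d
    return stronger(s) or (not s["d"])


print(float(info(stronger)), float(info(weaker)))
```

The weaker statement holds in one additional state, and that state is closer to the actual state than the average of the states already included, which is why the measure rises.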
The failure of Tichy/Oddie truthlikeness, hence for our purposes information, to covary with logical strength amongst true statements is discussed by Ilkka Niiniluoto, who whilst developing his own account of truthlikeness in [26] surveys how various approaches to truthlikeness fare against a range of adequacy conditions [27]. Though an interpretation of the Tichy/Oddie method that can serve to explain its failure to satisfy this condition and justify its use will be discussed shortly, in the meantime we will take a brief look at Niiniluoto's preferred approach to truthlikeness, which satisfies all of the adequacy conditions he lists.
Niiniluoto on Truthlikeness
As with the Tichy/Oddie approach to truthlikeness, the task for Niiniluoto is to define some distance function Δ such that Δ(A, T) ∈ [0, 1]. He looks at six distance functions, which are listed below. Firstly, a recap and establishment of terms to be used:
Δ(w_{i}, w_{j}) calculates the distance between states w_{i} and w_{j}. This is the sum of atomic differences multiplied by the atomic weight (1/n).
w_{T} is the actual state, that corresponds to the true state description T.
W_{A} is the set of states in which A is true.
B is the set of all states in the logical space.
Here are the distance functions:
Δ_{min}(A, T) = the minimum of the distances Δ(w_{a}, w_{T}) with w_{a} ∈ W_{A}.
Δ_{max}(A, T) = the maximum of the distances Δ(w_{a}, w_{T}) with w_{a} ∈ W_{A}.
Δ_{sum}(A, T) = the sum of all distances Δ(w_{a}, w_{T}) with w_{a} ∈ W_{A}, divided by the sum of all distances Δ(w_{b}, w_{T}) with w_{b} ∈ B.
Δ_{av}(A, T) = the sum of the distances Δ(w_{a}, w_{T}) with w_{a} ∈ W_{A} divided by |W_{A}|.
Δ_{mm}^{γ}(A, T) = γΔ_{min}(A, T) + (1 − γ)Δ_{max}(A, T) for some weight γ with 0 ≤ γ ≤ 1.
Δ_{ms}^{γλ}(A, T) = γΔ_{min}(A, T) + λΔ_{sum}(A, T) for some two weights γ and λ with 0 ≤ γ ≤ 1 and 0 ≤ λ ≤ 1.
Δ_{av} is the Tichy/Oddie approach. Niiniluoto's preferred metric for truthlikeness is Δ_{ms}^{γλ}, which he terms the weighted min-sum measure [26] (p. 228). Once again, this distance calculation is then used to calculate truthlikeness: Tr(A, T) = 1 − Δ_{ms}^{γλ}(A, T). Table 9 lists some results using the min-sum measure, with γ being assigned the value 0.89 and λ being assigned the value 0.44.
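The six distance functions can be written down directly from their definitions. The following is a minimal sketch for the 3-atom weather space with w_{1} as the actual state, using the values of γ and λ given above. The function names are illustrative, and the printed values are computed by the sketch rather than copied from Table 9.

```python
from fractions import Fraction
from itertools import product

STATES = list(product([True, False], repeat=3))  # w1..w8, w1 is the actual state
ACTUAL = STATES[0]
GAMMA, LAMBDA = Fraction(89, 100), Fraction(44, 100)


def d(state):
    """Distance between a state and the actual state (atomic weight 1/n)."""
    return Fraction(sum(a != b for a, b in zip(state, ACTUAL)), len(ACTUAL))


def delta_min(ws):
    return min(d(w) for w in ws)


def delta_max(ws):
    return max(d(w) for w in ws)


def delta_sum(ws):
    # Normalised by the summed distance of every state in the logical space.
    return sum(d(w) for w in ws) / sum(d(w) for w in STATES)


def delta_av(ws):
    return sum(d(w) for w in ws) / len(ws)


def delta_mm(ws, gamma=GAMMA):
    return gamma * delta_min(ws) + (1 - gamma) * delta_max(ws)


def delta_ms(ws, gamma=GAMMA, lam=LAMBDA):
    return gamma * delta_min(ws) + lam * delta_sum(ws)


def tr(ws):
    """Truthlikeness under the weighted min-sum measure."""
    return 1 - delta_ms(ws)


h = STATES[:4]  # the statement h holds in w1..w4
print(float(tr(h)), float(1 - delta_av(h)))
```

Each function takes the non-empty collection of states in which a statement holds, so contradictions are outside the scope of this sketch.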
An Interpretation of the Tichy/Oddie Approach
Returning to the Tichy/Oddie approach, it is time to investigate further some of its problematic aspects that were touched upon earlier and see what we might make of them in relation to the task of quantifying information. But before doing so, an initial general point to make is that when it comes to considerations of information/misinformation yield, there is a certain asymmetry between true and false statements. Whilst false statements (misinformation) can be judged to yield some truth (information), true statements are ordinarily judged to just contain information and do not yield any misinformation. But just as false statements can yield some information, perhaps there is a sense in which true statements can lead to some misinformation. Given an actual state corresponding to the state description h ∧ r ∧ w, it is straightforward to say that the false statement h ∧ ¬r ∧ ¬w, whilst misinformative on the whole, still yields some information. On the other hand, the statement h ∨ ¬w ∨ ¬r, whilst true, does not give one the complete truth about the domain of inquiry and the majority of its disjuncts are false. As will be shown now, such a statement can be seen as being potentially misinformative in the sense that it can potentially lead to falsity and it is this type of view that can be associated with and justify usage of the Tichy/Oddie approach for certain applications.
In Section 4.4 we saw that whilst it is the case that (h ∧ ¬r) ∨ w ⊢ h ∨ w, info((h ∧ ¬r) ∨ w) < info(h ∨ w). In light of the interpretation to be now provided, this result finds some support. h ∨ w contains no false atoms, whereas (h ∧ ¬r) ∨ w contains one, namely ¬r. Expanding upon this example to emphasise the point, take a logical space consisting of five atoms ({p_{n} | 1 ≤ n ≤ 5}), such that each is true in the actual state. Whilst it is the case that (p_{1} ∧ ¬p_{2} ∧ ¬p_{3} ∧ ¬p_{4}) ∨ p_{5} ⊢ p_{1} ∨ p_{5}, the antecedent statement makes many false assertions whereas the consequent makes none. In this way the Tichy/Oddie approach can be seen to measure not only how much atomic truth a statement contains, but also how much atomic falsity it contains. Also, in light of Section 4.1, these types of results can be seen as part of adopting a uniform method for the measurement of information and its complementary misinformation.
Something along the lines of the idea just outlined can be rigorously captured by viewing the Tichy/Oddie approach in terms of expected utility. We begin by assigning each state (or state description) a utility value, where the value for a state w (val(w)) is determined using the following method:
Let n stand for the number of propositional variables in the logical space.
Let t stand for the number of true atomic elements, relative to the actual state, in the state w.
val(w) = t/n.
So in the case of our 3-atom logical space with w_{1} being the actual state, each state is valued as follows:
val(w_{1}) = 1
val(w_{2}) = val(w_{3}) = val(w_{5}) = 2/3
val(w_{4}) = val(w_{6}) = val(w_{7}) = 1/3
val(w_{8}) = 0
Now, given a statement A that holds in n states, convert it to distributive normal form, which will have n state description disjuncts. Imagine an agent is to choose one and only one of the disjuncts. The value of the selected disjunct's corresponding state determines the utility or informational value of the choice. Using the standard decision theoretic framework, we can say that the estimated utility of A (eu(A)) is:
eu(A) = ∑_{i=1}^{n} val(w_{i}) × pr(w_{i})
where val(w_{i}) is the value of a state and pr(w_{i}) is the probability of it being chosen. Each disjunct has the same probability of being selected as any other, so for each state the probability of it being chosen is 1/n. This estimated utility value equates to the statement's truthlikeness/information measure using the Tichy/Oddie approach.
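The estimated utility calculation can be sketched as follows for the 3-atom weather space. This is a minimal illustration with w_{1} as the actual state; the function names follow the notation above, while the list names are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product

# w1..w8 as (h, r, w) truth values; the actual state w1 makes every atom true.
STATES = list(product([True, False], repeat=3))


def val(state):
    """val(w) = t/n, where t counts the atoms that are true in the state."""
    return Fraction(sum(state), len(state))


def eu(states):
    """Estimated utility: each state is chosen with equal probability."""
    pr = Fraction(1, len(states))
    return sum(val(w) * pr for w in states)


antecedent = [s for s in STATES if (s[0] and not s[1]) or s[2]]  # (h ∧ ¬r) ∨ w
consequent = [s for s in STATES if s[0] or s[2]]                 # h ∨ w

print(eu(antecedent), eu(consequent), eu(STATES))  # 3/5 11/18 1/2
```

Because val(w) is 1 minus the distance of w from the actual state, averaging val over the states of a statement gives the same number as 1 minus the average distance.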
Going back to the sound argument (h ∧ ¬r) ∨ w ⊢ h ∨ w, here are both antecedent and consequent in distributive normal form followed by a decision-theoretic-style tabulation of the two statements:
Thus the information measurement of a statement here can be seen in terms of its expected utility, whereby both the positive (truth, information) and the negative (falsity, misinformation) are factored into calculations. Analysing the Tichy/Oddie measure in this way explains its counter-intuitive results. The tautology can be seen as suspension of judgement and sits at 0.5, favouring neither information nor misinformation.
Also, take the example in Section 4.4, where info(¬h ∨ ¬r ∨ w) < info(¬h ∨ ¬r ∨ w ∨ ¬d). Whilst the addition of a false disjunct to a statement is grounds to think that its information measure should decrease, seen in terms of expected utility, the addition of one false disjunct results in the addition of one more model to the set of satisfying states, namely the state corresponding to the state description h ∧ r ∧ ¬w ∧ ¬d. As can be seen, this addition actually results in a greater overall expected utility.
Apparently Tichy himself offered a similar analysis as an argument in defence of his truthlikeness measure [26](p. 238). Niiniluoto is dismissive of the argument, but whilst his reasons are arguably legitimate, they pertain specifically to considerations of truthlikeness. At this point it is important to stress that although we are using the notion of truthlikeness to quantify information, this is not to say that they amount to one and the same thing and a consideration in accounting for one need not be a consideration in accounting for the other. Furthermore, the concept of information is generally treated more pluralistically than truthlikeness, “it is hardly to be expected that a single concept of information would satisfactorily account for the numerous possible applications of this general field” [28]. This point of view could accommodate the suggestion that for some purposes of semantic information quantification this approach might be suitable. In fact, some of Niiniluoto's claims could be seen as offering support for the use of the Tichy/Oddie approach in a framework for information measurement. For example, he commences his case with the following:
even if it were the case that function M_{av} [the Tichy/Oddie measure]serves to measure the degree of trustworthiness (or pragmatic preference for action) of a hypothesis, it would not follow that M_{av} is an adequate measure of truthlikeness. These two concepts are clearly distinct. Tichy tends to think that a disjunction of two constituents h_{1} ∨ h_{2} is something like a lottery ticket which gives us the alternatives h_{1} and h_{2} with equal probabilities: if we ‘put our trust’ to h_{1} ∨ h_{2}, we have to make a ‘blind choice’ between h_{1} and h_{2}. This idea is irrelevant, if we are dealing with answers to cognitive problems - the connection between truthlikeness and practical action is quite another question which has to be studied separately[26](p. 238).
In closing and without elaboration, it suffices to mention that terms employed here, such as ‘trustworthiness’ and ‘practical action’ could be not inappropriately associated with the notion of information.
Contradictions under this Interpretation
The contradiction-accommodating extension to the Tichy/Oddie approach outlined in Section 4.3 can be dealt with nicely using this interpretation. A straightforward way is to modify the standard classical way, where the value of a state w becomes:
val(w) = (1/(2n)) × b + (1/n) × t
where b stands for the number of B-valued atomic elements in the state and t stands for the number of T-valued atomic elements in the state.
Example
Take the statement h ∧ ¬h ∧ r ∧ ¬r ∧ w, which is listed in Table 8 as #2 and has an information measure of 0.583. This statement holds in states w_{13} and w_{14} of Table 7. Now
val(w_{13}) = 2/3
val(w_{14}) = 1/2
so
eu(h ∧ ¬h ∧ r ∧ ¬r ∧ w) = (2/3 × 1/2) + (1/2 × 1/2) = 0.583
As mentioned earlier, the adoption of a paraconsistent framework here is for instrumental purposes, with no commitment to the actual obtainability of paraconsistent states. One small issue with the interpretation given in the previous example is that it somewhat entertains the possibility of paraconsistent states. The decision theoretic framework outlined here involves an agent making a choice between possible states, with the possibility that one of the states is the actual state. For cases of contradiction measurement, the states being dealt with are paraconsistent ones. However since the preference here is to remain neutral and refrain from endorsing the possibility or actuality of paraconsistent states, a modified interpretation in terms of classical states is appropriate.
Example
In the previous example, the contradiction h ∧ ¬h ∧ r ∧ ¬r ∧ w corresponded to a choice between two paraconsistent states, w_{13} and w_{14}. How can this decision-theoretic analysis be made in terms of classical states only? To begin with, take w_{13}, which has the following valuations of atomic elements: v(h) = B, v(r) = B and v(w) = T. We can say that one of these valuations, namely v(w), is definite, since it results in a classical value. The others though are not definite and result in the paraconsistent valuation of B. In terms of the decision-theoretic analysis, a valuation of B means that the atomic element actually has a value of T or F, but there is no indication as to which one in particular. So it is to be treated as a ‘wildcard valuation’, to be substituted by either T or F. Using this system, substitutions for B in w_{13} result in the following set of classical states of Table 4 in Section 4: {w_{1}, w_{3}, w_{5}, w_{7}}.
Since none of the 3 valuations for w_{14} are definite, substitutions for B in w_{14} result in a set consisting of all the classical states of Table 4: {w_{1} – w_{8}}.
Given this translation to sets of classical states, we can reframe the situation for an agent choosing amongst the states corresponding to h ∧ ¬h ∧ r ∧ ¬r ∧ w as follows. The original calculation was:
eu(h ∧ ¬h ∧ r ∧ ¬r ∧ w) = (val(w_{13}) × 1/2) + (val(w_{14}) × 1/2)
With the modified way, val(w_{13}) and val(w_{14}) are replaced with ‘sub’ estimated utility calculations, involving the classical states they translate to. Here are the replacements:
val(w_{13}) = eu(w_{1} ∨ w_{3} ∨ w_{5} ∨ w_{7}) = 2/3
val(w_{14}) = eu(w_{1} ∨ w_{2} ∨ w_{3} ∨ w_{4} ∨ w_{5} ∨ w_{6} ∨ w_{7} ∨ w_{8}) = 1/2
So that the estimated utility is once again 0.583.
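The agreement between the two readings can be checked with a small script. The sketch below assumes that states are written as strings over the letters T, B and F in the order h, r, w; that encoding and the helper names are illustrative choices rather than part of the method.

```python
from fractions import Fraction
from itertools import product

N = 3  # number of atoms in the weather space


def val_lp(state):
    """val(w) = b/(2n) + t/n for a state such as 'BBT' (order: h, r, w)."""
    return Fraction(state.count("B"), 2 * N) + Fraction(state.count("T"), N)


def expand(state):
    """Classical states obtained by substituting T or F for every wildcard B."""
    options = [("T", "F") if v == "B" else (v,) for v in state]
    return ["".join(choice) for choice in product(*options)]


def eu(states, value):
    """Estimated utility with every state equally likely to be chosen."""
    return sum(value(w) for w in states) / len(states)


def val_classical(state):
    """Value of a paraconsistent state as a 'sub' estimated utility."""
    return eu(expand(state), val_lp)


w13, w14 = "BBT", "BBB"
print(expand(w13))                        # ['TTT', 'TFT', 'FTT', 'FFT']
print(eu([w13, w14], val_lp))             # 7/12
print(eu([w13, w14], val_classical))      # 7/12
```

On a classical state val_lp reduces to t/n, so the same function serves for both the paraconsistent and the classical calculation.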
Adjusting State Description Utility Values
Given this decision-theoretic interpretation of the Tichy/Oddie method, one possible extension is to adjust the way in which utilities are assigned to state descriptions. As we have seen, the standard method uses a simple, linear utility function; for all levels l, there is a constant difference between l and l + 1 / l − 1. As a simple example of this extension, take a basic quadratic utility function. Where x stands for the standard linear utility of a state description, its modified utility value y is calculated as follows: y = x^{2}.
What does a utility function such as this one say in terms of state description distribution and information yield? Roughly speaking, it favours high-valued state description efficiency: the more efficiently one statement gets to containing a high-valued state description disjunct the better, where more efficiently means doing so with fewer state description disjuncts.
In the ranking of all 255 possible propositions in the 3-atom logical space, from highest to lowest information yield, the original method using a standard linear utility and the method with this quadratic utility function agree on the first eight positions, after which differences start to appear. To illustrate the higher premium given to relatively small state collections involving w_{1}, whilst the linear way places {w_{1}, w_{8}} below {w_{1}, w_{2}, w_{8}}, the quadratic way places {w_{1}, w_{8}} above {w_{1}, w_{2}, w_{8}}.
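The reversal between the two state collections can be verified directly. The following is a minimal sketch, with w_{1} as the actual state; the names linear, quadratic, small and large are illustrative.

```python
from fractions import Fraction
from itertools import product

STATES = list(product([True, False], repeat=3))  # w1..w8, w1 is the actual state
w1, w2, w8 = STATES[0], STATES[1], STATES[7]


def linear(state):
    """Standard utility x = t/n."""
    return Fraction(sum(state), len(state))


def quadratic(state):
    """Modified utility y = x^2."""
    return linear(state) ** 2


def eu(states, utility):
    """Estimated utility with every state equally likely to be chosen."""
    return sum(utility(w) for w in states) / len(states)


small, large = [w1, w8], [w1, w2, w8]
print(eu(small, linear), eu(large, linear))        # 1/2 5/9
print(eu(small, quadratic), eu(large, quadratic))  # 1/2 13/27
```

Squaring leaves the values 0 and 1 unchanged but lowers the value of w_{2} from 2/3 to 4/9, which is enough to pull the larger collection below the smaller one.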
If the rate of increase of the utility function is sufficiently large or set in a certain way, it is even possible to get the covariation with logical strength amongst true statements condition to hold, although at the cost of a skew where a disproportionate number of states containing w_{1} are placed at the top of the rankings.
The utility and decision theory literature is rich and this decision theoretic analysis of the Tichy/Oddie method (and in general other similar methods) opens the door up to further experimentation and the application of decision theoretic resources to truthlikeness/information methods.
Another Method to Quantify Semantic Information
In this section a new, computationally elegant method to measure information along truthlikeness lines is proposed. As will be seen, this value aggregate approach differs from both the Tichy/Oddie and Niiniluoto approaches, lying somewhere in between.
As has been well established by now, each statement corresponds to a set of states W in a logical space. Each of the states in this logical space can be valued, with the actual state being the uniquely highest valued. The rough idea behind the value aggregate approach is a simple one, with two factors concerning state distribution in W involved in determining the information yield of a statement:
the greater the highest valued state of W the better for information yield.
given a collection of states with a highest value member w, the fewer other states of lesser value than w the better for information yield. Also, given other states of lesser value, the closer they are to w the better.
This determination is a two-tier process; first rank a statement according to the first factor and then apply the second. For example, the state description corresponding to the actual state (in our case h ∧ r ∧ w) ranks highest relative to the first factor. It then also ranks highest relative to the second factor because there are no other states of lesser value than w_{1} in its state collection. Accordingly, its information yield is maximal.
As another example, the statement h ∨ r, which corresponds to states w_{1} - w_{6}, yields more information than the statement h ∨ ¬r, which corresponds to states w_{1} - w_{4}, w_{7}, w_{8}, because although both have the same highest valued state and the same number of lesser valued states, the lesser valued states for h ∨ r are closer to the highest valued state.
Whilst this informal idea is still perhaps somewhat loose, it motivates and can be linked to the formal method that will now be detailed. To begin with, each state is once again assigned a value and ranked, where the value for a state w (val(w)) is determined using the following method:
Let n stand for the number of propositional variables in the logical space
Let t stand for the number of true atoms in a state w relative to the actual state
val(w) = t / (n × 2^{n})
So in the case of our 3-atom logical space with w_{1} being the actual state, each state is valued as follows:
val(w_{1}) = 3/24
val(w_{2}) = val(w_{3}) = val(w_{5}) = 2/24
val(w_{4}) = val(w_{6}) = val(w_{7}) = 1/24
val(w_{8}) = 0
Now, given a statement A, its information yield is measured using the following algorithm:
Determine the set of states W in which A holds (W = {w | w ⊨ A})
Place the members of W into an array X_{1} of length |W| and order the members of X_{1} from lowest to highest value
This process is represented with a function arraystates(), so that X_{1}= arraystates(W).
Let X_{2} stand for an empty array with 2^{n} spaces. Next, start by placing the first (lowest) element of X_{1} in the first position of X_{2}. In general, place the i^{th} element of X_{1} in the i^{th} position of X_{2}.
Unless the statement being measured is a tautology, length(X_{1}) < length(X_{2}). So once the last element of X_{1} has been reached, use this last element to fill in all the remaining places of X_{2}.
This process is represented with a function lineup(), so that X_{2} = lineup(X_{1}).
Finally, sum up all the values of each element of X_{2} to get the information measure
info(A) = sum(X_{2}) = sum(lineup(X_{1})) = sum(lineup(arraystates(W))).
Following is an example of this value aggregate method.
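The algorithm can be sketched in a few lines. The following is a minimal illustration for the 3-atom weather space with w_{1} as the actual state. The function names arraystates and lineup follow the text, while the representation of states as tuples is an assumption of the sketch. Contradictions, which hold in no state, are not handled.

```python
from fractions import Fraction
from itertools import product

N = 3
STATES = list(product([True, False], repeat=N))  # w1..w8, w1 is the actual state


def val(state):
    """val(w) = t / (n × 2^n), with t the number of true atoms in the state."""
    return Fraction(sum(state), N * 2 ** N)


def arraystates(states):
    """X1: the states of the statement ordered from lowest to highest value."""
    return sorted(states, key=val)


def lineup(x1):
    """X2: X1 extended to 2^n places by repeating its last (highest) element."""
    return x1 + [x1[-1]] * (2 ** N - len(x1))


def info(states):
    return sum(val(w) for w in lineup(arraystates(states)))


h_or_r = [s for s in STATES if s[0] or s[1]]          # holds in w1..w6
h_or_not_r = [s for s in STATES if s[0] or not s[1]]  # holds in w1..w4, w7, w8

print(info(h_or_r), info(h_or_not_r))  # 17/24 5/8
print(info(STATES[:4]), info(STATES[:2]), info(STATES[:1]))  # h, h ∧ r, h ∧ r ∧ w
```

Padding with the highest-valued state is what rewards statements for having few lower-valued states, since every state that is excluded is effectively replaced by a copy of the best one.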
With a ranking (highest to lowest) of all 255 possible statements (propositions, state collections), in the 3-atom space, the ordering of statements given by this method largely agrees with Niiniluoto's min-sum measure (with γ and λ as given in Section 4.5) for a significant portion of the first half of ordered statements. In fact it agrees for the first 107 positions, after which telling differences emerge. Also, it agrees more with the Tichy/Oddie approach towards the bottom of the rankings.
Table 10 contains some results using the value aggregate method. They seem reasonable and it is evident that their spread is closer to the Niiniluoto spread than the Tichy/Oddie. Also, unlike the Tichy/Oddie approach, where the differences between h, h ∧ r and h ∧ r ∧ w are constant, with the value aggregate approach the first of these differences is greater than the second. This variation is akin to the quadratic nature of Floridi's degrees of informativeness [17] (p. 210).
Adequacy Conditions
As mentioned earlier in Section 4.4, Niiniluoto states a number of adequacy conditions “which an explicate of the concept of truthlikeness should satisfy” [26] (p. 232). An investigation into the applicability of these conditions to an account of semantic information quantification (whether all or only some apply) will not be pursued here; apart from perhaps an occasional comment, I do not intend to discuss the matter. Suffice it to say, at the least most of them seem applicable.
Nonetheless, in order to initiate this new method I here include a list of these conditions plus a summary of how the value aggregate approach, along with the Tichy/Oddie and Niiniluoto approaches, fare against them. This summary confirms that the value aggregate approach is similar but not equivalent to either of the other two approaches.
The presentation of these conditions will largely conform to their original form, so bear in mind for our purposes that Tr() corresponds to info() and as Δ decreases/increases info() increases/decreases. Before listing the conditions, some terms need to be (re)established:
Niiniluoto uses the term constituent as a more general term for state descriptions.
A is used to refer to statements in the logical space in general. I_{A} is the set of numbers used to index the set of states corresponding to the statement A. For example, in our weather framework, the statement h corresponds to the states {w_{1}, w_{2}, w_{3}, w_{4}}, so letting A stand for h we have I_{A} = {1, 2, 3, 4}.
S_{i} is used to refer to the constituent (state description) that corresponds to state w_{i}.
S_{*} is reserved for the state description that corresponds to the actual state w_{*}, so in our case S_{*} = S_{1}.
B stands for the set of mutually exclusive and jointly exhaustive constituents. So in our case B = {w_{i} ∣ 1 ≤ i ≤ 8}. I is the set of numbers corresponding to B.
Δ_{ij} stands for Δ(w_{i}, w_{j}).
An element w_{j} of B is called a Δ-complement of w_{i} if Δ_{ij} = max Δ_{ik}, k ∈ I. If each w_{i} ∈ B has a unique Δ-complement w_{j} in B, such that Δ_{ij} = 1, the system (B, Δ) is said to be Δ-complemented. Our weather framework example is Δ-complemented.
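The notion of a Δ-complement can be checked mechanically for the weather framework. The following is a minimal sketch which assumes that Δ_{ij} is the share of atoms on which w_{i} and w_{j} differ; this distance function, and the helper names delta, delta_complements and is_delta_complemented, are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

N = 3
STATES = list(product([True, False], repeat=N))  # w1..w8 in the order of Table 4
INDICES = range(1, len(STATES) + 1)


def delta(i, j):
    """Assumed distance between w_i and w_j: the share of atoms on which they differ."""
    differing = sum(a != b for a, b in zip(STATES[i - 1], STATES[j - 1]))
    return Fraction(differing, N)


def delta_complements(i):
    """Indices j for which delta(i, j) is maximal over all states."""
    farthest = max(delta(i, k) for k in INDICES)
    return [j for j in INDICES if delta(i, j) == farthest]


def is_delta_complemented():
    """True if every state has exactly one complement, at distance 1."""
    return all(
        len(delta_complements(i)) == 1 and delta(i, delta_complements(i)[0]) == 1
        for i in INDICES
    )


print(delta_complements(1))     # [8]
print(is_delta_complemented())  # True
```

Under this assumed distance, the unique Δ-complement of the actual state w_{1} is w_{8}, the state in which every atom is false.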
Here are the conditions:
(M1) (Range) 0 ≤ Tr(A, S_{*}) ≤ 1.
(M2) (Target) Tr(A, S_{*}) = 1 iff A = S_{*}.
(M3) (Non-triviality) All true statements do not have the same degree of truthlikeness, all false statements do not have the same degree of truthlikeness.
(M4) (Truth and logical strength) Among true statements, truthlikeness covaries with logical strength.
(a) If A and B are true statements and A ⊢ B, then Tr(B, S_{*}) ≤ Tr(A, S_{*}).
(b) If A and B are true statements and A ⊢ B and B ⊬ A, then Tr(B, S_{*}) < Tr(A, S_{*}).
(M5) (Falsity and logical strength) Among false statements, truthlikeness does not covary with logical strength; there are false statements A and B such that A ⊢ B but Tr(A, S_{*}) < Tr(B, S_{*}).
(M6) (Similarity) Tr(S_{i}, S_{*}) = Tr(S_{j}, S_{*}) iff Δ_{*i} = Δ_{*j} for all S_{i}, S_{j}.
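(M7) (Truth content) If A is a false statement, then Tr(S_{*} ∨ A, S_{*}) > Tr(A, S_{*}).
(M8) (Closeness to the truth) Assume j ∉ I_{A}. Then Tr(A ∨ S_{j}, S_{*}) > Tr(A, S_{*}) iff Δ_{*j} < Δ_{min}(A, S_{*}).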
(M9) (Distance from the truth) Let Δ_{*j} < Δ_{*i}. Then Tr(S_{j} ∨ S_{i}, S_{*}) decreases when Δ_{*i} increases.
(M10) (Falsity may be better than truth) Some false statements may be more truthlike than some true statements.
(M11) (Thin better than fat) If Δ_{*i} = Δ_{*j} > 0, i ≠ j, then Tr(S_{i} ∨ S_{j}, S_{*}) < Tr(S_{i}, S_{*}).
(M12) (Ovate better than obovate) If Δ_{*j} < Δ_{*i} < Δ_{*k}, then Tr(S_{j} ∨ S_{i} ∨ S_{k}, S_{*}) increases when Δ_{*i} decreases.
(M13) (Δ-complement) Tr(A, S_{*}) is minimal, if A consists of the Δ–complements of S_{*}.
Table 11 gives a summary of the measures against the adequacy conditions, where +(γ) means that the measure satisfies the given condition with some restriction on the values of γ and λ [29].
The value aggregate method will satisfy those seeking a covariation of information and logical strength. As can be seen, it fails only M11. For example, Δ_{*2} = Δ_{*3} = Δ_{*5}, yet info(w_{2} ∨ w_{3} ∨ w_{5}) = info(w_{2} ∨ w_{3}) = info(w_{2}). The value aggregate method, like the Tichy/Oddie method, does not differentiate sets of states such as these. In terms of information, it could be argued that satisfaction of this condition is not essential; removing or adding a state description disjunct in this example does not inform an agent any more or less of the true state.
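Claims of this kind can be checked by brute force over the 255 non-contradictory statements of the 3-atom space. The sketch below is one possible way of doing so; it represents a statement by the set of 1-based indices of the states that satisfy it, takes w_{1} as the actual state, and treats two false states as equidistant from the truth when they have equal values. The function names are introduced here for illustration.

```python
from fractions import Fraction
from itertools import combinations, product

N = 3
STATES = list(product([True, False], repeat=N))  # index 1 is w1, the actual state
# Value of a state: number of true atoms (agreement with w1) over n * 2^n.
VALUES = [Fraction(sum(s), N * 2 ** N) for s in STATES]


def info(indices):
    """Value aggregate measure of a non-empty set of 1-based state indices."""
    x1 = sorted(VALUES[i - 1] for i in indices)
    x2 = x1 + [x1[-1]] * (2 ** N - len(x1))
    return sum(x2)


def all_statements():
    """Every non-empty set of states, i.e., every non-contradictory statement."""
    idx = range(1, 2 ** N + 1)
    return [frozenset(c) for r in idx for c in combinations(idx, r)]


def m4b_holds():
    """M4(b): among true statements, a strictly stronger one scores strictly higher."""
    true_statements = [s for s in all_statements() if 1 in s]
    return all(
        info(b) < info(a)
        for a in true_statements
        for b in true_statements
        if a < b  # proper subset: A entails B but not conversely
    )


def m11_counterexamples():
    """Pairs of distinct, equally valued false states where M11's strict inequality fails."""
    return [
        (i, j)
        for i, j in combinations(range(2, 2 ** N + 1), 2)
        if VALUES[i - 1] == VALUES[j - 1] and not info({i, j}) < info({i})
    ]


print(len(all_statements()))    # 255
print(m4b_holds())              # True
print(m11_counterexamples())    # [(2, 3), (2, 5), (3, 5), (4, 6), (4, 7), (6, 7)]
```

Every pair of equally valued false states shows up as a counterexample to M11, which is the behaviour described above for w_{2}, w_{3} and w_{5}.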
Misinformation and Adjusting Utilities
Using the value aggregate method, a metric for misinformation measurements can simply be obtained by inverting the utility values for each state; where u is the original utility of a state, its new utility becomes 1 − u. As with the Tichy/Oddie method, this is equivalent to misinfo(A) = 1 − info(A). Unlike the Tichy/Oddie method though, where there is a perfect symmetry between truth and falsity, the accumulation of values with the aggregate approach results in a skew towards the information measure. As a consequence there are significant differences between the two in terms of misinformation value calculations. For example, whilst the Tichy/Oddie approach gives misinfo(w_{1} ∨ w_{8}) = misinfo(w_{4} ∨ w_{5}) = 0.5, this new approach gives misinfo(w_{1} ∨ w_{8}) = 0.125 < misinfo(w_{4} ∨ w_{5}) = 0.375.
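The contrast between the two misinformation measures can be reproduced with a short sketch. It computes misinfo(A) = 1 − info(A) for both methods, with the Tichy/Oddie measure taken as the average utility of the states satisfying a statement. The function names and the representation of statements as sets of state indices are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

N = 3
STATES = list(product([True, False], repeat=N))  # w1..w8 in the order of Table 4
# Utility of a state: the share of atoms it gets right relative to w1.
UTILITY = [Fraction(sum(s), N) for s in STATES]


def info_tichy_oddie(indices):
    """Average utility of the states (1-based indices) satisfying the statement."""
    return sum(UTILITY[i - 1] for i in indices) / len(indices)


def info_value_aggregate(indices):
    """Sum of the lined-up state values, each value being utility / 2^n."""
    x1 = sorted(UTILITY[i - 1] / 2 ** N for i in indices)
    x2 = x1 + [x1[-1]] * (2 ** N - len(x1))
    return sum(x2)


def misinfo(info_measure, indices):
    """Misinformation as the complement of the chosen information measure."""
    return 1 - info_measure(indices)


for states in ({1, 8}, {4, 5}):
    print(sorted(states),
          misinfo(info_tichy_oddie, states),
          misinfo(info_value_aggregate, states))
# [1, 8] 1/2 1/8
# [4, 5] 1/2 3/8
```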
Also, the value aggregate method can be extended to incorporate different utility functions for different purposes, in the same way as discussed in Section 4.7. As described earlier in Section 5, with the value aggregate approach the last element in the array X_{1} is the highest valued and is used to fill in the remaining positions of X_{2}. In the case of w_{1} in particular, this means that a collection of states consisting of w_{1} and relatively few other states will jump ahead quite quickly given the continued addition of the highest valued element.
The adoption of a non-linear utility function could be used to regulate this. For example, a simple logarithmic utility function such as y = 20log_{2}(x + 1) (x being the value of the standard utility) places {w_{1}, w_{8}} below {w_{1}, w_{3}, w_{5}, w_{7}}, whereas with the original standard linear utility this order is reversed. Note also that, as long as the actual state is assigned the highest utility, the information measure will covary with logical strength for true statements, whatever utility function is used.
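The effect of swapping the utility function can be illustrated with a rough sketch. It assumes that x is the number of atoms a state gets right (0 to 3); this reading of 'standard utility' is an assumption, as are the function names. The sums are left unnormalised, since only the relative order of the two collections of states is of interest.

```python
import math
from itertools import product

N = 3
STATES = list(product([True, False], repeat=N))  # w1..w8 in the order of Table 4


def aggregate(indices, utility):
    """Unnormalised value aggregate sum under a given utility function.

    The utility is applied to x, the number of atoms the state gets right
    (an assumed reading of the 'standard utility').
    """
    x1 = sorted(utility(sum(STATES[i - 1])) for i in indices)
    x2 = x1 + [x1[-1]] * (2 ** N - len(x1))
    return sum(x2)


def linear(x):
    return x


def logarithmic(x):
    return 20 * math.log2(x + 1)


few = {1, 8}          # w1 together with the completely false state
many = {1, 3, 5, 7}   # w1 together with three partly true states

print(aggregate(few, linear) > aggregate(many, linear))            # True
print(aggregate(few, logarithmic) > aggregate(many, logarithmic))  # False
```

Under the linear utility the first collection ranks higher, whereas under the logarithmic utility the order is reversed.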
Combining CSI and Truthlikeness Approaches
I would like to now briefly discuss the possibilities of (1) using a CSI-style inverse probabilistic approach as a basis for a quantitative account of semantic information and (2) combining the CSI and truthlikeness approaches.
With regards to (1), there seem to be inherent issues with the CSI inverse probabilistic approach. Thinking about the two approaches to semantic information quantification we have looked at (CSI and truthlikeness), Table 12 depicts a rough conceptual principle that can be extracted. The four possible ways of combining the two factors under consideration are represented using a standard 2×2 matrix, with each possible way being assigned a unique number between 1 and 4. 1 represents ‘best’ for information yield and 4 represents ‘worst’.
Therefore, narrowing down the range of possibilities is a good thing for information yield when done truthfully. On the other hand, if done falsely it is a bad thing. Borrowing an idea from decision theory, this type of reward/punishment system can be formally captured with a simple scoring rule. Take the following, where for a statement A:
if A is true then give it an information measure of cont(A)
if A is false then give it an information measure of –cont(A)
Whilst this approach would give acceptable results for true statements, when it comes to false statements it is too coarse and does not make the required distinctions between the different classes of false statements. A state description in which every atom is false is rightly assigned the lowest measure. Furthermore, other false statements which are true in the state corresponding to that false state description will be assigned acceptable measures.
But this approach also has it that any false state description will be accorded the lowest measure. So the state description h ∧ r ∧ ¬w (w_{2}) would be assigned the same measure as ¬h ∧ ¬r ∧ ¬w (w_{8}), which is clearly inappropriate. Furthermore, equal magnitude supersets of the states correlating to these state descriptions (e.g., w_{2}, w_{4} and w_{4}, w_{8}) would also be assigned equal measures.
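The coarseness of this scoring rule is easy to exhibit. The following sketch takes cont(A) = 1 − pr(A) and assumes that each of the eight states is equally probable; that uniform measure, the function names and the representation of statements as sets of state indices are assumptions made for illustration.

```python
from fractions import Fraction

NUM_STATES = 8  # states w1..w8 of the 3-atom space; w1 is the actual state


def cont(indices):
    """cont(A) = 1 - pr(A), with every state assumed equally probable."""
    return 1 - Fraction(len(indices), NUM_STATES)


def score(indices):
    """Scoring rule: +cont(A) if A is true (w1 satisfies it), otherwise -cont(A)."""
    return cont(indices) if 1 in indices else -cont(indices)


print(score({1}), score({1, 2}))     # 7/8 3/4
print(score({2}), score({8}))        # -7/8 -7/8
print(score({2, 4}), score({4, 8}))  # -3/4 -3/4
```

The state descriptions for w_{2} and w_{8} receive the same score, as do their equally sized supersets, even though w_{2} is much closer to the actual state than w_{8}.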
Given the aim to factor in considerations of truth value, the problem with any account of information quantification based on a CSI-style inverse probabilistic approach is related to the failure of Popper's content approach to truthlikeness. As Niiniluoto nicely puts it:
among false propositions, increase or decrease of logical strength is neither a sufficient nor necessary condition for increase of truthlikeness. Therefore, any attempt to define truthlikeness merely in terms of truth value and logical deduction fails. More generally, the same holds for definitions in terms of logical probability and information content [30] (p. 296).
With regards to (2), one option is to adopt a hybrid system. As was seen, the CSI approach works well when its application is confined solely to true statements. So perhaps a system which applies the CSI approach to true statements and a truthlikeness approach to false statements could be used. Furthermore, apart from such hybrid systems, there is the possibility of combining calculations from both approaches into one metric. For example, an incorporation of the CSI approach into the value aggregate approach could help distinguish between, say w_{2} and w_{2}, w_{3}, which as we saw are given the same measure using the value aggregate approach.
Formula-Based Approaches
Classes of update semantics for logical databases can be divided into formula-based and model-based approaches. Basically, with model-based approaches the semantics of an update for a database are based on the models of that database and with formula-based approaches statements in the database are operated on instead [31] (p. 12).
Borrowing this classification division for our investigation, it can be seen that the approaches discussed so far would fall under a model-based approach to semantic information quantification. Beyond suggesting the possibility of a formula-based approach, I will also outline a method that can be seen to fall on this side of the divide. This experimental outline is more for the sake of illustration than anything else, though it could be useful.
To begin with, statements are first converted to conjunctive normal form. Although this means that logically equivalent statements are treated the same, they are still being dealt with directly and there is no analysis whatsoever of the models that they correspond to.
A logical statement is in conjunctive normal form (CNF) when it consists of a conjunction of disjunctive clauses, with each disjunct in each clause being a literal (either an atom or a negated atom). A ∧ B ∧ ¬C and (A ∨ ¬B) ∧ (A ∨ C) are two examples of statements in CNF. Also, here the normalised statements are fully converted to CNF, meaning that all redundancies are eliminated. Here is the procedure:
use equivalences to remove ↔ and →.
use De Morgan's laws to push negation signs immediately before atomic statements.
eliminate double negations.
use the distributive law A ∨ (B ∧ C) ≡ (A ∨ B) ∧ (A ∨ C) to effect the conversion to CNF.
if a conjunct contains both A and ¬A (i.e., A ∨ ¬A), remove it.
use the absorption law A ∧ (A ∨ B) ≡ A so that for any literal that occurs as a sole disjunct (i.e., occurs in a disjunctive clause of which it is the only member), remove any other disjunctive clauses that contain that literal.
use rule of unsatisfiability A ∨ ⊥ ≡ A so that for any literal A that occurs as a sole disjunct (i.e., occurs in a disjunctive clause of which it is the only member), remove its negation from any disjunction it occurs in. So A ∧ (B ∨ ¬A) becomes A ∧ B.
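The last three steps of the procedure, which remove redundancies from a statement that is already in CNF, can be sketched as follows. This is only a partial illustration: it does not perform the conversion to CNF itself. A CNF is represented as a collection of clauses, each clause being a collection of literals written as strings such as 'A' or '~A'; this representation and the names negate and simplify are assumptions made for illustration.

```python
def negate(literal):
    """Return the complementary literal: 'A' <-> '~A'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def simplify(clauses):
    """Remove redundancies from a CNF given as an iterable of clauses."""
    current = {frozenset(clause) for clause in clauses}
    # Remove any clause containing both a literal and its negation.
    current = {c for c in current if not any(negate(lit) in c for lit in c)}
    while True:
        # Literals that occur as the sole member of a clause.
        units = {lit for c in current if len(c) == 1 for lit in c}
        reduced = set()
        for c in current:
            if len(c) == 1:
                reduced.add(c)
            elif c & units:
                continue  # absorption: the clause contains a sole-disjunct literal
            else:
                # Drop the negations of sole-disjunct literals from the clause.
                reduced.add(frozenset(lit for lit in c if negate(lit) not in units))
        if reduced == current:
            return current
        current = reduced


# A ∧ (B ∨ ¬A) becomes A ∧ B
print(simplify([["A"], ["B", "~A"]]) == {frozenset({"A"}), frozenset({"B"})})  # True
```

The loop repeats because removing a negated literal can create a new single-literal clause, which may in turn license further simplifications.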
Given a normalised statement, how can its information yield be measured? Let us start with the simplest type of statement in CNF, one consisting of a conjunction of literals. Each literal has a value of magnitude 1. At this stage there are two ways to go about this. The first method (Method 1) involves just positively incrementing the total information yield of the statement as each true literal is encountered. The second method (Method 2) involves also decrementing the total information yield of the statement as each false literal is encountered.
Take the statement h ∧ r ∧ ¬w. It contains 2 true literals and 1 false literal. Going by Method 1, the total information yield of this statement is 2. Going by Method 2, where the misinformation instances of a statement count against its total information yield, the total information yield of this statement is 1.
This is all very simple and gives the right results. h ∧ r ∧ w has a maximal information yield of 3. h ∧ r ∧ ¬w and h ∧ r have the same information yield of 2 going by Method 1, but going by Method 2 the former is ‘punished’ for its assertion of misinformation and goes down to 1.
The introduction of proper disjunctions makes things more complicated. How can the calculation method reflect the fact that h ∨ r yields less information than other statements such as h and h ∧ r? The more disjuncts a disjunction contains, the less information it yields; information yield is inversely related to the number of disjuncts. Hence, where n is the number of disjuncts in a disjunction, a simple yet potentially suitable measure of the maximum information yield of the disjunction is 1/n. On this approach, each literal in the disjunction has a value of magnitude 1/n^{2}.
To illustrate all of this, take the statement h ∨ r ∨ w. The total information yield of this statement is 3 × 1/9 = 1/3. Given a disjunction with false literals, the results depend on whether Method 1 or Method 2 is adopted. For the statement h ∨ r ∨ ¬w, Method 1 gives a result of 1/9 + 1/9 = 2/9 and Method 2 gives a result of 1/9 + 1/9 − 1/9 = 1/9.
Table 13 lists results using Method 1 and results using Method 2. Clearly these methods are not as refined as the model-based approaches we have looked at, with the variation in assignments of measure values being significantly cruder. Method 1 is clearly just about aggregating truth. Interestingly, the results for Method 2 can be ranked in the same order as the results for the Tichy/Oddie approach in Table 5.
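Both methods are simple enough to be sketched directly. The following assumes that a statement has already been normalised to CNF and is given as a list of clauses, each clause being a list of literals written as 'h' or '~h'; this representation and the function names are introduced here for illustration. Each literal in a clause of n disjuncts is weighted 1/n^{2}, so a literal standing alone has weight 1.

```python
from fractions import Fraction

ACTUAL = {"h": True, "r": True, "w": True}  # the actual state w1


def is_true(literal):
    """A literal such as 'h' or '~w' is true if it matches the actual state."""
    negated = literal.startswith("~")
    return ACTUAL[literal.lstrip("~")] != negated


def info_yield(cnf, method):
    """Formula-based information yield of a CNF statement.

    Method 1 adds the weight of each true literal.
    Method 2 additionally subtracts the weight of each false literal.
    """
    total = Fraction(0)
    for clause in cnf:
        weight = Fraction(1, len(clause) ** 2)
        for literal in clause:
            if is_true(literal):
                total += weight
            elif method == 2:
                total -= weight
    return total


statement = [["h", "r", "~w"]]   # h ∨ r ∨ ¬w
print(info_yield(statement, 1))  # 2/9
print(info_yield(statement, 2))  # 1/9
```

The two printed values correspond to the entries 0.22 and 0.11 for h ∨ r ∨ ¬w in Table 13.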
Estimated Information
In closing I shall briefly introduce the idea of estimated information. Within the literature a distinction is made between the semantic and epistemic problems of truthlikeness [32] (p. 121):
The semantical problem: “What do we mean if we claim that the theory X is closer to the truth than the theory Y?”
The epistemic problem: “On what evidence are we to believe that the theory X is closer to the truth than the theory Y?”
The focus of this paper has been the semantical problem of information quantification; what do we mean when we say that statement A yields more information than statement B? The epistemic problem relates to estimating information yield; given some partial evidence E, are we to estimate that statement A yields more information or less information than statement B?
Of course, an account of the semantical problem is primary, with any account of the epistemic problem being secondary, based on a semantical foundation. Nonetheless a method to estimate information yield is of the highest importance.
In practice judgements of information yield are going to be made with limited evidence, without knowledge of the complete truth (the one true state description). If an agent already knows the complete truth in a domain of inquiry then there would be no new information for them to acquire anyway, no statement in the domain of inquiry could be informative for them. In such scenarios, the actual information yield of any statement is already known, or can be calculated.
In general though, calculations of information yield are of interest to agents who do not know the complete truth and who are seeking new information. When such agents must choose amongst a set of different statements, their aim is to choose the statement that they estimate will yield the most information relative to the actual state (which the agent has limited epistemic access to). In making this estimation and choice the agent will often already possess some evidence and this evidence will rule out certain possible states from consideration in the calculations.
The standard formula for estimated utility in decision theory can be used to calculate the expected information yield of a statement A given prior evidence E:
info_{est}(A ∣ E) = ∑_{i=1}^{n} info(A, S_{i}) × pr(S_{i} ∣ E)
Here n stands for the number of possible states in the logical space and S_{i} stands for the state description corresponding to state i.
Given a certain piece of information as evidence, the difference between actual and estimated information yield calculations for a statement can be marked. For example, with info() as the Tichy/Oddie method:
info(h ∧ r ∧ w) = 1
info_{est}(h ∧ r ∧ w∣h ∨ ¬r ∨ ¬w) = 0.48
info(¬h ∧ ¬r ∧ ¬w) = 0
info_{est}(¬h ∧ ¬r ∧ ¬w∣h ∨ ¬r ∨ ¬w) = 0.52
Interestingly, although the completely true h ∧ r ∧ w has a maximum information yield of 1 and the completely false ¬h ∧ ¬r ∧ ¬w has a minimum information yield of 0, even certain true statements used as evidence in estimation calculations will give ¬h ∧ ¬r ∧ ¬w a higher estimated information measure than h ∧ r ∧ w.
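The estimated values above can be recovered with a small sketch of the calculation. It assumes that the states not ruled out by the evidence are equally probable, and it takes info(A, S_{i}) to be the Tichy/Oddie measure of A computed as if S_{i} were the actual state. The uniform probability assignment and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

N = 3
STATES = list(product([True, False], repeat=N))  # truth values for (h, r, w)


def info(statement_states, actual):
    """Tichy/Oddie measure of a statement, relative to a candidate actual state.

    Each state satisfying the statement scores the share of atoms on which it
    agrees with `actual`; the measure is the average of these scores.
    """
    scores = [
        Fraction(sum(a == b for a, b in zip(s, actual)), N)
        for s in statement_states
    ]
    return sum(scores) / len(scores)


def info_est(statement, evidence):
    """Expected information yield of a statement given evidence E."""
    admitted = [s for s in STATES if evidence(*s)]
    statement_states = [s for s in STATES if statement(*s)]
    pr = Fraction(1, len(admitted))  # assumed uniform over the admitted states
    return sum(info(statement_states, s) * pr for s in admitted)


def evidence(h, r, w):
    return h or not r or not w  # h ∨ ¬r ∨ ¬w


print(round(float(info_est(lambda h, r, w: h and r and w, evidence)), 2))  # 0.48
print(round(float(info_est(
    lambda h, r, w: not h and not r and not w, evidence)), 2))             # 0.52
```

The evidence rules out only the state in which h is false and r and w are true, which is enough to tip the estimate in favour of the completely false statement.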
Conclusions
The main point of this paper has been the advocacy of quantitative accounts of semantic information based on the notion of truthlikeness. Whilst this is in contrast to traditional inverse probabilistic approaches such as Bar-Hillel and Carnap's CSI, it is not to be seen as an attempt to completely dismiss their legitimacy. Such accounts prove useful in certain applications and effectively capture certain aspects of information, particularly the deep intuition that information is proportional to uncertainty reduction [33].
However, there are certain features of an approach such as CSI which are at odds with both ordinary senses of information and criteria associated with certain applications of information. A truthlikeness approach on the other hand strongly resonates with these ordinary senses and satisfies these criteria. Furthermore, such an approach naturally accommodates the corresponding notion of misinformation. As was seen, fortunately there is already a significant body of work on truthlikeness to draw from. Yet, as evidenced in this paper, there is still room for further investigation, experimentation and the development of new quantification methods, with a specific focus on information.
Apart from any suggested or implied above, two possibilities for further research are:
The extension of information as truthlikeness approaches to richer systems beyond classical propositional logic. The methods in this paper that were drawn from the truthlikeness literature were actually extended to first order classical systems and even to systems of second order logic [21,26]. Also what, if any, are the possibilities for non-classical systems?
New semantic content (information and misinformation) is acquired by agents, who usually already possess some content. This makes things a bit more interesting. For example, if an agent already possesses the information h ∧ r, then the information w is going to be more informative to the agent than the information h ∧ r, even though the latter yields more information. Or if an agent holds the content A and accepts the content ¬A, they will need to revise their content base in order to maintain consistency. Often there will be more than one way to carry out such a revision, so one question to ask is which way will result in the revised content base with the greatest information as truthlikeness measure? Belief revision [34], the process of changing beliefs to take into account the acquisition of new content, comes into play here. This is already a very rich research area, and there remains potential work to be done on drawing connections between and integrating the fields of belief revision and truthlikeness/information.
Tables
Table 1. Classes of inaccuracy.

| Number of erroneous atomic messages in s | Class of inaccuracy | Cardinality of Inac_{i} | Degree of inaccuracy | Degree of informativeness |
|---|---|---|---|---|
| 1 | Inac_{1} | 6 | −1/6 | ≈ 0.972 |
| 2 | Inac_{2} | 15 | −1/3 | ≈ 0.888 |
| 3 | Inac_{3} | 20 | −1/2 | 0.75 |
| 4 | Inac_{4} | 14 | −2/3 | ≈ 0.555 |
| 5 | Inac_{5} | 7 | −5/6 | ≈ 0.305 |
| 6 | Inac_{6} | 1 | −1 | 0 |
Table 2. Classes of vacuity.

| Number of compatible situations including w | Class of vacuity | Cardinality of Vac_{i} | Degree of vacuity | Degree of informativeness |
|---|---|---|---|---|
| 63 | Vac_{1} | 63 | 63/64 | 0.031 |
| 31 | Vac_{2} | 31 | 31/64 | 0.765 |
| 15 | Vac_{3} | 15 | 15/64 | 0.945 |
| 7 | Vac_{4} | 7 | 7/64 | 0.988 |
| 3 | Vac_{5} | 3 | 3/64 | 0.998 |
Table 3. Classes of inaccuracy and vacuity examples.

| Class | Statement |
|---|---|
| Inac_{1} | Ga ∧ Ha ∧ Gb ∧ Hb ∧ Gc ∧ ¬Hc |
| Inac_{2} | Ga ∧ Ha ∧ Gb ∧ Hb ∧ ¬Gc ∧ ¬Hc |
| Inac_{3} | Ga ∧ Ha ∧ Gb ∧ ¬Hb ∧ ¬Gc ∧ ¬Hc |
| Inac_{4} | Ga ∧ Ha ∧ ¬Gb ∧ ¬Hb ∧ ¬Gc ∧ ¬Hc |
| Inac_{5} | Ga ∧ ¬Ha ∧ ¬Gb ∧ ¬Hb ∧ ¬Gc ∧ ¬Hc |
| Inac_{6} | ¬Ga ∧ ¬Ha ∧ ¬Gb ∧ ¬Hb ∧ ¬Gc ∧ ¬Hc |
| Vac_{1} | Ga ∨ Gb ∨ Gc ∨ Ha ∨ Hb ∨ Hc |
| Vac_{2} | (Ga ∨ Gb ∨ Gc ∨ Ha ∨ Hb) ∧ Hc |
| Vac_{3} | (Ga ∨ Gb ∨ Gc ∨ Ha) ∧ Hb ∧ Hc |
| Vac_{4} | (Ga ∨ Gb ∨ Gc) ∧ Ha ∧ Hb ∧ Hc |
| Vac_{5} | (Ga ∨ Gb) ∧ Gc ∧ Ha ∧ Hb ∧ Hc |
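Table 4. Truth Table for 3-Proposition Logical Space.

| State | h | r | w |
|---|---|---|---|
| w_{1} | T | T | T |
| w_{2} | T | T | F |
| w_{3} | T | F | T |
| w_{4} | T | F | F |
| w_{5} | F | T | T |
| w_{6} | F | T | F |
| w_{7} | F | F | T |
| w_{8} | F | F | F |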
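Table 5. Information yield results using Tichy/Oddie metric.

| # | Statement (A) | T/F | info(A) |
|---|---|---|---|
| 1 | h ∧ r ∧ w | T | 1 |
| 2 | h ∧ r | T | 0.83 |
| 3 | h ∧ (r ∨ w) | T | 0.78 |
| 4 | h ∧ (¬r ∨ w) | T | 0.67 |
| 5 | (h ∧ r) ∨ w | T | 0.67 |
| 6 | h | T | 0.67 |
| 7 | h ∧ r ∧ ¬w | F | 0.67 |
| 8 | h ∨ r | T | 0.61 |
| 9 | (h ∧ ¬r) ∨ w | T | 0.6 |
| 10 | h ∨ r ∨ w | T | 0.57 |
| 11 | h ∨ r ∨ ¬w | T | 0.52 |
| 12 | h ∨ ¬r | T | 0.5 |
| 13 | h ∨ ¬h | T | 0.5 |
| 14 | h ∧ ¬r | F | 0.5 |
| 15 | h ∨ ¬r ∨ ¬w | T | 0.48 |
| 16 | ¬h ∨ ¬r ∨ ¬w | F | 0.43 |
| 17 | h ∧ ¬r ∧ ¬w | F | 0.33 |
| 18 | (h ∨ ¬w) ∧ ¬r | F | 0.33 |
| 19 | ¬h | F | 0.33 |
| 20 | ¬h ∧ ¬r | F | 0.17 |
| 21 | ¬h ∧ ¬r ∧ ¬w | F | 0 |
| 22 | h ∧ ¬h | F | N/A |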
Table 6. Results with adjusted weights.

| # | Statement (A) | info(A) |
|---|---|---|
| 1 | h ∧ r ∧ ¬w | 0.5 |
| 2 | h ∧ ¬r ∧ w | 0.67 |
| 3 | ¬h ∧ r ∧ w | 0.83 |
| 4 | h ∧ ¬r ∧ ¬w | 0.167 |
| 5 | ¬h ∧ ¬r ∧ w | 0.5 |
| 6 | ¬h ∧ r ∧ ¬w | 0.33 |
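Table 7. LP Truth Table for 3-Proposition Logical Space.

| State | h | r | w | State | h | r | w | State | h | r | w |
|---|---|---|---|---|---|---|---|---|---|---|---|
| w_{1} | T | T | T | w_{10} | B | T | T | w_{19} | F | T | T |
| w_{2} | T | T | B | w_{11} | B | T | B | w_{20} | F | T | B |
| w_{3} | T | T | F | w_{12} | B | T | F | w_{21} | F | T | F |
| w_{4} | T | B | T | w_{13} | B | B | T | w_{22} | F | B | T |
| w_{5} | T | B | B | w_{14} | B | B | B | w_{23} | F | B | B |
| w_{6} | T | B | F | w_{15} | B | B | F | w_{24} | F | B | F |
| w_{7} | T | F | T | w_{16} | B | F | T | w_{25} | F | F | T |
| w_{8} | T | F | B | w_{17} | B | F | B | w_{26} | F | F | B |
| w_{9} | T | F | F | w_{18} | B | F | F | w_{27} | F | F | F |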
Table 8. Results for contradictions.

| # | Statement (A) | info(A) |
|---|---|---|
| 1 | h ∧ ¬h ∧ r ∧ w | 0.67 |
| 2 | h ∧ ¬h ∧ r ∧ ¬r ∧ w | 0.583 |
| 3 | ((h ∧ ¬h) ∨ (r ∧ ¬r)) ∧ w | 0.583 |
| 4 | (h ∧ ¬h) ∨ (r ∧ ¬r ∧ w) | 0.5256 |
| 5 | h ∧ ¬h | 0.5 |
| 6 | (h ∧ ¬h) ∨ (r ∧ ¬r) | 0.5 |
| 7 | (h ∧ ¬h) ∨ (r ∧ ¬r) ∨ (w ∧ ¬w) | 0.5 |
| 8 | (h ∧ ¬h) ∧ (r ∧ ¬r) | 0.5 |
| 9 | (h ∧ ¬h) ∧ (r ∧ ¬r) ∧ (w ∧ ¬w) | 0.5 |
| 10 | (h ∧ ¬h ∧ ¬r) ∨ (w ∧ ¬w) | 0.474 |
| 11 | h ∧ ¬h ∧ ¬r | 0.417 |
| 12 | (h ∧ ¬h ∧ ¬r) ∧ (w ∧ ¬w) | 0.417 |
| 13 | h ∧ ¬h ∧ ¬r ∧ ¬w | 0.33 |
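Table 9. Information yield results using Niiniluoto's min-sum measure.

| # | Statement (A) | T/F | info(A) |
|---|---|---|---|
| 1 | h ∧ r ∧ w | T | 1 |
| 2 | h ∧ r | T | 0.96 |
| 3 | h ∧ (r ∨ w) | T | 0.93 |
| 4 | h ∧ (¬r ∨ w) | T | 0.89 |
| 5 | h | T | 0.85 |
| 6 | (h ∧ r) ∨ w | T | 0.81 |
| 7 | (h ∧ ¬r) ∨ w | T | 0.78 |
| 8 | h ∨ r | T | 0.74 |
| 9 | h ∧ r ∧ ¬w | F | 0.67 |
| 10 | h ∨ r ∨ w | T | 0.67 |
| 11 | h ∨ ¬r | T | 0.67 |
| 12 | h ∨ r ∨ ¬w | T | 0.63 |
| 13 | h ∨ ¬r ∨ ¬w | T | 0.6 |
| 14 | h ∧ ¬r | F | 0.59 |
| 15 | h ∨ ¬h | T | 0.56 |
| 16 | (h ∨ ¬w) ∧ ¬r | F | 0.48 |
| 17 | ¬h | F | 0.41 |
| 18 | h ∧ ¬r ∧ ¬w | F | 0.33 |
| 19 | ¬h ∨ ¬r ∨ ¬w | F | 0.26 |
| 20 | ¬h ∧ ¬r | F | 0.22 |
| 21 | ¬h ∧ ¬r ∧ ¬w | F | 0 |
| 22 | h ∧ ¬h | F | N/A |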
Table 10. Information yield results using the value aggregate method.

| # | Statement (A) | T/F | info(A) |
|---|---|---|---|
| 1 | h ∧ r ∧ w | T | 1 |
| 2 | h ∧ r | T | 0.9583 |
| 3 | h ∧ (r ∨ w) | T | 0.9167 |
| 4 | h ∧ (¬r ∨ w) | T | 0.875 |
| 5 | h | T | 0.8333 |
| 6 | (h ∧ r) ∨ w | T | 0.7917 |
| 7 | (h ∧ ¬r) ∨ w | T | 0.75 |
| 8 | h ∨ r | T | 0.7083 |
| 9 | h ∧ r ∧ ¬w | F | 0.6667 |
| 10 | h ∨ ¬r | T | 0.625 |
| 11 | h ∨ r ∨ w | T | 0.625 |
| 12 | h ∧ ¬r | F | 0.625 |
| 13 | h ∨ r ∨ ¬w | T | 0.5833 |
| 14 | h ∨ ¬r ∨ ¬w | T | 0.5417 |
| 15 | (h ∨ ¬w) ∧ ¬r | F | 0.5417 |
| 16 | ¬h | F | 0.5 |
| 17 | h ∨ ¬h | T | 0.5 |
| 18 | ¬h ∨ ¬r ∨ ¬w | F | 0.4583 |
| 19 | h ∧ ¬r ∧ ¬w | F | 0.3333 |
| 20 | ¬h ∧ ¬r | F | 0.2917 |
| 21 | ¬h ∧ ¬r ∧ ¬w | F | 0 |
| 22 | h ∧ ¬h | F | N/A |
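Table 11. Measures against adequacy conditions.

| Condition | Niiniluoto (ms) | Tichy/Oddie (av) | Value Aggregate |
|---|---|---|---|
| M1 | + | + | + |
| M2 | + | + | + |
| M3 | + | + | + |
| M4a | + | − | + |
| M4b | + | − | + |
| M5 | +(γ) | + | + |
| M6 | + | + | N/A |
| M7 | + | + | + |
| M8 | +(γ) | − | + |
| M9 | + | + | + |
| M10 | +(γ) | + | + |
| M11 | + | − | − |
| M12 | + | + | + |
| M13 | +(γ) | + | + |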
Table 12. Possibility reduction and truth/falsity combinations.

| | More Truth than Falsity | More Falsity than Truth |
|---|---|---|
| High Reduction of Possibilities | 1 | 4 |
| Low Reduction of Possibilities | 2 | 3 |
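Table 13. Information yield using formula-based approach.

Method 1

| # | Statement (A) | T/F | info(A) |
|---|---|---|---|
| 1 | h ∧ r ∧ w | T | 3 |
| 2 | h ∧ r ∧ ¬w | F | 2 |
| 3 | h ∧ r | T | 2 |
| 4 | h ∧ (r ∨ w) | T | 1.5 |
| 5 | h ∧ (¬r ∨ w) | T | 1.25 |
| 6 | h ∧ ¬r ∧ ¬w | F | 1 |
| 7 | h ∧ ¬r | F | 1 |
| 8 | h ∧ ¬h | F | 1 |
| 9 | h | T | 1 |
| 10 | (h ∧ r) ∨ w | T | 1 |
| 11 | (h ∧ ¬r) ∨ w | T | 0.75 |
| 12 | h ∨ r | T | 0.5 |
| 13 | h ∨ r ∨ w | T | 0.33 |
| 14 | (h ∨ ¬w) ∧ ¬r | F | 0.25 |
| 15 | h ∨ ¬r | T | 0.25 |
| 16 | h ∨ ¬h | T | 0.25 |
| 17 | h ∨ r ∨ ¬w | T | 0.22 |
| 18 | h ∨ ¬r ∨ ¬w | T | 0.11 |
| 19 | ¬h ∧ ¬r ∧ ¬w | F | 0 |
| 20 | ¬h | F | 0 |
| 21 | ¬h ∧ ¬r | F | 0 |
| 22 | ¬h ∨ ¬r ∨ ¬w | F | 0 |

Method 2

| # | Statement (A) | T/F | info(A) |
|---|---|---|---|
| 1 | h ∧ r ∧ w | T | 3 |
| 2 | h ∧ r | T | 2 |
| 3 | h ∧ (r ∨ w) | T | 1.5 |
| 4 | h ∧ (¬r ∨ w) | T | 1 |
| 5 | (h ∧ r) ∨ w | T | 1 |
| 6 | h | T | 1 |
| 7 | h ∧ r ∧ ¬w | F | 1 |
| 8 | h ∨ r | T | 0.5 |
| 9 | (h ∧ ¬r) ∨ w | T | 0.5 |
| 10 | h ∨ r ∨ w | T | 0.33 |
| 11 | h ∨ r ∨ ¬w | T | 0.11 |
| 12 | h ∨ ¬r | T | 0 |
| 13 | h ∨ ¬h | T | 0 |
| 14 | h ∧ ¬r | F | 0 |
| 15 | h ∧ ¬h | F | 0 |
| 16 | h ∨ ¬r ∨ ¬w | T | −0.11 |
| 17 | ¬h ∨ ¬r ∨ ¬w | F | −0.33 |
| 18 | h ∧ ¬r ∧ ¬w | F | −1 |
| 19 | (h ∨ ¬w) ∧ ¬r | F | −1 |
| 20 | ¬h | F | −1 |
| 21 | ¬h ∧ ¬r | F | −2 |
| 22 | ¬h ∧ ¬r ∧ ¬w | F | −3 |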
Thanks to Greg Restall for reading a draft of this paper, Mark Burgin for editorial comments/suggestions and two anonymous referees.
Appendix
Adequacy condition results for the value aggregate method.
Let W_{A} stand for the set of states satisfying statement A.
w_{T} denotes the actual state.
S_{*} = T.
Consult Section 5.1 for further terminology.
Theorem A.1
(M1) 0 ≤ Tr(A, T) ≤ 1.
Proof
X = lineup(arraystates(W_{A})). The lowest possible value for sum(X) is 0, when each item of X is the lowest valued state (2^{n} × 0 = 0). The highest possible value for sum(X) is 1, when each item of X is the highest valued state (2^{n} × n/(n × 2^{n}) = 1).
Theorem A.2
(M2) Tr(A, T) = 1 iff A = T.
Proof
X = lineup(arraystates(W_{A})). sum(X) = 1 iff each item of X is the highest valued state (i.e., w_{T}). Each item of X is the highest valued state iff the statement being measured is a state description of the actual state.
Theorem A.3
(M3) All true statements do not have the same degree of truthlikeness, all false statements do not have the same degree of truthlikeness.
Proof
Evident with the results from Table 10.
Theorem A.4
(M4) Among true statements, truthlikeness covaries with logical strength:
(a) If A and B are true statements and A ⊢ B, then Tr(B, T) ≤ Tr(A, T).
(b) If A and B are true statements and A ⊢ B and B ⊬ A, then Tr(B, T) < Tr(A, T).
Proof
We will show that (b) holds, since this entails (a).
X_{1}^{A} = arraystates(W_{A})
X_{1}^{B} = arraystates(W_{B})
X_{2}^{A} = lineup(X_{1}^{A})
X_{2}^{B} = lineup(X_{1}^{B})
Now, where n is the number of propositional variables, the values of the states will be drawn from the following:
0/(n × 2^{n}), 1/(n × 2^{n}), 2/(n × 2^{n}), …, n/(n × 2^{n}).
Since A and B are true, w_{T} ∈ W_{A} and w_{T} ∈ W_{B}.
Since A ⊢ B and B ⊬ A, W_{A} ⊂ W_{B}.
So X_{2}^{A} is going to contain at least 1 more instance of w_{T} than X_{2}^{B}. Say that X_{2}^{B} does just have 1 less instance of w_{T} and that this is replaced by an instance of the second highest valued state (which will have a value of (n − 1)/(n × 2^{n})). This is the upper limit, the closest X_{2}^{B} will come to having a sum value greater than X_{2}^{A}, so it will suffice to show that in this case sum(X_{2}^{B}) < sum(X_{2}^{A}).
The term X[i] denotes position i of array X. Let m be the position at which X_{2}^{A}[m] ≠ X_{2}^{B}[m]. So 1 ≤ m ≤ 2^{n} − 1 and for each i ∈ {x ∣ 1 ≤ x ≤ m − 1}, X_{2}^{A}[i] = X_{2}^{B}[i], so the value of sum() is equal for both arrays up to point m.
After and including point m, whilst the remaining elements of X_{2}^{A} sum up to (2^{n} − m) × n/(n × 2^{n}), the remaining elements of X_{2}^{B} sum up to (n − 1)/(n × 2^{n}) + ((2^{n} − m) − 1) × n/(n × 2^{n}).
So the final thing to show is that
(2^{n} − m) × n/(n × 2^{n}) > (n − 1)/(n × 2^{n}) + ((2^{n} − m) − 1) × n/(n × 2^{n}).
Subtracting ((2^{n} − m) − 1) × n/(n × 2^{n}) from both sides leaves n/(n × 2^{n}) > (n − 1)/(n × 2^{n}), which holds since n > n − 1.
Theorem A.5
(M5) Among false statements, truthlikeness does not covary with logical strength; there are false statements A and B such that A ⊢ B but Tr(A, T) < Tr(B, T).
Proof
Evident with the results from Table 10.
Theorem A.6
(M7) If A is a false statement, then Tr(T ∨ A, T) > Tr(A, T).
Proof
Since A is false, it follows that w_{T} ∉ W_{A}. The set of states corresponding to T ∨ A is W_{A}∪{w_{T}}.
Let
X_{2} = lineup(arraystates(W_{A}))
X_{2′} = lineup(arraystates(W_{A} ∪ {w_{T}}))
So we need to show that sum(X_{2′}) > sum(X_{2}).
Say the highest valued element of W_{A} is w_{a} and that w_{a} takes up the last n positions in X_{2}. The addition of w_{T} results in it replacing w_{a} for the last n − 1 positions. Since val(w_{T}) > val(w_{a}), sum(X_{2′}) > sum(X_{2}).
Theorem A.7
(M8) Assume j ∉ I_{A}. Then Tr(A ∨ S_{j}, T) > Tr(A, T) iff Δ_{*j} < Δ_{min}(A, T).
Proof
Let X_{2} = lineup(arraystates(W_{A})) and X_{2′} = lineup(arraystates(W_{A} ∪ {w_{j}})).
First we show that
Δ_{*j} < Δ_{min}(A, T) ⇒ Tr(A ∨ S_{j}, T) > Tr(A, T).
If the antecedent here holds, then (∀w)(w ∈ W_{A} ⊃ val(w_{j}) > val(w)). Therefore the sum of X_{2′} will have a greater value than the sum of X_{2}.
Second we show that
Tr(A ∨ S_{j}, T) > Tr(A, T) ⇒ Δ_{*j} < Δ_{min}(A, T)
via the contraposition
Δ_{*j} > Δ_{min}(A, T) ⇒ Tr(A ∨ S_{j}, T) < Tr(A, T).
If the antecedent here holds, then (∃w)(w ∈ W_{A} ∧ val(w_{j}) < val(w)). Therefore the sum of X_{2′} will have a lower value than the sum of X_{2}, since one instance of w ∈ W_{A} is taken off from the sum() calculations and replaced by the lower valued w_{j}, resulting overall in a lower value.
Theorem A.8
(M9) Let Δ_{*j} < Δ_{*i}. Then Tr(S_{j} ∨ S_{i}, T) decreases when Δ_{*i} increases.
Proof
There are two states, w_{j} which corresponds to S_{j} and w_{i} which corresponds to S_{i}, with the value of w_{j} being greater than the value of w_{i}. Let X_{2} = lineup(arraystates({w_{j}, w_{i}})), so Tr(S_{j} ∨ S_{i}) = sum(X_{2}). Let X_{2′} = lineup(arraystates({w_{j}, w_{i′}})), where w_{i′} replaces w_{i} when Δ_{*i} increases. Since w_{i} is replaced by a lower valued w_{i′}, sum(X_{2′}) < sum(X_{2}).
Theorem A.9
(M10) Some false statements may be more truthlike than some true statements.
Proof
Evident with the results from Table 10.
Theorem A.10
(M12) If Δ_{*j} < Δ_{*i} < Δ_{*k}, then Tr(S_{j} ∨ S_{i} ∨ S_{k}, T) increases when Δ_{*i} decreases.
Proof
This is straightforward; any increase for Tr(S_{i}) means an increase in the value of w_{i}, which means an overall total increase in sum().
Theorem A.11
(M13) Tr(A, T) is minimal, if A consists of the Δ–complements of T.
Proof
If A is the Δ-complement of T then it is the state description in which each atom is false. Therefore Tr(A, T) is a minimal 0.
References and Notes
[1] Floridi, L. Semantic Conceptions of Information.
[2] Capurro, R.; Hjorland, B. The concept of information.
[3] Burgin, M. Is Information Some Kind of Data? Proceedings of the Third Conference on the Foundations of Information Science (FIS 2005); Petitjean, M.; MDPI: Basel, Switzerland, 2005.
[4] Burgin, M.
[5] For a discussion about translating semantic information in general to propositional representation see [35] (p. 146). Fox also understands semantic information in terms of propositions, what he calls the propositional analysis of information [36] (p. 75).
[6] See [22] and [23].
[7] Throughout this paper information yield is synonymous with informativeness.
[8] Bar-Hillel, Y.; Carnap, R. Semantic Information.
[9] Bar-Hillel, Y.; Carnap, R.
[10] Sequoiah-Grayson, S. The Metaphilosophy of Information.
[11] The spaces considered in this paper are finite.
[12] This is based on a canonical example in the truthlikeness literature, which will be looked at in Section 4.
[13] Carnap discusses various types of more sophisticated logical probability measures. See [37] for more.
[14] Hintikka, J. On Semantic Information.
[15] Hintikka, J. Surface Information and Depth Information.
[16] Hintikka, J. Some Varieties of Information.
[17] Floridi, L. Outline of a Theory of Strongly Semantic Information.
[18] Whilst we are using statements (s), Floridi uses situation theoretic infons (σ). This distinction is of no concern for the purposes of this paper, so talk of infons will be replaced by talk of statements here.
[19] This should technically be −1 ≤ f(s) < 0.
[20] Truthlikeness is also referred to as verisimilitude, although some use the two terms distinctively, to refer to a basic distinction in the types of approaches (verisimilitude for content approaches and truthlikeness for likeness approaches). Its origins can be traced back to Popper, who, motivated by his philosophy of science, was the first philosopher to take the formal problem of truthlikeness seriously. See [38] for a brief introduction to truthlikeness. [21] and [26] are major and now somewhat classic pieces of work within the truthlikeness enterprise. A great piece of relatively recent work with a good summary of all that has come before it can be found in [32].
[21] Oddie, G.
[22] Floridi, L. Is Semantic Information Meaningful Data?
[23] Floridi, L. In Defence of the Veridical Nature of Information.
[24] There is a sense in which true statements might contain misinformation, as will be discussed later.
[25] See [39] for an overview of this many-valued logic. The adoption of such a framework is for instrumental purposes only and is not an endorsement of paraconsistency or dialetheism.
[26] Niiniluoto, I.
[27] These conditions will be looked at in Section 5.1.
[28] Claude Shannon's ‘premonition’. See [10] (p. 333).
[29] See appendix for proofs relating to the value aggregate approach.
[30] Niiniluoto, I. Review of ‘Likeness to the Truth’.
[31] Winslett, M.
[32] Zwart, S.
[33] The ability of the CSI inverse probabilistic approach to accommodate the notion of conditional informativeness is another pro.
[34] Hansson, S.O. Logic of Belief Revision.
[35] Floridi, L. Semantic Information and the Correctness Theory of Truth.
[36] Fox, C.
[37] Hájek, A. Interpretations of Probability.
[38] Oddie, G. Truthlikeness.
[39] Priest, G.; Tanaka, K. Paraconsistent Logic.