Relational Probabilistic Conditionals and Their Instantiations under Maximum Entropy Semantics for First-Order Knowledge Bases

For conditional probabilistic knowledge bases with conditionals based on propositional logic, the principle of maximum entropy (ME) is well-established, determining a unique model inductively completing the explicitly given knowledge. On the other hand, there is no general agreement on how to extend the ME principle to relational conditionals containing free variables. In this paper, we focus on two approaches to ME semantics that have been developed for first-order knowledge bases: aggregating semantics and grounding semantics. Since they use different variants of conditionals, we define the logic PCI, which covers both approaches as special cases and provides a framework where the effects of both approaches can be studied in detail. While the ME models under PCI-grounding and PCI-aggregating semantics are different in general, we point out that parametric uniformity of a knowledge base ensures that both semantics coincide. Using some concrete knowledge bases, we illustrate the differences and common features of both approaches, looking in particular at the ground instances of the given conditionals.


Introduction
Probabilistic conditional knowledge bases containing conditionals of the form (B|A)[d] with the reading "if A, then B with probability d" are a powerful means for knowledge representation and reasoning when uncertainty is involved [1,2]. If A and B are propositional formulas over a propositional alphabet Σ, possible worlds correspond to elementary conjunctions over Σ, where an elementary conjunction is a conjunction containing every element of Σ exactly once, either in non-negated or in negated form. A possible worlds semantics is given by probability distributions over the set of possible worlds, and a probability distribution P satisfies (B|A)[d] if the conditional probability $P(B|A) = \frac{P(A \wedge B)}{P(A)}$ (defined for P(A) > 0) satisfies P(B|A) = d. For a knowledge base R consisting of a set of propositional conditionals, P is a model of R if P satisfies each conditional in R. The principle of maximum entropy (ME principle) is a well-established concept for choosing the uniquely determined model of R having maximum entropy. This model is the most unbiased model of R in the sense that it completes the knowledge given by R inductively but adds as little additional information as possible [3][4][5][6][7][8][9].
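To make the propositional semantics concrete, here is a minimal Python sketch over the alphabet Σ = {A, B}. Worlds are truth assignments (one per elementary conjunction), formulas are modeled as Python predicates on worlds (an illustrative encoding, not from the paper), and the example distribution is made up.

```python
import itertools

# One possible world per elementary conjunction over SIGMA.
SIGMA = ["A", "B"]
WORLDS = [dict(zip(SIGMA, vals))
          for vals in itertools.product([True, False], repeat=len(SIGMA))]


def prob(P, formula):
    """Probability of a formula (a predicate on worlds) under distribution P."""
    return sum(p for p, w in zip(P, WORLDS) if formula(w))


def cond_prob(P, consequent, antecedent):
    """P(B|A) = P(A and B) / P(A); undefined if P(A) = 0."""
    pa = prob(P, antecedent)
    if pa == 0:
        raise ValueError("P(A) = 0: conditional probability undefined")
    return prob(P, lambda w: antecedent(w) and consequent(w)) / pa


def satisfies(P, consequent, antecedent, d, tol=1e-9):
    """P satisfies (B|A)[d] iff P(B|A) = d (up to floating-point tolerance)."""
    return abs(cond_prob(P, consequent, antecedent) - d) < tol


# Worlds are ordered AB, A~B, ~AB, ~A~B; this made-up P gives P(B|A) = 0.45 / 0.5.
P = [0.45, 0.05, 0.25, 0.25]
print(satisfies(P, lambda w: w["B"], lambda w: w["A"], 0.9))  # prints True
```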
While for a set of propositional conditionals there is a general agreement about its ME model, the situation changes when the conditionals are built over a relational first-order language. As an illustration, consider the following example.
Example 1 (Elephant Keeper). The elephant keeper example, adapted from [10,11], models the relationships among elephants in a zoo and their keepers. Elephants usually like their keepers, except for keeper Fred. However, elephant Clyde gets along with everyone, and therefore he also likes Fred. The knowledge base R_EK consists of three conditionals ek_1, ek_2, and ek_3. Conditional ek_1 models statistical knowledge about the general relationship between elephants and their keepers, whereas conditional ek_2 represents knowledge about the exceptional keeper Fred and his relationship to elephants in general. Conditional ek_3 models the subjective belief about the relationship between the elephant Clyde and keeper Fred. From a common sense point of view, the knowledge base R_EK makes perfect sense: conditional ek_2 is an exception from ek_1, and ek_3 is an exception from ek_2.
When trying to extend the ME principle from the propositional case to such a relational setting, a central question is how to interpret the free variables occurring in a conditional. For instance, note that a straightforward complete grounding of R_EK yields a grounded knowledge base that can be viewed as a propositional knowledge base. However, this grounded knowledge base is inconsistent since it contains both (likes(clyde, fred) | elephant(clyde), keeper(fred))[0.9] and (likes(clyde, fred) | elephant(clyde), keeper(fred))[0.05], and no probability distribution P can satisfy both P(likes(clyde, fred) | elephant(clyde), keeper(fred)) = 0.9 and P(likes(clyde, fred) | elephant(clyde), keeper(fred)) = 0.05.
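The clash can be reproduced mechanically. The Python sketch below assumes schematic forms for ek_1 and ek_2 that are inferred from the description above and from the two ground instances just shown (they are not quoted from the paper); ek_3 is omitted because its probability is not given here, and the constants dumbo and tony are hypothetical additions. The sketch grounds every conditional over all sort-respecting substitutions and maps each ground conditional to the set of probabilities it is required to have; a set with more than one element signals inconsistency.

```python
import itertools
import re
from collections import defaultdict

# "clyde" and "fred" come from Example 1; "dumbo" and "tony" are hypothetical.
DOMAIN = {"Elephant": ["clyde", "dumbo"], "Keeper": ["fred", "tony"]}

# Assumed schematic forms (inferred, not quoted from the paper); ek_3 omitted.
#   ek_1: (likes(E,K)    | elephant(E), keeper(K))[0.9]
#   ek_2: (likes(E,fred) | elephant(E), keeper(fred))[0.05]
KB = [
    ("likes(E,K)", ("elephant(E)", "keeper(K)"), 0.9, {"E": "Elephant", "K": "Keeper"}),
    ("likes(E,fred)", ("elephant(E)", "keeper(fred)"), 0.05, {"E": "Elephant"}),
]


def substitute(template, theta):
    """Replace each variable (as a whole word) by the constant theta assigns to it."""
    for var, const in theta.items():
        template = re.sub(rf"\b{var}\b", const, template)
    return template


def complete_grounding(kb):
    """Map every ground conditional to the set of probabilities it must have."""
    ground = defaultdict(set)
    for consequent, antecedents, d, var_sorts in kb:
        names = sorted(var_sorts)
        for consts in itertools.product(*(DOMAIN[var_sorts[v]] for v in names)):
            theta = dict(zip(names, consts))
            key = (substitute(consequent, theta),
                   tuple(substitute(a, theta) for a in antecedents))
            ground[key].add(d)
    return ground


def conflicts(ground):
    """Ground conditionals that are required to have more than one probability."""
    return {key: sorted(ps) for key, ps in ground.items() if len(ps) > 1}


for (cons, ants), ps in conflicts(complete_grounding(KB)).items():
    print(f"({cons} | {', '.join(ants)}) is required to have each of {ps}")
```

Here every elephant clashes on keeper fred, since ek_1 and ek_2 produce the same ground conditional with probabilities 0.9 and 0.05.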
Thus, when extending the ME principle to the relational case with free variables as in R_EK, the exact role of the variables has to be specified. There are various approaches dealing with a combination of probabilities with a first-order language (e.g., [12,13]); a comparison and evaluation of some approaches is given in [14]. In the following, we focus on two semantics that both employ the principle of maximum entropy for probabilistic relational conditionals: the aggregation semantics [15] proposed by Kern-Isberner and the logic FO-PCL [16] elaborated by Fisseler. While both approaches are related in the sense that they refer to a set of constants when interpreting the variables in the conditionals, there is also a major difference. FO-PCL requires all groundings of a conditional to have the same probability d given in the conditional, and in general, FO-PCL needs to restrict the possible instantiations for the variables occurring in a conditional by providing constraint formulas like U ≠ V or U ≠ a in order to avoid inconsistencies. On the other hand, under aggregation semantics the grounded instances may have distinct probabilities as long as they aggregate to the given probability d, and aggregation semantics is defined only for conditionals without constraint formulas.
In this paper, we propose a logical framework PCI that extends aggregation semantics to conditionals with instantiation restrictions and also provides a grounding semantics. From a knowledge representation point of view, this provides greater flexibility, e.g., when expressing knowledge about individuals known to be exceptional with respect to some relationship. We show that both the aggregation semantics of [15] and the semantics of FO-PCL [16] come out as special cases of PCI, thereby also helping to clarify the relationship between the two approaches. Moreover, we investigate the ME models under PCI-grounding and PCI-aggregating semantics, which are different in general, and we give a condition on knowledge bases ensuring that both ME semantics coincide.
This paper is a revised and extended version of [17] and is organized as follows. In Section 2, we briefly recall the background of FO-PCL and aggregation semantics. In Section 3, the logical framework PCI is developed, and two alternative satisfaction relations for grounding and aggregating semantics are defined for PCI by extending the corresponding notions of [15,16]. In Section 4, the maximum entropy principle is employed with respect to these satisfaction relations; we show that the resulting semantics coincide for knowledge bases that are parametrically uniform [11,16]. In Section 5, we present and discuss ME distributions for some concrete knowledge bases under both PCI-grounding and PCI-aggregating semantics, and point out their differences and common features, covering in particular the groundings of the given conditionals. Finally, in Section 6 we conclude and point out further work.

Background: FO-PCL and Aggregation Semantics
As already pointed out in Section 1, simply grounding a relational knowledge base R easily leads to inconsistency. Therefore, the logic FO-PCL [11,16] employs instantiation restrictions for the free variables of a conditional. An FO-PCL conditional additionally has a constraint formula determining the admissible instantiations of its free variables, and the grounding semantics of FO-PCL requires that all admissible ground instances of a conditional c have the probability given by c.
Example 2 (Elephant Keeper with instantiation restrictions). In FO-PCL, adding the constraint K ≠ fred to conditional ek_1 and E ≠ clyde to conditional ek_2 in R_EK yields the knowledge base R′_EK. Note that, e.g., the ground instance (likes(clyde, fred) | elephant(clyde), keeper(fred))[0.05] of conditional ek_2 is not admissible, and that the set of admissible ground instances of R′_EK is indeed consistent under probabilistic semantics for propositional knowledge bases as considered, e.g., in [8,18]. Thus, under FO-PCL semantics, R′_EK is consistent, where a probability distribution P satisfies an FO-PCL conditional r, denoted by P |=_fopcl r, iff all admissible ground instances of r have the probability specified by r.
In contrast, the aggregation semantics as given in [15] does not consider instantiation restrictions, since its satisfaction relation (in this paper denoted by |=_no-ir to indicate no instantiation restriction) is less strict with respect to probabilities of ground instances: P |=_no-ir (B|A)[d] iff the quotient of the sum of all probabilities P(B_i ∧ A_i) and the sum of all P(A_i) is d, i.e.,
$\frac{\sum_{i=1}^{n} P(A_i \wedge B_i)}{\sum_{i=1}^{n} P(A_i)} = d$,
where (B_1|A_1), ..., (B_n|A_n) are the ground instances of (B|A). In this way, the aggregation semantics is capable of balancing the probabilities of ground instances, resulting in greater flexibility and higher tolerance with respect to consistency issues. Provided that there are enough individuals so that the corresponding aggregation over all probabilities is possible, the knowledge base R_EK (without instantiation restrictions), which is inconsistent under FO-PCL semantics, is consistent under aggregation semantics.
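The difference between the two satisfaction conditions can be sketched on bare numbers. In the following Python sketch, each ground instance is given only by its pair (P(A_i ∧ B_i), P(A_i)); the numbers are made up for illustration, and exact fractions avoid floating-point equality issues. The per-instance check is written in the multiplicative form P(A_i ∧ B_i) = d · P(A_i), which coincides with P(B_i|A_i) = d whenever P(A_i) > 0; the quotient check corresponds to |=_no-ir.

```python
from fractions import Fraction


def satisfies_grounding(instances, d):
    """Per-instance check: every ground instance has P(A_i and B_i) = d * P(A_i)."""
    return all(p_ab == d * p_a for p_ab, p_a in instances)


def satisfies_aggregation(instances, d):
    """Aggregated check: sum_i P(A_i and B_i) / sum_i P(A_i) = d, with a positive sum."""
    total_a = sum(p_a for _, p_a in instances)
    total_ab = sum(p_ab for p_ab, _ in instances)
    return total_a > 0 and total_ab == d * total_a


# Two made-up ground instances (P(A_i and B_i), P(A_i)) with conditional
# probabilities 1 and 7/10; they aggregate to (2/5 + 7/50) / (3/5) = 9/10.
instances = [(Fraction(2, 5), Fraction(2, 5)), (Fraction(7, 50), Fraction(1, 5))]
d = Fraction(9, 10)
print(satisfies_grounding(instances, d), satisfies_aggregation(instances, d))  # prints False True
```

The aggregated condition accepts instances whose individual conditional probabilities differ from d, which is exactly the balancing effect described above.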

PCI Logic
The logical framework PCI (probabilistic conditionals with instantiation restrictions) uses probabilistic conditionals with and without instantiation restrictions and provides different options for a satisfaction relation. The syntax of PCI given in [19] uses the syntax of FO-PCL [11,16]. In the following, we will precisely state the formal relationship among |=_no-ir, |=_fopcl, and the satisfaction relations offered by PCI.
Like FO-PCL, PCI uses function-free, sorted signatures of the form Σ = (S, D, Pred). In a PCI signature Σ = (S, D, Pred), S = {s_1, ..., s_k} is a set of sort names or just sorts. The set D is a finite set of constant symbols where each d ∈ D has a unique sort s ∈ S. With D^(s) we denote the set of all constants having sort s; thus $D = \bigcup_{s \in S} D^{(s)}$ is the union of (disjoint) sets of sorted constant symbols. Pred is a set of predicate symbols, each having a particular number of arguments. If p ∈ Pred is a predicate taking n arguments, each argument position i must be filled with a constant or variable of a specific sort s_i. Thus, each p ∈ Pred comes with an arity of the form s_1 × ... × s_n ∈ S^n indicating the required sorts for the arguments. Each variable in the set V of variables also has a unique sort, and all formulas and variable substitutions must obey the obvious sort restrictions. In the following, we adopt the unique names assumption, i.e., different constants denote different elements. The set of all terms is defined as Term_Σ := V ∪ D. Let L_Σ be the set of quantifier-free first-order formulas defined over Σ and V in the usual way.
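A sorted signature can be represented as a small data structure. The Python sketch below is a hypothetical encoding (the class and method names are illustrative, not taken from FO-PCL or KREATOR) that enforces disjoint constant sets D^(s), checks whether a ground atom respects the arity s_1 × ... × s_n of its predicate, and counts the ground atoms of a predicate as the product of the |D^(s_i)|. Variables are not modeled; the constants dumbo and tony are made up.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class Signature:
    """Hypothetical encoding of a function-free sorted signature (S, D, Pred)."""
    constants: dict   # sort s -> list of constants D^(s)
    predicates: dict  # predicate p -> tuple of argument sorts (s_1, ..., s_n)

    def __post_init__(self):
        # D is the union of the D^(s); each constant must have a unique sort.
        seen = set()
        for consts in self.constants.values():
            if seen & set(consts):
                raise ValueError("constant sets of different sorts must be disjoint")
            seen |= set(consts)

    def sort_of(self, const):
        for sort, consts in self.constants.items():
            if const in consts:
                return sort
        raise KeyError(const)

    def well_sorted(self, pred, args):
        """True iff every argument position i holds a constant of sort s_i."""
        arity = self.predicates[pred]
        return len(args) == len(arity) and all(
            self.sort_of(a) == s for a, s in zip(args, arity))

    def num_ground_atoms(self, pred):
        """|D^(s_1)| * ... * |D^(s_n)| ground atoms for p with arity s_1 x ... x s_n."""
        return prod(len(self.constants[s]) for s in self.predicates[pred])


# Elephant-keeper style signature; "dumbo" and "tony" are made-up constants.
sig = Signature(
    constants={"Elephant": ["clyde", "dumbo"], "Keeper": ["fred", "tony"]},
    predicates={"likes": ("Elephant", "Keeper"),
                "elephant": ("Elephant",), "keeper": ("Keeper",)},
)
print(sig.well_sorted("likes", ("clyde", "fred")), sig.num_ground_atoms("likes"))  # prints True 4
```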

Definition 1 (Instantiation Restriction). An instantiation restriction is a conjunction of inequality atoms of the form t_1 ≠ t_2 with t_1, t_2 ∈ Term_Σ. The set of all instantiation restrictions is denoted by C_Σ.
Since an instantiation restriction may be a conjunction of inequality atoms, we can express that a conditional has multiple restrictions, e.g., by stating E ≠ clyde ∧ K ≠ fred.
Definition 2 (q-, p-, r-Conditional). Let A, B ∈ L_Σ be quantifier-free first-order formulas over Σ and V.
1. (B|A) is called a qualitative conditional (or just q-conditional), where A is the antecedent and B the consequent of the qualitative conditional. The set of all qualitative conditionals over L_Σ is denoted by (L_Σ|L_Σ).
For an instantiation restricted conditional r = ⟨(B|A)[d], C⟩ ∈ (L_Σ|L_Σ)^prob_{C_Σ} over a many-sorted signature Σ = (S, D, Pred), the set of admissible ground substitutions of r is defined as
$\Theta^{adm}_{\Sigma}(r) := \{\theta \in \Theta_{\Sigma}(r) \mid [\![C]\!]_{\theta} = \mathit{true}\}$,
and the set of admissible ground instances of r is defined as
$gnd_{\Sigma}(r) := \{(\theta(B) \mid \theta(A))[d] \mid \theta \in \Theta^{adm}_{\Sigma}(r)\}$.
In the following, when we talk about the ground instances of a conditional, we will always refer to its admissible ground instances.
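The construction of admissible ground substitutions and instances can be sketched in Python as follows. The conditional is assumed to be R_1 of the misanthrope example (Example 3) in the form ⟨(likes(U,V) | likes(V,U))[0.9], U ≠ V⟩ over the constants {a, b, c}; this form is inferred from the description of Example 3 and is not quoted from the paper. In the restriction, terms that are not variables are treated as constants, so a restriction such as K ≠ fred is evaluated as expected.

```python
import itertools

# Assumed r-conditional R_1 of Example 3: <(likes(U,V) | likes(V,U))[0.9], U != V>.
SORTS = {"Person": ["a", "b", "c"]}
VAR_SORTS = {"U": "Person", "V": "Person"}
RESTRICTION = [("U", "V")]  # conjunction of inequality atoms t1 != t2
D_VALUE = 0.9


def ground_substitutions(var_sorts, sorts):
    """Theta_Sigma(r): all sort-respecting maps from variables to constants."""
    names = sorted(var_sorts)
    for consts in itertools.product(*(sorts[var_sorts[v]] for v in names)):
        yield dict(zip(names, consts))


def evaluate(restriction, theta):
    """[[C]]_theta: true iff theta(t1) and theta(t2) differ for all t1 != t2 in C.
    Terms that are not variables of theta are treated as constants."""
    return all(theta.get(t1, t1) != theta.get(t2, t2) for t1, t2 in restriction)


def admissible(var_sorts, sorts, restriction):
    """Theta^adm_Sigma(r): the ground substitutions satisfying the restriction."""
    return [th for th in ground_substitutions(var_sorts, sorts)
            if evaluate(restriction, th)]


def r1_ground_instances():
    """gnd_Sigma(R_1): (theta(B) | theta(A))[d] for every admissible theta."""
    return [(f"likes({th['U']},{th['V']})", f"likes({th['V']},{th['U']})", D_VALUE)
            for th in admissible(VAR_SORTS, SORTS, RESTRICTION)]


for consequent, antecedent, d in r1_ground_instances():
    print(f"({consequent} | {antecedent})[{d}]")
```

Of the 3 × 3 = 9 ground substitutions, the restriction U ≠ V leaves six admissible ones, which matches the six ground conditionals of R_1 discussed for Example 3.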
As for an FO-PCL knowledge base [11], for a PCI knowledge base (Σ, R) we define the Herbrand base H(R) as the set of all ground atoms in all gnd_Σ(r_i) with r_i ∈ R. Every subset ω ⊆ H(R) is a Herbrand interpretation, defining a logical semantics for R. The set Ω_Σ := {ω | ω ⊆ H(R)} denotes the set of all Herbrand interpretations. Herbrand interpretations are also called possible worlds.
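For a finite knowledge base, the Herbrand base and the set Ω_Σ of possible worlds can be enumerated directly. The following Python sketch takes ground conditionals as pairs of atom lists (an illustrative encoding) and uses the six ground instances of the assumed misanthrope conditional R_1 ⟨(likes(U,V) | likes(V,U))[0.9], U ≠ V⟩ over {a, b, c}, assuming no further atoms are contributed by other conditionals; the resulting 2^6 = 64 worlds show how quickly Ω_Σ grows.

```python
import itertools


def herbrand_base(ground_conditionals):
    """H(R): all ground atoms occurring in the given ground conditionals."""
    atoms = set()
    for consequent_atoms, antecedent_atoms in ground_conditionals:
        atoms.update(consequent_atoms, antecedent_atoms)
    return sorted(atoms)


def herbrand_interpretations(base):
    """Omega_Sigma: every subset of H(R) is a possible world (2^|H(R)| in total)."""
    for size in range(len(base) + 1):
        for subset in itertools.combinations(base, size):
            yield frozenset(subset)


# Ground instances of the assumed R_1 over {a, b, c} as (consequent atoms, antecedent atoms).
people = ["a", "b", "c"]
ground = [([f"likes({u},{v})"], [f"likes({v},{u})"])
          for u in people for v in people if u != v]
H = herbrand_base(ground)
print(len(H), sum(1 for _ in herbrand_interpretations(H)))  # prints 6 64
```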
Definition 5 (PCI Interpretation). The probabilistic semantics of (Σ, R) is a possible worlds semantics [12] where the ground atoms in H(R) are binary random variables. A PCI interpretation P of a knowledge base (Σ, R) is thus a probability distribution P: Ω_Σ → [0, 1]. The set of all probability distributions over Ω_Σ is denoted by P_{Ω_Σ} or just by P_Ω.
The PCI framework offers two different satisfaction relations: |=^⊙_pci is based on grounding as in FO-PCL, and |=^⊗_pci extends aggregation semantics to r-conditionals.
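Let P be a PCI interpretation and let r = ⟨(B|A)[d], C⟩ be an r-conditional. Under grounding semantics, P |=^⊙_pci r iff $P(\theta(B) \mid \theta(A)) = d$ for every $\theta \in \Theta^{adm}_{\Sigma}(r)$, i.e., every admissible ground instance of r has probability d. Under aggregating semantics, P |=^⊗_pci r iff
$\frac{\sum_{\theta \in \Theta^{adm}_{\Sigma}(r)} P(\theta(A \wedge B))}{\sum_{\theta \in \Theta^{adm}_{\Sigma}(r)} P(\theta(A))} = d$,
i.e., the probabilities of the admissible ground instances of r aggregate to d. As usual, the satisfaction relations |=^•_pci with • ∈ {⊙, ⊗} are extended to a set of conditionals R by defining P |=^•_pci R iff P |=^•_pci r for all r ∈ R.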
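The following proposition states that PCI properly captures both the instantiation-based semantics |=_fopcl of FO-PCL [11] and the aggregation semantics |=_no-ir of [15] (cf. Section 2).
Proposition 1 (PCI captures FO-PCL and aggregation semantics [19]). Let ⟨(B|A)[d], C⟩ be an r-conditional and let (B|A)[d] be a p-conditional. Then for every PCI interpretation P the following holds:
1. P |=^⊙_pci ⟨(B|A)[d], C⟩ iff P |=_fopcl ⟨(B|A)[d], C⟩.
2. P |=^⊗_pci ⟨(B|A)[d], ⊤⟩ iff P |=_no-ir (B|A)[d], where ⊤ denotes the empty instantiation restriction.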

PCI Logic and Maximum Entropy Semantics
If a knowledge base R is consistent, there are usually many different models satisfying R. The principle of maximum entropy chooses the unique distribution that has maximum entropy among all distributions satisfying a knowledge base R [5,8]. Applying this principle to the PCI satisfaction relations |=^⊙_pci and |=^⊗_pci yields
$\mathrm{ME}^{\bullet}(R) := \arg\max_{P \in \mathcal{P}_{\Omega},\; P \models^{\bullet}_{\mathit{pci}} R} H(P)$
with • being ⊙ or ⊗, and where
$H(P) = -\sum_{\omega \in \Omega_{\Sigma}} P(\omega) \log P(\omega)$
is the entropy of a probability distribution P.
Example 3 (Misanthrope).The knowledge base R MI = {R 1 , R 2 }, adapted from [11], models friendship relations within a group of people with one exceptional member, a misanthrope.In general, if a person V likes another person U, then it is very likely that U likes V, too.However, there is one person, the misanthrope, who generally does not like other people: Example 3 shows that in general the ME model under PCI-grounding semantics of a knowledge base R differs from its ME model under PCI-aggregation semantics.However, if R is parametrically uniform [11,16], the situation changes.Parametric uniformity of a knowledge base R is introduced in [11] and refers to the fact that the ME distribution under FO-PCL (or PCI-grounding) semantics satisfying a set of m ground conditionals can be represented by a set of just m optimization parameters.A relational knowledge base R is parametrically uniform iff for every conditional r ∈ R, all ground instances of r have the same optimization parameter (see [11,16] for details).For instance, the knowledge base R EK from Example 2 is parametrically uniform, while the knowledge base R MI from Example 3 is not parametrically uniform.Thus, if R is parametrically uniform, just one optimization parameter for each conditional r ∈ R instead of one optimization parameter for each ground instance of r has to be computed; this can be exploited when computing the ME distribution [17].In [20], a set of transformation rules is developed that transforms any consistent knowledge base R into a knowledge base R such that R and R have the same ME model under grounding semantics and R is parametrically uniform.
Using the PCI framework, which provides both grounding and aggregating semantics for conditionals with instantiation restrictions, the ME models for PCI-grounding and PCI-aggregation semantics coincide if R is parametrically uniform, i.e., ME^⊙(R) = ME^⊗(R).
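Both ME models can be approximated numerically for a tiny knowledge base. The Python sketch below is a generic solver, not KREATOR's implementation: it writes each constraint as E_P[f] = 0 with f = 1[A ∧ B] − d · 1[A] (one feature per admissible ground instance for ME^⊙, one feature summed over all instances of a conditional for ME^⊗) and fits P(ω) ∝ exp(Σ_i λ_i f_i(ω)) by coordinate-wise minimization of the convex dual log Σ_ω exp(Σ_i λ_i f_i(ω)). R_1 is assumed to be ⟨(likes(U,V) | likes(V,U))[0.9], U ≠ V⟩ over {a, b, c}, and r2_hypothetical is a made-up stand-in for the misanthrope conditional, so the resulting numbers will not match those reported for R_MI.

```python
import itertools
import math

PEOPLE = ["a", "b", "c"]
ATOMS = [(x, y) for x in PEOPLE for y in PEOPLE if x != y]          # likes(x, y)
IDX = {atom: i for i, atom in enumerate(ATOMS)}
WORLDS = list(itertools.product([False, True], repeat=len(ATOMS)))  # 2^6 worlds


def r1():
    """Assumed R_1: <(likes(U,V) | likes(V,U))[0.9], U != V>, as (A, B, d) triples."""
    return [(lambda w, u=u, v=v: w[IDX[(v, u)]],   # antecedent likes(V,U)
             lambda w, u=u, v=v: w[IDX[(u, v)]],   # consequent likes(U,V)
             0.9) for (u, v) in ATOMS]


def r2_hypothetical():
    """Hypothetical exception (NOT the paper's R_2): <(likes(a,V) | T)[0.05], V != a>."""
    return [(lambda w: True, lambda w, v=v: w[IDX[("a", v)]], 0.05)
            for v in PEOPLE if v != "a"]


def features(kb, semantics):
    """Features f with constraint E_P[f] = 0: one per ground instance ("grounding")
    or one per conditional, summed over its instances ("aggregation")."""
    feats = []
    for insts in kb:
        rows = [[float(a(w) and b(w)) - d * float(a(w)) for w in WORLDS]
                for a, b, d in insts]
        if semantics == "grounding":
            feats.extend(rows)
        else:
            feats.append([sum(col) for col in zip(*rows)])
    return feats


def max_entropy(feats, sweeps=100):
    """Return (P, lam) with P(w) proportional to exp(sum_i lam_i * f_i(w)). Each
    coordinate step solves E_P[f_i] = 0 for lam_i by bisection (E_P[f_i] increases in lam_i)."""
    lam = [0.0] * len(feats)

    def logits():
        return [sum(lj * f[k] for lj, f in zip(lam, feats)) for k in range(len(WORLDS))]

    for _ in range(sweeps):
        for i, f in enumerate(feats):
            base = [s - lam[i] * fk for s, fk in zip(logits(), f)]
            lo, hi = -50.0, 50.0
            for _ in range(40):
                mid = (lo + hi) / 2
                z = [b + mid * fk for b, fk in zip(base, f)]
                m = max(z)
                if sum(math.exp(x - m) * fk for x, fk in zip(z, f)) > 0:
                    hi = mid
                else:
                    lo = mid
            lam[i] = (lo + hi) / 2
    z = logits()
    m = max(z)
    weights = [math.exp(x - m) for x in z]
    total = sum(weights)
    return [x / total for x in weights], lam


def masses(P, a, b):
    """(P(A and B), P(A)) for one ground instance."""
    return (sum(p for p, w in zip(P, WORLDS) if a(w) and b(w)),
            sum(p for p, w in zip(P, WORLDS) if a(w)))


if __name__ == "__main__":
    kb = [r1(), r2_hypothetical()]
    P_ground, _ = max_entropy(features(kb, "grounding"))
    P_aggr, _ = max_entropy(features(kb, "aggregation"))
    for a, b, d in kb[0][:2]:
        pab_g, pa_g = masses(P_ground, a, b)
        pab_a, pa_a = masses(P_aggr, a, b)
        print(f"grounding {pab_g / pa_g:.4f}  aggregation {pab_a / pa_a:.4f}")
```

With both conditionals, the two ME distributions differ, and under aggregation the ground instances of R_1 need not have probability 0.9 individually. With R_1 alone, the constants are interchangeable, so all ground instances receive the same multiplier λ (parametric uniformity) and the two distributions agree up to numerical tolerance.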

Computation and Comparison of Maximum Entropy Distributions
In Example 3, we already presented some concrete probability values for ME distributions. We will now look in more detail at the ME distributions obtained from both PCI-grounding and PCI-aggregation semantics. In particular, we will illustrate how the ME distributions for PCI-grounding and PCI-aggregation semantics evolve when a knowledge base that is not parametrically uniform is transformed into a knowledge base that is parametrically uniform.

Achieving Parametric Uniformity
While transforming a knowledge base into one that is parametrically uniform [11] does not change its ME model under (FO-PCL or PCI) grounding semantics, it allows for a simpler computation of the ME model [17]. In [20], a set of transformation rules PU is presented that transforms any consistent knowledge base R into a parametrically uniform knowledge base PU(R) with the same maximum entropy model under grounding semantics. An implementation of PU [21] is available within the KREATOR environment (KREATOR can be found at http://kreator-ide.sourceforge.net/), an integrated development environment for relational probabilistic logic [22]. The CSPU (Conditional Structures and Parametric Uniformity) component [23] of KREATOR generates PU transformation protocols, and a part of the protocol for the misanthrope knowledge base R_MI from Example 3 is shown in Figure 1. For details of the PU transformation rules we refer to [20]; we just remark here that PU stepwise removes all interactions among the conditionals, where an interaction in a knowledge base R indicates that R is not parametrically uniform [20]. In each PU transformation step, one conditional is replaced by two conditionals originating from it. Table 1 illustrates how R_MI evolves from R_MI = R_1 to R_2 and from R_2 to R_3 = PU(R_MI).

Maximum Entropy Distributions for Grounding and Aggregation Semantics
Using KREATOR, we computed the ME distributions for the three knowledge bases R_1, R_2, and R_3 involved in the PU transformation of R_MI for both PCI-grounding and PCI-aggregation semantics. For all admissible ground instances of the conditionals occurring in R_1, R_2, and R_3, we computed their probabilities under the ME distributions for PCI-grounding and PCI-aggregation semantics. The results are shown in Table 2, using the abbreviation l(x, y) for likes(x, y).
There are three pairwise different ME distributions ME^⊗(R_1), ME^⊗(R_2), and ME^⊗(R_3) under PCI-aggregation semantics for the three pairwise different knowledge bases R_1, R_2, R_3. On the other hand, ME^⊙(R_1) = ME^⊙(R_2) = ME^⊙(R_3) = ME^⊗(R_3) holds since the PU transformation process does not change the maximum entropy model under PCI-grounding semantics and because R_3 is parametrically uniform.
Table 1. Conditionals occurring in R_1, R_2, and R_3 given by the PU transformation steps from R_MI = R_1 to R_2 and from R_2 to R_3 = PU(R_MI) for R_MI from Example 3 (cf. Figure 1), using the abbreviation l(x, y) for likes(x, y).
Table 2. Maximum entropy probabilities of the ground instances of the conditionals in R_1, R_2, and R_3 (cf. Table 1) under PCI-aggregation semantics; for PCI-grounding semantics, ME^⊙(R_1)(g) = ME^⊙(R_2)(g) = ME^⊙(R_3)(g) = ME^⊗(R_3)(g) holds for every ground instance g, since the PU transformation process does not change the maximum entropy model under grounding semantics and because R_3 is parametrically uniform.
It is interesting to note that for the ground instances originating from conditional R_1 there are two distinct probabilities under ME^⊗(R_1), three probabilities under ME^⊗(R_2), and, as implied by Proposition 2, one probability under ME^⊗(R_3). In all cases, PCI-aggregation semantics ensures that the distinct probabilities aggregate to the probability stated in the corresponding conditionals.
For the comparison of PCI-grounding and PCI-aggregation, it is also interesting to compare their ME behavior with respect to queries that are not instances of a conditional given in the knowledge base.For example, for likes(b, c) we observe holds for i ∈ {1, 2, 3} under PCI-grounding semantics.

Conclusions and Further Work
In this paper, we considered maximum entropy based semantics for relational probabilistic conditionals.FO-PCL [16] employs a grounding semantics and uses instantiation restrictions for the free variables occurring in a conditional, requiring all admissible instances of a conditional to have the given probability.Aggregating semantics [15] defines probabilistic satisfaction by interpreting the intended probability of a conditional with free variables only as a guideline for the probabilities of its instances that aggregate to the conditional's given probability, while the actual probabilities for grounded instances may differ.
While the original definition of aggregation semantics [15] considered only conditionals without constraints representing instantiation restrictions, we developed the framework PCI, which extends aggregation semantics so that instantiation restrictions can also be taken into account, but without giving up the flexibility of aggregating over distinct probabilities. In comparison with [15], under PCI-aggregation semantics one can restrict the set of groundings over which a conditional is aggregated by providing a corresponding constraint formula for the conditional. From a knowledge representation point of view, this can be useful in various situations, for instance when we talk about a particular relationship among individuals while already knowing that a specific individual like Clyde is an exception with respect to the given relationship.
Note that PCI captures both grounding semantics and aggregating semantics without instantiation restrictions as special cases.For the case that a knowledge base is parametrically uniform, PCI-grounding and PCI-aggregation semantics coincide when employing the maximum entropy principle, while for a knowledge base that is not parametrically uniform the two ME semantics induce different models in general.We illustrated the differences and common features of both semantics on a concrete knowledge base, using the KREATOR environment for computing the ME models and answering queries with respect to these distributions.We expect that observations of this kind will support the discussion of both formal and common sense properties of probabilistic first-order inference in general and inference according to the principle of maximum entropy in a first-order setting in particular.

2. Let (B|A) ∈ (L_Σ|L_Σ) be a qualitative conditional and let d ∈ [0, 1] be a real value. Then (B|A)[d] is called a probabilistic conditional (or just p-conditional) with probability d. The set of all probabilistic conditionals over L_Σ is denoted by (L_Σ|L_Σ)^prob.
3. Let (B|A)[d] ∈ (L_Σ|L_Σ)^prob be a probabilistic conditional and let C ∈ C_Σ be an instantiation restriction. Then ⟨(B|A)[d], C⟩ is called an instantiation restricted conditional (or just r-conditional). The set of all instantiation restricted conditionals over L_Σ is denoted by (L_Σ|L_Σ)^prob_{C_Σ}.
Instantiation restricted qualitative conditionals are defined analogously. If it is clear from the context, we may omit qualitative, probabilistic, and instantiation restricted and just use the term conditional.
Definition 3 (PCI knowledge base). A pair (Σ, R) consisting of a PCI signature Σ = (S, D, Pred) and a set of instantiation restricted conditionals R = {r_1, ..., r_m} with r_i ∈ (L_Σ|L_Σ)^prob_{C_Σ} is called a PCI knowledge base.
For an instantiation restricted conditional r = ⟨(B|A)[d], C⟩, Θ_Σ(r) denotes the set of all ground substitutions with respect to the variables in r. A ground substitution θ ∈ Θ_Σ(r) is applied to the formulas A, B, and C in the usual way, i.e., each variable is replaced by a certain constant according to the mapping θ = {v_1/c_1, ..., v_l/c_l} with v_i ∈ V, c_i ∈ D, 1 ≤ i ≤ l. Therefore, θ(A), θ(B), and θ(C) are ground formulas, and we have θ((B|A)) := (θ(B)|θ(A)). Given a ground substitution θ over the variables occurring in an instantiation restriction C ∈ C_Σ, the evaluation of C under θ, denoted by [[C]]_θ, yields true iff θ(t_1) and θ(t_2) are different constants for all t_1 ≠ t_2 ∈ C.
Definition 4 (Admissible Ground Substitutions and Instances). Let Σ = (S, D, Pred) be a many-sorted signature and let r = ⟨(B|A)[d], C⟩ ∈ (L_Σ|L_Σ)^prob_{C_Σ} be an instantiation restricted conditional.
Within the PCI framework, consider R_MI together with constants D = {a, b, c} and the corresponding ME distributions ME^⊙(R_MI) and ME^⊗(R_MI) under PCI-grounding and PCI-aggregation semantics, respectively. Under ME^⊙(R_MI), all six ground conditionals emerging from R_1 have probability 0.9; for instance, ME^⊙(R_MI)(likes(a, b) | likes(b, a)) = 0.9. On the other hand, for the distribution ME^⊗(R_MI), we have ME^⊗(R_MI)(likes(a, b) | likes(b, a)) = 0.46016768 and ME^⊗(R_MI)(likes(a, c) | likes(c, a)) = 0.46016768, while the other four ground conditionals resulting from R_1 have probability 0.96674480.

Figure 1. The KREATOR protocol of the PU transformation steps from R_MI = R_1 to R_2 and from R_2 to R_3 = PU(R_MI) for R_MI from Example 3.