Article

On the Coherence of Probabilistic Relational Formalisms

Escola Politécnica, Universidade de São Paulo, São Paulo 05508-010, Brazil
*
Author to whom correspondence should be addressed.
Entropy 2018, 20(4), 229; https://doi.org/10.3390/e20040229
Submission received: 22 February 2018 / Revised: 23 March 2018 / Accepted: 24 March 2018 / Published: 27 March 2018
(This article belongs to the Special Issue Foundations of Statistics)

Abstract:
There are several formalisms that enhance Bayesian networks by including relations amongst individuals as modeling primitives. For instance, Probabilistic Relational Models (PRMs) use diagrams and relational databases to represent repetitive Bayesian networks, while Relational Bayesian Networks (RBNs) employ first-order probability formulas with the same purpose. We examine the coherence checking problem for those formalisms; that is, the problem of guaranteeing that any grounding of a well-formed set of sentences does produce a valid Bayesian network. This is a novel version of de Finetti’s problem of coherence checking for probabilistic assessments. We show how to reduce the coherence checking problem in relational Bayesian networks to a validity problem in first-order logic augmented with a transitive closure operator and how to combine this logic-based approach with faster, but incomplete algorithms.

1. Introduction

Most statistical models are couched so as to guarantee that they specify a single probability measure. For instance, suppose we have N independent biased coins, so that heads has probability p for each one of them. Then, the probability of a particular configuration of all coins is exactly $p^n(1-p)^{N-n}$, where n is the number of heads in the configuration. Using de Finetti's terminology, we can say that the probabilistic assessments and independence assumptions are coherent, as they are satisfied by a probability distribution [1]. The study of coherence and its consequences has influenced the foundations of probability and statistics, serving as a subjectivist basis for probability theory [2,3], as a broad prescription for statistical practice [4,5] and generally as a bedrock for decision-making and inference [6,7,8].
In this paper, we examine the coherence checking problem for probabilistic models that enhance Bayesian networks with relations and first-order formulas: more precisely, we introduce techniques that allow one to check whether a given relational Bayesian network, or a given probabilistic relational model is guaranteed to specify a probability distribution. Note that “standard” Bayesian networks are, given some intuitive assumptions, guaranteed to be coherent [9,10,11]. The challenge here is to handle models that enlarge Bayesian networks with significant elements of first-order logic; we do so by resorting to logical inference itself as much as possible. In the remainder of this section, we explain the motivation for this study and the basic terminology concerning it, and at the end of this section, we state our goals and our approach in more detail.
To recap, a Bayesian network consists of a directed acyclic graph, where each node is a random variable, and a joint probability distribution over those variables, such that the distribution and the graph satisfy a Markov condition: each random variable is independent of its non-descendants given its parents. (In a directed acyclic graph, node X is a parent of node Y if there is an edge from X to Y. The set of parents of node X is denoted Pa ( X ) . Similarly, we define the children of a node, the descendants of a node, and so on.)
If all random variables $X_1, \ldots, X_n$ in a Bayesian network are categorical, then the Markov condition implies a factorization:
$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i \mid \mathrm{Pa}(X_i) = \pi_i)$,  (1)
where $\pi_i$ is the projection of $\{X_1 = x_1, \ldots, X_n = x_n\}$ onto $\mathrm{Pa}(X_i)$.
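To make the factorization concrete, here is a minimal sketch in Python (our own illustration, not from the paper; the two-node network $X_1 \to X_2$ and all numbers are made up) that multiplies conditional probability values as in Expression (1) and checks that the factors define a single coherent joint distribution:

```python
# Hypothetical two-node network X1 -> X2, binary variables, made-up numbers.
cpd_x1 = {1: 0.3, 0: 0.7}                     # P(X1 = x1); X1 has no parents
cpd_x2 = {(1, 1): 0.9, (0, 1): 0.1,           # P(X2 = x2 | X1 = x1),
          (1, 0): 0.2, (0, 0): 0.8}           # keyed by (x2, x1)

def joint(x1, x2):
    """P(X1 = x1, X2 = x2) as the product in Expression (1)."""
    return cpd_x1[x1] * cpd_x2[(x2, x1)]

# Coherence: the factorization yields a probability distribution (sums to 1).
total = sum(joint(a, b) for a in (0, 1) for b in (0, 1))
assert abs(total - 1.0) < 1e-12
```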
Typically, one specifies a Bayesian network by writing down the random variables $X_1, \ldots, X_n$, drawing the directed acyclic graph, and then settling on probability values $P(X_i = x_i \mid \mathrm{Pa}(X_i) = \pi_i)$ for each $X_i$, each $x_i$ and each $\pi_i$. By following this methodological guideline, one obtains the promised coherence: a unique joint distribution is given by Expression (1).
The following example introduces some useful notation and terminology.
Example 1.
Consider two neighbors, Mary and Tina. The probability that a house is burglarized is 0.001 in their town. The alarm of a house rings with probability 0.9 given that the house is burglarized and with probability 0.01 given that the house is not burglarized. Finally, if either alarm rings, the police are called. This little story, completed with some assumptions of independence, is conveyed by the Bayesian network in Figure 1, where burglary(x) means that the house of x (either Mary or Tina) is burglarized; similarly, alarm(x) means that the alarm of x's house rings; and finally, calls just means that the police are called by someone.
In this paper, every random variable is binary with values zero and one, the former meaning "false" and the latter meaning "true". Furthermore, we often write $P(X)$, where X is a random variable, to mean the probability of the event $\{X = 1\}$, and we often write $P(\neg X)$ to mean the probability of the event $\{X = 0\}$.
Note also that we use, whenever appropriate, logical expressions with random variable names, such as $alarm(Mary) \vee burglary(Tina)$ to mean the disjunction of the proposition stating that alarm(Mary) is true and the proposition stating that burglary(Tina) is true. A random variable name has a dual use as a proposition name.
From the Bayesian network in Figure 1, we compute $P(alarm(Mary)) = 0.9 \times 0.001 + 0.01 \times 0.999 = 0.01899$ and $P(calls) = 0.01899 + 0.01899 - (0.01899)^2 \approx 0.0376$.
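The arithmetic above can be checked with a few lines of Python (a sketch using only the numbers given in the text):

```python
p_burglary = 0.001
# P(alarm) by total probability over burglary:
p_alarm = 0.9 * p_burglary + 0.01 * (1 - p_burglary)
print(p_alarm)                            # 0.01899
# calls is the disjunction of the two independent alarms (inclusion-exclusion):
p_calls = p_alarm + p_alarm - p_alarm ** 2
print(p_calls)                            # ~0.037619
```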
Here are some interesting scenarios that enhance the previous example:
Example 2.
(Scenario 1) Consider that now we have three people, Mary, Tina and John, all neighbors. We can easily imagine an enlarged Bayesian network, with two added nodes related to John, and a modified definition where $calls = alarm(Mary) \vee alarm(Tina) \vee alarm(John)$.
(Scenario 2) It is also straightforward to expand our Bayesian network to accommodate n individuals $a_1, a_2, \ldots, a_n$, all neighbors. We may even be interested in reasoning about calls without any commitment to a fixed n, where calls is a disjunction over all instances of alarm(x). For instance, we have that $P(\neg calls) = (1 - 0.01899)^n$; hence, the probability of a call to the police will be larger than half for a city with more than 36 inhabitants (this is checked numerically in the sketch after Scenario 4). No single Bayesian network allows this sort of "aggregate" inference.
(Scenario 3) Consider a slightly different situation with three people, where Mary and Tina are neighbors, Tina and John are neighbors, but Mary and John are not neighbors. Suppose also that each person may call the police, depending on neighboring alarms. This new situation is codified into the Bayesian network given in Figure 2.
(Scenario 4) Suppose we want to extend Scenario 3 to a town with n people. Without knowing which pairs are neighbors, there is no way we can predict in advance the structure of the resulting Bayesian network. However, we can reason about the possible networks: for instance, we know that each set of n people produces a valid Bayesian network, without any cycles amongst random variables.
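The aggregate claim in Scenario 2 can be verified numerically; the following sketch (using the probability computed in Example 1) finds the smallest town size for which a call is more likely than not:

```python
import math

p_alarm = 0.01899                              # per-house alarm probability
# P(calls) = 1 - (1 - p_alarm)**n exceeds 0.5 once n > log(0.5)/log(1 - p_alarm):
n = math.ceil(math.log(0.5) / math.log(1 - p_alarm))
print(n)                                       # 37, i.e., more than 36 inhabitants
print(1 - (1 - p_alarm) ** 36)                 # ~0.4986
print(1 - (1 - p_alarm) ** 37)                 # ~0.5081
```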
There are many other scenarios where probabilistic modeling must handle repetitive patterns such as the ones described in the previous examples, for instance in social network analysis or in processing data in the semantic web [12,13,14]. The need to handle such “very structured” scenarios has led to varied formalisms that extend Bayesian networks with the help of predicates and quantifiers, relational databases, loops and even recursion [15]. Thus, instead of dealing with a random variable X at a time, we deal with parameterized random variables [16]. We write X ( x ) to refer to a parameterized random variable that yields a random variable for each fixed x in a domain; if we consider individuals a and b in a domain, we obtain random variables X ( a ) and X ( b ) .
Plates offer a popular scheme to manipulate parameterized random variables [17]. A plate is a set of parameterized random variables that share a logical variable, meaning that they are indexed by elements of the same domain. A plate is usually drawn as a rectangle (associated with a domain) containing parameterized random variables. Figure 3 shows simple plate models for the burglary-alarm-call scenario described in Scenario 2 of Example 2.
Plates appeared with the BUGS package, to facilitate the specification of hierarchical models, and have been successful in applications [18]. One restriction of the original plate models in the BUGS package is that a parameterized random variable could not have children outside of its enclosing plate. However, in practice, many plate models violate this restriction. Figure 3 depicts a partial plate model that satisfies the restriction of the original BUGS package (left) and a plate model that violates it (right). Note that, as long as the graph consisting of parameterized random variables is acyclic, we know that every Bayesian network generated from the plate model is indeed consistent.
Several other combinations of parameterized random variables and graph-theoretical representations have been proposed, often grouped under the loose term "Probabilistic Relational Model (PRM)" [10,19,20]. Using PRMs, one can associate parameterized random variables with domains, impose constraints on domains and even represent limited forms of recursion [19,21]. A detailed description of PRMs is given in Section 4; for now, it suffices to say that a PRM is specified by a set of "classes" (each class is a set of individuals), where each class is associated with a set of parameterized random variables, together with a relational database that gives the relations amongst individuals in classes. The plate model in Figure 3 (left) can be viewed as a diagrammatic representation of a minimalistic PRM, where we have a class Person containing parameterized random variables. Note that such a minimalistic PRM with a single class Person cannot encode Scenario 4 in Example 2, as in that scenario, we have pairs of interacting individuals.
Suppose that we want a PRM to represent Scenario 4 in Example 2. Now, the class Person must include parameterized random variables burglary, alarm and calls. The challenge is how to indicate which Persons are parents of a particular calls(x). To do so, one possibility is to introduce another class, say Neighborhood, where each element of Neighborhood refers to two elements of Person. In Section 4, we show how the resulting PRM can be specified textually; for now, we want to point out that finding a diagrammatic representation for this PRM is not an obvious matter. Using the scheme suggested by Getoor et al. [19], we might draw the diagram in Figure 4. There, we have a class Person, a class Neighborhood and a "shadow" class Person that just indicates the presence of a second Person in any Neighborhood pair. Dealing with all possible PRMs indeed requires a very complex diagrammatic language, where conditional edges and recursion can be expressed [21].
Instead of resorting to diagrams, one may focus just on textual languages to specify repetitive Bayesian networks. A very solid formalism that follows this strategy is Jaeger's Relational Bayesian Networks (RBNs) [22,23]. An RBN takes as input relations over a domain, specified using a first-order syntax, and returns an output that can be seen as a typical Bayesian network. For instance, using syntax that will be explained later (Section 2), one can describe Scenario 4 in Example 2 with the following RBN:
  • burglary(x) = 0.001;
  • alarm(x) = 0.9 * burglary(x) + 0.01 * (1-burglary(x));
  • calls(x) = NoisyOR { alarm(y) | y; neighbor(x,y) };
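To preview what this specification does, the following sketch (hypothetical helper code of our own, not Jaeger's actual syntax or tooling) grounds the three formulas over a small domain, listing each ground atom together with its parents in the generated Bayesian network:

```python
# Domain {a, b, c} with the neighbor relation of Scenario 3 (reflexive, symmetric).
domain = ["a", "b", "c"]
neighbors = {(x, x) for x in domain} | {("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")}

parents = {}
for x in domain:
    parents[f"burglary({x})"] = []                      # constant: no parents
    parents[f"alarm({x})"] = [f"burglary({x})"]         # convex combination
    parents[f"calls({x})"] = [f"alarm({y})" for y in domain
                              if (x, y) in neighbors]   # NoisyOR over neighbors
print(parents["calls(a)"])   # ['alarm(a)', 'alarm(b)'] -- cf. Figure 2
```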
One problem that surfaces when we want to use an expressive formalism, such as RBNs or PRMs, is whether a particular model is guaranteed to always produce consistent Bayesian networks. Consider a simple example [19].
Example 3.
Suppose we are modeling genetic relationships using the parameterized random variable gene ( x ) , for any person x. Now, the genetic features of x depend on the genetic features of the mother and the father of x. That is, we want to encode:
If y and z are such that motherOf ( y , x ) and fatherOf ( z , x ) are true , then the probability of gene ( x ) depends on gene ( y ) and gene ( z ) .
If we try to specify a PRM for this setting, we face a difficulty in that some instances of gene could depend on other instances of the same parameterized random variable. Indeed, consider drawing a diagram for this PRM, using the conventions suggested by Getoor et al. [19]. We would need a class Person, containing the parameterized random variable gene, and two shadow classes, one for the father and one for the mother; a fragment of the diagram is depicted in Figure 5. If we could have a Person that appears as the father of his own father, we would have a cycle in the generated Bayesian network. Of course, we know that such a cycle can never be generated, because neither the transitive closure of motherOf nor that of fatherOf can contain a cycle. However, just by looking at the diagram, without any background understanding of motherOf and fatherOf, we cannot determine whether coherence is guaranteed.
The possibility that RBNs and PRMs may lead to cyclic (thus inconsistent) Bayesian networks has been noticed before. Jaeger [23] suggested that checking whether an RBN always produces consistent Bayesian networks, for a given class of domains, should be solved by logical inference, being reducible to deciding the validity of a formula from first-order logic augmented with a transitive closure operator. This path has been explored by De Bona and Cozman [24], yielding theoretical results of very high computational complexity. On a different path, Getoor et al. [19] put forward an incomplete, but more intuitive way of ensuring coherence for their PRMs; in fact, assuming that some sets of input relations never form cycles, one can easily identify a few cases where coherence is guaranteed.
Thus, we have arrived at the problems of interest in this paper: Suppose we have an RBN or a PRM. Is it coherent in the sense that it can always be satisfied by a probability distribution? Is it coherent in the sense that it always produces a unique probability distribution? Such are the questions we address, by exploring a cross-pollination of the ideas described in the previous paragraph. In doing so, we bring logical rigor to the problem of coherence of PRMs and present a practical alternative for identifying coherent RBNs.
After formally introducing relational Bayesian networks in Section 2, we review, in Section 3, how their coherence problem can be encoded in first-order logic by employing a transitive closure operator. Section 4 presents PRMs and the standard graph-based approach to their coherence checking. The logical methods developed for the coherence problem of RBNs are adapted to PRMs in Section 5. Conversely, in Section 6, we adapt the graph techniques presented for tackling the coherence of PRMs to the formalism of RBNs.

2. Relational Bayesian Networks

In this section, we briefly introduce the formalism of Relational Bayesian Networks (RBNs). We use the version of RBNs presented in [23], as that reference contains a thorough exposition of the topic.
Let S and R be disjoint sets of relation symbols, called the predefined relations and the probabilistic relations, respectively. We assume that S contains the equality symbol =, to be interpreted in the usual way. Each relation symbol is associated with a positive integer k, which is its arity. Given a finite domain $D = \{d_1, \ldots, d_n\}$, if V is a set of relation symbols (such as R or S), a V-structure $\mathcal{D}$ is an interpretation of the symbols in V into sets of tuples in D. Formally, a V-structure $\mathcal{D}$ maps each relation symbol $v \in V$ with arity k into a subset of $D^k$. We denote by $\mathrm{Mod}_D(V)$ the set of all V-structures over a given finite domain D. Given a domain D, a $v \in V$ with arity k and a tuple $t \in D^k$, $v(t)$ is said to be a ground V-atom. A V-structure $\mathcal{D}$ defines truth values for ground atoms: if v is mapped to a relation containing t, we say that $v(t)$ is satisfied by $\mathcal{D}$, which is denoted by $\mathcal{D} \models v(t)$.
Employing the syntax of function-free first-order logic, we can construct formulas using a vocabulary of relations V, together with variables, quantifiers and Boolean connectives. We call these V-formulas, and their meaning is given by the usual first-order semantics, through the V-structures. We denote by $\varphi(x_1, \ldots, x_k)$ a V-formula whose free variables, in the usual sense, are $x_1, \ldots, x_k$. If φ is a V-formula and $\mathcal{D}$ is a V-structure, $\mathcal{D} \models \varphi$ denotes that φ is satisfied by $\mathcal{D}$.
A random relational structure model for S and R is a partial function P that takes an S-structure $\mathcal{D}$, over some finite domain D, and returns a probability distribution $P(\mathcal{D}): \mathrm{Mod}_D(R) \to [0,1]$ over the R-structures on the same domain. As an R-structure can be seen as a total assignment over the ground R-atoms, $P(\mathcal{D})$ can be seen as a joint probability distribution over these ground atoms. An example of a random relational structure model would be a function $P_{S4}$ for Scenario 4 of Example 2 that receives an S-structure of neighbors and returns a joint probability distribution over the ground atoms for burglary(·), alarm(·), calls(·). In that scenario, a given configuration $\mathcal{D}$ of neighbors, over a given domain D, implies a specific Bayesian network whose variables are the ground atoms for burglary(·), alarm(·), calls(·), which encodes a joint probability distribution, $P_{S4}(\mathcal{D})$, over these variables. If $\mathcal{D}$ is the configuration of neighbors from Scenario 3 of Example 2, $P_{S4}(\mathcal{D})$ would be captured by the Bayesian network in Figure 2.
Relational Bayesian networks provide a way to compactly represent random relational structure models. This is achieved by mapping each S-structure into a ground Bayesian network that encodes a probability distribution over R-structures. To begin, this ground Bayesian network has a node representing each ground atom $r(t)$, for each $r \in R$ and $t \in D^k$, where k is the arity of r. Thus, given the domain D of the input S-structure, the nodes in the corresponding Bayesian network are already determined. To define the arcs and parameters of the Bayesian network associated with an arbitrary S-structure, relational Bayesian networks employ their central notion of probability formula.
Probability formulas are syntactical constructions intended to link the probability of a ground atom $r(t)$ to the probabilities of other ground atoms $r'(t')$, according to the S-structure. Once an R-structure and an S-structure are fixed, then for elements $t_1, \ldots, t_k$ in the domain D, a probability formula $F(t_1, \ldots, t_k)$ should evaluate to a number in $[0,1]$.
The definition of probability formulas makes use of combination functions, which are functions from finite multisets over the interval $[0,1]$ to numbers in the same interval. We use $\{|\cdot|\}$ to denote multisets. For instance, NoisyOR is a combination function such that, if $c_1, \ldots, c_n \in [0,1]$, then $\mathrm{NoisyOR}\{|c_1, \ldots, c_n|\} = 1 - \prod_{i=1}^{n}(1 - c_i)$.
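As a quick illustration, a direct Python implementation of this combination function (a sketch of our own):

```python
import math

def noisy_or(values):
    """NoisyOR over a multiset of values in [0, 1]: 1 - prod_i (1 - c_i)."""
    return 1.0 - math.prod(1.0 - c for c in values)

print(noisy_or([0.5, 0.5]))   # 0.75
print(noisy_or([0.0, 1.0]))   # 1.0 -- on {0, 1} inputs it acts as a disjunction
```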
Definition 1.
Given disjoint sets S and R of relation symbols and a tuple x of k variables, $F(x)$ is an (S,R)-probability formula if:
  • (constants) $F(x) = c$ for a $c \in [0,1]$;
  • (indicator functions) $F(x) = r(x)$ for an $r \in R$ with arity k;
  • (convex combinations) $F(x) = F_1(x)F_2(x) + (1 - F_1(x))F_3(x)$, where $F_1(x)$, $F_2(x)$, $F_3(x)$ are probability formulas; or
  • (combination functions) $F(x) = \mathrm{comb}\{|\, F_1(x,y), \ldots, F_m(x,y) \mid y; \varphi(x,y) \,|\}$, where comb is a combination function, $F_1(x,y), \ldots, F_m(x,y)$ are probability formulas, y is a tuple of variables and $\varphi(x,y)$ is an S-formula.
Relational Bayesian networks associate a probability formula $F_r(x)$ with each probabilistic relation $r \in R$, where x is a tuple of k variables, with k the arity of r:
Definition 2.
Given disjoint sets of relation symbols S and R, the predefined and probabilistic relations, a relational Bayesian network is a set $\Phi = \{F_r(x) \mid r \in R\}$, where each $F_r(x)$ is an (S,R)-probability formula.
To have an idea of how probability formulas work, consider a fixed S-structure $\mathcal{D}_S$ over a domain D. Then, an R-structure $\mathcal{D}_R$ over D entails a numeric value for each ground probability formula $F_r(t)$, denoted by $F_r(t)[\mathcal{D}_R]$, where t is a tuple of elements in D. This is done inductively, by initially defining $r(t)[\mathcal{D}_R] = 1$ if $\mathcal{D}_R \models r(t)$, and $r(t)[\mathcal{D}_R] = 0$ otherwise, for each ground atom $r(t)$, for all $r \in R$. If $F_r(x) = c$, then $F_r(t)[\mathcal{D}_R] = c$, for any tuple t. The numeric value of $F_r(t)[\mathcal{D}_R]$ for probability formulas that are convex combinations or combination functions requires the evaluation of the subformulas $F_i$, which recursively ends at the evaluation of ground atoms $r'(t')$ or constants c. As the set of ground atoms whose evaluation is needed to compute $F_r(t)[\mathcal{D}_R]$ depends only on the S-structure $\mathcal{D}_S$, and not on $\mathcal{D}_R$, it is denoted by $\alpha(F_r(x), t, \mathcal{D}_S)$ and can be defined recursively:
  • $\alpha(c, t, \mathcal{D}_S) = \emptyset$;
  • $\alpha(r(x), t, \mathcal{D}_S) = \{r(t)\}$;
  • $\alpha(F_1(x)F_2(x) + (1 - F_1(x))F_3(x), t, \mathcal{D}_S) = \bigcup_{i=1}^{3} \alpha(F_i(x), t, \mathcal{D}_S)$;
  • $\alpha(\mathrm{comb}\{|\, F_1(x,y), \ldots, F_m(x,y) \mid y; \varphi(x,y) \,|\}, t, \mathcal{D}_S)$ is given by:
    $\bigcup_{t' \text{ s.t. } \mathcal{D}_S \models \varphi(t, t')} \ \bigcup_{i=1}^{m} \alpha(F_i(x,y), (t, t'), \mathcal{D}_S)$.
    Here, $(t, t')$ denotes the concatenation of the tuples t and $t'$.
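The recursion above translates directly into code. The sketch below uses a hypothetical tuple-based representation of probability formulas (everything here, including the atom encoding with argument positions, is our own illustration, simplified to combination functions that bind a single variable):

```python
def alpha(formula, t, dom, sat):
    """Ground atoms needed to evaluate `formula` at tuple `t`.
    Formulas: ("const", c) | ("atom", r, positions) | ("convex", F1, F2, F3)
              | ("comb", [F1, ..., Fm], phi).
    `sat(phi, args)` decides D_S |= phi(args) for the fixed S-structure."""
    kind = formula[0]
    if kind == "const":                           # alpha(c, t, D_S) = {}
        return set()
    if kind == "atom":                            # alpha(r(x), t, D_S) = {r(t)}
        _, r, pos = formula
        return {(r, tuple(t[i] for i in pos))}
    if kind == "convex":                          # union over the three subformulas
        return set().union(*(alpha(f, t, dom, sat) for f in formula[1:]))
    if kind == "comb":                            # union over t' with D_S |= phi(t, t')
        _, subs, phi = formula
        return set().union(*(alpha(f, t + (y,), dom, sat)
                             for y in dom if sat(phi, t + (y,))
                             for f in subs))
    raise ValueError(kind)

# Example 4's F_calls(x) = NoisyOR{| alarm(y) | y; neighbor(x, y) |}:
F_calls = ("comb", [("atom", "alarm", (1,))], "neighbor")
neigh = {("a", "a"), ("a", "b"), ("b", "a"), ("b", "b"), ("c", "c")}
sat = lambda phi, args: phi == "neighbor" and args in neigh
print(alpha(F_calls, ("a",), ["a", "b", "c"], sat))
# {('alarm', ('a',)), ('alarm', ('b',))}: the parents of calls(a)
```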
For a given S-structure $\mathcal{D}_S$, we can define a dependency relation between the nodes $r(t)$ and $r'(t')$ in the Bayesian network via the probability formulas $F_r$ and $F_{r'}$, by employing the corresponding $\alpha(\cdot, \cdot, \cdot)$. Intuitively, $\alpha(F_r(x), t, \mathcal{D}_S)$ contains the ground atoms $r'(t')$ whose truth values in a structure $\mathcal{D}_R$ determine the value of $F_r(t)$, which is meant to be the probability of $r(t)$. That is, $\alpha(F_r(x), t, \mathcal{D}_S)$ contains the parents of $r(t)$.
Definition 3.
Relation ⪯, over ground R-atoms, is defined as follows:
$r'(t') \preceq r(t)$ iff $r'(t') \in \alpha(F_r(x), t, \mathcal{D}_S)$.
When this relation is acyclic, a relational Bayesian network $\Phi = \{F_r \mid r \in R\}$ defines, for a given S-structure $\mathcal{D}_S$ over a finite domain D, a probability distribution over the R-structures $\mathcal{D}_R$ over D via:
$P^{\Phi}_{\mathcal{D}_S}(\mathcal{D}_R) = \prod_{r \in R} \ \prod_{t:\, \mathcal{D}_R \models r(t)} F_r(t)[\mathcal{D}_R] \ \prod_{t:\, \mathcal{D}_R \not\models r(t)} \big(1 - F_r(t)[\mathcal{D}_R]\big)$.
Example 4.
Scenario 4 of Example 2: We can define a relational Bayesian network that returns the corresponding Bayesian network for each number and configuration of neighbors. Let $S = \{neighbor(\cdot,\cdot)\}$ and $R = \{burglary(\cdot), alarm(\cdot), calls(\cdot)\}$. We assume that the relation neighbor is reflexive and symmetrical. With each relation in R, we associate a probability formula, forming the relational Bayesian network Φ:
  • $F_{burglary}(x) = 0.001$; a constant;
  • $F_{alarm}(x) = 0.9 \cdot burglary(x) + 0.01 \cdot (1 - burglary(x))$; a convex combination;
  • $F_{calls}(x) = \mathrm{NoisyOR}\{|\, alarm(y) \mid y; neighbor(x,y) \,|\}$; a combination function.
Note that, if $F_1(x)$ and $F_2(x)$ are probability formulas, then $1 - F_1(x)$ and $F_1(x)F_2(x)$ are convex combinations and, thus, probability formulas. As the inputs of the NoisyOR above are in $\{0,1\}$, the combination function actually works like a disjunction.
Given an S-structure $\mathcal{D}_S$ over a domain D, Φ determines a joint probability distribution over the ground R-atoms, via a Bayesian network. If we take an S-structure $\mathcal{D}_S$ over a domain $D = \{d_1, d_2, d_3\}$ such that $\mathcal{D}_S \models neighbor(d_1, d_2) \wedge neighbor(d_2, d_3)$, but $\mathcal{D}_S \not\models neighbor(d_1, d_3)$, the resulting $P^{\Phi}_{\mathcal{D}_S}$ is the model for Scenario 3 in Example 2, whose Bayesian network is given in Figure 2.

3. The Coherence Problem for RBNs

It may happen for a relational Bayesian network Φ that some S-structures yield a cyclic dependency relation ⪯. When the relation ⪯ is cyclic for an S-structure, no probability distribution is defined over the R-structures. In such a case, we say Φ is incoherent for that S-structure. This notion can be generalized to a class of S-structures $\mathcal{S}$, so that we say that Φ is coherent for $\mathcal{S}$ iff the resulting relation ⪯ is acyclic for each S-structure in $\mathcal{S}$. Deciding whether a relational Bayesian network is coherent for a given class of S-structures is precisely one of the problems we address in this work.
In order to reason about the relation between a class of S-structures and the coherence of a relational Bayesian network Φ for it, we need to formally represent these concepts. To define a class of S-structures, note that they can be seen as first-order structures over which S-formulas are interpreted. That is, an S-formula defines the set of S-structures satisfying it. If φ is a closed S-formula (without free variables), we say that $[\![\varphi]\!]$ is the set of S-structures $\mathcal{D}_S$ such that $\mathcal{D}_S \models \varphi$. We denote by $\theta_{\mathcal{S}}$ an S-formula satisfied exactly by the S-structures in a given class $\mathcal{S}$; that is, $[\![\theta_{\mathcal{S}}]\!] = \mathcal{S}$.
To encode the coherence of Φ, we need to encode the acyclicity of the dependency relation ⪯ resulting from an S-structure. Ideally, we would like to have a (first-order) S-formula, say $\psi_\Phi$, that would be true only for S-structures yielding acyclic dependency relations ⪯. If that formula were available, a decision about the coherence of Φ for the class $\mathcal{S}$ would be reduced to a decision about the validity of the first-order formula $\theta_{\mathcal{S}} \to \psi_\Phi$: when the formula is valid, every S-structure in the class $\mathcal{S}$ guarantees that the resulting dependency relation ⪯ for Φ is acyclic; hence, Φ is coherent for $\mathcal{S}$; otherwise, there is an S-structure in $\mathcal{S}$ yielding a cyclic dependency relation ⪯ for Φ. Note that for S-formulas, only S-structures matter, and we can ignore any relation not in S. To be precise, if a first-order structure $\mathcal{D}$ falsifies $\theta_{\mathcal{S}} \to \psi_\Phi$, then there is an S-structure $\mathcal{D}_S$ (formed by ignoring non-S relations) falsifying it.
Alas, to encode cycles in a graph, one needs to encode the notion of path, which involves the transitive closure of the relation encoding arcs. It is a well-known fact that first-order logic cannot express transitive closure. To circumvent that, we can add a (strict) transitive closure operator to the logic, arriving at the so-called transitive closure logics, as described for instance in [25].
This approach was first proposed by Jaeger [23], who assumed one could write down the S-formula $\psi_\Phi$ by employing a transitive closure operator. He conjectured that, with some restrictions on the arity of the relations in S and R, one could hope to obtain a formula $\theta_{\mathcal{S}} \to \psi_\Phi$ whose validity is decidable. Nevertheless, no hint was provided as to how to construct such a formula, or as to its general shape. A major difficulty is that, if an S-structure $\mathcal{D}$ satisfying $\theta_{\mathcal{S}}$ has domain $D = \{d_1, \ldots, d_n\}$, the size of the resulting Bayesian network is typically greater than n, with one node per ground atom, so a cycle can also contain more nodes than n. There seems to be no direct way of employing the transitive closure operator to devise a formula $\neg\psi_\Phi$ that encodes cycles with more than n nodes and that is to be satisfied by some structures $\mathcal{D}$ over a domain with only n elements. In the next sections, we review a technique (introduced by the authors in [24]) to encode $\psi_\Phi$ for an augmented domain, through an auxiliary formula whose satisfying structures represent both the S-structure and the resulting ground Bayesian network. Afterwards, we adapt the formula $\theta_{\mathcal{S}}$ accordingly.

3.1. Encoding the Structure of the Ground Bayesian Network

Our idea to construct a formula $\psi_\Phi$, for a given relational Bayesian network Φ, is first to find a first-order V-formula $B_\Phi$, for some vocabulary V containing S, that is satisfiable only by V-structures that encode both an S-structure $\mathcal{D}_S$ and the structure of the ground Bayesian network resulting from it. These V-structures should contain, besides an S-structure $\mathcal{D}_S$, an element for each node in the ground Bayesian network and a relation capturing its arcs. Then, we can use a transitive closure operator to define the existence of paths (and cycles) via arcs, enforcing acyclicity by negating the existence of a cycle.
Suppose we have two disjoint vocabularies S and $R = \{r_1, \ldots, r_m\}$ of predefined and probabilistic relations, respectively. We use $a(v)$ to denote the arity of a relation v. Consider a relational Bayesian network $\Phi = \{F_r(x) \mid r \in R\}$, where each $F_r(x)$ is an (S,R)-probability formula. Let $\mathcal{D}$ be a V-structure satisfying $B_\Phi$. We want $\mathcal{D}$ to be defined over a bipartite domain $D = D_S \cup D_B$, where $D_S$ is used to represent an S-structure $\mathcal{D}_S$ and $D_B = D \setminus D_S$ is the part of the domain where the structure of the resulting ground Bayesian network is encoded. We overload names by including in V a unary predicate $D_S(\cdot)$ that shall be true for all and only the elements in $D_S$. The structure $\mathcal{D}$ shall represent the structure of the ground Bayesian network $B_\Phi(\mathcal{D}_S)$, over the elements of $D_B$, that is induced by the S-structure $\mathcal{D}_S$ codified in $D_S$. In order to accomplish that, $\mathcal{D}$ must have an element in $D_B$ for each ground atom over the domain $D_S$. Furthermore, the V-structure $\mathcal{D}$ must interpret a relation, say $Parent(\cdot,\cdot)$, over $D_B$ according to the arcs of the Bayesian network $B_\Phi(\mathcal{D}_S)$.
Firstly, we need to define a vocabulary V that includes the predefined relations in S and contains the unary predicate $D_S$ (recall that the equality symbol = is included in S). Furthermore, V must contain a binary relation $Parent$ to represent the arcs of the ground Bayesian network. As auxiliary relations for defining $Parent$, we need a relation $Dep_{ij}$, for each pair $r_i, r_j \in R$, whose arity is $a(r_i) + a(r_j)$. For elements in $D_B$ to represent ground atoms $r(t_1, \ldots, t_k)$, we use relations to associate elements in $D_B$ with relations r and with tuples $t_1, \ldots, t_k$. For each relation $r_i \in R$, we have a unary relation $\bar{r}_i \in V$, where $\bar{r}_i(x)$ is intended to mean that the element $x \in D_B$ represents a ground atom of the form $r_i(\cdot)$. As for the tuples, recall that each $t_i$ represents an element of the set $D_S$ over which the S-structure $\mathcal{D}_S$ is codified. Hence, we insert in V binary relations $t_i$ for every $1 \le i \le \max_j a(r_j)$, such that $t_i(x, y)$ should be true iff the element $x \in D_B$ corresponds to a ground atom $r(t_1, \ldots, t_k)$ where $t_i = y$, for a $y \in D_S$ and some $r \in R$.
To save notation, we henceforth use $R_i(x, y_1, \ldots, y_k)$ to denote $\bar{r}_i(x) \wedge t_1(x, y_1) \wedge \cdots \wedge t_k(x, y_k)$, meaning that the element x in the domain represents the ground atom $r_i(y_1, \ldots, y_k)$, where $a(r_i) = k$.
Now, we proceed to list, step by step, the set of conjuncts required in $B_\Phi$, together with their meaning, for the V-structures $\mathcal{D}$ in $[\![B_\Phi]\!]$ to have the desired properties. To illustrate the construction, each set of conjuncts is followed by an example based on the RBN in Example 4, possibly given in an equivalent form for clarity.
We have to ensure that the elements in $D_B$ correspond exactly to the ground atoms in the ground Bayesian network $B_\Phi(\mathcal{D}_S)$.
  • Each element in $D_B = D \setminus D_S$ should correspond to a ground atom for some $r_i \in R$. Hence, we have the formula:
    $\forall x\, \neg D_S(x) \to \bigvee_{i=1}^{m} \bar{r}_i(x)$.  (2)
    $\forall x\, \neg D_S(x) \to \overline{burglary}(x) \vee \overline{alarm}(x) \vee \overline{calls}(x)$.
  • No element may correspond to ground atoms for two different $r_i \in R$. Therefore, the formula below is introduced:
    $\forall x \bigwedge_{1 \le i,j \le m,\, i \ne j} (\neg\bar{r}_i(x) \vee \neg\bar{r}_j(x))$.  (3)
    $\forall x\, (\neg\overline{burglary}(x) \vee \neg\overline{alarm}(x)) \wedge (\neg\overline{burglary}(x) \vee \neg\overline{calls}(x)) \wedge (\neg\overline{alarm}(x) \vee \neg\overline{calls}(x))$.
  • Each element corresponding to a ground atom should correspond to exactly one tuple. To achieve that, let $k = \max_j a(r_j)$, and introduce the formula below:
    $\forall x \forall y \forall z \bigwedge_{j=1}^{k} (t_j(x,y) \wedge t_j(x,z) \to y = z)$.  (4)
    $\forall x \forall y \forall z\, (t_1(x,y) \wedge t_1(x,z) \to y = z)$.
  • Each element corresponding to a ground atom for an $r_i \in R$ should be linked to a tuple with arity $a(r_i)$. Thus, let $k = \max_j a(r_j)$, and introduce the formula below for each $r_i \in R$:
    $\forall x\, \bar{r}_i(x) \to (\exists y_1 \cdots \exists y_{a(r_i)}\, R_i(x, y_1, \ldots, y_{a(r_i)}) \wedge \forall z\, \neg t_{a(r_i)+1}(x,z) \wedge \cdots \wedge \neg t_k(x,z))$.  (5)
    $\forall x\, \overline{burglary}(x) \to (\exists y\, t_1(x,y))$; $\forall x\, \overline{alarm}(x) \to (\exists y\, t_1(x,y))$; $\forall x\, \overline{calls}(x) \to (\exists y\, t_1(x,y))$.
  • Only elements in $D_B = D \setminus D_S$ should correspond to ground atoms. This is enforced by the following formula, where $k = \max_i a(r_i)$:
    $\forall y\, D_S(y) \to (\bigwedge_{i=1}^{m} \neg\bar{r}_i(y) \wedge \forall x \bigwedge_{j=1}^{k} \neg t_j(y,x))$.  (6)
    $\forall y\, D_S(y) \to (\neg\overline{burglary}(y) \wedge \neg\overline{alarm}(y) \wedge \neg\overline{calls}(y) \wedge \forall x\, \neg t_1(y,x))$.
  • Each ground atom must be represented by at least one element (in $D_B = D \setminus D_S$). Therefore, for each $r_i \in R$, with $a(r_i) = k$, we need a formula:
    $\forall y_1 \cdots \forall y_k\, D_S(y_1) \wedge \cdots \wedge D_S(y_k) \to \exists x\, R_i(x, y_1, \ldots, y_k)$.  (7)
    $\forall y\, D_S(y) \to (\exists x_1\, \overline{burglary}(x_1) \wedge t_1(x_1, y))$; the same for $\overline{alarm}$ and $\overline{calls}$.
    These formulas enforce that each ground atom $r(t)$ is represented by an element x that is in $D_B$, due to the formulas in (6).
  • No ground atom can be represented by two different elements. Hence, for each $r_i \in R$, with $a(r_i) = k$, we introduce a formula:
    $\forall y_1 \cdots \forall y_k \forall x \forall z\, R_i(x, y_1, \ldots, y_k) \wedge R_i(z, y_1, \ldots, y_k) \to x = z$.  (8)
    $\forall y \forall x \forall z\, \overline{burglary}(x) \wedge t_1(x, y) \wedge \overline{burglary}(z) \wedge t_1(z, y) \to x = z$; the same for $\overline{alarm}$ and $\overline{calls}$.
The conjunction of all formulas in (2)–(8) is satisfied only by structures $\mathcal{D}$ over a domain $D = D_S \cup D_B$ such that there is a bijection between $D_B$ and the set of all possible ground atoms $\{r(t) \mid r \in R$ and $t \in D_S^{a(r)}\}$. Now, we can put the arcs over these nodes to complete the structure of the ground Bayesian network $B_\Phi(\mathcal{D}_S)$.
The binary relation $Parent$ must hold only between elements in the domain D representing ground atoms $r(t)$ and $r'(t')$ such that $r'(t') \preceq r(t)$. Recall that the dependency relation ⪯ is determined by the S-structure $\mathcal{D}_S$. While the ground atoms represented in $D_B$, for a fixed R, are determined by the size of $D_S$ alone, the relation $Parent$ between them depends also on the S-formulas that hold in the S-structure $\mathcal{D}_S$. We want these S-structures to be specified by $\mathcal{D}$ over $D_S$ only, not over $D_B$. To ensure this, we use the following group of formulas:
  • For all $s \in S$, consider the formula below, where $a(s) = k$:
    $\forall y_1 \cdots \forall y_k\, s(y_1, \ldots, y_k) \to D_S(y_1) \wedge \cdots \wedge D_S(y_k)$.  (9)
    $\forall y_1 \forall y_2\, neighbor(y_1, y_2) \to D_S(y_1) \wedge D_S(y_2)$.
The formula above forces that $s(t)$, for any $s \in S$, can be true only for tuples $t \in D_S^{a(s)}$.
For a known S-structure $\mathcal{D}_S$, it is straightforward to determine which ground atoms $r'(t')$ are the parents of $r(t)$ in the ground Bayesian network $B_\Phi(\mathcal{D}_S)$: one can recursively use the definition of the set of parents $\alpha(F_r(x), t, \mathcal{D}_S)$ given in Section 2. Nonetheless, with an unknown S-structure $\mathcal{D}_S$ specified in $\mathcal{D}$ over $D_S$, the situation is a bit trickier. The idea is to construct, for each pair $r_i(t)$ and $r_j(t')$, an S-formula $Dep_{ij}(t, t')$ that is true iff $r_j(t') \preceq r_i(t)$ for the $\mathcal{D}_S$ encoded in $\mathcal{D}$. To define $Dep_{ij}(t, t')$, we employ auxiliary formulas $C_{F(t) \to r'(t')}$, for a ground probability formula $F(t)$ and a ground atom $r'(t')$; $C_{F(t) \to r'(t')}$ is an S-formula that is satisfied by $\mathcal{D}$ iff $r'(t') \in \alpha(F(x), t, \mathcal{D}_S)$. We define $C_{F(t) \to r'(t')}$ recursively, starting from the base cases.
  • If $F(t) = c$, for a $c \in [0,1]$, then $C_{F(t) \to r'(t')} = \bot$; e.g., $C_{F_{burglary}(t) \to alarm(t')} = \bot$.
  • If $F(t) = r(t)$, then $C_{F(t) \to r'(t')} = (t = t')$ if $r = r'$, and $C_{F(t) \to r'(t')} = \bot$ otherwise;
    e.g., $C_{burglary(t) \to burglary(t')} = (t = t')$ and $C_{burglary(t) \to calls(t')} = \bot$.
Above, $(t = t')$ is a short form for $(t_1 = t'_1) \wedge \cdots \wedge (t_k = t'_k)$, where k is the arity of t. These base cases are in line with the recursive definition of $\alpha(F(x), t, \mathcal{D}_S)$ presented in Section 2. The third case is also straightforward:
  • If $F(t) = F_1(t)F_2(t) + (1 - F_1(t))F_3(t)$, then $C_{F(t) \to r'(t')} = \bigvee_{i=1}^{3} C_{F_i(t) \to r'(t')}$.
    $C_{F_{alarm}(t) \to burglary(t')} = C_{burglary(t) \to burglary(t')} \vee C_{0.9 \to burglary(t')} \vee C_{0.01 \to burglary(t')} = (t = t')$
In other words, the computation of $F(t)[\mathcal{D}_R]$ depends on $r'(t')[\mathcal{D}_R]$, for some $\mathcal{D}_R$, if the computation of some $F_i(t)[\mathcal{D}_R]$, for $1 \le i \le 3$, depends on $r'(t')[\mathcal{D}_R]$.
The more elaborate case happens when $F(x)$ is a combination function, for which there is an S-formula involved. Recall that if $F(x) = \mathrm{comb}\{|\, F_1(x,y), \ldots, F_m(x,y) \mid y; \varphi(x,y) \,|\}$, then the parents of $F(t)$ are given by $\bigcup_{t'': \mathcal{D}_S \models \varphi(t,t'')} \bigcup_{i=1}^{m} \alpha(F_i(x,y), (t,t''), \mathcal{D}_S)$. Thus, to recursively define $C_{F(t) \to r'(t')}$, we need an S-formula that is satisfied by an S-structure $\mathcal{D}_S$ iff:
$r'(t') \in \bigcup_{t'': \mathcal{D}_S \models \varphi(t,t'')} \ \bigcup_{i=1}^{m} \alpha(F_i(x,y), (t,t''), \mathcal{D}_S)$.
The inner union is analogous to the definition of $C_{F(t) \to r'(t')}$ for convex combinations. However, to cope with any $t''$ such that $\mathcal{D}_S \models \varphi(t, t'')$, we need an existential quantification:
  • If $F(x) = \mathrm{comb}\{|\, F_1(x,y), \ldots, F_m(x,y) \mid y; \varphi(x,y) \,|\}$, then we have that:
    $C_{F(t) \to r'(t')} = \exists t''\, \varphi(t, t'') \wedge \bigvee_{i=1}^{m} C_{F_i(t,t'') \to r'(t')}$.
    $C_{F_{calls}(t) \to alarm(t')} = \exists t''\, neighbor(t, t'') \wedge C_{alarm(t'') \to alarm(t')} = \exists t''\, neighbor(t, t'') \wedge (t'' = t')$
Now, we can employ the formulas $C_{F(t) \to r'(t')}$ to define the truth value of the ground relation $Dep_{ij}(t, t')$, which codifies when $r_j(t') \preceq r_i(t)$.
  • For each pair $r_i, r_j \in R$, with $a(r_i) = k$ and $a(r_j) = k'$, we have the formula:
    $\forall x_1 \cdots \forall x_k \forall y_1 \cdots \forall y_{k'}\, Dep_{ij}(x_1, \ldots, x_k, y_1, \ldots, y_{k'}) \leftrightarrow C_{F_{r_i}(x_1, \ldots, x_k) \to r_j(y_1, \ldots, y_{k'})}$.  (10)
    $\forall x \forall y\, Dep_{calls\,alarm}(x, y) \leftrightarrow \exists z\, neighbor(x, z) \wedge (z = y)$; $\forall x \forall y\, Dep_{alarm\,burglary}(x, y) \leftrightarrow (x = y)$.
In the formula above, $C_{F_{r_i}(x_1, \ldots, x_k) \to r_j(y_1, \ldots, y_{k'})}$ has free variables $x_1, \ldots, x_k, y_1, \ldots, y_{k'}$ and is built according to the four recursive rules that define $C_{F(t) \to r'(t')}$, replacing the tuples t and $t'$ by x and y. We point out that such a construction depends only on the probability formulas in the relational Bayesian network Φ, and not on any S-structure. To build each $C_{F_{r_i}(x) \to r_j(y)}$, one just starts from the probability formula $F_{r_i}(x)$ and follows the recursion rules until reaching the base cases, when $C_{F_{r_i}(x) \to r_j(y)}$ will be formed by subformulas like ⊤, ⊥, S-formulas $\varphi(\cdot)$ and equalities $(\cdot = \cdot)$, possibly quantified on variables appearing in φ.
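To illustrate the recursion, the sketch below builds $C_{F(x) \to r'(y)}$ as formula strings for the RBN of Example 4, using a hypothetical formula representation of our own (similar to the one in Section 2's sketch), simplified to unary relations and without simplifying away the $\bot$ disjuncts:

```python
def C(formula, target, x="x", y="y"):
    """String form of C_{F(x) -> target(y)}, following the four recursive rules."""
    kind = formula[0]
    if kind == "const":                            # constants: bottom
        return "false"
    if kind == "atom":                             # indicators: equality or bottom
        return f"({x} = {y})" if formula[1] == target else "false"
    if kind == "convex":                           # disjunction over F1, F2, F3
        return " | ".join(C(f, target, x, y) for f in formula[1:])
    if kind == "comb":                             # exists z with the S-formula guard
        _, subs, phi = formula
        inner = " | ".join(C(f, target, "z", y) for f in subs)
        return f"exists z ({phi}({x}, z) & {inner})"

F_alarm = ("convex", ("atom", "burglary"), ("const", 0.9), ("const", 0.01))
F_calls = ("comb", [("atom", "alarm")], "neighbor")
print(C(F_alarm, "burglary"))   # (x = y) | false | false
print(C(F_calls, "alarm"))      # exists z (neighbor(x, z) & (z = y))
```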
The relation $Parent(\cdot,\cdot)$ is now defined over elements that represent ground atoms $r_i(t)$ and $r_j(t')$ such that $Dep_{ij}(t, t')$ holds, meaning that $r_j(t') \preceq r_i(t)$. This can be achieved in two parts: ensuring that each $r_j(t') \preceq r_i(t)$ implies $Parent(x', x)$, where $x'$ and x represent $r_j(t')$ and $r_i(t)$; and guaranteeing that $Parent(x', x)$ is true only if $r_j(t') \preceq r_i(t)$ for a pair of relations $r_i, r_j$.
  • For each pair $r_i, r_j \in R$, with $a(r_i) = k$ and $a(r_j) = k'$, let y and $y'$ denote $y_1, \ldots, y_k$ and $y'_1, \ldots, y'_{k'}$, respectively:
    $\forall x \forall x' \forall y_1 \cdots \forall y_k \forall y'_1 \cdots \forall y'_{k'}\, R_i(x, y) \wedge R_j(x', y') \wedge Dep_{ij}(y, y') \to Parent(x', x)$.  (11)
    $\forall x \forall x' \forall y \forall y'\, \overline{calls}(x) \wedge t_1(x, y) \wedge \overline{alarm}(x') \wedge t_1(x', y') \wedge Dep_{calls\,alarm}(y, y') \to Parent(x', x)$.
  • Let $k = \max_j a(r_j)$ be the maximum arity in R, and let $y_{r_i}$ and $y'_{r_j}$ denote the tuples $y_1, \ldots, y_{a(r_i)}$ and $y'_1, \ldots, y'_{a(r_j)}$, respectively:
    $\forall x \forall x'\, Parent(x', x) \to \exists y_1 \cdots \exists y_k \exists y'_1 \cdots \exists y'_k \bigvee_{1 \le i,j \le m} R_i(x, y_{r_i}) \wedge R_j(x', y'_{r_j}) \wedge Dep_{ij}(y_{r_i}, y'_{r_j})$.  (12)
Definition 4.
Given disjoint sets of relations S and R and a relational Bayesian network $\Phi = \{F_{r_i} \mid r_i \in R\}$, the formula $B_\Phi$ is the conjunction of all formulas in (2)–(12).
For a fixed relational Bayesian network Φ, the formula $B_\Phi$ is satisfied only by V-structures $\mathcal{D}$ over a bipartite domain $D_S \cup D_B$ such that:
  • the relations in S are interpreted in $D_S$, forming an S-structure $\mathcal{D}_S$;
  • there is a bijection b between the domain $D_B = D \setminus D_S$ and the set of all ground R-atoms formed by the tuples in $D_S$;
  • each $x \in D_B$ is linked to exactly one $r_i \in R$, via the predicate $\bar{r}_i(x)$, and to exactly $k = a(r_i)$ elements in $D_S$, via the relations $t_1(x, \cdot), \ldots, t_k(x, \cdot)$, and no ground atom is represented through these links twice;
  • the relation $Parent(\cdot,\cdot)$ is interpreted as arcs over $D_B$ in such a way that $\langle D_B, Parent \rangle$ forms a directed graph that is the structure of the ground Bayesian network $B_\Phi(\mathcal{D}_S)$.

3.2. Encoding Coherence via Acyclicity

The original formula $\psi_\Phi$ was intended to capture the coherence of the relational Bayesian network Φ. Our idea is to check coherence by looking for cycles in the ground Bayesian network $B_\Phi(\mathcal{D}_S)$ encoded in any V-structure satisfying $B_\Phi$. Hence, we replace $\psi_\Phi$ by an implication $B_\Phi \to \psi$, which is to be satisfied only by V-structures $\mathcal{D}$ such that, if $\mathcal{D}$ represents an S-structure $\mathcal{D}_S$ and the resulting ground Bayesian network $B_\Phi(\mathcal{D}_S)$, then $B_\Phi(\mathcal{D}_S)$ is acyclic. Thus, ψ should forbid cycles of the relation $Parent$ in the V-structures satisfying it.
There is a cycle of $Parent$-arcs in a V-structure $\mathcal{D}$ over a domain D iff there exists an $x \in D$ such that there is a path of $Parent$-arcs from x to itself. Consequently, detecting $Parent$-cycles reduces to computing $Parent$-paths, or $Parent$-reachability. We say y is $Parent$-reachable from x, in a V-structure $\mathcal{D}$, if there are $z_0, \ldots, z_k \in D$ such that $x = z_0$, $y = z_k$, and $\mathcal{D} \models \bigwedge_{1 \le i \le k} Parent(z_{i-1}, z_i)$. Thus, for each k, we can define reachability through k $Parent$-arcs: $ParentPath_k(x, y) = \exists z_0 \cdots \exists z_k\, (z_0 = x) \wedge (z_k = y) \wedge \bigwedge_{1 \le i \le k} Parent(z_{i-1}, z_i)$. Unfortunately, the length k of a path is unbounded a priori, as the domain D can be arbitrarily large. Therefore, there is no way in the first-order language to encode reachability, via arbitrarily long paths, with a finite number of formulas. In order to circumvent this situation, we can resort to a transitive closure logic.
Transitive closure logics enhance first-order logic with a transitive closure operator TC, which we assume to be strict [25]. If $\varphi(x, y)$ is a first-order formula, $TC(\varphi)(x, y)$ means that y is φ-reachable from x through a non-empty path. Accordingly, a V-structure $\mathcal{D}$, over a domain D, satisfies $TC(\varphi)(x, y)$ iff there is a $k \in \mathbb{N}$ and there are $z_0, \ldots, z_k \in D$ such that $x = z_0$, $y = z_k$ and $\mathcal{D} \models \bigwedge_{1 \le i \le k} \varphi(z_{i-1}, z_i)$.
Employing the transitive closure operator, the existence of a $Parent$-path from a node x to itself (a cycle) is encoded directly by $TC(Parent)(x, x)$; accordingly, the absence of a $Parent$-cycle is enforced by $\psi = \forall x\, \neg TC(Parent)(x, x)$.
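As a sanity check, ψ can be evaluated on a concrete finite structure by computing the transitive closure of the Parent arcs and inspecting the diagonal; a minimal sketch (our own illustration):

```python
def transitive_closure(arcs):
    """Naive closure of a set of (u, v) arcs; Warshall's algorithm would also do."""
    tc = set(arcs)
    while True:
        new = {(u, w) for (u, v) in tc for (v2, w) in tc if v == v2} - tc
        if not new:
            return tc
        tc |= new

def satisfies_psi(parent):
    """psi = forall x. not TC(Parent)(x, x): no node reaches itself."""
    return all(u != v for (u, v) in transitive_closure(parent))

print(satisfies_psi({("a", "b"), ("b", "c")}))               # True: acyclic
print(satisfies_psi({("a", "b"), ("b", "c"), ("c", "a")}))   # False: a cycle
```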
At this point, the V-structures $\mathcal{D}$ over a domain D satisfying $B_\Phi \to \psi$ have the following format:
  • either $\mathcal{D}$ encodes an S-structure in $D_S \subseteq D$ (the part of the domain satisfying $D_S(\cdot)$) and the corresponding acyclic ground Bayesian network $B_\Phi(\mathcal{D}_S)$ in $D_B = D \setminus D_S$;
  • or it is not the case that $\mathcal{D}$ encodes both an S-structure in $D_S \subseteq D$ and the corresponding ground Bayesian network $B_\Phi(\mathcal{D}_S)$ in $D_B = D \setminus D_S$.
Back to the coherence-checking problem, we need to decide, for a fixed relational Bayesian network Φ, whether or not a given class $\mathcal{S}$ of S-structures ensures the acyclicity of the resulting ground Bayesian network $B_\Phi(\mathcal{D}_S)$. Recall that the class $\mathcal{S}$ must be defined via a (first-order) S-formula $\theta_{\mathcal{S}}$. As we are already employing the transitive closure operator in ψ, we can also allow its use in $\theta_{\mathcal{S}}$, which is useful, for instance, to express classes of S-structures without cycles.
To check the coherence of Φ for a class $\mathcal{S}$, we cannot just check the validity of:
$\theta_{\mathcal{S}} \to (B_\Phi \to \psi)$,  (13)
because $\theta_{\mathcal{S}}$ specifies S-structures over the whole domain D, while $B_\Phi \to \psi$ presupposes that the S-structure is given only over $D_S = \{d \in D \mid \mathcal{D} \models D_S(d)\} \subseteq D$. To see the kind of problem that might occur, think of the class $\mathcal{S}$ of all S-structures where every element d of the domain is such that $s_i(d)$ holds, for some unary predefined relation $s_i \in S$. Consider a V-structure $\mathcal{D}$, over a domain D, with $\mathcal{D} \models \theta_{\mathcal{S}}$. The formula $B_\Phi$ cannot be satisfied by $\mathcal{D}$: since $s_i(x)$ holds for every $x \in D$, the formulas in (9) force $D_S(x)$ to hold for all $x \in D$, so no $x \in D$ can represent ground atoms, due to the formulas in (6), contradicting the restrictions in (7) that require all ground atoms to be represented. Hence, this $\mathcal{D}$ satisfies $\theta_{\mathcal{S}}$ without encoding the ground Bayesian network, thus falsifying $B_\Phi$ and satisfying $B_\Phi \to \psi$, yielding the satisfaction of Formula (13). Consequently, Formula (13) is valid for this specific class $\mathcal{S}$, no matter what the relational Bayesian network Φ looks like. Nonetheless, it is not hard to think of a Φ that is trivially incoherent for any class of S-structures, like $\Phi = \{F_r(x) = r(x)\}$, with $S = \emptyset$ and $R = \{r\}$, where the probability formula associated with the relation $r \in R$ is the indicator function $r(x)$, yielding a cyclic dependency relation ⪯.
In order to address the aforementioned issue, we need to adapt $\theta_{\mathcal{S}}$, constructing a formula $\theta'_{\mathcal{S}}$ that represents the class $\mathcal{S}$ in the extended, bipartite domain $D = D_S \cup D_B$. The unary predicate $D_S(\cdot)$ is what delimits the portion of D that is dedicated to defining the S-structure. Actually, we can define $D_S$ as the set $\{x \in D \mid \mathcal{D} \models D_S(x)\} \subseteq D$. Therefore, we must construct a V-formula $\theta'_{\mathcal{S}}$ such that a V-structure $\mathcal{D}$ satisfies $\theta'_{\mathcal{S}}$ iff the S-structure $\mathcal{D}_S$, formed by $D_S \subseteq D$ and the interpretation of the S relations, satisfies $\theta_{\mathcal{S}}$. That is, the S-formulas that hold in an S-structure $\mathcal{D}_S$ must hold for the substructure of a V-structure $\mathcal{D}$ defined over the part of its domain that satisfies $D_S(\cdot)$. This can be performed by inserting guards in the quantifiers inside $\theta_{\mathcal{S}}$.
Definition 5.
Given a (closed) S-formula $\theta_{\mathcal{S}}$, $\theta'_{\mathcal{S}}$ is the formula resulting from applying the following substitutions to $\theta_{\mathcal{S}}$:
  • Replace each $\forall x\, \varphi(x)$ in $\theta_{\mathcal{S}}$ by $\forall x\, D_S(x) \to \varphi(x)$;
  • Replace each $\exists x\, \varphi(x)$ in $\theta_{\mathcal{S}}$ by $\exists x\, D_S(x) \wedge \varphi(x)$.
Finally, we can define the formula that encodes the coherence of a relational Bayesian network Φ for a class of S-structures S :
Definition 6.
For disjoint sets of relations S and R, a given relational Bayesian network Φ and a class of S-structures $\mathcal{S}$ defined by $\theta_{\mathcal{S}}$, $C_{\Phi,\mathcal{S}} = \theta'_{\mathcal{S}} \to (B_\Phi \to \psi)$.
Putting all those arguments together, we obtain the translation of the coherence-checking problem to the validity of a formula from the transitive closure logic:
Theorem 1
(De Bona and Cozman [24]). For disjoint sets of relations S and R, a given relational Bayesian network Φ and a class of S-structures $\mathcal{S}$ defined by $\theta_{\mathcal{S}}$, Φ is coherent for $\mathcal{S}$ iff $C_{\Phi,\mathcal{S}}$ is valid.
As first-order logic is already well known to be undecidable in general, adding a transitive closure operator clearly does not make things easier. Nevertheless, decidability of our coherence problem remains open, even when the relations in R are restricted to be unary and $\theta_{\mathcal{S}}$ is assumed decidable (even though there are some decidable fragments of first-order logic with transitive closure operators [25,26]). Similarly, a proof of general undecidability remains elusive.

3.3. A Weaker Form of Coherence

Jaeger introduced the coherence problem for RBNs as checking whether every input structure in a given class yields a probability distribution via an acyclic ground Bayesian network. Alternatively, we might define the coherence of an RBN as the existence of at least one input structure, out of a given class, resulting in an acyclic ground Bayesian network. This is closer to the satisfiability-like notion of coherence discussed by de Finetti and closer to work on probabilistic logic [27,28].
In this section, we show that, if one is interested in a logical encoding for this type of coherence for RBNs, the transitive closure operator can be dispensed with.
Suppose we have an RBN Φ and a class $\mathcal{S}$ of input structures codified via a first-order formula $\theta_{\mathcal{S}}$, and we want to decide whether Φ is coherent for some structure in $\mathcal{S}$. This problem can be reduced to checking the satisfiability of a first-order formula, using the machinery introduced above, with the bipartite domain. This formula can be easily built as $\theta'_{\mathcal{S}} \wedge B_\Phi \wedge \psi$. By construction, this formula is satisfiable iff there is a structure $\mathcal{D}$ over a bipartite domain $D = D_S \cup D_B$ where $D_S$ encodes an S-structure in $\mathcal{S}$ ($\mathcal{D} \models \theta'_{\mathcal{S}}$), $D_B$ encodes the corresponding ground Bayesian network ($\mathcal{D} \models B_\Phi$) and the latter is acyclic ($\mathcal{D} \models \psi$). Nonetheless, since now we are interested in satisfiability instead of validity, we can replace ψ by a formula $\psi'$ that does not employ the transitive closure operator.
The idea to construct $\psi'$ is to use a fresh binary relation $Parent'(\cdot,\cdot)$ and to force it to extend, or contain, the transitive closure of $Parent(\cdot,\cdot)$. The formula $\psi'$ then also requires $Parent'(\cdot,\cdot)$ to be irreflexive. If there is such a $Parent'(\cdot,\cdot)$, then $Parent(\cdot,\cdot)$ must be acyclic. Conversely, if $Parent(\cdot,\cdot)$ is acyclic, then $Parent'(\cdot,\cdot)$ can be interpreted as its transitive closure, which is irreflexive. In other words, we want a structure to satisfy $\psi'$ iff it interprets a relation $Parent'(\cdot,\cdot)$ that both is irreflexive and extends the transitive closure of $Parent(\cdot,\cdot)$.
In order to build $\psi'$, the vocabulary V is augmented with the binary relation $Parent'$. Now, we can define $\psi'$ as the conjunction of two parts:
  • $\forall x \forall y \forall z\, (Parent(x, y) \to Parent'(x, y)) \wedge (Parent'(x, y) \wedge Parent(y, z) \to Parent'(x, z))$, forcing $Parent'$ to extend the transitive closure of $Parent$;
  • $\forall x\, \neg Parent'(x, x)$, requiring $Parent'$ to be irreflexive.
By construction, one can verify the following result:
Theorem 2.
For disjoint sets of relations S and R, a given relational Bayesian network Φ and a class of S-structures $\mathcal{S}$ defined by $\theta_{\mathcal{S}}$, Φ is coherent for some structure in $\mathcal{S}$ iff $\theta'_{\mathcal{S}} \wedge B_\Phi \wedge \psi'$ is satisfiable.
The fact that $\theta'_{\mathcal{S}} \wedge B_\Phi \wedge \psi'$ does not use the transitive closure operator makes this satisfiability check decidable whenever the formula falls within a decidable fragment of first-order logic.
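The two conjuncts of $\psi'$ can also be checked directly on a finite structure against a candidate interpretation of $Parent'$; in the sketch below (our own illustration), the transitive closure itself is the least candidate, so it witnesses satisfiability exactly when Parent is acyclic:

```python
def satisfies_psi_prime(nodes, parent, parent2):
    """Checks the two conjuncts of psi' for a candidate relation parent2."""
    extends = parent <= parent2                      # Parent(x,y) -> Parent'(x,y)
    closed = all((x, z) in parent2                   # Parent'(x,y) & Parent(y,z)
                 for (x, y) in parent2               #   -> Parent'(x,z)
                 for (y2, z) in parent if y == y2)
    irreflexive = all((x, x) not in parent2 for x in nodes)
    return extends and closed and irreflexive

parent = {("a", "b"), ("b", "c")}
closure = {("a", "b"), ("b", "c"), ("a", "c")}       # its transitive closure
print(satisfies_psi_prime({"a", "b", "c"}, parent, closure))   # True: acyclic
```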

4. Probabilistic Relational Models

In this section, we introduce the machinery of PRMs by following the terminology by Getoor et al. [19], focusing on the simple case where uncertainty is restricted to descriptive attributes, which are assumed to be binary. We also review the coherence problem for PRMs and the proposed solutions in the literature. In the next section, we show how this coherence problem can also be tackled via logic, as the coherence of RBNs.

4.1. Syntax and Semantics of PRMs

To define a PRM, illustrated in Example 5, we need a relational model, with classes associated with descriptive attributes and reference slots that behave like foreign keys. Intuitively, each object in a class is described by the values of its descriptive attributes, and reference slots link different objects. Formally, a relational schema is described by a set of classes $\mathcal{X} = \{X_1, \ldots, X_n\}$, each of which is associated with a set of descriptive attributes $\mathcal{A}(X_i)$ and a set of reference slots $\mathcal{R}(X_i)$. We assume descriptive attributes take values in $\{0, 1\}$. A reference slot ρ in a class X (denoted X.ρ) is a reference to an object of the class Range[ρ] (its range type) specified in the schema. The domain type of ρ, Dom[ρ], is X. We can view this reference slot ρ as a function $f_\rho$ taking objects in Dom[ρ] and returning singletons of objects in Range[ρ]. That is, $f_\rho(x) = \{y\}$ is equivalent to $x.\rho = y$.
For any reference slot ρ, there is an inverse slot $\rho^{-1}$ such that $\mathrm{Range}[\rho^{-1}] = \mathrm{Dom}[\rho]$ and $\mathrm{Dom}[\rho^{-1}] = \mathrm{Range}[\rho]$. The corresponding function, $f_{\rho^{-1}}$, takes an object x from the class Range[ρ] and returns the set of objects $\{y \mid f_\rho(y) = \{x\}\}$ from the class Dom[ρ]. A sequence of slots (inverted or not) $K = \rho_1, \ldots, \rho_k$ is called a slot chain if $\mathrm{Range}[\rho_i] = \mathrm{Dom}[\rho_{i+1}]$ for all i. The function corresponding to a slot chain $K = \rho_1, \rho_2$, written $f_K$, is a type of composition of the functions $f_{\rho_1}, f_{\rho_2}$, taking an object x from $\mathrm{Dom}[\rho_1]$ and returning the set of objects $\{z \mid \exists y: y \in f_{\rho_1}(x) \wedge z \in f_{\rho_2}(y)\}$ from $\mathrm{Range}[\rho_2]$. For longer chains, the corresponding function can be obtained by applying this type of composition two-by-two. We write $y \in x.K$ when $y \in f_K(x)$.
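A small sketch of slots and slot chains as set-valued functions (a hypothetical dict-based encoding of our own, echoing the Neighborhood schema of Example 5 below; the object and person names are made up):

```python
def inverse(f):
    """f_rho^{-1}: maps each target object to the set of objects referring to it."""
    inv = {}
    for x, ys in f.items():
        for y in ys:
            inv.setdefault(y, set()).add(x)
    return inv

def compose(f1, f2):
    """Slot chain rho1, rho2: x maps to {z | exists y in f1(x) with z in f2(y)}."""
    return {x: {z for y in ys for z in f2.get(y, set())} for x, ys in f1.items()}

# Two Neighborhood objects, each pointing at a pair of Persons:
neighbor1 = {"n_ab": {"Alice"}, "n_bc": {"Bob"}}
neighbor2 = {"n_ab": {"Bob"}, "n_bc": {"Carol"}}
chain = compose(inverse(neighbor1), neighbor2)   # Person.neighbor1^{-1}.neighbor2
print(chain["Alice"])                            # {'Bob'}: those paired with Alice
```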
An instance $\mathcal{I}$ of a relational schema populates the classes with objects, associating values with the descriptive attributes and reference slots. Formally, $\mathcal{I}$ is an interpretation specifying, for each class $X \in \mathcal{X}$: a set of objects $\mathcal{I}(X)$; a value $x.A \in \{0, 1\}$ for each descriptive attribute $A \in \mathcal{A}(X)$ and each object $x \in \mathcal{I}(X)$; and an object $x.\rho \in \mathcal{I}(\mathrm{Range}[\rho])$ for each reference slot $\rho \in \mathcal{R}(X)$ and object $x \in \mathcal{I}(X)$. Note that, if $x.\rho = y$, then $f_\rho(x) = \{y\}$. We use $\mathcal{I}_{x.A}$ and $\mathcal{I}_{x.\rho}$ to denote the values of x.A and x.ρ in $\mathcal{I}$.
Given a relational schema, a PRM defines a probability distribution over its instances. In the simplest form, on which we focus, objects and the relations between them are given as input, and there is uncertainty only over the values of the descriptive attributes. A relational skeleton $\sigma_r$ is a partial specification of an instance: it specifies a set of objects $\sigma_r(X_i)$ for each class $X_i$ in the schema, besides the relations holding between these objects: $\sigma_{r\,x.\rho}$ for each $x \in \sigma_r(X_i)$ and $\rho \in \mathcal{R}(X_i)$. A completion of a relational skeleton $\sigma_r$ is an instance $\mathcal{I}$ such that, for each class $X_i \in \mathcal{X}$: $\mathcal{I}(X_i) = \sigma_r(X_i)$ and, for each $x \in \mathcal{I}(X_i)$ and $\rho \in \mathcal{R}(X_i)$, $\mathcal{I}_{x.\rho} = \sigma_{r\,x.\rho}$. We can see a PRM as a function taking relational skeletons and returning probability distributions over the completions of these partial instances, which can be seen as joint probability distributions over the random variables formed by the descriptive attributes of each object.
The format of a PRM resembles that of a Bayesian network: for each attribute X.A, we have a set of parents $\mathrm{Pa}(X.A)$ and the corresponding parameters $P(X.A \mid \mathrm{Pa}(X.A))$. The parent relation forms a directed graph, as usual, called the dependency graph, and the set of parameters defines the conditional probability tables. The attributes in $\mathrm{Pa}(X.A)$ are called formal parents, as they will be instantiated for each object x in X according to the relational skeleton. There are two types of formal parents: X.A can depend either on another attribute X.B of the same object or on an attribute X.K.B of other objects, where K is a slot chain.
In general, for an object x, x.K.B is a multiset $\{y.B \mid y \in x.K\}$, whose size is defined by the relational skeleton. To compactly represent the conditional probability distribution when $X.K.B \in \mathrm{Pa}(X.A)$, the notion of aggregation is used. The attribute x.A will depend on some aggregate function γ of this multiset, like its mean value, mode, maximum or minimum; that is, $\gamma(X.K.B)$ will be a formal parent of X.A, as illustrated in the sketch below.
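A minimal sketch of aggregation (our own illustration, with a made-up default for empty multisets): the value fed to x.A is γ applied to the multiset of the parents' values; for binary attributes, an "or" aggregator is just the maximum:

```python
def aggregate(gamma, values, default=0):
    """gamma(x.K.B): apply the aggregator to the multiset of parent values."""
    return gamma(values) if values else default

values = [0, 1, 0]                                   # y.B for the y in x.K
print(aggregate(max, values))                        # "or" over binary values: 1
print(aggregate(min, values))                        # "and" over binary values: 0
```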
Definition 7.
A Probabilistic Relational Model Π for a relational schema is defined as a pair $\langle \Pi_S, \Pi_\theta \rangle$ where:
  • $\Pi_S$ defines, for each class $X \in \mathcal{X}$ and each descriptive attribute $A \in \mathcal{A}(X)$, a set of formal parents $\mathrm{Pa}(X.A) = \{U_1, \ldots, U_l\}$, where each $U_i$ has the form X.B or $\gamma(X.K.B)$;
  • $\Pi_\theta$ is the set of parameters defining legal Conditional Probability Distributions (CPDs) $P(X.A \mid \mathrm{Pa}(X.A))$ for each descriptive attribute $A \in \mathcal{A}(X)$ of each class $X \in \mathcal{X}$.
The semantics of a PRM is given by the ground Bayesian network induced by a relational skeleton, where the descriptive attributes of each object are the random variables.
Definition 8.
A PRM $\Pi = \langle \Pi_S, \Pi_\theta \rangle$ and a relational skeleton $\sigma_r$ define a ground Bayesian network where:
  • There is a node representing each attribute x.A, for all $x \in \sigma_r(X_i)$, $A \in \mathcal{A}(X_i)$ and $X_i \in \mathcal{X}$;
  • For each $X_i \in \mathcal{X}$, each $x \in \sigma_r(X_i)$ and each $A \in \mathcal{A}(X_i)$, there is a node representing $\gamma(x.K.B)$ for each $\gamma(X_i.K.B) \in \mathrm{Pa}(X_i.A)$;
  • Each x.A depends on parents x.B, for formal parents $X.B \in \mathrm{Pa}(X.A)$, and on parents $\gamma(x.K.B)$, for formal parents $\gamma(X.K.B) \in \mathrm{Pa}(X.A)$, according to $\Pi_S$;
  • Each $\gamma(x.K.B)$ depends on parents y.B with $y \in x.K$;
  • The CPD for $P(x.A \mid \mathrm{Pa}(x.A))$ is $P(X.A \mid \mathrm{Pa}(X.A))$, according to $\Pi_\theta$;
  • The CPD for $P(\gamma(x.K.B) \mid \mathrm{Pa}(\gamma(x.K.B)))$ is computed through the aggregation function γ.
The joint probability distribution over the descriptive attributes can be factored as usual to compute the probability of a specific instance I that is a completion of the skeleton σ_r. If we delete each γ(x.K.B) from the ground Bayesian network, making its children depend directly on the nodes y.B with y ∈ x.K (defining a new parent relation Pa′) and updating the CPDs accordingly, we obtain a simplified ground Bayesian network. The latter can be employed to factor the joint probability distribution over the descriptive attributes:
P(I | σ_r, Π) = ∏_{x ∈ σ_r} ∏_{A ∈ A(x)} P(I_{x.A} | I_{Pa′(x.A)}) = ∏_{X_i ∈ 𝒳} ∏_{x ∈ σ_r(X_i)} ∏_{A ∈ A(x)} P(I_{x.A} | I_{Pa′(x.A)}).
Viewing Π as a function from skeletons to probability distributions over instances, we use Π(σ_r) to denote the probability distribution P(I | σ_r, Π) over the completions I of σ_r.
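As an illustration of this factorization, the sketch below multiplies one CPD entry per object attribute over the simplified ground network; the dictionary-based encoding is a hypothetical one of our own, not the authors' implementation.

```python
def probability_of_completion(parents, cpds, instance):
    """P(I | sigma_r, Pi) as the product, over the nodes x.A of the
    simplified ground Bayesian network, of P(I_{x.A} | I_{Pa'(x.A)}).

    parents:  dict mapping each node (x, A) to a tuple of parent nodes (Pa')
    cpds:     dict mapping each node to a function
              (value, parent_values) -> probability
    instance: dict mapping each node to a value in {0, 1} (the completion I)
    """
    p = 1.0
    for node, pa in parents.items():
        parent_values = tuple(instance[q] for q in pa)
        p *= cpds[node](instance[node], parent_values)
    return p
```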
Example 5.
Recall again Scenario 4 in Example 2. We can define a PRM that returns the corresponding Bayesian network for each number and configuration of neighbors. In our relational schema, we have a class Person, whose set of descriptive attributes is A(Person) = {burglary, alarm, calls}. Furthermore, to capture multiple neighbors, we also need a class Neighborhood, with two reference slots, R(Neighborhood) = {neighbor_1, neighbor_2}, whose range is Person. For instance, to denote that Alice and Bob are neighbors, we would have an object, say n_AB, in the class Neighborhood, whose reference slots would be n_AB.neighbor_1 = Alice and n_AB.neighbor_2 = Bob.
We assume that the relation neighbor is reflexive (that is, for each Person x, there is always a Neighborhood n_x with n_x.neighbor_1 = n_x.neighbor_2 = x) and symmetrical (if x ∈ y.neighbor_1⁻¹.neighbor_2, we also have y ∈ x.neighbor_1⁻¹.neighbor_2).
For each descriptive attribute in our relational schema, we associate a set of formal parents and a conditional probability table, forming the following PRM Π to encode Scenario 4:
  • Pa(Person.burglary) = ∅; P(Person.burglary) = 0.001;
  • Pa(Person.alarm) = {burglary}; P(Person.alarm | burglary) = 0.9 and P(Person.alarm | ¬burglary) = 0.1;
  • Pa(Person.calls) = {or(Person.neighbor_1⁻¹.neighbor_2.alarm)};
    P(Person.calls | or(Person.neighbor_1⁻¹.neighbor_2.alarm) = c) = c, for c ∈ {0, 1}.
Given a relational skeleton σ_r with persons and neighbors, Π determines a joint probability distribution over the descriptive attributes, via a Bayesian network. Consider a skeleton σ_r with σ_r(Person) = {x_1, x_2, x_3} and n_12, n_23 ∈ σ_r(Neighborhood), with σ_r(n_ij.neighbor_1) = x_i and σ_r(n_ij.neighbor_2) = x_j for each n_ij ∈ σ_r(Neighborhood), but such that no n ∈ σ_r(Neighborhood) has n.neighbor_1 = x_1 and n.neighbor_2 = x_3. Then, the resulting probability distribution is the model of Scenario 3 in Example 2, whose Bayesian network is given in Figure 2.
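For concreteness, a short sketch of how this skeleton grounds the parents of calls; the Python encoding is ours, and it spells out the reflexive and symmetric neighborhood objects that the assumptions above imply.

```python
# Persons x1, x2, x3; neighborhood objects as (neighbor_1, neighbor_2)
# pairs, with the reflexive and symmetric closures, but no neighborhood
# linking x1 and x3.
persons = ["x1", "x2", "x3"]
neighborhoods = {
    "n11": ("x1", "x1"), "n22": ("x2", "x2"), "n33": ("x3", "x3"),
    "n12": ("x1", "x2"), "n21": ("x2", "x1"),
    "n23": ("x2", "x3"), "n32": ("x3", "x2"),
}

def calls_parents(x):
    """Ground parents of calls(x): alarm(y) for each y reachable from x
    via the slot chain neighbor_1^{-1}.neighbor_2."""
    return sorted({("alarm", n2) for (n1, n2) in neighborhoods.values() if n1 == x})

for x in persons:
    print(x, calls_parents(x))
# x1 [('alarm', 'x1'), ('alarm', 'x2')]
# x2 [('alarm', 'x1'), ('alarm', 'x2'), ('alarm', 'x3')]
# x3 [('alarm', 'x2'), ('alarm', 'x3')]
```

The output mirrors the chain structure of Figure 2: x_2's call depends on all three alarms, while x_1 and x_3 are not neighbors of each other.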

4.2. Coherence via Colored Dependency Graphs

As with RBNs, for the model to be coherent, one needs to guarantee that the ground Bayesian network is acyclic. Getoor et al. [19] focused on guaranteeing that a PRM yields acyclic ground Bayesian networks for all possible relational skeletons. To achieve that, possible cycles are detected in the class dependency graph.
Definition 9.
Given a PRM Π, the class dependency graph G_Π is a directed graph with a node for each descriptive attribute X.A and the following arcs:
  • Type I arcs: ⟨X.B, X.A⟩, where X.B is a formal parent of X.A;
  • Type II arcs: ⟨Y.B, X.A⟩, where γ(X.K.B) is a formal parent of X.A and Y = Range[X.K].
When the class dependency graph is acyclic, so is the ground Bayesian network for any relational skeleton. Nevertheless, it may be the case that, even for cyclic class dependency graphs, any relational skeleton occurring in practice leads to a coherent model. In other words, there might be classes of skeletons for which the PRM is coherent. To easily recognize some of these classes, Getoor et al. [19] put forward an approach based on identifying slot chains that are acyclic in practice. A set of slot chains K_ga = {K_1, …, K_m} is guaranteed acyclic if we are guaranteed that, for any possible relational skeleton σ_r, there is a partial ordering ⪯ over its objects such that, for each K_i ∈ K_ga, x ≺ y for any pair x and y ∈ x.K_i (we use x ≺ y to denote x ⪯ y and x ≠ y).
Definition 10.
Given a PRM Π and a set of guaranteed acyclic slot chains K_ga, the colored class dependency graph G_Π is a directed graph with a node for each descriptive attribute X.A and the following arcs:
  • Yellow arcs: ⟨X.B, X.A⟩, where X.B is a formal parent of X.A;
  • Green arcs: ⟨Y.B, X.A⟩, where γ(X.K.B) is a formal parent of X.A, Y = Range[X.K] and K ∈ K_ga;
  • Red arcs: ⟨Y.B, X.A⟩, where γ(X.K.B) is a formal parent of X.A, Y = Range[X.K] and K ∉ K_ga.
Intuitively, yellow cycles in the colored class dependency graph correspond to attributes of the same object, yielding a cycle in the ground Bayesian network. If we add some green arcs to such a cycle, then it is guaranteed that, departing from a node x.A in the ground Bayesian network, these arcs form a path to y.A, where x ≺ y, since ⪯ is transitive. Hence, x is different from y, and there is no cycle. If there is a red arc in a cycle, however, one may have a skeleton that produces a cycle.
A colored class dependency graph is stratified if every cycle contains at least one green arc and no red arc. Then:
Theorem 3
(Getoor et al. [19]). Given a PRM Π and a set of guaranteed acyclic slot chains K_ga, if the colored class dependency graph G_Π is stratified, then the ground Bayesian network is acyclic for any possible relational skeleton.
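Stratification can be decided by inspecting strongly connected components: every cycle lies within a single component, so it suffices that no red arc has both endpoints in the same component and that the yellow arcs on their own are acyclic (forcing every cycle to pick up a green arc). Below is a minimal sketch in Python; the encoding of the colored graph as a node set plus three sets of arcs is ours, not Getoor et al.'s.

```python
from itertools import count

def sccs(nodes, arcs):
    """Strongly connected components (Tarjan); arcs is a set of (u, v)."""
    adj = {u: [] for u in nodes}
    for u, v in arcs:
        adj[u].append(v)
    index, low, comp, stack, on_stack = {}, {}, {}, [], set()
    counter, comp_id = count(), count()

    def visit(u):
        index[u] = low[u] = next(counter)
        stack.append(u); on_stack.add(u)
        for v in adj[u]:
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:
            c = next(comp_id)
            while True:
                w = stack.pop(); on_stack.discard(w); comp[w] = c
                if w == u:
                    break

    for u in nodes:
        if u not in index:
            visit(u)
    return comp

def is_stratified(nodes, yellow, green, red):
    """Every cycle must contain at least one green arc and no red arc."""
    comp = sccs(nodes, yellow | green | red)
    # An arc with both endpoints in one component lies on some cycle, and
    # every cycle stays inside one component: red arcs may not do so.
    if any(comp[u] == comp[v] for (u, v) in red):
        return False
    # An all-yellow cycle would contain no green arc, so the subgraph of
    # yellow arcs alone must be acyclic.
    ycomp = sccs(nodes, yellow)
    return len(set(ycomp.values())) == len(set(nodes)) and \
           not any(u == v for (u, v) in yellow)
```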
In the result above and in the definition of guaranteed acyclic slot chains, "possible relational skeleton" refers to the class of skeletons that can occur in practice. The user must detect the guaranteed acyclic slot chains, taking advantage of their a priori knowledge about the skeletons possible in practice. For instance, consider a slot chain motherOf linking objects of the same class Person (Example 3). A genetic attribute, like Person.blueEyes, might depend on Person.motherOf.blueEyes. Mathematically, we can conceive of a skeleton with a cyclic relation motherOf, resulting in a red cycle in the colored class dependency graph. Nonetheless, being aware of the intended meaning of motherOf, we know that such skeletons are not possible in practice, so the arc can be colored green, and coherence is guaranteed.
Identifying guaranteed acyclic slot chains is by no means trivial. In fact, Getoor et al. [19] instead define guaranteed acyclic (g.a.) reference slots, with g.a. slot chains then defined as those formed only by g.a. reference slots. Still, this maneuver misses the cases where two reference slots cannot both be g.a. according to the same ⪯, yet combine to form a g.a. slot chain. Getoor et al. [19] mention the possibility of assuming different partial orders to define different sets of g.a. slot chains: in that case, each ordering would correspond to a shade of green in the colored class dependency graph, and coherence would not be ensured if there were two shades of green in a cycle.

5. Logic-Based Approach to the Coherence of PRMs

The simplest approach to the coherence of PRMs, via the (non-colored) class dependency graph, is intrinsically incomplete, in the sense that some skeletons might yield a coherent ground Bayesian network even for cyclic graphs. The approach via the colored class dependency graph allows some cyclic graphs (the stratified ones) to guarantee coherence for the class of all possible skeletons. However, this method depends on a pre-specified set of guaranteed acyclic slot chains, and the colored class dependency graph being stratified for this set is only a sufficient, not a necessary, condition for coherence. Therefore, the colored class dependency graph method is incomplete as well. Even if different sets of g.a. slot chains (corresponding to shades of green) were used to eventually capture all of them, it would still be possible for a cycle with red arcs not to entail incoherence in practice. Besides being incomplete, the graph-based method is not easily applicable to an arbitrary class of skeletons. Given a class of skeletons as input, the user would have to detect somehow which slot chains are guaranteed acyclic for that specific class; this can be considerably more difficult than ensuring acyclicity in the general case.
To address these issues, thus obtaining a general, complete method for checking the coherence of PRMs for a given class of skeletons, we can resort to the logic-based approach we introduced for RBNs in previous sections. The goal of this section is to adapt those logic-based techniques to PRMs.
PRMs can be viewed as RBNs, as conditional probability tables of the former can be embedded into combination functions of the latter. Such a translation is out of our scope, though; it suffices for our purposes to represent PRMs as random relational structures, taking S-structures to probability distributions over R-structures. While the S-vocabulary is used to specify classes of objects and relations between them (that is, the relational skeleton), the R-vocabulary expresses the descriptive attributes of the objects. Employing this logical encoding of PRMs, we can apply the approach from Section 3.1 to the coherence problem for PRMs.
To follow this strategy, we first show how a PRM can be seen as a random relational structure described by a logical language.

5.1. PRMs as Random Relational Structures

Consider a PRM Π = ⟨Π_S, Π_θ⟩ over a relational schema described by a set of classes 𝒳 = {X_1, …, X_n}, each associated with a set of descriptive attributes A(X_i) and a set of reference slots R(X_i). Given a skeleton σ_r, which is formed by objects and the relations holding between them, the PRM Π yields a ground Bayesian network over the descriptive attributes of these objects, defining a probability distribution Π(σ_r) over the completions of σ_r. Hence, if the relational skeleton is given as a first-order S-structure over a set of objects, and a set of unary relations R denotes their attributes, the PRM becomes a random relational structure.
We need to represent a skeleton σ_r as a first-order S-structure Σ. Objects in σ_r can be seen as the elements of the domain D of Σ. Note that PRMs are typed, with each object belonging to a specific class X_i ∈ 𝒳. Thus, we use unary relations X_1, …, X_n in the vocabulary S to denote the class of each object. Accordingly, for each x ∈ D, X_i(x) holds in Σ iff x ∈ σ_r(X_i). As each object belongs to exactly one class in the relational skeleton, the class of possible first-order structures is restricted to those where the relations X_1, …, X_n form a partition of the domain.
The first-order S-structure Σ must also encode the relations holding between the objects in the skeleton, which are specified via the values of the reference slots. To capture these, we assume reference slots have unique names and consider, for each reference slot X_i.ρ ∈ R(X_i) with Range[ρ] = X_j, a binary relation S_ρ. In Σ, S_ρ(x, y) holds iff σ_r(x.ρ) = y. Naturally, S_ρ(x, y) should imply X_i(x) and X_j(y). Now, Σ encodes, through the vocabulary S, all objects of a given class, as well as the relations between them specified in the reference slots. In other words, there is a computable function b_S from relational skeletons σ_r to S-structures Σ = b_S(σ_r). For b_S to be a bijection, we make its codomain equal to its range.
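A minimal sketch of the map b_S, under a hypothetical dictionary-based encoding of σ_r (the structure is returned as plain sets of ground atoms):

```python
def b_S(objects, slots):
    """Encode a relational skeleton as a first-order S-structure.

    objects: dict mapping each class name X_i to the set sigma_r(X_i)
    slots:   dict mapping each reference-slot name rho to a dict
             x -> sigma_r(x.rho)
    Returns (domain, unary, binary): the domain D, the unary relations
    X_1, ..., X_n, and the binary relations S_rho as sets of pairs.
    """
    domain = set().union(*objects.values())
    unary = {X: frozenset(objs) for X, objs in objects.items()}            # X_i(x)
    binary = {rho: frozenset(slot.items()) for rho, slot in slots.items()}  # S_rho(x, y)
    return domain, unary, binary

# Example 5, in miniature:
# b_S({"Person": {"x1", "x2"}, "Neighborhood": {"n12"}},
#     {"neighbor1": {"n12": "x1"}, "neighbor2": {"n12": "x2"}})
```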
The probabilistic vocabulary of the random relational structure corresponding to a PRM is formed by the descriptive attributes of every class in the relational schema. We assume that attributes in different classes have different names as well, in order to define the vocabulary of unary relations R = {A ∈ A(X_i) | X_i ∈ 𝒳}. If A_j is an attribute of X_i, x.A_j = 1 (resp. x.A_j = 0) in the PRM is mirrored by the ground R-atom A_j(x) being true (resp. false) in the random relational structure. Thus, as a completion I corresponds to a value assignment to the descriptive attributes of objects x_1, …, x_m from a relational skeleton σ_r, it also corresponds to an R-structure D_I over a domain D = {x_1, …, x_m} in the following way: D_I ⊨ A_i(x_j) iff x_j.A_i = 1. Note that we assume that, for D_I to correspond to a completion I of σ_r, D_I ⊭ A_i(x_j) whenever A_i is not an attribute of the class X ∈ 𝒳 such that x_j ∈ σ_r(X). Let b_R denote the function taking instances I and returning the corresponding R-structures D_I = b_R(I). As we cannot recover the skeleton σ_r from the R-structure D_I = b_R(I), b_R is not a bijection. Nevertheless, fixing a skeleton σ_r, there is a unique I such that b_R(I) = D_I.
Now, we can define a random relational structure P_Π that corresponds to the PRM Π. For every relational skeleton σ_r over a domain D, let P_Π(b_S(σ_r)) : Mod_R(D) → [0, 1] be a probability distribution over R-structures such that P_Π(b_S(σ_r))(D_R) = Π(σ_r)(I_R) if D_R = b_R(I_R) for a completion I_R of σ_r, and P_Π(b_S(σ_r))(D_R) = 0 otherwise.

5.2. Encoding the Ground Bayesian Network and its Acyclicity

The probability distribution P_Π(b_S(σ_r)) can be represented by a ground Bayesian network B_{P_Π(b_S(σ_r))}, where nodes represent the ground R-atoms. The structure of this network is isomorphic to the simplified ground Bayesian network yielded by Π for the skeleton σ_r, if we ignore the isolated nodes representing the spurious A_i(x_j) = 0 when A_i is not an attribute of the class to which x_j belongs. The coherence of Π(σ_r) depends on the acyclicity of the corresponding ground Bayesian network B_{Π(σ_r)}, which is acyclic iff B_{P_Π(b_S(σ_r))} is so. Therefore, we can encode the coherence of a PRM Π for a skeleton σ_r via the acyclicity of B_{P_Π(b_S(σ_r))} by applying the techniques from Section 3.
We want to construct a formula that is satisfied only by those S-structures b_S(σ_r) such that Π(σ_r) is coherent. Again, we consider an extended, bipartite domain D = D_S ∪ D_B, with b_S(σ_r) encoded over D_S and the structure of B_{P_Π(b_S(σ_r))} encoded in D_B. We want to build a formula B_Π that is satisfied by structures D over D = D_S ∪ D_B such that, if D encodes b_S(σ_r) over D_S, then D encodes the structure of B_{P_Π(b_S(σ_r))} over D_B. The nodes are encoded exactly as shown in Section 3.1.
To encode the arcs, we employ once more a relation Parent(·, ·). Parent(y, y′) must hold only if y, y′ ∈ D_B denote ground R-atoms A_i(x) and A_j(x′) such that x′.A_j ∈ Pa′(x.A_i) in the simplified ground Bayesian network, which is captured by the formula Dep_ij(x, x′), as in Section 3.1. The only difference here is that now Dep_ij(x, x′) can be defined directly. We use Dep_ij(x, x′) here to denote a recursively defined formula, not an atom over a binary relation Dep_ij(·, ·). For each pair A_i, A_j ∈ R, we can simply look at Π_S to see the conditions under which x′.A_j is a parent of x.A_i in the simplified ground Bayesian network (x′.A_j ∈ Pa′(x.A_i)), in which case A_j(x′) will be a parent of A_i(x) in B_{P_Π(b_S(σ_r))}. If X.A_j ∈ Pa(X.A_i), then Dep_ij(x, x′) should be true whenever x = x′. If γ(X.K.A_j) ∈ Pa(X.A_i), for a slot chain K, then the attributes y.A_j with y ∈ x.K are in Pa′(x.A_i), and Dep_ij(x, x′) should be true whenever x′ is related to x via K = ρ_1, …, ρ_k. This is the case if:
∃y_1, y_2, …, y_{k−1} : S_{ρ_1}(x, y_1) ∧ S_{ρ_2}(y_1, y_2) ∧ ⋯ ∧ S_{ρ_k}(y_{k−1}, x′)
is true. If K = ρ, this formula is simply S_ρ(x, x′).
Note that it is possible that both X.A_j and γ(X.K.A_j) are formal parents of X.A_i, and there can even be different parents γ(X.K.A_j) for different K. Thus, we define Dep_ij(x, x′) algorithmically. Initially, make Dep_ij(x, x′) = ⊥. If X.A_j ∈ Pa(X.A_i), make Dep_ij(x, x′) = Dep_ij(x, x′) ∨ (x = x′). Finally, for each γ(X.K.A_j) in Pa(X.A_i), for a slot chain K = ρ_1, …, ρ_k, make:
Dep_ij(x, x′) = Dep_ij(x, x′) ∨ ∃y_1, y_2, …, y_{k−1} : S_{ρ_1}(x, y_1) ∧ S_{ρ_2}(y_1, y_2) ∧ ⋯ ∧ S_{ρ_k}(y_{k−1}, x′),
using fresh variables y_1, …, y_{k−1}.
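The construction just described is easy to mechanize. Below is a sketch that emits Dep_ij(x, x′) as a formula string from a hypothetical encoding of Π_S; for brevity, an inverted reference slot in a slot chain is treated as if it were an S-relation of its own, which is an assumption of this sketch rather than part of the formalism.

```python
def dep_formula(parents_of_Ai, A_j):
    """Build Dep_ij(x, x') as a disjunction: one disjunct (x = x') if
    X.A_j is a formal parent of X.A_i, plus one existential disjunct per
    aggregate parent gamma(X.K.A_j), with K = rho_1, ..., rho_k.

    parents_of_Ai: list of entries ("attr", B) for formal parents X.B and
    ("agg", K, B) for gamma(X.K.B), with K a tuple of slot names."""
    disjuncts, fresh = [], 0
    for parent in parents_of_Ai:
        if parent == ("attr", A_j):
            disjuncts.append("x = x'")
        elif parent[0] == "agg" and parent[2] == A_j:
            K = parent[1]
            ys = [f"y{fresh + n}" for n in range(len(K) - 1)]
            fresh += len(ys)
            chain = ["x"] + ys + ["x'"]
            conj = " & ".join(f"S_{rho}({a},{b})"
                              for rho, a, b in zip(K, chain, chain[1:]))
            disjuncts.append(f"exists {','.join(ys)}: {conj}" if ys else conj)
    return " | ".join(disjuncts) if disjuncts else "false"

# Pa(Person.calls) = { or(Person.neighbor1^{-1}.neighbor2.alarm) }:
print(dep_formula([("agg", ("inv_neighbor1", "neighbor2"), "alarm")], "alarm"))
# exists y0: S_inv_neighbor1(x,y0) & S_neighbor2(y0,x')
```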
Analogously to Section 3.1, we have a formula B_Π, for a fixed PRM Π, that is satisfied only by structures D over a bipartite domain D_S ∪ D_B where the Parent(·, ·) relation over D_B carries the structure of the ground Bayesian network B_{P_Π(Σ)} corresponding to the S-structure Σ encoded in D_S. Again, acyclicity can be captured via a transitive closure operator: ψ_Π = ∀x ¬TC(Parent)(x, x). The PRM Π is coherent for a skeleton σ_r if, for every structure D over a bipartite domain D_S ∪ D_B encoding b_S(σ_r) in D_S, we have D ⊨ B_Π → ψ_Π.
Consider now a class S of skeletons σ_r such that {b_S(σ_r) | σ_r ∈ S} is the set of S-structures satisfying a first-order formula θ_S. To check whether the PRM Π is coherent for the class S, we construct θ_S′ by inserting guards into the quantifiers, as explained in Definition 5. Finally, the PRM Π is coherent for a class S of relational skeletons iff θ_S′ → (B_Π → ψ_Π) is valid.
We have thus succeeded in turning coherence checking for PRMs into a logical inference, by adapting techniques we developed for RBNs. In the next section, we travel, in a sense, the reverse route: we show how to adapt the existing graph-based techniques for coherence checking of PRMs to coherence checking of RBNs.

6. Graph-Based Approach to the Coherence of RBNs

The logic-based approach to the coherence problem for RBNs can be applied to an arbitrary class of input structures, as long as the class can be described by a first-order formula, possibly with a transitive closure operator. Given any class of input structures S, described by the formula θ_S, we can verify the coherence of an RBN Φ through the validity of θ_S′ → (B_Φ → ψ), as explained in Section 3. Furthermore, this method is complete, as Φ is coherent for S if and only if such a formula is valid. Nonetheless, completeness and flexibility regarding the input class come at a very high price, as deciding the validity of this first-order formula involving a transitive closure operator may be computationally hard, if decidable at all. Therefore, RBN users can benefit from the ideas introduced by Getoor et al. [19] for the coherence of PRMs, using the (colored) dependency graphs. While Jaeger [23] proposes to investigate coherence for a given class of models described by a logical formula, Getoor et al. [19] are interested in a single class of inputs: the skeletons that are possible in practice. With a priori knowledge, the RBN user can perhaps attest to the acyclicity of the resulting ground Bayesian network for all possible inputs.
Any arc ⟨r′(t′), r(t)⟩ in the output ground Bayesian network B_Φ(D_S), for an RBN Φ and input D_S, reflects the fact that the probability formula F_r(x), when x = t, depends on r′(t′). Hence, possible arcs in this network can be anticipated by looking into the probability formulas F_r(x), for the probabilistic relations r ∈ R, in the definition of Φ. In other words, by inspecting the probability formula F_r(x), we can detect those r′ ∈ R for which an arc ⟨r′(t′), r(t)⟩ can possibly occur in the ground Bayesian network. Similarly to the class dependency graph for PRMs, we can construct a high-level dependency graph for RBNs that displays the possible arcs, and thus the possible cycles, in the ground Bayesian network.
Definition 11.
Given an RBN Φ, the R-dependency graph G_Φ is a directed graph with a node for each probabilistic relation r ∈ R and the following arcs:
  • Type I arcs: ⟨r′, r⟩, where r′(x) occurs in F_r(x) outside the scope of a combination function;
  • Type II arcs: ⟨r′, r⟩, where r′(y) occurs in F_r(x) inside the scope of a combination function.
Intuitively, a Type I arc ⟨r′, r⟩ in the R-dependency graph of an RBN Φ means that, for any input structure D_S over D and any tuple t ∈ D^{a(r)}, F_r(t) depends on r′(t) in the ground Bayesian network B_Φ(D_S); formally, r′(t) ∈ α(F_r(x), t, D_S). For instance, if F_{r_1}(x) = mean({|r_2(y) | y; S(x, y)|}) · (1 − r_3(x)), then, given any S-structure, F_{r_1}(t) depends on r_3(t) for any t. Type II arcs capture dependencies that are contingent on the S-relations holding in the input structure. In other words, a Type II arc ⟨r′, r⟩ means that F_r(t) will depend on r′(t′) if some S-formula φ holds in the input structure D_S: D_S ⊨ φ. For instance, for the same F_{r_1}(x), r_1(t) depends on r_2(t′) (for t, t′ in the domain D) in the output ground Bayesian network iff the input D_S is such that D_S ⊨ S(t, t′). If combination functions are nested, the corresponding S-formula might be fairly complicated. Nevertheless, the point here is simply noting that, given a Type II arc ⟨r′, r⟩, the conditions under which r(t) is actually a child of r′(t′) in the ground Bayesian network can be expressed with an S-formula parametrized by t, t′, which will be denoted by φ_S^{r′,r}(t, t′). Consequently, for t, t′ ∈ D, D_S ⊨ φ_S^{r′,r}(t, t′) iff r′(t′) ∈ α(F_r(x), t, D_S), i.e., r(t) depends on r′(t′) in B_Φ(D_S).
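To illustrate, here is a sketch of how Type I and Type II arcs can be read off a probability formula, with F_r encoded as nested tuples; the encoding is our own hypothetical one, and the flag inside records whether traversal has entered the scope of a combination function.

```python
def arcs_from_formula(r, formula, inside=False, arcs=None):
    """Collect the R-dependency arcs <r', r> induced by a probability
    formula F_r, encoded as nested tuples: ("atom", r'),
    ("comb", name, subformulas) for combination functions, and
    ("op", name, subformulas) for arithmetic such as products and 1 - F."""
    if arcs is None:
        arcs = set()
    if formula[0] == "atom":
        arcs.add((formula[1], r, "II" if inside else "I"))
    else:
        nested = inside or formula[0] == "comb"
        for sub in formula[2]:
            arcs_from_formula(r, sub, nested, arcs)
    return arcs

# F_{r1}(x) = mean({| r2(y) | y; S(x, y) |}) * (1 - r3(x)):
F_r1 = ("op", "*",
        [("comb", "mean", [("atom", "r2")]),
         ("op", "1-", [("atom", "r3")])])
print(sorted(arcs_from_formula("r1", F_r1)))
# [('r2', 'r1', 'II'), ('r3', 'r1', 'I')]
```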
As each arc in the ground Bayesian network corresponds to an arc in the R-dependency graph, when the latter is acyclic, so is the former, for any input structure. As with class dependency graphs and PRMs, though, a cycle in the R-dependency graph does not entail a cycle in the ground Bayesian network if a Type II arc is involved. It might well be the case that the input structures D_S found in practice do not cause cycles to occur. This can be captured via a colored version of the R-dependency graph.
In the same way that Type II arcs in the class dependency graph of a PRM relate attributes of (possibly) different objects, in the R-dependency graph of an RBN these arcs encode a dependency between relations r, r′ ∈ R to be grounded with (possibly) different tuples. For a PRM, the ground Bayesian network can never reflect a cycle of the class dependency graph with green arcs but no red one, since a sequence of green arcs guarantees different objects, according to a partial ordering. Analogously, with domain knowledge, the user can identify Type II arcs in the R-dependency graph whose sequence will prevent cycles in the ground Bayesian network, via a partial ordering over the tuples.
For a vocabulary S of predefined relations, let T_D = ⋃_{a ∈ ℕ} D^a denote the set of all tuples with elements of D. We say a set A_ga = {⟨r_i′, r_i⟩ | 1 ≤ i ≤ n} of Type II arcs is guaranteed acyclic if, for any possible input structure D_S over D, there is a partial ordering ⪯ over T_D such that, if D_S ⊨ φ_S^{r′,r}(t, t′) for some t, t′ ∈ T_D, then t ≺ t′. Here, again, "possible" means "possible in practice".
Definition 12.
Given the R-dependency graph of an RBN Φ and a set A_ga of guaranteed acyclic Type II arcs, the colored R-dependency graph G_Φ is a directed graph with a node for each r ∈ R and the following arcs:
  • Yellow arcs: Type I arcs in the R-dependency graph;
  • Green arcs: Type II arcs ⟨r′, r⟩ in the R-dependency graph such that ⟨r′, r⟩ ∈ A_ga;
  • Red arcs: the remaining (Type II) arcs in the R-dependency graph.
Again, yellow cycles in the colored R-dependency graph correspond to relations r ∈ R grounded with the same tuple t, yielding a cycle in the ground Bayesian network. If green arcs are added to such a cycle, then it is guaranteed that, departing from a node r(t) in the ground Bayesian network, these arcs form a path to r(t′), where t ≺ t′ for a partial ordering ⪯, and there is no cycle. Once more, red arcs in cycles may allow t = t′, and coherence is not ensured. Calling stratified an R-dependency graph in which every cycle contains at least one green arc and no red arc, we have:
Theorem 4.
Given the R-dependency graph of an RBN Φ and a set A_ga of guaranteed acyclic Type II arcs, if the colored R-dependency graph G_Φ is stratified, then the ground Bayesian network is acyclic for any possible input structure.
Of course, detecting guaranteed acyclic Type II arcs in R-dependency graphs of RBNs is even harder than detecting guaranteed acyclic slot chains in PRMs, being a generalization of that task. In any case, if the relations r, r′ ∈ R involved are unary, one is in a position similar to that of finding acyclic slot chains, as the arguments of r and r′ can be seen as objects, and only a partial ordering over the elements of the domain (not over tuples) is needed.

7. Conclusions

In this paper, we examined a new version of coherence checking, a central problem in the foundations of probability as conceived by de Finetti. The simplest formulation of coherence checking takes a set of events and their probabilities and asks whether there can be a probability measure over an appropriate sample space [1]. This sort of problem is akin to inference in propositional probabilistic logic [28]. Unsurprisingly, similar inference problems have been studied in connection with first-order probabilistic logic [27]. Our focus here is on coherence checking when one has events specified by first-order expressions, on top of which one has probability values and independence relations. Due to the hopeless complexity of handling coherence checking for any possible set of assessments and independence judgments, we focus on those specifications that enhance the popular language of Bayesian networks. In doing so, we address a coherence checking problem that was discussed in the pioneering work by Jaeger [23].
We have first examined the problem of checking the coherence of relational Bayesian networks for a given class of input structures. We used first-order logic to encode the output ground Bayesian network into a first-order structure, and we employed a transitive closure operator to express the acyclicity demanded by coherence, finally reducing the coherence checking problem to that of deciding the validity of a logical formula. We conjecture that Jaeger's original proposal concerning the format of the formula encoding the consistency of a relational Bayesian network Φ for a class S cannot be followed as originally stated; as we have argued, the number of possible tuples built from a domain typically outnumbers its size, so that there is no straightforward way to encode the ground Bayesian network, whose nodes are ground atoms, into the input S-structure. Therefore, it is hard to think of a method that translates the acyclicity of the ground Bayesian network into a formula φ_Φ to be evaluated over an input structure in the class S (satisfying θ_S). Our contribution here is to present a logical scheme that bypasses such difficulties by employing a bipartite domain, encoding both the S-structure and the corresponding Bayesian network. We have also extended those results to PRMs, in fact mixing the existing graph-based techniques for coherence checking with our logic-based approach. Our results seem to be the most complete ones in the literature.
Future work includes searching for decidable instances of the formula encoding the consistency of a relational Bayesian network for a class of input structures and exploring new applications for the logic techniques herein developed.

Acknowledgments

GDB was supported by Fapesp, Grant 2016/25928-4. FGC was partially supported by CNPq, Grant 308433/2014-9. The work was supported by Fapesp, Grant 2016/18841-0.

Author Contributions

Both authors have contributed to the text and read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
CPD: Conditional Probability Distribution
g.a.: guaranteed acyclic
iff: if and only if
PRM: Probabilistic Relational Model
PSAT: Probabilistic Satisfiability
RBN: Relational Bayesian Network

References

  1. De Finetti, B. Theory of Probability; Wiley: New York, NY, USA, 1974; Volumes 1 and 2. [Google Scholar]
  2. Coletti, G.; Scozzafava, R. Probabilistic Logic in a Coherent Setting; Trends in Logic, 15; Kluwer: Dordrecht, The Netherlands, 2002. [Google Scholar]
  3. Lad, F. Operational Subjective Statistical Methods: A Mathematical, Philosophical, and Historical Introduction; John Wiley: New York, NY, USA, 1996. [Google Scholar]
  4. Berger, J.O. In Defense of the Likelihood Principle: Axiomatics and Coherency. In Bayesian Statistics 2; Bernardo, J.M., DeGroot, M.H., Lindley, D.V., Smith, A.F.M., Eds.; Elsevier Science: Amsterdam, The Netherlands, 1985; pp. 34–65. [Google Scholar]
  5. Regazzini, E. De Finetti’s Coherence and Statistical Inference. Ann. Stat. 1987, 15, 845–864. [Google Scholar] [CrossRef]
  6. Shimony, A. Coherence and the Axioms of Confirmation. J. Symb. Logic 1955, 20, 1–28. [Google Scholar] [CrossRef]
  7. Skyrms, B. Strict Coherence, Sigma Coherence, and the Metaphysics of Quantity. Philos. Stud. 1995, 77, 39–55. [Google Scholar] [CrossRef]
  8. Savage, L.J. The Foundations of Statistics; Dover Publications, Inc.: New York, NY, USA, 1972. [Google Scholar]
  9. Darwiche, A. Modeling and Reasoning with Bayesian Networks; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
  10. Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
  11. Pearl, J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Morgan Kaufmann: San Mateo, CA, USA, 1988. [Google Scholar]
  12. Getoor, L.; Taskar, B. Introduction to Statistical Relational Learning; MIT Press: Cambridge, MA, USA, 2007. [Google Scholar]
  13. De Raedt, L. Logical and Relational Learning; Springer: Berlin, Heidelberg, 2008. [Google Scholar]
  14. Raedt, L.D.; Kersting, K.; Natarajan, S.; Poole, D. Statistical Relational Artificial Intelligence: Logic, Probability, and Computation; Morgan & Claypool: San Rafael, CA, USA, 2016. [Google Scholar]
  15. Cozman, F.G. Languages for Probabilistic Modeling over Structured Domains. Tech. Rep. 2018. submitted. [Google Scholar]
  16. Poole, D. First-order probabilistic inference. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Acapulco, Mexico, 9–15 August 2003; pp. 985–991. [Google Scholar]
  17. Gilks, W.; Thomas, A.; Spiegelhalter, D. A language and program for complex Bayesian modeling. Statistician 1993, 43, 169–178. [Google Scholar] [CrossRef]
  18. Lunn, D.; Spiegelhalter, D.; Thomas, A.; Best, N. The BUGS project: Evolution, critique and future directions. Stat. Med. 2009, 28, 3049–3067. [Google Scholar] [CrossRef] [PubMed]
  19. Getoor, L.; Friedman, N.; Koller, D.; Pfeffer, A.; Taskar, B. Probabilistic relational models. In Introduction to Statistical Relational Learning; Getoor, L., Taskar, B., Eds.; MIT Press: Cambridge, MA, USA, 2007. [Google Scholar]
  20. Koller, D. Probabilistic relational models. In Proceedings of the International Conference on Inductive Logic Programming, Bled, Slovenia, 24–27 June 1999; pp. 3–13. [Google Scholar]
  21. Heckerman, D.; Meek, C.; Koller, D. Probabilistic Entity-Relationship Models, PRMs, and Plate Models. In Introduction to Statistical Relational Learning; Getoor, L., Taskar, B., Eds.; MIT Press: Cambridge, MA, USA, 2007; pp. 201–238. [Google Scholar]
  22. Jaeger, M. Relational Bayesian networks. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, Providence, RI, USA, 1–3 August 1997; pp. 266–273. [Google Scholar]
  23. Jaeger, M. Relational Bayesian networks: A survey. Electron. Trans. Art. Intell. 2002, 6, 60. [Google Scholar]
  24. De Bona, G.; Cozman, F.G. Encoding the Consistency of Relational Bayesian Networks. Available online: http://sites.poli.usp.br/p/fabio.cozman/Publications/Article/bona-cozman-eniac2017F.pdf (accessed on 23 March 2018).
  25. Alechina, N.; Immerman, N. Reachability logic: An efficient fragment of transitive closure logic. Logic J. IGPL 2000, 8, 325–337. [Google Scholar] [CrossRef]
  26. Ganzinger, H.; Meyer, C.; Veanes, M. The two-variable guarded fragment with transitive relations. In Proceedings of the 14th IEEE Symposium on Logic in Computer Science, Trento, Italy, 2–5 July 1999; pp. 24–34. [Google Scholar]
  27. Fagin, R.; Halpern, J.Y.; Megiddo, N. A Logic for Reasoning about Probabilities. Inf. Comput. 1990, 87, 78–128. [Google Scholar] [CrossRef]
  28. Hansen, P.; Jaumard, B. Probabilistic Satisfiability; Technical Report G-96-31; Les Cahiers du GERAD; École Polytechnique de Montréal: Montreal, Canada, 1996. [Google Scholar]
Figure 1. Bayesian network modeling the burglary-alarm-call scenario with Mary and Tina . In the probabilistic assessments (right), the logical variable x stands for Mary and for Tina .
Figure 2. Bayesian network modeling Scenario 3 in Example 2. Probabilistic assessments are just as in Figure 1, except that, for each x, calls ( x ) is the disjunction of its corresponding parents.
Figure 3. Plate models for Scenario 2 of Example 2; that is, for the burglary-alarm-call scenario where there is a single random variable calls . Left: A partial plate model (without the calls random variable), indicating that parameterized random variables burglary ( x ) and alarm ( x ) must be replicated for each person x; the domain consists of the set of persons as marked in the top of the plate. Note that each parameterized random variable must be associated with probabilistic assessments; in this case, the relevant ones from Figure 1. Right: A plate model that extends the one on the left by including the random variable calls .
Figure 4. A Probabilistic Relational Model (PRM) for Scenario 4 in Example 2, using a diagrammatic scheme suggested by Getoor et al. [19]. A textual description of this PRM is presented in Section 4.
Figure 5. The PRM for the genetic example, as proposed by Getoor et al. [19].
