Alternative Initial Probability Tables for Elicitation of Bayesian Belief Networks

Abstract: Bayesian Belief Networks are used in many fields of application. Defining the conditional dependencies via conditional probability tables requires eliciting expert beliefs to fill these tables, which quickly grow very large. In this work, we propose two methods that prepare these tables from a small number of input parameters using specific structures, and one method that generates the table from the probability tables of each relation of a child node with a single parent. These tables can then be used as a starting point for elicitation.


Introduction
A Bayesian Belief Network (BBN) is a probabilistic graphical model that represents a set of variables and their conditional dependencies via a directed acyclic graph. Bayesian networks can be used for probabilistic reasoning, deriving the effects of the occurrence of an event by predicting the likelihood of other related events. The use of BBNs has increased enormously in recent years. Examples can be found in all kinds of application areas that involve complex systems, such as natural ecosystems [1,2], human behavior [3,4], risk assessment in construction and complex technological systems [5-8], military operations [9-11], medicine and healthcare [12,13] and business and cyber threat analysis [14,15].
The main idea of a BBN is that it specifies the relations between variables, capturing the probability that a variable has a specific state, depending on the state of other variables, its parents. If we have three variables, X, Y and Z, each having two possible states (0, 1), and the state of X depends on the state of Y and Z, which are mutually independent, the marginal probability of X being 1 can be derived:

$$P(X = 1) = \sum_{y \in \{0,1\}} \sum_{z \in \{0,1\}} P(X = 1 \mid Y = y, Z = z)\, P(Y = y)\, P(Z = z).$$

An example is depicted in Figure 1. The figure was created using the BBN tool Genie (www.bayesfusion.com/genie, accessed on 23 July 2021). The important point here is that the conditional probability P(X|Y, Z) has to be specified. In the example in Figure 1, this means that a table with conditional probabilities has to be provided, as depicted in Table 1.
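The marginalization described above can be sketched in a few lines of Python. This is only an illustration: the entry P(X = 1|Y = 0, Z = 0) = 0.1 is the one quoted for Table 1, while the other three conditional probabilities and the priors `p_y1` and `p_z1` are hypothetical values chosen for the example.

```python
from itertools import product

# Conditional probabilities P(X = 1 | Y = y, Z = z), keyed by (y, z).
# Only the (0, 0) entry (0.1) is quoted in the text; the rest are hypothetical.
p_x1_given = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.6, (1, 1): 0.9}

# Hypothetical priors P(Y = 1) and P(Z = 1); Y and Z are assumed independent.
p_y1, p_z1 = 0.3, 0.5


def marginal_x1(cpt, p_y1, p_z1):
    """Sum P(X=1 | y, z) * P(y) * P(z) over all combinations of parent states."""
    total = 0.0
    for y, z in product((0, 1), repeat=2):
        p_y = p_y1 if y == 1 else 1.0 - p_y1
        p_z = p_z1 if z == 1 else 1.0 - p_z1
        total += cpt[(y, z)] * p_y * p_z
    return total


p_x1 = marginal_x1(p_x1_given, p_y1, p_z1)
```

Every entry of the conditional table enters the sum, which is why each of them has to be specified before the network can be evaluated.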
As an example, this conditional probability table (CPT) states that P(X = 1|Y = 0, Z = 0) = 0.1. This table is not that hard to fill. However, when a child has a larger number of states and multiple parents that have many states of their own, the number of entries in this table can grow very large. Such tables are not only tedious to fill, but basic relationships between parents (and their states) are also difficult to identify and to process unambiguously in the table. A lot has been written about the elicitation of expert beliefs, mainly about how to quantify the opinions and uncertainty of experts, starting with the work of Cooke [16] and O'Hagan [17], as well as the work of Hanea [18]. A review of methods to fill large conditional probability tables is given by Werner et al. [19]. Elaborate methods to fill the table automatically, e.g., as a starting point for the experts, are given in the works of Wisse et al. [20] and Hassall et al. [21]. The main differences between their works are the number of values to be scored and the way the overall score is divided over the states of the child node under assessment. Examples of values to be scored are the relative weights of influence of the parent nodes on the child node and the direction of the relationship. What is left open is that there is still a huge number of ways to translate these values into the CPT. One could of course try to fit the CPT as closely as possible to the elicited beliefs of experts; however, we see in practice that this fitting is already very hard [17]. It can then be helpful to create a number of conditional probability tables that each represent a trend, in order to assess the influence of these patterns on the main target variable of the BBN.
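To make the growth of the table mentioned above concrete, the following sketch counts CPT rows (combinations of parent states) and entries (rows times child states). The helper name `cpt_size` is introduced here for illustration only; the second call uses a child with four states and parents with four, two, three and four states.

```python
from math import prod


def cpt_size(child_states, parent_states):
    """Return (rows, entries): rows are parent state combinations,
    entries are rows multiplied by the number of child states."""
    rows = prod(parent_states)
    return rows, rows * child_states


small = cpt_size(2, [2, 2])          # the binary X, Y, Z example
larger = cpt_size(4, [4, 2, 3, 4])   # four parents with 4, 2, 3 and 4 states
```

The binary example needs 8 entries, whereas the four-parent case already needs 384, which illustrates why a generated starting point is attractive.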
For some applications, the CPT or even the structure of the model can be generated automatically by using ontology-driven approaches, machine learning techniques or structural equation modeling, including entropy-based approaches [22]. However, Maung and Paris have shown that the general problem of finding the maximum entropy solution in probabilistic systems is NP-complete [23,24].
In this work, we elaborate on the works of Wisse et al. [20] and Hassall et al. [21]. We assume BBNs having discrete states. We also assume that experts are able to order the states that follow from all combinations of the parents' states in some way, e.g., the best or worst case. First, an idea is presented to create conditional probability tables that use limited input from experts, based on pre-defined patterns for the distribution of the probabilities over the table, and can be used as a starting point for further elicitation. We will compare these patterns with the approach as presented in [21]. Next, we present an approach for specific applications, where conditional probability tables for each parent are combined to create the full CPT. We end with some conclusions.

CPT Algorithms Using Limited Input
In this section, we present the CPT algorithms. First, we present the algorithm as introduced by Hassall et al. [21]. Next, we suggest two other methods in which we use the freedom we have to create other patterns within the CPT. Figure 2 gives a graphical representation of the outcome of Hassall's algorithm. Here, the CPT for four child states, for some order of all combinations of the parents' states, is shown as a heat map, where a probability of one for a child state gives a very small black rectangle and a probability of zero gives a white rectangle. The idea is to view the complete CPT as if squinting at it, with ones shown in black and zeros in white, so that the general distribution of the probability mass is visible at a glance. Note that Hassall's algorithm distributes the probability quite evenly over all states, resulting in many gray rectangles. Figures 3 and 4 show the two other algorithms, where we see a clear pattern. These approaches also reach higher values, as is evident from the black parts. The first, in Figure 3, we will name 'Weighted Diagonal'; it gives a non-zero probability to at most two states per combination of parent states. The second, in Figure 4, we will name 'Weighted Diamond'; here the non-zero values form a rhomboid or diamond shape.

Hassall's Algorithm
To specify a score that captures the relative effects of different parent nodes, in a first step, an expert assigns a weight of relative importance, as shown in [21], to each parent node. This weighting is used to define the relative effects of each parent on the probability distribution of the child node. Parents with a larger weight are assigned a greater level of influence in determining the conditional probability table such that changes in the states of the parent with the largest weight will result in the biggest differences in the distribution of the child node.
The second step is to define the direction of the relationship between each parent and child. Each parent can have either a positive, negative or other relationship with the child node. A relationship is considered positive if, as the states of the parent change according to the order in which they have been defined, the probability that the child node is in its higher states also increases. Conversely, a negative relationship is appropriate if, as the states of the parent change according to the order in which they have been defined, the probability that the child node is in its higher states decreases. Not every parent-child relationship can be categorized as either positive or negative.
So, Hassall's algorithm uses as input the relative weights of the influence of the parent nodes on the child node ($w_i \in \mathbb{R}^+$) and the direction of the relationship of the parent states on the child states. It allows for a relationship where the order can be defined separately from the order in which the parent states are defined in the BBN. We assume, without loss of generality, that the ranking is done beforehand such that the relationship is always positive. This means that we can define a score $P_{i,j}$ for the jth state of the ith parent, as in [21], in terms of j and $n_i$, where $n_i$ is the number of states of parent i. Now, for a combination k of parent states, we define a score, given by a weighted average of the constituent scores:

$$\mathrm{Score}_k = \frac{\sum_i w_i P_{i\{k\}}}{\sum_i w_i},$$

where {k} is the kth combination of parent states, with $P_{i\{k\}}$ denoting the associated score of parent i for combination k. This score is translated to probabilities for the child states.
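The weighted-average score per combination of parent states can be sketched as follows. This is a minimal illustration, not the implementation of [21]: the per-state score in `state_scores` is an assumed linear scaling (1 for the first state, 0 for the last), and the weights and state counts are hypothetical.

```python
from itertools import product


def state_scores(n_states):
    """Assumed linear score per state: 1.0 for the first state, 0.0 for the last.
    The exact per-state score of [21] is not reproduced here."""
    if n_states == 1:
        return [1.0]
    return [(n_states - j) / (n_states - 1) for j in range(1, n_states + 1)]


def combination_scores(weights, n_states_per_parent):
    """Weighted average of the parent state scores for every combination."""
    per_parent = [state_scores(n) for n in n_states_per_parent]
    total_weight = sum(weights)
    scores = {}
    # Each combination is a tuple of 0-based state indices, one per parent.
    for combo in product(*(range(n) for n in n_states_per_parent)):
        weighted = sum(w * per_parent[i][j]
                       for i, (w, j) in enumerate(zip(weights, combo)))
        scores[combo] = weighted / total_weight
    return scores


# Hypothetical weights for two parents with three and two states.
scores = combination_scores(weights=[2.0, 1.0], n_states_per_parent=[3, 2])
```

Because the first parent carries twice the weight, moving it one state down changes the score as much as moving the second parent across its full range.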
Here, a specific translation is made, although many other translations are possible. Formally, for a child with M > 2 states, the probability that the child is in state m is given by twice the area of the mth trapezium formed when the straight line between the two probabilities of a corresponding two-state child is cut into M equal intervals. For the mth child state and the kth combination of parent nodes, this is expressed using an auxiliary variable δ. This scoring system assumes that all states can be considered on an equally spaced linear scale and that the range of CPT rows for a two-state child node will contain values in the full range of 0-100%. These assumptions act as a constraint on the construction of the scores. The next two alternative algorithms relax parts of these assumptions to create other patterns.
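The trapezium construction can be written out from its verbal description alone. The symbols $a_k$, $b_k$ and $f_k$ below are introduced here for illustration and are assumptions: $a_k$ and $b_k = 1 - a_k$ stand for the two probabilities of the corresponding two-state child for combination k, and how they follow from the score is not specified here. The exact formulation with δ in [21] may be arranged differently.

```latex
% Straight line between the two probabilities of the two-state child
f_k(x) = a_k + (b_k - a_k)\,x, \qquad x \in [0, 1], \qquad a_k + b_k = 1

% Twice the area of the m-th of M equal-width trapezia under f_k
P(m \mid k) = 2 \int_{(m-1)/M}^{m/M} f_k(x)\,dx
            = \frac{2}{M}\left[ a_k + (b_k - a_k)\,\frac{2m - 1}{2M} \right],
            \qquad m = 1, \dots, M

% The probabilities sum to one because \sum_{m=1}^{M} (2m - 1) = M^2
\sum_{m=1}^{M} P(m \mid k) = 2 a_k + (b_k - a_k) = a_k + b_k = 1
```

The factor of two compensates for the total area under the line being 1/2, so each row of the CPT is a proper probability distribution that changes linearly over the child states.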

Weighted Diagonal
The first alternative algorithm that we propose is the 'Weighted Diagonal' algorithm. Here, at most two child states get a non-zero probability for each combination of parent states. The steps in Hassall's algorithm as presented in the previous section are followed; however, we not only ask for a ranking of the parent states, but also for a relative weight ($\omega_{i,j} \in \mathbb{R}^+$), with a higher weight corresponding to a better state, as defined by the experts. This means that choosing $\omega_{i,j} = n_i - j + 1$ results in the same ranking as Hassall's algorithm. We now define the score of the best combination, BS, the score of the worst combination, WS, and the auxiliary variables

$$\delta = \frac{1}{M-1}(BS - WS), \qquad \eta_m = BS - m\delta.$$
Now, for the kth combination of parent nodes, we define the score $\mathrm{Score}_k$. For each child state m, we can then define auxiliary variables, and from these variables we calculate the probability for the mth child state, given the kth combination of parent nodes.
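A pattern with at most two non-zero child states per row can be sketched as below. The thresholds follow the definitions of δ and η_m given above, with the index m assumed to run from 0 to M − 1 so that the thresholds span BS down to WS. The linear interpolation in `diagonal_row` is an assumption made for illustration and is not necessarily the calculation used in the algorithm; the function names and the numbers in the usage lines are hypothetical.

```python
import math


def thresholds(best, worst, n_child):
    """eta_m = BS - m * delta for m = 0..M-1, with delta = (BS - WS) / (M - 1)."""
    delta = (best - worst) / (n_child - 1)
    return delta, [best - m * delta for m in range(n_child)]


def diagonal_row(score, best, worst, n_child):
    """Assumed interpolation: split the probability mass between the two child
    states whose thresholds bracket the score (requires M >= 2 and BS > WS)."""
    delta, _ = thresholds(best, worst, n_child)
    position = (best - score) / delta                 # 0 at BS, M - 1 at WS
    position = min(max(position, 0.0), n_child - 1.0)
    lower = min(int(math.floor(position)), n_child - 2)
    frac = position - lower
    row = [0.0] * n_child
    row[lower] = 1.0 - frac
    row[lower + 1] = frac
    return row


# Hypothetical scores: best combination 10, worst combination 1, four child states.
delta, eta = thresholds(10.0, 1.0, 4)
row_mid = diagonal_row(8.5, 10.0, 1.0, 4)
```

A score exactly on a threshold puts all mass on one child state, and a score between two thresholds splits the mass over the two neighbouring states, which produces the diagonal band in the heat map.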

Weighted Diamond
The second algorithm that we propose is the 'Weighted Diamond' algorithm. First, both the child states and the parent states have to be ordered. The child states have to be ordered from most preferred to least preferred. The parent states have to be ordered such that the resulting child state is expected to be decreasing. In practice, this is done by defining some ranking rules, with which the order can be generated automatically. Now, start with the combination of parent states for which the most preferred child state gets a probability of one, and end with the combination of parent states for which the least preferred child state gets a probability of one. For the middle one of all the combinations of the parent states, all child states have the same probability 1/M. Again, we add extra flexibility by not only asking for a ranking of the parent states, but also assigning a relative weight ($\omega_{i,j} \in \mathbb{R}^+$), with a higher weight corresponding to a better state and no further scaling needed. Using the same definition for BS and WS, we define the auxiliary variables $\eta_j$. For the kth combination of parent nodes, we define

$$\mathrm{Score}_k = \mathrm{Score}_k - \tfrac{1}{2}(BS + WS) + \tfrac{1}{2}(BS + WS).$$
Now, find the j for which $\eta_{j-1} \le \mathrm{Score}_k < \eta_j$ and calculate the associated auxiliary values. The last step is calculating the resulting probabilities.

Example
We now look at an example. Assume that a child node has four states ($\mathrm{Child}_1$, ..., $\mathrm{Child}_4$) and four parent nodes ($\mathrm{Parent}_1$, ..., $\mathrm{Parent}_4$), having four, two, three and four states, respectively. The weights of the parent nodes and the weights of the parent states (State XY for state Y of parent X) are depicted in Table 2. When we order the states of the system by the score per combination of parent states, the probability per child state can be depicted. This is shown in Figures 5-7. Note that despite the weights, the patterns are totally symmetric. The weights have an influence on the order of the combinations of parent states, which influences the probability per combination. The figures show that Hassall's algorithm spreads the probability over the child states with small deviations. In all cases, all child states have a non-zero probability of occurrence. The Weighted Diagonal algorithm indeed gives a value to (at most) two child states at the same time, using the total range from zero to one. The Weighted Diamond algorithm starts (and ends) with full probability on one of the states and, for the middle combination of parent states, gives all states an equal probability.
Because the definition of the score parameter differs per method, the combinations of parent states are not ordered the same for each method. This is shown in Figures 8-10. For Hassall's algorithm, the given combination lies slightly left of the middle, whereas for the other two algorithms it lies to the right of the middle.

CPT Generation Using CPT per Parent
The second approach to generating the CPT for a child with multiple parents uses a CPT per parent and combines these into one generic table. This approach can be used when the influence of a parent, i.e., the CPT per parent, is available or can be generated quite easily and the parents are assumed to be independent of each other. Here, too, there are multiple ways to realize this. Again, we have the relative weights of the influence of the parent nodes on the child node ($w_i \in \mathbb{R}^+$). Now, we assume a given CPT per parent and use $p_{m,i,j}$ to denote the probability of child state m given state j of parent i. Next, we introduce a child state (state M) that stands for 'NONE'. Now, for a specific combination of parent states $k = \{k_1, ..., k_K\}$, we calculate, using the intermediate variables $Z_m$ and $Y_m$, the probability for the mth child state, given the kth combination of parent nodes. This means that we calculate the probabilities of combining the parent states. If a certain combination is not possible, meaning $p_{m,i,j} = 0$, this probability mass is assigned to the state 'NONE'. We will call this method, which uses the 'NONE' state for all combinations of parent states that are not possible, the first approach. An alternative, the second approach, is that the probability mass that disappears because of a certain $p_{m,i,j} = 0$ is redistributed pro rata. Now, state 'NONE' is a generic state, leading to the probabilities $P_{m,k}$.

Example
We look at an example where a certain node has three parents, where each parent has two states and the child has three states and a 'NONE' state. The conditional probability table of each parent is given in Table 3. Here, SP11 stands for State 1 of Parent 1. The resulting CPT for the first approach is shown in Table 4. See, for example, that the combination {SP11, SP21, SP31} has a zero probability for State 3 (Table 4), caused by the zero probability of SP31 on State 3 (Table 3). For the second approach ($P_{m,k}$), the probabilities are listed in Table 5. Now, the zero-probability entries disappear, as expected.

Conclusions
Filling conditional probability tables when working with BBNs can be hard because of the size of those tables and because filling them in a structured way requires insight into all the relations and dependencies. Generating standardized starting points, based on limited input, for further use in the elicitation process can help here. Using those processes shows that there is a lot of freedom, within which the modeler has to make a choice. Starting from the algorithm by Hassall et al., we proposed two other algorithms to create specific patterns for the CPT. These patterns provide a starting point, based on a small number of parameters, that can be elaborated further in cooperation with domain experts. For cases where more information is available, for example a CPT per parent node, we propose another algorithm that creates the full CPT over all parent nodes. Here, too, there are many choices that a modeler can exploit. For further research, we recommend comparing the approaches presented in this paper and those of Wisse et al. [20] and Hassall et al. [21] with automated entropy-based solutions, such as [22].