Propagation Computation for Mixed Bayesian Networks Using Minimal Strong Triangulation

: In recent years, mixed Bayesian networks have received increasing attention across various fields for probabilistic reasoning. Though many studies have been devoted to propagation computation on strong junction trees for mixed Bayesian networks, few have addressed the construction of appropriate strong junction trees. In this work, we establish a connection between the minimal strong triangulation for marked graphs and the minimal triangulation for star graphs. We further propose a minimal strong triangulation method for the moral graph of mixed Bayesian networks and develop a polynomial-time algorithm to derive a strong junction tree from this minimal strong triangulation. Moreover, we also focus on the propagation computation of all posteriors on this derived strong junction tree. We conducted multiple numerical experiments to evaluate the performance of our proposed method, demonstrating significant improvements in computational efficiency compared to existing approaches. Experimental results indicate that our minimal strong triangulation approach provides a robust framework for efficient probabilistic inference in mixed Bayesian networks.


Introduction
Originating from artificial intelligence [1], Bayesian networks offer intuitive graphical representations for joint probability distributions and efficient algorithms for probabilistic reasoning.These networks are powerful tools for managing uncertainty and are applied across various fields, including bioinformatics [2,3], computational biology [4][5][6], and neuroscience [7], among others.
Over the past two decades, there has been a growing interest in Bayesian networks that incorporate both continuous and discrete variables [8].This paper focuses on probabilistic models of such mixed networks, specifically those constrained to conditional linear Gaussian (CLG) distributions.Lauritzen [9] introduced a method for the exact local computation of means and variances in mixed Bayesian networks, but it suffered from numerical stability issues.Subsequently, Lauritzen and Jensen [10] developed an alternative, stable local computation scheme for these conditional Gaussian networks.However, this scheme is intricate, involving evaluations of matrix generalized inverses and recursive combinations of potentials.Cowell [11] proposed an alternative architecture utilizing exchange operations for variable elimination within an elimination tree.These exchange operations avoid complex matrix computations.Building upon the push and exchange operations, Madsen [12] extended lazy propagation, leveraging independence relations induced by evidence to enhance inference efficiency.The push operation is employed to obtain the posterior of continuous variables, while the exchange operation ensures the maintenance of a valid directed acyclic graph (DAG) structure during the belief update process.
Previous works have significantly improved the propagation computation for mixed Bayesian networks, primarily relying on strong junction trees.However, to our best knowledge, few works discuss how to construct appropriate strong junction trees.In this paper, we try to construct a strong junction tree under a minimal strong triangulation framework, which employs a polynomial-time minimal triangulation algorithm to minimize the number of fill-in edges.Additionally, we focus on the propagation computation of all posteriors.The process primarily involves three steps: finding triangulation and junction trees, performing inward and outward message passing, and computing posteriors for all nonevidence variables.This propagation computation based on junction trees facilitates the sharing of computations [13,14] among different posteriors.As suggested by Madsen [12], we applied exchange operations to eliminate both continuous and discrete variables during message passing.We conducted multiple simulation experiments, comparing our method with two existing methods: CGBayesNets [15] and BayesNetBP [16].The results indicate that our method outperforms the existing methods in computational speed.
The primary contribution of this paper is the construction of a strong junction tree within a minimal strong triangulation framework.Specifically, we establish a connection between the minimal strong triangulation for marked graphs and the minimal triangulation for star graphs.We further propose a minimal strong triangulation method for the moral graph of mixed Bayesian networks and develop a polynomial-time algorithm to derive a strong junction tree from this minimal strong triangulation.Additionally, we present two theorems that guarantee our algorithm can successfully identify strong junction trees for mixed Bayesian networks.
This paper is organized as follows.In Section 2, the notation and definitions used in mixed Bayesian networks are reviewed.In Section 3, we discuss the minimal strong triangulation for marked graphs.Section 4 describes our method to find strong junction trees for mixed Bayesian networks by using minimal strong triangulations.In Section 5, we give a detailed analysis on exchange operation.Section 6 presents message passing on strong junction trees and posterior computation.Some numerical experimental results are shown in Section 7. Finally, Section 8 concludes this paper.

Basic Notions on Decomposition and Triangulation for Marked Graphs
Considering two graphs A graph is complete if every pair of distinct vertices is adjacent.A clique is a maximal vertex set inducing a complete graph.Given a graph G = (V, E), a vertex subset S is called a separator for two disjoint vertex subsets A, B if every path in G between some x ∈ A and y ∈ B contains a vertex in S. We also say that S separates A and B in G.
are separated by S in G, and S is complete.
Let G = (V = ∆ Γ, E) be an undirected graph with two types of vertices, ∆ and Γ, and this graph G is called a marked graph.Moral graphs of directed acyclic graphs (DAGs) of mixed Bayesian networks are marked graphs and those graph operations introduced below can be applied to moral graphs.
) is a decomposition of G and any of the three conditions A ⊆ Γ, B ⊆ Γ or S ⊆ ∆ holds.
Leimer introduced the concept of marked graphs and their strong decomposition [17].We specifically adopt strong decomposition in Definition 2 for marked graphs because it preserves the secondary structure [18] of the marked graph, as indicated by the closure [19] of marginal distributions during inward message passing.
A graph G is called triangulated if G is complete or there is a decomposition (A, B, S) of G such that G(A S) and G(B S) are triangulated.Similarly, a marked graph G = (V = Γ ∆, E) is called strongly triangulated if G is complete or there is a strong decomposition (A, B, S) of G such that G(A S) and G(B S) are strongly triangulated.Given a marked graph G, we can construct a star graph from G by adding a vertex ⋆ to its vertex set and connecting ⋆ with every discrete variable.This star graph is denoted as G ⋆ = (V {⋆}, E ⋆ ), where E ⋆ = {(δ, ⋆)|δ ∈ ∆} E. For any subset B ⊆ V, the star graph (G(B)) ⋆ of the induced graph G(B) is just the induced subgraph G ⋆ (B {⋆}) of the star graph G ⋆ .The following Proposition 1 [20] characterizes the equivalent relationship between the strongly triangulated marked graph and its star graph.

Proposition 1 ([20]).
A marked graph G is strongly triangulated if and only if G ⋆ is triangulated.

Minimal Strong Triangulation for Marked Graphs
is not strongly triangulated for any proper subset F ′ of F. The following theorem establishes the relationship between minimal strong triangulations and minimal triangulations.
Proof.(⇒) If H = (V, E F) is a minimal strong triangulation of G, then H is strongly triangulated and H ⋆ is triangulated.Denote E 1 as the edge set {(δ, ⋆)|δ ∈ ∆}.Thus, Theorem 1 indicates that a minimal strong triangulation for a marked graph G can be obtained from a minimal triangulation for the star graph G * , which is an undirected graph.Specifically, we can apply the minimal triangulation algorithm MSC-M [21] to G * , and subsequently obtain a minimal strong triangulation for G by dropping the star vertex.
Notice that the weight w(G) of a marked graph G can be defined as the minimum sum of the weights of all the cliques over all possible strong triangulations of G, that is, where K H is the set of all the cliques in the triangulation H.The weight w(C) of a clique C is calculated as s + sm(m + 3)/2 [22], where s represents the product of the number of states of the discrete nodes in C, and m denotes the number of continuous nodes in C. In w(C), s has an exponential weight on the number of discrete nodes in C, while m(m + 3)/2 acts as a polynomial weight on the number of continuous nodes.Unlike the optimal triangulation computation for w(G) with an NP-hard computational complexity [23], our approach in this paper employs a polynomial-time minimal triangulation strategy to minimize the number of fill-in edges.Specifically, the time cost [21] of a minimal triangulation is O(|V||E|), where |V| and |E| are the number of vertices and edges, respectively.
In this paper, we focus on directed acyclic graphs (DAGs) G = (V, E), where V is marked into two groups, ∆ and Γ.The vertices in ∆ represent discrete variables, while those in Γ represent continuous variables.Since moral graphs of those DAGs with two types of variables are marked graphs, then we deduce the following corollary from Theorem 1: Corollary 1.Let G m be the moral graph of a DAG G with two types of variables; H is a minimal strong triangulation of G m if and only if H ⋆ is a minimal triangulation of (G m ) ⋆ .

Construction of Strong Junction Trees for Moral Graphs
Given a DAG G with two types of variables, its moral graph G m is a marked graph.Utilizing the MCS-M algorithm [21], we can derive a minimal triangulation graph K of (G m ) ⋆ .In this algorithm, the star vertex ⋆ can be designated as the first numbered vertex.Consequently, the resulting graph K manifests as a star graph of some graph H, as ⋆ remains unconnected to any continuous vertex.Therefore, H is a minimal strong triangulation of G m according to Corollary 1.The time complexity of the MCS-M algorithm for , where V 1 and E 1 denote the set of vertices and edges of (G m ) ⋆ , respectively.
The SMCS algorithm [17] shares a similar definition with the MCS algorithm [24], but always chooses discrete vertices if there is some discrete vertex with the maximum weight.Utilizing the SMCS algorithm enables us to acquire a D-numbering [25] Proof.Without loss of generalization, we assumes that V Γ ̸ = ∅, x n ∈ ∆, and m ≥ 2. Since H is strongly triangulated and the partition (V \ C m , R m , S m ) is a decomposition of H, this partition is also a strong decomposition of H. Otherwise, there are means there is only one clique in H, and the result is trivial.Since x n is the first vertex numbered by the SMCS algorithm, the cardinality of the discrete vertex is the same as that of the continuous vertex in C m .Thus, the discrete vertex is always numbered earlier than the continuous vertex by the SMCS.The number of discrete vertex in R m is always bigger than that of the continuous vertex in S m .It is a contradiction.So, S m ⊆ ∆ if R m ∆ ̸ = ∅.Furthermore, since the subgraph H(V \ R m ) is also strongly triangulated, the conclusion is obtained by recursive decompositions.
As demonstrated above, the SMCS algorithm can be employed to generate a strong D-ordered clique sequence for a strong triangulation H, enabling the subsequent construction of a strong junction tree using this sequence.The time complexity for utilizing the SMCS algorithm [17] to find a strong D-ordered clique sequence and construct a strong , where V 2 and E 2 denote the sets of vertices and edges of H, respectively.
From Corollary 1 and Theorem 2, we can devise the following Algorithm 1 with polynomial time complexity to find strong junction trees for mixed Bayesian networks:

The Exchange Operation for Mixed Bayesian Networks
In this paper, our focus lies on directed acyclic graphs (DAGs) denoted as G = (V = ∆ Γ, E).Here, the vertices of ∆ represent discrete variables, while those of Γ represent continuous variables.To exploit conditional linear Gaussian (CLG) distributions for modeling, we make an additional assumption that no continuous variables have discrete children in G.A mixed Bayesian network N = (G, P, F ) with CLG distributions comprises a DAG G, a set of conditional probability distributions P = {p(X|pa(X)) : X ∈ ∆}, and a set of CLG density functions F = { f (Y|pa(Y)) : Y ∈ Γ}, where pa(Z) is the set of the parents of Z in G.A mixed Bayesian network N has a joint distribution of the form: Let Y be a continuous variable and To compute all the marginal posteriors given evidence in mixed Besian networks, we employ propagation computation based on message passing within a strong junction tree structure.During message passing, barren variables [26] in the local structure can be directly eliminated without computation.And exchange operations [11] can avoid complex matrix operations, and make for finding barren variables [27].In this section, we will discuss the usage of exchange operations in mixed Bayesian networks in detail.
Assume that Z, Y, Y 1 , • • • , Y n are continuous variables and there is a pair of conditional Gaussian distributions: The exchange operation, equivalent to Theorem 1 of the work [28], converts (1) and (2) into another pair of distributions: where Z y i ).These formulae from Cowell [11] are essentially from the Bayes' theorem.( 3) and ( 4) keep the same joint distribution as (1) and (2).
Let us consider exchange operations in mixed Bayesian networks N = (G, P, F ). Assume that Y, Z ∈ Γ, Y ∈ pa(Z) and there are no other directed paths from Y to Z in G.The parent set pa(Y) of Y is partitioned into A 1 and A 2 , where The conditional density f Y of Y and f Z of Z have the following forms: Since there are no other directed paths from Y to Z except Y → Z, variables in pa(Z) \ {Y} are non-descendants of Y.By conditional independence relations in Bayesian networks, we have that Similarly, we also have that since any parent vertex of Y is non-descendant of Z.
Thus from Equations ( 5)-( 8), we have that: For any 1 Z ≡ 0. The exchange operation can convert Equations ( 5) and ( 6) into the following pair f ′ Y , f ′ Z of conditional densities: where Following the exchange operation, we reconstruct the mixed Bayesian network G into G ′ based on the altered conditional distributions of Y and Z. Specifically, the new network G ′ is the same as the original G except that Z → Y and X → Z, X ′ → Y in G ′ for any X ∈ pa(Y) and X ′ ∈ pa(Z) \ {Y}.And since Y ∈ pa(Z) and no other directed path exists from Y to Z in G, it can be verified that the new network G ′ is also a DAG.Thus, N ′ = (G ′ , P, F ′ ) is also a mixed Bayesian network with the same joint distribution with N , where The process from G to G ′ is called arc-reversal.By using arc-reversals step by step, any continuous variable can become a node with no children.Thus, the exchange operation provides us a method to eliminate continuous variables and avoids matrix operations [11].Now, we consider handling discrete variables by using the exchange operation.Let the corresponding probabilities of Q and X, respectively.The exchange operation converts the above pair of conditional probabilities into p(X|X 1 , • • • , X n ) and p(Q|X, X 1 , • • • , X n ), which maintain the same joint probability distribution of the original pair: From ( 13) and ( 14), the exchange operation converts p Q and p X into the following two conditional probabilities p ′ Q , p ′ X : by the assumption that there are no other directed paths from Q to X in G.
After the exchange operation, similar to the case of continuous variables, the mixed Bayesian network G can be reconstructed into G ′ according to the changed conditional distributions of Q and X. G ′ is the same as and G ′ is also a DAG.Thus, N ′ = (G ′ , P ′ , F ) is also a mixed Bayesian network with the same joint distribution with N , where The process from G to G ′ is also called arc-reversal.In this paper, arc-reversals for discrete variables with no continuous children are only used in message passing for further finding barren variables, and the lazy propagation [12,27,29] is also utilized to improve the computational efficiency.Following message passing, direct variable eliminations [30] are used to eliminate discrete variables in the marginal posterior computation.

Message Passing and Posterior Computing
Let H(p) denote the head variable of p ∈ P and T(p) the tail variables of p ∈ P.
denote the domain of p. H( f ), T( f ) and dom( f ) are similarly defined for f ∈ F .A potential π is represented as (P π , F π ), where P π denotes the set of conditional probability and F π denotes that related to conditional density.The combination of two potentials, π 1 = (P π 1 , F π 1 ) and π 2 = (P π 2 , F π 2 ), is defined as Let T = (C, E T ) be a strong junction tree of H, which is a minimal strong triangulation of G m .For any p ∈ P, choose a clique C ∈ C such that H(p) T(p) ⊆ C and allocate p to C. For any f ∈ F , choose a clique C ∈ C such that H( f ) T( f ) ⊆ C and allocate f to C. For any clique A ∈ C, A has an original potential π A = (P π A , F π A ), where P π A = {p ∈ P | p is allocated to A} and Assume that two cliques A, B ∈ C are adjacent on T and the current potential A .The information potential π A→B from A to B is computed by the following operations: (1) Find barren variables N [12] with respect to set S, evidence ϵ and DAG G ′ .Remove conditional probabilities and densities from π ′ A , whose head variable is contained in N, and the remaining potential is denoted by π 1 .Remove N from G ′ and the remaining graph is denoted by G ′′ = G ′ (D \ N).Let X be the set d-connected to S about ϵ R in G ′′ .Remove conditional probabilities and densities from π 1 , whose domain has no variables in S X.The resulting potential is denoted by π 2 .The set X can be partitioned into X d and X c , which denote discrete and continuous non-evidence variables, respectively.
(2) Eliminate variables in X c by exchange operations.Based on these operations, conditional densities of π 2 are changed, and those densities whose head variable is contained in X c are removed from π 2 .The remaining potential is denoted by π 3 .
( By these operations, the remaining potential is denoted by A is the combination of the original potential of A with all the information from the clique C ∈ adj T (A) \ {B} to A. Then, π A→B = (π A ( ∀X ∈ V, there is a clique A containing X.To compute the posterior distribution For Equation (20), exchange operations are performed on continuous variables, and the sum-out procedure [14] is used to eliminate discrete variables.Thus we can obtain the following Algorithm 2 for computing all the posteriors: Algorithm 2: Propagation Computation for All the Posteriors.
Input : A mixed Bayesian network N = (G, P, F ) with given evidence variable set ϵ = e Output : All the posteriors for non-evidence variables 1 Construct a strong junction tree T on G by Algorithm 1; 2 Choose a clique in the strong junction tree T as the root R; 3 Perform inward message passing to the root such that root R receives information from its neighbors; 4 Perform outward message passing from the root such that each clique in T receives information from its neighbors; 5 Compute the posterior P(X|ϵ = e) for each non-evidence variable X by Equations ( 17)-(20).

Numerical Experiment
Our experiments were conducted on a collection of randomly generated CLG sparse Bayesian networks consisting of n variables and ⌊n log(log(n))⌋ edges.Referring to the work [12], we set the number of variables n to {50, 75, 100} and the ratio of continuous variables to {0, 0.25, 0.5, 0.75, 1}.With this simulation setting, we can evaluate the performances of our method across various sizes of mixed Bayesian networks and different proportions of continuous variables.For each network size, each proportion of continuous variables, and each size of the evidence set, one hundred networks were randomly generated, with each discrete variable having two states.These experiments were conducted using C++ on a PC equipped with an Intel Core i9-9900X CPU running at 3.5GHz.We mainly utilized three standard libraries in C++: <vector>, <random>, and <algorithm>.The <vector> library is a versatile container for dynamic arrays but often allocates more memory than necessary, leading to potential overhead.The <random> library offers a comprehensive set of tools for generating random numbers but may produce inconsistent sequences across different platforms due to varying implementations.The <algorithm> library provides a rich set of functions for operations on data sequences, like sorting and searching, but does not directly support complex data structures such as graphs and trees.
In our experiments, the average time required to find minimal strong triangulations and strong junction trees for networks with 50, 75, and 100 variables is approximately 0.0005, 0.0014, and 0.0031 s, respectively.These results demonstrate that the algorithm outlined in Section 4 is efficient in producing strong junction trees for mixed Bayesian networks.Subsequent belief updates based on these strong junction trees are also carried out in our experiments.
Figures 1-3 depict the average time costs of the propagation computation of our method, including both message passing and the posterior computation for all nonevidence variables, across networks with 50, 75, and 100 variables.Meanwhile, Figures 4-6 display the median time costs of the propagation computation.As shown in Figures 1-6, the speed of belief updates in networks only containing continuous variables remains consistently fast, whether considering the average or median time costs.Moreover, as shown in Figures 5 and 6, at low numbers of instantiations, a low proportion of continuous variables tends to increase the median time costs for belief updates in Bayesian networks.This is due to the state space size of the largest clique potentially being larger in networks with a lower proportion of continuous variables compared to networks with a higher proportion, under similar conditions.However, once the number of instantiations exceeds half of the variables, the discrepancy among networks with varying proportions of continuous variables becomes small.This reduction occurs because the average largest discrete domain size decreases as the size of the evidence set increases [12].
A notable observation when comparing the average time costs depicted in Figure 3 with the median time costs shown in Figure 6 is the significant disparity between the two measures.This disparity arises primarily from the considerable variance in time costs resulting from the random sampling of Bayesian networks.For instance, in networks with 100 variables, a continuous variable proportion of 0.5, and 10 pieces of evidence, the average time is 54.3024 s with a standard deviation of 538.1081, while the median time is 0.1355 s.It is evident that this huge average time is caused by certain outliers in the time cost for 100 randomly generated networks.The reason for this abnormal result may lie in the generation of large-scale cliques with discrete variables under certain random seeds when the size of the evidence set is ten.For a further analysis of the efficiency of our method, we conducted a comparative study with two existing packages: CGBayesNets (a MATLAB package [15]) and BayesNetBP (an R package [16]).Figure 7 illustrates the average time costs on propagation computation of three methods for 100 randomly generated networks, each comprising 50 variables.In the left panel, the size of the evidence set is fixed at five, while in the right panel, the proportion of continuous variables is set to 0.5.As shown in both panels, it is evident that our method outperforms the other two methods in terms of computational speed.

Conclusions
The Bayesian network is a prominent predictive modeling framework, where its nodes often represent genes or proteins interconnected by edges denoting relationships.These edges, informed by experimental data, impart a predictive capability to Bayesian computational methods [15,31], capturing both the strength and direction of connections.Despite over two decades of development in mixed Bayesian networks, combining continuous and discrete variables [8], the field still lacks efficient algorithms for identifying appropriate strong junction trees in their moral graphs.In our study, we address this gap by investigating the minimal strong triangulation of moral graphs and presenting a polynomial-time algorithm for deriving strong junction trees based on this approach.Moreover, we also consider the message passing computation of all posteriors on this derived strong junction tree.Our experimental findings affirm the superior efficiency of our algorithm compared to existing methods for message passing in mixed Bayesian networks.The proposed approach not only demonstrates remarkable improvements in computational speed but also provides a robust framework for efficient probabilistic inference in mixed Bayesian networks, making it a valuable tool for diverse applications in bioinformatics, computational biology, and neuroscience.

Algorithm 1 :1
Construct a Strong Junction Tree Input : A DAG G with two types of variables Output : A strong junction tree T on G Construct the Star Graph: Create the star graph (G m ) ⋆ from the moral graph G m of the DAG G; 2 Minimal Triangulation: Apply the MCS-M algorithm to the star graph (G m ) ⋆ with the star chosen as the first numbered vertex to derive its minimal triangulation graph K; 3 Obtain Strong Triangulation: Remove the star vertex ⋆ from K to yield a minimal strong triangulation graph H of G m ; 4 Construct Strong Junction Tree: Apply the SMCS algorithm to the minimal strong triangulation graph H for generating a strong junction tree T.
as the union of domains of all variable in P π ′ A and F π ′ A , and R = D \ S, where S = A B. Let G ′ be a DAG induced by the current potential π ′ Eliminate variables in X d with no continuous children by exchange operations.Let Φ(x) denote {p ∈ π 3 |x ∈ dom(p)}.If | p∈Φ(x) dom(p)| > δ for some proper δ, exchange operations for eliminating x are delayed.Based on these operations, conditional probabilities of π 3 are changed, and those probabilities whose head variable x is contained in X d and | p∈Φ(x) dom(p)| ≤ δ are removed from π 3 .

Figure 1 .
Figure 1.Average time in seconds for belief update in networks with 50 variables.

Figure 2 .
Figure 2. Average time in seconds for belief update in networks with 75 variables.

Figure 3 .
Figure 3. Average time in seconds for belief update in networks with 100 variables.

Figure 5 .
Figure 5. Median time in seconds for belief update in networks with 75 variables.

Figure 6 .
Figure 6.Median time in seconds for belief update in networks with 100 variables.

Figure 7 .
Figure 7. Time costs of propagation computation for our method, CGBayesNets, and BayesNetBP.