A Novel Algorithm for Merging Bayesian Networks
Abstract
1. Introduction
1.1. Probability Theory
1.2. Bayesian Network
Review of Algorithms for Bayesian Networks
- Algorithms for learning structure include score-based methods and structure-based methods. Score-based methods aim to learn the network structure by searching for the structure that maximizes a scoring function, such as the Bayesian Information Criterion (BIC) [8] or the Akaike Information Criterion (AIC) [9]. These methods do not impose any constraints on the search space and are often used when no prior knowledge about the network structure is available. Structure-based methods, on the other hand, use prior knowledge about the structure of the network to constrain the search space. These methods impose a set of constraints on the possible network structures, such as the absence of certain edges or the presence of specific edge patterns. Examples of structure-based algorithms include constraint-based algorithms such as the PC (Peter and Clark) algorithm [10] and the Grow–Shrink algorithm [11].
- Algorithms for learning parameters aim to estimate the parameters of a given network structure. These algorithms can be used in conjunction with either score-based or structure-based methods. Examples of parameter-learning algorithms include maximum likelihood estimation [6] and Bayesian parameter estimation [2] (a minimal maximum-likelihood sketch follows this list).
- Inference algorithms are used to compute the probabilities of events of interest in a Bayesian network. Exact inference methods, such as variable elimination [2], provide the exact probability distribution of the events of interest, while approximate inference methods, such as Monte Carlo methods [2], provide an approximation of the probability distribution.
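To make the parameter-learning item above concrete, the following is a minimal sketch (not taken from the paper) of maximum likelihood CPT estimation for a discrete dataset: the CPT of a node is obtained by counting relative frequencies for every parent configuration. The use of pandas, the function name mle_cpt, and the toy variables are illustrative assumptions.

```python
# Illustrative only (not from the paper): maximum likelihood estimation of a
# node's conditional probability table (CPT) from a discrete dataset.
import pandas as pd

def mle_cpt(data, node, parents):
    """Estimate P(node | parents) by relative frequencies (MLE for discrete data)."""
    if not parents:
        return data[node].value_counts(normalize=True).sort_index()
    # One row per observed parent configuration, one column per state of the node,
    # each row normalized to sum to 1.
    return pd.crosstab(index=[data[p] for p in parents],
                       columns=data[node],
                       normalize="index")

# Toy usage: P(WetGrass | Rain, Sprinkler) from six observations.
df = pd.DataFrame({
    "Rain":      ["yes", "yes", "no",  "no",  "no", "yes"],
    "Sprinkler": ["no",  "no",  "yes", "yes", "no", "no"],
    "WetGrass":  ["yes", "yes", "yes", "no",  "no", "yes"],
})
print(mle_cpt(df, "WetGrass", ["Rain", "Sprinkler"]))
```

This frequency counting is presumably what the "CPT counting" mentioned later in Section 2.2 refers to; for incomplete data, more involved estimators such as the EM algorithm [28] play the same role.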
1.3. State of the Art
2. Materials and Methods
- Mathematical description of the algorithm;
- Solutions to issues related to cycles and CPT computation;
- Description of the node evaluation method used for algorithm evaluation.
2.1. Evaluation Method
2.2. Algorithm
- Two Bayesian networks, each containing a structure and conditional probability tables assigned to each node.
- Score values for all nodes in the networks, obtained using a chosen score method.
- A dataset that corresponds to the nodes in the Bayesian network structure.
- The issue of conditional probability tables (CPTs) will not be covered in the algorithm description to avoid unnecessary complexity. They will be addressed in a separate section.
- The dataset is not required by the merging algorithm itself; it is only needed for computing node scores, counting CPTs, and evaluating the Bayesian networks.
- The algorithm does not depend on the chosen node score method, as long as a lower node score means that the dataset fits that node better (an illustrative per-node score is sketched after this list).
- Throughout the algorithm, the input networks will be referred to as “first” and “second”, while the resulting network will be referred to as “final”.
- The final network is initially empty, and nodes and edges are added during the algorithm’s execution.
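To illustrate the "lower is better" convention assumed in the list above, here is one possible per-node score: the negative log-likelihood of a node given its parents plus a BIC-style complexity penalty. This is a sketch of a plausible score, not necessarily the score used by the authors; the function name node_score and the exact penalty form are assumptions.

```python
# Illustrative BIC-style per-node score (not necessarily the score used in the
# paper): negative log-likelihood of the node given its parents plus a
# complexity penalty, so that LOWER values mean a better fit, as assumed above.
import numpy as np
import pandas as pd

def node_score(data, node, parents):
    n = len(data)
    if parents:
        joint = data.groupby(parents + [node]).size()        # N(parent config, node state)
        per_parent = data.groupby(parents).size()             # N(parent config)
        cond = joint / per_parent.reindex(joint.index.droplevel(-1)).to_numpy()
        neg_loglik = -(joint * np.log(cond)).sum()
        # Penalty counts only parent configurations observed in the data.
        n_params = len(per_parent) * (data[node].nunique() - 1)
    else:
        counts = data[node].value_counts()
        neg_loglik = -(counts * np.log(counts / n)).sum()
        n_params = data[node].nunique() - 1
    return neg_loglik + 0.5 * n_params * np.log(n)             # lower = better fit
```

Under this convention, the score-difference vector used by the algorithm (Section 2.3) is simply the element-wise absolute difference of the two input networks' per-node scores.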
2.2.1. Algorithm Inputs
2.3. Mathematical Description
- For each node $x$ in the input dataset $X$:
  - (a) Retrieve the set of parent nodes of node $x$ in the first input network, denoted as $\mathrm{Pa}_1(x)$.
  - (b) Retrieve the set of parent nodes of node $x$ in the second input network, denoted as $\mathrm{Pa}_2(x)$.
  - (c) If the sets of parents are equal ($\mathrm{Pa}_1(x) = \mathrm{Pa}_2(x)$):
    - Retrieve the conditional probability table of node $x$ in the first input network, denoted as $\mathrm{CPT}_1(x)$.
    - Retrieve the conditional probability table of node $x$ in the second input network, denoted as $\mathrm{CPT}_2(x)$.
    - If the conditional probability tables are equal ($\mathrm{CPT}_1(x) = \mathrm{CPT}_2(x)$), add node $x$ to the set of symmetric nodes $X_{sym}$.
  - (d) If the sets of parents are not equal, move on to the next node.
- Add symmetric nodes to the final sets.
  - (a) For each node $x$ in the set of symmetric nodes $X_{sym}$:
    - Add the node $x$ and its parents to the final node set $X_F$: $X_F := X_F \cup \{x\} \cup \mathrm{Pa}_1(x)$.
    - Add the edges between the parents and node $x$ to the final edge set $E_F$: $E_F := E_F \cup \{e_{st} \in E_1 \mid x_t = x\}$, where $e_{st}$ represents the edge from node $x_s$ to node $x_t$ in $E_1$. In this way, we select from $E_1$ exactly the edges whose target node is $x$; the edge index $k$ is not necessary in this context.
    - Add the conditional probability table of node $x$ to the final set of conditional probability tables $C_F$: $C_F := C_F \cup \{\mathrm{CPT}_1(x)\}$ (the CPTs are identical in both inputs, so either can be used).
- Choose a node that was not already chosen from X. In the first step of the Merging Algorithm, a node is chosen from the input dataset X based on the difference between node scores. The node with the highest score difference is selected, and this node will not be selected again. Mathematically, the index with the highest value in the vector of absolute score differences $S$ (with $S_k = |S_{1,k} - S_{2,k}|$) is found using the argmax function and stored in the variable $i$: $i = \operatorname{argmax}_k S_k$. The selected node is denoted $x_i$, and its score difference is set to zero, $S_i := 0$, so that it cannot be selected again in the future.
- Choose input Bayesian network. The second step is to choose an input Bayesian network. This is determined by comparing the score values of the selected node in the corresponding score vectors $S_1$ and $S_2$. Specifically, we check whether the score of the selected node in $S_1$ is greater than its score in $S_2$ (i.e., whether $S_{1,i} > S_{2,i}$). Since a lower node score means a better fit, if this is the case we choose the second network as the input network for the current step; otherwise, we choose the first network. We denote the index of the chosen input network as $j$, defined as $j = 2$ if $S_{1,i} > S_{2,i}$ and $j = 1$ otherwise. Once the input network has been selected, we can proceed to the next step of the algorithm.
- Add the selected node into the final network. In the third step of the algorithm, we check whether the selected node $x_i$ is already present in the final node set $X_F$. This is done by introducing the variable $a$, which is set to 0 if $x_i$ is already present in $X_F$ and to 1 otherwise; mathematically, $a = 0$ if $x_i \in X_F$ and $a = 1$ if $x_i \notin X_F$. This variable will be useful in later steps of the algorithm. If $a = 1$, we add the selected node to the final node set, i.e., $X_F := X_F \cup \{x_i\}$. Otherwise nothing happens and $X_F$ remains the same.
- Search the input edges of the selected node $x_i$. In step 4, we search for edges in the chosen input Bayesian network that are directed towards the selected node $x_i$. We accomplish this by iterating over all the edges in the edge set $E_j$ of the selected input network (indexed by $j$) and checking whether the target node of each edge is equal to the selected node $x_i$. The edges that meet this condition are saved into the set $H$: $H = \{ e_k = (x_s, x_t) \in E_j \mid t = i \}$. This notation specifies the set of edges in $E_j$ whose target-node index $t$ is equal to the index $i$ of the selected node $x_i$; the variable $k$ iterates over all edges in $E_j$, while $s$ iterates over all possible source nodes in the node set $X$. The condition $e_k \in E_j$ ensures that only edges from the chosen input network are considered.
- Find parents of the selected node $x_i$. We have already identified the edges leading to the selected node $x_i$ in the set $H$. The goal of this step is to find the parents of the selected node $x_i$. Mathematically, we create a new set of nodes $P$ that includes all the parents of the selected node $x_i$, regardless of whether they are already in the final node set $X_F$. The set $P$ is defined as $P = \{ x_p \in X \mid \exists\, e_k = (x_s, x_t) \in H : s = p \}$. This notation goes through all nodes in the set $X$ and selects those for which there exists an edge in the set $H$ whose source-node index $s$ equals the index $p$ of the node. The relevant nodes are then saved to the set $P$.
- Algorithm branching based on the value of the variable $a$. The variable $a$ was introduced in point 5 in order to determine whether the selected node $x_i$ is already present in the final node set $X_F$. Depending on its value, the algorithm will branch into two paths:
- If $a = 0$, the selected node $x_i$ was already present in the final node set $X_F$ from a previous iteration. In this case, we proceed to point 9, keeping in mind that adding this node's parent edges may create a cycle.
- If $a = 1$, the selected node $x_i$ was only added to the final node set $X_F$ in this iteration. Therefore, we can proceed to point 12; it is important to note that no cycle can occur in this case.
The branching is based on a simple principle: if the selected node was newly added to the final node set $X_F$ in this iteration, it has no outgoing edges in the final network yet, so no cycle can occur and we can proceed to the next step. If, however, the node was already present in $X_F$, we must be careful not to introduce a cycle by adding its parent edges, and so we proceed along a different path in the algorithm.
- Find parents of the selected node $x_i$ present in the set $X_F$. In this step, we are in the branch where the selected node $x_i$ was already present in the final node set $X_F$. Therefore, we need to determine whether any parents of $x_i$ are already in the set $X_F$. To achieve this, we take the intersection of the sets $X_F$ and $P$, where $P$ is the set of parents of the selected node $x_i$, and denote the resulting set by $T$: $T = X_F \cap P$. If $T$ is empty, no parents of $x_i$ are in the final node set $X_F$, and therefore a cycle cannot be created; in this case, the algorithm continues with point 12. Otherwise, it continues with point 10.
- Find cycle. Unfortunately, we now risk that adding the selected node $x_i$ with its parent edges will cause a cycle in the final network. We therefore prepare a set $U$ of all relevant edges in which we will search for the cycle. It consists of the edge set of the final network, $E_F$, and the set $H$ of edges between the selected node and its parents: $U = E_F \cup H$. We then apply a cycle search algorithm to this set. Detecting cycles in directed graphs is a well-known problem in graph theory, and there are various algorithms to accomplish this task; discussing them is outside the scope of this particular algorithm. For completeness, we reference a well-known algorithm for cycle detection in directed graphs by Tarjan [24]. All we need here is the information whether the set $U$ contains a cycle. We introduce the auxiliary variable $cycle$, whose value is 1 if $U$ contains a directed cycle and 0 otherwise.
- New setup while keeping the same selected node $x_i$. We have reached a point where we cannot add the edges between the selected node and its parents because of the cycle they would create in the final network. If this is the first time the selected node is being processed in the current iteration, we can try selecting the other input Bayesian network, even though the selected node has a worse score in it. In this case, we switch the value of the variable $j$ to the other network, $j := 3 - j$. We also clear all the temporary sets ($H$, $P$, $T$, $U$), set the variable $a$ to 0 (since the selected node is already present in the set $X_F$), and continue with point 6. If, however, this is the second time the selected node is being processed in the current iteration and there is still a cycle in the final network, the algorithm leaves the node as it is. This means that it remains in the final node set $X_F$ without the conflicting edges, and the algorithm continues at point 14. Note that searching for cycles in directed graphs is a separate topic in graph theory, and we will not delve into it here; for more information, refer to [24]. Regrettably, this approach could result in the final network being no better than the input networks at this node. Additional information on this matter can be found in Section 2.5.
- Add missing parents into the final node set $X_F$. At this point, we add the missing parents to the final node set $X_F$. The parents of the selected node $x_i$ are available in the set $P$, and we simply unite it with the final node set: $X_F := X_F \cup P$. Using the union also avoids problems when a parent is present in both $P$ and $X_F$.
- Add edges into the final edge set $E_F$. We now do the same with the relevant edges of the selected node $x_i$ (i.e., the edges leading to $x_i$) stored in the set $H$. Again, we use the union of this set and the final edge set: $E_F := E_F \cup H$.
- Assignment of the local probability function. So far, all points of the algorithm have dealt with the structure of the network. Now it is time to assign a local probability function to the selected node $x_i$. The assignment depends on the previous point of the algorithm:
- If the previous point is point 13, the selected node and its input edges correspond to the node and the input edges from the selected Bayesian network. Thus, the conditional probability function from the selected Bayesian network is simply added into the final set of conditional probability functions $C_F$: $C_F := C_F \cup \{\mathrm{CPT}_j(x_i)\}$.
- If the previous point is point 11, we cannot assign a conditional probability function directly from either input Bayesian network. Section 2.6 deals with calculating the conditional probability function of a selected node when it cannot be taken from any of the input networks. For this description, it is enough to somehow obtain a conditional probability function, denoted $\mathrm{CPT}_{new}(x_i)$, and assign it to the selected node $x_i$; it is then added into the final set of conditional probability functions $C_F$: $C_F := C_F \cup \{\mathrm{CPT}_{new}(x_i)\}$.
After the selected node and its parents, as well as the edges between them, have been added to the final network (where possible), we perform the setup for the next iteration. This involves setting all variables and auxiliary sets ($j$, $a$, $cycle$, $H$, $P$, $T$, $U$) to their default values. If the vector of score-value differences $S$ then contains only zero values, all nodes have been processed and the algorithm terminates; otherwise, it returns to the first step and selects the next node.
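To tie the description above together, the following condensed Python sketch walks through the same procedure: it first collects symmetric nodes, then repeatedly selects the node with the largest score difference, prefers the input network in which that node scores better (lower), and falls back to the other network, or leaves the node without the conflicting edges, when a cycle would be created. It is an illustration written for this description, not the authors' reference implementation: the dictionary-based network representation, the per-node score dictionaries, the has_cycle helper (a plain depth-first search used in place of Tarjan's algorithm [24]), and the recompute_cpt callback standing in for Section 2.6 are all assumptions.

```python
# Condensed, illustrative sketch of the merging procedure described above; not the
# authors' reference implementation. Networks are plain dicts with an "edges" set
# of (source, target) tuples and a "cpt" dict mapping node -> CPT (any comparable
# representation, e.g., nested dicts). score1/score2 map nodes to per-node scores
# (lower = better fit); recompute_cpt stands in for the calculation of Section 2.6.

def has_cycle(edges):
    """Plain DFS check for a directed cycle (a simple stand-in for Tarjan [24])."""
    adj = {}
    for s, t in edges:
        adj.setdefault(s, []).append(t)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(u):
        color[u] = GRAY
        for v in adj.get(u, []):
            c = color.get(v, WHITE)
            if c == GRAY or (c == WHITE and visit(v)):
                return True            # back edge found -> cycle
        color[u] = BLACK
        return False

    return any(visit(u) for u in list(adj) if color.get(u, WHITE) == WHITE)


def merge_networks(nodes, net1, net2, score1, score2, recompute_cpt):
    def parents_of(net, x):
        return {s for s, t in net["edges"] if t == x}

    X_F, E_F, C_F = set(), set(), {}

    # Symmetric nodes: identical parent sets and identical CPTs in both inputs.
    for x in nodes:
        if parents_of(net1, x) == parents_of(net2, x) and net1["cpt"][x] == net2["cpt"][x]:
            X_F |= {x} | parents_of(net1, x)
            E_F |= {(s, t) for s, t in net1["edges"] if t == x}
            C_F[x] = net1["cpt"][x]

    # Main loop: repeatedly pick the node with the largest score difference.
    S = {x: abs(score1[x] - score2[x]) for x in nodes}
    while any(v > 0 for v in S.values()):
        x = max(S, key=S.get)                     # highest score difference
        S[x] = 0                                  # never select this node again
        j = 2 if score1[x] > score2[x] else 1     # network where x scores better (lower)
        newly_added = x not in X_F
        X_F.add(x)

        added = False
        for _ in range(2):                        # at most one retry with the other network
            net = net1 if j == 1 else net2
            H = {(s, t) for s, t in net["edges"] if t == x}   # input edges of x
            P = {s for s, _ in H}                             # parents of x
            if not newly_added and (P & X_F) and has_cycle(E_F | H):
                j = 3 - j                         # cycle: try the other input network
                continue
            X_F |= P
            E_F |= H
            C_F[x] = net["cpt"][x]                # CPT taken from the chosen network
            added = True
            break

        if not added:
            # Both attempts would create a cycle: keep x without the new edges and
            # recompute its CPT for the parents it currently has in the final network.
            C_F[x] = recompute_cpt(x, {s for s, t in E_F if t == x})

    return X_F, E_F, C_F
```

The two attempts in the inner loop correspond to the retry with the other input network described above; if both would introduce a cycle, the node keeps its current parents in the final network and only its conditional probability table is recomputed.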
2.4. Flowchart of Algorithm
2.5. Dealing with Cycles
2.6. Filling in Missing Conditional Probability Tables
3. Results
3.1. Example S1—Traffic Accident Data
3.1.1. Data Description
3.1.2. Description of Input Networks
3.1.3. Node Evaluation Results
3.1.4. Final (Merged) Network
3.1.5. Overall Network Evaluation
3.1.6. Validation Results
3.2. Example S2—Railway Crossing Accident Data
3.2.1. Data Description
3.2.2. Description of Input Networks
3.2.3. Node Evaluation Results
3.2.4. Final (Merged) Network
3.2.5. Overall Network Evaluation
3.3. Validation Results
4. Discussion
5. Conclusions
Supplementary Materials
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
AIC | Akaike Information Criterion |
BIC | Bayesian Information Criterion |
CPT | Conditional Probability Table |
DAG | Directed Acyclic Graph |
EM | Expectation Maximization |
FCI | Fast Causal Inference |
GES | Greedy Equivalence Search |
MDPI | Multidisciplinary Digital Publishing Institute |
MLE | Maximum Likelihood Estimator |
PC (algorithm) | Peter and Clark (algorithm) |
References
- Glymour, M.; Pearl, J.; Jewell, N.P. Causal Inference in Statistics: A Primer; John Wiley & Sons: Hoboken, NJ, USA, 2016.
- Koller, D.; Friedman, N. Probabilistic Graphical Models: Principles and Techniques; MIT Press: Cambridge, MA, USA, 2009.
- Scanagatta, M.; Salmerón, A.; Stella, F. A survey on Bayesian network structure learning from data. Prog. Artif. Intell. 2019, 8, 425–439.
- Kjærulff, U.B.; Madsen, A.L. Probabilistic Networks—An Introduction to Bayesian Networks and Influence Diagrams; Aalborg University: Aalborg, Denmark, 2005; pp. 10–31.
- Wasserman, L. All of Statistics: A Concise Course in Statistical Inference; Springer: Berlin/Heidelberg, Germany, 2013.
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
- Darwiche, A. Modeling and Reasoning with Bayesian Networks; Cambridge University Press: Cambridge, UK, 2009.
- Schwarz, G. Estimating the dimension of a model. Ann. Stat. 1978, 6, 461–464.
- Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
- Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search; MIT Press: Cambridge, MA, USA, 2000; pp. 7–94.
- Chickering, D.M. Optimal structure identification with greedy search. J. Mach. Learn. Res. 2003, 3, 507–554.
- Govender, I.H.; Sahlin, U.; O'Brien, G.C. Bayesian network applications for sustainable holistic water resources management: Modeling opportunities for South Africa. Risk Anal. 2022, 42, 1346–1364.
- Li, H.; Yazdi, M.; Huang, H.Z.; Huang, C.G.; Peng, W.; Nedjati, A.; Adesina, K.A. A fuzzy rough copula Bayesian network model for solving complex hospital service quality assessment. Complex Intell. Syst. 2023.
- Del Sagrado, J.; Moral, S. Qualitative combination of Bayesian networks. Int. J. Intell. Syst. 2003, 18, 237–249.
- Jiang, C.a.; Leong, T.Y.; Kim-Leng, P. PGMC: A framework for probabilistic graphical model combination. In Proceedings of the American Medical Informatics Association Annual Symposium, Washington, DC, USA, 22–26 October 2005; Volume 2005, p. 370.
- Feng, G.; Zhang, J.D.; Liao, S.S. A novel method for combining Bayesian networks, theoretical analysis, and its applications. Pattern Recognit. 2014, 47, 2057–2069.
- Gross, T.J.; Bessani, M.; Junior, W.D.; Araújo, R.B.; Vale, F.A.C.; Maciel, C.D. An analytical threshold for combining Bayesian networks. Knowl. Based Syst. 2019, 175, 36–49.
- Hansen, L.; Salamon, P. Neural network ensembles. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 993–1001.
- Opitz, D.W.; Maclin, R. Popular ensemble methods: An empirical study. J. Artif. Intell. Res. 1999, 11, 169–198.
- Polikar, R. Ensemble learning. In Ensemble Machine Learning: Methods and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2012; pp. 1–34.
- Scutari, M.; Vitolo, C.; Tucker, A. Learning Bayesian networks from big data with greedy search: Computational complexity and efficient implementation. Stat. Comput. 2019, 29, 1095–1108.
- Kareem, S.; Okur, M.C. Bayesian Network Structure Learning Using Hybrid Bee Optimization and Greedy Search; Çukurova University: Adana, Turkey, 2018.
- Vaniš, M. Optimization of Bayesian Networks and Their Prediction Properties. Ph.D. Thesis, Czech Technical University in Prague, Prague, Czech Republic, 2021.
- Tarjan, R. Depth-first search and linear graph algorithms. SIAM J. Comput. 1972, 1, 146–160.
- Meek, C. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–20 August 1995; pp. 403–410.
- Wiecek, W.; Bois, F.Y.; Gayraud, G. Structure learning of Bayesian networks involving cyclic structures. arXiv 2019, arXiv:1906.04992.
- Korb, K.B.; Nicholson, A.E. Bayesian Artificial Intelligence; CRC Press: Boca Raton, FL, USA, 2010.
- Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
- Casella, G.; Berger, R.L. Statistical Inference; Cengage Learning: Boston, MA, USA, 2021.
- Vaniš, M.; Urbaniec, K. Employing Bayesian Networks and conditional probability functions for determining dependences in road traffic accidents data. In Proceedings of the 2017 Smart City Symposium Prague (SCSP), Prague, Czech Republic, 26–27 May 2016; pp. 1–5.
Node Name | Expert | Algorithm | Merged Net Evaluation | Difference | Rank | Selected Network |
---|---|---|---|---|---|---|
Alcohol present | 590.41 | 590.41 | 590.41 | 0.00 | 15 | Sym |
State of surface | 321.22 | 316.06 | 316.06 | 5.16 | 13 | Alg |
Wind conditions | 248.05 | 144.85 | 144.85 | 103.21 | 8 | Alg |
Visibility | 806.83 | 794.36 | 794.36 | 12.47 | 12 | Alg |
Main causes | 614.09 | 511.32 | 511.32 | 102.77 | 9 | Alg |
Road division | 1028.11 | 608.61 | 608.61 | 419.50 | 4 | Alg |
Location intersection | 603.12 | 865.82 | 865.45 | 262.70 | 5 | Alg |
Type of accident | 548.18 | 336.88 | 686.72 | 211.29 | 6 | - |
Location accident | 318.74 | 336.48 | 304.79 | 17.74 | 11 | Exp |
Specific places nearby | 906.52 | 904.35 | 906.43 | 2.17 | 14 | - |
Type of collision | 314.13 | 894.25 | 387.38 | 580.13 | 2 | Exp |
Type of barrier | 493.08 | 2.00 | 2.00 | 491.09 | 3 | Alg |
Number of involved | 458.58 | 254.82 | 254.82 | 203.77 | 7 | Alg |
Layout | 862.92 | 114.78 | 114.78 | 748.14 | 1 | Alg |
Accident severity | 428.93 | 457.95 | 429.93 | 29.02 | 10 | Exp |
Overall | 8722.31 | 7132.95 | 6917.91 | | | |
Node Name | Expert | Algorithm | Merged Net Evaluation |
---|---|---|---|
Alcohol present | 172.40 | 172.40 | 172.40 |
State of surface | 150.51 | 149.07 | 149.07 |
Wind conditions | 102.80 | 98.81 | 98.81 |
Visibility | 327.23 | 323.21 | 323.21 |
Main causes | 190.01 | 167.98 | 167.98 |
Road division | 307.83 | 185.86 | 185.86 |
Location intersection | 178.38 | 263.52 | 263.42 |
Type of accident | 165.99 | 106.43 | 209.46 |
Location accident | 79.98 | 91.14 | 91.20 |
Specific places nearby | 263.47 | 261.39 | 263.44 |
Type of collision | 94.87 | 268.61 | 94.87 |
Type of barrier | 157.57 | 0.61 | 0.61 |
Number of involved | 144.75 | 79.63 | 79.63 |
Layout | 258.40 | 35.19 | 35.19 |
Accident severity | 137.77 | 136.56 | 137.77 |
Overall | 2731.96 | 2340.42 | 2272.93 |
Node Name | Expert | Algorithm | Merged Net Evaluation | Difference | Rank | Selected Network |
---|---|---|---|---|---|---|
Accident severity | 343.26 | 310.88 | 343.26 | 32.37 | 2 | Exp |
Cause | 277.30 | 281.25 | 277.30 | 3.95 | 6 | Exp |
Killed | 14.23 | 14.23 | 14.23 | 0.00 | 8 | Sym |
Light injury | 25.85 | 25.85 | 25.85 | 0.00 | 9 | Sym |
Material damage over 400,000 CZK | 383.54 | 406.13 | 383.54 | 22.60 | 4 | Exp |
Railway electrification | 466.22 | 466.22 | 466.22 | 0.00 | 10 | Sym |
Railway gauge | 23.73 | 23.79 | 23.73 | 0.06 | 7 | Exp |
Railway tracks | 89.20 | 89.20 | 89.20 | 0.00 | 11 | Sym |
Region | 354.46 | 364.53 | 364.53 | 10.06 | 5 | Alg |
Road class | 457.52 | 427.98 | 427.98 | 29.54 | 3 | Alg |
Season | 434.85 | 434.85 | 434.85 | 0.00 | 12 | Sym |
Severe injury | 103.92 | 252.20 | 103.92 | 148.28 | 1 | Exp |
Overall | 2974.08 | 3097.11 | 2954.60 | | | |
Node | Expert Network | Algorithm Network | Merged Network |
---|---|---|---|
Accident severity | 107.26 | 96.15 | 107.26 |
Cause | 97.35 | 98.89 | 97.35 |
Killed | 4.64 | 4.64 | 4.64 |
Light injury | 7.67 | 7.67 | 7.67 |
Material damage over 400,000 CZK | 125.79 | 134.95 | 125.79 |
Railway electrification | 155.17 | 155.17 | 155.17 |
Railway gauge | 8.90 | 8.91 | 8.90 |
Railway tracks | 27.67 | 27.67 | 27.67 |
Region | 122.33 | 125.58 | 125.58 |
Road class | 155.74 | 141.79 | 141.79 |
Season | 145.87 | 145.87 | 145.87 |
Severe injury | 32.17 | 83.92 | 32.17 |
Total | 990.56 | 1031.21 | 979.86 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).