A Novel Hyper-Heuristic Algorithm for Bayesian Network Structure Learning Based on Feature Selection
Abstract
1. Introduction
- A single metaheuristic algorithm cannot guarantee optimal performance across different Bayesian networks.
- For large-scale Bayesian networks, metaheuristic algorithms often become trapped in local optima, which reduces the accuracy of the learned graph.
- A feature-selection-based Markov blanket (MB) discovery method was employed to construct the search space and determine search directions, and this knowledge served as soft constraints.
- We develop a library of low-level operations for BN structure learning.
- We propose an Exponential Monte Carlo with counter hyper-heuristic (EMCQ-HH) algorithm that integrates the soft and hard constraints derived from local structure learning techniques.
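The paper's exact constraint-handling rule is not reproduced in this outline, so the sketch below only illustrates one plausible way the soft and hard edge constraints derived from local structure learning could enter a score-based search. The function name, the penalty scheme, and the `penalty` value are assumptions, not the paper's formulation.

```python
def constrained_score(edges, score, must_exist, must_absent, penalty=100.0):
    """Illustrative constraint handling for a structure score.

    Edges in `must_absent` are treated as hard constraints: any violation
    invalidates the candidate outright. Edges in `must_exist` act as soft
    constraints: each one missing from the candidate subtracts a fixed
    penalty from its score instead of rejecting it.
    """
    edges = set(edges)
    if edges & must_absent:            # hard constraint violated
        return float('-inf')
    missing = must_exist - edges       # unsatisfied soft constraints
    return score - penalty * len(missing)
```

Under this scheme a candidate that contradicts a hard constraint can never be selected, while soft-constraint violations merely bias the search away from the candidate.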
2. Background
2.1. BN
2.2. Information Theory
2.3. Scoring Function
3. Methodology
3.1. Proposed Hyper-Heuristic Algorithm
Algorithm 1: EMCQ-HH
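Algorithm 1 is not reproduced in this outline, but the acceptance rule it is named after — Exponential Monte Carlo with counter (EMCQ) — can be sketched as follows. This follows the common EMCQ formulation, in which the probability of accepting a worsening move decays with both the score deterioration and a counter of consecutive non-improving iterations; the paper's exact variant may differ.

```python
import math
import random

def emcq_accept(delta, counter, rng=random.random):
    """Exponential Monte Carlo with counter (EMCQ) acceptance rule (sketch).

    delta   : score deterioration of the candidate (<= 0 means no worse).
    counter : number of consecutive non-improving iterations so far.

    Improving or equal candidates are always accepted; worsening ones are
    accepted with probability exp(-delta * counter), which shrinks as the
    deterioration grows and as the search stagnates.
    """
    if delta <= 0:                     # candidate is at least as good
        return True
    return rng() < math.exp(-delta * counter)
```

In a hyper-heuristic loop, the counter is reset whenever a low-level heuristic improves the incumbent, so early stagnation tolerates exploration while prolonged stagnation becomes increasingly greedy.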
3.2. Low-Level Heuristics
3.2.1. Mutation Operators
3.2.2. Neighborhood Hill-Climbing Operators
3.2.3. Learning Operators
3.2.4. Restart Operators
- Remove all parents of the selected node X.
- Add, one by one, the candidate nodes whose inclusion increases the BIC score to the parent set of X.
- Delete, one by one, the parents of X whose removal increases the BIC score.
- Reverse, one by one, the edges between X and its parents whose reversal increases the BIC score.
- Randomly select a parent of node X and perform a parent–child conversion.
- Add, one by one, the candidate nodes whose inclusion increases the BIC score to the parent set of X.
- Delete, one by one, the parents of X whose removal increases the BIC score.
- Repeat from Step 1 until every original parent of X has been processed.
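The add/delete steps above can be sketched as a single greedy pass over one node's parent set. Here `bic_gain` is a hypothetical callback standing in for the paper's BIC evaluation, and the deterministic traversal order is an assumption made for reproducibility.

```python
def refine_parents(x, candidates, parents, bic_gain):
    """One greedy BIC-guided pass over node x's parent set (illustrative).

    bic_gain(x, parents, op, y) -> float is assumed to return the BIC
    improvement of applying op ('add' or 'remove') with node y. Any
    positive gain is taken immediately, mirroring the add/delete steps.
    """
    parents = set(parents)
    # Step: add candidate parents whose inclusion raises the BIC score.
    for y in sorted(candidates - parents - {x}):
        if bic_gain(x, parents, 'add', y) > 0:
            parents.add(y)
    # Step: delete parents one by one if removal raises the BIC score.
    for y in sorted(parents):
        if bic_gain(x, parents, 'remove', y) > 0:
            parents.remove(y)
    return parents
```

A full operator would also need an acyclicity check before each addition; that bookkeeping is omitted here to keep the sketch focused on the BIC-guided moves.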
3.2.5. Expert Knowledge Operator
4. Experiments
4.1. Datasets and Evaluation Metrics
- BIC: The mean and standard deviation of the BIC scores for multiple runs of the learning network.
- AE: The number of wrongly added edges in the learned network compared with the original network.
- DE: The number of edges wrongly deleted from the learned network compared with the original network.
- RE: The number of wrongly reversed edges in the learned network compared with the original network.
- RT: The average running time of the algorithm for multiple runs.
- F1: We use the F1 score to measure the accuracy of the algorithm. F1 is the harmonic mean of precision and recall: F1 = 2 × precision × recall / (precision + recall). Precision is the ratio of correct edges in the output to the total number of edges in the algorithm's output, whereas recall is the ratio of correct edges in the output to the number of edges in the original DAG. Thus, F1 = 1 is the best case, F1 = 0 is the worst case, and higher F1 scores are better.
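The F1 definition above can be computed directly over directed edge sets; the helper name and edge representation below are illustrative, not from the paper.

```python
def edge_f1(learned, true):
    """F1 score over directed edge sets, each edge a (parent, child) pair.

    Precision = correct edges / edges output by the algorithm.
    Recall    = correct edges / edges in the original DAG.
    F1 is their harmonic mean; 0.0 is returned for degenerate cases.
    """
    learned, true = set(learned), set(true)
    correct = len(learned & true)
    precision = correct / len(learned) if learned else 0.0
    recall = correct / len(true) if true else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, a learned graph sharing one of two output edges with a two-edge ground truth has precision 0.5, recall 0.5, and therefore F1 = 0.5.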
4.2. Performance Evaluation of EMCQ-HH
4.3. Comparison with Some Other Algorithms
5. Conclusions and Future Research
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
Abbreviation | Full Form
---|---
BIC | Bayesian information criterion
BN | Bayesian network
CI | Conditional independence
EMCQ-HH | Exponential Monte Carlo with counter hyper-heuristic
F2SL | Feature selection-based structure learning
LLH | Low-level heuristics
MB | Markov blanket
MMHC | Max-min hill-climbing
PC | Parent–child
PSO | Particle swarm optimization
Network | Nodes | Edges | Max. In-Degree | Max. Out-Degree | Domain Range
---|---|---|---|---|---
Alarm | 37 | 46 | 4 | 5 | 2–4 |
Hepar2 | 70 | 123 | 6 | 17 | 2–4 |
Win95pts | 76 | 112 | 7 | 10 | 2–2 |
Munin | 189 | 282 | 3 | 15 | 1–21 |
Andes | 223 | 338 | 6 | 12 | 2–2 |
Pigs | 441 | 592 | 2 | 39 | 3–3 |
Param. | Value | Description
---|---|---
p | 0.1 | Constraint rate for edges that are known to exist
q | 0.5 | Constraint rate for edges that are known not to exist
— | 50 | Population size
— | — | Maximum number of non-improving iterations allowed
— | 5000 | Maximum number of iterations allowed
— | 20 | Control parameter of the elimination-and-dispersal operator
— | 0.5 | Fraction of individuals receiving expert guidance
Network | Dataset | BIC | AE | DE | RE | RT (s) | F1
---|---|---|---|---|---|---|---
Alarm | 1000 | ± 53.83 | 0.20 ± 0.45 | 4.40 ± 0.55 | 2.20 ± 0.45 | 30.00 | 0.8974 ± 0.0166
 | 3000 | ± 222.41 | 0.40 ± 0.55 | 3.20 ± 1.10 | 1.80 ± 0.84 | 46.86 | 0.9192 ± 0.0292
 | 5000 | ± 8.51 | 0.80 ± 0.45 | 2 ± 0 | 0 ± 0 | 39.72 | 0.9692 ± 0.0048
 | 10,000 | ± 4.92 | 1.20 ± 0.45 | 2.20 ± 0.45 | 1.80 ± 0.45 | 62.72 | 0.9231 ± 0
Hepar2 | 1000 | ± 29.67 | 0 ± 0 | 71.60 ± 0.89 | 1.60 ± 0.55 | 58.92 | 0.5711 ± 0.0074
 | 3000 | ± 11.44 | 0 ± 0 | 56.20 ± 0.45 | 1.60 ± 0.55 | 96.23 | 0.6870 ± 0.0076
 | 5000 | ± 264.73 | 0 ± 0 | 47.20 ± 1.30 | 1.80 ± 0.45 | 126.51 | 0.7444 ± 0.0098
 | 10,000 | ± 680.34 | 0 ± 0 | 36 ± 0.71 | 3.60 ± 1.34 | 218.80 | 0.7943 ± 0.0099
Win95pts | 1000 | ± 192.53 | 5.80 ± 1.92 | 33.80 ± 4.66 | 4.60 ± 1.67 | 158.56 | 0.7504 ± 0.0496
 | 3000 | ± 241.03 | 5 ± 1.22 | 25.80 ± 1.64 | 4.60 ± 2.51 | 183.29 | 0.8031 ± 0.0370
 | 5000 | ± 372.60 | 4.80 ± 2.39 | 21.20 ± 2.77 | 3.60 ± 1.52 | 236.90 | 0.8400 ± 0.0391
 | 10,000 | ± 2224.53 | 5.60 ± 1.82 | 13.60 ± 2.30 | 3.20 ± 1.64 | 427.16 | 0.8814 ± 0.0325
Munin | 1000 | −6.0507 ± 1744.05 | 43.40 ± 4.39 | 154.80 ± 2.59 | 7.80 ± 2.17 | 3676.36 | 0.5080 ± 0.0086
 | 3000 | −1.5007 ± 1816.09 | 43 ± 1 | 129 ± 2 | 13.67 ± 2.08 | 4510.92 | 0.5666 ± 0.0148
 | 5000 | −2.3602 ± 3403.33 | 46.67 ± 2.31 | 112.67 ± 3.51 | 12.33 ± 3.06 | 6014.27 | 0.6166 ± 0.0225
 | 10,000 | −4.4662 ± 10,871.15 | 43 ± 1.73 | 100.67 ± 3.21 | 10.33 ± 2.89 | 8753.79 | 0.6635 ± 0.0079
Andes | 1000 | ± 45.41 | 1.33 ± 0.58 | 80.33 ± 1.53 | 2 ± 1 | 1004.97 | 0.8565 ± 0.0020
 | 3000 | ± 64.71 | 1 ± 1 | 54.67 ± 1.53 | 2.33 ± 0.58 | 1344.88 | 0.9031 ± 0.0025
 | 5000 | ± 0.96 | 1.33 ± 0.58 | 39.33 ± 0.58 | 3.67 ± 0.58 | 1843.07 | 0.9248 ± 0
 | 10,000 | ± 2119.20 | 1 ± 1 | 28 ± 1 | 1.33 ± 0.58 | 3352.94 | 0.9512 ± 0.0038
Pigs | 1000 | −3.4827 ± 0 | 0 ± 0 | 0 ± 0 | 0 ± 0 | 2850.50 | 1 ± 0
 | 3000 | −1.0120 ± 0 | 0 ± 0 | 0 ± 0 | 0 ± 0 | 2635.21 | 1 ± 0
 | 5000 | −1.6760 ± 0 | 0 ± 0 | 0 ± 0 | 0 ± 0 | 2914.08 | 1 ± 0
 | 10,000 | −3.3268 ± 0 | 0 ± 0 | 0 ± 0 | 0 ± 0 | 3388.36 | 1 ± 0
Sample | Network | PC-Stable | GS | F2SL | MMHC | BNC-PSO | EMCQ-HH
---|---|---|---|---|---|---|---
(BIC-score comparison across algorithms for each network at sample sizes 1000, 3000, 5000, and 10,000; the numeric entries of this table are not recoverable.)
Sample | Network | PC-Stable | GS | F2SL | MMHC | BNC-PSO | EMCQ-HH
---|---|---|---|---|---|---|---
1000 | Alarm | 0.8810 | 0.4516 | 0.8434 | 0.7912 | 0.8667 | 0.8315
 | Hepar2 | 0.2927 | 0.3529 | 0.2953 | 0.5464 | 0.5385 | 0.5054
 | Win95pts | 0.4654 | 0.4459 | 0.5297 | 0.5455 | 0.6146 | 0.6597
 | Munin | 0.1180 | 0.0414 | 0.3129 | 0.2722 | 0.2850 | 0.3333
 | Andes | 0.7179 | 0.5217 | 0.5820 | 0.5742 | 0.7064 | 0.8152
 | Pigs | 0.7555 | 0.0707 | 0.9735 | 0.9975 | 0.9486 | 1
3000 | Alarm | 0.9545 | 0.4179 | 0.8537 | 0.7333 | 0.8764 | 0.8989
 | Hepar2 | 0.3094 | 0.3827 | 0.3067 | 0.5670 | 0.6154 | 0.6392
 | Win95pts | 0.6932 | 0.5375 | 0.6349 | 0.6884 | 0.7273 | 0.7549
 | Munin | 0.2500 | 0.0621 | 0.3000 | 0.2659 | 0.3991 | 0.4169
 | Andes | 0.8068 | 0.5594 | 0.5667 | 0.6941 | 0.8066 | 0.8660
 | Pigs | 1 | 0.0943 | 0.9983 | 1 | 0.9713 | 1
5000 | Alarm | 0.9545 | 0.4412 | 0.8571 | 0.7957 | 0.9333 | 0.9111
 | Hepar2 | 0.4255 | 0.4294 | 0.2267 | 0.5941 | 0.7122 | 0.6765
 | Win95pts | 0.7701 | 0.5697 | 0.6526 | 0.7544 | 0.7558 | 0.8462
 | Munin | 0.3769 | 0.0403 | 0.3581 | 0.3211 | 0.4915 | 0.5096
 | Andes | 0.8485 | 0.5217 | 0.5882 | 0.7797 | 0.8223 | 0.8875
 | Pigs | 0.9983 | 0.1178 | 0.9983 | 0.9958 | 0.9941 | 1
10,000 | Alarm | 0.9318 | 0.4478 | 0.8434 | 0.8602 | 0.9556 | 0.9011
 | Hepar2 | 0.5572 | 0.4881 | 0.2267 | 0.7204 | 0.7793 | 0.7700
 | Win95pts | 0.8061 | 0.6550 | 0.6237 | 0.6756 | 0.8778 | 0.8357
 | Munin | 0.5380 | 0.0604 | 0.3113 | 0.3784 | 0.5031 | 0.5947
 | Andes | 0.8736 | 0.5021 | 0.3286 | 0.7101 | 0.8703 | 0.9252
 | Pigs | 0.9992 | 0.1031 | 1 | 0.9992 | 1 | 1
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Dang, Y.; Gao, X.; Wang, Z. A Novel Hyper-Heuristic Algorithm for Bayesian Network Structure Learning Based on Feature Selection. Axioms 2025, 14, 538. https://doi.org/10.3390/axioms14070538