Hyperparameter Optimization EM Algorithm via Bayesian Optimization and Relative Entropy
Abstract
1. Introduction
2. Related Mathematical Knowledge
3. HPO via Maximization of the Evidence Function
3.1. Bayesian Linear Regression and Linear Basis Function Models
3.2. Evidence Approximation and Bayesian Model Comparison
3.3. The Evidence Function Evaluation
3.4. The Evidence Function Maximization
4. HPO EM Algorithm via Bayesian Optimization and Relative Entropy
5. Experimental Set-Up
5.1. Synthetic Data
Algorithm 1: HPO EM algorithm for synthetic data
1. Randomly generate the hyperparameter values $\alpha$ and $\beta$, and the input dataset $\{x_n\}_{n=1}^{N}$.
2. Choose a suitable set of linear basis functions to obtain the $N \times M$ design matrix $\boldsymbol{\Phi}$ by using (12).
3. Generate the parameter vector $\mathbf{w}$ by sampling randomly from the prior (2).
4. Generate an $N$-dimensional noise vector sampled randomly from the Gaussian distribution, and then generate the model outputs and the target values $\mathbf{t}$ by (4) and (5), respectively.
5. E step. Compute the posterior mean and covariance using the current hyperparameter values:
$\mathbf{m}_N = \beta \mathbf{S}_N \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t}$, (49)
$\mathbf{S}_N = \left( \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi} \right)^{-1}$. (50)
6. M step. Re-estimate the hyperparameters by employing the mean and covariance obtained in step 5 and the following update equations:
$\gamma = M - \alpha \operatorname{Tr}(\mathbf{S}_N)$, (51)
$\alpha^{\text{new}} = \dfrac{\gamma}{\mathbf{m}_N^{\mathrm{T}} \mathbf{m}_N}$, (52)
$\dfrac{1}{\beta^{\text{new}}} = \dfrac{\lVert \mathbf{t} - \boldsymbol{\Phi} \mathbf{m}_N \rVert^{2}}{N - \gamma}$. (53)
7. Compute the likelihood (evidence) function or the log-likelihood function given by
$p(\mathbf{t} \mid \alpha, \beta) = \mathcal{N}\!\left( \mathbf{t} \mid \mathbf{0},\, \beta^{-1} \mathbf{I} + \alpha^{-1} \boldsymbol{\Phi} \boldsymbol{\Phi}^{\mathrm{T}} \right)$ or
$\ln p(\mathbf{t} \mid \alpha, \beta) = \frac{M}{2} \ln \alpha + \frac{N}{2} \ln \beta - E(\mathbf{m}_N) - \frac{1}{2} \ln \left| \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi} \right| - \frac{N}{2} \ln (2\pi)$, where $E(\mathbf{m}_N) = \frac{\beta}{2} \lVert \mathbf{t} - \boldsymbol{\Phi} \mathbf{m}_N \rVert^{2} + \frac{\alpha}{2} \mathbf{m}_N^{\mathrm{T}} \mathbf{m}_N$,
and then check the convergence of the hyperparameters or of the likelihood. If convergence is not satisfied, return to step 5. If the likelihood or the log-likelihood has converged, the algorithm terminates; each iteration is dominated by the $O(M^{3})$ cost of the matrix inversion in (50) and the $O(NM^{2})$ cost of forming $\boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi}$.
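For concreteness, the following is a minimal NumPy sketch of the iterative loop in Algorithm 1 on synthetic data, assuming a Gaussian-basis linear regression model and the evidence-approximation updates as written in Equations (49)–(53) above; the variable names, basis-function width, and convergence tolerance are illustrative choices, and the Bayesian-optimization and relative-entropy refinements of Section 4 are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-4: synthetic data from a linear basis function model t = Phi w + noise.
N, M = 50, 10                       # number of samples, number of basis functions
alpha_true, beta_true = 2.0, 25.0   # illustrative "true" hyperparameter values
x = rng.uniform(0.0, 1.0, N)
centres = np.linspace(0.0, 1.0, M)
Phi = np.exp(-0.5 * ((x[:, None] - centres[None, :]) / 0.1) ** 2)  # N x M design matrix
w_true = rng.normal(0.0, 1.0 / np.sqrt(alpha_true), M)             # w ~ N(0, alpha^{-1} I)
t = Phi @ w_true + rng.normal(0.0, 1.0 / np.sqrt(beta_true), N)

# Steps 5-7: alternate posterior computation and hyperparameter re-estimation.
alpha, beta = 1.0, 1.0              # arbitrary initial hyperparameter values
prev_ll = -np.inf
for it in range(200):
    # E step: posterior mean and covariance, Eqs. (49)-(50)
    S_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ t

    # M step: hyperparameter re-estimation, Eqs. (51)-(53)
    gamma = M - alpha * np.trace(S)             # effective number of parameters
    alpha = gamma / (m @ m)
    beta = (N - gamma) / np.sum((t - Phi @ m) ** 2)

    # Log evidence ln p(t | alpha, beta) for the convergence check (step 7)
    E_m = 0.5 * beta * np.sum((t - Phi @ m) ** 2) + 0.5 * alpha * (m @ m)
    ll = 0.5 * (M * np.log(alpha) + N * np.log(beta)
                - np.linalg.slogdet(S_inv)[1] - N * np.log(2 * np.pi)) - E_m
    if abs(ll - prev_ll) < 1e-6:
        break
    prev_ll = ll

print(f"iterations={it}, alpha={alpha:.3f}, beta={beta:.3f}, log-evidence={ll:.3f}")
```

On well-specified synthetic data of this kind, the loop typically converges in a few tens of iterations, with the recovered $\alpha$ and $\beta$ close to the generating values.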
5.2. The Diabetes Dataset
Algorithm 2: HPO EM algorithm for the diabetes dataset
1. Randomly generate the hyperparameter values $\alpha$ and $\beta$.
2. Choose a suitable set of linear basis functions to obtain the $N \times M$ design matrix $\boldsymbol{\Phi}$ by using (12).
3. E step. Compute the posterior mean and covariance using the current hyperparameter values:
$\mathbf{m}_N = \beta \mathbf{S}_N \boldsymbol{\Phi}^{\mathrm{T}} \mathbf{t}$, (54)
$\mathbf{S}_N = \left( \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi} \right)^{-1}$. (55)
4. M step. Re-estimate the hyperparameters by employing the mean and covariance obtained in step 3 and the following update equations:
$\gamma = M - \alpha \operatorname{Tr}(\mathbf{S}_N)$, (56)
$\alpha^{\text{new}} = \dfrac{\gamma}{\mathbf{m}_N^{\mathrm{T}} \mathbf{m}_N}$, (57)
$\dfrac{1}{\beta^{\text{new}}} = \dfrac{\lVert \mathbf{t} - \boldsymbol{\Phi} \mathbf{m}_N \rVert^{2}}{N - \gamma}$. (58)
5. Compute the likelihood (evidence) function or the log-likelihood function given by
$p(\mathbf{t} \mid \alpha, \beta) = \mathcal{N}\!\left( \mathbf{t} \mid \mathbf{0},\, \beta^{-1} \mathbf{I} + \alpha^{-1} \boldsymbol{\Phi} \boldsymbol{\Phi}^{\mathrm{T}} \right)$ or
$\ln p(\mathbf{t} \mid \alpha, \beta) = \frac{M}{2} \ln \alpha + \frac{N}{2} \ln \beta - E(\mathbf{m}_N) - \frac{1}{2} \ln \left| \alpha \mathbf{I} + \beta \boldsymbol{\Phi}^{\mathrm{T}} \boldsymbol{\Phi} \right| - \frac{N}{2} \ln (2\pi)$,
and then check the convergence of the hyperparameters or of the likelihood. If the convergence criterion is not satisfied, go back to step 3.
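As a usage illustration, the sketch below runs the same E-step/M-step loop on the diabetes data. It assumes the copy of the classic diabetes dataset shipped with scikit-learn, standardized targets, and a simple identity-plus-bias choice of basis functions; these are assumptions for the sake of a runnable example rather than the authors' exact preprocessing.

```python
import numpy as np
from sklearn.datasets import load_diabetes

# Diabetes data: 442 samples, 10 standardized features. A bias column plus the
# raw features serves as a simple (identity) set of linear basis functions Phi.
X, t = load_diabetes(return_X_y=True)
t = (t - t.mean()) / t.std()                      # centre/scale targets (assumed preprocessing)
Phi = np.hstack([np.ones((X.shape[0], 1)), X])
N, M = Phi.shape

alpha, beta = 1.0, 1.0                            # step 1: initial hyperparameter values
for _ in range(200):
    # Step 3 (E step): posterior mean and covariance, Eqs. (54)-(55)
    S_inv = alpha * np.eye(M) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ t
    # Step 4 (M step): hyperparameter re-estimation, Eqs. (56)-(58)
    gamma = M - alpha * np.trace(S)
    alpha_new = gamma / (m @ m)
    beta_new = (N - gamma) / np.sum((t - Phi @ m) ** 2)
    # Step 5: stop when the hyperparameters have converged
    converged = abs(alpha_new - alpha) < 1e-6 and abs(beta_new - beta) < 1e-6
    alpha, beta = alpha_new, beta_new
    if converged:
        break

print(f"alpha={alpha:.4f}, beta={beta:.4f}, effective parameters gamma={gamma:.2f}")
```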
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Appendix A
Appendix B
Appendix C
References
- Alibrahim, H.; Ludwig, S.A. Hyperparameter Optimization: Comparing Genetic Algorithm against Grid Search and Bayesian Optimization. In Proceedings of the 2021 IEEE Congress on Evolutionary Computation (CEC), Krakow, Poland, 28 June–1 July 2021. [Google Scholar]
- Algorain, F.T.; Alnaeem, A.S. Deep Learning Optimisation of Static Malware Detection with Grid Search and Covering Arrays. Telecom 2023, 4, 249–264. [Google Scholar] [CrossRef]
- Claesen, M.; Simm, J.; Popovic, D.; Moreau, Y.; Moor, B.D. Easy Hyperparameter Search Using Optunity. arXiv 2014. [Google Scholar] [CrossRef]
- Syarif, I.; Prugel-Bennett, A.; Wills, G. SVM parameter optimization using grid search and genetic algorithm to improve classification performance. TELKOMNIKA (Telecommun. Comput. Electron. Control.) 2016, 14, 1502–1509. [Google Scholar] [CrossRef]
- Liu, B. A Very Brief and Critical Discussion on AutoML. arXiv 2018. [Google Scholar] [CrossRef]
- Bergstra, J.; Bengio, Y. Random Search for Hyper-Parameter Optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
- Snoek, J.; Larochelle, H.; Adams, R. Practical Bayesian Optimization of Machine Learning Algorithms. Adv. Neural Inf. Process. Syst. 2012, 4, 1–9. [Google Scholar]
- Di Francescomarino, C.; Dumas, M.; Federici, M.; Ghidini, C.; Maggi, F.M.; Rizzi, W.; Simonetto, L. Genetic Algorithms for Hyperparameter Optimization in Predictive Business Process Monitoring. Inf. Syst. 2018, 74, 67–83. [Google Scholar] [CrossRef]
- Elsken, T.; Metzen, J.H.; Hutter, F. Neural architecture search: A survey. J. Mach. Learn. Res. 2019, 20, 1997–2017. [Google Scholar]
- Thiede, L.A.; Parlitz, U. Gradient based hyperparameter optimization in Echo State Networks. Neural Netw. 2019, 115, 23–29. [Google Scholar] [CrossRef] [PubMed]
- Bengio, Y. Practical Recommendations for Gradient-Based Training of Deep Architectures. In Neural Networks: Tricks of the Trade, 2nd ed.; Montavon, G., Orr, G.B., Müller, K.-R., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 437–478. [Google Scholar]
- Ruder, S. An overview of gradient descent optimization algorithms. arXiv 2016. [Google Scholar] [CrossRef]
- Bergstra, J.; Bardenet, R.; Kégl, B.; Bengio, Y. Algorithms for Hyper-Parameter Optimization. In Proceedings of the 25th International Conference on Neural Information Processing Systems, Granada, Spain, 12–15 December 2011. [Google Scholar]
- Snoek, J.; Rippel, O.; Swersky, K.; Kiros, R.; Satish, N.; Sundaram, N.; Patwary, M.; Prabhat, M.; Adams, R. Scalable Bayesian Optimization Using Deep Neural Networks. arXiv 2015. [Google Scholar] [CrossRef]
- Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; Freitas, N.D. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 2016, 104, 148–175. [Google Scholar] [CrossRef]
- Yao, C.; Cai, D.; Bu, J.; Chen, G. Pre-training the deep generative models with adaptive hyperparameter optimization. Neurocomputing 2017, 247, 144–155. [Google Scholar] [CrossRef]
- Feurer, M.; Klein, A.; Eggensperger, K.; Springenberg, J.; Blum, M.; Hutter, F. Efficient and robust automated machine learning. Adv. Neural Inf. Process. Syst. 2015, 28, 2944–2952. [Google Scholar]
- Kaul, A.; Maheshwary, S.; Pudi, V. AutoLearn—Automated Feature Generation and Selection. In Proceedings of the 2017 IEEE International Conference on Data Mining (ICDM), New Orleans, LA, USA, 18–21 November 2017. [Google Scholar]
- Jin, H.; Song, Q.; Hu, X. Auto-Keras: An Efficient Neural Architecture Search System. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019. [Google Scholar] [CrossRef]
- Salehin, I.; Islam, M.S.; Saha, P.; Noman, S.M.; Tuni, A.; Hasan, M.M.; Baten, M.A. AutoML: A systematic review on automated machine learning with neural architecture search. J. Inf. Intell. 2024, 2, 52–81. [Google Scholar] [CrossRef]
- Chartcharnchai, P.; Jewajinda, Y.; Praditwong, K. A Categorical Particle Swarm Optimization for Hyperparameter Optimization in Low-Resource Transformer-Based Machine Translation. In Proceedings of the 28th International Computer Science and Engineering Conference (ICSEC), Khon Kaen, Thailand, 6–8 November 2024. [Google Scholar]
- Indrawati, A.; Wahyuni, I.N. Enhancing Machine Learning Models through Hyperparameter Optimization with Particle Swarm Optimization. In Proceedings of the International Conference on Computer, Control, Informatics and Its Applications (IC3INA), Bandung, Indonesia, 4–5 October 2023. [Google Scholar]
- Marchisio, A.; Ghillino, E.; Curri, V.; Carena, A.; Bardella, P. Particle swarm optimization hyperparameters tuning for physical-model fitting of VCSEL measurements. In Proceedings of the SPIE OPTO, San Francisco, CA, USA, 27 January–1 February 2024. [Google Scholar]
- Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
- Golub, G.H.; Van Loan, C.F. Matrix Computations; JHU Press: Baltimore, MD, USA, 2013. [Google Scholar]
- Lütkepohl, H. Handbook of Matrices; John Wiley & Sons: Hoboken, NJ, USA, 1997. [Google Scholar]
- Aggarwal, C. Linear Algebra and Optimization for Machine Learning; Springer: Berlin/Heidelberg, Germany, 2020. [Google Scholar]
- Simovici, D. Mathematical Analysis for Machine Learning and Data Mining; World Scientific: Singapore, 2018. [Google Scholar]
- Zou, D.; Tong, L.; Wang, J.; Fan, S.; Ji, J. A Logical Framework of the Evidence Function Approximation Associated with Relevance Vector Machine. Math. Probl. Eng. 2020, 2020, 2548310. [Google Scholar] [CrossRef]
- Bernardo, J.M.; Smith, A.F.M. Bayesian Theory; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
- Gelman, A.; Carlin, J.B.; Stern, H.S.; Rubin, D.B. Bayesian Data Analysis; Chapman and Hall/CRC: Boca Raton, FL, USA, 1995. [Google Scholar]
- Berger, J.O. Statistical Decision Theory and Bayesian Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013. [Google Scholar]
- Wahba, G. A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann. Stat. 1985, 13, 1378–1402. [Google Scholar] [CrossRef]
- MacKay, D.J.C. Bayesian Interpolation. Neural Comput. 1992, 4, 415–447. [Google Scholar] [CrossRef]
- Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
- McLachlan, G.J.; Krishnan, T. The EM Algorithm and Extensions, 2nd ed.; Wiley-Interscience: Hoboken, NJ, USA, 2008. [Google Scholar]
- Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–22. [Google Scholar] [CrossRef]
- Quinonero-Candela, J. Sparse probabilistic linear models and the RVM. In Learning with Uncertainty: Gaussian Processes and Relevance Vector Machines; Technical University of Denmark: Lyngby, Denmark, 2004. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).