Advanced Monte Carlo for Acquisition Sampling in Bayesian Optimization
Abstract
1. Introduction
1.1. Bayesian Optimization
| Algorithm 1 Bayesian Optimization | |
| Require: Budget T | |
| 1: $\mathcal{D}$ ← Query p initial points based on LHS | |
| 2: for $t = 1, \ldots, T$ do | |
| 3: $\mathcal{M}$ ← Update surrogate model with $\mathcal{D}$ | |
| 4: $x_t$ ← Selection strategy using $\mathcal{M}$ | ▹ See Section 2 |
| 5: $y_t \leftarrow f(x_t)$; $\mathcal{D} \leftarrow \mathcal{D} \cup \{(x_t, y_t)\}$ | ▹ Observe query |
| 6: end for | |
| 7: return best observed point $(x^\star, y^\star) \in \mathcal{D}$ | |
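For concreteness, Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: it assumes a GP surrogate with an RBF kernel, expected improvement as the selection strategy (maximized over a random candidate set), and a toy 1-D objective; all names and settings here (`f`, `gp_posterior`, the length scale, the candidate count) are placeholder choices.

```python
import numpy as np
from scipy.stats import norm, qmc

def f(x):
    """Toy 1-D objective to minimize; stands in for the expensive black box."""
    return np.sin(3.0 * x[..., 0]) + 0.5 * x[..., 0] ** 2

def gp_posterior(X, y, Xs, ls=0.2, var=1.0, noise=1e-6):
    """GP regression with an RBF kernel; posterior mean/std at candidates Xs."""
    k = lambda A, B: var * np.exp(
        -0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1) / ls**2
    )
    L = np.linalg.cholesky(k(X, X) + noise * np.eye(len(X)))
    Ks = k(X, Xs)
    mu = Ks.T @ np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    std = np.sqrt(np.clip(var - (v * v).sum(0), 1e-12, None))
    return mu, std

def expected_improvement(mu, std, best):
    """EI for minimization: E[max(best - Y, 0)] under the GP posterior."""
    z = (best - mu) / std
    return std * (z * norm.cdf(z) + norm.pdf(z))

rng = np.random.default_rng(0)
T, p = 20, 5
X = qmc.LatinHypercube(d=1, seed=0).random(p)             # 1: LHS initial design
y = f(X)
for t in range(T):                                        # 2: optimization loop
    cand = rng.uniform(0.0, 1.0, size=(512, 1))           # candidates on [0, 1]
    mu, std = gp_posterior(X, y, cand)                    # 3: update surrogate
    x_next = cand[np.argmax(expected_improvement(mu, std, y.min()))]  # 4: select
    X = np.vstack([X, x_next])                            # 5: observe query
    y = np.append(y, f(x_next[None, :]))
print("incumbent:", X[np.argmin(y)], y.min())             # 7: return best point
```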
1.2. Distributed BO
| Algorithm 2 Distributed Bayesian Optimization Node | |
| Require: Budget T | |
| 1: $\mathcal{D}$ ← Query p initial points based on LHS | |
| 2: Broadcast $\mathcal{D}$ | ▹ To all other available nodes |
| 3: for $t = 1, \ldots, T$ do | |
| 4: $\mathcal{D} \leftarrow \mathcal{D} \cup \mathcal{D}_{\text{received}}$ | ▹ Collect other node data if available |
| 5: $\mathcal{M}$ ← Update surrogate model with $\mathcal{D}$ | |
| 6: $x_t$ ← Selection strategy using $\mathcal{M}$ | |
| 7: $y_t \leftarrow f(x_t)$; $\mathcal{D} \leftarrow \mathcal{D} \cup \{(x_t, y_t)\}$ | |
| 8: Broadcast $(x_t, y_t)$ | ▹ To all other available nodes |
| 9: end for | |
| 10: Sync data with other nodes | ▹ So that all nodes return the same optimum |
| 11: return best observed point in $\mathcal{D}$ | |
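A schematic, single-process sketch of Algorithm 2 follows, with in-memory mailboxes standing in for whatever transport a real deployment would use (e.g., MPI or sockets). The `Node`, `broadcast`, and `collect` names are illustrative, not the paper's API, and the surrogate update and selection strategy are stubbed out as random search so the communication pattern stays visible.

```python
import numpy as np

def f(x):
    """Toy objective; stands in for the expensive black box."""
    return (x - 0.3) ** 2

class Node:
    """One BO node; in-memory mailboxes stand in for real transport."""
    def __init__(self, nid, mailboxes, rng):
        self.nid, self.mailboxes, self.rng = nid, mailboxes, rng
        self.data = []

    def broadcast(self, records):
        """Lines 2 and 8: send records to all other available nodes."""
        for other, box in self.mailboxes.items():
            if other != self.nid:
                box.append(list(records))

    def collect(self):
        """Line 4: merge any data broadcast by other nodes since the last call."""
        inbox = self.mailboxes[self.nid]
        while inbox:
            self.data.extend(inbox.pop())

def run(num_nodes=3, T=10, p=2, seed=0):
    mailboxes = {i: [] for i in range(num_nodes)}
    nodes = [Node(i, mailboxes, np.random.default_rng(seed + i))
             for i in range(num_nodes)]
    for node in nodes:                       # lines 1-2: init design + broadcast
        init = [(x, f(x)) for x in node.rng.random(p)]
        node.data.extend(init)
        node.broadcast(init)
    for t in range(T):                       # round-robin stands in for async nodes
        for node in nodes:
            node.collect()
            x = node.rng.random()            # lines 5-6: surrogate + selection,
            y = f(x)                         # stubbed as random search; line 7
            node.data.append((x, y))
            node.broadcast([(x, y)])         # line 8
    for node in nodes:                       # line 10: final sync
        node.collect()
    return [min(node.data, key=lambda r: r[1]) for node in nodes]  # line 11

print(run())  # every node reports the same optimum after the final sync
```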
2. Selection Strategies
2.1. Acquisition Maximization
2.1.1. Expected Improvement
2.1.2. Thompson Sampling
2.2. Acquisition Sampling
2.2.1. Boltzmann Sampling
2.2.2. Direct Acquisition Sampling
3. MCMC for Acquisition Sampling
3.1. Metropolis–Hastings
3.2. Metropolis-Adjusted Langevin Algorithm
3.3. Hamiltonian Monte Carlo
3.4. No U-Turn Sampler
3.5. Cyclical SGLD
4. Results
4.1. Methodology
- Hamiltonian Monte Carlo (HMC) uses a step size of , with five integration steps and an identity mass matrix $M = I$.
- The Metropolis-adjusted Langevin algorithm (MALA) uses a step size of .
- The No U-turn sampler (NUTS) uses a step size of and an identity mass matrix $M = I$.
- Cyclical SGLD uses 30 cycles that alternate exploration and sampling phases with a 25%/75% split, respectively, an initial step size of , and an SGD learning rate of .
- Mixture distribution Metropolis–Hastings (MMH) uses a proposal that combines normal distributions of different scales, $$q(x' \mid x) = \sum_{i=1}^{K} w_i \, \mathcal{N}(x' \mid x, \sigma_i^2 I),$$ which can be sampled following a stratified sampling strategy, $$x' = x + \sigma_z \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I),$$ where $z \in \{1, \ldots, K\}$ is the selecting variable with $P(z = i) = w_i$. This proposal balances modeling the locality of the density against jumping between the different modes of the acquisition function, given that the input space is normalized (a minimal sketch follows below).
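The sketch below illustrates the MMH update in 1-D under stated assumptions: the two component scales (0.02 and 0.5), the weights (0.75/0.25), and the bimodal toy target are placeholders rather than the paper's settings, and the selecting variable is redrawn independently at each step (composition sampling) as a simple stand-in for a stratified scheme.

```python
import numpy as np

def log_target(x):
    """Stand-in log acquisition density with two modes on the normalized inputs."""
    return np.logaddexp(-0.5 * ((x - 0.2) / 0.05) ** 2,
                        -0.5 * ((x - 0.8) / 0.05) ** 2)

def mmh(log_p, x0, n_steps, sigmas, weights, rng):
    """Metropolis-Hastings with a mixture-of-normals random-walk proposal.

    Each component is centered at the current state, so the mixture is
    symmetric in (x, x') and the Hastings correction cancels: a move is
    accepted with probability min(1, p(x') / p(x))."""
    x, samples = x0, []
    for _ in range(n_steps):
        z = rng.choice(len(sigmas), p=weights)          # selecting variable z
        prop = x + sigmas[z] * rng.standard_normal()    # draw from component z
        if np.log(rng.random()) < log_p(prop) - log_p(x):
            x = prop
        samples.append(x)
    return np.array(samples)

rng = np.random.default_rng(0)
# Small scale models locality; large scale jumps between modes of the
# acquisition function (inputs assumed normalized to [0, 1]).
samples = mmh(log_target, 0.5, 5000, sigmas=[0.02, 0.5],
              weights=[0.75, 0.25], rng=rng)
print("fraction of mass in right mode:", (samples > 0.5).mean())
```

Because every component of the proposal is centered at the current state, the small-scale component supplies the local moves while the large-scale one permits jumps between distant modes, matching the locality-versus-mode-hopping balance described above.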
4.2. Experiments
4.2.1. Benchmark Functions
4.2.2. Rover Trajectory Problem
4.2.3. Ablation on Boltzmann Sampling
5. Discussion
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| BO | Bayesian optimization |
| GP | Gaussian process |
| EI | Expected improvement |
| LogEI | Log expected improvement |
| MaxEI | Maximization expected improvement |
| TS | Thompson sampling |
| BS | Boltzmann sampling |
| AS | Direct acquisition sampling |
| MCMC | Markov chain Monte Carlo |
| MH | Metropolis–Hastings |
| MMH | Mixture distribution Metropolis–Hastings |
| MALA | Metropolis-adjusted Langevin algorithm |
| HMC | Hamiltonian Monte Carlo |
| NUTS | No U-turn sampler |
| SGLD | Stochastic gradient Langevin dynamics |
| SGD | Stochastic gradient descent |