Multi-Objective Batch Energy-Entropy Acquisition Function for Bayesian Optimization

Zhu, Hangyu; Wang, Xilu

doi:10.3390/math13172894


by Hangyu Zhu ¹,² and Xilu Wang ³,*

¹ School of Artificial Intelligence and Computer Science, Jiangnan University, No.1800 Lihu Road, Wuxi 214122, China

² C*Core Technology Co., Ltd., No.1 Building, No.99 Fenhu Road, Suzhou 215004, China

³ Computer Science Research Centre, University of Surrey, Surrey GU2 7XH, UK

* Author to whom correspondence should be addressed.

Mathematics 2025, 13(17), 2894; https://doi.org/10.3390/math13172894

Submission received: 13 August 2025 / Revised: 4 September 2025 / Accepted: 5 September 2025 / Published: 8 September 2025

(This article belongs to the Special Issue Multi-Objective Optimizations and Their Applications)


Abstract

Bayesian Optimization (BO) provides an efficient framework for optimizing expensive black-box functions by employing a surrogate model (typically a Gaussian Process) to approximate the objective function and an acquisition function to guide the search for optimal points. Batch BO extends this paradigm by selecting and evaluating multiple candidate points simultaneously, which improves computational efficiency but introduces challenges in optimizing the resulting high-dimensional acquisition functions. Among existing acquisition functions for batch Bayesian Optimization, entropy-based methods are considered state-of-the-art due to their ability to enable a more globally efficient search while avoiding redundant evaluations. However, they often fail to fully capture the dependencies and interactions among the selected batch points. In this work, we propose a Multi-Objective Batch Energy–Entropy acquisition function for Bayesian Optimization (MOBEEBO), which adaptively exploits the correlations among batch points. In addition, MOBEEBO incorporates multiple types of acquisition functions as objectives in a unified framework to achieve more effective batch diversity and quality. Empirical results demonstrate that the proposed algorithm is applicable to a wide range of optimization problems and achieves competitive performance.

Keywords: Bayesian Optimization; batch Bayesian Optimization; acquisition function

MSC: 65K10

1. Introduction

Bayesian Optimization (BO) [1,2] has emerged as a powerful framework for solving global optimization problems involving expensive-to-evaluate and unknown objective functions. It is particularly well suited for scenarios where function evaluations are constrained by computational cost or time limitations [3]. Traditional BO operates sequentially, selecting one candidate point at each iteration and updating the surrogate model based on the observed outcome. However, many real-world applications, including materials discovery [4] and hyperparameter tuning of deep neural networks [5], allow for parallel function evaluations. In such settings, selecting and evaluating a batch of candidate points simultaneously can substantially reduce the total optimization time while maintaining or even improving the quality of the final solution.

The application of BO to deep neural network training has significantly expanded its research scope [6]. Subsequent studies have explored methods to handle noisy and multi-fidelity data [7,8], high-dimensional spaces [9], and batch evaluations [10]. These advancements typically involve improving one of two fundamental components of the BO: the surrogate model or the acquisition function. The surrogate model provides a probabilistic approximation of the unknown objective function, enabling uncertainty quantification for unexplored regions, with the Gaussian Process (GP) [11] being a common choice. Following this, the acquisition function uses this surrogate model to identify candidate points for evaluation in the next iteration, aiming to strike a balance between exploring high-uncertainty regions to improve the model and exploiting areas that currently appear promising [12].

Several acquisition functions have been proposed to guide the candidate selection process. Expected Improvement (EI) [13] is one of the most widely used due to its intuitive appeal and closed-form solution under Gaussian assumptions. It favors points that are expected to improve upon the current best observation. The Upper Confidence Bound (UCB) [14], in contrast, introduces a tunable parameter to control the exploration–exploitation balance, selecting points with high predictive mean and high uncertainty. These methods are efficient in the sequential single-point setting but face challenges when extended to batch parallel evaluations. To address this, more recent work has shifted toward entropy-based acquisition functions, such as Predictive Entropy Search (PES) [15] and Max-value Entropy Search (MES) [16], which directly aim to reduce the uncertainty about the location of the global optimum. More recently, the Batch Energy–Entropy Bayesian Optimization (BEEBO) method [17] has been introduced, offering precise control over the exploration–exploitation trade-off during the optimization process. As a result, entropy-based methods have been widely regarded as state-of-the-art for Batch Bayesian Optimization.

However, these entropy-based approaches inherently fail to capture the correlations among the selected batch points and often treat batch points independently, which can exacerbate redundancy and limit effective exploration of the objective function. While BEEBO partially addresses this issue by incorporating an energy–entropy term to capture global information, it still does not explicitly model the pairwise correlations among batch points. Consequently, the chosen points may be redundant or may not fully exploit the complementary information distributed across the batch. Therefore, in this work, we propose a Multi-Objective batch Energy–Entropy acquisition function for Bayesian Optimization (MOBEEBO) with an RBF kernel-based regularizer that directly quantifies pairwise similarities, promoting diversity and reducing redundancy in batch selection. The main contributions of this paper are the following:

We develop a multi-objective framework that simultaneously integrates multiple acquisition function types, enabling adaptive exploration–exploitation trade-offs and more balanced batch selection decisions.
We introduce an energy-based regularization term that explicitly models and exploits correlations among batch points, enhancing the diversity and quality of selected candidates.
Through comprehensive empirical evaluation, we demonstrate that MOBEEBO overall achieves superior or competitive performance compared to state-of-the-art methods across a diverse range of optimization benchmarks.

MOBEEBO is broadly applicable to continuous, expensive, black-box optimization problems. It is particularly effective when evaluations are costly, where efficient use of each batch evaluation is critical. By building upon the general Bayesian optimization framework and incorporating multiple acquisition strategies, MOBEEBO remains versatile and robust across a wide spectrum of applications.

2. Related Work

Batch extensions of classical acquisition strategies have become a central focus in parallel Bayesian Optimization. These methods typically build upon well-established single-point acquisition functions such as EI [13], Knowledge Gradient (KG) [18], and UCB [14]. When adapted to select a batch of Q points per iteration, these functions yield batch variants such as q-EI and q-UCB [19]. While single-point formulations often possess closed-form expressions that enable efficient gradient-based optimization, extending them to the batch setting introduces significant computational challenges. In most cases, batch acquisition functions lack closed-form expressions and involve high-dimensional integrals over joint distributions, necessitating alternative optimization approaches such as greedy sequential selection, Monte Carlo approximations, or analytical approximations of joint expectations over multiple query points [20].

For example, the EI acquisition function selects the next query point by maximizing the expected gain over the best value observed so far, which is denoted as $f_{t}^{*}$. When using a GP as the surrogate model, EI can be computed in closed form based on the model’s predictive mean $μ (x)$ and variance $Σ (x)$. However, when EI is extended to the batch setting (q-EI), where multiple points are selected simultaneously, the joint evaluation requires integrating over a multivariate Gaussian distribution, which quickly becomes intractable as the batch size increases. In such cases, Monte Carlo (MC) sampling [21] is commonly employed to approximate the acquisition value and its gradient. Yet, MC methods can be inefficient, especially in high-dimensional settings, due to the exponential increase in sample complexity [22]. To address this, Wilson et al. [19] applied the reparameterization trick [23] to reformulate the integrals involved in acquisition functions, enabling efficient gradient-based optimization. This approach has proven effective, particularly in problems with moderate to high dimensionality.
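To make the reparameterization idea concrete, the following is a minimal sketch, not the implementation from [19]. It assumes NumPy and draws joint posterior samples as f = μ + Lz, where L is the Cholesky factor of Σ and z is standard normal, so each sample is a deterministic function of μ and Σ. The values of `mu`, `Sigma`, and `f_best` are made-up toy numbers, and `reparam_samples` is a hypothetical helper name.

```python
import numpy as np

def reparam_samples(mu, Sigma, num_samples, seed=0):
    """Draw joint Gaussian samples as mu + L z, with z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(Sigma)                      # Sigma = L L^T
    z = rng.standard_normal((num_samples, len(mu)))    # base samples, independent of mu and Sigma
    return mu + z @ L.T                                # shape (num_samples, Q)

# Toy posterior over a batch of Q = 2 points (illustrative numbers only).
mu = np.array([0.2, 0.5])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 1.0]])
f_best = 0.4  # assumed incumbent value f*

samples = reparam_samples(mu, Sigma, num_samples=4096)
# MC estimate of q-EI: average improvement of the best point in each joint sample.
q_ei_estimate = np.maximum(samples.max(axis=1) - f_best, 0.0).mean()
```

Because the base samples `z` do not depend on the batch location, an automatic-differentiation framework could propagate gradients through `mu` and `L`; this NumPy sketch only illustrates the sampling step.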

More recently, entropy-based acquisition functions have been proposed for batch BO due to their ability to directly and globally reduce uncertainty about the location or value of the global optimum, resulting in more informative, non-redundant, and sample-efficient batch selection. This viewpoint has motivated the development of approaches such as Entropy Search (ES) [24], PES [15], and MES [16]. A key distinction of MES is its focus on the mutual information [25] between the unknown optimum value and the observed data rather than on the location of the optimum itself. Building on MES, the General-purpose Information-Based Bayesian Optimization (GIBBON) method [26] introduces a scalable extension suitable for batch selection and more complex scenarios, such as multi-fidelity optimization. However, its performance declines for large batches (e.g., $Q > 50$) due to approximation errors, necessitating the use of heuristic diversity-enhancing techniques. To mitigate this issue, Teufel et al. [17] proposed the BEEBO acquisition function, which incorporates an entropy term to balance information gain and diversity through an energy–entropy trade-off. This formulation does not require MC sampling and enables the efficient selection of batch points that are both informative and diverse.

3. The MOBEEBO Acquisition Function

Our proposed MOBEEBO addresses scalability and redundancy challenges through a combination of gradient-based batch optimization and explicit diversity regularization. Redundancy is mitigated via an RBF kernel-based regularizer that penalizes closely spaced batch points, while the integration of BEEBO, q-EI, and q-UCB in a unified framework ensures that exploration, exploitation, and batch diversity are adaptively balanced.

Let $f_{true} : X \to \mathbb{R}$ denote an unknown objective function that maps inputs to real-valued outputs. Suppose we are given a dataset with N samples $D = \{(x_{i}^{train}, y_{i}^{train})\}_{i = 1}^{N}$ and a batch of Q candidate query points $x = (x_{1}, \dots, x_{Q}) \in X^{Q}$, for which we aim to compute an acquisition value. Following the BO paradigm, we place a posterior distribution over the GP surrogate function f evaluated at these query points as

$$f (x) \sim P (f | D, x) = \mathcal{N} (f | μ (x), Σ (x)) \qquad (1)$$

where $\mathcal{N} (f | μ (x), Σ (x))$ represents the multivariate Gaussian distribution with mean $μ (x)$ and covariance $Σ (x)$. For GPs, both the posterior mean and covariance of the Q queries can be computed in closed form as

$$\begin{aligned} μ (x) &= K (x, x_{D}) K (x_{D}, x_{D})^{- 1} y_{D} \\ Σ (x) &= K (x, x) - K (x, x_{D}) K (x_{D}, x_{D})^{- 1} K (x_{D}, x) \end{aligned} \qquad (2)$$

where $x_{D} = \{x_{i}\}_{i = 1}^{N}$ and $y_{D} = \{y_{i}\}_{i = 1}^{N}$ are the N observed inputs and observations, respectively, and $K (\cdot, \cdot)$ represents the kernel matrix computed using a GP kernel function (e.g., the RBF kernel [27]). Note that the augmented covariance $Σ_{aug} (x)$ for the Q candidate points can be readily computed using the augmented inputs $x_{aug} = (x_{D}, x)$, without requiring the true observations $y$ corresponding to these Q inputs:

$$Σ_{aug} (x) = K (x, x) - K (x, x_{aug}) K (x_{aug}, x_{aug})^{- 1} K (x_{aug}, x) \qquad (3)$$
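The posterior quantities in Equations (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the authors' code: the helper names (`rbf_kernel`, `gp_posterior`, `augmented_covariance`) are hypothetical, the kernel hyperparameters are fixed instead of learned, and a small observation-noise term is added to the diagonal of the training kernel matrices. That noise term is an assumption that does not appear in the equations as written; without it, the augmented covariance at points already contained in $x_{aug}$ would collapse to zero.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """RBF kernel matrix between the rows of a (n, d) and b (m, d)."""
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dists / lengthscale**2)

def gp_posterior(x, x_train, y_train, noise=1e-2):
    """Posterior mean and covariance at the batch x, following Equation (2).

    The `noise` jitter on the diagonal is an assumption of this sketch.
    """
    K_dd = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_xd = rbf_kernel(x, x_train)
    mu = K_xd @ np.linalg.solve(K_dd, y_train)
    Sigma = rbf_kernel(x, x) - K_xd @ np.linalg.solve(K_dd, K_xd.T)
    return mu, Sigma

def augmented_covariance(x, x_train, noise=1e-2):
    """Covariance at x after conditioning on (x_train, x), following Equation (3).

    No observations y at x are needed, only the input locations.
    """
    x_aug = np.vstack([x_train, x])
    K_aa = rbf_kernel(x_aug, x_aug) + noise * np.eye(len(x_aug))
    K_xa = rbf_kernel(x, x_aug)
    return rbf_kernel(x, x) - K_xa @ np.linalg.solve(K_aa, K_xa.T)

# Toy 1-D data (illustrative only).
rng = np.random.default_rng(0)
x_train = rng.uniform(-2.0, 2.0, size=(5, 1))
y_train = np.sin(x_train).ravel()
x_batch = np.array([[-1.0], [0.5], [1.5]])   # Q = 3 candidate points

mu, Sigma = gp_posterior(x_batch, x_train, y_train)
Sigma_aug = augmented_covariance(x_batch, x_train)
```

Conditioning on the extra inputs can only shrink the predictive variance, so the diagonal of `Sigma_aug` is no larger than that of `Sigma`.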

Based on this, BEEBO constructs a batch acquisition function that is maximized jointly over the Q batch points, introducing a temperature T to control the trade-off between exploration and exploitation. This temperature functions as a hyperparameter that scales the relative influence of uncertainty versus mean predictions when evaluating candidate points. A higher temperature encourages broader exploration by amplifying the contribution of uncertainty, whereas a lower temperature promotes exploitation of regions with higher predicted means. The corresponding acquisition function is defined as follows:

$$A_{BEEBO} (x) = E (x) + T \cdot I (x) = \sum_{q = 1}^{Q} μ (x_{q}) + T \cdot (H (f | D, x) - H_{aug} (f | D, x)) \qquad (4)$$

where $E (x)$ represents the energy term that drives exploitation, while the mutual information $I (x)$ serves as the entropy term promoting exploration. Specifically, $I (x)$ is derived from the differential entropy H, computed as

$$\begin{aligned} H (f | D, x) &= - \int P (f | D, x) \log P (f | D, x) \, d f \\ &= - E_{f} [\log \mathcal{N} (f | μ (x), Σ (x))] \\ &= \frac{Q}{2} \log (2 π e) + \frac{1}{2} \log |Σ (x)| \end{aligned} \qquad (5)$$

where e is the base of the natural logarithm. Similarly, the augmented differential entropy $H_{aug} (f | D, x)$ of f can be expressed as $H_{aug} (f | D, x) = \frac{Q}{2} \log (2 π e) + \frac{1}{2} \log |Σ_{aug} (x)|$. Because the constant $\frac{Q}{2} \log (2 π e)$ cancels in the difference of the two entropies, the batch acquisition function of BEEBO can be simplified as

$$A_{BEEBO} (x) = \sum_{q = 1}^{Q} μ (x_{q}) + T \cdot (\frac{1}{2} \log |Σ (x)| - \frac{1}{2} \log |Σ_{aug} (x)|) \qquad (6)$$
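As a small worked check of Equation (6), the sketch below evaluates the simplified BEEBO value from a mean vector and two covariance matrices. It assumes NumPy; `beebo_value` is a hypothetical helper name, and the inputs are hand-picked toy matrices, not quantities from the paper.

```python
import numpy as np

def beebo_value(mu, Sigma, Sigma_aug, temperature=0.5):
    """Evaluate Equation (6): summed means plus T times the information gain."""
    energy = mu.sum()
    # slogdet returns (sign, log|det|) and avoids overflow/underflow of det().
    _, logdet = np.linalg.slogdet(Sigma)
    _, logdet_aug = np.linalg.slogdet(Sigma_aug)
    info_gain = 0.5 * logdet - 0.5 * logdet_aug
    return energy + temperature * info_gain

# Hand-checkable toy inputs for Q = 2.
mu = np.array([1.0, 2.0])
Sigma = 2.0 * np.eye(2)      # |Sigma| = 4
Sigma_aug = np.eye(2)        # |Sigma_aug| = 1
value = beebo_value(mu, Sigma, Sigma_aug, temperature=0.5)
# energy = 3 and info_gain = 0.5 * ln 4 = ln 2, so value = 3 + 0.5 * ln 2
```

Setting `temperature=0` leaves only the energy term, which corresponds to pure exploitation of the predicted means.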

The advantage of this acquisition function is that it does not require access to the true observations $y$ for the Q candidate points, which would otherwise incur significant computational costs if $f_{true}$ is expensive to evaluate. However, it lacks the ability to capture correlations among the Q batch points, thereby failing to fully exploit the complementary information distributed across the batch. To mitigate this limitation, we propose introducing an additional RBF-like repulsion regularization term with a bounded and smooth penalty, which is formulated as

$$R (x) = \sum_{1 \le q < q^{'} \le Q} \exp (- \frac{∥x_{q} - x_{q^{'}}∥^{2}}{2 ℓ^{2}}) \qquad (7)$$

where ℓ is the length-scale hyperparameter that controls how quickly the term decays with the distance between two points, and $x_{q}$ and $x_{q^{'}}$ are two different points belonging to the Q batch points $x$. In addition, while BEEBO balances exploration and exploitation through global uncertainty reduction, it may under-exploit areas near the current optimum. Conversely, q-EI and q-UCB focus on local improvement and uncertainty-guided exploitation. To combine their strengths, a multi-objective acquisition function integrating BEEBO, q-EI, and q-UCB is proposed as follows, enabling simultaneous global exploration, local refinement, and uncertainty-driven sampling:

$$A_{MOBEEBO} (x) = A_{BEEBO} (x) + A_{q - EI} (x) + A_{q - UCB} (x) + R (x) \qquad (8)$$

where $A_{q - EI} (x)$ and $A_{q - UCB} (x)$ can be efficiently estimated via MC sampling, enabling fast, gradient-based optimization as follows:

$$\begin{aligned} A_{q - EI} (x) &= E [\max (\max f (x) - f^{*}, 0)] \\ &\approx \frac{1}{M} \sum_{j = 1}^{M} \max (\max_{q} f^{(j)} (x_{q}) - f^{*}, 0) \\ A_{q - UCB} (x) &= E [\max f (x)] \\ &\approx \frac{1}{M} \sum_{j = 1}^{M} \max f^{(j)} (x) \\ &\approx \frac{1}{M} \sum_{j = 1}^{M} \max_{q} (\bar{f} (x_{q}) + β |f^{(j)} (x_{q}) - \bar{f} (x_{q})|) \end{aligned} \qquad (9)$$

where M represents the number of MC samples, $\bar{f} (x_{q}) = \frac{1}{M} \sum_{j = 1}^{M} f^{(j)} (x_{q})$ is the predictive mean of the q-th point across the M samples, and $β$ is the hyperparameter that controls the exploration–exploitation trade-off in q-UCB. By combining them, MOBEEBO leverages these diverse perspectives to achieve a more balanced search strategy that avoids the biases inherent in relying on a single criterion. Trade-offs among the components are naturally balanced by their shared probabilistic formulation and Monte Carlo estimation, which normalize contributions to comparable scales; in practice, this yields stable optimization without requiring problem-specific manual tuning.
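The pairwise term of Equation (7) and the two MC estimators of Equation (9) can be sketched as follows. This is a minimal illustration assuming NumPy, with hypothetical function names and hand-made samples in place of real GP posterior draws; it follows the formulas as written above and is not the authors' implementation.

```python
import numpy as np

def rbf_pairwise_term(x, lengthscale=1.0):
    """Equation (7): sum of exp(-||x_q - x_q'||^2 / (2 l^2)) over pairs q < q'."""
    sq_dists = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    upper = np.triu_indices(len(x), k=1)          # index pairs with q < q'
    return np.exp(-0.5 * sq_dists[upper] / lengthscale**2).sum()

def mc_q_ei(samples, f_best):
    """Equation (9), q-EI: mean over samples of max(max_q f - f*, 0)."""
    return np.maximum(samples.max(axis=1) - f_best, 0.0).mean()

def mc_q_ucb(samples, beta):
    """Equation (9), q-UCB: mean over samples of max_q(f_bar + beta * |f - f_bar|)."""
    f_bar = samples.mean(axis=0)                  # per-point mean over the M samples
    return (f_bar + beta * np.abs(samples - f_bar)).max(axis=1).mean()

# Two batch points one unit apart: the pairwise term equals exp(-0.5).
pair_term = rbf_pairwise_term(np.array([[0.0], [1.0]]), lengthscale=1.0)

# M = 4 hand-made joint samples for Q = 2 points (illustrative numbers only).
samples = np.array([[0.0, 1.0],
                    [2.0, 1.0],
                    [0.0, 3.0],
                    [2.0, 3.0]])
q_ei = mc_q_ei(samples, f_best=1.5)     # (0 + 0.5 + 1.5 + 1.5) / 4 = 0.875
q_ucb = mc_q_ucb(samples, beta=2.0)     # f_bar = [1, 2], |f - f_bar| = 1, so 4.0
```

Note that each pairwise contribution is bounded in (0, 1] and is largest when two batch points coincide, which is what makes the term smooth and bounded.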

The overall optimization process of our proposed MOBEEBO is shown in Algorithm 1, with the learning rate denoted by $γ$. In this setup, the $GP$ model incorporates both the training data $D$ and the learned kernel function K, while the augmented covariance matrix $Σ_{aug} (x)$ is computed using a refitted $GP$ model based on the augmented inputs $x_{aug}$. The batch of Q points $x$ is optimized using a gradient ascent algorithm, rather than the heuristic search methods commonly used in multi-objective evolutionary optimization, to achieve faster convergence.

Convergence is declared when the gradient norm $∥\nabla_{x} A_{MOBEEBO} (x)∥$ falls below a small threshold, indicating that the candidate batch $x$ has reached a local maximum of the acquisition function. Due to the non-convexity introduced by the combination of energy–entropy, expected improvement, upper confidence bound, and batch diversity terms, the algorithm generally converges to a locally optimal configuration rather than a global optimum. The MC estimation of q-EI and q-UCB introduces stochasticity, which can be mitigated by using a sufficiently large number of samples. Overall, upon convergence, MOBEEBO provides a batch of points that effectively balances exploration, exploitation, and diversity, thereby guiding efficient evaluation of expensive black-box functions.

Algorithm 1: MOBEEBO optimization.

Input: $GP$ model, observed data $D$, Q batch points $x$

1: repeat
2: Calculate $μ (x)$ and $Σ (x)$ using $GP$ given in Equation (2)
3: $E (x) \leftarrow \sum_{q = 1}^{Q} μ (x_{q})$
4: Calculate $Σ_{aug} (x)$ using $x_{aug} = (x_{D}, x)$ given in Equation (3)
5: Compute $A_{BEEBO} (x)$ using Equation (6)
6: $R (x) \leftarrow \sum_{q < q^{'}} \exp (- \frac{∥x_{q} - x_{q^{'}}∥^{2}}{2 ℓ^{2}})$
7: $A_{q - EI} (x) \leftarrow \frac{1}{M} \sum_{j = 1}^{M} \max (\max_{q} f^{(j)} (x_{q}) - f^{*}, 0)$
8: $A_{q - UCB} (x) \leftarrow \frac{1}{M} \sum_{j = 1}^{M} \max_{q} (\bar{f} (x_{q}) + β |f^{(j)} (x_{q}) - \bar{f} (x_{q})|)$
9: $A_{MOBEEBO} (x) \leftarrow A_{BEEBO} (x) + A_{q - EI} (x) + A_{q - UCB} (x) + R (x)$
10: $x \leftarrow x + γ \nabla_{x} A_{MOBEEBO} (x)$
11: until converged

Output: Optimized batch points $x$
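The outer loop of Algorithm 1 (steps 1, 10, and 11) is plain gradient ascent with a gradient-norm stopping rule. The sketch below illustrates only that loop, under loud assumptions: it uses NumPy, approximates the gradient by central finite differences instead of automatic differentiation, and replaces $A_{MOBEEBO}$ with a toy concave quadratic whose maximizer is known. The names `numerical_gradient`, `gradient_ascent`, and `toy_acq` are hypothetical.

```python
import numpy as np

def numerical_gradient(acq, x, h=1e-6):
    """Central finite-difference gradient of acq at x, where x has shape (Q, d)."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = h
        grad[idx] = (acq(x + step) - acq(x - step)) / (2.0 * h)
    return grad

def gradient_ascent(acq, x0, lr=0.1, tol=1e-5, max_iter=1000):
    """Repeat x <- x + lr * grad until the gradient norm drops below tol."""
    x = x0.astype(float).copy()
    for _ in range(max_iter):
        grad = numerical_gradient(acq, x)
        if np.linalg.norm(grad) < tol:     # convergence test on the gradient norm
            break
        x = x + lr * grad                  # ascent step with learning rate lr
    return x

# Toy stand-in for the acquisition function (an assumption for illustration):
# a concave quadratic that is maximized exactly at `target`.
target = np.array([[1.0, -1.0],
                   [0.5, 2.0]])            # Q = 2 points in d = 2 dimensions

def toy_acq(x):
    return -np.sum((x - target) ** 2)

x_opt = gradient_ascent(toy_acq, x0=np.zeros((2, 2)))
```

With a real, non-convex acquisition function the same loop would only reach a local maximum, and restarting from several initial batches is a common way to reduce that sensitivity.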

4. Experiments

4.1. Experimental Settings

We evaluated the performance of MOBEEBO on a diverse collection of benchmark optimization problems with varying input dimensions, as summarized in Table 1. These benchmark functions were implemented in the BoTorch library [28], with several supporting flexible dimensional configurations. To evaluate performance in higher-dimensional settings, we configured the Ackley, Rastrigin, and Levy functions to operate in 10-dimensional search spaces, providing a representative set of moderately high-dimensional optimization challenges.

For each test function, we conducted 10 independent runs of BO using the proposed MOBEEBO algorithm and compared the results against the baseline methods q-EI and q-UCB, as well as the state-of-the-art entropy-based method BEEBO. To ensure a fair comparison, we adopted identical experimental settings across all methods. The temperature parameter T was set to 0.5, following the configuration in the original BEEBO work. The length-scale ℓ of the regularizer in Equation (7) was set to 1 for moderate-strength repulsion. The Gaussian Process surrogate model was initialized with 100 randomly sampled points, each positioned at least 0.5 units away from the known global optimum to ensure challenging and unbiased starting conditions. For the Monte Carlo-based methods (q-EI and q-UCB), we used $M = 1024$ samples for acquisition function evaluation. To eliminate random variation effects, identical random seeds were employed across all methods, ensuring that each independent run began from the same initial design points. All experiments were performed with a fixed batch size of $Q = 100$, which represents a large-batch regime in the BO literature [29].

4.2. Results

The optimization performance of the proposed MOBEEBO method compared with the three baseline methods over successive BO rounds is presented in Figures 1–6. Each figure contains two panels: the left panel displays the mean best objective value found up to each iteration (incumbent) across five independent runs, while the right panel shows the mean objective value of the Q points evaluated in each batch. In both panels, the shaded regions represent one standard deviation around the mean performance.

It is evident that the proposed MOBEEBO acquisition function generally outperformed the other approaches over the 10 BO rounds, with the exception of the embedded Hartmann test problem, where it performed slightly worse than the alternatives. This strong performance can be attributed to the low predictive uncertainty of MOBEEBO, which arises from the inclusion of the RBF-based regularization term $R (x)$. By enhancing the correlation among the Q points within each batch, this term encourages more coherent exploitation of the search space, thereby reducing variance in the acquisition function and improving overall stability during optimization. BEEBO achieved the second-best overall performance in the tests. The Q points it selects across rounds exhibit greater diversity but lower correlation, which can potentially encourage broader exploration of the search space. However, the resulting high uncertainty may hinder exploitation by allocating resources to less promising regions, leading to inefficient searches in unexpected areas.

Specifically, for the Ackley function, MOBEEBO exhibited the fastest convergence, reaching a value of approximately 3, which is closest to the optimal value of 0 among all methods. In contrast, q-UCB and BEEBO demonstrated similar convergence performance, while q-EI showed the poorest convergence, with a final optimization value of around 10. However, the trend differed for the Rastrigin function, where MOBEEBO converged the slowest during the initial BO rounds. Its performance curve then exhibited a sharp drop around the third round and continued to decline to nearly 40. BEEBO, q-UCB, and q-EI showed similar convergence patterns, with their performance stabilizing near the end of the optimization process. In addition, the search expectation for Rastrigin fluctuated minimally, indicating a relatively stable optimization process. Unlike the previous scenarios, all acquisition functions exhibited similar convergence performance, with MOBEEBO performing slightly better and q-EI performing slightly worse.

Similar to the Rastrigin problem, MOBEEBO initially showed only modest convergence performance for the Shekel function, slightly better than q-EI, but experienced a performance boost in the fifth BO round, reaching approximately −7.8 and approaching the optimal value of −8. In contrast, MOBEEBO performed poorly on the embedded Hartmann function, achieving a best fitness value of around −3, whereas the other three acquisition functions converged to approximately −3.25, close to the optimal value of −3.32237. Moreover, all acquisition functions except MOBEEBO exhibited severe fluctuations across the Q batch points for this test problem. For the maximization problem of the Cosine Mixture test function, the proposed MOBEEBO exhibited both the fastest convergence speed and the best optimization performance across the rounds, while also maintaining the smallest fluctuations.

Finally, the performance of all acquisition functions was evaluated on the Ackley, Rastrigin, and Levy functions in extremely high-dimensional settings. The corresponding results for the 100-dimensional problems are shown in Figure 7. It can be clearly observed that our proposed MOBEEBO converged much faster than the other three acquisition functions in the early stages of the optimization and then stabilized, ultimately achieving the best optimization performance across all three test functions. In contrast, q-EI performed the worst, as it failed to converge in any of the test problems. A potential reason for this phenomenon is that, in extremely high dimensions, it is very difficult to attain the best evaluated fitness $f^{*}$, which serves as an anchor point for guiding the optimization direction. It is surprising to observe that BEEBO, similar to q-EI, failed to converge, with its final optimization result being almost indistinguishable from random initialization. This finding implicitly demonstrates that the redundancy introduced by batch optimization becomes more pronounced in high-dimensional problems.

To further validate the effectiveness of the MOBEEBO acquisition function, we conducted an additional 10 rounds of searches using the Q batch points selected in the previous rounds. For this scenario, the temperature T was set to 0 to enable full exploitation. Since the test problems vary widely in scale, we applied min-max normalization to the results for each problem. This normalization allows us to measure progress toward the true optimum on a standardized scale from 0 to 1. The highest observed values averaged across multiple runs, together with the Wilcoxon test results after 10 BO rounds for all test functions, are presented in Table 2, with the best value highlighted in bold. It is evident that MOBEEBO achieved the highest searched value for almost all test functions, except for the embedded Hartmann function, where its value of 0.996 was slightly lower than that of BEEBO. Specifically, for the Shekel test problem, MOBEEBO significantly outperformed all other methods, attaining a value of 0.828, which is approximately 0.5 higher than the second-best result achieved by BEEBO.
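The min-max normalization step can be illustrated with a short sketch. The text does not state which reference values define the ends of the scale, so the choice below (a lower reference `lo` and the known optimum `hi` of a maximization problem) is an assumption, as are the function name and the toy numbers.

```python
def min_max_normalize(values, lo, hi):
    """Map each value to (v - lo) / (hi - lo), so lo -> 0 and hi -> 1."""
    if hi == lo:
        raise ValueError("hi and lo must differ")
    return [(v - lo) / (hi - lo) for v in values]

# Toy best-found values for one maximization problem (illustrative only),
# with an assumed lower reference of -10 and a known optimum of 0.
best_values = [-10.0, -5.0, 0.0]
normalized = min_max_normalize(best_values, lo=-10.0, hi=0.0)
# normalized == [0.0, 0.5, 1.0]
```

On this scale a value of 1 means the known optimum was reached, which makes results comparable across problems with very different objective ranges.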

5. Conclusions

In this paper, we introduced MOBEEBO, a multi-objective batch energy–entropy acquisition function for BO that explicitly accounts for the correlations among batch points. By integrating multiple acquisition functions within a unified framework, MOBEEBO leverages the complementary strengths of different selection strategies, enabling both globally efficient and non-redundant exploration. Empirical evaluations across diverse optimization problems demonstrate that MOBEEBO achieved competitive or superior overall performance compared to state-of-the-art methods, including entropy-based approaches. These results highlight the potential of multi-objective and correlation-aware acquisition strategies in advancing batch BO.

Nevertheless, MOBEEBO also has certain limitations. Its performance may be sensitive to the choice of hyperparameters, such as the temperature T, and the computational cost of evaluating multiple acquisition objectives can increase with batch size Q and problem dimension. Moreover, the current formulation assumes relatively noise-free settings, which may not fully reflect the challenges of real-world applications. Future research directions will therefore include providing a formal theoretical analysis of MOBEEBO’s convergence guarantees, investigating its robustness under noisy and constrained optimization scenarios, and developing scalable approximations to improve efficiency with very large batch sizes and high-dimensional problems.

Author Contributions

H.Z.: Conceptualization, Methodology, Software, Writing—Original Draft. X.W.: Conceptualization, Writing—Review and Editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) under Grant 62406129 and Shanghai Pujiang Program (23PJ1421800).

Data Availability Statement

The original data presented in the study are openly available in BEE-BO at https://github.com/novonordisk-research/BEE-BO (accessed on 1 August 2025).

Conflicts of Interest

Author Hangyu Zhu was employed by the company C*Core Technology Co. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Močkus, J. On Bayesian methods for seeking the extremum. In IFIP Technical Conference on Optimization Techniques; Springer: Berlin/Heidelberg, Germany, 1974; pp. 400–404.
2. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; de Freitas, N. Taking the Human Out of the Loop: A Review of Bayesian Optimization. Proc. IEEE 2016, 104, 148–175.
3. Wang, X.; Jin, Y.; Schmitt, S.; Olhofer, M. An adaptive Bayesian approach to surrogate-assisted evolutionary multi-objective optimization. Inf. Sci. 2020, 519, 317–331.
4. Snoek, J.; Larochelle, H.; Adams, R.P. Practical Bayesian optimization of machine learning algorithms. In Proceedings of the 26th International Conference on Neural Information Processing Systems, NIPS’12, Lake Tahoe, NV, USA, 3–6 December 2012; Volume 2, pp. 2951–2959.
5. Stanton, S.; Maddox, W.; Gruver, N.; Maffettone, P.; Delaney, E.; Greenside, P.; Wilson, A.G. Accelerating Bayesian Optimization for Biological Sequence Design with Denoising Autoencoders. In Proceedings of the 39th International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S., Eds.; PMLR: New York, NY, USA, 2022; Volume 162, pp. 20459–20478.
6. Griffiths, R.R.; Hernández-Lobato, J.M. Constrained Bayesian optimization for automatic chemical design using variational autoencoders. Chem. Sci. 2020, 11, 577–586.
7. Daulton, S.; Cakmak, S.; Balandat, M.; Osborne, M.A.; Zhou, E.; Bakshy, E. Robust multi-objective bayesian optimization under input noise. In Proceedings of the International Conference on Machine Learning, Baltimore, MD, USA, 17–23 July 2022; PMLR: New York, NY, USA, 2022; pp. 4831–4866.
8. Takeno, S.; Fukuoka, H.; Tsukada, Y.; Koyama, T.; Shiga, M.; Takeuchi, I.; Karasuyama, M. Multi-fidelity Bayesian optimization with max-value entropy search and its parallelization. In Proceedings of the 37th International Conference on Machine Learning, ICML’20, Virtual Event, 12–18 July 2020; JMLR.org: New York, NY, USA, 2020.
9. Moriconi, R.; Deisenroth, M.P.; Sesh Kumar, K. High-dimensional Bayesian optimization using low-dimensional feature spaces. Mach. Learn. 2020, 109, 1925–1943.
10. Kathuria, T.; Deshpande, A.; Kohli, P. Batched Gaussian process bandit optimization via determinantal point processes. In Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, Barcelona, Spain, 5–10 December 2016; pp. 4213–4221.
11. Schulz, E.; Speekenbrink, M.; Krause, A. A tutorial on Gaussian process regression: Modelling, exploring, and exploiting functions. J. Math. Psychol. 2018, 85, 1–16.
12. Jasrasaria, D.; Pyzer-Knapp, E.O. Dynamic Control of Explore/Exploit Trade-Off in Bayesian Optimization. In Intelligent Computing, Proceedings of the 2018 Computing Conference, London, UK, 10–12 July 2018; Arai, K., Kapoor, S., Bhatia, R., Eds.; Springer: Cham, Switzerland, 2019; pp. 1–15.
13. Zhan, D.; Xing, H. Expected improvement for expensive optimization: A review. J. Glob. Optim. 2020, 78, 507–544.
14. Kaufmann, E.; Cappe, O.; Garivier, A. On Bayesian Upper Confidence Bounds for Bandit Problems. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, La Palma, Spain, 21–23 April 2012; Lawrence, N.D., Girolami, M., Eds.; Proceedings of Machine Learning Research: New York, NY, USA, 2012; Volume 22, pp. 592–600.
15. Hernández-Lobato, J.M.; Hoffman, M.W.; Ghahramani, Z. Predictive entropy search for efficient global optimization of black-box functions. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’14, Cambridge, MA, USA, 8–13 December 2014; Volume 1, pp. 918–926.
16. Wang, Z.; Jegelka, S. Max-value entropy search for efficient Bayesian Optimization. In Proceedings of the 34th International Conference on Machine Learning, ICML’17, 6–11 August 2017; JMLR.org: New York, NY, USA, 2017; Volume 70, pp. 3627–3635.
17. Teufel, F.; Stahlhut, C.; Ferkinghoff-Borg, J. Batched energy-entropy acquisition for Bayesian optimization. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS’24, Vancouver, BC, Canada, 10–15 December 2024.
18. Wu, J.; Frazier, P. The Parallel Knowledge Gradient Method for Batch Bayesian Optimization. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; Lee, D., Sugiyama, M., Luxburg, U., Guyon, I., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2016; Volume 29.
19. Wilson, J.; Hutter, F.; Deisenroth, M. Maximizing acquisition functions for Bayesian optimization. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2018; Volume 31.
Shah, A.; Ghahramani, Z. Parallel Predictive Entropy Search for Batch Global Optimization of Expensive Objective Functions. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
Hastings, W.K. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 1970, 57, 97–109. [Google Scholar] [CrossRef]
Dick, J.; Kuo, F.Y.; Sloan, I.H. High-dimensional integration: The quasi-Monte Carlo way. Acta Numer. 2013, 22, 133–288. [Google Scholar] [CrossRef]
Kingma, D.P.; Salimans, T.; Welling, M. Variational Dropout and the Local Reparameterization Trick. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Cortes, C., Lawrence, N., Lee, D., Sugiyama, M., Garnett, R., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2015; Volume 28. [Google Scholar]
Hennig, P.; Schuler, C.J. Entropy search for information-efficient global optimization. J. Mach. Learn. Res. 2012, 13, 1809–1837. [Google Scholar]
Kraskov, A.; Stögbauer, H.; Grassberger, P. Estimating mutual information. Phys. Rev. E 2004, 69, 066138. [Google Scholar] [CrossRef] [PubMed]
Moss, H.B.; Leslie, D.S.; Gonzalez, J.; Rayson, P. GIBBON: General-purpose Information-Based Bayesian Optimisation. J. Mach. Learn. Res. 2021, 22, 10616–10664. [Google Scholar]
Jayasumana, S.; Hartley, R.; Salzmann, M.; Li, H.; Harandi, M. Kernel Methods on Riemannian Manifolds with Gaussian RBF Kernels. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 2464–2477. [Google Scholar] [CrossRef] [PubMed]
Balandat, M.; Karrer, B.; Jiang, D.R.; Daulton, S.; Letham, B.; Wilson, A.G.; Bakshy, E. BOTORCH: A framework for efficient monte-carlo Bayesian optimization. In Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS’20, Vancouver, BC, Canada, 6–12 December 2020. [Google Scholar]
Eriksson, D.; Jankowiak, M. High-dimensional Bayesian optimization with sparse axis-aligned subspaces. In Proceedings of the Thirty-Seventh Conference on Uncertainty in Artificial Intelligence, Online, 27–30 July 2021; de Campos, C., Maathuis, M.H., Eds.; PMLR: New York, NY, USA, 2021; Volume 161, pp. 493–503. [Google Scholar]

Figure 1. Performance achieved by each algorithm on the Ackley function. (a) The mean best objective value with shaded bounds across different runs. (b) The mean objective values with shaded bounds among Q points.

Figure 2. Performance achieved by each algorithm on the Rastrigin function. (a) The mean best objective value with shaded bounds across different runs. (b) The mean objective values with shaded bounds among Q points.

Figure 3. Performance achieved by each algorithm on the Levy function. (a) The mean best objective value with shaded bounds across different runs. (b) The mean objective values with shaded bounds among Q points.

Figure 4. Performance achieved by each algorithm on the Shekel function. (a) The mean best objective value with shaded bounds across different runs. (b) The mean objective values with shaded bounds among Q points.

Figure 5. Performance achieved by each algorithm on the embedded Hartmann function. (a) The mean best objective value with shaded bounds across different runs. (b) The mean objective values with shaded bounds among Q points.

Figure 6. Performance achieved by each algorithm on the Cosine function. (a) The mean best objective value with shaded bounds across different runs. (b) The mean objective values with shaded bounds among Q points.

Figure 7. Performance achieved by each algorithm on the Ackley, Rastrigin, and Levy functions. Panels (a)–(c) each show the best objective value found so far.

Table 1. Summary of test problems.

Problems	Dimension (d)	Optimization Type	Global Optimum
Ackley	10	Minimization	0
Rastrigin	10	Minimization	0
Levy	10	Minimization	0
Shekel-10	4	Minimization	−10.536443
Hartmann	6	Minimization	−3.32237
Cosine	8	Maximization	0.8
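
As a rough illustration of the "Global Optimum" column, the sketch below evaluates the three 10-dimensional problems at their known minimizers. It assumes the textbook definitions of Ackley (with a = 20, b = 0.2, c = 2π), Rastrigin, and Levy; the exact parameterization, search domains, and any rescaling used in the experiments are not stated in this table, so the function names and defaults here are illustrative assumptions only.

```python
import math


def ackley(x, a=20.0, b=0.2, c=2.0 * math.pi):
    """Textbook Ackley function (assumed parameters); minimum 0 at the origin."""
    d = len(x)
    mean_sq = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(c * v) for v in x) / d
    return -a * math.exp(-b * math.sqrt(mean_sq)) - math.exp(mean_cos) + a + math.e


def rastrigin(x):
    """Textbook Rastrigin function; minimum 0 at the origin."""
    d = len(x)
    return 10.0 * d + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in x)


def levy(x):
    """Textbook Levy function; minimum 0 at x = (1, ..., 1)."""
    w = [1.0 + (v - 1.0) / 4.0 for v in x]
    head = math.sin(math.pi * w[0]) ** 2
    body = sum(
        (wi - 1.0) ** 2 * (1.0 + 10.0 * math.sin(math.pi * wi + 1.0) ** 2)
        for wi in w[:-1]
    )
    tail = (w[-1] - 1.0) ** 2 * (1.0 + math.sin(2.0 * math.pi * w[-1]) ** 2)
    return head + body + tail


d = 10  # dimension listed in Table 1 for these three problems
values = {
    "Ackley": ackley([0.0] * d),
    "Rastrigin": rastrigin([0.0] * d),
    "Levy": levy([1.0] * d),
}
for name, value in values.items():
    # Each value should be zero up to floating-point round-off.
    print(f"{name}: {value:.3e}")
```

Shekel, Hartmann, and Cosine are omitted because their definitions depend on fixed coefficient tables or conventions that this section does not reproduce.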

Table 2. Highest observed value after 10 rounds of BO with Q = 100. The best value is highlighted in bold, and both Mean and Median values are computed across test problems.

Problem	d	MOBEEBO	BEEBO	q-UCB	q-EI
Ackley	10	0.973	0.836 +	0.732 +	0.522 +
Rastrigin	10	0.523	0.497 +	0.449 +	0.385 +
Levy	10	0.985	0.961 +	0.894 +	0.905 +
Shekel-10	4	0.828	0.342 +	0.303 +	0.130 +
Hartmann	6	0.996	1.000 −	0.938 +	0.955 +
Cosine	8	1.000	1.000 +	0.968 +	0.836 +
Mean		0.8842	0.7727	0.714	0.6222
Median		0.979	0.8985	0.813	0.679
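
The Mean and Median rows can be checked directly from the six per-problem entries. The short sketch below copies the values from Table 2 (ignoring the +/− markers) and recomputes both summaries with Python's standard `statistics` module; the dictionary layout and variable names are illustrative choices, not part of the original experiments.

```python
from statistics import mean, median

# Per-problem values from Table 2, in the order
# Ackley, Rastrigin, Levy, Shekel-10, Hartmann, Cosine.
scores = {
    "MOBEEBO": [0.973, 0.523, 0.985, 0.828, 0.996, 1.000],
    "BEEBO": [0.836, 0.497, 0.961, 0.342, 1.000, 1.000],
    "q-UCB": [0.732, 0.449, 0.894, 0.303, 0.938, 0.968],
    "q-EI": [0.522, 0.385, 0.905, 0.130, 0.955, 0.836],
}

# Map each method to its (mean, median) across the six test problems.
summary = {name: (mean(vals), median(vals)) for name, vals in scores.items()}

for name, (avg, med) in summary.items():
    print(f"{name:8s} mean={avg:.4f} median={med:.4f}")
```

With an even number of problems, the median is the average of the two middle values, which is why some Median entries carry four decimals.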

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

