Article

Data-Driven Prior Construction in Hilbert Spaces for Bayesian Optimization

by Carol Santos Almonte, Oscar Sanchez Jimenez, Eduardo Souza de Cursi * and Emmanuel Pagnacco
INSA Rouen Normandie, Normandie Univ, LMN UR 3828, F-76000 Rouen, France
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(9), 557; https://doi.org/10.3390/a18090557
Submission received: 31 July 2025 / Revised: 29 August 2025 / Accepted: 31 August 2025 / Published: 3 September 2025

Abstract

We propose a variant of Bayesian optimization in which probability distributions are constructed using uncertainty quantification (UQ) techniques. In this context, UQ techniques rely on a Hilbert basis expansion to infer probability distributions from limited experimental data. These distributions act as prior knowledge of the search space and are incorporated into the acquisition function to guide the selection of enrichment points more effectively. Several variants of the method are examined, depending on the distribution type (normal, log-normal, etc.), and benchmarked against traditional Bayesian optimization on test functions. The results show competitive performance, with selective improvements depending on the problem structure, and faster convergence in specific cases. As a practical application, we address a structural shape optimization problem. The initial geometry is an L-shaped plate, where the goal is to minimize the volume under a horizontal displacement constraint expressed as a penalty. Our approach first identifies a promising region while efficiently training the surrogate model. A subsequent gradient-based optimization step then refines the design using the trained surrogate, achieving a volume reduction of more than 30 % while satisfying the displacement constraint, without requiring any additional evaluations of the objective function.

1. Introduction

Optimization of expensive-to-evaluate black-box functions is a problem found in a wide range of technological and engineering applications [1,2,3]. Two broad application fields include the tuning of Machine Learning (ML) and general Artificial Intelligence (AI) models [4,5,6], as well as multiple forms of engineering design optimization, such as shape, topology, and material optimization, spanning from structural to electronic engineering [7,8,9]. In both fields, complex computational models are required to evaluate the objective or performance function, resulting in elevated computation time for each evaluation. Additionally, these models behave as black-boxes, meaning that general properties of the underlying model (such as its derivative or gradient) are not available.
Bayesian Optimization (BO) is an optimization framework particularly well-suited to address the aforementioned black-box, expensive-to-evaluate optimization problem [10,11,12,13,14,15].
This article presents an algorithm that combines the advantages of classical Bayesian optimization with prior distributions constructed from an initial design of experiment (DoE), based on uncertainty quantification (UQ) techniques. While both the metamodel and prior distributions utilize the same DoE, they encode complementary information about the problem. The metamodel seeks to approximate the objective function across the entire domain, leveraging spatial correlations between samples. In contrast, the prior distributions quantify the search space in a Bayesian sense by identifying regions where the objective is likely to attain good or bad performance contingent on the observations.
In engineering contexts, expert priors are often unavailable or difficult to justify, particularly in early design phases. In such cases, investing in a small exploratory DoE enables data-driven prior construction that can guide the optimization process efficiently from the outset, as demonstrated in this work.
The organization of this article is the following: Section 2 reviews the fundamentals of Bayesian optimization and the original BOPrO scheme. Section 3 details the construction of the UQ-based performance prior and its integration into the complete algorithm. Section 4 presents numerical results from six academic test functions and a case study from an engineering application, highlighting the applicability of the proposed method. Finally, Section 5 summarizes the conclusions and outlines future research directions.

2. Background

We are interested in solving the following global optimization problem:
$$x^\ast = \arg\min_{x \in \mathcal{X} \subset \mathbb{R}^d} f(x), \tag{1}$$
where $f : \mathcal{X} \subset \mathbb{R}^d \to \mathbb{R}$ is the function to be minimized.
Sequential Bayesian estimation gives rise to a global optimization framework, commonly referred to as BO, which is particularly suited for problems such as the one described above. BO relies on the fundamental principle of Bayesian updating. A typical iteration of a Bayesian Optimization Algorithm (BOA) proceeds as follows [16]:
  • Generate a surrogate model of the objective function f, usually under the form of a stochastic process.
  • Select a new point to evaluate the true objective function.
  • Update the surrogate model via the Bayesian update, using the pair formed by the new point and the value of the objective function at that point.
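A minimal sketch of this loop in Python may help fix the structure. The surrogate fit and acquisition maximization are replaced by placeholder comments and a random proposal; this shows only the control flow of a BOA iteration, not a working BO implementation:

```python
import numpy as np

def boa_loop(f, bounds, n_init=5, n_iter=10, seed=0):
    """Skeleton of one BOA run: build a design, then iterate the three steps.
    The surrogate and acquisition steps are placeholders (random proposals),
    kept only to expose the structure of the loop."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, size=n_init)            # initial design of experiments
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        # Step 1: (re)fit the surrogate model on (X, y)      -- placeholder
        # Step 2: maximize an acquisition function alpha(x)  -- placeholder
        x_new = rng.uniform(lo, hi)
        # Step 3: evaluate the true objective and augment the data set
        X = np.append(X, x_new)
        y = np.append(y, f(x_new))
    i_best = int(np.argmin(y))
    return X[i_best], y[i_best]
```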

2.1. Standard Bayesian Optimization (SBO)

The most commonly used surrogate model in BOA is the GP [15,17,18,19,20,21]. A GP generalizes the multivariate Gaussian distribution to an infinite-dimensional stochastic process, where any finite collection of function evaluations follows a joint multivariate normal distribution. Just as a standard Gaussian distribution describes a random variable by its mean and covariance, a GP defines a distribution over functions, fully specified by a mean function $m(x)$ and a covariance (or kernel) function $k(x, x')$ [19]:
$$f \sim \mathcal{GP}(m, k). \tag{2}$$
In practice, when the GP is evaluated at a finite set of input locations $\{x_1, \dots, x_n\}$, the corresponding function values follow a joint multivariate normal distribution: $\mathbf{f} \sim \mathcal{N}(\mathbf{m}, K)$, where $\mathbf{m} = (m(x_1), \dots, m(x_n))$ is often set to zero and the covariance matrix $K$ has elements $K_{ij} := k(x_i, x_j)$. Equation (2) thus represents the prior distribution $p(f)$ induced by the GP, encoding assumptions about the smoothness and structure of the underlying objective function.
The choice of a GP greatly simplifies the Bayesian update procedure, as the GP is conjugate to itself. This property ensures that the posterior distribution remains a GP, and that the update merely involves a straightforward modification of the mean and covariance functions [16]. More precisely, we assume that a set of observations, $\mathcal{D}_t = \{(x_i, y_i)\}_{i=1}^{t}$, has already been collected from previous iterations, where the outputs are noise-free evaluations of a deterministic function, i.e., $y_i = f(x_i)$. Let $\mathbf{y} = (y_1, \dots, y_t)^T$ denote the vector of corresponding observations. Then, Bayes' theorem yields the posterior distribution [19]:
$$p(f \mid \mathcal{D}_t) \propto p(\mathcal{D}_t \mid f) \cdot p(f), \tag{3}$$
where the posterior distribution, $f \mid \mathcal{D}_t \sim \mathcal{GP}(\mu_t, k_t)$, remains a GP due to the aforementioned conjugacy. This updated distribution leads to the following expression for the predictive distribution at any next evaluation point $x$ [19]:
$$f(x) \mid \mathcal{D}_t \sim \mathcal{N}\big(\mu(x), \sigma^2(x)\big), \tag{4}$$
with the predictive mean and variance given by
$$\mu(x) = \mathbf{k}^T K^{-1} \mathbf{y}, \qquad \sigma^2(x) = k(x, x) - \mathbf{k}^T K^{-1} \mathbf{k}, \tag{5}$$
where $\mathbf{k} = \big(k(x, x_1), \dots, k(x, x_t)\big)$.
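The noise-free predictive formulas above translate directly into a few lines of linear algebra. The sketch below assumes a squared-exponential kernel with a fixed length scale (an illustrative choice, not prescribed by the text) and a small jitter term for numerical stability:

```python
import numpy as np

def rbf(a, b, ell=0.3):
    """Squared-exponential kernel k(a, b) on 1-D inputs, length scale ell."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_predict(X_t, y_t, X_new, ell=0.3, jitter=1e-10):
    """Noise-free GP posterior: mu = k^T K^{-1} y,
    sigma^2 = k(x, x) - k^T K^{-1} k."""
    K = rbf(X_t, X_t, ell) + jitter * np.eye(len(X_t))
    k = rbf(X_t, X_new, ell)                       # shape (t, m)
    alpha = np.linalg.solve(K, y_t)                # K^{-1} y
    mu = k.T @ alpha
    v = np.linalg.solve(K, k)                      # K^{-1} k
    s2 = rbf(X_new, X_new, ell).diagonal() - np.sum(k * v, axis=0)
    return mu, np.clip(s2, 0.0, None)              # guard tiny negatives
```

At the training points the posterior interpolates the data: the mean returns the observed values and the variance collapses to (numerically) zero, as expected for noise-free observations.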
To select the next point $x$ in BOA, one must solve an auxiliary optimization problem involving the maximization of an acquisition function $\alpha$. One of the most widely used acquisition functions is the expected improvement (EI). Let $f^\ast$ denote the best (i.e., lowest) function value observed up to iteration $t$, and assume that the predictive distribution at $x$ follows Equations (4) and (5). Then, the EI at a candidate point $x$ is defined as follows [16]:
$$\alpha_{EI}(x) = \mathbb{E}\big[u_{EI}(x) \mid \mathcal{D}_t\big], \quad \text{with} \quad u_{EI}(x) = \max\big(0, f^\ast - f(x)\big). \tag{6}$$
Another popular acquisition function is the probability of improvement (PI):
$$\alpha_{PI}(x) = P\big(f(x) < f^\ast \mid \mathcal{D}_t\big), \tag{7}$$
which can also be written as
$$\alpha_{PI}(x) = \mathbb{E}\big[u_{PI}(x) \mid \mathcal{D}_t\big], \quad \text{with} \quad u_{PI}(x) = \begin{cases} 1 & \text{if } f(x) < f^\ast, \\ 0 & \text{otherwise}. \end{cases} \tag{8}$$
Thanks to the properties of the GP predictive posterior, closed-form expressions can be obtained for both EI and PI. Let $\Phi$ and $\varphi$ denote, respectively, the cumulative distribution function and the probability density function of the standard normal distribution $\mathcal{N}(0, 1)$. Since the predictive distribution at a candidate point $x$ is Gaussian with mean $\mu(x)$ and standard deviation $\sigma(x)$, the EI can be expressed as [16]:
$$\alpha_{EI}(x) = \sigma(x) \cdot \big( z(x) \cdot \Phi(z(x)) + \varphi(z(x)) \big), \tag{9}$$
with
$$z(x) = \frac{f^\ast - \mu(x)}{\sigma(x)}. \tag{10}$$
Similarly, the PI is given by
$$\alpha_{PI}(x) = \Phi(z(x)). \tag{11}$$
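These closed forms can be evaluated with the standard-library error function alone. A small sketch for the minimization convention used above (a degenerate predictive distribution, $\sigma = 0$, is handled as a limiting case):

```python
import math

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for a Gaussian predictive N(mu, sigma^2), minimization."""
    if sigma <= 0.0:
        return max(f_best - mu, 0.0)               # degenerate (certain) prediction
    z = (f_best - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return sigma * (z * Phi + phi)

def probability_of_improvement(mu, sigma, f_best):
    """Closed-form PI = Phi(z), with the same z as in the EI expression."""
    if sigma <= 0.0:
        return float(mu < f_best)
    z = (f_best - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

When the predictive mean equals the incumbent ($z = 0$), the EI reduces to $\sigma \varphi(0) = \sigma / \sqrt{2\pi}$ and the PI to $1/2$, a useful sanity check.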
To illustrate how SBO works in practice, we consider a one-dimensional synthetic test function proposed by Gramacy and Lee [22], presented in Figure 1. Figure 1a shows the objective function, and Figure 1b shows the optimization process at iteration 1, including the predictive mean of the GP, the associated uncertainty, and the next point to be evaluated, which corresponds to the maximum of the acquisition function.

2.2. BOPrO

Bayesian optimization with a prior for the optimum (BOPrO), proposed by Souza et al. [23], is a recently developed version of BO inspired by the Tree-structured Parzen Estimator (TPE) algorithm [24]. It allows users to introduce intuitive domain knowledge into the optimization process through priors. These priors do not modify the surrogate model (e.g., a GP). Instead, they are combined with the probability of improvement (PI) to form a pseudo-posterior, or posterior pseudo-density. In this context, a pseudo-posterior is a non-normalized posterior, that is, a density that does not integrate to 1 over the prescribed domain. The pseudo-posteriors act as weighting functions in the Bayesian updating. They are integrated into the acquisition function, particularly in the calculation of a modified expected improvement, which implies that the priors directly adjust the importance assigned to different regions of the search space in the EI evaluation. Consequently, regions favored by a prior are more likely to be explored at the beginning of the iteration process. As the iterations progress, the effect of the prior decreases, and the predictive model's influence dominates the search.
The mathematical expression corresponding to the modified version of the expected improvement criterion, adopted by BOPrO to effectively integrate the pseudo-posterior into the process of selecting new points, is formally derived from the TPE framework. It reads as follows:
$$EI_{f_\gamma}(x) \propto \left( \gamma + \frac{b(x)}{g(x)}\,(1 - \gamma) \right)^{-1}, \tag{12}$$
where $g$ and $b$ are non-normalized density functions describing, respectively, the promising and less promising areas of the search space. The value $\gamma$ acts as a quantile threshold that determines at which point a function value can be considered good.
The main contribution of BOPrO is the practical implementation of this pseudo-posterior to quantify performance. In the case of $g(x)$, a prior distribution $p_g(x)$ is integrated, reflecting the user's belief about promising regions, together with a probability derived from the model, $M_g(x)$, which represents the probability, under the surrogate GP, that the value of the function at $x$ falls below a threshold $f_\gamma$. We can express this pseudo-posterior as follows:
$$g(x) \propto p_g(x) \cdot M_g(x)^{t/\beta}. \tag{13}$$
The density for the unpromising regions is given by
$$b(x) \propto p_b(x) \cdot M_b(x)^{t/\beta}, \tag{14}$$
where $p_b(x) = 1 - p_g(x)$ represents the prior for bad regions, $t$ is the current iteration, and $\beta$ is a hyperparameter that affects the balance between prior and model importance, with higher $\beta$ values prioritizing the prior and requiring more data to overrule it [23].
The probability $M_g$ is defined as follows:
$$M_g(x) = \Phi\!\left( \frac{f_\gamma - \mu(x)}{\sigma(x)} \right), \tag{15}$$
and the probability for unpromising regions is given by
$$M_b(x) = 1 - M_g(x). \tag{16}$$
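The pseudo-posteriors and the modified EI of Equations (12)-(16) can be sketched numerically as follows. The GP predictive mean and standard deviation are supplied as callables, and all quantities are computed up to their normalization constants, consistent with the pseudo-density interpretation above:

```python
import math

def bopro_scores(x, t, beta, gamma, f_gamma, prior_good, mu, sigma):
    """Pseudo-posteriors g, b and the modified EI, up to constants.
    prior_good(x) is the prior p_g over promising regions; p_b = 1 - p_g."""
    z = (f_gamma - mu(x)) / sigma(x)
    M_g = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # P(f(x) < f_gamma) under the GP
    M_b = 1.0 - M_g
    p_g = prior_good(x)
    g = p_g * M_g ** (t / beta)                        # promising pseudo-posterior
    b = (1.0 - p_g) * M_b ** (t / beta)                # unpromising pseudo-posterior
    ei = 1.0 / (gamma + (b / g) * (1.0 - gamma))       # modified EI, Eq. (12)
    return g, b, ei
```

With a flat prior and a symmetric predictive model, candidates whose predicted mean lies below $f_\gamma$ receive a higher modified EI than those above it, which is exactly the reweighting behavior described in the text.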
Algorithm 1 describes the optimization process carried out by BOPrO in detail.
Algorithm 1: BOPrO Algorithm. $\mathcal{D}_t$ keeps track of all function evaluations so far: $\{(x_i, y_i)\}_{i=1}^{t}$ [23].

3. Data-Driven Prior Construction in Hilbert Spaces for Bayesian Optimization

3.1. Preliminaries

The BOPrO framework is a technical development adapted for machine learning applications, with the properties of the algorithm explicitly crafted for this type of problem. The specification of the performance priors is designed to facilitate its use by an ML expert with domain knowledge in hyperparameter tuning. The goal is to provide a practical tool to codify this expert knowledge into a prior that performs the following:
  • Describes the belief that some areas of the search space are more (or less) promising.
  • Can be combined with the information available, both from model evaluations and from surrogate model predictions.
This formulation poses significant limitations when the domain of application is different, such as in structural design or general engineering design optimization. Notably, in such applications, expert knowledge may not be available, it may not be sufficiently reliable, or the expert may provide input at a different stage of the process.
Our proposed method, Data-Driven Prior Construction in Hilbert Spaces for Bayesian Optimization (HSBO), takes some of the central ideas in BOPrO (and the TPE algorithm). In particular, we retain the adapted version of the EI acquisition function and the structure of the priors over the search space, informing the early stages of the optimization iteration process. However, unlike the previous work, we replace the expert knowledge codified in the performance priors by systematic and reproducible distributions, learned directly from initial observations. To achieve this, we follow the methodology introduced by Souza et al. [25], who proposed two approaches based on Hilbertian representations to construct distributions: the first relying on De Finetti’s Representation Theorem, and the second directly providing UQ-based priors suitable for Bayesian procedures. In our framework, we specifically adopt the latter approach, which is detailed in Section 5.3 of the book [16].

3.2. Construction of Priors

The Hilbert Approach (HA), which can be viewed as an extension of Polynomial Chaos Approximations (PCA), can be applied under the following basic conditions:
  • The uncertainties can be modeled by a random vector $U$, and the variable of interest is $X = X(U)$;
  • The random variable $U$ takes its values on an interval $\Omega$ of real numbers (or, more generally, on a bounded subset $\Omega$ of $\mathbb{R}^n$);
  • $X$ belongs to the space of square-summable random variables: $E(X^2) < \infty$, i.e., $X$ has a finite moment of order 2;
  • Some statistical information about the couple $(U, X)$ is available.
These assumptions ensure that $X$ can be represented by an infinite expansion involving a convenient orthonormal Hilbert basis $\{\varphi_i\}_{i \in \mathbb{N}} \subset L^2(\Omega)$, allowing one to represent $X$ as [25,26]
$$X = \sum_{i \in \mathbb{N}} x_i \varphi_i(U). \tag{17}$$
As previously observed, $U$ is a suitable random variable used to construct a stochastic representation of $X$. Nevertheless, in practical situations, $U$ can be an artificial variable, conveniently chosen [16,27]. Artificial variables are used, namely, when no information is available about the source of randomness: then, $U$ is chosen by the user, and it is mandatory to create an artificial correlation between samples of the artificial variable and samples of $X$. For instance, we can arrange both in increasing order, $X_1 \le X_2 \le \dots \le X_n$ and $U_1 \le U_2 \le \dots \le U_n$; this generates a non-negative covariance between $X$ and the artificial $U$.
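The sorting construction is short enough to show in full. The sketch below assumes a uniform artificial variable on $[0, 1]$ (one convenient choice); pairing the two sorted samples is what induces the non-negative covariance:

```python
import numpy as np

def artificial_pairing(x_sample, seed=0):
    """Pair a data sample of X with an artificial uniform variable U on [0, 1]:
    sorting both samples in increasing order induces a non-negative covariance
    between U and X, as required by the construction."""
    rng = np.random.default_rng(seed)
    x_sorted = np.sort(np.asarray(x_sample, dtype=float))
    u_sorted = np.sort(rng.uniform(0.0, 1.0, len(x_sorted)))
    return u_sorted, x_sorted
```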
In practice, the expansion is truncated at order $k$, yielding the following approximation:
$$X = \sum_{i \in \mathbb{N}} x_i \varphi_i(U) \approx P_k X = \sum_{i=0}^{k} x_i \varphi_i(U). \tag{18}$$
The finite representation $P_k X$ is determined by finding the coefficients $\mathbf{x} = (x_0, x_1, \dots, x_k)^T$. A popular method for the determination of $\mathbf{x}$ is collocation, which considers a sample $\mathbf{U} = (U_1, \dots, U_n)$ and solves the linear system [27]:
$$M\mathbf{x} = N, \qquad M_{ij} = \varphi_j(U_i), \quad N_i = X_i, \quad 1 \le i \le n, \; 0 \le j \le k. \tag{19}$$
The reader can find in the literature works using such an approach [27,28]. In general, $k + 1 < n$, so the system is overdetermined and Equation (19) must be solved by a least-squares approach. Once $P_k X$ is determined, the representation can be used to generate the distribution of $X$. For instance, we can generate a large sample of variates from $U$ (possibly a deterministic one):
$$\mathbf{U}_g = \big(U_1, \dots, U_{n_g}\big), \tag{20}$$
and use it to generate a large sample $X_g$ from $X$ as $X_i = P_k X(U_i)$, $i = 1, \dots, n_g$. Then, $X_g$ is used to estimate the cumulative distribution function (CDF) and probability density function (PDF) of $X$, by means of the empirical CDF of the large sample.
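The collocation system (19) and the subsequent sampling step can be sketched with a Legendre basis on $[0, 1]$; the basis choice is an illustrative assumption, as the text does not prescribe a specific family here:

```python
import numpy as np

def fit_hilbert_expansion(U, X, k=3):
    """Least-squares collocation for the truncated expansion
    P_k X = sum_{i=0}^{k} x_i phi_i(U), with Legendre polynomials
    mapped from [0, 1] to their natural interval [-1, 1]."""
    M = np.polynomial.legendre.legvander(2.0 * U - 1.0, k)  # M_ij = phi_j(U_i)
    coeffs, *_ = np.linalg.lstsq(M, X, rcond=None)          # overdetermined solve
    return coeffs

def sample_distribution(coeffs, n_g=50_000, seed=0):
    """Push a large pseudo-random sample of U through P_k X and return it
    sorted, i.e., the support of the empirical CDF of X."""
    rng = np.random.default_rng(seed)
    Ug = rng.uniform(0.0, 1.0, n_g)
    Xg = np.polynomial.legendre.legval(2.0 * Ug - 1.0, coeffs)
    return np.sort(Xg)
```

An affine relation $X = 2U + 1$ is recovered exactly by any $k \ge 1$, so the generated sample stays in $[1, 3]$ with mean near 2, which is a convenient correctness check.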
From the Bayesian standpoint, estimates can be obtained by minimizing a loss function defined over the estimated PDF. This approach can be used to define a univariate parametric prior distribution $p_{\hat\theta}(x)$, where the parametric family (e.g., Gaussian, log-normal) is selected by the user.
In the case of a multidimensional input, we assume independence between dimensions, so that the multivariate prior distribution is constructed as the product of the univariate marginals:
$$p(x) = \prod_{j=1}^{d} p_{\hat\theta_j}(x_j). \tag{21}$$
This independence hypothesis, although simplistic and not necessarily valid in all cases, is adopted here as a reasonable working assumption to facilitate model construction and advance the validation of the proposed ideas. If independence among variables is not assumed, the prior definition and estimation become more complex and computationally expensive, as the joint distributions can no longer be expressed as a product of univariate marginals. This approach requires the explicit modeling of the dependence structure among variables, a task that can be achieved by introducing copula models or parametric multivariate distributions. The use of copulas requires careful consideration of the selected parametric family. In the case of multivariate distributions, the number of parameters that need to be estimated drastically increases with the dimension of the problem (e.g., calculating a full covariance matrix and potentially joint higher-order moments), and the estimation of these parameters from limited data is in itself a difficult task.
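As a concrete sketch of these two steps, the snippet below fits a Gaussian marginal by moment matching (one simple choice of loss; other parametric families fit analogously) and assembles the product prior of Equation (21):

```python
import math

def fit_gaussian_prior(sample):
    """Moment-based fit of a Gaussian prior to a Hilbert-generated sample:
    returns (mean, standard deviation)."""
    n = len(sample)
    m = sum(sample) / n
    s2 = sum((x - m) ** 2 for x in sample) / n
    return m, math.sqrt(s2)

def product_prior(marginal_pdfs):
    """Multivariate prior as the product of independent univariate marginals,
    following the independence hypothesis adopted in the text."""
    def p(x):
        out = 1.0
        for pdf, xj in zip(marginal_pdfs, x):
            out *= pdf(xj)
        return out
    return p
```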
The full optimization procedure is detailed in Algorithm 2.
Note that two quantile thresholds are used in our method: $f_{\gamma_{gb}}$ is fixed at the initialization stage to split the DoE into good and bad subsets, while $f_\gamma$ is recomputed at each iteration for the probability of improvement in the acquisition function.
Algorithm 2: Data-driven prior construction in Hilbert spaces for Bayesian optimization
Let us return to the case study introduced in Section 2, this time applying the method we propose. The objective is to visualize the interaction between the different elements of the framework: the surrogate model, the acquisition function, the prior distributions, and the update points. Figure 2a shows the true function $f(x)$ along with the DoE, which is the same one used for the SBO example (see Figure 1b). Figure 2b shows the priors $p_g(x)$ and $p_b(x)$, constructed with the Hilbertian approach. Figure 2c illustrates the prediction and uncertainty band associated with the GP model. Figure 2d shows the probability models $M_g(x)$ and $M_b(x)$, which use the information provided by the GP to quantify the probability that a given input belongs to the good or bad region. Figure 2e shows the weighted scores $g(x)$ and $b(x)$ obtained by combining the priors with the probability models. We observe that some regions considered promising by the PI are penalized because they belong to the bad prior. In contrast, other regions gain importance as they are reinforced by the good prior. Finally, Figure 2f shows the EI, where the maximum no longer coincides with that of SBO. This change is a direct consequence of the integration of performance priors, which guide the algorithm to select a different update point than SBO.

4. Results and Discussion

In this section, we test our proposed method on six test functions and one engineering application. Our results are contrasted with SBO as a baseline. The test functions have been chosen to explore the general behavior of the method, even though they are not expensive black-box functions. The application is a good representative of the kind of problem where BO is expected to excel.
Our focus here is exploratory, as we seek to identify an interesting behavior of the proposed method against SBO, and with varied objective functions. We have thus selected performance metrics that reflect a mix of robustness (success rate), reliability (proximity to the optimum), and convergence speed (simple regret plots); moreover, particular attention is given to the variability of the method given different instances of the DoE, leading us to report extremes, averages, and variances of each instance of our method for several runs.

4.1. Test Design and Technical Aspects

HSBO is evaluated by identifying the global minima of a set of test functions from the Virtual Library of Simulation Experiments [22].
Table A1 in Appendix A shows the mathematical expression, the problem size, the search space, the known global minimum, and the threshold considered as the success criterion. This threshold is defined by applying a relative error criterion of 5%, using the following formula:
$$\text{target} = \text{global minimum} + 0.05 \times |\text{global minimum}|. \tag{22}$$
In the particular case of the Sum of Different Powers function, whose global minimum is zero, a target equal to $10^{-3}$ is arbitrarily set to avoid zero values.
Different families of parametric distributions (e.g., Gaussian, log-normal) are fitted using the parameters estimated by the Hilbert approximation implemented in the proposed method and compared with SBO. Ten runs of the code are performed, each with a randomly generated DoE. The optimization budget is set to 15 times the problem dimension, and the size of the DoE to 10 times the problem dimension. The hyperparameter $\beta$ was assigned values between 10 and 100. The first threshold, $\gamma$, takes values between 0.01 and 0.05, while the second, $\gamma_{gb}$, is set to 0.5.

4.2. Benchmark Problems

Figure 3 provides an overall view of the performance of the proposed method under different prior distributions across six benchmark functions. Figure 3a shows the results obtained for the Gramacy and Lee function. Both SBO and HSBO with the Pearson distribution perform well. Figure 3b shows the results obtained for the Cross-in-Tray function. The normal and log-normal prior distributions offer better performance compared to the SBO reference. Although SBO shares a low median, it has an outlier, indicating that in at least one run it deviated significantly from the optimum. Figure 3c shows the results obtained for the Branin function. Although the prior-based configurations show competitive performance, in this particular case they fail to exceed the baseline (SBO). Figure 3d shows the results for the Hartmann 3D function. Here, the baseline and the HSBO configurations with different prior distributions perform very similarly. The medians and dispersion of the relative errors are practically equivalent, suggesting that, in this case, none of the approaches offers a significant advantage over the others. Figure 3e illustrates the distribution of final function values across runs for the Sum of Different Powers function. The box plots clearly show the stability of SBO, together with HSBO using normal and Pearson priors, which remain concentrated near zero. In contrast, log-normal, exponential, and Rayleigh priors exhibit wider spreads and higher median values, reflecting reduced robustness. As illustrated in Figure 3f, none of the methods is able to reach the global minimum of the Hartmann 6D function within 60 iterations. All priors display similar performance, with only slight variations in dispersion.
In addition to the box plot presented in Figure 3a–f, we provide complementary numerical results in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6. For each test function and each prior distribution used in the HSBO framework, we indicate the following:
  • Maximum and minimum values reached by the function (corresponding to the 10 runs) obtained in the last iteration. This allows us to capture the worst and best performance of the algorithm.
  • Mean of the function values at the last iteration (corresponding to the 10 runs) and the standard deviation of these values.
  • Success rate, defined as the fraction of the 10 runs in which the function reached a value below the target.
These statistics provide additional information about the stability and robustness of our proposal, allowing us to evaluate how often a method successfully identifies a near-optimal solution.
While the numerical results presented in the tables and box plots report the quantitative results of each test case, Table 7 summarizes the general patterns observed across all test cases, highlighting which priors tend to perform best depending on the characteristics of the different problems. This general description can serve as a practical guide when selecting prior distributions in new applications.
In addition to the main performance indicators presented through box plots and summary tables for the benchmark functions, we also report the evolution of the simple regret metric. Following the definition in [29,30], the simple regret at iteration $t$ is expressed as
$$\hat{r}_t^{+} = \min_{x_i \in \mathcal{D}_{1:t}} f(x_i) - f(x^\ast), \tag{23}$$
where $x^\ast$ denotes the global optimum of the objective function and $\mathcal{D}_{1:t}$ is the set of evaluated points up to iteration $t$. The mean simple regret over 10 independent runs is plotted on a logarithmic scale in Appendix B, providing an additional perspective on the convergence behavior of the algorithms across the sequence of iterations.
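The simple-regret curve is the running best value minus the known optimum, one entry per iteration; a one-line numpy sketch:

```python
import numpy as np

def simple_regret_curve(f_star, y_history):
    """Simple regret per iteration: best observed value so far minus the
    known optimum f_star; non-negative and non-increasing by construction."""
    best_so_far = np.minimum.accumulate(np.asarray(y_history, dtype=float))
    return best_so_far - f_star
```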

4.3. Application: Shape Optimization of a Solid in Linear Elasticity Under Uniaxial Loading

A structural optimization problem is addressed in which the initial geometry, which is L-shaped, is parameterized using isogeometric analysis (IGA) on a finite element mesh constructed from non-uniform rational B-spline (NURBS) surfaces [31]. The mechanical solver is a MATLAB R2024b adaptation of the open-source code MIGFEM, a linear-elasticity IGA implementation developed for educational and academic purposes by Vinh Phu Nguyen (Johns Hopkins University/Monash University) and available in the author's GitHub repository (see reference [31] for details). The code implements linear isogeometric analysis for 1D, 2D, and 3D elasticity problems, including $h/p/k$ refinement and Bézier extraction, the construction of stiffness matrices from NURBS, adaptive $h$ refinement, and the treatment of natural and essential boundary conditions.
This code has been adapted to incorporate a shape optimization problem, in which the geometry of the L-shaped plate is perturbed by three design parameters that vertically modify certain NURBS control points. The design vector is denoted as follows:
$$\mathbf{x} = \big(\varepsilon_{y_1}, \varepsilon_{y_2}, \varepsilon_{y_3}\big) \in \mathcal{X} \subset \mathbb{R}^3, \tag{24}$$
where each variable $\varepsilon_{y_i}$ represents a vertical perturbation applied to a NURBS control point.
Figure 4 summarizes the reference mechanical problem and its numerical setup. Figure 4a illustrates the L-shaped domain and boundary conditions, where uniaxial tension is applied on the right edge while the left and bottom sides are clamped. The control points in Figure 4b parameterize the geometry and serve as design variables in the optimization process. The initial mesh shown in Figure 4c guarantees element quality before deformation, and the displacement field in Figure 4d reveals the expected stress concentration at the re-entrant corner.
The goal is to minimize the total volume while ensuring that the average horizontal displacement matches a prescribed target. The optimization problem is therefore formulated as follows:
$$\min_{\mathbf{x} \in \mathcal{X}} f(\mathbf{x}) := V(\mathbf{x}) + \lambda \cdot \big| \bar{u}_x(\mathbf{x}) - u_{\text{target}} \big|, \tag{25}$$
where
  • $V(\mathbf{x})$ is the total volume of the generated structure;
  • $\bar{u}_x(\mathbf{x}) = \frac{1}{|\Gamma_D|} \sum_{i \in \Gamma_D} u_x^{(i)}$ is the average horizontal displacement on the right boundary $\Gamma_D$;
  • $u_{\text{target}} = 0.0013$ is the target displacement;
  • $\lambda = 100$ is a regularization coefficient.
Each evaluation of $f(\mathbf{x})$ requires solving an isogeometric elasticity FE problem:
$$K(\mathbf{x})\, \mathbf{u}(\mathbf{x}) = \mathbf{F}, \tag{26}$$
where $K(\mathbf{x})$ is the stiffness matrix depending on the geometry modified by $\mathbf{x}$, and $\mathbf{u}(\mathbf{x})$ is the associated displacement field. This formulation guarantees mechanical consistency via the imposed equilibrium, while the volume term, although approximate, plays a regularizing role, guiding the solution toward more compact forms without explicitly restricting the space of admissible designs.
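The penalized objective wrapping the FE solve can be sketched as below. Here `solve_elasticity` is a hypothetical stand-in for the MIGFEM-based IGA solver, assumed to return the structure's volume and the horizontal displacements on the right boundary for a given design vector:

```python
import numpy as np

def penalized_objective(x, solve_elasticity, u_target=0.0013, lam=100.0):
    """Penalty formulation: volume plus a weighted violation of the mean
    horizontal displacement target. `solve_elasticity` is a hypothetical
    placeholder for the IGA solver, returning (volume, ux_on_right_boundary)
    for a design vector x."""
    volume, ux_boundary = solve_elasticity(x)
    u_mean = float(np.mean(ux_boundary))          # average displacement on Gamma_D
    return volume + lam * abs(u_mean - u_target)
```

When the average displacement already hits the target, the penalty vanishes and the objective reduces to the volume alone, so the penalty only steers designs that violate the displacement condition.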
To initiate the optimization process, a DoE comprising 12 samples was generated using Latin hypercube sampling (LHS). The admissible set $\mathcal{X}$ is defined by the bounds $[-0.5, 0.5]^3$. To evaluate the acquisition function at each iteration, a Monte Carlo sample of $10^6$ points was drawn from the design space. The hyperparameters of the proposed optimization algorithm were set as follows: the quantile threshold $\gamma = 0.01$ was used in the definition of the probability of improvement, and the scale coefficient $\beta = 100$. Furthermore, the quantile $\gamma_{gb} = 0.5$ of the response values was used to divide the DoE into good and bad subsets, from which prior distributions representing the promising and unpromising regions of the design space were constructed.
Figure 5 shows the volume evolution during the optimization. Starting from the original L-shaped design, the first 20 iterations with the proposed HSBO method progressively reduced the volume, particularly in the early steps. After this phase, the best HSBO solution and the trained Gaussian surrogate were used as the starting point for a local gradient-based optimization with MATLAB's fmincon (interior-point algorithm, default settings). This second stage converged to the optimum, yielding a total volume reduction of about 30% compared to the initial design.
In terms of computational cost, HSBO required 12 functional evaluations to build the DoE, and 20 additional evaluations during the iterative process, for a total of 32. Since fmincon was applied to the Gaussian process surrogate, no additional functional evaluations were required during this refinement step. This hybrid approach illustrates HSBO’s versatility and efficiency in training the Gaussian surrogate with scarce data, achieving high accuracy in the region surrounding the optimum.
Figure 6 shows the results for the optimized L-shaped plate. Figure 6a shows the final positions of the three control points that act as design variables. Their vertical displacements generate the diagonal notch responsible for the volume reduction. Figure 6b shows the resulting refined NURBS mesh, which has elements of uniform quality and a deformation field without local concentrations, confirming the mechanical consistency of the optimized shape. Figure 6c shows the contour of the horizontal displacement $u_x$, which exhibits a nearly linear gradient between the clamped edge and the right edge; the uniform color band on $\Gamma_D$ confirms that the average displacement reaches the target $u_{\text{target}} = 1.3 \times 10^{-3}$ m with minimal dispersion.

5. Conclusions and Future Work

In this article, we have presented HSBO, a method that combines Bayesian and Hilbertian UQ modeling techniques within the framework of Bayesian optimization. The Hilbertian approach is used to construct structured performance priors over the search space, identifying promising regions that guide the Bayesian enrichment process.
Numerical tests, carried out on benchmark functions of varying characteristics and dimensions, show that the proposed method provides a systematic framework for constructing priors that can improve on the performance of classical BO when the problem structure aligns with the selected parametric family. The improvement can take the form of a higher success rate, faster convergence, or greater proximity to the optimum under a fixed budget. Even in situations where the global minimum is not identified, the method converges to regions very close to the optimum, providing an excellent starting point for gradient-based refinement methods.
Future research can advance in several directions. First, explicit modeling of dependencies between data could be investigated to improve the construction of priors, as the current approach assumes independence. Second, sensitivity analyses of key hyperparameters, particularly γ and β , would be useful not only to confirm their influence but also to determine appropriate value ranges depending on the type of problem. Third, the adoption of alternative surrogate models could be explored to evaluate possible advantages over the current option. Finally, another promising avenue would be to extend the method to constrained optimization, configuring the prior distributions in such a way that they naturally respect the feasibility conditions.

Author Contributions

Conceptualization, E.P., E.S.d.C.; Formal analysis, C.S.A., O.S.J.; Investigation, C.S.A., O.S.J.; Methodology, C.S.A., O.S.J., E.P., E.S.d.C.; Software, C.S.A., O.S.J., E.S.d.C.; Supervision, E.P., E.S.d.C.; Visualization, C.S.A.; Writing – original draft, C.S.A.; Writing – review and editing, O.S.J., E.P., E.S.d.C. All authors have read and agreed to the published version of the manuscript.

Funding

The first author received external funding from the Ministry of Education, Science and Technology (MESCyT) of the Dominican Republic within the framework of the Caliope program.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
BO	Bayesian Optimization
GP	Gaussian Process
BOPrO	Bayesian Optimization with a Prior for the Optimum
BOA	Bayesian Optimization Algorithm
SBO	Standard Bayesian Optimization
TPE	Tree-structured Parzen Estimator
CDF	Cumulative Distribution Function
PDF	Probability Density Function
HSBO	Data-Driven Prior Construction in Hilbert Spaces for Bayesian Optimization
DoE	Initial Design of Experiments
SR	Simple Regret
NURBS	Non-Uniform Rational B-Splines

Appendix A. Description of Test Functions

Table A1. Test cases.
Function | Dim. | Design Space | Minimum | Target
(F1) Gramacy and Lee: f(x) = sin(10πx)/(2x) + (x − 1)⁴ | 1 | [0.5, 2.5] | −0.86901 | −0.8256
(F2) Cross-in-Tray: f(x) = −0.0001 (|sin(x₁) sin(x₂) exp(|100 − √(x₁² + x₂²)/π|)| + 1)^0.1 | 2 | [−10, 10]² | −2.06261 | −1.9595
(F3) Branin: f(x) = a(x₂ − b x₁² + c x₁ − r)² + s(1 − t) cos(x₁) + s, where a = 1, b = 5.1/(4π²), c = 5/π, r = 6, s = 10, t = 1/(8π) | 2 | [−5, 10] × [0, 15] | 0.397887 | 0.4178
(F4) Hartmann 3-Dimensional: f(x) = −Σᵢ₌₁⁴ αᵢ exp(−Σⱼ₌₁³ Aᵢⱼ (xⱼ − Pᵢⱼ)²), where α = (1.0, 1.2, 3.0, 3.2)ᵀ, A = [3.0 10 30; 0.1 10 35; 3.0 10 30; 0.1 10 35], P = 10⁻⁴ [3689 1170 2673; 4699 4387 7470; 1091 8732 5547; 381 5743 8828] | 3 | [0, 1]³ | −3.86278 | −3.6696
(F5) Sum of Different Powers: f(x) = Σᵢ₌₁ᵈ |xᵢ|^(i+1) | 4 | [−1, 1]⁴ | 0 | 0.001
(F6) Hartmann 6-Dimensional: f(x) = −Σᵢ₌₁⁴ αᵢ exp(−Σⱼ₌₁⁶ Aᵢⱼ (xⱼ − Pᵢⱼ)²), where α = (1.0, 1.2, 3.0, 3.2)ᵀ, A = [10 3 17 3.50 1.7 8; 0.05 10 17 0.1 8 14; 3 3.5 1.7 10 17 8; 17 8 0.05 10 0.1 14], P = 10⁻⁴ [1312 1696 5569 124 8283 5886; 2329 4135 8307 3736 1004 9991; 2348 1451 3522 2883 3047 6650; 4047 8828 8732 5743 1091 381] | 6 | [0, 1]⁶ | −3.32237 | −3.1563
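For reference, two of the benchmarks in Table A1 can be implemented directly; these are sketches based on the standard definitions from the Virtual Library of Simulation Experiments [22], not code from the paper.

```python
import numpy as np

def gramacy_lee(x):
    # (F1) Gramacy and Lee, x in [0.5, 2.5]; global minimum about -0.869.
    return np.sin(10 * np.pi * x) / (2 * x) + (x - 1) ** 4

def branin(x1, x2):
    # (F3) Branin, (x1, x2) in [-5, 10] x [0, 15]; global minimum 0.397887.
    a, b, c = 1.0, 5.1 / (4 * np.pi**2), 5 / np.pi
    r, s, t = 6.0, 10.0, 1 / (8 * np.pi)
    return a * (x2 - b * x1**2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s

# One of Branin's three global minimizers:
val = branin(np.pi, 2.275)
```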

Appendix B. Mean of Simple Regret

The Gramacy and Lee function (Figure A1a) shows that the configuration based on Pearson’s distribution stands out for its rapid convergence, achieving the lowest simple regret values in fewer iterations than the baseline and the other variants of the proposed method.
When analyzing the Cross-in-Tray function (Figure A1b), the normal and log-normal distributions offer clearly superior performance, achieving a considerably lower simple regret than the other methods, including SBO.
Figure A1. Mean simple regret on a logarithmic scale as a function of iteration count, for various benchmark functions. Results correspond to the average of 10 runs using random initial DoE.
Regarding the Branin function (Figure A1c), all methods tend to converge toward similar levels of simple regret during the first iterations. However, it can be observed that the baseline presents the fastest and most stable convergence, reaching a very low level of regret from approximately the tenth iteration onwards.
Turning to the Hartmann 3D function (Figure A1d), the SBO method shows the fastest convergence, reaching the lowest simple regret values in just a few iterations.
With the 4D version of the Sum of Different Powers function (Figure A1e), the configurations based on the normal and Pearson distributions converge slightly faster toward the target value of 10 3 . However, the difference with respect to SBO is not significant, since it also converges to the same level in a similar number of iterations.
Concerning the results for the Hartmann 6D function (Figure A1f), it can be seen that although SBO shows very rapid convergence in the first iterations, its progress soon stabilizes, while the log-normal configuration continues to improve gradually until it achieves the best overall performance.
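The metric plotted in Figure A1 can be sketched as follows: the simple regret at iteration t is the gap between the best observed value so far and the known optimum f*, averaged over independent runs. The function name and the toy data are illustrative, not from the paper.

```python
import numpy as np

def mean_simple_regret(histories, f_star):
    # histories: (n_runs, n_iters) array of observed objective values.
    # Best-so-far per run, then the gap to the optimum, averaged over runs.
    best_so_far = np.minimum.accumulate(histories, axis=1)
    return (best_so_far - f_star).mean(axis=0)

# Toy usage: 10 runs of 15 iterations on a problem with f* = -0.86901.
rng = np.random.default_rng(0)
hist = -0.86901 + rng.uniform(0.0, 1.0, size=(10, 15))
sr = mean_simple_regret(hist, -0.86901)
```

By construction the curve is nonincreasing in the iteration count, which is why the figures are read as convergence plots.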

References

  1. Jones, D.R.; Schonlau, M.; Welch, W.J. Efficient global optimization of expensive black-box functions. J. Glob. Optim. 1998, 13, 455–492. [Google Scholar] [CrossRef]
  2. Regis, R.G. Stochastic radial basis function algorithms for large-scale optimization involving expensive black-box objective and constraint functions. Comput. Oper. Res. 2011, 38, 837–853. [Google Scholar] [CrossRef]
  3. Kumagai, W.; Yasuda, K. Black-box optimization and its applications. In Innovative Systems Approach for Facilitating Smarter World; Springer: Singapore, 2023; pp. 81–100. [Google Scholar]
  4. Ghanbari, H.; Scheinberg, K. Black-box optimization in machine learning with trust region based derivative free algorithm. arXiv 2017, arXiv:1703.06925. [Google Scholar] [CrossRef]
  5. Pardalos, P.M.; Rasskazova, V.; Vrahatis, M.N. Black Box Optimization, Machine Learning, and No-Free Lunch Theorems; Springer: Berlin/Heidelberg, Germany, 2021; Volume 170. [Google Scholar]
  6. Abreu de Souza, F.; Crispim Romão, M.; Castro, N.F.; Nikjoo, M.; Porod, W. Exploring parameter spaces with artificial intelligence and machine learning black-box optimization algorithms. Phys. Rev. D 2023, 107, 035004. [Google Scholar] [CrossRef]
  7. Calvel, S.; Mongeau, M. Black-box structural optimization of a mechanical component. Comput. Ind. Eng. 2007, 53, 514–530. [Google Scholar] [CrossRef]
  8. Lainé, J.; Piollet, E.; Nyssen, F.; Batailly, A. Blackbox optimization for aircraft engine blades with contact interfaces. J. Eng. Gas Turbines Power 2019, 141, 061016. [Google Scholar] [CrossRef]
  9. Thalhamer, A.; Fleisch, M.; Schuecker, C.; Fuchs, P.F.; Schlögl, S.; Berer, M. A black-box optimization strategy for customizable global elastic deformation behavior of unit cell-based tri-anti-chiral metamaterials. Adv. Eng. Softw. 2023, 186, 103553. [Google Scholar] [CrossRef]
  10. Wang, X.; Jin, Y.; Schmitt, S.; Olhofer, M. Recent advances in Bayesian optimization. ACM Comput. Surv. 2023, 55, 1–36. [Google Scholar] [CrossRef]
  11. Wu, J.; Chen, X.Y.; Zhang, H.; Xiong, L.D.; Lei, H.; Deng, S.H. Hyperparameter optimization for machine learning models based on Bayesian optimization. J. Electron. Sci. Technol. 2019, 17, 26–40. [Google Scholar]
  12. Cho, H.; Kim, Y.; Lee, E.; Choi, D.; Lee, Y.; Rhee, W. Basic enhancement strategies when using Bayesian optimization for hyperparameter tuning of deep neural networks. IEEE Access 2020, 8, 52588–52608. [Google Scholar] [CrossRef]
  13. Do, B.; Zhang, R. Multi-fidelity Bayesian optimization in engineering design. arXiv 2023, arXiv:2311.13050. [Google Scholar] [CrossRef]
  14. Frazier, P.I.; Wang, J. Bayesian optimization for materials design. In Information Science for Materials Discovery and Design; Springer: Cham, Switzerland, 2016; pp. 45–75. [Google Scholar]
  15. Humphrey, L.; Dubas, A.; Fletcher, L.; Davis, A. Machine learning techniques for sequential learning engineering design optimisation. Plasma Phys. Control. Fusion 2023, 66, 025002. [Google Scholar] [CrossRef]
  16. de Cursi, E.S. Uncertainty Quantification with R; Springer: Berlin/Heidelberg, Germany, 2024. [Google Scholar]
  17. Mockus, J. Application of Bayesian approach to numerical methods of global and stochastic optimization. J. Glob. Optim. 1994, 4, 347–365. [Google Scholar] [CrossRef]
  18. Frazier, P.I. A tutorial on Bayesian optimization. arXiv 2018, arXiv:1807.02811. [Google Scholar] [CrossRef]
  19. Brochu, E.; Cora, V.M.; De Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv 2010, arXiv:1012.2599. [Google Scholar] [CrossRef]
  20. Shahriari, B.; Swersky, K.; Wang, Z.; Adams, R.P.; De Freitas, N. Taking the human out of the loop: A review of Bayesian optimization. Proc. IEEE 2015, 104, 148–175. [Google Scholar] [CrossRef]
  21. Ben Yahya, A.; Ramos Garces, S.; Van Oosterwyck, N.; De Boi, I.; Cuyt, A.; Derammelaere, S. Mechanism design optimization through CAD-based Bayesian optimization and quantified constraints. Discov. Mech. Eng. 2024, 3, 21. [Google Scholar] [CrossRef]
  22. Surjanovic, S.; Bingham, D. Virtual Library of Simulation Experiments: Test Functions and Datasets. 2013. Available online: https://www.sfu.ca/~ssurjano/optimization.html (accessed on 24 January 2025).
  23. Souza, A.; Nardi, L.; Oliveira, L.B.; Olukotun, K.; Lindauer, M.; Hutter, F. Bayesian optimization with a prior for the optimum. In Proceedings of the Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, 13–17 September 2021; Proceedings, Part III 21. Springer: Berlin/Heidelberg, Germany, 2021; pp. 265–296. [Google Scholar]
  24. Bergstra, J.; Bardenet, R.; Bengio, Y.; Kégl, B. Algorithms for hyper-parameter optimization. Adv. Neural Inf. Process. Syst. 2011, 24, 2546–2554. [Google Scholar]
  25. de Cursi, E.S.; Fabro, A. On the Collaboration Between Bayesian and Hilbertian Approaches. In Proceedings of the International Symposium on Uncertainty Quantification and Stochastic Modeling, Fortaleza, Brazil, 30 July–4 August 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 178–189. [Google Scholar]
  26. Bassi, M.; Souza de Cursi, J.E.; Ellaia, R. Generalized Fourier Series for Representing Random Variables and Application for Quantifying Uncertainties in Optimization. In Proceedings of the 3rd International Symposium on Uncertainty Quantification and Stochastic Modeling, Maresias, Brazil, 15–19 February 2016. [Google Scholar] [CrossRef]
  27. De Cursi, E.S.; Sampaio, R. Uncertainty Quantification and Stochastic Modeling with Matlab; Elsevier: Amsterdam, The Netherlands, 2015. [Google Scholar]
  28. Eldred, M.; Burkardt, J. Comparison of non-intrusive polynomial chaos and stochastic collocation methods for uncertainty quantification. In Proceedings of the 47th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition, Orlando, FL, USA, 5–8 January 2009; p. 976. [Google Scholar]
  29. AV, A.K.; Rana, S.; Shilton, A.; Venkatesh, S. Human-AI collaborative Bayesian optimisation. Adv. Neural Inf. Process. Syst. 2022, 35, 16233–16245. [Google Scholar]
  30. Vakili, S.; Bouziani, N.; Jalali, S.; Bernacchia, A.; Shiu, D.s. Optimal order simple regret for Gaussian process bandits. Adv. Neural Inf. Process. Syst. 2021, 34, 21202–21215. [Google Scholar]
  31. Nguyen, V.P.; Anitescu, C.; Bordas, S.P.; Rabczuk, T. Isogeometric analysis: An overview and computer implementation aspects. Math. Comput. Simul. 2015, 117, 89–116. [Google Scholar] [CrossRef]
Figure 1. Illustration of SBO process on the Gramacy and Lee [22] function. (a) The true test function f ( x ) to be minimized. (b) The surrogate Gaussian process model, with the initial design of experiments, the acquisition function’s selected next point, and the true function.
Figure 2. Illustrative example of the optimization process of the proposed method. (a) True function f ( x ) with the DoE split into good points ( X g ) and bad points ( X b ), used for prior construction. (b) Data-driven priors p g ( x ) and p b ( x ) obtained from the good and bad subsets. (c) Surrogate model built using a GP , associated uncertainty, and the selected next evaluation point x next . (d) Probability models M g ( x ) and M b ( x ) representing the likelihood of being a good or bad point, respectively. (e) Weighted scores g ( x ) and b ( x ) . (f) EI acquisition function, with the optimal next evaluation point.
Figure 3. Box plots illustrate the performance at the last iteration after 10 runs—each starting from a random DoE—of the proposed method under five different prior distributions (normal, log-normal, Rayleigh, exponential, and Pearson), with SBO used as the baseline. Results are reported for six benchmark functions. The vertical axis shows the relative error with respect to the global minimum, except for case (e), whose optimum is zero; there, the objective function value itself is plotted.
Figure 4. L-shape under uniaxial loading. (a) Mechanical problem: the lower right edge of the structure is subjected to a uniform tensile load, while the left and lower edges are restrained in the x and y directions, respectively. (b) Distribution of the control points defining the geometry. (c) Initial configuration of the computational domain. (d) Contour plot of the horizontal displacement in the x direction.
Figure 5. Convergence history of the volume objective for the L-shaped design problem. The volume drops sharply in the first iterations and stabilizes after a reduction of approximately 30% from the initial design.
Figure 6. Results of the optimized L-shaped structure. (a) Final positions of the three control points acting as design variables. (b) Refined NURBS mesh of the optimized geometry. (c) Contour plot of the horizontal displacement component u x .
Table 1. Performance at final iteration (f₁₅) over 10 runs: Gramacy and Lee 1D function.
Method | Min (f₁₅) | Max (f₁₅) | Mean (f₁₅) | Std (f₁₅) | Success Rate (f(x) < Target)
SBO (baseline) | −0.8690 | −0.8492 | −0.8648 | 7.6 × 10⁻³ | 10/10
HSBO with normal prior | −0.8690 | −0.6622 | −0.8407 | 6.5 × 10⁻² | 9/10
HSBO with log-normal prior | −0.8690 | −0.6619 | −0.8271 | 8.7 × 10⁻² | 8/10
HSBO with exponential prior | −0.8690 | −0.5185 | −0.8099 | 1.3 × 10⁻¹ | 8/10
HSBO with Rayleigh prior | −0.8690 | −0.5185 | −0.7922 | 1.3 × 10⁻¹ | 7/10
HSBO with Pearson prior | −0.8690 | −0.8492 | −0.8648 | 7.6 × 10⁻³ | 10/10
Table 2. Performance at final iteration (f₃₀) over 10 runs: Cross-in-Tray 2D function.
Method | Min (f₃₀) | Max (f₃₀) | Mean (f₃₀) | Std (f₃₀) | Success Rate (f(x) < Target)
SBO (baseline) | −2.0626 | −1.8894 | −2.0447 | 5.5 × 10⁻² | 9/10
HSBO with normal prior | −2.0626 | −2.0620 | −2.0625 | 1.96 × 10⁻⁴ | 10/10
HSBO with log-normal prior | −2.0626 | −2.0426 | −2.0564 | 7.0 × 10⁻³ | 10/10
HSBO with exponential prior | −2.0456 | −1.8891 | −1.9740 | 4.7 × 10⁻² | 7/10
HSBO with Rayleigh prior | −2.0456 | −1.8327 | −1.9601 | 6.6 × 10⁻² | 6/10
HSBO with Pearson prior | −2.0626 | −1.8833 | −2.0270 | 7.4 × 10⁻² | 8/10
Table 3. Performance at final iteration (f₃₀) over 10 runs: Branin 2D function.
Method | Min (f₃₀) | Max (f₃₀) | Mean (f₃₀) | Std (f₃₀) | Success Rate (f(x) < Target)
SBO (baseline) | 0.3980 | 0.4303 | 0.4114 | 1.3 × 10⁻² | 6/10
HSBO with normal prior | 0.3980 | 0.5495 | 0.4322 | 4.4 × 10⁻² | 3/10
HSBO with log-normal prior | 0.3980 | 0.5495 | 0.4322 | 4.4 × 10⁻² | 3/10
HSBO with exponential prior | 0.3980 | 0.4449 | 0.4171 | 1.6 × 10⁻² | 4/10
HSBO with Rayleigh prior | 0.3980 | 0.5495 | 0.4322 | 4.4 × 10⁻² | 3/10
HSBO with Pearson prior | 0.3980 | 0.4801 | 0.4252 | 2.4 × 10⁻² | 3/10
Table 4. Performance at final iteration (f₄₅) over 10 runs: Hartmann 3D function.
Method | Min (f₄₅) | Max (f₄₅) | Mean (f₄₅) | Std (f₄₅) | Success Rate (f(x) < Target)
SBO (baseline) | −3.8609 | −3.8428 | −3.8539 | 6.5 × 10⁻³ | 10/10
HSBO with normal prior | −3.8609 | −3.8428 | −3.8539 | 6.5 × 10⁻³ | 10/10
HSBO with log-normal prior | −3.8609 | −3.8428 | −3.8539 | 6.5 × 10⁻³ | 10/10
HSBO with exponential prior | −3.8609 | −3.8428 | −3.8539 | 6.5 × 10⁻³ | 10/10
HSBO with Rayleigh prior | −3.8609 | −3.8428 | −3.8539 | 6.5 × 10⁻³ | 10/10
HSBO with Pearson prior | −3.8609 | −3.8428 | −3.8539 | 6.5 × 10⁻³ | 10/10
Table 5. Performance at final iteration (f₆₀) over 10 runs: Sum of Different Powers function.
Method | Min (f₆₀) | Max (f₆₀) | Mean (f₆₀) | Std (f₆₀) | Success Rate (f(x) < Target)
SBO (baseline) | 2.3 × 10⁻⁵ | 2.2 × 10⁻⁴ | 1.3 × 10⁻⁴ | 6.5 × 10⁻⁵ | 10/10
HSBO with normal prior | 2.3 × 10⁻⁵ | 2.2 × 10⁻⁴ | 1.3 × 10⁻⁴ | 6.5 × 10⁻⁵ | 10/10
HSBO with log-normal prior | 4.0 × 10⁻⁴ | 3.3 × 10⁻³ | 1.3 × 10⁻³ | 9.9 × 10⁻⁴ | 5/10
HSBO with exponential prior | 2.7 × 10⁻³ | 9.3 × 10⁻² | 4.3 × 10⁻² | 3.1 × 10⁻² | -
HSBO with Rayleigh prior | 2.7 × 10⁻³ | 9.3 × 10⁻² | 4.4 × 10⁻² | 3.2 × 10⁻² | -
HSBO with Pearson prior | 2.3 × 10⁻⁵ | 5.0 × 10⁻⁴ | 1.5 × 10⁻⁴ | 1.4 × 10⁻⁴ | 10/10
Table 6. Performance at final iteration (f₉₀) over 10 runs: Hartmann 6D function.
Method | Min (f₉₀) | Max (f₉₀) | Mean (f₉₀) | Std (f₉₀) | Success Rate (f(x) < Target)
SBO (baseline) | −3.0166 | −2.9428 | −2.9742 | 2.0 × 10⁻² | -
HSBO with normal prior | −3.0166 | −2.9428 | −2.9740 | 2.0 × 10⁻² | -
HSBO with log-normal prior | −3.0166 | −2.9583 | −2.9864 | 1.6 × 10⁻² | -
HSBO with exponential prior | −3.0166 | −2.9428 | −2.9742 | 2.0 × 10⁻² | -
HSBO with Rayleigh prior | −3.0166 | −2.9428 | −2.9742 | 2.0 × 10⁻² | -
HSBO with Pearson prior | −2.9871 | −2.9184 | −2.9632 | 2.3 × 10⁻² | -
Table 7. Summary of observed patterns and associated key findings across benchmark functions.
Pattern | Key Points
Normal and log-normal priors
  • Across most test cases, they achieve high success rates, generally comparable to SBO.
  • On the Cross-in-Tray function, characterized by sharp curvature and several localized minima, both priors outperform SBO by improving stability and robustness.
Pearson prior
  • Across most benchmark functions, it maintains high success rates, generally comparable to the SBO baseline.
  • On unimodal problems, Pearson stands out as the most stable prior: for irregular shapes with steep changes (Gramacy and Lee) it performs best overall, while for smooth convex functions (Sum of Different Powers) it outperforms the other priors but remains slightly below SBO.
Exponential and Rayleigh priors
  • Generally less reliable, with high variance and lower success rates across test cases.
Dimensionality effect
  • Performance decreases with dimensionality, as illustrated by Hartmann 6D.
Problem structure dependency
  • Some problems, such as Branin, resist improvement from priors (SBO remains superior).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

