2.2.1. Implementation of the IBO Method
BO is a sequential sampling strategy based on the Gaussian process (GP), which is employed to solve black-box function optimization problems [18]. This paper improves BO in three respects. In population initialization, Tent chaotic mapping and the reverse learning strategy are used to construct an initial population with strong search potential. For Gaussian model hyperparameter optimization, a radial basis kernel hyperparameter training mechanism based on the Adam optimizer is adopted to achieve a fast and robust search for the optimal parameters and improve optimization efficiency. Finally, the L-BFGS-B method is used to optimize the EI function, achieving global extremum exploration of the EI function at low computational cost and improving the accuracy and speed of candidate point selection. The specific steps of the proposed IBO method are as follows:
(1) Population initialization
The initial population in BO is typically generated randomly, which has the drawback of low global search efficiency. To overcome this limitation, this paper employs Tent chaotic mapping combined with the reverse learning strategy for population initialization. The principle is as follows: First, generate an initial population covering the entire range through Tent chaotic mapping (as shown in Equation (1)). Then, create a corresponding reverse population using the reverse learning strategy (as detailed in Equations (2) and (3)). After merging both populations, they are sorted by fitness value, with the N individuals of best fitness retained as the initial population NS. This method effectively improves the quality of the initial solutions and brings the population closer to the global optimal region, thereby accelerating the search for the global optimum.
The Tent chaotic mapping is used to generate the chaotic initial population Xc, and its expression is as follows:
x(k+1) = μ·x(k), 0 ≤ x(k) < 0.5; x(k+1) = μ·(1 − x(k)), 0.5 ≤ x(k) ≤ 1 (1)
where μ is the chaotic control parameter, μ ∈ [1.8, 2.0].
By adopting the reverse learning strategy, the reverse solution xo of each chaotic individual x is generated, yielding the reverse population Xo of the chaotic initial population. The two together constitute the initial population X, and the expressions are as follows:
xo = lb + ub − x (2)
X = Xc ∪ Xo (3)
where lb and ub are the lower and upper search boundaries of the parameters.
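The initialization step above can be sketched in Python as follows (a minimal sketch: the sphere function stands in for the actual fitness function, and the number of map iterations is an illustrative choice):

```python
import numpy as np

def tent_chaotic_init(n, dim, lb, ub, mu=1.9, seed=0):
    """Population initialization with Tent chaotic mapping plus reverse learning.

    mu is the chaotic control parameter (mu in [1.8, 2.0]); the fitness
    used for ranking here is a placeholder (sphere function).
    """
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.01, 0.99, size=(n, dim))   # seeds for the Tent map
    for _ in range(50):                          # iterate the chaotic map
        z = np.where(z < 0.5, mu * z, mu * (1.0 - z))
        z = np.clip(z, 1e-6, 1 - 1e-6)           # keep the sequence in (0, 1)
    chaotic = lb + z * (ub - lb)                 # scale to the search range
    reverse = lb + ub - chaotic                  # reverse (opposition) solutions
    merged = np.vstack([chaotic, reverse])
    fitness = np.sum(merged**2, axis=1)          # placeholder objective
    best = np.argsort(fitness)[:n]               # keep the N best individuals
    return merged[best]

pop = tent_chaotic_init(n=10, dim=3, lb=np.full(3, -5.0), ub=np.full(3, 5.0))
```

Merging the chaotic and reverse populations before ranking is what lets the better-placed half of each mirrored pair survive into the initial population.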
(2) Establishing the Gaussian model
GP is a parameter-free Bayesian model used to approximate an unknown objective function. For the unknown parameters x, the objective function is f(x). The GP is used as the prior distribution of the objective function, which is expressed as follows:
f(x) ~ GP(m(x), k(x, x′)) (4)
where m(x) is the mean function, initialized to the mean of the training data, and k(x, x′) is the covariance function, which determines the form of the function. The radial basis kernel function is selected as the core component of the covariance function, and it is expressed as follows:
k(xi, xj) = σf²·exp(−‖xi − xj‖²/(2l²)) + σn²·δij (5)
where l is the length scale, which measures the influence of input parameter changes on the function output; σf² is the signal variance, indicating the range of function output variation under a zero mean; and σn² is the noise variance.
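A minimal sketch of the radial basis covariance described above (the noise variance is added on the diagonal; the numeric values are illustrative):

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, signal_var=1.0):
    """RBF (squared-exponential) covariance between two point sets.

    length_scale corresponds to l and signal_var to the signal variance
    sigma_f^2 in the text; the noise variance is added separately.
    """
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

X = np.array([[0.0], [1.0], [2.0]])
K = rbf_kernel(X, X) + 1e-6 * np.eye(3)   # noise variance on the diagonal
```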
(3) Gauss model hyperparameter optimization
To further enhance the search accuracy of BO, the hyperparameters of the Gaussian model are optimized by maximizing the marginal log-likelihood function, so that the model best fits the observed distribution of the objective function, improving prediction accuracy and generalization ability. The expression is as follows:
ℓ(θ) = −(1/2)·yᵀ(K + σn²I)⁻¹y − (1/2)·log|K + σn²I| − C (6)
where y is the vector of observed values; K is the kernel matrix built from the radial basis covariance function; I is the identity matrix, reflecting the independent and identically distributed characteristics of the observed noise; and C = (n/2)·log(2π) is the normalization constant of the probability distribution.
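The marginal log-likelihood evaluation can be sketched as follows, using a Cholesky factorization to avoid the explicit inverse (a minimal sketch with an illustrative noise variance; it returns the negative log-likelihood, i.e., the loss to be minimized):

```python
import numpy as np

def neg_log_marginal_likelihood(y, K, noise_var=1e-4):
    """Negative log marginal likelihood of a zero-mean GP.

    Uses the Cholesky factorization of K + sigma_n^2 I instead of a
    direct matrix inverse, matching the l(theta) expression in the text.
    """
    n = len(y)
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # (K + s^2 I)^-1 y
    data_fit = 0.5 * y @ alpha                           # 0.5 y^T (...)^-1 y
    complexity = np.sum(np.log(np.diag(L)))              # 0.5 log|K + s^2 I|
    const = 0.5 * n * np.log(2 * np.pi)                  # normalization term
    return data_fit + complexity + const

y = np.array([1.0])
K = np.array([[1.0]])
val = neg_log_marginal_likelihood(y, K)
```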
Directly optimizing the above objective function often faces multidimensional and non-convex numerical optimization problems [19,20]. To enhance computational efficiency and stability, we employ the Adam optimizer during the gradient-descent phase of GP training. This method suppresses gradient oscillations and accelerates convergence by simultaneously tracking exponentially weighted moving averages of the first-order and second-order moments of the gradient. The specific implementation steps are as follows:
(1) The Adam optimizer parameters are initialized, including the learning rate η, the exponential decay rate β1 for the first-order moment estimate, the exponential decay rate β2 for the second-order moment estimate, and the numerical stability constant ϵ. Typical values are η = 10⁻³, β1 = 0.9, β2 = 0.999, and ϵ = 10⁻⁸, with initial first-order moment vector m0 = 0 and second-order moment vector v0 = 0.
(2) Parameter optimization based on the Adam optimizer. Based on the calculated loss values, the Adam optimizer employs the backpropagation algorithm to compute the gradient of each parameter. The chain rule is used to determine the gradient ∇θL(θ) under the current hyperparameters, where L(θ) = −ℓ(θ). To avoid direct matrix inversion, the Cholesky decomposition is applied in the kernel matrix calculations to solve α = K⁻¹y.
(3) Rectification update and deviation correction. The moment estimates are updated and bias-corrected to obtain m̂t and v̂t, ensuring accurate estimation in the initial stage. The iterative update of the hyperparameters is as follows:
mt = β1·m(t−1) + (1 − β1)·∇θL(θt), vt = β2·v(t−1) + (1 − β2)·(∇θL(θt))²
m̂t = mt/(1 − β1ᵗ), v̂t = vt/(1 − β2ᵗ), θ(t+1) = θt − η·m̂t/(√v̂t + ϵ) (7)
This process is performed in the hyperparameter training phase of each iteration of BO to ensure that the GP model always uses the latest and most optimal set of hyperparameters during the function evaluation.
According to the above update rules, the Adam optimizer uses gradient information to adjust the parameters, so that the model can be closer to the true value in the subsequent prediction, and then gradually reduce the loss value.
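One Adam iteration with bias correction, as described in steps (1)–(3), can be sketched as follows (a toy quadratic loss stands in for −ℓ(θ)):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, eta=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with bias correction (typical defaults from the text)."""
    m = b1 * m + (1 - b1) * grad            # first-moment EMA
    v = b2 * v + (1 - b2) * grad**2         # second-moment EMA
    m_hat = m / (1 - b1**t)                 # bias-corrected first moment
    v_hat = v / (1 - b2**t)                 # bias-corrected second moment
    theta = theta - eta * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# minimize f(theta) = theta^2 (gradient 2*theta) as a stand-in for L(theta)
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 3001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

The bias correction matters early on: with m0 = v0 = 0, the uncorrected moving averages would badly underestimate the true moments for the first few iterations.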
(4) Constructing the EI function
As the parameters are continuously updated, the posterior distribution is obtained. To further optimize the objective function, BO uses an acquisition function to search for the global optimum of f(x). During each iteration, the acquisition function balances high mean values against high uncertainty to select appropriate sample points for hyperparameter selection. The EI explicitly measures how much the next parameter set is expected to improve on the historical optimum, and is expressed as follows:
EI(x) = (f(x⁺) − μ(x) − ξ)·Φ(Z) + σ(x)·φ(Z) (8)
where f(x⁺) is the best fitness among the historical observations; ξ is the exploration–exploitation trade-off factor, generally taking values in (0.01, 0.1); Φ(Z) and φ(Z) represent the improvement rate and improvement density, respectively, which are obtained from the standard normal distribution tables; σ(x) is the predicted standard deviation, which represents the uncertainty of the model and is usually taken as 0.005; and Z is a standardized measure of the relative improvement, expressed as follows:
Z = (f(x⁺) − μ(x) − ξ)/σ(x) (9)
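The EI computation can be sketched as follows (a scalar sketch: the standard normal CDF and PDF play the roles of the improvement rate and improvement density, and the test values are illustrative):

```python
import math

def expected_improvement(mu, sigma, f_best, xi=0.01):
    """EI for minimization: expected improvement of a candidate over the
    historical best f_best; xi is the exploration trade-off factor."""
    sigma = max(sigma, 1e-12)                # guard against zero variance
    imp = f_best - mu - xi                   # expected improvement margin
    z = imp / sigma                          # standardized relative improvement
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2)))        # improvement rate
    phi = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)  # improvement density
    return imp * Phi + sigma * phi

ei = expected_improvement(mu=0.5, sigma=0.2, f_best=1.0)
```

Candidates with a low predicted mean (exploitation) or a large predicted standard deviation (exploration) both score well, which is how EI balances the two.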
(5) Optimizing the EI function with the L-BFGS-B method
The generated candidate points cover the parameter space by evenly distributing the initial candidates across the range [xmin, xmax]. Using the EI function shown in Equation (8), the EI values of all candidates are calculated, and the historical optimal fitness value is taken as the minimum objective function value in the current training set. The top five high-EI points are selected as parallel initial points for the L-BFGS-B method, and an independent optimization thread is started for each point with a maximum number of function evaluations and strict boundary constraints. After all threads complete, the results are merged, and the parameter combination with the minimum objective function value is selected as the optimal candidate set Xb, while its fitness value is also output.
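The multi-start L-BFGS-B refinement can be sketched as follows (assuming SciPy's `minimize`; the quadratic `neg_ei` is a toy stand-in for the negative EI surface, and the option values are illustrative):

```python
import numpy as np
from scipy.optimize import minimize

def lbfgsb_multistart(neg_ei, candidates, bounds, top=5):
    """Rank candidates by negative EI, run L-BFGS-B from the top few,
    and keep the best local optimum found.

    neg_ei is a callable returning -EI(x); candidates is an (n, d) array.
    """
    scores = np.array([neg_ei(x) for x in candidates])
    starts = candidates[np.argsort(scores)[:top]]      # top-EI initial points
    results = [minimize(neg_ei, x0, method="L-BFGS-B",
                        bounds=bounds, options={"maxfun": 200})
               for x0 in starts]
    best = min(results, key=lambda r: r.fun)           # merge thread results
    return best.x, best.fun

# toy acquisition surrogate: EI peaks (neg_ei is minimal) at x = 0.3
neg_ei = lambda x: (x[0] - 0.3) ** 2
cands = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
xb, fb = lbfgsb_multistart(neg_ei, cands, bounds=[(0.0, 1.0)])
```

Seeding the local optimizer from several high-EI points is what makes the (multimodal) acquisition landscape tractable at low cost.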
2.2.2. Implementation of the IWOA Method
WOA is a meta-heuristic optimization algorithm that simulates the hunting behavior of humpback whales to find optimal solutions. Assuming the whale population size is N and the dimension of the problem-solving space is D, the position of the i-th whale in the D-dimensional space is Xi = (xi1, xi2, …, xiD), i = 1, 2, 3, …, N. The optimal position of the whale then corresponds to the optimal solution of the problem [21]. WOA has a simple principle and few parameter settings, but it lacks population diversity, has poor global search ability, and is prone to falling into local optima. To address these shortcomings, this paper introduces a nonlinear convergence factor and an adaptive Levy flight disturbance strategy to enhance WOA. The specific steps are as follows:
- (1)
Initializing the whale population
Centered on the optimal candidate set Xb output by IBO, the initial individuals are generated within the search space [lb, ub].
- (2)
Surrounding prey
During the hunt, the whale first observes and identifies the location of the prey and then surrounds it. The distance between an individual and the optimal solution (the prey) is calculated by the following equation:
D = |C·X*(t) − X(t)| (10)
where t is the current iteration number; X(t) is the current whale position vector; X*(t) is the current optimal solution, that is, the position of the prey; and C is the swing factor, calculated as follows:
C = 2·r2 (11)
where r2 is a random number in [0, 1].
The whale position update formula is as follows:
X(t+1) = X*(t) − A·D (12)
where A is the coefficient vector, calculated as follows:
A = 2a·r1 − a (13)
where r1 is a random number in [0, 1].
Here, a is the convergence factor. Typically, a decreases linearly from 2 to 0 as the iteration count increases; however, this linear reduction often leads to insufficient exploration, premature termination of exploration, and inadequate exploitation in the later stages. To address these problems, this paper introduces a nonlinear convergence factor, expressed as follows:
a(t) = amax/(1 + exp(k·(t/Tmax − p))) (14)
where amax is a constant with a value of 2; k is a constant that controls the descent rate of the convergence factor (the larger its value, the faster the decline in later iterations), with k > 0; p is a constant that regulates the plateau length during exploration, meaning that the first 100p% of iterations maintain a larger value, with p ∈ (0, 1); and Tmax is the maximum iteration number.
The nonlinear convergence factor enables precise control over the plateau length during the initial search phase. By maintaining a relatively stable value of a throughout this stage, it allows larger step sizes that expand the search scope, effectively addressing insufficient exploration and premature termination. After the transition from the exploration phase to the exploitation phase, the value of a rapidly decreases to nearly 0, accelerating convergence in the later stage and achieving high-precision convergence.
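A sigmoid-shaped convergence factor reproducing the behavior described above (a plateau over roughly the first 100p% of iterations, then a k-controlled decay toward 0) can be sketched as follows; the specific functional form and parameter values here are illustrative assumptions, not necessarily the paper's exact expression:

```python
import math

def convergence_factor(t, t_max, a_max=2.0, k=20.0, p=0.4):
    """Illustrative nonlinear convergence factor: approximately a_max for
    the first 100*p% of iterations, then a rapid, k-controlled decay
    toward 0 (larger k -> steeper late-stage decline)."""
    return a_max / (1.0 + math.exp(k * (t / t_max - p)))

early = convergence_factor(10, 100)   # inside the exploration plateau
late = convergence_factor(90, 100)    # deep in the exploitation phase
```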
- (3)
Bubble-net attacking prey
Whales attack their prey in two ways: by contracting and surrounding prey, or by spiraling out and blowing bubbles.
- (i)
Contracting and surrounding prey
The whale position is updated by Equation (12), with the convergence factor a decreasing nonlinearly from 2 to 0, so that A becomes a random number in [−1, 1].
- (ii)
Spiraling out and blowing bubbles
The first step is to calculate the distance between each individual whale and the current best position; the spiral upward movement of the whales during hunting is then simulated. The expressions are as follows:
D′ = |X*(t) − X(t)| (15)
X(t+1) = D′·e^(bl)·cos(2πl) + X*(t) (16)
where D′ represents the distance between the i-th whale and the current optimal position (optimal solution); b is a constant coefficient used to limit the spiral form; and l is a random number in [−1, 1], where l = −1 indicates the whale is closest to the prey and l = 1 indicates the whale is farthest from the prey.
When the whale surrounds the prey in the spiral upward form, it also needs to shrink the encircling circle; shrinking encirclement and spiral updating of the optimal solution are chosen with the same probability p. The expression is as follows:
X(t+1) = X*(t) − A·D, p < 0.5; X(t+1) = D′·e^(bl)·cos(2πl) + X*(t), p ≥ 0.5 (17)
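The bubble-net phase, choosing between shrinking encirclement and the spiral update with equal probability, can be sketched as follows (a minimal sketch; b = 1 is an illustrative choice):

```python
import numpy as np

def bubble_net_update(x, x_best, a, b=1.0, rng=None):
    """One WOA bubble-net step: with equal probability, either shrink-
    encircle the prey (via A and C) or follow the logarithmic spiral."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < 0.5:                       # shrinking encirclement
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        A = 2 * a * r1 - a                       # coefficient vector
        C = 2 * r2                               # swing factor
        D = np.abs(C * x_best - x)               # distance to the prey
        return x_best - A * D
    l = rng.uniform(-1, 1)                       # spiral position parameter
    D = np.abs(x_best - x)                       # distance to the prey
    return D * np.exp(b * l) * np.cos(2 * np.pi * l) + x_best

x_new = bubble_net_update(np.array([1.0, 2.0]), np.array([0.0, 0.0]), a=1.0,
                          rng=np.random.default_rng(1))
```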
- (4)
Hunting prey
To expand the whale search range and avoid getting trapped in local optima, in this phase the algorithm no longer updates positions based on the current optimal prey but instead updates according to a randomly selected whale [22]. Specifically, to enhance the global search capability and prevent local optimization, when the random probability p < 0.5 and |A| ≥ 1, the distance D is updated randomly. The mathematical model simulating the whale search guides the other whales through a randomly selected individual Xrand. The specific expressions are as follows:
D = |C·Xrand(t) − X(t)| (18)
X(t+1) = Xrand(t) − A·D (19)
where Xrand(t) is the current position vector of the randomly selected whale.
- (5)
Position update based on the adaptive Levy flight disturbance strategy
After each iteration, the system monitors for stagnation. If the optimal solution remains unchanged through multiple consecutive iterations, the adaptive Levy flight disturbance mechanism is activated. The Levy flight is a walking pattern combining frequent short-distance searches with occasional long-range movements; it can describe many random phenomena in nature, such as Brownian motion and random walks. When applied to WOA, this mechanism enhances population diversity, expands the search coverage, and improves the ability to escape local optima. The position update formula is as follows:
X(t+1) = X(t) + α·s, s = μ/|v|^(1/β) (20)
where α is the step size coefficient that controls the disturbance intensity, typically set to 0.1; β denotes the degree of heavy-tailedness of the distribution, with β = 1.5 balancing exploration and exploitation; and μ and v are the Gaussian random vector and the unit Gaussian random vector, respectively, controlling the direction and magnitude of the disturbance. Boundary constraints are then enforced on the perturbed position.
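The Levy disturbance can be sketched with Mantegna's algorithm for generating heavy-tailed steps (a common construction for Levy-flight metaheuristics; the paper does not spell out the generator, so this choice is an assumption):

```python
import math
import numpy as np

def levy_step(dim, beta=1.5, rng=None):
    """Levy-distributed step via Mantegna's algorithm: many short moves
    mixed with occasional long jumps (heavy-tailed with index beta)."""
    if rng is None:
        rng = np.random.default_rng(0)
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    sigma_u = (num / den) ** (1 / beta)          # Mantegna scale factor
    u = rng.normal(0, sigma_u, dim)              # Gaussian random vector (mu)
    v = rng.normal(0, 1, dim)                    # unit Gaussian vector (v)
    return u / np.abs(v) ** (1 / beta)

def levy_perturb(x, lb, ub, alpha=0.1, beta=1.5, rng=None):
    """Adaptive Levy disturbance with boundary enforcement by clipping."""
    x_new = x + alpha * levy_step(len(x), beta, rng)
    return np.clip(x_new, lb, ub)                # enforce boundary constraints

x_new = levy_perturb(np.zeros(3), lb=-5.0, ub=5.0)
```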
- (6)
After the iteration ends, the optimal solution XWOA of whale optimization is output.
2.2.3. Reverse-Guided Optimization Mechanism
The reverse-guided optimization mechanism selects the optimal solution based on the fitness-oriented optimization strategy, and the result is fed back into the Bayesian training set to complete the closed loop of reverse-learning feedback. The specific steps are as follows:
(1) For the optimal solution XWOA of IWOA, its reverse solution Xo is generated according to Equation (2);
(2) The fitness values f(XWOA) and f(Xo) of the optimal solution XWOA and the reverse solution Xo are calculated, respectively;
(3) Selection is performed based on the fitness-oriented optimization strategy: if f(Xo) < f(XWOA), the better result is given by the reverse solution and Xbest = Xo; otherwise, Xbest = XWOA. The corresponding fitness value f(Xbest) is then recorded;
(4) The preferred solution Xbest and its fitness value are imported into the Bayesian training set. The updated training set serves as the input of the next round of BO, forming a closed-loop process of “Bayesian global exploration → whale local optimization → reverse learning feedback”.
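Steps (1)–(4) of the reverse-guided feedback can be sketched as follows (a minimal sketch; the quadratic fitness function and the array shapes are illustrative placeholders):

```python
import numpy as np

def reverse_guided_feedback(x_woa, lb, ub, fitness, train_X, train_y):
    """Fitness-oriented selection between the WOA optimum and its reverse
    solution; the preferred solution is appended to the BO training set."""
    x_rev = lb + ub - x_woa                      # reverse solution (Eq. (2))
    f_woa, f_rev = fitness(x_woa), fitness(x_rev)
    x_best = x_rev if f_rev < f_woa else x_woa   # keep the better of the two
    f_best = min(f_rev, f_woa)
    train_X = np.vstack([train_X, x_best])       # closed-loop feedback to BO
    train_y = np.append(train_y, f_best)
    return x_best, f_best, train_X, train_y

fit = lambda x: float(np.sum((x - 1.0) ** 2))    # placeholder objective
xb, fb, TX, Ty = reverse_guided_feedback(
    np.array([3.0, 3.0]), lb=np.full(2, -4.0), ub=np.full(2, 4.0),
    fitness=fit, train_X=np.zeros((1, 2)), train_y=np.zeros(1))
```

Because the reverse solution mirrors the WOA optimum across the search range, the feedback step can reach regions the whale trajectory itself never visited.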
The synergistic effect of IBO and WOA is grounded in the complementary optimization paradigms of both algorithms and the profound theoretical reinforcement of global exploration capabilities by the reverse-guided feedback mechanism. IBO constructs a global probabilistic landscape through Gaussian processes, with its gradient-based hyperparameter optimization mechanism excelling in high-precision local exploitation. The improved WOA, through adaptive Levy flight perturbations regulated by a nonlinear convergence factor, focuses on efficient global exploration and local hunting.

The core theoretical advantage of the reverse-guided feedback mechanism lies in overcoming the inherent limitations of sequential, uncoupled optimization strategies. Sequential strategies are inherently unidirectional in information transfer: the output of the preceding algorithm serves as fixed input for the subsequent one, so potential suboptimal solutions or insufficiently explored regions from the preceding stage are difficult for the subsequent algorithm to correct, constraining the breadth and depth of the global search. The reverse-guided mechanism in this work dynamically feeds the optimization results of WOA and their reverse solutions back into the IBO training set based on fitness-oriented selection. This process injects into, and calibrates, the IBO global probabilistic model in real time with critical solution-space information discovered during the WOA exploration, particularly potential high-quality regions distant from the initial population revealed through Levy perturbations. This closed-loop feedback significantly enhances the representational accuracy of the IBO surrogate model regarding the structure of complex multimodal solution spaces, thereby systematically strengthening the algorithm's proactive exploration guidance toward unknown high-quality regions.
This fundamentally enhances the robustness against local optima entrapment and the overall global optimization performance.