4.1. Method Overview
The core concept of the LTR-MHA method is to leverage the LTR framework to extract knowledge from the historical execution data of multiple metaheuristic algorithms. The trained prediction model dynamically predicts the scores of candidate update behaviors during the search process, thereby intelligently selecting the update behavior most likely to improve the current solution. The overall methodological flow is illustrated in Figure 1.
First, information is extracted from the update behaviors observed while the metaheuristic algorithms solve the problem, and is used to construct a feature set and the corresponding label set. Next, a prediction model is trained to learn the relationship between different update behaviors and solution quality, and is then used to predict the effectiveness of each candidate update behavior. Once the prediction model is trained, the feature vectors of all candidate behaviors for the search agent to be updated are input into the model, which predicts a score for each candidate that reflects the potential solution quality after applying that update behavior. Finally, the behavior with the highest predicted score is chosen to update the agent.
In this way, the search process of the algorithm can be guided more intelligently, avoiding blind exploration and thereby improving convergence speed and solution quality. The innovation of the LTR-MHA lies in applying LTR to learn a ranking over metaheuristic algorithm behaviors, enabling the automated generation of new algorithms. This approach breaks through the fixed search framework of traditional metaheuristic algorithms and offers a novel perspective for tackling optimization tasks. Notably, the primary objective of this study is not to identify the optimal algorithm combination but to verify the feasibility of multi-algorithm integration strategies in boosting the performance of individual algorithms. Based on this objective, three representative classical metaheuristic algorithms (GA, WOA, and HHO) are selected as the subjects of investigation in this work.
4.2. Feature Selection
The training feature set can be represented as a matrix $F = [f_{ij}]$, where each row corresponds to a feature vector, and $f_{ij}$ denotes the feature value of the $i$-th feature vector at the $j$-th component. There are three sorts of features: search-dependent, solution-dependent, and instance-dependent [1]. Search-dependent features are relevant to the search procedure, such as the overall improvement over the initial solution. Solution-dependent features are linked to the solution encoding scheme; for example, in the TSP, the whole path encoding can be defined explicitly as a feature. Instance-dependent features record the specific properties of the problem instance, such as the number of vehicles and the vehicle capacity. It is worth mentioning that when search-dependent or instance-dependent features are used, the knowledge gained can be transferred to other instances of comparable problems or even used to solve different problems. However, solution-dependent features are frequently tied to specific problems, making it challenging to create generalizable approaches.
Therefore, in this method, search-dependent features are used to construct the feature set. Furthermore, the two main coefficient vectors of the WOA, $\vec{A}$ and $\vec{C}$; the escape energy $E$ in the HHO; and the crossover and mutation rates in the GA are also included as components of the feature set. These parameters help to more comprehensively reflect the performance characteristics of the algorithm under different update behaviors.
Moreover, quantitatively assessing the levels of exploration and exploitation in algorithms remains an open scientific question [34]. To address this issue, a feasible approach is to monitor the diversity of the population during the search process [35,36]. As an important metric, population diversity can provide a certain level of assessment of the exploration and exploitation in metaheuristics. Population diversity describes the extent of dispersion or clustering of search agents during the iterative search process. It can be calculated with the following formulas:

$$\mathrm{Div}_j = \frac{1}{n}\sum_{i=1}^{n}\left|\operatorname{median}\left(x^{j}\right) - x_i^{j}\right|, \qquad \mathrm{Div} = \frac{1}{m}\sum_{j=1}^{m}\mathrm{Div}_j,$$

where $\operatorname{median}(x^{j})$ represents the median value of the $j$-th variable across all individuals in the population, and $x_i^{j}$ denotes the value of the $j$-th variable for the $i$-th search agent. The total number of agents in the current iteration is given by $n$, while $m$ represents the number of variables in the potential solution to the optimization problem. The term $\mathrm{Div}_j$ quantifies the average distance between each individual's value in the $j$-th dimension and the median of that dimension, thus representing the diversity of the population along that specific variable. $\mathrm{Div}$, in turn, is the average of all $\mathrm{Div}_j$ values across all dimensions, which reflects the overall diversity of the population. It is important to note that population diversity should be recalculated at each iteration. Table 1 presents the list of feature variables used in this study, along with their definitions.
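For clarity, a minimal NumPy sketch of this computation is given below; the function name and the $(n \times m)$ array layout are illustrative choices, not part of the method's specification:

import numpy as np

def population_diversity(population: np.ndarray) -> float:
    """Median-based population diversity.

    population: array of shape (n, m), one row per search agent.
    Returns Div, the dimension-averaged mean absolute deviation
    from the per-dimension median.
    """
    medians = np.median(population, axis=0)                 # median(x^j) for each variable
    div_j = np.mean(np.abs(population - medians), axis=0)   # Div_j for each dimension j
    return float(np.mean(div_j))                            # average over all m dimensions

Because the agents move at every iteration, this value must be recomputed from the current population each time it is used as a feature.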
4.3. Feature Extraction and Model Training
The core of the LTR-MHA lies in extracting key features from each iteration update of the metaheuristic algorithms. Specifically, the target problem is solved individually using the selected metaheuristic algorithms (WOA, HHO, and GA). These algorithms optimize solutions through a series of update behaviors, such as shrinking encircling, spiral updating, and random search in the WOA. To systematically represent these diverse update behaviors for the purpose of Learning-to-Rank modeling, each behavior is numerically encoded as a distinct action.
Table 2 presents the update behaviors of the three metaheuristic algorithms, along with their corresponding numerical encodings for the convenience of subsequent experiments.
Based on the information in Table 2, the action space of the LTR-MHA can be defined as $\mathcal{A} = \{a_1, a_2, \ldots, a_{12}\}$, where each $a_k$ corresponds to one numerically encoded update behavior.
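Table 2 is the authoritative encoding; purely as an indicative sketch, such an action space could be enumerated as below, assuming the three WOA behaviors named in Algorithm 2, the six standard HHO phases, and three GA operators. The concrete names and numeric codes are those of Table 2 and may differ:

from enum import IntEnum

class Action(IntEnum):
    # WOA behaviors (named in Algorithm 2); codes are assumed
    WOA_SHRINKING_ENCIRCLING = 1
    WOA_RANDOM_SEARCH = 2
    WOA_SPIRAL_UPDATING = 3
    # HHO phases (standard HHO; mapping assumed)
    HHO_EXPLORE_RANDOM_HAWK = 4
    HHO_EXPLORE_MEAN_POSITION = 5
    HHO_SOFT_BESIEGE = 6
    HHO_HARD_BESIEGE = 7
    HHO_SOFT_BESIEGE_DIVES = 8
    HHO_HARD_BESIEGE_DIVES = 9
    # GA operators (mapping assumed)
    GA_SELECTION = 10
    GA_CROSSOVER = 11
    GA_MUTATION = 12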
In continuous function optimization problems, the objective of metaheuristic algorithms is to search for the best solution within the continuous space that minimizes the objective function value. Let the algorithm set be $\mathcal{M}$, where the population of algorithm $M_k \in \mathcal{M}$ is denoted as $P_k$, consisting of $m$ search agents (individuals). Each search agent $X_i$ represents a candidate solution. A solution $X_i$ in $D$-dimensional space is encoded as

$$X_i = \left(x_i^{1}, x_i^{2}, \ldots, x_i^{D}\right),$$

where $D$ represents the dimension, and $x_i^{j}$ denotes the $j$-th variable in the solution vector. Each variable $x_i^{j}$ satisfies predefined boundary constraints $x_i^{j} \in [lb_j, ub_j]$. By continuously updating the positions of search agents within the population, the algorithm drives the objective function value $f(X_i)$ progressively closer to the global optimum.
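The boundary handling referenced later in Algorithms 1-3 can then be realized by projecting each updated solution back into the feasible box, as in this minimal sketch (the helper name is an illustrative choice):

import numpy as np

def clip_to_bounds(X: np.ndarray, lb, ub) -> np.ndarray:
    """Project a candidate solution back into the feasible box [lb, ub].
    X is a length-D position vector; lb and ub may be scalars or arrays."""
    return np.clip(X, lb, ub)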
At each iteration of the algorithm, the search agent $X_i$ selects one action from multiple candidate update behaviors to update its position and constructs the corresponding feature vector $V_i$. According to Table 1 and Table 2, suppose the update behavior selected by $X_i$ in the current iteration is $a$, where the value range of $a$ is $[1, 12]$. To ensure that all feature values fall within the range $[0, 1]$, normalization is applied to each feature individually. The normalization formula for $a$ is as follows:

$$a' = \frac{a - a_{\min}}{a_{\max} - a_{\min}},$$

where $a_{\max}$ and $a_{\min}$ represent the upper and lower bounds, which are 12 and 1, respectively. The other unbounded features in Table 1 are normalized in the same way. The crossover rate, the mutation rate, and the other probability-valued features are originally within the range $[0, 1]$. Meanwhile, $\vec{A}$ and $\vec{C}$ are $D$-dimensional vectors with each component of $\vec{A}$ in $[-2, 2]$ and each component of $\vec{C}$ in $[0, 2]$, while $E$ is a scalar in $[-2, 2]$. After normalization, these three features are also scaled to $[0, 1]$. Thus, the feature vector can be expressed as follows:

$$V_i = \left(a',\ \mathrm{Div},\ \ldots,\ \bar{A}_i,\ \bar{C}_i,\ E',\ p_c,\ p_m\right).$$
Here, $A_i^{j}$ and $C_i^{j}$ denote the $j$-th component of the coefficient vectors $\vec{A}$ and $\vec{C}$ corresponding to the $i$-th individual in the population. To convert these vector-valued parameters into scalar features, we compute the mean values of $\vec{A}$ and $\vec{C}$ as follows:

$$\bar{A}_i = \frac{1}{D}\sum_{j=1}^{D} A_i^{j}, \qquad \bar{C}_i = \frac{1}{D}\sum_{j=1}^{D} C_i^{j}.$$
It is worth noting that the information from unselected candidate actions is also crucial for model training. Therefore, corresponding feature samples need to be constructed for these actions as well. To avoid affecting the actual optimization process, a replica of the current search agent is created to simulate the updates of the other candidate actions and record their feature data. In addition, when constructing the feature set, the feature slots for the core parameters of the other algorithms are retained but uniformly assigned a value of zero. For example, when executing the WOA, the parameter $E$ of the HHO, as well as the crossover and mutation rates of the GA, are all kept in the feature vector but set to zero. This design ensures that every feature vector has the same, complete layout regardless of which algorithm produced it.
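The following sketch illustrates this construction for a single WOA update; the feature ordering, the helper signature, and the defaulting scheme are illustrative assumptions rather than the exact layout of Table 1:

import numpy as np

A_MIN, A_MAX = -2.0, 2.0    # component range of the WOA vector A
C_MIN, C_MAX = 0.0, 2.0     # component range of the WOA vector C
E_MIN, E_MAX = -2.0, 2.0    # range of the HHO escape energy E
ACT_MIN, ACT_MAX = 1, 12    # numerical action codes (Table 2)

def minmax(v, lo, hi):
    """Min-max normalization to [0, 1]."""
    return (v - lo) / (hi - lo)

def build_feature_vector(action, A, C, diversity, E=None, p_c=None, p_m=None):
    """Assemble one normalized feature vector for a WOA update.

    Parameters of the algorithms that are not currently running
    (E for HHO, p_c/p_m for GA) are retained as slots but set to
    zero, so every vector has the same length."""
    e_feat = 0.0 if E is None else minmax(E, E_MIN, E_MAX)
    return np.array([
        minmax(action, ACT_MIN, ACT_MAX),           # normalized action code a'
        diversity,                                   # population diversity Div
        minmax(float(np.mean(A)), A_MIN, A_MAX),     # scalarized, normalized mean of A
        minmax(float(np.mean(C)), C_MIN, C_MAX),     # scalarized, normalized mean of C
        e_feat,                                      # HHO escape energy, zeroed here
        p_c if p_c is not None else 0.0,             # GA crossover rate, zeroed here
        p_m if p_m is not None else 0.0,             # GA mutation rate, zeroed here
    ])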
Figure 2 illustrates the flowchart of training set construction and model training. To reduce the complexity of the figure, only a subset of feature vectors is retained as examples.
For the selected action, the label is assigned as "1", while for the unselected candidate actions, the label is set to "0". To assess the relative performance of the three algorithms, we rank their best fitness values in descending order (where lower values indicate better solutions) and assign the corresponding ranking numbers as the new label values. For example, as shown in Figure 2, if the GA achieves the best (smallest) fitness value among the three algorithms, then the label for the selected actions within the GA is adjusted from "1" to "3". Similarly, the labels for the WOA and HHO are adjusted to "2" and "1", respectively. This strategy more accurately reflects each algorithm's relative performance during the optimization process and provides more discriminative annotations for training the predictive model. After the optimization phase of each algorithm is complete, the feature sets corresponding to the optimal solutions provided by the three algorithms are merged to form the training feature set for the prediction model. The detailed implementation of the training set construction for the LTR-MHA is shown in Algorithm 1, with the specific feature extraction and construction process of the metaheuristic algorithm exemplified by the WOA in Algorithm 2.
Algorithm 1 The procedure of training set construction of LTR-MHA

Require: Metaheuristic set, update behavior set, dimension $D$, maximum iterations $T_{\max}$, population size $m$, bounds $[lb, ub]$, fitness function $f$;
Ensure: Training feature set, label set;
1: Initialize search population $P$;
2: for each algorithm in the metaheuristic set do ▹ See detailed WOA update steps in Algorithm 2
3:   Compute fitness values for all search agents via $f$;
4:   Select the best search agent $X_{best}$;
5:   while $t < T_{\max}$ do
6:     for each search agent $X_i$ in $P$ do
7:       Execute the update strategy of the current algorithm; ▹ Update strategy follows Equation (16)
8:       Record the selected action $a$;
9:       Construct the feature vector according to Equation (19);
10:      for each candidate action $a'$ do
11:        if $a' = a$ then
12:          label ← 1;
13:        else
14:          label ← 0;
15:          Construct the feature vector for the unselected action via Equation (19);
16:        end if
17:      end for
18:    end for
19:    Adjust search agents exceeding the search space boundaries;
20:    Recalculate fitness values for all search agents via $f$;
21:    Update $X_{best}$ when a better solution is available;
22:    $t \leftarrow t + 1$;
23:  end while
24:  Record the best fitness value of the current algorithm;
25: end for
26: Sort the algorithms by their best fitness values;
27: Reassign label ← rank for the selected actions, where rank is the algorithm's position in descending order of best fitness;
28: return Training feature set, label set;
Algorithm 2 The procedure of feature extraction and construction of WOA

Require: Initial search population $P$, maximum iterations $T_{\max}$, fitness function $f$, WOA parameters;
Ensure: For each agent: feature vectors and labels for both the selected and unselected actions;
1: Evaluate fitness for all search agents;
2: Select the best search agent $X_{best}$;
3: while $t < T_{\max}$ do
4:   for each whale $X_i$ do
5:     Update $a$, $\vec{A}$, $\vec{C}$, $l$, and $p$;
6:     if $p < 0.5$ then
7:       if $|\vec{A}| < 1$ then ▹ Shrinking Encircling (SE)
8:         Compute the new position via Equation (2);
9:         Record the action code of SE; // The numerical coding of update behaviors is shown in Table 2
10:        label ← 1;
11:      else ▹ Random Search (RS)
12:        Select a random leader $X_{rand}$;
13:        Compute the new position via Equation (6);
14:        Record the action code of RS;
15:        label ← 1;
16:      end if
17:    else ▹ Spiral Updating (SU)
18:      Compute the new position via Equation (4);
19:      Record the action code of SU;
20:      label ← 1;
21:    end if
22:    Construct the feature vector according to Equation (19);
23:    for each candidate action do
24:      if RS was selected then
25:        label ← 0;
26:        Construct the feature vector for the unselected action (SE) via Equation (19);
27:        label ← 0;
28:        Construct the feature vector for the unselected action (SU) via Equation (19);
29:      else if SE was selected then
30:        Similarly, construct the feature vectors for the unselected actions (RS, SU);
31:      else ▹ SU was selected
32:        Similarly, construct the feature vectors for the unselected actions (SE, RS);
33:      end if
34:    end for
35:  end for
36:  Adjust search agents exceeding the search space boundaries;
37:  Recalculate fitness values for all search agents via $f$;
38:  Update $X_{best}$ when a better solution is available;
39:  $t \leftarrow t + 1$;
40: end while
41: return The best fitness of the WOA;
In Algorithm 1, to ensure fairness in the comparison among the algorithms, a unified initial population is used. The algorithms in the given algorithm set are then executed sequentially, and the initial fitness values are calculated (lines 1-4). Within the main loop, the search agents are updated according to the strategies of the metaheuristic algorithms, while the selected update action is recorded and the corresponding feature vector is constructed (lines 6-9). The update behaviors of each metaheuristic algorithm (WOA, HHO, GA) are the original, unmodified versions. The specific procedure for feature extraction and construction is illustrated using the WOA as an example, as shown in Algorithm 2. At the same time, the information of the unselected candidate actions is equally crucial for model training, so feature vectors for these actions are also constructed (lines 11-15). In each iteration, the algorithm checks whether the population exceeds the search space boundaries and performs the necessary adjustments, followed by recalculating the fitness values of each solution. Whenever a better agent is available, the best agent is updated (lines 19-21). At the end of each algorithm's run, its best fitness is recorded (line 24). Finally, the algorithms are ranked according to their best fitness values, and their corresponding labels are reassigned based on this ranking: the ranking is in descending order of best fitness, with the resulting position used as the new label (lines 26-27). Thus, among the three algorithms, the one with the smallest best fitness value receives the highest label (i.e., 3). The feature vector set and its corresponding label set together form the training dataset.
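A compact sketch of the relabeling step in lines 26-27 is given below, assuming each training sample is stored together with the identifier of the algorithm that produced it (the record layout is an illustrative choice):

def reassign_labels(records, best_fitness):
    """records: list of (features, label, algo_id) tuples, where label is 1
    for the selected action and 0 otherwise.
    best_fitness: dict mapping algo_id to that algorithm's best fitness."""
    # Descending order of best fitness: the worst algorithm gets rank 1,
    # the best (smallest fitness) gets the highest rank (3 for three algorithms).
    order = sorted(best_fitness, key=best_fitness.get, reverse=True)
    rank = {algo: r for r, algo in enumerate(order, start=1)}
    # Only the labels of selected actions are promoted; unselected stay 0.
    return [(x, rank[algo] if label == 1 else 0, algo)
            for (x, label, algo) in records]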
During the training stage of the predictive model, the constructed feature set, along with its corresponding label set, is used as the training set. Through supervised learning, the model learns to predict the potential quality of the solution generated by an update action performed by the current search agent and to produce a prediction score. This score guides the algorithm's decisions during the search process, steering the exploration toward more promising regions of the solution space. The model is implemented using the standard RandomForestRegressor from scikit-learn, without modification to its core training procedure. The full set of hyperparameter configurations is provided in the experimental settings.
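Since the model is the standard scikit-learn RandomForestRegressor, training reduces to ordinary supervised regression on the merged feature and label sets. The sketch below uses placeholder data shapes and hyperparameters; the actual configuration appears in the experimental settings:

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data with the same shape as the merged training set:
# one row per candidate-action sample, 7 features as in Equation (19),
# labels in {0, 1, 2, 3} after relabeling.
rng = np.random.default_rng(0)
X_train = rng.random((1000, 7))
y_train = rng.integers(0, 4, size=1000)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# At search time, scoring candidate actions is a single predict call,
# e.g., over the 12 candidate feature vectors of one agent:
scores = model.predict(X_train[:12])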
4.4. Algorithm Implementation of LTR-MHA
Once the predictive model is trained, it can be employed to guide the search process. During each iteration, when a search agent requires a position update, the feature vectors for all candidate update actions of the current search agent are first constructed and fed into the predictive model. The model evaluates these feature vectors and outputs a set of predicted scores, and the algorithm selects the candidate action with the highest predicted score for execution. In the example illustrated in Figure 1, the model assigns the highest predicted score (0.78) to one particular action, indicating that this action is most likely to guide the search process towards the best solution; the search agent therefore selects that action in this iteration. The implementation of the LTR-MHA is detailed in Algorithm 3.
Algorithm 3 Pseudocode of LTR-MHA

Require: Trained predictive model, update behavior set, dimension $D$, maximum iterations $T_{\max}$, population size $m$, bounds $[lb, ub]$, fitness function $f$;
Ensure: The best solution $X_{best}$, the best fitness;
1: Initialize population $P$;
2: Compute fitness values for all search agents via $f$;
3: Select the best agent as $X_{best}$;
4: while $t < T_{\max}$ do
5:   for each agent $X_i$ do
6:     for each candidate behavior do
7:       Construct its feature vector according to Equation (19);
8:     end for
9:     Feed the feature set to the predictive model and obtain the prediction scores;
10:    Determine the optimal behavior as the candidate with the highest predicted score;
11:    Update $X_i$ using the optimal behavior; ▹ Update strategy follows Equation (16)
12:  end for
13:  Adjust search agents exceeding the search space boundaries;
14:  Recalculate fitness values via $f$;
15:  Update $X_{best}$ if a better solution is available;
16:  $t \leftarrow t + 1$;
17: end while
18: return $X_{best}$, the best fitness;
First, the population of search agents $P$ is initialized, and the fitness value of each search agent is calculated; the best search agent is selected as the initial optimal solution (lines 1-3). Then, within the main loop, the feature vectors to be evaluated are constructed for each search agent and each candidate action. Since this method involves 12 update actions, the feature set to be predicted contains 12 feature vectors (lines 5-8). These feature vectors are then fed into the predictive model, which outputs a set of predicted scores, and the algorithm selects the update action corresponding to the highest predicted score for execution (lines 9-11). Subsequently, the algorithm checks whether any search agent has exceeded the search space boundaries, adjusts their positions accordingly, and recalculates the fitness values of all search agents (lines 13-14). If a better solution is found, $X_{best}$ is updated (line 15). After the loop ends, the final optimal solution is returned (line 18).
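A minimal sketch of the per-agent decision step in lines 5-11 is given below; the build_features and apply_action helpers stand in for Equation (19) and Equation (16) and are assumptions, not part of the source pseudocode:

import numpy as np

N_ACTIONS = 12  # total update behaviors across the WOA, HHO, and GA (Table 2)

def select_and_apply(agent, model, build_features, apply_action):
    """Score every candidate update behavior for one search agent and
    execute the behavior with the highest predicted score."""
    # One feature vector per candidate action (lines 6-8 of Algorithm 3).
    candidates = np.vstack([
        build_features(agent, action) for action in range(1, N_ACTIONS + 1)
    ])
    scores = model.predict(candidates)          # line 9: prediction scores
    best_action = int(np.argmax(scores)) + 1    # line 10: highest-scoring behavior
    return apply_action(agent, best_action)     # line 11: execute the behavior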