To automate the generation of algorithms, we define the Master Problem (MP), characterized by an objective function and a set of constraints. The objective function optimizes the performance of an algorithm in processing a specific set of MKP instances. Performance quantifies the average error incurred by the algorithm while solving these instances, where error refers to the relative discrepancy between the algorithm-derived solution for an instance and the optimal solution for the same instance. Considering a feasible MKP solution, a particular algorithm, and a specific set of instances used in algorithm generation, Equation (3) provides the formulation of this MP. The search for an optimal algorithm navigates three domains: the feasible MKP domain, the algorithmic domain, and the domain of problem instances. In this context, we define an optimization problem (4) that simultaneously traverses these three domains to identify the most effective algorithm.
A syntactic tree, where internal nodes represent functions and leaf nodes represent terminals, is considered an algorithm. In the context of the MKP, these functions serve as high-level instructions that determine how terminals combine to construct feasible solutions. Algorithm generation occurs by solving the MP, which aims to minimize the relative error of an algorithm when approximating the optimal MKP solution. The population evolves from an initial set of algorithms by applying genetic operators. Over successive generations, new algorithms are created and refined by solving sets of MKP instances with increasing efficiency. The algorithm generation process consists of five main steps:
These steps are described in more detail below.
Solution container definition. The solution container comprises various data structures (lists) tailored to the problem. Two classes of lists are considered: variable lists and fixed lists. Two variable lists keep the information on the items inside and outside the knapsack:
The fixed lists organize the IDs of the instance's items according to criteria derived from well-known MKP heuristics. Seven lists are considered:
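As a sketch, the container can be modeled as a simple record; the list names are taken from the terminal descriptions later in this section, while the exact sorting criteria behind each fixed list are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class SolutionContainer:
    """Sketch of the solution container: two variable lists plus seven
    fixed, criterion-sorted ID lists. Names follow the terminals below;
    the ordering criteria noted in comments are assumptions."""
    IKL: list = field(default_factory=list)    # items currently in the knapsack
    OKL: list = field(default_factory=list)    # items currently outside it
    PL: list = field(default_factory=list)     # IDs ordered by profit
    WL: list = field(default_factory=list)     # IDs ordered by weight
    NBPL: list = field(default_factory=list)   # normalized benefit criterion
    SNBPL: list = field(default_factory=list)  # scaled normalized benefit criterion
    GDL: list = field(default_factory=list)    # generalized criterion
    STL: list = field(default_factory=list)    # Senju-Toyoda criterion
    FPL: list = field(default_factory=list)    # Freville-Plateau criterion
```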
3.1. Definition of Functions and Terminals
In the second step, we establish functions and terminals that act as fundamental operations on the container data structures. The precise definition of these elements is crucial for creating algorithms that can effectively transfer items between IKL and OKL while optimizing total profit. We devise terminals based on existing construction heuristics for the MKP and propose additional functions that facilitate the generation of diverse combinations of these terminals. The functions and terminals must comply with the closure and sufficiency properties [54]: every function and every terminal must have a known and bounded return value. To satisfy the closure property, each function and terminal returns either True or False. To satisfy the sufficiency property, each terminal adds or removes an item from the knapsack while preserving feasibility; the item-insertion terminals alone already suffice to meet this property.
The functions are higher-order algorithmic instructions corresponding to control structures found in most programming languages, such as the logical operators Not, Or, And, and Equal, the conditional statement If_Then_Else, and the Do_While loop. Seven functions, described below, were implemented:
If_Then (A1, A2): This function executes argument A1, and if it returns True, it executes argument A2. The function’s return value is equal to the return value of A1.
If_Then_Else (A1, A2, A3): This function executes argument A1, and if it returns True, it executes A2. Otherwise, it executes A3. The function always returns True.
Not (A1): This function executes argument A1 and returns the negation of the value.
And (A1, A2): This function executes argument A1, and if it returns True, it executes argument A2. If the executions of both arguments return True, the function returns True; in any other case, the function returns False.
Or (A1, A2): This function executes argument A1, and if it returns False, it executes argument A2. If the executions of both arguments return False, the function also returns False. Otherwise, the function returns True.
Equal (A1, A2): This function executes arguments A1 and A2 in that order. If both executions return equal values, the function returns True. Otherwise, it returns False.
Do_While (A1, A2): This function first executes argument A1; as long as A1 returns True, argument A2 is executed. The loop runs at most as many iterations as there are items in the instance, and it also stops early once a maximum number of consecutive iterations passes without changes in the benefit and total weight of the knapsack.
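These semantics can be sketched as higher-order Python functions whose arguments are zero-argument callables returning True or False; the stall test in Do_While is simplified to a single unchanged iteration, and its overall True return value is an assumption:

```python
def If_Then(a1, a2):
    """Execute a1; if it returns True, also execute a2. Return a1's value."""
    r = a1()
    if r:
        a2()
    return r

def If_Then_Else(a1, a2, a3):
    """Execute a2 when a1 returns True, otherwise a3. Always return True."""
    if a1():
        a2()
    else:
        a3()
    return True

def Not(a1):
    """Execute a1 and return the negation of its value."""
    return not a1()

def And(a1, a2):
    """Short-circuit conjunction: a2 runs only when a1 returned True."""
    return a1() and a2()

def Or(a1, a2):
    """Short-circuit disjunction: a2 runs only when a1 returned False."""
    return a1() or a2()

def Equal(a1, a2):
    """Execute a1 then a2; return True when both return the same value."""
    return a1() == a2()

def Do_While(a1, a2, max_iter, state):
    """Repeat: test a1 and, while it returns True, execute a2.
    Stop after max_iter iterations (the number of items, per the text)
    or as soon as state(), assumed to report (benefit, weight),
    stops changing (simplified stall test)."""
    prev = state()
    for _ in range(max_iter):
        if not a1():
            break
        a2()
        cur = state()
        if cur == prev:
            break
        prev = cur
    return True
```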
The terminals add or remove items from the knapsack according to a specific criterion; thus, each terminal is a heuristic capable of modifying the data structures. The terminals search only in the space of feasible solutions, meaning that none can generate an infeasible MKP solution; consequently, all the algorithms produced construct only feasible MKP solutions. A total of 13 terminals were implemented:
Add_Max_Profit: This terminal locates the first item in PL that is in OKL. If an item is found and fits in the knapsack, it is removed from OKL and inserted into IKL.
Add_Min_Weight: This terminal locates the last item in WL that is in OKL. If an item is found and fits in the knapsack, it is removed from OKL and inserted into IKL.
Add_Max_Normalized: This terminal locates the first item in NBPL that is in OKL. If an item is found and fits in the knapsack, it is removed from OKL and inserted into IKL.
Add_Max_Scaled: This terminal selects the first item in SNBPL and in OKL. If an item is found and fits in the knapsack, it is removed from OKL and inserted into IKL.
Add_Max_Generalized: This terminal locates the first item in GDL that is in OKL. If an item is found and fits in the knapsack, it is removed from OKL and inserted into IKL.
Add_Max_Senju_Toyoda: This terminal locates the first item in STL that is in OKL. If an item is found and fits in the knapsack, it is removed from OKL and inserted into IKL.
Add_Max_Freville_Plateau: This terminal locates the first item in FPL that is in OKL. If an item is found and fits in the knapsack, it is removed from OKL and inserted into IKL.
Del_Min_Profit: This terminal locates the last item in PL that is in IKL. If the knapsack is not empty, that item is removed from IKL and inserted into OKL.
Del_Max_Weight: This terminal locates the first item in WL that is in IKL. If the knapsack is not empty, that item is removed from IKL and inserted into OKL.
Del_Min_Scaled: This terminal locates the last item in SNBPL that is in IKL. If the knapsack is not empty, that item is removed from IKL and inserted into OKL.
Del_Min_Normalized: This terminal locates the last item in NBPL that is in IKL. If the knapsack is not empty, that item is removed from IKL and inserted into OKL.
Greedy: This terminal constructs an initial feasible solution by transferring items from OKL to IKL, following a greedy criterion based on the ratio between the item's value and its average weight across all dimensions. For each item $j$, the ratio $r_j = p_j \big/ \left(\tfrac{1}{m}\sum_{i=1}^{m} w_{ij}\right)$ is computed, where $p_j$ is the item's profit and $w_{ij}$ its weight in dimension $i$. The items in OKL are sorted in decreasing order of $r_j$. Iteratively, the item with the largest ratio is selected and moved from OKL to IKL, provided that the capacity constraints are not violated. This process is repeated until no additional item can be added without exceeding the capacity limits.
Local Search: This terminal applies a local search procedure to the current solution contained in IKL, aiming to improve the total value without violating the capacity constraints. A pair of items is selected: one currently inside the knapsack (in IKL), considered for removal as part of an improvement strategy, and one outside it (in OKL), considered for insertion in its place. The removal frees capacity, potentially allowing the inclusion of a more valuable item; the exchange is accepted only if it does not violate the capacity constraints and yields a solution with a higher total value. The process terminates upon reaching Tmax iterations or when no further improvement is possible.
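The Greedy terminal can be sketched as follows; function and variable names are illustrative rather than the authors' implementation, with `weights[i][j]` denoting item `j`'s weight in dimension `i`:

```python
def greedy(profits, weights, capacities, IKL, OKL):
    """Sketch of the Greedy terminal: move items from OKL to IKL in
    decreasing order of profit / average weight, skipping any item that
    would violate a capacity constraint (names are assumptions)."""
    m = len(capacities)
    # Current load of each dimension, given the items already in IKL.
    used = [sum(weights[i][j] for j in IKL) for i in range(m)]
    ratio = lambda j: profits[j] / (sum(weights[i][j] for i in range(m)) / m)
    for j in sorted(list(OKL), key=ratio, reverse=True):
        if all(used[i] + weights[i][j] <= capacities[i] for i in range(m)):
            OKL.remove(j)
            IKL.append(j)
            for i in range(m):
                used[i] += weights[i][j]
    return True  # closure property: every terminal returns True/False
```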
The terminals generated by Gemini for the AGA in the MKP are presented below. These terminals introduce new operations to enhance algorithm flexibility and generalization while replacing specific original terminals to mitigate overfitting, as detailed in the subsequent analysis.
Add_Random: Inserts a random item from OKL to IKL if it satisfies the capacity constraints.
Del_Worst_Ratio_In_Knapsack: Removes the item in IKL with the worst value/weight ratio.
Del_Random: Removes a random item from IKL and inserts it into OKL.
Swap_Best_Possible: Swaps items between IKL and OKL to maximize the total value.
Is_Empty: Checks if IKL is empty (returns True/False).
Is_Near_Full: Checks if IKL is close to the maximum capacity (returns True/False).
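Two of these Gemini-generated terminals can be sketched under assumed data structures (`weights[i][j]` per dimension, per-dimension `capacities`) and an assumed 90% threshold for Is_Near_Full; the True/False return convention for Add_Random is also an assumption:

```python
import random

def Add_Random(weights, capacities, IKL, OKL, rng=random):
    """Sketch of Add_Random: move one randomly chosen feasible item
    from OKL to IKL; return True on success, False otherwise."""
    used = [sum(w[j] for j in IKL) for w in weights]
    feasible = [j for j in OKL
                if all(used[i] + weights[i][j] <= capacities[i]
                       for i in range(len(capacities)))]
    if not feasible:
        return False
    j = rng.choice(feasible)
    OKL.remove(j)
    IKL.append(j)
    return True

def Is_Near_Full(weights, capacities, IKL, threshold=0.9):
    """Sketch of Is_Near_Full: True when every dimension's load is at
    least `threshold` of its capacity (the threshold is an assumption)."""
    return all(sum(w[j] for j in IKL) >= threshold * c
               for w, c in zip(weights, capacities))
```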
Although syntax trees are commonly associated with compiler construction, in this study, they serve a different purpose. They provide a formal and evolutionary representation of algorithms within the framework of GP, ensuring syntactic validity during crossover and mutation operations [25,54]. This structure enables the automatic combination and transformation of functional components (functions and terminals), allowing the evolutionary process to explore a wide range of algorithmic architectures while maintaining logical consistency. Furthermore, as observed in the AGA convergence section, the syntax tree representation contributes to evolutionary stability and reduces the relative error across generations (Section 4.7).
3.4. Evaluation and Evolution Instances
Two groups of instances are chosen: one for the evolutionary process and another for evaluating the resultant algorithms. Typically, the Tightness Ratio $\alpha$ defines the structure of an MKP instance [50]. This ratio expresses the scarcity of the capacities of each dimension of the knapsack, as defined in Equation (15).
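Equation (15) is not reproduced in this excerpt; in the usual OR-Library convention (an assumption here), the tightness ratio ties each capacity to the column sums of the weight matrix:

```latex
% Assumed form of Equation (15): capacity b_i of dimension i as a
% fraction alpha of the total weight in that dimension
% (smaller alpha means a tighter knapsack).
b_i = \alpha \sum_{j=1}^{n} w_{ij}
\quad\Longleftrightarrow\quad
\alpha = \frac{b_i}{\sum_{j=1}^{n} w_{ij}}
```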
The instances were obtained from the OR Library and contain 100, 250, and 500 items with 5, 10, and 30 constraints [55]. Additional instances were included from the SAC94 Suite: Collection of Multiple Knapsack Problems. Some instances have α values of 0.25, 0.5, or 0.75, and only three have a known optimal value. For the remaining MKP instances, we used the best-known solution instead of the optimal one.
For comparability, we enforced the same number of clusters across all methods, rather than using each method’s native or data-driven cluster count. For K-Means, the number of clusters (k) was set to eleven, matching the groups produced by HDBSCAN and random clustering. Although HDBSCAN is a density-based algorithm that can automatically determine the number of clusters based on data density, in this study, it was configured to match the cluster count of K-Means closely. This alignment allowed the analysis to focus on the impact of the clustering method itself, rather than differences in partition granularity, on the specialization and generalization of the generated algorithms. By keeping the number of groups consistent across all clustering techniques, the experimental comparison centered on the structural and methodological differences inherent to each approach.
For the random clustering baseline, the number of clusters was fixed at eleven, matching the configurations used in K-Means and HDBSCAN. Once this number was defined, instances were assigned randomly and uniformly to each cluster, ensuring that all groups contained approximately the same number of instances. This procedure was not intended to represent a clustering algorithm per se, but rather to provide an unstructured reference point for comparison. The goal was to isolate the effect of structured instance organization on the specialization and generalization behavior of the automatically generated algorithms. By maintaining the same number of groups across all clustering methods, the random clustering established a baseline performance level against which the benefits of meaningful cluster formation could be evaluated.
The set of MKP instances is divided into 11 groups according to their statistical characteristics. The grouping was performed by three methods: K-Means, HDBSCAN, and random selection. In random clustering, the 328 instances were divided into 11 groups, with 30 instances in each of the first 9 groups and 29 instances in each of the last two, maintaining the same training/test proportions as in the other clusterings. Each group was further split into training and testing sets. AGA generated one algorithm per group, and each algorithm was evaluated across all groups using all instances within each group.
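The random partition just described can be sketched as follows; the 80% training fraction reproduces the 24/6 split reported for the 30-instance groups, while the remaining details (seed, shuffling) are assumptions:

```python
import random

def random_partition(n_instances=328, n_groups=11, train_frac=0.8, seed=0):
    """Sketch of the random clustering baseline: shuffle instance
    indices, split them into n_groups near-equal groups (9 groups of 30
    and 2 of 29 for 328 instances), then split each group into training
    and test sets."""
    rng = random.Random(seed)
    idx = list(range(n_instances))
    rng.shuffle(idx)
    base, extra = divmod(n_instances, n_groups)  # 29 per group, 9 get one extra
    groups, start = [], 0
    for g in range(n_groups):
        size = base + (1 if g < extra else 0)
        members = idx[start:start + size]
        start += size
        cut = round(train_frac * size)
        groups.append({"train": members[:cut], "test": members[cut:]})
    return groups
```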
The notation MKPA1 refers to the algorithm trained on Group 1 of MKP instances. In this study, a total of 33 instance groups are considered, with each group associated with a specific set of algorithms automatically generated through a clustering process. The groups are organized as follows:
Groups 1 to 11 (MKPA1–MKPA11): These correspond to instances clustered using the K-Means algorithm, where each group gave rise to a specialized algorithm through AGA.
Groups 12 to 22 (MKPA12–MKPA22): These represent the instances clustered using the HDBSCAN algorithm, with corresponding algorithms generated and adapted specifically for each group.
Groups 23 to 33 (MKPA23–MKPA33): These correspond to randomly generated groupings, used as a comparative baseline to evaluate the effectiveness of guided clustering.
Table 2 presents the structural characteristics of the instance groups obtained through K-Means clustering; each group is associated with a specialized algorithm (MKPA1–MKPA11). The table details the number of training and test instances per group, as well as the diversity in instance size and number of constraints. Most groups contain between 27 and 63 instances, with MKPA6 having the largest number (63) and MKPA10 the smallest (5). The size of the instances varies widely, ranging from small-scale problems (e.g., sizes 10 to 28 in MKPA9 and MKPA10) to larger instances (e.g., size 500 in MKPA5 and MKPA1). Similarly, the number of constraints differs significantly across groups, ranging from as few as two constraints in MKPA10 to as many as 30 in MKPA2 and MKPA8. This variability illustrates the heterogeneity of the instance space. It justifies the need for clustering-based specialization, enabling the generation of algorithms tailored to the unique structural properties within each group.
Table 3 outlines the characteristics of the instance groups derived from clustering via HDBSCAN, each corresponding to a specialized algorithm (MKPA12–MKPA22). The table includes the number of training and test instances, the total number of instances per group, the range of instance sizes, and the number of constraints. Group sizes vary from as few as 11 instances (e.g., MKPA15 and MKPA17) to 31 in MKPA22. Instance sizes are primarily concentrated in the medium to large scale, with several groups exclusively containing instances of sizes 250 or 500 (e.g., MKPA13, MKPA14, MKPA16, MKPA18), while others, such as MKPA12 and MKPA20, include small-sized instances ranging from 30 to 100. The number of constraints across groups ranges from 5 to 30, with MKPA13 and MKPA14 including only high-dimensional instances (30 constraints), whereas groups such as MKPA18, MKPA20, and MKPA21 include instances with lower dimensionality (5 to 10 constraints). This distribution reflects the ability of HDBSCAN to form clusters with structurally coherent instances, which supports the generation of highly adapted algorithms for subsets sharing similar problem configurations.
Table 4 presents the characteristics of the instance groups formed through random clustering, each associated with a specialized algorithm (MKPA23–MKPA33). Unlike clustering methods guided by structural similarity (e.g., K-Means and HDBSCAN), the random grouping method includes a fixed number of instances per group, typically consisting of 30 instances (24 for training and 6 for testing), except for MKPA32 and MKPA33, which contain 29 instances. The diversity within these groups is notably broader, both in terms of instance sizes, which range from very small (e.g., 10, 15, 20) to large-scale instances (e.g., 500), and in the number of constraints, which span from 2 to 50 across the groups. For example, MKPA30 includes one of the most heterogeneous combinations of instance sizes and constraint values, whereas other groups, such as MKPA24 and MKPA27, though still diverse, show slightly narrower ranges. This high degree of intra-group variability highlights the lack of structural coherence in random clustering, potentially limiting the effectiveness of algorithm specialization when compared to strategies based on meaningful similarity metrics. Nonetheless, these groups serve as a valuable baseline for evaluating the performance gains achieved through clustering-driven specialization.
To analyze instances of the MKP, two matrix-based structures are proposed to systematically encode the relationships between items, their associated profits, and the available resources in each dimension. These representations are invariant under scale transformations, ensuring their robustness against changes in measurement units. The first structure is the matrix of weight-to-capacity proportions (denoted as E), where each element $e_{ij}$ represents the fraction of the capacity of dimension $i$ consumed by item $j$, according to Equation (16). This matrix enables the quantification of how well an instance conforms to its capacity constraints.
The second structure is the weight-profit efficiency matrix (denoted as F), whose elements $f_{ij}$ are given by Equation (17). Each $f_{ij}$ reflects the relative efficiency of item $j$ in dimension $i$, considering the benefit obtained per unit of resource consumed.
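Equations (16) and (17) are not reproduced in this excerpt; from the surrounding definitions ("fraction of capacity consumed" and "benefit per unit of resource consumed"), the natural forms would be the following, an inference in which $w_{ij}$ is the weight of item $j$ in dimension $i$, $b_i$ the capacity of dimension $i$, and $p_j$ the profit of item $j$:

```latex
% Inferred forms of Equations (16) and (17):
e_{ij} = \frac{w_{ij}}{b_i}   % fraction of capacity i used by item j
\qquad
f_{ij} = \frac{p_j}{w_{ij}}   % profit per unit of resource i consumed
```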
Both matrices enable the derivation of a wide range of statistical descriptors used in the quantitative characterization of the instances. Thus, the characterization of MKP instances is based on the statistical analysis of the E and F matrices, as these encode key information regarding the relationship between items, their profits, and the multidimensional constraints. The primary objective is to transform these matrices, whose dimensions vary depending on the instance, into standardized numerical representations that can be compared across instances, thereby facilitating their use in clustering processes and the automatic generation of algorithms.
We encode the instances through matrices E and F due to their ability to structurally represent the relationships among items, their profits, and multidimensional constraints, without depending on the number or ordering of elements. This statistical representation captures both the global patterns and local variations in each instance, ensuring comparability across problems of different sizes and scales. Moreover, by summarizing the information into normalized statistical descriptors, it prevents the loss of generality and enhances the stability of the clustering process. In preliminary experiments, other, more direct encodings (e.g., those based on the raw item values) exhibited higher intra-cluster variance and lower structural coherence, confirming the suitability of the statistical encoding approach adopted in this study.
To address the heterogeneity in instance sizes, a procedure known as the statistical descriptor matrix is implemented. This procedure transforms each input matrix (either E or F) into a new matrix with fixed dimensions. Each cell of the resulting matrix contains a statistical summary of the corresponding values in the original matrix, thus enabling a coherent and comparable representation across instances of varying sizes. This procedure occurs in three main stages.
In the first stage, a set of statistical metrics is defined and systematically applied to the original matrices. These include classical measures of central tendency (mean, median, mode), dispersion (standard deviation, variance), and shape of the distribution (skewness, kurtosis). Additionally, percentile-based measures such as the 25th percentile, 50th percentile (median), and 75th percentile are computed to capture the spread and distribution of the data. Other relevant metrics include the coefficient of variation, which normalizes the standard deviation relative to the mean, as well as the minimum and maximum values, which provide bounds on the range of observed values.
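A sketch of this first stage, applied to a flattened matrix (E or F); the use of population moments and the treatment of the mode of continuous data via `statistics.mode` are assumptions:

```python
import statistics as st

def descriptor_vector(M):
    """Sketch of the first-stage statistical metrics applied to a
    matrix (E or F), flattened into a single sample: central tendency,
    dispersion, distribution shape, percentiles, and bounds."""
    x = [v for row in M for v in row]
    mu = st.fmean(x)
    sd = st.pstdev(x)                 # population standard deviation
    z = [(v - mu) / sd for v in x] if sd > 0 else [0.0] * len(x)
    q = st.quantiles(x, n=4)          # [25th, 50th, 75th percentiles]
    return {
        "mean": mu, "median": st.median(x), "mode": st.mode(x),
        "std": sd, "var": st.pvariance(x),
        "skewness": st.fmean(v ** 3 for v in z),   # standardized 3rd moment
        "kurtosis": st.fmean(v ** 4 for v in z),   # standardized 4th moment
        "p25": q[0], "p50": q[1], "p75": q[2],
        "cv": sd / mu if mu else float("inf"),     # coefficient of variation
        "min": min(x), "max": max(x),
    }
```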
As a result of this dual aggregation scheme, a set of features is derived from both matrices, E and F, yielding a total of 259 variables that summarize key properties of each instance. This strategy captures not only global statistics but also internal patterns related to the ordering and interactions among items, an essential aspect for distinguishing instances that may share similar aggregate values yet exhibit distinct underlying structures. Although this approach generates a substantial number of potentially redundant or collinear variables, such challenges are addressed in a subsequent stage through specific techniques for collinearity reduction and relevant variable selection.
Once the features have been extracted, a structured preprocessing procedure is implemented to ensure the quality, interpretability, and efficiency of subsequent analyses. In the first stage, all variables are normalized using the Min-Max scaling method, which mitigates the effects of scale differences among attributes and facilitates comparison across heterogeneous dimensions. Subsequently, a collinearity reduction stage is carried out by calculating the Variance Inflation Factor (VIF), eliminating those variables with a VIF greater than 10. As a result of this process, the dataset was reduced to 42 variables, effectively minimizing redundancy among descriptors and enhancing the robustness of the resulting models.
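A sketch of this preprocessing pipeline: Min-Max scaling followed by iterative elimination of the variable with the highest VIF, where VIF is computed from the R² of regressing each column on the remaining ones. The iterative drop order is an assumption; the text only states the VIF > 10 cutoff:

```python
import numpy as np

def minmax_scale(X):
    """Column-wise Min-Max scaling; constant columns map to 0 (assumption)."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    rng = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / rng

def drop_high_vif(X, threshold=10.0):
    """Iteratively drop the column with the highest VIF until every
    remaining VIF is <= threshold. VIF_k = 1 / (1 - R^2_k), with R^2_k
    from regressing column k on the other columns plus an intercept.
    Returns the indices of the retained columns."""
    X = np.asarray(X, dtype=float)
    cols = list(range(X.shape[1]))
    while len(cols) > 1:
        vifs = []
        for k in range(len(cols)):
            y = X[:, cols[k]]
            others = [cols[i] for i in range(len(cols)) if i != k]
            A = np.column_stack([X[:, others], np.ones(len(y))])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            ss_tot = ((y - y.mean()) ** 2).sum()
            r2 = 1 - resid @ resid / ss_tot if ss_tot > 0 else 0.0
            vifs.append(1 / (1 - r2) if r2 < 1 else np.inf)
        worst = int(np.argmax(vifs))
        if vifs[worst] <= threshold:
            break
        cols.pop(worst)
    return cols
```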