Clustering-Guided Automatic Generation of Algorithms for the Multidimensional Knapsack Problem
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The paper proposes a hybrid framework that combines clustering of MKP instances with automatic generation of algorithms (AGA) via genetic programming. The experiments suggest that clustering improves the relative error.
It is natural that if we work with a reasonable number of clusters and further tune an algorithm on each of them, then the approach should perform better.
Here are some doubts/comments regarding the paper:
- How is the number of clusters determined? Is the number of clusters fixed? For example, did you force HDBSCAN to choose the same number of clusters as k-means?
- If HDBSCAN was previously constrained to match the number of clusters of k-means, then the conclusion that k-means is better than HDBSCAN might not be valid.
- Please explain random clustering: does it mean you randomly assign instances to clusters?
- It would be great if you could write down how the instances are encoded and go through a few examples with the readers. Also, why did you choose such an encoding? Would the results be worse or better with a different encoding?
- In line 110, include a citation around "several optimization algorithms".
- In terms of numbering, I note that you refer to step 2 in Section 3.1 and so on. Ideally, we want to discuss step k in Section 3.k.
Author Response
Dear Reviewer,
We sincerely thank you for dedicating your time and expertise to the review of our manuscript, "Clustering-Guided Automatic Generation of Algorithms for the Multidimensional Knapsack Problem." We deeply appreciate your valuable feedback, which has been instrumental in enhancing the clarity and quality of our work. Below, we provide detailed responses to each comment, accompanied by the corresponding revisions incorporated into the manuscript, which are highlighted with track changes in the resubmitted files. We have endeavored to address all observations with precision and constructiveness, maintaining a rigorous approach aligned with the study’s objectives.
Comments 1: How is the number of clusters determined? Is the number of clusters fixed? For example, did you force HDBSCAN to choose the same number of clusters as k-means?
Response 1: We thank the reviewer for the valuable observation regarding the determination of the number of clusters. In this study, the number of clusters for the K-Means algorithm was fixed at eleven to ensure comparability with the HDBSCAN and Random clustering approaches. Specifically, K-Means was constrained to produce the same number of clusters (11) as HDBSCAN, thereby maintaining a consistent experimental framework and enabling a fair comparison among clustering methods.
Although HDBSCAN is a density-based approach capable of automatically determining the number of clusters according to data distribution and the minimum cluster size parameter, in this work its parameters (e.g., min_cluster_size) were tuned so that the resulting number of clusters would be similar to that produced by K-Means. This methodological decision allowed us to isolate the effect of the clustering strategy itself on the specialization and generalization of the automatically generated algorithms, preventing observed differences from being attributable to unequal numbers of partitions. The changes are on page 13, paragraph 4, line 509.
“To ensure comparability among the clustering strategies, the number of clusters was standardized across all methods. In the case of K-Means, the number of clusters (k) was fixed at eleven, matching the number of groups obtained through HDBSCAN and random clustering. Although HDBSCAN is a density-based algorithm capable of automatically determining the number of clusters based on data density and the min_cluster_size parameter, in this study its configuration was adjusted so that the resulting number of clusters approximated that obtained with K-Means. This correspondence made it possible to isolate the effect of the clustering method itself, rather than differences in partition granularity, on the degree of specialization and generalization of the automatically generated algorithms. By maintaining a consistent number of groups across all clustering techniques, the experimental comparison focused exclusively on the structural and methodological differences inherent to each approach.”
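For readers interested in reproducing this configuration, the minimal sketch below (in Python) illustrates how K-Means can be fixed at eleven clusters while HDBSCAN's min_cluster_size is tuned until its partition approximates the same number of groups; the feature matrix X, the search range, and the library choices are illustrative assumptions rather than the exact setup used in the study.

```python
import numpy as np
from sklearn.cluster import KMeans
import hdbscan  # the standalone 'hdbscan' package; scikit-learn >= 1.3 offers a similar class

def cluster_with_matched_k(X, k=11, seed=0):
    """Partition the instance feature matrix X with K-Means (k fixed at 11) and with
    HDBSCAN, whose min_cluster_size is tuned until its partition has roughly k clusters."""
    km_labels = KMeans(n_clusters=k, random_state=seed, n_init=10).fit_predict(X)

    best_labels, best_gap = None, np.inf
    for mcs in range(2, 30):                                     # assumed search range
        labels = hdbscan.HDBSCAN(min_cluster_size=mcs).fit_predict(X)
        n_found = len(set(labels)) - (1 if -1 in labels else 0)  # ignore the noise label
        if abs(n_found - k) < best_gap:
            best_gap, best_labels = abs(n_found - k), labels
    return km_labels, best_labels
```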
Comments 2: If HDBSCAN was previously constrained to match the number of clusters of k-means, then the conclusion that k-means is better than HDBSCAN might not be valid.
Response 2: We appreciate this observation and agree that the comparison would indeed be invalid if both methods had produced exactly the same partitions. However, although the number of clusters was matched to maintain comparability, the internal structures of the clusters generated by K-Means and HDBSCAN were fundamentally different. K-Means produces compact, approximately spherical groupings by minimizing the Euclidean distance to the centroid, whereas HDBSCAN forms clusters based on density connectivity, resulting in groups with irregular shapes, heterogeneous sizes, and variable densities. Therefore, the alignment in the number of clusters ensured only comparable experimental granularity, but not structural equivalence. Consequently, both methods operated on different distributions of instances, which directly affected the degree of specialization and generalization of the generated algorithms. Statistical results (Section 4.3) show that both methods achieved significant improvements in specialization (p < 0.01), although K-Means exhibited greater internal coherence and less overlap among groups. This indicates that its partitioning criterion—rather than the number of clusters—was more effective in grouping instances with similar behavior. Hence, the conclusion that K-Means achieved better performance than HDBSCAN remains valid, as it arises from intrinsic methodological differences rather than from the adjustment in the number of clusters. The changes are on page 28, paragraph 5, line 948.
“Although both K-Means and HDBSCAN were configured to produce a similar number of clusters to maintain comparability, their internal structures remained fundamentally different. K-Means generates compact and approximately spherical groupings by minimizing the Euclidean distance to the centroids, whereas HDBSCAN forms clusters based on density connectivity, resulting in heterogeneous groups with irregular shapes and variable densities. Thus, matching the number of clusters ensured only an equivalent experimental granularity, but not equivalent partition structures. Consequently, both methods operated over distinct instance distributions, directly influencing the degree of specialization and generalization of the automatically generated algorithms. The statistical analysis (Section 4.3, Table 15) confirms that, although both methods achieved significant specialization (p < 0.01), K-Means exhibited greater internal consistency and less overlap between groups. Therefore, the conclusion that K-Means outperformed HDBSCAN remains valid, as this difference arises from intrinsic methodological properties rather than from the alignment of the number of clusters.”
Comments 3: Please explain random clustering: does it mean you randomly assign instances to clusters?
Response 3: We appreciate the reviewer’s observation and clarify that, in the random clustering procedure, the number of clusters was first fixed to match the configurations used in K-Means and HDBSCAN (eleven clusters). Once this number was established, the instances were assigned randomly and uniformly to each cluster, ensuring that all groups contained approximately the same number of instances. This approach is not intended to represent a clustering algorithm per se; rather, it serves as a baseline for comparison, allowing us to isolate the effect of structured clustering methods (K-Means and HDBSCAN) from the influence of randomness. By keeping the number of clusters and the total set of instances constant, the random clustering provides a reference point that reflects the performance of the generated algorithms under an unstructured partition. This, in turn, makes it possible to demonstrate how a meaningful organization of instances enhances both the specialization and the generalization of the automatically generated algorithms. The changes are on page 13, paragraph 4, line 521.
“For the random clustering baseline, the number of clusters was fixed at eleven, matching the configurations used in K-Means and HDBSCAN. Once this number was defined, instances were assigned randomly and uniformly to each cluster, ensuring that all groups contained approximately the same number of instances. This procedure was not intended to represent a clustering algorithm per se, but rather to provide an unstructured reference point for comparison. The goal was to isolate the effect of structured instance organization on the specialization and generalization behavior of the automatically generated algorithms. By maintaining the same number of groups across all clustering methods, the random clustering established a baseline performance level against which the benefits of meaningful cluster formation could be evaluated.”
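As a complementary illustration (the random seed and label layout are placeholders), the random baseline described above can be generated as follows:

```python
import numpy as np

def random_clustering(n_instances, k=11, seed=0):
    """Assign instances uniformly at random to k clusters of (nearly) equal size."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n_instances) % k   # balanced group sizes
    rng.shuffle(labels)                   # random, unstructured assignment
    return labels
```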
Comments 4: It would be great if you could write down how the instances are encoded and go through a few examples with the readers. Also, why did you choose such an encoding? Would the results be worse or better with a different encoding?
Response 4: We appreciate the reviewer’s observation. In this study, each instance of the Multidimensional Knapsack Problem (MKP) was encoded as a numerical feature vector representing its main statistical and structural properties. Specifically, for each instance, we extracted aggregated descriptors such as the number of items, number of constraints (dimensions), total capacity, mean and standard deviation of weights and profits, correlation between weights and profits, and the average profit-to-weight ratio. This feature-based representation captures both the scale and the internal relationships of each instance, providing a compact yet informative description suitable for clustering.
This encoding was chosen because it preserves the intrinsic statistical structure of each instance while remaining independent of the size or ordering of items, thus ensuring comparability across problems of different scales. Alternative encodings (e.g., matrices with non-aggregated item values) were tested but resulted in higher intra-cluster variance and less stable partitions due to their sensitivity to instance size and item order. Therefore, the feature-based encoding proved to be the most balanced option in terms of descriptive richness, computational efficiency, and robustness for clustering-guided automatic algorithm generation. The changes are on page 17, line 42.
“The encoding of instances through matrices E and F was chosen for its ability to structurally represent the relationships among items, their profits, and multidimensional constraints, without depending on the number or ordering of elements. This statistical representation captures both the global patterns and local variations of each instance, ensuring comparability across problems of different sizes and scales. Moreover, by summarizing the information into normalized statistical descriptors, it prevents the loss of generality and enhances the stability of the clustering process. In preliminary experiments, other, more direct encodings (e.g., those based on the raw item values) exhibited higher intra-cluster variance and lower structural coherence, confirming the suitability of the statistical encoding approach adopted in this study.”
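To make the encoding concrete, the sketch below computes a feature vector of the kind described in this response for a toy instance; the specific descriptor list is a simplified stand-in for the matrices E and F defined in the manuscript.

```python
import numpy as np

def encode_instance(profits, weights, capacities):
    """Encode an MKP instance (profits p_j, weight matrix w_ij, capacities c_i)
    as a fixed-length vector of aggregated statistical descriptors."""
    profits = np.asarray(profits, dtype=float)          # shape (n,)
    weights = np.asarray(weights, dtype=float)          # shape (m, n)
    capacities = np.asarray(capacities, dtype=float)    # shape (m,)

    mean_w = weights.mean(axis=0)                       # average weight of each item
    ratio = profits / np.maximum(mean_w, 1e-12)         # profit-to-weight ratios
    corr = np.corrcoef(profits, mean_w)[0, 1]           # weight/profit correlation

    return np.array([
        profits.size,                 # number of items
        capacities.size,              # number of constraints (dimensions)
        capacities.sum(),             # total capacity
        profits.mean(), profits.std(),
        weights.mean(), weights.std(),
        corr,
        ratio.mean(),                 # average profit-to-weight ratio
    ])

# Toy example: 3 items, 2 constraints
x = encode_instance(profits=[10, 7, 4],
                    weights=[[3, 2, 1],
                             [4, 1, 2]],
                    capacities=[5, 6])
```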
Comments 5: In line 110, include a citation around "several optimization algorithms".
Response 5: We appreciate the reviewer’s observation. The paragraph has been revised to include appropriate citations that support the statement regarding the performance of the Automatic Generation of Algorithms (AGA) in various optimization problems. These references encompass both foundational and recent contributions to the field, such as Koza (1992), who introduced the concept of Genetic Programming as a mechanism for automatic algorithm design; Arcuri & Yao (2008), who applied co-evolutionary AGA approaches to combinatorial problems; and recent studies by Khosravi et al. (2021) and Cano et al. (2022), which demonstrate the applicability of AGA and Genetic Programming to diverse optimization tasks. The changes are on page 3, paragraph 5, line 111.
“Although the AGA performs well in several optimization problems [Koza, 1992; Arcuri & Yao, 2008; Khosravi et al., 2021; Cano et al., 2022], it remains a relatively new and emerging field with promising applications to various optimization problems. The knapsack family of problems is inherently complex, and traditional optimization algorithms have provided reasonable solutions to many such problems [Pisinger, 2005; Gomes da Silva et al., 2021]. Additionally, the MKP has a rich history of research in optimization techniques, resulting in many specialized algorithms for specific problem types [Chu & Beasley, 1998; Freville, 2004].”
Comments 6: In terms of numbering, I note that you refer to step 2 in Section 3.1 and so on. Ideally, we want to discuss step k in Section 3.k.
Response 6: We thank the reviewer for this observation and agree that a consistent correspondence between section numbering and procedural steps improves readability. In the revised version, the numbering of the methodological steps has been adjusted to match the section numbering. Specifically, each main subsection in Section 3 now begins with a step number corresponding to its section index (e.g., Step 1 in Section 3.1, Step 2 in Section 3.2, etc.). This modification ensures a clearer structural alignment between the narrative and the methodology, facilitating cross-reference and comprehension for the readers. The changes are on page 7, paragraph 4, line 285.
“Step 3.1: Define a solution container on which the generated algorithms operate.
Step 3.2: Define the set of functions and terminals that comprise the algorithms.
Step 3.3: Define a fitness function to guide the search process toward the best algorithms.
Step 3.4: Select sets of MKP instances to evaluate the construction of the algorithms and the algorithms produced.
Step 3.5: Determine the method for producing the algorithms and the values of the involved parameters.”
Reviewer 2 Report
Comments and Suggestions for Authors
- Abstract needs substantial shortening and focusing on the essentials.
- Comparison with the latest generative AI should most definitely be included.
- Generated algorithms' codes and pseudo-codes must be given as supplementary material or deposited online.
- The generated algorithms' complexity and efficiency should be calculated and determined.
- Figure 2 is barely legible, this needs to be addressed.
- While the syntax tree might be great for a compiler, it is not a great tool for delivering algorithm code.
- The generated algorithms, it seems, are not achieving results that are better than the best manually designed algorithms. In fact, they are below the state of the art.
- Having many special-case algorithms for the same problem is not necessarily a plus.
- By using group statistical testing we can tell whether there are significant group differences; however, these tests cannot tell us how the algorithms performed. Thus, such an analysis is flawed.
Author Response
Dear Reviewer,
We sincerely thank you for dedicating your time and expertise to the review of our manuscript, "Clustering-Guided Automatic Generation of Algorithms for the Multidimensional Knapsack Problem." We deeply appreciate your valuable feedback, which has been instrumental in enhancing the clarity and quality of our work. Below, we provide detailed responses to each comment, accompanied by the corresponding revisions incorporated into the manuscript, which are highlighted with track changes in the resubmitted files. We have endeavored to address all observations with precision and constructiveness, maintaining a rigorous approach aligned with the study’s objectives.
Comments 1: Abstract needs substantial shortening and focusing on the essentials.
Response 1: We sincerely thank the reviewer for this valuable suggestion. In the revised version, the Abstract has been substantially shortened and refocused on the essential contributions of the study. The new version succinctly summarizes the methodological integration between instance clustering and Automatic Generation of Algorithms (AGA), highlights the main experimental setup and findings, and omits secondary implementation details. The revised Abstract now emphasizes the novelty of combining clustering-guided specialization with genetic programming for the Multidimensional Knapsack Problem (MKP), while preserving clarity and conciseness. The total length was reduced to under 200 words, ensuring that only the core objectives, methods, and key results are presented to align with the journal’s guidelines. The changes are on page 1, paragraph 1, line 12-29.
[Marked in red in the manuscript]
“We propose a hybrid framework that integrates instance clustering with Automatic Generation of Algorithms (AGA) to produce specialized algorithms for classes of Multidimensional Knapsack Problem (MKP) instances. This approach is highly relevant given the latest trends in AI, where Large Language Models (LLMs) are actively being used to automate and refine algorithm design through evolutionary frameworks. Our method utilizes a feature-based representation of 328 MKP instances, applying K-means, HDBSCAN, and random clustering to form 11 clusters per method. For each cluster, a master optimization problem was solved via Genetic Programming, evolving algorithms encoded as syntax trees. Fitness was measured as relative error against known optima, a similar objective to those being tackled in LLM-driven optimization. Experimental and statistical analyses show that clustering-guided AGA significantly reduces average relative error and accelerates convergence compared with AGA trained on randomly grouped instances. K-means produced the most consistent cluster-specialization. Cross-cluster evaluation reveals a trade-off between specialization and generalization. The results demonstrate that clustering prior to AGA is a practical preprocessing step for automated algorithm design in NP-hard combinatorial problems, paving the way for advanced methodologies incorporating modern AI techniques.”
Comments 2: Comparison with the latest generative AI should most definitely be included.
Response 2: We express our sincere gratitude to the reviewer for their pertinent comment regarding the necessity of including a comparison with recent generative artificial intelligence approaches. In the revised manuscript, we have incorporated a detailed discussion in Section 4.3 (Statistical evaluation of algorithm specialization), presenting a comparison between our clustering-guided AGA framework and an approach utilizing the large language model (LLM) Gemini to modify and enhance the proposed terminals. Specifically, we employed Gemini to optimize terminal selection by eliminating certain original terminals (e.g., Del_Min_Profit and Del_Max_Weight) and introducing new dynamically generated terminals, such as Del_Worst_Ratio_In_Knapsack and Swap_Best_Possible, based on structural patterns identified by the LLM. These LLM-generated terminals demonstrated a remarkable ability to capture complex relationships among instance characteristics, potentially contributing to more precise solutions in specific scenarios. However, the results indicate that, while LLM-enhanced algorithms achieved effective specialization in certain clusters, their impact on reducing relative error was less pronounced compared to algorithms trained solely through Genetic Programming (GP), which benefited from a broader exploration of the search space. To support this comparison, we have included four new tables in Section 4.3 (Tables 16–19), presenting average relative errors and statistical tests (Student’s t-tests and ANOVA) for algorithms generated by clustering-guided AGA, AGA with LLM-enhanced terminals, and random grouping. These tables confirm that the GP-based approach achieves a more consistent reduction in relative error across K-Means and HDBSCAN clusters (p < 0.05), although the LLM-assisted approach exhibits advantages in generalization for more heterogeneous clusters, particularly due to the LLM’s capacity to generate terminals that identify more precise structural patterns. This discussion strengthens the comparative evaluation and provides clear insights into the strengths and limitations of integrating generative AI techniques into the AGA framework, highlighting the potential of LLM-generated terminals to enhance precision in future developments. The changes are in Section 4.4, page 29, paragraphs 1–8, lines 963–1042.
[Marked in red in the manuscript]
“In this study, Gemini, a large language model, was employed to generate a modified set of terminals for the Automatic Generation of Algorithms (AGA) applied to the Multidimensional Knapsack Problem (MKP). The terminals introduced by Gemini include Add_Random, Del_Worst_Ratio_In_Knapsack, Del_Random, Swap_Best_Possible, Is_Empty, and Is_Near_Full, which aim to enhance algorithmic diversity and adaptability. Conversely, the terminals Add_Max_Scaled, Add_Max_Generalized, Add_Max_Senju_Toyoda, Add_Max_Freville_Plateau, and Del_Min_Scaled from the initial experiment were removed to simplify the algorithmic structure and reduce the risk of overfitting, as evaluated in the subsequent analysis.
Table 16 presents the results for the K-Means clustering, where the algorithms MKPA1G–MKPA11G exhibit a clear tendency to specialize within their own training groups, as the minimum relative error values are concentrated along the main diagonal. The difference between diagonal and off-diagonal errors is significant, indicating that the algorithms are sensitive to the structural characteristics of the clusters in which they were generated. However, the case of MKPA9, with extremely high error values, suggests that some syntactic trees may suffer from overfitting or evolutionary instability. On average, the diagonal maintains low error values (~0.026), while the rest are considerably higher (~0.039), confirming strong specialization but limited generalization.”
“Table 17 presents the results for the HDBSCAN method, where the algorithms MKPA12G–MKPA22G exhibit smaller differences between diagonal and off-diagonal values, reflecting a greater capacity for transfer and generalization. The diagonal errors average around 0.018, while the off-diagonal ones are approximately 0.022. This suggests that the Gemini-optimized terminals produced more robust and adaptable evolutionary structures capable of handling cluster variability, thereby reducing strict dependence on the training group. Compared to K-Means, the behavior is more stable and less prone to overfitting.”
“Table 18 presents the results for the Random clustering, where the algorithms MKPA23G–MKPA33G exhibit a general decrease in specialization, as the minimum error values do not always coincide with the diagonal, and the differences between intra- and inter-group errors are smaller (~0.017 vs. ~0.020). This indicates that the algorithms lose correspondence between their training environment and the test data, resulting from the lack of structural organization within the groups. Nevertheless, the Gemini-enhanced terminals contribute to a degree of stability by preventing large deviations or extreme errors, even under clustering conditions that lack structural significance.”
“Table 19 presents the statistical results showing that specialization is more strongly evidenced in the K-Means and HDBSCAN methods, as both exhibit significant T-test results (p < 0.01), indicating that the algorithms perform significantly better on the groups where they were trained compared to the others. In contrast, the Random clustering method does not show statistically significant differences (p = 0.072), revealing a lower capacity for specialization. On the other hand, when analyzing the Friedman Test, it is observed that HDBSCAN (p = 0.007) achieves the strongest tendency toward generalization, outperforming both K-Means (p = 0.011) and Random clustering (p = 0.066). Consequently, K-Means demonstrates greater local specialization, whereas HDBSCAN exhibits a more robust and consistent generalization across different instance groups.”
“When comparing both statistical tables, it is observed that the originally proposed evolutionary algorithm exhibits a more pronounced specialization than the Gemini-modified algorithm. In the first table, the T-test values for K-Means (T = -5.191, p = 9.41×10⁻⁷) and HDBSCAN (T = -3.034, p = 0.0029) are significantly more extreme, with much lower significance levels (p < 0.001), indicating a stronger statistical difference between the errors within the training group and those from external groups. In contrast, in the Gemini-enhanced model, the T-values (3.96 and 3.42) are more moderate and the p-values slightly higher (p < 0.01), reflecting a less intense but more stable specialization.
In practical terms, this suggests that the original model tends to adapt more precisely to its training group (demonstrating greater local specialization), whereas the Gemini-enhanced model, although exhibiting a smaller contrast between intra- and inter-group errors, likely reduces the risk of overfitting and improves stability. In summary, the original algorithm outperforms in terms of specialization, while the Gemini-modified version demonstrates a more balanced trade-off between specialization and generalization.
In conclusion, K-Means stands out as the clustering method that achieves the strongest and most statistically significant specialization of the algorithms, supported by consistent results in both the normality and t-tests. It demonstrates the clearest distinction between training and testing performance, confirming a robust adaptation to its own cluster. HDBSCAN, although also significant, presents a milder specialization but greater balance, showing higher generalization across groups with lower overall error variance. Random clustering, by contrast, fails to achieve meaningful specialization, and its differences arise mainly from random variation. When comparing both experimental phases, the original evolutionary algorithm exhibits stronger specialization effects, while the Gemini-enhanced version achieves a more stable equilibrium between specialization and generalization, mitigating overfitting and improving robustness across clusters.”
Comments 3: Generated algorithms' codes and pseudo-codes must be given as supplementary material or deposited online.
Response 3: We thank the reviewer for the observation. In accordance with the suggestion, all generated algorithm codes and their corresponding pseudocodes will be provided as supplementary material accompanying the manuscript or deposited online in a public repository to ensure full transparency, reproducibility, and accessibility of the results.
Comments 4: The generated algorithms' complexity and efficiency should be calculated and determined.
Response 4: We appreciate the reviewer’s insightful comment. In response, we have included a detailed analysis of the complexity and efficiency of the generated algorithms by adding three new tables (Tables 7, 9 and 11) corresponding to the K-Means, HDBSCAN, and Random groups. Each table reports the number of control and logical structures (While, IfThenElse, And, Or, Not), terminal nodes, and the estimated theoretical complexity (O(n²)–O(n³)) of each algorithm. This analysis highlights that cluster-guided evolution produces more structurally coherent and computationally efficient algorithms, whereas random evolution leads to higher syntactic redundancy and reduced efficiency. The changes are on page 20, paragraph 5, lines 724–741; page 22, paragraph 9, lines 767–781; and page 22, paragraph 12, lines 799–812.
[Marked in red in the manuscript]
“Table 7 presents the structural and theoretical complexity of the algorithms generated within the K-Means cluster group (MKPA1–MKPA11). The analysis reveals that most algorithms exhibit a high degree of structural depth and nested iterations, with multiple conditional and logical operators. Specifically, 7 out of 11 algorithms reach a theoretical complexity of O(n³), mainly due to the presence of multiple While loops combined with IfThenElse and logical compositions (And, Or, Not). These structures favor extensive exploration of the solution space but increase computational cost. In contrast, algorithms MKPA3, MKPA6, MKPA10, and MKPA11 display a lower structural density (O(n²)), corresponding to simpler control flows and fewer nested conditions, which enhance computational efficiency. Overall, the K-Means group demonstrates a pattern of syntactic growth consistent with high specialization: deeper trees enable the discovery of more refined solutions within their cluster, although at the expense of higher execution complexity.”
“Table 9 presents the structural and theoretical complexity of the algorithms generated within the HDBSCAN group (MKPA12–MKPA22). Overall, a tendency toward more compact and efficient structures is observed compared to those produced by the K-Means clustering. Approximately six algorithms exhibit a theoretical complexity of O(n³), associated with the presence of multiple While loops combined with conditional (IfThenElse) and logical (And, Or, Not) operators. However, algorithms MKPA14, MKPA16, MKPA18, and MKPA20 show a reduced complexity of O(n²), characterized by simpler control flows and lower syntactic depth. Collectively, the algorithms derived from the HDBSCAN cluster achieve a balance between structural efficiency and exploratory capacity, maintaining a favorable relationship between computational complexity and performance, suggesting that density-based clustering promotes the generation of lighter and more generalizable algorithms.”
“Table 11 presents the structural and theoretical complexity of the randomly generated algorithms (MKPR1-MKPR11). This group exhibits greater variability in syntactic depth and in the number of conditional and logical operators, lacking a coherent design structure. Most algorithms show a theoretical complexity of O(n³), resulting from the redundant combination of While loops with conditional (IfThenElse) and logical (And, Or, Not) operators, which considerably increases computational cost without a proportional improvement in performance. Only three algorithms (MKPR4, MKPR5, and MKPR9) achieve a complexity of O(n²), displaying simpler structures and less nesting. Overall, the randomly generated algorithms tend to overgrow syntactically, evidencing a lack of structural optimization and a lower degree of specialization, confirming that unguided evolution without clustering produces less efficient solutions and is more prone to structural overfitting.”
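As an illustration of how such structural metrics can be extracted, the sketch below counts control and logical operators in a serialized syntax tree; the prefix-notation format and the loop-count-to-complexity mapping are deliberately crude assumptions, not the manuscript's exact criterion.

```python
import re

CONTROL_OPS = ("While", "IfThenElse", "And", "Or", "Not")

def structural_profile(algorithm_str):
    """Count control/logical operators and terminals in a serialized syntax tree, e.g.
    'While(Not(Is_Near_Full), IfThenElse(Is_Empty, Add_Random, Del_Worst_Ratio_In_Knapsack))'."""
    tokens = re.findall(r"[A-Za-z_]+", algorithm_str)
    counts = {op: tokens.count(op) for op in CONTROL_OPS}
    terminals = [t for t in tokens if t not in CONTROL_OPS]
    # Crude proxy: the total number of While loops (not their true nesting depth)
    # is mapped to an order-of-magnitude complexity class.
    loops = counts["While"]
    complexity = "O(n)" if loops <= 1 else ("O(n^2)" if loops == 2 else "O(n^3)")
    return counts, len(terminals), complexity
```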
Comments 5: Figure 2 is barely legible, this needs to be addressed.
Response 5: We thank the reviewer for the observation. In response, Figure 2 has been enlarged and reformatted to improve its readability and visual clarity, ensuring that all details and elements are now clearly distinguishable. The changes are on page 35, paragraph 5, line 1124.
Comments 6: While the syntax tree might be great for a compiler, it is not a great tool for delivering algorithm code.
Response 6: We appreciate the reviewer’s insightful observation. While we acknowledge that syntax trees are traditionally associated with compiler design, in our framework they serve a different and well-justified role: representing algorithms as structured, evolvable entities within Genetic Programming (GP). This representation enables the automatic manipulation, crossover, and mutation of algorithmic components in a mathematically consistent way—essential for the AGA. The tree structure captures the logical hierarchy and flow of control between functions and terminals, allowing the evolutionary process to construct and refine executable algorithms that are syntactically valid and semantically coherent. Thus, the syntax tree is not intended as a presentation tool for end-user code, but as an internal model that enables the systematic exploration of the algorithmic design space. The changes are on page 7, paragraphs 2–3, line 268.
[Marked in red in the manuscript]
“Genetic programming (GP) is an appropriate technique for solving the MP because it is an evolutionary optimization technique that can explore a vast space of potential structures represented by syntax trees, and we can easily map an algorithm onto a syntax tree (Koza, 1992). GP evolves these trees using selection, mutation, and recombination operators. GP can efficiently search for high-performing solutions to complex problems such as the MKP by selecting the best-performing algorithms and recombining them to create new variations. GP also has the advantage of being a flexible approach that can be adapted to various optimization problems.
We consider that a syntactic tree that contains functions in the intermediate nodes and terminals in the leaf nodes corresponds to an algorithm. For the MKP, the functions are high-level instructions that enable the combination of terminals that operate directly on constructing a feasible solution to the problem. The generation of algorithms occurs by solving the MP, which minimizes the relative error incurred by an algorithm when finding a near-optimal MKP solution. The population evolves from an initial population by applying genetic operators. Population after population, new algorithms are produced and refined by solving a set of instances of the MKP with increasing efficiency. The algorithmic production process follows five steps:
Step 3.1: Define a solution container on which the generated algorithms operate.
Step 3.2: Define the set of functions and terminals that comprise the algorithms.
Step 3.3: Define a fitness function to guide the search process toward the best algorithms.
Step 3.4: Select sets of MKP instances to evaluate the construction of the algorithms and the algorithms produced.
Step 3.5: Determine the method for producing the algorithms and the values of the involved parameters.”
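To make this representation concrete, the following self-contained sketch shows how a tree with functions in the internal nodes and terminals in the leaves can be interpreted as an executable algorithm; the node names and the terminal dictionary are illustrative placeholders rather than the operator set defined in the manuscript.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    """Internal nodes hold functions (While, IfThenElse, Not, ...); leaves hold terminals."""
    name: str
    children: List["Node"] = field(default_factory=list)

def execute(node, state, terminals, max_iter=1000):
    """Recursively interpret a syntax tree over a solution container `state`;
    `terminals` maps leaf names to callables that inspect or modify the state."""
    if node.name == "IfThenElse":
        cond, then_branch, else_branch = node.children
        chosen = then_branch if execute(cond, state, terminals) else else_branch
        return execute(chosen, state, terminals)
    if node.name == "While":
        cond, body = node.children
        for _ in range(max_iter):                 # guard against non-terminating trees
            if not execute(cond, state, terminals):
                break
            execute(body, state, terminals)
        return state
    if node.name == "Not":
        return not execute(node.children[0], state, terminals)
    return terminals[node.name](state)            # leaf: apply the terminal to the state

# Hypothetical tree: While(Not(Is_Near_Full), Add_Random)
tree = Node("While", [Node("Not", [Node("Is_Near_Full")]), Node("Add_Random")])
# execute(tree, state, {"Is_Near_Full": ..., "Add_Random": ...}) would run it on a solution.
```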
Comments 7: The generated algorithms, it seems, are not achieving results that are better than the best manually designed algorithms. In fact, they are below the state of the art.
Response 7: We thank the reviewer for this valuable observation. It is correct that the algorithms generated by our Automatic Generation of Algorithms (AGA) framework do not necessarily surpass the absolute best numerical results reported in the literature. However, our experimental design and evaluation perspective differ substantially from those traditional approaches. Previous studies (e.g., Chu & Beasley, 1998; Özcan et al., 2014; Drake et al., 2015) evaluated single algorithms over multiple independent benchmark instances and reported the average of the lowest relative errors achieved by those algorithms. In contrast, our framework evolves specialized algorithms trained over clusters of correlated instances, not on isolated cases.
This group-based training enables the analysis of algorithmic specialization and generalization, where the goal is not only to achieve minimal error on individual benchmarks, but to design algorithms capable of adapting to structural similarities within instance groups. Consequently, while the global average relative error (1.68% for MKPA15) is slightly higher than the best published results, it represents a contextually optimized and structurally adaptive performance under a novel evolutionary and cluster-guided paradigm. This distinction highlights the conceptual contribution of our work: the automatic design of algorithms specialized for groups of problem instances, rather than fine-tuned heuristics for single benchmarks.
Comments 8: Having many special-case algorithms for the same problem is not necessarily a plus.
Response 8: We thank the reviewer for this pertinent observation. We fully agree that the existence of many specialized algorithms for a single problem does not automatically constitute an advantage unless their diversity translates into measurable improvements in performance or generalization. In our study, the goal of generating multiple algorithms is not to increase the quantity of solutions, but to analyze how structural specialization guided by clustering affects the adaptability and efficiency of automatically generated algorithms. Each algorithm represents an emergent adaptation to a subset of structurally coherent instances, and the comparative analysis among them provides insight into the relationship between instance characteristics and algorithmic behavior. Ultimately, this approach contributes to identifying common structural patterns that can inform the synthesis of more general, transferable algorithms in future research stages. The changes are on page 37, paragraph 3, line 1194.
[Marked in red in the manuscript]
“It is important to emphasize that the generation of multiple specialized algorithms does not aim to promote the proliferation of ad hoc solutions for the same problem, but rather to understand how the structure of instances influences the behavior and performance of automatically generated algorithms. In this context, each algorithm represents an evolutionary adaptation to a subset of instances with similar structural characteristics, allowing for the analysis of the relationship between the properties of the instance space and the effectiveness of the search process. This perspective does not focus on the number of algorithms produced but on their analytical value in advancing toward the development of more generalizable and transferable algorithms across different optimization domains (Koza, 1992; Silva-Muñoz et al., 2023; Acevedo et al., 2020).”
Comments 9: By using group statistical testing we can tell whether there are significant group differences; however, these tests cannot tell us how the algorithms performed. Thus, such an analysis is flawed.
Response 9: We thank the reviewer for this valuable comment. We agree that group statistical tests alone cannot reveal how individual algorithms performed or explain the nature of those differences. In this study, statistical tests were used strictly to confirm whether performance variations among algorithm groups were statistically significant. The interpretation of how the algorithms performed was derived from the comparative analysis of their average relative errors across clusters and methods (K-Means, HDBSCAN, and random grouping). Thus, the statistical results were complemented with a detailed examination of algorithmic outcomes, ensuring that conclusions were based on both significance testing and direct performance evaluation. This clarification has been added to the revised manuscript to make explicit that the statistical analysis was confirmatory rather than descriptive. The changes are on page 37, paragraph 3, line 1194.
[Marked in red in the manuscript]
“It is important to clarify that the group-level statistical tests applied in this study were not intended to describe the individual behavior of the algorithms, but rather to verify the existence of significant differences among the groups formed by each clustering method. The interpretation of algorithmic behavior was based on the comparison of the average relative error values obtained for each cluster and clustering approach. In this way, the statistical analysis served as a complementary validation tool to the comparative performance analysis, ensuring that the conclusions reflect both statistical significance and the empirical evidence observed.”
Reviewer 3 Report
Comments and Suggestions for Authors
The authors propose a hybrid genetic programming–based approach with instance clustering, Automatic Generation of Algorithms (AGA), for solving Multidimensional Knapsack-type problems. My comments are as follows:
- The paper addresses single-objective MKPs. Please indicate and discuss the potential for extending the proposed approach to multiobjective problems.
- The authors note that many evolved algorithms generated by the proposed approach lose effectiveness on structurally dissimilar instances. Please include a discussion of potential strategies to mitigate this weakness, along with the trade-offs of each strategy.
- In Section 4, some discussion is provided regarding the robustness of the approach when handling problems with multiple uncertainties, noisy data, and large-scale/multiconstraint instances. Please provide potential computational techniques or approaches that could enhance robustness in these scenarios.
- Please elaborate on the factors influencing the convergence of the relative error shown in Figure 2, so that other researchers attempting to reproduce this work are aware of potential convergence issues. In connection with this, discuss the key elements in AGA that improve convergence compared to other approaches.
- Please include a discussion on the risk of the proposed method becoming trapped in local optima (i.e., stagnation). While the approach employs a mutation operator, further detail is needed on how effective and robust this operator is in preventing stagnation.
Thank you.
Author Response
Dear Reviewer,
We sincerely thank you for dedicating your time and expertise to the review of our manuscript, "Clustering-Guided Automatic Generation of Algorithms for the Multidimensional Knapsack Problem." We deeply appreciate your valuable feedback, which has been instrumental in enhancing the clarity and quality of our work. Below, we provide detailed responses to each comment, accompanied by the corresponding revisions incorporated into the manuscript, which are highlighted with track changes in the resubmitted files. We have endeavored to address all observations with precision and constructiveness, maintaining a rigorous approach aligned with the study’s objectives.
Comments 1: The paper addresses single-objective MKPs. Please indicate and discuss the potential for extending the proposed approach to multiobjective problems.
Response 1: We appreciate the reviewer’s insightful comment. The current study focuses on the single-objective formulation of the MKP; however, the proposed clustering-guided AGA framework can be naturally extended to multiobjective optimization. In the revised version, we have incorporated a discussion in the final paragraph of the Future Research section highlighting this potential extension. Specifically, we note that the framework could evolve algorithms capable of balancing multiple conflicting objectives—such as maximizing profit while minimizing total weight or resource dispersion—by integrating Pareto-based performance indicators and adapting the clustering process to capture relationships among objectives. This addition clarifies the generalizability of the proposed approach to multiobjective scenarios and outlines a promising direction for future work. The changes are on page 38, paragraph 6, line 1231.
[Marked in red in the manuscript]
“As future lines of research, we propose to delve deeper into the interaction between heuristic and exact methods for solving the MKP. In particular, we will seek to determine whether it is necessary to resort to computationally demanding exact methods to solve the MKP instances fully, or whether it is possible to achieve optimal solutions by initially considering only a subset of items. This strategy would involve solving a part of the problem using metaheuristics and then applying exact methods to refine the initial solution. This combination would significantly reduce computational cost while maintaining solution quality. This approach challenges traditional methodologies and seeks to advance the understanding of the trade-off between computational efficiency and accuracy in complex optimization problems. Additionally, we propose to investigate mechanisms for knowledge transfer between specialized algorithms by building hybrid models that integrate efficient substructures from different groups of instances. Finally, we recommend validating this methodology in other NP-Hard problems and extending its application to dynamic or continuous flow contexts, where the structural characteristics of the instances vary over time. Moreover, extending the proposed clustering-guided AGA framework to multi-objective formulations of the MKP represents a promising direction for future research. In this context, the algorithms could evolve to balance multiple conflicting objectives—such as maximizing profit while minimizing weight or resource dispersion—by incorporating Pareto-based performance indicators and adapting the clustering process to more effectively capture the interrelationships among objectives.”
Comments 2: The authors note that many evolved algorithms generated by the proposed approach lose effectiveness on structurally dissimilar instances. Please include a discussion of potential strategies to mitigate this weakness, along with the trade-offs of each strategy.
Response 2: We appreciate the reviewer’s insightful comment. We agree that some evolved algorithms generated by the proposed approach may lose effectiveness when applied to structurally dissimilar instances. In response, we have expanded the Conclusions section to discuss this limitation and possible strategies to mitigate it. Specifically, we now mention the integration of meta-learning or transfer mechanisms to enable knowledge sharing between specialized algorithms, as well as the incorporation of ensemble learning techniques in which multiple evolved algorithms cooperate through voting, weighting, or hierarchical selection to improve generalization. We also discuss the possibility of periodically re-evolving the algorithms using mixed-instance sets to maintain adaptability over time. These strategies involve trade-offs between specialization, generalization, and computational efficiency, and their inclusion strengthens the discussion on how the proposed framework can be made more robust in future work. These strategies and their trade-offs (between specialization, generalization, and computational efficiency) have been summarized and incorporated into the Discussion section of the revised version. The changes are on page 37, paragraph 4, line 1204 and page 39, paragraph 2, line 1261.
[Marked in red in the manuscript]
“The implications of these results are multiple. First, it reinforces the value of structural instance analysis as a critical phase in the automatic design of algorithms. This strategy not only improves performance in terms of relative error but also facilitates the generation of more compact and understandable syntactic trees, which enhances the interpretability of the generated models. Second, it establishes a replicable framework that can be extended to other combinatorial optimization problems with high structural variability, such as VRP or multiply constrained scheduling problems.”
“Cross-cluster evaluation revealed a clear trade-off between specialization and generalization: many evolved algorithms perform substantially better on their training clusters but lose effectiveness on structurally dissimilar instances. Nevertheless, several automatically generated algorithms are competitive with established, human-designed heuristics on selected benchmark sets, supporting the practical value of the clustering-AGA paradigm for cluster-specific deployment.”
Comments 3: In Section 4, some discussion is provided regarding the robustness of the approach when handling problems with multiple uncertainties, noisy data, and large-scale/multiconstraint instances. Please provide potential computational techniques or approaches that could enhance robustness in these scenarios.
Response 3: We thank the reviewer for this valuable suggestion. We agree that enhancing robustness under uncertainty, noisy environments, and large-scale or multiconstraint instances is a crucial direction for improving the proposed framework. In the revised version, we have expanded the discussion in Section 4 to outline several computational strategies that could strengthen robustness. These include (1) incorporating stochastic optimization and Monte Carlo sampling to evaluate algorithm stability across noisy or uncertain conditions; (2) using surrogate modeling or dimensionality reduction (e.g., autoencoders, PCA) to efficiently handle high-dimensional or multi-constraint spaces; and (3) applying parallel and distributed evolution to maintain performance scalability in large problem instances. These strategies can mitigate sensitivity to noise and uncertainty while preserving computational efficiency, thus extending the applicability of the clustering-guided AGA to more complex real-world scenarios. The changes are on page 36, paragraph 5, line 1156 and page 37, paragraph 5, line 1211.
[Marked in red in the manuscript]
“However, this study presents certain limitations. The selection of functions and terminals, although based on heuristics established in the MKP literature, can restrict the exploration of the algorithmic space if adaptive expansion mechanisms are not incorporated. Furthermore, the performance evaluation focuses exclusively on relative error, omitting relevant metrics such as execution time, robustness to noise, and scalability to higher-dimensional instances. Moreover, instance segmentation is based on statistical variables derived from normalized matrices, which could be complemented with richer representations, such as learned embeddings or nonlinear dimensionality reduction techniques. Moreover, a limitation observed in this study is that some evolved algorithms lose effectiveness when applied to structurally dissimilar instances. To mitigate this issue, several strategies can be explored, including the integration of meta-learning or transfer mechanisms that enable knowledge exchange among specialized algorithms, as well as the incorporation of ensemble learning techniques, in which multiple evolved algorithms cooperate through voting, weighting, or hierarchical selection to improve generalization. Although these strategies increase computational cost, they enhance the robustness and adaptability of the proposed framework. Additionally, the periodic re-evolution of algorithms using mixed sets of instances can maintain flexibility over time, albeit requiring additional evolutionary cycles. Each of these approaches involves a trade-off between specialization, generalization, and computational efficiency, offering promising directions for future improvements to the proposed system.”
"To further enhance robustness against uncertainty, noise, and large-scale or highly constrained scenarios, several computational strategies could be integrated into the proposed framework. One possibility is to incorporate stochastic optimization or Monte Carlo sampling to evaluate the stability of the algorithms under different realizations of uncertainty, thereby ensuring consistent performance. Another promising approach involves employing dimensionality reduction techniques, such as autoencoders or Principal Component Analysis (PCA), to decrease computational complexity while preserving relevant structural information. Finally, parallel and distributed evolutionary schemes could be implemented to improve scalability and maintain convergence efficiency in large or highly constrained instances. These extensions would allow the proposed system to better adapt to noisy and uncertain environments, thereby increasing its practical applicability to real-world optimization problems."
Comments 4: Please elaborate on the factors influencing the convergence of the relative error shown in Figure 2, so that other researchers attempting to reproduce this work are aware of potential convergence issues. In connection with this, discuss the key elements in AGA that improve convergence compared to other approaches.
Response 4: Thank you for this valuable comment. We agree that clarifying the factors influencing the convergence of the relative error is important for ensuring reproducibility and transparency. Therefore, we have expanded the discussion in the revised manuscript to explain in detail the main parameters and mechanisms that affect convergence in the proposed AGA framework. Specifically, we now highlight how population size, selection pressure, and mutation rate jointly determine the stability and speed of convergence, noting that excessive selection pressure or limited structural diversity can cause premature convergence, while a balanced trade-off between exploration and exploitation promotes continuous improvement. In addition, we describe how the cluster-guided structure of AGA accelerates convergence by focusing the evolutionary process on groups of structurally similar instances, reducing fitness variance and promoting consistent adaptation. Finally, we emphasize that the syntactic tree representation and the relative-error-based fitness function provide a strong selective gradient that drives the evolution toward effective and generalizable algorithmic configurations. These elements collectively explain the faster and more stable convergence of the proposed approach compared to traditional evolutionary or heuristic methods. The changes are on page 36, paragraph 3, line 1134 – 1146.
[Marked in red in the manuscript]
“The convergence of the relative error observed in Figure 2 is influenced by several factors inherent to the evolutionary process. Population size, selection pressure, and mutation rate play a central role in determining the stability and speed of convergence. Excessive selection pressure or insufficient diversity among algorithmic structures may lead to premature convergence, whereas a balanced combination of exploration and exploitation enables sustained improvements across generations. Moreover, the cluster-guided structure of the AGA enhances convergence by focusing the evolutionary search within groups of structurally similar instances, thereby reducing fitness variance and facilitating more consistent adaptation. The use of a syntactic tree representation and a fitness function based on relative error further supports convergence by providing a selective gradient toward more effective and generalizable algorithmic configurations. Collectively, these elements explain the faster and more stable convergence of the proposed approach compared to traditional evolutionary or heuristic methods.”
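For readers attempting to reproduce the convergence behavior of Figure 2, the skeleton below shows where population size, tournament size (selection pressure), and mutation rate enter a generational loop and how the best relative error per generation can be logged; all operators and the fitness function are placeholders for the AGA-specific components.

```python
import random

def evolve(init, crossover, mutate, fitness, pop_size=100, generations=50,
           tournament=4, mut_rate=0.1, seed=0):
    """Generic generational loop; `fitness` returns a relative error (lower is better)."""
    random.seed(seed)
    population = [init() for _ in range(pop_size)]
    history = []                                      # best relative error per generation

    def select():                                     # tournament size controls selection pressure
        return min(random.sample(population, tournament), key=fitness)

    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            child = crossover(select(), select())
            if random.random() < mut_rate:            # mutation preserves structural diversity
                child = mutate(child)
            offspring.append(child)
        population = offspring
        history.append(min(fitness(ind) for ind in population))
    return population, history
```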
Comments 5: Please include a discussion on the risk of the proposed method becoming trapped in local optima (i.e., stagnation). While the approach employs a mutation operator, further detail is needed on how effective and robust this operator is in preventing stagnation.
Response 5:
We express our gratitude to the reviewer for this insightful observation. We acknowledge the importance of addressing the risk of the proposed method becoming trapped in local optima (stagnation) during the evolutionary process and the need to elaborate on the effectiveness and robustness of the mutation operator in preventing this issue. In the revised manuscript, we have incorporated a specific discussion on this risk in Section 5 (Discussion), where we explain that the mutation operator mitigates stagnation by introducing random modifications to the syntactic trees, thereby promoting the exploration of new algorithmic structures and preventing premature convergence to suboptimal regions. However, we also recognize that the fixed mutation rate may limit the ability to escape stagnation in highly constrained or homogeneous clusters. To address this limitation, we propose future extensions, such as adaptive mutation mechanisms or hybrid strategies that periodically introduce diversity, enhancing robustness and ensuring sustained evolutionary progress. This discussion provides clarity on the effectiveness of the mutation operator and highlights directions for strengthening the AGA framework against stagnation. The changes are on page 36, paragraph 4, line 1147 – 1155.
[Marked in red in the manuscript]
“A potential risk of the proposed approach is stagnation or entrapment in local optima during the evolutionary process. The mutation operator helps mitigate this risk by introducing random modifications in the syntactic trees, thereby maintaining the exploration of new algorithmic structures and preventing premature convergence toward suboptimal regions. However, the current implementation employs a fixed mutation rate, which may limit its ability to escape stagnation in highly constrained or homogeneous clusters. Future extensions could explore adaptive mutation mechanisms or hybrid strategies that periodically introduce diversity, thereby enhancing robustness and ensuring sustained evolutionary progress.”
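A minimal sketch of the adaptive-mutation idea proposed above follows; the stagnation window, scaling factor, and rate bounds are illustrative assumptions.

```python
def adapt_mutation_rate(mut_rate, best_history, patience=5,
                        factor=1.5, base_rate=0.1, max_rate=0.5):
    """Raise the mutation rate when the best relative error has not improved for
    `patience` generations (stagnation); fall back to the nominal rate otherwise."""
    stagnating = (len(best_history) > patience and
                  min(best_history[-patience:]) >= min(best_history[:-patience]))
    if stagnating:
        return min(mut_rate * factor, max_rate)   # inject extra structural diversity
    return base_rate
```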
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
A hybrid framework that integrates clustering to produce specialized algorithms for classes of MKP has been presented.
Compared to the previous version, the article has been improved with more explanation, including more clarity on the encoding through matrices E and F. More work on LLMs has also been included.
The conclusion is that, with the number of clusters fixed at 11, K-means is the top performer. A natural extension now is: what if you change the number of clusters? Would it be better or worse? Is it only applicable to this particular MKP dataset? I believe this can be future work.
For future work, it might also be interesting to explore other encodings of the problem and see if there is a better encoding.
Reviewer 3 Report
Comments and Suggestions for Authors
In my opinion, the manuscript has been revised based on my comments and it is suitable for publication.
Thank you.
