1. Introduction
Global optimization plays a central role in scientific discovery, engineering design, and decision-making under uncertainty [1]. It involves finding the best possible solution in a vast, often non-convex search space where multiple local optima may exist and gradient information is typically unavailable or unreliable [2]. Due to their flexibility and generality, metaheuristic algorithms, such as genetic algorithms (GAs), particle swarm optimization (PSO), differential evolution (DE), evolution strategies (ESs), simulated annealing (SA), and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), have become indispensable tools for tackling these challenges [3]. Metaheuristics are particularly well suited to black-box optimization and multi-modal landscapes, and have demonstrated success across a wide range of applications [4,5].
Despite their practical utility, metaheuristics face several limitations. Their performance often depends heavily on parameter settings [6], and they may struggle to maintain a good balance between exploration and exploitation [7,8]. Furthermore, they tend to be sample-inefficient, especially in high-dimensional or computationally expensive scenarios [9,10]. These challenges have prompted growing interest in data-driven and adaptive approaches that can augment or guide heuristic search [11,12].
In parallel, machine learning (ML) has emerged as a powerful paradigm for pattern recognition, function approximation, and decision-making [13,14]. Its ability to learn from data and generalize across instances offers new opportunities to enhance the design, control, and adaptability of metaheuristic algorithms [15]. Recent research has shown that ML techniques can be employed to learn effective operators [16], model objective functions [17], analyze fitness landscapes [18], and even design new optimization strategies altogether [19]. This convergence of learning and search has led to the emergence of a new class of methods, often referred to as “learning-enhanced” or “intelligent” metaheuristics [12,20].
This review aims to provide a comprehensive and structured overview of the integration of machine learning with metaheuristic optimization, with a particular focus on global optimization problems. We begin by presenting foundational concepts from both metaheuristics and machine learning, emphasizing the aspects most relevant to their integration. We then propose a taxonomy that categorizes the various ways in which ML can support or enhance metaheuristic search. Following this, we examine different ML techniques, including supervised learning, unsupervised learning, reinforcement learning, and meta-learning, and discuss how each contributes to the advancement of global optimization strategies.
It is also important to situate ML-enhanced metaheuristics within the broader landscape of AI-based optimization schemes. Recent surveys in industrial informatics emphasize how reinforcement learning, deep neural models, and knowledge-guided AI optimizers are increasingly applied to scheduling, planning, and control tasks in real-world settings [21]. Similarly, applications in power and energy systems demonstrate the impact of AI-driven optimization on large-scale, safety-critical infrastructures [22]. Compared to these application-focused approaches, our taxonomy highlights the methodological role of machine learning within metaheuristics themselves, emphasizing how learning can adapt operators, guide surrogates, control parameters, and enable self-improving optimizers. This complementary perspective reflects the advantages of ML-enhanced metaheuristics for advancing global optimization theory while remaining relevant for practical domains.
Throughout the paper, we highlight representative examples, identify emerging trends, and analyze critical challenges in the field. This review is intended as a narrative survey rather than a systematic review. Works were selected based on their influence and representativeness in the context of machine learning-enhanced global metaheuristics, drawing primarily from leading journals (e.g., IEEE Transactions on Evolutionary Computation, Swarm and Evolutionary Computation) and conferences (e.g., GECCO, CEC), as well as recent preprints. The emphasis is on illustrating integration approaches and emerging trends through representative studies.
By synthesizing developments across these rapidly evolving domains, this review seeks to serve both as a resource for researchers entering the field and as a roadmap for advancing the design of intelligent optimization algorithms.
2. Background
2.1. Metaheuristics for Global Optimization
Global optimization refers to the task of locating the best possible solution in complex search spaces where objective functions may be non-convex, non-differentiable, noisy, or expensive to evaluate [1,2]. In such contexts, traditional exact methods are often impractical, necessitating the use of approximate algorithms that can efficiently explore the space and converge toward high-quality solutions. Metaheuristics are a broad class of optimization algorithms inspired by natural and physical processes, including evolution (e.g., genetic algorithms), social behavior (e.g., particle swarm optimization), and thermodynamic analogies (e.g., simulated annealing) [3,5]. Their success lies in their general-purpose nature, simplicity of implementation, and ability to operate in black-box settings.
Central to all metaheuristics is the interplay between exploration and exploitation. Exploration refers to the global search behavior that prevents premature convergence by probing new regions of the solution space. Exploitation, by contrast, focuses on intensifying the search in promising areas already identified [7]. A well-designed algorithm must balance these forces to avoid stagnation in local optima while making steady progress toward convergence [23].
Convergence denotes the process by which a population or trajectory narrows around optimal (or suboptimal) regions over time. However, rapid convergence can be detrimental if it sacrifices solution quality or diversity too early in the search [6,24].
Diversity maintenance is thus a critical concept, ensuring that candidate solutions remain sufficiently varied to escape local basins and discover new optima. Another major consideration is scalability. While many metaheuristics perform well on low- to moderate-dimensional problems, their efficiency and effectiveness often degrade significantly as the dimensionality of the problem increases, a phenomenon sometimes referred to as the “curse of dimensionality” [9,10].
Despite their practical appeal, the performance of metaheuristics can be highly problem-dependent, requiring significant expertise to fine-tune algorithmic behavior for new domains. This sensitivity stems from their reliance on fixed heuristics and hand-crafted rules, which may fail to generalize across problem instances or adapt to changing landscape characteristics [6]. Moreover, their stochastic nature introduces variability in results, often necessitating multiple runs to obtain statistically meaningful conclusions [25].
Another challenge arises in the design of search operators such as crossover, mutation, and perturbation schemes, which typically encode prior assumptions about the search space. When these assumptions are misaligned with the problem structure, the search may become inefficient or misguided [26]. Similarly, population-based methods can suffer from selection pressure dynamics: excessive pressure may lead to loss of diversity, while insufficient pressure can delay convergence [27]. Striking the right balance is difficult without mechanisms that adapt to the evolving state of the search.
Metaheuristics are also inherently agnostic to problem structure. Unlike gradient-based methods that exploit analytic properties (e.g., Lipschitz continuity or smoothness), metaheuristics do not leverage domain-specific information unless it is manually incorporated. This generality is both a strength and a weakness—enabling black-box optimization, but potentially ignoring exploitable regularities in the problem landscape [28].
To address these issues, recent research has turned to machine learning as a means to imbue metaheuristics with learning capabilities. Instead of relying solely on manually designed rules and fixed control schemes, ML methods allow metaheuristics to learn from data—whether from prior optimization runs, ongoing search behavior, or collections of problem instances. This enables a shift from static to adaptive and even predictive search strategies. For example, regression models or neural networks can learn mappings from candidate solutions to objective values, serving as surrogates for expensive evaluations [29]. Clustering and dimensionality reduction techniques can uncover latent structure in the search space, facilitating more efficient navigation [30]. Reinforcement learning can be used to dynamically select operators or control hyperparameters in response to feedback from the search trajectory [31,32].
2.2. Relevant Machine Learning Foundations
The integration of machine learning with metaheuristic optimization relies on several foundational paradigms, each offering distinct capabilities for learning from data and improving search behavior. This section outlines the conceptual tools that underpin the enhancement of metaheuristics, with emphasis on the roles these tools play in guiding, modeling, and adapting the optimization process.
Supervised learning provides the framework for learning mappings from inputs to outputs based on labeled data. In the context of optimization, this paradigm is often used to predict objective values, model search landscapes, or classify candidate solutions as promising or unpromising based on historical evaluations [11,17]. Regression models such as Gaussian processes, neural networks, and ensemble methods are commonly employed in surrogate modeling [29,33], while classification approaches can be used to construct decision boundaries or feasibility predictors [20]. Supervised learning thus serves as a critical component in fitness approximation, model-based algorithm selection, and landscape-aware control.
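To make this role concrete, the following minimal sketch (with a placeholder objective and scikit-learn's Gaussian process regressor) shows how a surrogate fitted to previously evaluated solutions can rank new candidates by a confidence-bound score, so that only the most promising ones receive a true evaluation; the kernel choice, sample sizes, and objective are illustrative assumptions rather than a prescribed setup.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def expensive_objective(x):
    # Placeholder for a costly black-box function (e.g., a simulation).
    return np.sum(x**2, axis=-1) + 0.1 * np.sin(5 * x).sum(axis=-1)

# A small archive of evaluated solutions (the "labeled data").
X_train = rng.uniform(-5, 5, size=(20, 4))
y_train = expensive_objective(X_train)

# Fit a GP surrogate: predictions plus uncertainty estimates.
surrogate = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
surrogate.fit(X_train, y_train)

# Rank a batch of candidate offspring by a lower confidence bound,
# so only the most promising ones receive a true (expensive) evaluation.
candidates = rng.uniform(-5, 5, size=(200, 4))
mean, std = surrogate.predict(candidates, return_std=True)
lcb = mean - 1.0 * std  # optimistic estimate for minimization
promising = candidates[np.argsort(lcb)[:5]]
true_values = expensive_objective(promising)
print(true_values)
```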
Unsupervised learning focuses on uncovering latent structure in data without the need for labels. In metaheuristics, clustering algorithms are used to group similar solutions for adaptive population management, niching, or archive pruning [31,34], thereby promoting diversity and reducing redundancy. Dimensionality reduction techniques, such as principal component analysis (PCA), autoencoders, and other manifold learning methods, simplify the search landscape by enabling latent space modeling and exploration in reduced representations, which can significantly improve scalability in high-dimensional optimization problems [30,35].
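A minimal sketch of this idea, assuming a toy Rastrigin-like objective and k-means from scikit-learn, is shown below: the population is partitioned in decision space and the best member of each cluster survives, so that distinct basins remain represented.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

def fitness(x):
    # Toy multimodal objective (Rastrigin-like); lower is better.
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10, axis=-1)

population = rng.uniform(-5.12, 5.12, size=(60, 5))
scores = fitness(population)

# Group similar solutions in decision space.
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(population)

# Cluster-aware survivor selection: keep the best individual of each
# cluster so that distinct regions (niches) remain represented.
survivors = []
for c in range(6):
    members = np.where(labels == c)[0]
    survivors.append(members[np.argmin(scores[members])])
print(population[survivors])
```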
Reinforcement learning (RL) formalizes decision-making under uncertainty by learning policies that map states to actions in a sequential manner, driven by reward signals [13]. Within metaheuristics, RL is increasingly applied to control operator selection, parameter tuning, and phase-switching strategies [31,32,36]. Unlike supervised methods, RL does not require pre-collected labels but learns from interaction with the environment, making it particularly suitable for adaptive online control during optimization. Techniques such as Q-learning, policy gradient methods, and deep reinforcement learning (DRL) are employed to manage exploration–exploitation trade-offs dynamically and autonomously [37].
Meta-learning, or “learning to learn”, extends the adaptability of ML by training models to generalize across distributions of tasks. In the context of global optimization, meta-learning enables optimizers to transfer knowledge between problem instances, automatically configure algorithm parameters, or even synthesize new search strategies [20,38,39]. Bi-level optimization and recurrent architectures such as Long Short-Term Memory (LSTM) networks or Transformers are often used to encode meta-learning processes [40]. This paradigm supports the development of self-improving, problem-agnostic optimizers that can adapt to new tasks with minimal data, improving sample efficiency and generalization.
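The bi-level structure described here can be illustrated with a deliberately simple sketch: an outer loop scores a single meta-parameter (the mutation scale of a (1+1)-ES) across a distribution of toy tasks, while the inner loop runs the optimizer itself. All functions, budgets, and the candidate grid are hypothetical choices for exposition, not a specific published method.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_task():
    # Task distribution: sphere functions with random optima (a stand-in
    # for a family of related optimization problems).
    center = rng.uniform(-3, 3, size=5)
    return lambda x: np.sum((x - center) ** 2)

def inner_optimize(task, mutation_scale, budget=200):
    # Inner loop: a simple (1+1)-ES whose behavior depends on the
    # meta-parameter being learned.
    x = rng.uniform(-3, 3, size=5)
    fx = task(x)
    for _ in range(budget):
        y = x + mutation_scale * rng.standard_normal(5)
        fy = task(y)
        if fy < fx:
            x, fx = y, fy
    return fx

# Outer loop: crude meta-search over the mutation scale, scored by
# average final fitness across sampled tasks (lower is better).
best_scale, best_score = None, np.inf
for scale in [0.01, 0.05, 0.1, 0.3, 1.0]:
    score = np.mean([inner_optimize(sample_task(), scale) for _ in range(10)])
    if score < best_score:
        best_scale, best_score = scale, score
print("meta-learned mutation scale:", best_scale)
```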
Beyond these core paradigms, several cross-cutting methodologies play a foundational role in ML-enhanced metaheuristics. Among them, surrogate modeling and representation learning stand out for their versatility and impact. These techniques are not tied to a single learning paradigm but are applied across supervised, unsupervised, and reinforcement learning contexts. Both serve as enablers of data-efficient optimization and structure-aware search, and are integral to many of the most effective hybrid algorithms [17,29].
Integrating surrogate models effectively requires not only accurate predictions but also mechanisms for uncertainty quantification and model management throughout the optimization process [41,42]. Meanwhile, representation learning focuses on constructing compact, structured, or task-adaptive encodings of candidate solutions. In many real-world problems, raw decision variables may be redundant or ill-conditioned; learning more meaningful feature spaces can facilitate better sampling, more robust constraint handling, and improved generalization across instances [30,43].
Finally, optimization problems frequently arise in non-stationary environments, where the underlying objective function, constraints, or data distributions may shift over time. These changes may be gradual, abrupt, cyclical, or even adversarial in nature, and they are common in real-world scenarios such as adaptive control systems, dynamic supply chains, financial modeling, and user-centered recommender systems. In such settings, optimization algorithms must not only identify high-quality solutions but also continually adapt their strategies as the problem landscape evolves. Standard metaheuristics are typically designed under the assumption of stationarity, which limits their ability to respond effectively to concept drift or structural changes in the optimization task. To overcome this, continual learning and online learning methods have been introduced to endow the optimizer with a memory of past experiences and mechanisms for adaptation. Continual learning aims to incrementally acquire knowledge over time while avoiding catastrophic forgetting, often through mechanisms such as elastic weight consolidation, memory replay, or dynamic network expansion [44]. Online learning, in contrast, allows the model to update its parameters immediately based on new data, enabling real-time responsiveness and improved tracking of environmental changes.
Within ML-enhanced metaheuristics, these principles have been operationalized in several ways. For instance, adaptive parameter control mechanisms that use reinforcement learning can adjust mutation rates or population sizes in response to observed performance shifts during the search [36,45]. Similarly, clustering-based controllers or memory-enhanced surrogate models can detect transitions in the landscape and selectively update strategies or retrain approximators [46,47]. By integrating continual and online learning paradigms, metaheuristics become more than static optimizers; they evolve into adaptive agents capable of sustained performance in dynamic, uncertain, and information-rich domains. Together, these machine learning foundations form the theoretical and computational basis for enhancing metaheuristic optimization. They not only increase the adaptivity and intelligence of traditional methods but also open the door to new algorithmic paradigms where learning is tightly integrated with search.
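As a simple illustration of online adaptation, the sketch below maintains a surrogate with incremental (partial-fit) updates on a drifting toy objective and monitors its prequential error as a crude drift signal; the drift threshold and the reaction (left as a placeholder) are illustrative assumptions rather than a recommended design.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(3)
surrogate = SGDRegressor(learning_rate="constant", eta0=0.01)

def objective(x, t):
    # Non-stationary objective: the optimum drifts with time t.
    shift = 0.01 * t
    return np.sum((x - shift) ** 2)

recent_errors = []
initialized = False
for t in range(500):
    x = rng.uniform(-1, 1, size=4)
    y = objective(x, t)
    if initialized:
        # Track the surrogate's error before updating it (prequential test).
        err = abs(surrogate.predict(x.reshape(1, -1))[0] - y)
        recent_errors.append(err)
        recent_errors = recent_errors[-50:]
        # Crude drift response: if errors grow, the optimizer could widen
        # its search, reset the model, or retrain from a fresh window here.
        if len(recent_errors) == 50 and np.mean(recent_errors[-10:]) > 2 * np.mean(recent_errors[:10]):
            recent_errors = []  # placeholder for a drift reaction
    surrogate.partial_fit(x.reshape(1, -1), [y])
    initialized = True

print("recent average surrogate error:", np.mean(recent_errors))
```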
3. Taxonomy of Integration Approaches
The integration of machine learning with metaheuristic optimization has led to a range of hybrid strategies aimed at improving search performance, adaptability, and scalability. These approaches differ based on the role that learning plays within the metaheuristic loop. In this section, we propose a taxonomy based on the functional role of ML in the optimization process.
We note that the categories in this taxonomy are not mutually exclusive, since many hybrid methods combine multiple learning roles (e.g., a surrogate that both approximates the objective and adapts operators). To establish clear boundaries, we assign approaches according to their primary functional role in the optimization loop: if the central purpose is to replace or approximate evaluations, the method is categorized under surrogate modeling; if the focus is on adjusting search dynamics, it belongs to adaptive operators or parameter control; and if the main goal is to generalize across tasks, it falls under meta-learning. Hybrid cases are acknowledged, but classification by dominant role provides a consistent organizing principle. This ensures that each subsection highlights a coherent functional contribution, while still recognizing natural cross-couplings across categories.
3.1. Learning or Adapting Search Operators
In traditional metaheuristics, the behavior of search operators—such as mutation, crossover, or neighborhood perturbation—is typically fixed or controlled by static heuristics [
4]. While effective in many cases, these rigid designs cannot adapt to the changing demands of the search, where different phases or objective function features may favor different operator behaviors. To address this limitation, researchers increasingly apply machine learning techniques that allow metaheuristics to learn or adapt operator behavior dynamically during optimization. Although operator adaptation sometimes uses surrogate predictions or landscape features, we classify these methods here because their primary role is to guide operator choice and behavior.
A prominent direction is the use of reinforcement learning to manage a portfolio of operators. Here, the metaheuristic is modeled as an agent interacting with the fitness landscape, with operators treated as actions chosen based on past success. For instance, Ming et al. [
32] introduced a deep RL-assisted operator selection framework for constrained multi-objective optimization. Using population convergence, diversity, and feasibility as states, and a deep Q-network to estimate operator values, their method dynamically selected operators that improved population quality. Embedded into several constrained evolutionary algorithms, the framework outperformed traditional fixed or random strategies across benchmarks.
Similarly, Durgut et al. [31] proposed an RL-based adaptive operator selection for the Artificial Bee Colony (ABC) algorithm, combining Q-learning with clustering to map problem states to effective operators. Applied to the Set Union Knapsack Problem, this approach significantly outperformed state-of-the-art methods, underscoring the value of adaptive learning in discrete and binary optimization.
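The general mechanism behind such RL-based operator selection can be sketched in a few lines: a tabular Q-learning controller chooses among a small hypothetical operator portfolio, receives the fitness improvement as reward, and distinguishes only two coarse states (improving vs. stagnating). This is a didactic simplification, not the specific formulation used in the works above.

```python
import numpy as np

rng = np.random.default_rng(4)

def sphere(x):
    return float(np.sum(x**2))

# A small portfolio of variation operators (illustrative choices).
def gaussian_mutation(x):
    return x + 0.1 * rng.standard_normal(x.size)

def uniform_reset(x):
    y = x.copy()
    y[rng.integers(x.size)] = rng.uniform(-5, 5)
    return y

def big_jump(x):
    return x + rng.standard_normal(x.size)

operators = [gaussian_mutation, uniform_reset, big_jump]

# Tabular Q-learning: states are "improving" (0) vs. "stagnating" (1).
Q = np.zeros((2, len(operators)))
alpha, gamma, eps = 0.1, 0.9, 0.2

x = rng.uniform(-5, 5, size=10)
fx = sphere(x)
state, stagnation = 0, 0
for it in range(2000):
    a = rng.integers(len(operators)) if rng.random() < eps else int(np.argmax(Q[state]))
    y = operators[a](x)
    fy = sphere(y)
    reward = max(0.0, fx - fy)          # credit = fitness improvement
    if fy < fx:
        x, fx, stagnation = y, fy, 0
    else:
        stagnation += 1
    next_state = 0 if stagnation < 10 else 1
    Q[state, a] += alpha * (reward + gamma * Q[next_state].max() - Q[state, a])
    state = next_state
print("learned operator preferences:", Q)
```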
Beyond operator choice, some methods adapt operator behavior itself. Bolufé-Röhler and Luke [48] enhanced an Estimation of Distribution Algorithm (EDA) with a machine-learned controller for sampling. By extending the Thresheld Convergence mechanism with a trained classifier, the algorithm learned when to tighten or relax sampling thresholds, improving the exploration–exploitation balance. This data-driven strategy consistently outperformed fixed-threshold and conventional EDA implementations on Congress on Evolutionary Computation (CEC) benchmarks.
Although this review focuses on global optimization, similar approaches appear in combinatorial settings. Karimi-Mamaghan et al. [
34] proposed Q-learning for operator selection in permutation flowshop scheduling, while Johnn et al. [
49] developed the Graph Reinforcement Learning for Operator Selection (GRLOS) framework, combining RL and graph neural networks to guide operator choice in Adaptive Large Neighborhood Search for vehicle routing. These works, though targeting discrete problems, illustrate how ML can enhance adaptive decision-making across metaheuristics and offer inspiration for global optimization research.
Adaptive operator strategies bring several benefits. They improve performance by learning which operators are most effective in different stages or problem types, add flexibility by transferring across metaheuristic frameworks, and reduce manual effort by automating operator schedules [50]. However, challenges remain. ML-based adaptation introduces computational overhead, complicates credit assignment in stochastic search, and raises open questions about the generalization of learned strategies across problem instances, particularly in black-box optimization [50].
Theoretical insights into adaptive operator selection are relatively limited, though simplified models of reinforcement learning controllers have been analyzed under stochastic approximation theory, establishing asymptotic stability under certain conditions [
6]. Yet, rigorous bounds on convergence rates, generalization across problem classes, or robustness of deep RL-based operator controllers remain lacking, leaving most advances empirically validated rather than formally guaranteed.
3.2. Surrogate Modeling for Fitness Approximation
One of the most widely studied strategies for enhancing metaheuristic optimization, particularly for computationally expensive global optimization problems, is the use of surrogate models, also known as meta-models or approximate models. Surrogates are trained to approximate the true objective or constraint functions, allowing the optimizer to explore the search space efficiently while reducing the number of expensive exact evaluations. They are especially valuable in domains such as engineering design, multi-objective optimization, and black-box global optimization, where each evaluation may require significant simulation or experimental resources. Although surrogates may indirectly influence parameter or operator adaptation, we categorize them here because their primary role is to approximate fitness evaluations.
In the context of evolutionary computation, surrogate models are typically integrated either as replacements for direct fitness evaluation or as guides to focus exploration. A foundational contribution in this area is the framework by Jin et al. [
29], which introduced evolution control strategies to manage the interplay between surrogate predictions and real evaluations. Their work demonstrated that without appropriate control, approximate models can mislead the optimization process, particularly when approximation errors introduce false optima. The proposed solution combines individual-based and generation-based control mechanisms, ensuring correct convergence while significantly reducing computational effort.
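A minimal sketch of generation-based evolution control in this spirit is given below: every few generations the offspring are evaluated exactly and the surrogate is refit on the archive of true evaluations, while intermediate generations rely on surrogate predictions only. The random-forest surrogate, control period, and toy objective are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)

def true_objective(x):
    return np.sum(x**2, axis=-1)  # placeholder for an expensive simulation

dim, pop_size, control_period = 6, 30, 5
population = rng.uniform(-5, 5, size=(pop_size, dim))
archive_X = population.copy()
archive_y = true_objective(population)
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(archive_X, archive_y)

for gen in range(50):
    offspring = population + 0.3 * rng.standard_normal(population.shape)
    if gen % control_period == 0:
        # Controlled generation: spend real evaluations and refresh the model.
        scores = true_objective(offspring)
        archive_X = np.vstack([archive_X, offspring])
        archive_y = np.concatenate([archive_y, scores])
        surrogate.fit(archive_X, archive_y)
    else:
        # Ordinary generation: rank offspring with the cheap surrogate.
        scores = surrogate.predict(offspring)
    # Truncation selection on parents + offspring (parents ranked by the surrogate).
    parent_scores = surrogate.predict(population)
    merged = np.vstack([population, offspring])
    merged_scores = np.concatenate([parent_scores, scores])
    population = merged[np.argsort(merged_scores)[:pop_size]]

print("best archived value:", archive_y.min())
```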
Lim et al. [
51] extended this idea by generalizing surrogate-assisted evolutionary computation, proposing memetic algorithms that run parallel local searches supported by ensemble surrogates. This approach leverages both the strengths and imperfections of surrogates, turning the “curse of uncertainty” into a “blessing of uncertainty” by smoothing rugged landscapes to aid global exploration.
Recent advancements further illustrate the versatility and performance of surrogate-assisted techniques. Yu et al. [
41] proposed a two-stage dominance-based surrogate-assisted evolutionary algorithm (TSDEA) tailored for high-dimensional, expensive multi-objective optimization problems. Their method combines radial basis function (RBF) models with an angle-penalized distance mechanism to balance convergence and diversity, outperforming several state-of-the-art competitors on large-scale benchmarks.
Another notable contribution is the adaptive switching framework by Chung et al. [
52], which alternates between global and local search phases using a weighted maximin distance metric for exploration and multi-start gradient optimization for exploitation. Tested on both synthetic benchmarks and real-world engineering problems, this approach demonstrated superior robustness and efficiency compared to traditional surrogate-assisted schemes.
Machine learning models have also been integrated into surrogate-based global optimization frameworks. Bertsimas and Margaritis [
53] developed a machine learning-enhanced mixed-integer optimization (MIO) framework that embeds surrogates such as decision trees, gradient boosted trees, neural networks, and support vector machines directly into MIO formulations. With adaptive sampling and robust optimization enhancements, their method improved feasibility and optimality, showing competitive performance against commercial solvers like BARON.
A novel metaheuristic approach, Landscape-Sketch-Step (LSS), was introduced by Monteiro and Sau [
54], combining reinforcement learning and stochastic search without explicitly constructing surrogate models. Instead, it builds a dynamic state-value function to guide exploration, achieving promising performance on rugged low-dimensional landscapes.
While empirical progress is notable, theoretical understanding remains limited. Formal analyses confirm that uncontrolled surrogate error can bias search and mislead convergence, with some guarantees available for Gaussian process models under smoothness assumptions [1,17]. However, these results rarely extend to modern data-driven or ensemble surrogates in high-dimensional, noisy, or multi-objective settings. Developing theoretical tools that balance approximation bias with exploration efficiency remains an open frontier.
Collectively, these studies highlight the growing sophistication of surrogate-assisted metaheuristics, encompassing diverse strategies such as ensemble modeling, adaptive control, local-global hybridization, and direct ML-model embedding. For a broader synthesis of the field, including applications to dynamic, constrained, and multi-modal optimization, we refer readers to the survey by Jin [
17], which outlines key developments, challenges, and future directions in surrogate-assisted evolutionary computation.
3.3. Adaptive Parameter Control
One of the most influential lines of research in integrating machine learning with metaheuristics is the development of adaptive parameter control strategies. These approaches aim to dynamically adjust critical parameters, such as mutation rates, crossover probabilities, or cooling schedules, based on feedback from the search process itself rather than relying on fixed or manually tuned values. This dynamic adjustment improves efficiency, robustness, and adaptability, particularly in high-dimensional or multimodal optimization problems. While adaptive parameter control may interact with operator selection or even surrogate guidance, its primary role is the dynamic adjustment of algorithmic parameters during the search.
Early pioneering work in this area includes Eiben et al. [
45], who proposed using reinforcement learning to guide parameter adjustments online in evolutionary algorithms. They modeled parameter control as a sequential decision-making problem, where an RL agent learns to adjust settings such as mutation rates or population sizes to maximize progress. This work laid foundational concepts for online learning control mechanisms.
Recent advances have significantly expanded these ideas. Tessari and Iacca [
36] presented a general framework for RL-based adaptive metaheuristics, demonstrating how models such as Q-learning can be embedded within metaheuristics to adapt operators and parameters in real time. Similarly, Reijnen et al. [
55] applied deep reinforcement learning to control key parameters of differential evolution in multi-objective optimization. Their DRL-APC-DE framework adaptively set scaling factors and crossover probabilities, outperforming static DE configurations on benchmarks and generalizing well to unseen tasks.
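To convey the flavor of such online control without the machinery of deep RL, the following sketch uses a simple epsilon-greedy bandit that selects among a few candidate (F, CR) settings for differential evolution, rewarding each setting by the fitness improvement it produces per generation; the setting grid and benchmark function are assumptions made for illustration, not the configuration of the cited frameworks.

```python
import numpy as np

rng = np.random.default_rng(6)

def rastrigin(x):
    return float(np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10))

# Candidate (F, CR) settings; a bandit picks among them online.
settings = [(0.4, 0.3), (0.5, 0.9), (0.9, 0.2), (0.9, 0.9)]
value = np.zeros(len(settings))   # running estimate of each setting's payoff
counts = np.zeros(len(settings))
eps = 0.2

dim, pop_size = 10, 40
pop = rng.uniform(-5.12, 5.12, size=(pop_size, dim))
fit = np.array([rastrigin(ind) for ind in pop])

for gen in range(300):
    a = rng.integers(len(settings)) if rng.random() < eps else int(np.argmax(value))
    F, CR = settings[a]
    improvements = 0.0
    for i in range(pop_size):
        r1, r2, r3 = rng.choice(pop_size, size=3, replace=False)
        mutant = pop[r1] + F * (pop[r2] - pop[r3])
        cross = rng.random(dim) < CR
        cross[rng.integers(dim)] = True
        trial = np.where(cross, mutant, pop[i])
        f_trial = rastrigin(trial)
        if f_trial < fit[i]:
            improvements += fit[i] - f_trial
            pop[i], fit[i] = trial, f_trial
    # Reward = total improvement obtained with this setting in this generation.
    counts[a] += 1
    value[a] += (improvements - value[a]) / counts[a]

print("best fitness:", fit.min(), "preferred setting:", settings[int(np.argmax(value))])
```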
In a broader review, Karafotias et al. [
6] highlighted the benefits of adaptive control over static tuning: (1) tailoring parameter values to different search phases (e.g., exploration early, exploitation later), (2) resilience to dynamic or non-stationary fitness landscapes, and (3) reduced manual effort, since adaptive mechanisms self-regulate without extensive pre-tuning. They also emphasized challenges such as adaptation overhead and risks of overfitting to particular instances.
Complementing these adaptive methods are offline tuning approaches such as irace [
56], which uses iterated racing to identify strong static configurations before runtime. While effective, such offline tuning cannot replace online adaptability in dynamic environments.
More recently, Tatsis and Ioannidis [
46] proposed a cluster-based online parameter control method, where ML techniques group similar search states and apply tailored parameter updates, enhancing adaptability across heterogeneous landscapes. This data-centric approach aligns with Bolufé-Röhler and Han [
47], who advocate using ML to extract patterns in search dynamics to guide parameter decisions.
Comprehensive surveys such as Huang et al. [
57] and Talbi [
12] review both offline (parameter tuning) and online (parameter control) strategies, noting that while offline tuning is easier to implement, online adaptive approaches hold greater promise for generalization and robustness across domains.
Parameter control is somewhat more theoretically tractable than other categories because the adaptation rules often reduce to low-dimensional stochastic processes. Reinforcement-based controllers have been analyzed under stochastic approximation theory, showing that certain update schemes converge asymptotically to stable equilibria or optimal parameter values under simplified assumptions [6]. Similar analyses exist for cooling schedules in simulated annealing and mutation-rate adaptation in evolution strategies, where convergence rates can be formally bounded under idealized conditions [1,2]. However, these results typically rely on smoothness or stationarity assumptions that rarely hold in practical, high-dimensional, noisy optimization. For modern controllers based on deep reinforcement learning or cluster-driven adaptation [46], rigorous guarantees are still lacking. This creates a widening gap between empirical success and provable reliability, especially in settings where non-stationary landscapes, limited budgets, and stochastic evaluation noise make theoretical tools much harder to apply. Bridging this gap will likely require new analytical frameworks that combine elements of stochastic control, online learning theory, and non-asymptotic analysis.
Although considerable progress has been made, open challenges remain. These include designing generalizable controllers transferable across problem types, ensuring sample efficiency when learning during optimization, and integrating uncertainty modeling for noisy or partial feedback. Future directions include hybrid systems that combine offline-learned priors with online adaptation, and meta-reinforcement learning for cross-task generalization.
3.4. Offline Algorithm Configuration and Selection
While most learning-enhanced metaheuristics focus on adapting search behavior during a single run, a complementary body of work applies machine learning techniques before the optimization process begins. Specifically, these methods aim to either (1) automatically configure a metaheuristic by finding a static set of parameter values that perform well on average across multiple instances, or (2) automatically select the best-performing algorithm or configuration for each individual problem instance. These approaches emphasize cross-instance generalization by building reusable knowledge to guide future runs. Offline configurators sometimes embed adaptive elements at runtime, but they are grouped here because their core function is to establish static configurations or portfolios prior to the optimization run.
One prominent direction in this area involves
offline configuration techniques. Early approaches such as F-Race [
58] treated parameter tuning as a generate–evaluate process, using statistical racing to quickly eliminate poor configurations. More advanced configurators combine racing with model-based search strategies. For example, ParamILS [
59] performs iterated local search in the parameter space, while SMAC [
60] and BOHB [
61] use surrogate models such as random forests or Bayesian optimization to guide the exploration of promising configurations. The irace package [
56] adopts adaptive sampling and sequential testing to focus evaluations on promising regions of the parameter space. These configurators are now standard tools for producing tuned static configurations of popular metaheuristics such as CMA-ES, DE, PSO, and various hybrids.
When no single configuration or algorithm consistently outperforms others across all instances, researchers turn to
per-instance selection and portfolio strategies. A foundational example is SATzilla [
62], which extracts cheap-to-compute instance features, trains regression models to predict solver performance, and selects the one with the lowest predicted cost. Other systems such as ISAC [
63] and Hydra [
64] apply instance clustering or greedy selection to build complementary solver portfolios. In continuous optimization, exploratory landscape analysis (ELA) features [
65] have been used to guide portfolio selection among CMA-ES, DE, and PSO variants. AutoFolio [
66] combines SMAC-based configuration with SATzilla-style selectors, winning multiple international algorithm-selection competitions.
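The per-instance selection recipe can be summarized in a short sketch: cheap instance features are mapped, via one regression model per solver, to predicted performance, and the solver with the lowest predicted cost is chosen for each new instance. The two synthetic "solvers", the features, and the runtime model below are entirely illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)

# Toy setup: two "solvers" whose (synthetic) runtimes depend differently
# on two cheap instance features (e.g., dimensionality and a ruggedness proxy).
def solver_runtimes(features):
    d, rough = features[:, 0], features[:, 1]
    t_a = d * 1.0 + rough * 5.0 + rng.normal(0, 0.5, size=d.size)   # solver A
    t_b = d * 2.0 + rough * 1.0 + rng.normal(0, 0.5, size=d.size)   # solver B
    return np.column_stack([t_a, t_b])

# Offline phase: collect features and performance on training instances.
train_features = rng.uniform(0, 10, size=(300, 2))
train_runtimes = solver_runtimes(train_features)
models = [
    RandomForestRegressor(n_estimators=100, random_state=k).fit(train_features, train_runtimes[:, k])
    for k in range(2)
]

# Online phase: for a new instance, predict each solver's cost and pick the best.
new_instance = np.array([[8.0, 1.5]])
predicted = [m.predict(new_instance)[0] for m in models]
print("selected solver:", "A" if np.argmin(predicted) == 0 else "B", predicted)
```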
These offline approaches offer several advantages. First, they deliver strong out-of-the-box performance on unseen instances by leveraging patterns learned from prior experience. Second, they yield reproducible, fixed configurations or deterministic selection rules, simplifying deployment in industrial settings. Third, they can complement the online adaptive mechanisms described in previous sections, as offline-selected or configured solvers may also embed online learning modules.
Offline configuration methods (e.g., ParamILS, SMAC) are among the few with statistical guarantees, since model-based search can be framed as Bayesian optimization [
60]. These guarantees often establish convergence to near-optimal configurations under assumptions of smoothness or bounded noise in the configuration space. Nonetheless, such results apply only to the configuration search process, not to the downstream optimizer’s behavior once deployed. Portfolio-based selection frameworks further complicate analysis, as the mapping from instance features to solver performance is typically learned empirically with no generalization bounds. This highlights a theoretical blind spot: while the meta-level configurator may be provably efficient, the end-to-end performance of configured or selected solvers remains largely characterized empirically rather than by formal guarantees.
In summary, data-driven offline configuration provides strong plug-and-play solvers that can later be embedded within the online schemes discussed in Section 3.1, Section 3.2 and Section 3.3. Although training such models is compute-intensive and dependent on informative instance features, they establish reproducible baselines and reduce deployment effort. Recent hybrid frameworks, such as BOHB’s combination of Bayesian surrogate modeling with multi-fidelity resource allocation [61], illustrate how offline and online principles can be blended, reinforcing the view that static and adaptive strategies are complementary rather than competing.
3.5. Learning Landscape Characteristics
Understanding the structural characteristics of optimization landscapes—such as modality, ruggedness, neutrality, or constraint violation—is essential for designing effective global optimization strategies. Recent advances in ML-based landscape analysis provide new ways to characterize these structures and adapt metaheuristic behavior dynamically. Landscape-learning techniques may inform surrogate construction or parameter adaptation, yet we classify them here because their main role is to extract structural problem features that guide optimization.
A foundational pillar of this area is exploratory landscape analysis (ELA), first formalized by Mersmann et al. [67]. Subsequent extensions, such as the information-content features of Muñoz et al. [68] and the automated algorithm-selection framework of Kerschke and Trautmann [65], broadened ELA’s scope and strengthened its utility. ELA extracts quantitative features (e.g., dispersion, skewness, ruggedness, local-optima density) from continuous optimization problems, which can then feed ML models for algorithm selection, performance prediction, or adaptive control. Its key strength lies in producing interpretable insights from relatively small samples.
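For intuition, the following sketch computes three sample-based features in the spirit of ELA, namely fitness skewness, a dispersion ratio of the best samples, and fitness-distance correlation, from a random sample of a toy multimodal function; the sample size and feature definitions are simplified assumptions rather than the exact feature sets of the cited works.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import skew, pearsonr

rng = np.random.default_rng(8)

def objective(x):
    return np.sum(x**2 - 10 * np.cos(2 * np.pi * x) + 10, axis=-1)

# Sample-based landscape features (simplified, ELA-style).
X = rng.uniform(-5.12, 5.12, size=(400, 5))
y = objective(X)

# y-distribution feature: skewness of sampled fitness values.
fitness_skew = skew(y)

# Dispersion: average pairwise distance of the best 10% of samples,
# normalized by the dispersion of the full sample.
best = X[np.argsort(y)[: len(y) // 10]]
dispersion = pdist(best).mean() / pdist(X).mean()

# Fitness-distance correlation with respect to the best sampled point.
x_best = X[np.argmin(y)]
dists = np.linalg.norm(X - x_best, axis=1)
fdc, _ = pearsonr(dists, y)

print({"skewness": fitness_skew, "dispersion": dispersion, "fdc": fdc})
```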
Complementing this, Malan and Engelbrecht [
69] introduced entropy-based measures to quantify ruggedness in continuous landscapes, offering a theoretically grounded assessment of difficulty. Malan’s later survey [
70] provides a comprehensive synthesis of advances in landscape analysis, including new landscape types (multiobjective, constrained, or dynamic) and applications spanning algorithm explanation to automated configuration.
Building on these foundations, newer research has explored deep learning and latent representations for landscape characterization. Seiler et al. [
30] evaluated deep learning-based, feature-free methods, highlighting the potential of convolutional and recurrent architectures to bypass handcrafted feature sets. These approaches promise scalability to high-dimensional spaces but raise challenges of interpretability and computational cost.
For constraint handling, Malan [
71] proposed a landscape-aware switching mechanism for differential evolution, where online landscape features decide when and how to apply constraint-handling techniques. This dynamic approach outperformed each constituent technique on a diverse test set.
An innovative development comes from Karp et al. [
72], who proposed the Landscape-Aware Growing (LAG) strategy for model scaling in deep learning. Unlike traditional loss-preserving growth heuristics, LAG uses early training dynamics after expansion to predict final performance, showing that signals observed shortly after initialization are reliable indicators of long-term outcomes. Although developed for deep neural networks, this insight carries implications for broader optimization, particularly stage-wise adaptive strategies.
Comparatively, classical ELA approaches excel in interpretability and ease of integration but struggle in high-dimensional or noisy settings. Deep learning-based methods offer scalability and adaptability but at the cost of explainability and higher data demands. Latent space optimization, as explored by Tripp et al. [35], enables metaheuristics to search in compressed spaces, smoothing rugged landscapes, though it risks losing global optima during projection. Finally, landscape-aware strategies like LAG shift the focus from static structural metrics to dynamic patterns emerging in early optimization.
From a theoretical perspective, landscape learning sits at the intersection of exploratory analysis and statistical generalization. Classical measures such as entropy-based ruggedness or dispersion indices have well-defined mathematical properties, offering provable insights into search difficulty [69]. However, when features are mapped to algorithm performance, formal guarantees are scarce: most studies establish empirical correlations rather than predictive bounds. In high-dimensional settings, information-theoretic limits suggest that accurate feature estimation requires exponentially more samples, raising questions of scalability [8,9]. For deep learning-based approaches, the situation is even less clear—while convolutional or recurrent models can approximate complex landscapes, no convergence or generalization guarantees currently link their representations to optimizer success. Bridging this gap between interpretable theoretical metrics and black-box learned features remains an open challenge.
Taken together, these works underscore the growing role of ML in extracting and exploiting landscape knowledge to inform global optimization. They highlight a shift from static heuristics toward adaptive, data-driven strategies that tailor search behavior to the problem structure.
3.6. Meta-Learning and Self-Improving Optimizers
The ambition to learn not just parameters of an algorithm but the algorithm itself—often referred to as learning to optimize or meta-learning—has deep roots in machine learning. One early breakthrough was the LSTM-based optimizer by Andrychowicz et al. [
38], which demonstrated how update rules could be parameterized and trained via backpropagation through the optimization process. These works, initially in supervised learning, formalized principles such as bilevel optimization (inner and outer loops), generalization across task distributions, and replacing hand-crafted rules with learned modules [
73]. Meta-learning approaches often embed elements of surrogate modeling or adaptive parameter control, but their central purpose is cross-task generalization, which motivates their classification here.
Although this paradigm originated in gradient-based learning, its influence has extended to global optimization, where gradients are unavailable and optimization relies on population-based metaheuristics like CMA-ES, DE, or ES. Here, meta-learning is used to learn control policies, update rules, or entire algorithmic designs, often yielding improvements in performance and adaptability.
A notable example is the learned evolution strategy (LES) by Lange et al. [
40], which employs a small self-attention network trained offline to learn recombination weights and step-size policies. Trained across a distribution of test functions, LES generalizes to new problems and longer horizons, matching or surpassing canonical ES on MuJoCo tasks.
In parallel, several works have embedded reinforcement learning agents within metaheuristics for online control. Shala et al. [74] used guided policy search to control step-size in CMA-ES, outperforming standard Cumulative Step-Size Adaptation. Yang et al. [75] and Zhao et al. [76] applied Q-learning to adapt mutation and exploration in multi-population DE, improving performance across CEC benchmarks. Tessari and Iacca [36] went further, training a policy to switch among CMA-ES, PSO, and DE, illustrating how RL can serve as a meta-level controller for dynamic landscapes.
Beyond controlling existing algorithms, some methods evolve algorithm structures themselves. Chen et al. [
77] introduced MetaDE, treating the entire DE pipeline (mutation, crossover, population structure) as a search space, evolving improved variants via GPU-accelerated DE. Guo et al. [
43] proposed LCC-CMAES, where a neural controller schedules cooperative-coevolution decompositions, enabling optimization in extremely high-dimensional spaces (up to 10,000 variables).
An earlier form of meta-learning is predicting when to switch from exploration to exploitation in hybrid algorithms. This decision has been modeled as a supervised classification task [
78], using run-time features as inputs. Once trained, the classifier generalizes across functions and dimensionalities, providing an effective low-cost control mechanism.
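A sketch of this classification-based switching trigger is shown below: run-time features such as budget consumption, population diversity, and recent improvement are fed to a logistic regression classifier that decides whether to switch to exploitation. The features and the synthetic labeling rule are placeholders; in practice, labels would come from offline analysis of past runs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(9)

# Run-time features logged during past runs of a hybrid algorithm:
# [fraction of budget used, population diversity, recent improvement rate].
# Labels (1 = "switch to exploitation now") are synthetic here, standing in
# for offline analysis of which switching points gave the best final results.
n = 500
features = np.column_stack([
    rng.uniform(0, 1, n),          # fraction of budget consumed
    rng.uniform(0, 1, n),          # normalized population diversity
    rng.exponential(0.1, n),       # recent relative improvement
])
labels = ((features[:, 0] > 0.5) & (features[:, 1] < 0.4) & (features[:, 2] < 0.05)).astype(int)

clf = LogisticRegression().fit(features, labels)

# During a new run, the trained classifier acts as a cheap switching trigger.
current_state = np.array([[0.6, 0.3, 0.02]])
if clf.predict(current_state)[0] == 1:
    print("switch from exploration to exploitation")
else:
    print("keep exploring")
```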
Recent surveys map this emerging design space. Ma et al. [
39] frame meta-learning in black-box optimization (Meta-BBO) as a unifying paradigm connecting online control, algorithm selection, and hyper-heuristics. Szenasi et al. [
79] catalog ML-enhanced local search hybrids, while Nomura et al. [
80] show that even parameters like the covariance learning rate in CMA-ES can benefit from meta-learned tuning.
From a theoretical standpoint, meta-learning in optimization is far less mature than in supervised learning. Formal analyses exist in bilevel optimization, where gradient-based meta-learners can be studied under convexity and smoothness assumptions [
73], but such results rarely extend to population-based metaheuristics. Sample complexity bounds for meta-learning in reinforcement learning and algorithm configuration settings are beginning to emerge [
81], yet these typically assume large-scale task distributions and do not cover data-scarce regimes that arise in expensive global optimization. Moreover, while universal approximation theorems suggest that recurrent or attention-based optimizers can, in principle, represent any update rule, no convergence guarantees currently link such representations to stable optimization across unseen tasks. Bridging this gap—between what is provable in idealized settings and what is observed empirically in meta-learned optimizers—remains one of the central open questions.
Despite progress, challenges remain: improving sample efficiency (can meta-learned optimizers generalize from limited, expensive problems?), expanding theoretical understanding (which problem classes admit universal optimizers?), and enhancing interpretability. Work reverse-engineering LES suggests learned strategies can resemble concise heuristics, hinting that meta-learning may also aid algorithm discovery.
In summary, meta-learning is reshaping global optimization. Whether through offline-learned update rules, RL-driven controllers, meta-evolved pipelines, or supervised triggers, these approaches automate the design of self-improving metaheuristics—extending handcrafted algorithms with data-driven intelligence.
3.7. Summary Table
Table 1 provides a comparative overview of the six main categories in our taxonomy, highlighting what is learned, when it is applied, typical ML techniques, representative metaheuristics, and the key gains and caveats. To complement this tabular view, Figure 1 provides a schematic taxonomy, showing the six categories in a functional map. Whereas the table emphasizes comparative detail, the figure captures the conceptual structure at a glance, giving the taxonomy a more intuitive “shape” that can be quickly digested.
Although presented as distinct categories, many of these approaches are naturally complementary rather than isolated. For example, surrogate modeling can directly support adaptive parameter control by providing low-cost feedback signals or uncertainty estimates that guide mutation rates and step sizes [
6,
17]. Similarly, landscape learning contributes informative features that improve surrogate accuracy or feed into offline algorithm selection and portfolio methods [
65,
70]. Meta-learning frameworks often embed these ideas as building blocks, reusing surrogates, controllers, or configurators to accelerate cross-task generalization [
39,
81]. Hybrid designs such as learnheuristics [
20] illustrate how operator adaptation, data-driven surrogates, and offline configuration can be unified in practice. More broadly, this interplay reflects a long-standing theme in optimization research: the effectiveness of integrated pipelines that combine approximation, adaptation, and learning components [
11,
12]. We therefore expect future ML-enhanced metaheuristics to move beyond single-role augmentation toward multi-layered systems where surrogates, controllers, and meta-learned priors co-evolve to provide both robustness and efficiency.
4. Machine Learning Techniques Applied
4.1. Supervised Learning
Supervised learning techniques have been extensively applied within metaheuristics to build predictive models that guide search, particularly when dealing with expensive objective functions or the need for selective evaluation. Their most notable impact is in surrogate-assisted optimization, fitness landscape modeling, and solution preselection. One prominent class of applications is surrogate-assisted evolutionary algorithms (SAEAs), where supervised regression models replace or augment exact objective evaluations. For example, Jin et al. [29] demonstrated a framework where radial basis function networks are used to approximate the fitness landscape, reducing the number of exact evaluations by an order of magnitude. More recent implementations rely on Gaussian processes, which offer both predictions and uncertainty estimates, and are widely used in Bayesian optimization pipelines. A concrete example is provided by Guo et al. [33], who applied dropout neural networks to create uncertainty-aware surrogates in high-dimensional multi-objective problems. Their method used Monte Carlo dropout to quantify predictive uncertainty, allowing the algorithm to avoid over-exploitation of potentially misleading predictions. This led to substantial gains in sample efficiency and solution diversity in engineering design benchmarks.
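The core of the Monte Carlo dropout idea can be sketched as follows (using PyTorch and a toy objective): dropout layers are kept active at prediction time, and repeated stochastic forward passes yield a mean prediction together with a spread that serves as an uncertainty estimate. The network size, training budget, and scoring rule are illustrative assumptions, not the configuration of the cited work.

```python
import numpy as np
import torch
import torch.nn as nn

torch.manual_seed(0)
rng = np.random.default_rng(10)

def objective(x):
    return np.sum(x**2, axis=-1, keepdims=True)  # cheap stand-in for an expensive function

X = rng.uniform(-3, 3, size=(200, 8)).astype(np.float32)
y = objective(X).astype(np.float32)

# Small MLP surrogate with dropout layers.
model = nn.Sequential(
    nn.Linear(8, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.1),
    nn.Linear(64, 1),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

Xt, yt = torch.from_numpy(X), torch.from_numpy(y)
for epoch in range(500):
    opt.zero_grad()
    loss = loss_fn(model(Xt), yt)
    loss.backward()
    opt.step()

# Monte Carlo dropout: keep dropout active at prediction time and average
# several stochastic forward passes to estimate mean and uncertainty.
model.train()  # dropout stays on
candidates = torch.from_numpy(rng.uniform(-3, 3, size=(50, 8)).astype(np.float32))
with torch.no_grad():
    samples = torch.stack([model(candidates) for _ in range(30)])
mean, std = samples.mean(dim=0).squeeze(), samples.std(dim=0).squeeze()

# The optimizer can now down-weight candidates whose predictions are uncertain.
score = mean + 1.0 * std  # pessimistic estimate for minimization
print(score[:5])
```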
Supervised models have also been integrated into
online model management strategies. In the work by Yu et al. [
41], two surrogate models (a global and a local one) were adaptively selected based on performance, helping the optimizer switch between exploration and exploitation modes. This dynamic selection enabled better responsiveness to landscape heterogeneity, which is often difficult to address with static heuristics alone.
In another line of work,
classification-based preselection has been used to discard poor-quality or infeasible solutions before costly evaluation. Calvet et al. [
20] embedded supervised classifiers into a hybrid routing optimization framework, where decision trees were trained to identify low-potential moves based on historical performance. This allowed the metaheuristic to focus computational effort on more promising regions without manually encoding decision rules. Supervised learning has also been leveraged to enhance
selection and replacement strategies in population-based algorithms, particularly in scenarios involving many-objective or noisy optimization. Instead of relying solely on raw fitness evaluations or Pareto dominance—which can be unreliable in the presence of noise or computational cost—surrogate models are trained to approximate solution quality and guide the selection process. For instance, Han et al. [
82] proposed a surrogate-assisted evolutionary algorithm for many-objective optimization in the refining process. Their approach employed ensemble learning to predict solution quality and prioritize candidates, thereby reducing the number of expensive evaluations while preserving convergence pressure. By integrating prediction-based preselection, the algorithm effectively navigated complex search spaces with reduced reliance on direct fitness comparisons.
These examples illustrate that supervised models are not merely plug-ins for fitness approximation—they can play active roles in guiding the core evolutionary operators, improving adaptivity, and mitigating stagnation. Their success depends not only on prediction accuracy but also on how uncertainty, model updating, and computational overhead are handled within the optimization loop. For instance, supervised classification has also been employed to regulate exploration pressure in distribution-based metaheuristics, where a learned controller adjusts sampling parameters based on observed optimization dynamics [
48].
4.2. Unsupervised Learning
Unsupervised learning techniques are commonly integrated into metaheuristics to extract structural information from evaluated solutions, allowing the optimizer to adapt its behavior without explicit labels. These methods are particularly valuable for preserving diversity, identifying latent structure, and enhancing scalability in high-dimensional spaces. One of the most impactful uses of unsupervised learning in metaheuristics is the integration of
clustering techniques to enhance population diversity, control mating strategies, or define adaptive niches. These methods extract structural information from the population to steer selection, variation, and environmental pressure more intelligently. Wang and Zhang [
83] proposed a K-means clustering-based offspring generation mechanism for evolutionary multi-objective optimization. By partitioning solutions in the objective space, their method adapted mating pool construction based on local search behaviors, achieving improvements in convergence and diversity. Similarly, Zhang et al. [
84] introduced a fuzzy c-means clustering-based mating restriction strategy that limited crossover to within-cluster individuals, thereby preserving local convergence properties while maintaining global diversity.
Affinity propagation has also been applied for automated niching. Wang et al. [
85] developed a differential evolution variant that uses contour prediction and affinity propagation clustering (APC) to discover and exploit multimodal niches. Their approach outperformed traditional niching methods by dynamically adapting the number and shape of clusters based on evolving solution distributions. Alternative formulations based on immune-inspired metaheuristics have also incorporated clustering as a diversity mechanism. Tsang and Lau [
86] designed a multi-objective immune algorithm where clusters were used to modulate clonal selection pressure and manage competing subpopulations along the Pareto front. Across these frameworks, clustering provides a powerful unsupervised signal for structuring variation, preserving diversity, and enabling region-specific search—all of which are crucial in solving multimodal, many-objective, or noisy optimization problems.
Dimensionality reduction techniques are used to simplify visualization, analysis, or search in complex, high-dimensional landscapes. Methods such as PCA and t-distributed stochastic neighbor embedding (t-SNE) have been incorporated into visual steering tools that help human-in-the-loop optimizers interpret search dynamics. In more autonomous settings, low-dimensional embeddings can also be used to bias variation operators toward meaningful subspaces. For instance, Lim et al. [
87] introduced a PCA-based mutation operator in genetic algorithms applied to IIR filter design. Their method perturbed candidate solutions along principal directions of variance, which increased genetic diversity and improved convergence compared to uniform or non-uniform mutations.
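A compact sketch of a PCA-style mutation in this spirit is given below: the covariance of the current population defines principal directions, and perturbations are sampled with larger steps along high-variance axes; the step size and population are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(11)

def pca_mutation(population, step=0.3):
    """Perturb individuals along the principal directions of the population."""
    centered = population - population.mean(axis=0)
    # Principal axes and variances estimated from the current population.
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 1e-12, None)
    # Sample perturbations with larger steps along high-variance directions,
    # then map them back to the original decision space.
    z = rng.standard_normal(population.shape) * np.sqrt(eigvals)
    return population + step * z @ eigvecs.T

population = rng.uniform(-5, 5, size=(40, 6))
offspring = pca_mutation(population)
print(offspring.shape)
```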
Another powerful application involves
latent space modeling via deep unsupervised learning. Autoencoders and variational autoencoders (VAEs) are increasingly used to encode candidate solutions into compressed representations, enabling exploration and variation in a structured latent space. Tripp et al. [
35] proposed an architecture where candidate solutions are encoded and then perturbed in the latent space before decoding back to the solution space. This allowed the search to respect underlying manifold structure and operate more effectively in high-dimensional domains with correlated or redundant variables.
These unsupervised techniques are particularly beneficial in settings where search landscapes are complex, noisy, or poorly understood. By extracting structure from the evolving population, they provide a foundation for adaptive sampling, diversity control, and dimensionality-aware search, thereby improving both robustness and efficiency.
4.3. Reinforcement Learning
Reinforcement learning provides a powerful framework for adaptive control in metaheuristic optimization. By modeling the metaheuristic as an agent interacting with its search environment, RL allows the algorithm to dynamically select operators or adjust parameters based on feedback from the search process. Early demonstrations, such as Eiben et al.’s Q-learning approach to online parameter control, showed that evolutionary algorithms could benefit from real-time adaptation without relying on fixed schedules [
45]. Recent studies have significantly advanced this idea. For instance, Durgut et al. applied a simple RL-based strategy to learn which variation operator to apply at each iteration, leading to consistent gains over static heuristics [
31]. Similarly, Ming et al. developed a deep Q-network that selects among crossover and mutation strategies based on features like population diversity and convergence, demonstrating superior performance in constrained multi-objective settings [
32].
Beyond operator selection, RL has been used for dynamic parameter tuning. Reijnen et al. used a deep RL agent to adjust differential evolution parameters online, improving performance on problems with shifting landscapes [
55]. Von Eschwege and Engelbrecht integrated a Soft Actor-Critic (SAC) agent into particle swarm optimization, enabling the algorithm to continuously adjust its learning coefficients based on swarm behavior [
88].
RL can also coordinate between different metaheuristics. Tessari and Iacca trained a policy-gradient agent to switch among algorithms like DE, CMA-ES, and PSO, choosing the most appropriate one for each search phase [
36]. Guo et al. applied a similar strategy to switch among DE variants [
89]. These approaches transform the metaheuristic into a self-configuring strategy selector, guided by learning rather than static design. RL has also been employed to control restart timing and reinitialization strategies in hybrid metaheuristics. For example, a deep Q-network was used to implement a smart restart mechanism that dynamically switches between exploration- and exploitation-focused configurations, yielding significant performance gains on CEC benchmark problems without requiring prior knowledge of the problem structure [
37].
Among the strengths of RL in metaheuristics are adaptive control, online learning, and flexibility across diverse problems. RL-based methods can continuously refine their strategies during the run, without requiring offline training. This makes them well-suited for dynamic or non-stationary optimization. Multi-objective settings can also benefit from custom reward designs that balance competing objectives [
32]. However, challenges remain. RL is often sample-inefficient, requiring many evaluations to learn effective policies. State and reward design is non-trivial and problem-dependent. Poorly shaped rewards can mislead learning, and high-dimensional or partially observable states can overwhelm simple RL algorithms. Deep RL approaches mitigate these issues but introduce complexity in tuning and stability. Moreover, RL agents may overfit to specific problem types or learn brittle behaviors if not properly generalized. Nevertheless, reinforcement learning continues to evolve as a key enabler of intelligent, adaptive metaheuristics. Its ability to guide search based on learned experience opens new possibilities for robust and flexible global optimization.
4.4. Representation Learning
Representation learning aims to discover compact, meaningful embeddings of candidate solutions or search spaces that improve the effectiveness of metaheuristic optimization. Instead of operating in raw, high-dimensional decision spaces, metaheuristics can benefit from latent spaces that capture important structure or constraints, making the search smoother and more efficient. Autoencoders and variational autoencoders (VAEs) are commonly used to construct such representations. Tripp et al. showed that searching in the latent space of a VAE trained on candidate solutions can drastically reduce the number of evaluations needed for convergence [
35]. Latent variables encode high-level patterns in the data, allowing metaheuristics to perform variation along semantically meaningful directions. Representation learning has proven particularly useful in constrained optimization. Bentley et al. introduced the COIL framework, which trains a VAE on feasible solutions to learn a feasibility-biased latent space. Metaheuristic search conducted in this space naturally produces valid solutions, alleviating the burden of constraint handling [
90].
In addition to handling constraints, representations can support transfer learning. By training embeddings on multiple related tasks, metaheuristics can generalize across problem instances. Wang et al. demonstrated that a co-surrogate model using a transfer-learned latent space can accelerate multi-objective optimization [
91]. Principal component analysis (PCA) and other linear embeddings have also been used to bias variation operators. Lim et al. proposed a PCA-based mutation operator that perturbs individuals along principal directions, improving exploration [
92]. More complex models such as GANs have been employed to generate realistic candidate solutions in design optimization. The advantages of representation learning include dimensionality reduction, constraint embedding, and knowledge reuse. Representations can smooth rugged search landscapes and steer the search toward feasible or promising regions. They also allow for amortized inference: once trained, the encoder and decoder can be quickly applied during optimization. Yet, representation learning introduces new challenges. Poor training data can bias the latent space away from optimal regions. Training deep models adds computational overhead, and determining the right dimensionality for the latent space is often non-trivial. Interpretability is another concern, especially when using black-box neural embeddings. Additionally, these methods require sufficient training samples, which may be unavailable in sparse-data settings. Overall, representation learning offers a powerful toolkit for structuring the search space in metaheuristic optimization. When properly integrated, it enables algorithms to operate more intelligently and efficiently by leveraging patterns learned from the problem domain.
4.5. Meta-Learning and Learnheuristics
Meta-learning enables optimizers to improve based on past experiences, shifting from problem-specific tuning to generalizable learning across tasks. This approach captures the idea of “learning to learn,” allowing metaheuristics to adapt faster and more effectively to new optimization problems. A foundational concept is optimizer transfer: using performance data from previous runs to inform decisions on new problems. Early work by Calvet et al. introduced learnheuristics, where machine learning is used to predict heuristic behavior in response to changing inputs [
20]. This laid the groundwork for adaptive optimizers that evolve based on accumulated experience. Modern approaches go further by training neural models to perform optimization. Andrychowicz et al. trained LSTMs to act as optimizers via meta-learning, showing that learned update rules can outperform hand-crafted ones on new tasks [
38]. Chen et al. surveyed learning-to-optimize frameworks, highlighting how recurrent and transformer models can generalize across optimization tasks [
73].
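To make the learning-to-optimize idea tangible, the toy sketch below meta-trains a coordinate-wise LSTM in PyTorch to propose parameter updates on randomly sampled quadratic tasks, with the meta-objective being the summed loss along the unrolled trajectory. It is loosely inspired by the LSTM-optimizer setting cited above, not a reproduction of it; the dimensions, unroll length, and output scaling are arbitrary illustrative choices.

```python
import torch
import torch.nn as nn

class LSTMOptimizer(nn.Module):
    """Coordinate-wise learned optimizer: maps a gradient entry to an update."""
    def __init__(self, hidden=20):
        super().__init__()
        self.cell = nn.LSTMCell(1, hidden)
        self.head = nn.Linear(hidden, 1)

    def forward(self, grad, state):
        h, c = self.cell(grad, state)
        return 0.1 * self.head(h), (h, c)       # small output scale stabilizes training

def sample_quadratic(dim=10):
    """Random positive-definite quadratic task f(x) = 0.5 x^T A x - b^T x."""
    m = torch.randn(dim, dim)
    a = m @ m.t() / dim + torch.eye(dim)
    b = torch.randn(dim)
    return lambda x: 0.5 * x @ a @ x - b @ x

dim, hidden, unroll = 10, 20, 20
learned_opt = LSTMOptimizer(hidden)
meta_opt = torch.optim.Adam(learned_opt.parameters(), lr=1e-3)

for episode in range(200):
    f = sample_quadratic(dim)
    x = torch.randn(dim, requires_grad=True)
    state = (torch.zeros(dim, hidden), torch.zeros(dim, hidden))
    meta_loss = torch.zeros(())
    for _ in range(unroll):                     # unrolled inner optimization
        loss = f(x)
        grad, = torch.autograd.grad(loss, x, create_graph=True)
        update, state = learned_opt(grad.unsqueeze(-1), state)
        x = x + update.squeeze(-1)              # apply the learned update
        meta_loss = meta_loss + f(x)            # meta-objective: area under the loss curve
    meta_opt.zero_grad()
    meta_loss.backward()                        # backpropagate through the unroll
    meta_opt.step()
```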
Few-shot learning techniques enable rapid adaptation with minimal data. Meta-learners can identify the closest known problems to a new instance using initial evaluations, allowing fast algorithm selection or configuration. Guo et al. demonstrated deep RL-based dynamic algorithm selection, an approach that can be extended to meta-learning by training controllers on performance data [
89]. Self-configuring optimizers represent another trend. Shala et al. trained a CMA-ES variant with a learned step-size adaptation rule, improving convergence on unseen problems [
74]. Such continual learning approaches allow the optimizer to refine its internal behavior across runs.
LSTMs and Transformers are particularly effective for modeling optimization trajectories. Lange et al. introduced a learned evolution strategy whose self-attention-based update rule, discovered through meta-black-box optimization, generalizes to unseen tasks [
40]. Supervised meta-controllers can also be trained using historical data. These models map problem features to algorithm configurations or operator choices, reducing the need for manual tuning [
93]. Durgut et al. proposed reusing RL policies across problems, demonstrating that knowledge transfer can reduce learning overhead [
94]. The strengths of meta-learning include rapid adaptation, robust generalization, and reduced need for manual configuration. Meta-learned optimizers can identify patterns across tasks, recall effective strategies, and configure themselves with minimal input. They are particularly valuable in dynamic or repetitive optimization scenarios. However, meta-learning also demands extensive training data, increasing upfront cost.
This dependence on abundant training data does not, however, rule out the use of meta-learning in data-scarce settings. Many optimization problems permit only a limited number of evaluations, yet approaches such as few-shot meta-learning leverage priors from related tasks to adapt with only a handful of samples [
81]. Surrogate-assisted transfer builds models from previous tasks and reuses them to guide new searches efficiently, reducing the need for direct evaluations [
95]. Likewise, auxiliary signals such as landscape features or physics-informed constraints can provide structural guidance that compensates for limited data [
96]. These strategies show that meta-learning can remain effective under scarcity, provided inductive biases and auxiliary knowledge are systematically exploited.
Complex models may lack interpretability, and negative transfer can degrade performance on dissimilar tasks. Implementing meta-learning frameworks requires expertise and may involve significant development effort. Despite these challenges, meta-learning holds promise for creating autonomous optimizers that improve with experience. As training data becomes more accessible and architectures more efficient, meta-learning is poised to transform how metaheuristics adapt to complex, evolving problem landscapes.
5. Emerging Trends
The integration of machine learning into continuous metaheuristic optimization has progressed from basic surrogate assistance to a rich and growing ecosystem of data-driven, learning-enhanced strategies. Recent developments highlight six particularly promising and interconnected trends, signaling a fundamental shift in how optimization algorithms are designed, adapted, and deployed.
5.1. Learned Optimizers via Sequence Models
A new class of approaches seeks to learn the optimization algorithm itself, replacing hand-crafted update rules with trained neural architectures. This trend is exemplified by
in-context evolutionary optimization, where large Transformer models ingest sequences of fitness and search data to propose updates in an autoregressive fashion. Notably, the Evolution Transformer is trained via behavioral cloning to mimic the distributional update of canonical evolution strategies and can generalize across tasks and horizons [
97]. In parallel, the MetaBBO framework learns attention-based update rules for evolution strategies through a bi-level meta-optimization loop [
40].
More recently, RIBBO reframes black-box optimization as a sequence modeling task. By augmenting optimization traces with regret tokens and using hindsight relabeling, it enables Transformers to learn robust sampling strategies offline, outperforming their teacher optimizers on BBOB, HPO, and control tasks [
98]. Together, these works suggest that large-scale, attention-based architectures can act as general-purpose optimizers, trained once and reused across problem families.
5.2. Reinforcement Learning for Online Control
Whereas sequence models aim to learn the full optimizer offline, reinforcement learning is increasingly used to adapt critical components during search. For instance, Soft Actor-Critic agents have been used to control PSO velocity parameters on the fly, enabling robust, hyperparameter-free performance [
88]. In a similar vein, online adjustment of the CMA-ES learning rate based on a signal-to-noise heuristic allows for more efficient adaptation without population inflation [
80].
RL can also orchestrate hybrid algorithms. In [
37], a deep Q-network is trained to manage smart restarts between exploration- and exploitation-focused configurations in a CMA-ES–UES hybrid. At a more applied level, RL has been successfully embedded in Crow Search algorithms for real-time energy loss optimization in smart grid systems, improving voltage stability and reducing distribution loss [
99].
These examples show how RL bridges the gap between reactive search and proactive decision-making, especially when paired with well-defined state representations and reward signals.
5.3. Surrogate Modeling: Deep, Explainable, and Cost-Aware
Surrogate models remain a core tool in metaheuristic optimization for reducing the burden of expensive objective evaluations. Recent advances have not only improved their scalability and accuracy but also extended their functionality into domains like explainability, budget-awareness, and transfer learning. Deep learning surrogates, including dropout neural networks, now serve dual roles as high-capacity predictors and uncertainty estimators. These models outperform traditional Gaussian processes in high-dimensional and many-objective problems, where standard kernel-based inference scales poorly [
33]. Their robustness and scalability have enabled broader deployment in real-world applications, such as engineering design and bioinformatics, where evaluating each candidate solution is computationally intensive.
Explainability has emerged as a surprising but powerful complement to surrogate modeling. Instead of relying solely on surrogate predictions to rank or sample solutions, some frameworks now use interpretable features of the surrogate to inform variation. For instance, in the EXO-SAEA framework [
100], SHapley Additive exPlanations (SHAP) values derived from the surrogate model guide crossover and mutation by identifying influential input variables. This approach not only improves convergence but also embeds problem-specific knowledge directly into the search dynamics. By integrating model explanations into the generation of new solutions, the optimization process becomes more strategic and data-aware.
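The feedback loop from explanations to variation can be sketched generically as follows: fit a cheap surrogate on evaluated solutions, compute per-variable SHAP importances with the shap package, and bias mutation toward the most influential variables. This is a simplified illustration of the idea, not the EXO-SAEA algorithm; the random-forest surrogate, the normalization of importances into mutation probabilities, and the mutation strength are all assumptions.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

def shap_guided_mutation(population, fitness, sigma=0.1, rng=None):
    """Bias per-variable mutation rates by the surrogate's SHAP importances."""
    rng = np.random.default_rng() if rng is None else rng

    # 1. Fit a cheap surrogate of the objective on the evaluated population.
    surrogate = RandomForestRegressor(n_estimators=100).fit(population, fitness)

    # 2. Explain the surrogate: mean absolute SHAP value per decision variable.
    explainer = shap.TreeExplainer(surrogate)
    shap_values = explainer.shap_values(population)           # (n_samples, n_dims)
    importance = np.abs(shap_values).mean(axis=0)
    mutation_prob = importance / importance.sum()             # normalize to probabilities

    # 3. Mutate: influential variables are perturbed more often.
    mask = rng.random(population.shape) < mutation_prob       # broadcast over rows
    noise = rng.normal(0.0, sigma, size=population.shape)
    return population + mask * noise

# Usage on a toy problem (sphere function).
rng = np.random.default_rng(1)
pop = rng.random((40, 8))
fit = (pop ** 2).sum(axis=1)
offspring = shap_guided_mutation(pop, fit, rng=rng)
```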
Surrogate modeling is also adapting to the reality of asymmetric evaluation costs. In many multi-objective problems, not all objectives are equally expensive to compute. Recent approaches use co-surrogates to learn the relationships between low-cost and high-cost objectives, allowing the optimization to defer expensive evaluations until they are most informative [
91]. This enables more efficient discovery of the Pareto set and opens the door to cost-aware strategies that dynamically allocate computational resources across objectives.
A recent survey consolidates these trends into a general design blueprint for surrogate-assisted evolutionary algorithms [
42]. The review organizes the space into four core components (sampling, modeling, control, and integration) and highlights open challenges such as handling uncertainty under dynamic conditions and managing multiple models over time. This framework provides both a conceptual roadmap and a practical checklist for future developments. Together, these innovations suggest that surrogate modeling is evolving from a passive evaluation shortcut into a fully integrated layer of intelligence within metaheuristics. Deep models expand the reach, explainability improves decision-making, and budget-awareness brings real-world practicality. As optimization problems grow more complex, surrogate-assisted strategies will likely play an even more central role in adaptive, scalable, and interpretable search.
5.4. Generative Diffusion Models for Offline Optimization
Generative diffusion models have recently emerged as a promising new paradigm for offline black-box optimization. These models learn to sample from high-fitness regions of the solution space by iteratively denoising noise vectors toward the data distribution, offering a powerful generative framework that can operate without explicit surrogate models or gradients.
Li et al. [
101] introduced a reward-directed diffusion model that employs classifier-free guidance to steer samples toward high-reward regions. By incorporating reward signals directly into the denoising process and deriving formal sub-optimality bounds, their approach bridges the gap between probabilistic modeling and optimization. This makes it particularly effective for problems with noisy or limited logged data, where traditional optimization algorithms may struggle due to sparse supervision or costly evaluations.
Complementing this, Krishnamoorthy et al. [
102] proposed the Denoising Diffusion Optimization Model (DDOM), which uses a reweighted loss function during training to emphasize high-reward samples. At inference time, DDOM applies a guidance mechanism that nudges the generative process toward extrapolated regions of the fitness landscape, potentially sampling beyond known optima. Their results show that DDOM consistently outperforms GAN-based and surrogate-assisted approaches on benchmark problems from the Design-Bench suite, demonstrating the method’s sample efficiency and flexibility.
One of the key strengths of diffusion models lies in their ability to capture complex, multimodal distributions over solutions. Unlike GANs, which can suffer from mode collapse, or surrogates, which are limited to pointwise approximation, diffusion models naturally learn a full generative process that preserves diversity and structure in the solution space. This makes them especially well-suited for domains like materials discovery, protein design, or neural architecture search, where optimal solutions are sparse and varied.
Beyond this advantage, diffusion models also possess several properties that align naturally with the constraints of offline black-box optimization. Unlike GANs, whose adversarial training dynamics can be unstable and do not yield normalized likelihoods, diffusion models are trained with score matching and maximum-likelihood surrogates, producing calibrated densities over the design space. This enables principled mechanisms such as (i) reweighting the training distribution toward high-reward regions without requiring a discriminator, and (ii) test-time guidance that strengthens conditioning on target rewards while preserving sample diversity [
102].
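The first of these mechanisms, tilting the training distribution toward high-reward designs, can be sketched in a few lines of PyTorch: logged rewards are turned into sampling weights so that batches drawn for the generative (e.g., denoising) loss over-represent good designs. The exponential weighting and temperature below are illustrative choices, not the exact scheme of the cited work.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

def reward_weighted_loader(designs, rewards, temperature=0.5, batch_size=64):
    """Build a DataLoader that oversamples high-reward logged designs.

    designs : (n, d) tensor of logged candidate solutions.
    rewards : (n,) tensor of their logged objective values (higher is better).
    """
    # Exponential tilting of the empirical distribution toward high rewards.
    weights = torch.exp((rewards - rewards.max()) / temperature)
    sampler = WeightedRandomSampler(weights, num_samples=len(rewards), replacement=True)
    return DataLoader(TensorDataset(designs, rewards), batch_size=batch_size, sampler=sampler)

# Usage: batches drawn from this loader feed the denoising (or any generative) loss,
# so the model concentrates probability mass on the better regions of the logged data.
designs = torch.rand(1000, 16)
rewards = -(designs - 0.7).pow(2).sum(dim=1)      # toy reward peaking near 0.7
loader = reward_weighted_loader(designs, rewards)
batch_designs, batch_rewards = next(iter(loader))
```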
Diffusion-based approaches also admit theoretically grounded analysis in offline settings. Li et al. [
101] derive sub-optimality bounds for reward-directed diffusion, showing how the strength of guidance trades off between reward improvement and faithfulness to the data manifold. Such theoretical scaffolding, which is difficult to obtain for GAN objectives, helps calibrate inference schedules and diagnose failure modes when the logged data provide limited coverage.
Another distinctive feature is the iterative denoising process, which provides fine-grained control at inference. Krishnamoorthy et al. [
102] demonstrate that loss reweighting during training, combined with classifier-free guidance at inference, allows exploration near or even beyond the best observed samples without collapsing to a single mode. This refinement pathway preserves solution diversity and offers a stable mechanism for nudging candidates toward high-fitness regions.
Finally, diffusion models integrate naturally with semi-supervised or preference-based supervision often encountered in offline design tasks. Reward-directed conditional diffusion can exploit large unlabeled corpora alongside limited labeled data or pairwise preferences, while maintaining statistical guarantees via conditional score estimation [
101]. Together, these features—likelihood-based training, theoretical guarantees, controllable sampling, and diversity preservation—make diffusion models especially apt for offline optimization compared to GANs or surrogate-based alternatives.
By shifting the optimization paradigm from explicit modeling to implicit generative reasoning, diffusion models provide a scalable, surrogate-free framework for offline optimization. As these models become more expressive and controllable, they offer new avenues for integrating machine learning with global search—particularly in settings where data is fixed, structured, or expensive to obtain.
5.5. Landscape-Aware and Data-Centric Search
While traditional metaheuristics often rely on heuristic restarts, niching, or random diversification, recent methods are shifting toward explicit modeling of the fitness landscape to guide search more intelligently. Instead of treating the search space as opaque, these algorithms infer structural properties such as modality, basin boundaries, and optima distribution to inform decisions during optimization.
The LADE algorithm exemplifies this trend by actively maintaining a memory of detected peaks and employing peak-distinction heuristics to distinguish between global and local optima [
103]. When the search stagnates or risks premature convergence, LADE guides reinitialization toward underexplored or diverse regions of the landscape. This structured feedback loop improves both convergence and population diversity, making LADE particularly effective in multimodal and deceptive environments where traditional heuristics may falter.
Complementing landscape-aware design are data-centric techniques that adaptively adjust algorithm behavior based on features extracted during the optimization run. These approaches aim to replace static control mechanisms with learned or reactive policies. Simple supervised models have been shown to improve the adaptation of threshold control at run time [
48]. This data-driven strategy led to better performance than fixed schedules, suggesting that even lightweight learning components can yield substantial gains when integrated into the control loop.
Together, these approaches mark a shift from generic, one-size-fits-all heuristics toward instance-aware, feedback-driven optimization. Landscape-aware algorithms gain insight into the global structure of the search space, while data-centric models offer flexible, real-time control based on evolving signals. As metaheuristics increasingly blend these two perspectives, they become more capable of adapting to the nuances of diverse and challenging optimization problems.
5.6. Meta-Black-Box Optimization and Automated Algorithm Design
A unifying trend across many recent advances is the emergence of Meta-Black-Box Optimization (Meta-BBO): the idea of not only solving optimization problems but also optimizing the optimizers themselves. This paradigm elevates the design of algorithms into an outer-loop black-box problem—treating algorithm configuration, selection, and even construction as tasks that can be solved by meta-level search.
In a comprehensive survey, Ma et al. [
39] classify Meta-BBO tasks into three categories: algorithm configuration, where parameters are tuned for specific tasks; algorithm selection, where the best-performing algorithm is chosen from a portfolio; and algorithm generation, where new optimizers are synthesized altogether. The review catalogs a wide spectrum of approaches, ranging from reinforcement learning and supervised regression to neuroevolution and transformer-based policy learning. It also highlights an emerging role for large-scale pretrained models and meta-learning pipelines that can generalize across problem families—enabling the creation of self-adaptive, cross-domain optimizers that improve with experience.
This shift reflects a deeper convergence between optimization, AutoML, and program synthesis. As optimization increasingly adopts learning-based tools, the role of the algorithm designer evolves. Rather than manually crafting heuristics, researchers now act as curators of training data, architects of learning objectives, and designers of policy spaces. This reorientation opens up new forms of abstraction and automation in algorithm development—accelerating innovation while reducing reliance on domain-specific expertise.
Ultimately, Meta-BBO recasts the challenge of algorithm design as a learning problem. By embedding optimization knowledge into models that themselves evolve, this approach promises a future where optimizers are not just engineered, but trained—adapting automatically to the structure, dynamics, and complexity of the tasks they are deployed on.
5.7. Summary and Outlook
Taken together, these trends signal a transition from static, manually tuned heuristics to adaptive, learning-driven optimizers that respond to the search landscape, performance feedback, and computational constraints. Deep models are now viable both as surrogate predictors and as full optimizers. Reinforcement learning is enabling intelligent online adaptation. Surrogate models are being made interpretable and cost-aware, and general-purpose meta-optimization frameworks are emerging to tie everything together.
Figure 2 illustrates this trajectory in historical perspective, highlighting the evolution of ML–metaheuristic integration from surrogate modeling and exploratory landscape analysis in the early 2000s, through adaptive control and offline configuration in the 2010s, to the rise of deep reinforcement learning, learned optimizers, and foundation models in the 2020s. This chronological overview situates the emerging trends within a longer arc of research development, reinforcing both their novelty and their continuity with past advances.
As this ecosystem matures, new challenges arise: maintaining generalization across tasks, balancing transparency with learning capacity, and reducing the computational overhead of complex models. Yet the trajectory is clear—machine learning is no longer an add-on to metaheuristics; it is becoming the foundation of next-generation global optimization.
6. Open Challenges
Despite the rapid progress and promising results in integrating machine learning with metaheuristics for global optimization, several critical challenges remain unresolved. These challenges reflect both the unique difficulties of global optimization (e.g., high dimensionality, sample inefficiency, non-stationarity) and the emerging complexities introduced by learning-based enhancements (e.g., generalization, interpretability, data scarcity). In this section, we highlight five key open challenges that, if addressed, could significantly improve the reliability, scalability, and usability of ML-enhanced metaheuristics in practice.
6.1. Generalization Across Problem Instances
A central challenge in ML-enhanced global optimization is the ability of learned strategies—whether update rules, control policies, or configuration selectors—to generalize beyond their training distribution. Unlike classical metaheuristics, which rely on manually designed and generally applicable rules, ML-based methods often depend on data from previous runs or representative tasks to guide their learning process. As a result, there is a risk that learned optimizers overfit to specific benchmark families, dimensions, or search space geometries.
This issue is particularly acute in meta-learning and learned optimizer approaches. For example, Lange et al. [
40] propose a learned evolution strategy (LES) using a self-attention-based architecture trained via meta-black-box optimization. While LES shows promising generalization to new tasks, dimensions, and population sizes, its success relies heavily on careful meta-training across diverse representative functions. The authors acknowledge that performance can degrade if test problems fall too far outside the meta-training distribution.
Guo et al. [
89] tackle a similar challenge from the angle of dynamic algorithm selection using deep reinforcement learning. Their proposed RL-DAS framework learns to switch among differential evolution variants in real time, based on observed landscape and algorithmic features. Although RL-DAS outperforms any single DE variant and exhibits zero-shot generalization across CEC benchmark functions, the authors note that generalization is still constrained by the feature design and diversity of the training instances.
Bolufé-Röhler et al. [
78] explore this problem in the context of hybrid metaheuristics, modeling the transition point between exploration and exploitation phases as a supervised learning problem. Even with careful feature engineering, the learned control model risks misclassification when applied to unseen functions or new dimensionalities, limiting the robustness of such adaptive hybridizations.
In short, the generalization of ML-enhanced metaheuristics remains a delicate balancing act between expressivity and overfitting. The tension is particularly pronounced in global optimization, where problems often differ widely in scale, modality, and ruggedness. Even when powerful representation learning methods are used—as in the deep landscape characterization framework by Seiler et al. [
30]—generalization can suffer in high-dimensional or noisy settings due to model overfitting or lack of interpretability. Future work is needed to design learning strategies that incorporate robustness, transferability, and domain adaptation mechanisms, potentially by drawing inspiration from meta-reinforcement learning or few-shot learning paradigms.
In the long run, promising avenues include meta-reinforcement learning, domain adaptation, and theoretically grounded distribution-shift frameworks, which could allow learned optimizers to retain performance even when problem distributions evolve. Without such safeguards, the field risks producing optimizers that excel on benchmarks but collapse in deployment.
6.2. Sample Efficiency and Computational Overhead
In global optimization problems where objective evaluations are costly or time-consuming, reducing the number of evaluations without sacrificing solution quality is a primary concern. Machine learning-enhanced metaheuristics have tackled this challenge through a variety of strategies, including surrogate modeling, reinforcement learning, and adaptive parameter control. However, these improvements often come with their own computational overhead, raising new questions about trade-offs between evaluation cost, learning cost, and optimization performance.
Jin [
17] provides one of the foundational reviews on surrogate-assisted evolutionary computation, describing how surrogate models such as Gaussian processes, radial basis functions, and support vector machines can be integrated into metaheuristics to approximate objective functions and reduce the evaluation burden. The paper also discusses the importance of model management strategies in controlling approximation error and in preventing misleading search behavior. Notably, Jin identified persistent challenges related to scalability in high-dimensional problems, efficient sampling strategies, and the computational cost of maintaining accurate surrogate models.
To address some of these limitations, Yu et al. [
41] propose a Two-Stage Dominance-Based Surrogate-Assisted Evolutionary Algorithm (TSDEA) for expensive, high-dimensional multi-objective optimization. Their approach uses radial basis function surrogates to pre-rank candidates and applies an angle-penalized distance metric to balance convergence and diversity. Only the most promising solutions are passed to the exact evaluation phase, significantly reducing the number of function calls. This method exemplifies a careful balance between surrogate precision and computational feasibility in large-scale settings.
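The underlying pre-screening pattern, ranking a large candidate pool with an inexpensive radial basis function surrogate and reserving exact evaluations for the most promising few, can be sketched with SciPy as shown below. This is a generic illustration of surrogate pre-selection, not an implementation of TSDEA or its angle-penalized distance; the kernel choice and the size of the pre-selected set are assumptions.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

def surrogate_prescreen(archive_x, archive_f, candidates, expensive_eval, top_k=5):
    """Rank candidates with an RBF surrogate and exactly evaluate only the best few.

    archive_x, archive_f : previously evaluated points and their objective values.
    candidates           : (m, d) array of newly generated solutions.
    expensive_eval       : the true (costly) objective function, to be minimized.
    """
    surrogate = RBFInterpolator(archive_x, archive_f, kernel="thin_plate_spline")
    predicted = surrogate(candidates)                 # cheap surrogate predictions
    best_idx = np.argsort(predicted)[:top_k]          # minimization: lowest predictions
    chosen = candidates[best_idx]
    true_values = expensive_eval(chosen)              # exact evaluations only here
    return chosen, true_values

# Usage with a toy "expensive" objective (sphere function).
rng = np.random.default_rng(2)
archive_x = rng.random((60, 5))
archive_f = (archive_x ** 2).sum(axis=1)
candidates = rng.random((200, 5))
chosen, values = surrogate_prescreen(archive_x, archive_f, candidates,
                                     lambda x: (x ** 2).sum(axis=1))
```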
Monteiro and Sau [
54] offer an alternative to conventional surrogate modeling with their Landscape-Sketch-Step (LSS) algorithm, which employs reinforcement learning to build a state-value approximation of the optimization landscape. Rather than explicitly modeling the objective function, LSS uses interpolated reward signals from past evaluations to guide stochastic search, yielding a lightweight and scalable search mechanism. This surrogate-free strategy emphasizes sample efficiency and demonstrates robust performance on rugged low-dimensional functions, suggesting that full model construction is not always necessary for informed decision-making.
A different perspective is provided by Reijnen et al. [
55], who apply deep reinforcement learning to adapt key parameters in differential evolution during multi-objective optimization. Their DRL-APC-DE framework learns to adjust scaling factors and crossover probabilities based on optimization state features, thereby improving efficiency without relying on surrogate modeling. While this method enhances performance and adaptability, it also introduces additional computational overhead from policy training and inference, highlighting the ongoing tension between learning sophistication and runtime budget.
A deeper challenge is that learning itself can become the bottleneck. For instance, training a deep RL controller or large surrogate may consume orders of magnitude more compute than the optimization problem it seeks to accelerate. Balancing these competing costs will require principled cost–benefit analyses and the integration of energy-aware metrics (see
Section 7). Future research should explore adaptive resource allocation, where optimizers dynamically decide when to invest in learning versus direct search.
In summary, while ML techniques have introduced powerful tools to reduce evaluation cost in global optimization, they also raise new challenges in terms of computational efficiency and model management. Future work may benefit from hybrid strategies that combine surrogate-assisted methods with lightweight, policy-driven control, and from designing metaheuristics that can dynamically balance learning effort with available computational resources.
6.3. Interpretability and Trust
As machine learning components become more tightly coupled with metaheuristic-based global optimizers, interpretability and user trust are emerging as first-order design constraints. High-capacity learners, such as deep neural networks, can capture rich regularities in black-box landscapes, yet their opaque reasoning makes them difficult to validate, debug, or draw insight from, an issue that becomes acute in safety-critical domains such as engineering design, medical modeling, or scientific discovery.
Seiler et al. [
30] systematically evaluated two feature-free deep architectures, a lightweight convolutional network (ShuffleNet V2) and a point-cloud transformer, for characterizing single-objective continuous fitness landscapes. Across the BBOB test suite, the deep models delivered performance comparable to (and occasionally slightly below) that of handcrafted exploratory landscape analysis features, demonstrating that end-to-end representation learning can match manual descriptors without problem-specific feature design. At the same time, the latent embeddings produced by these networks lack a straightforward semantic interpretation, underscoring an inherent trade-off between representational flexibility and transparency.
To make surrogate decisions more explainable, Li et al. [
100] embed SHAP analysis within a surrogate-assisted evolutionary algorithm. SHAP values quantify each design variable’s contribution to the surrogate’s prediction, and this information is fed back into variation operators so that crossover and mutation focus on the most influential dimensions. The result is a method that not only accelerates convergence but also provides actionable, instance-specific explanations of why certain variables are being emphasized, thereby increasing practitioner trust.
A complementary perspective adopts simple and transparent models, such as decision trees, support vector machines,
k-nearest neighbors, and Naïve Bayes, to detect phase transitions between exploration and exploitation in adaptive hybrid metaheuristics [
78]. Because these classifiers are readily visualized and audited, their control rules can be inspected, debugged, and ported across problem domains with minimal effort.
The classic survey of Jin [
17] cautions that low-fidelity surrogates may introduce spurious optima or bias the search trajectory if their uncertainty is not properly managed. Although written in the pre-deep-learning era, this warning remains pertinent: delegating critical decisions to black-box learners without safeguards can erode reliability and user confidence.
In sum, achieving state-of-the-art optimization performance with machine learning-enhanced metaheuristics must go hand in hand with mechanisms that render their behavior intelligible. Promising directions include hybrid pipelines that couple interpretable, rule-based surrogates or post hoc explanation tools with opaque but powerful deep components, exposing human-auditable decision layers without sacrificing efficiency, as well as modular algorithm designs that surface internal learning signals for selective introspection. Such efforts would not only increase trust but also accelerate the adoption of ML-enhanced metaheuristics in engineering, energy, and scientific domains where safety and accountability are non-negotiable.
6.4. Benchmarking and Reproducibility
Reliable benchmarking and reproducibility remain central concerns in the evaluation of ML-enhanced metaheuristics for global optimization. However, the increasing algorithmic complexity introduced by learning components—such as surrogate models, adaptive policies, and meta-level controllers—compounds long-standing issues related to evaluation standards, comparability, and result validity.
Bartz-Beielstein et al. [
25] provide a comprehensive discussion of benchmarking principles and common pitfalls in optimization research. They emphasize that despite progress in platforms like COCO and BBOB, many studies in evolutionary computation still lack standardized reporting of experimental protocols, fail to distinguish clearly between stochastic variability and structural algorithmic changes, and offer limited information about initialization, termination, or performance measures. The paper argues that benchmarking is often treated as a routine post-processing step rather than an integral methodological component, which can weaken the reliability and generalizability of conclusions.
Kerschke and Trautmann [
65] echo these concerns in the context of automated algorithm selection for continuous black-box optimization. Their study demonstrates that subtle differences in instance selection, feature computation (e.g., through exploratory landscape analysis), and function preprocessing can substantially affect performance comparisons. They note that reproducibility is especially challenging when ML-based selectors are involved, since learned models can be sensitive to training noise and benchmark composition.
López-Ibáñez et al. [
56] address reproducibility from a complementary perspective: the configuration of optimization algorithms themselves. The irace framework applies a racing-based approach to compare parameter configurations on sets of instances under controlled budgets. The authors discuss the importance of training/testing separation, careful seed control, and repeated evaluation to prevent overfitting and ensure fair comparison. While their work focuses on automatic configuration, the broader implication is that rigorous evaluation practices are essential even before benchmarking comparisons are made.
Lastly, Ma et al. [
39] draw attention to emerging issues in meta-black-box optimization (MetaBBO), where learned optimizers are applied across tasks rather than single problem instances. They highlight the difficulty of benchmarking such methods consistently, since performance depends not only on the test functions used but also on the distribution of training tasks and the nature of the meta-level adaptation. The paper points out that few current benchmarks support this level of variability, which limits our ability to evaluate generalization in learned metaheuristics.
Collectively, these works indicate that reproducibility in ML-enhanced global optimization requires more than access to code or data. It demands clear descriptions of algorithmic components and evaluation conditions, consistency across studies, and the use of robust experimental design. These considerations are especially important when comparing hybrid or learned optimizers whose behavior may be data-dependent, history-sensitive, or influenced by random seeds in subtle ways.
Looking forward, reproducibility will require community-level infrastructure, benchmarks that cover both optimization functions and meta-training distributions, standardized reporting formats for learned optimizers, and artifact-sharing protocols that include surrogates, controllers, and logs. Without such open science practices, progress risks being fragmented and difficult to verify.
6.5. Balancing Innovation with Practical Deployment
A final cross-cutting challenge is bridging the gap between innovation and practice. Many ML-enhanced metaheuristics demonstrate state-of-the-art performance in benchmark studies, yet remain difficult to adopt in industrial workflows. Barriers include hyperparameter sensitivity, lack of robust software implementations, and high computational cost. For deployment in engineering design, logistics, or energy systems, optimizers must be not only accurate but also stable, reproducible, and easily integrated with existing toolchains. Addressing this challenge may require standardized application programming interfaces (APIs), modular algorithm libraries, and closer collaboration between academic researchers and domain practitioners. Recent efforts toward open optimization platforms and reproducible benchmarking suites [
25,
104] illustrate promising first steps, but widespread adoption still lags behind adjacent areas such as AutoML [
81]. Without these infrastructural supports, even the most sophisticated learned optimizers risk remaining proof-of-concept contributions rather than practical tools. Closing this gap will require not only technical innovation but also community-level initiatives that promote software sustainability, documentation standards, and integration with domain-specific simulation environments.
6.6. Synthesis
It is important to recognize that the limitations of ML-enhanced metaheuristics are not secondary issues but central obstacles. Computational overhead, high data requirements, risks of overfitting, and interpretability challenges are recurring themes across
Section 6.1,
Section 6.2 and
Section 6.3. A balanced perspective requires acknowledging these drawbacks alongside the advantages, since they currently constrain the reliability, scalability, and wider adoption of learning-enhanced approaches.
These challenges highlight that ML-enhanced metaheuristics are not yet a solved technology, but rather an evolving ecosystem with both opportunities and pitfalls. Generalization and sample efficiency question whether methods can scale; interpretability and reproducibility question whether they can be trusted; and deployment challenges question whether they can be used at all outside academic benchmarks. Tackling these issues collectively will determine whether the field matures into a reliable scientific discipline or remains a patchwork of promising but isolated prototypes.
Taken together, the above challenges outline a broad research agenda for the field. Among them, several stand out as particularly urgent priorities: (1) improving data efficiency and ensuring robust generalization across diverse problem types, (2) reducing computational and implementation overhead to make ML-enhanced metaheuristics deployable at scale, (3) developing stronger theoretical underpinnings—including convergence analysis and generalization guarantees—to guide principled algorithm design, and (4) advancing reproducibility and benchmarking practices by building shared infrastructures and standardized protocols. Progress on these fronts will not only address immediate obstacles but also accelerate the transition from promising prototypes to reliable, widely adopted optimization frameworks.
7. Future Directions
The intersection of machine learning and metaheuristics is not only a theoretical frontier but also a practical one, with demonstrated impact across diverse real-world domains. From engineering design and energy systems to logistics, AI/ML itself, and the natural sciences, ML–metaheuristics are already serving as enabling tools for data-intensive, high-stakes optimization.
Figure 3 summarizes these domains and cross-cutting themes, highlighting both the breadth of impact and the unifying methodological advances that drive them. This breadth underscores the importance of forward-looking research: the next generation of methods must balance theoretical soundness, computational efficiency, and domain adaptability.
As the intersection of machine learning and metaheuristics continues to evolve, the next breakthroughs will likely emerge from creative recombination, architectural innovation, and a rethinking of foundational assumptions. This section outlines several forward-looking directions that build on the trends and techniques reviewed in
Section 3 and
Section 5, offering high-reward avenues for shaping the next generation of optimization systems.
7.1. Neuro-Symbolic Hybridization: Learning with Logic and Structure
While much of the recent progress in ML-enhanced optimization has been driven by deep learning (see
Section 3.6), symbolic approaches offer complementary strengths in structure, interpretability, and inductive bias. The integration of neural models with symbolic reasoning—through mechanisms such as logic programming, symbolic regression, or grammar-guided search—could produce metaheuristics capable of abstract reasoning and explainable decision-making. Embedding algebraic constraints or programmatic priors into learned optimizers may yield search strategies that generalize better and offer more transparent control mechanisms.
7.2. Foundation Models and Multi-Modal Optimization
Recent work has explored learned optimizers using Transformers and sequence models (
Section 5.1), but the broader class of foundation models remains underutilized. These large, pretrained models (e.g., GPT, CLIP) open up possibilities for multi-modal optimization, where textual descriptions, diagrams, or simulation logs guide the search process. Integrating foundation models into metaheuristics could enable zero-shot or few-shot optimization in real-world design tasks, with the model suggesting variation strategies or shaping fitness landscapes based on contextual understanding.
7.3. Co-Evolution in Agent-Based Optimization
Building on ideas from adaptive control and dynamic search policies (see
Section 3.1 and
Section 3.3), future work may adopt agent-based models in which populations of optimizers evolve in parallel with candidate solutions. Inspired by ecological dynamics, this paradigm enables emergent search behaviors through competition, cooperation, or specialization. Such systems could support continual learning, dynamic role assignment, and online strategy discovery—offering an alternative to hand-designed or pretrained policies.
7.4. Green Optimization and Energy-Aware Learning
The computational cost of intelligent optimizers—especially those using deep models or meta-learning (
Section 3.6)—is becoming a practical constraint. Future research will need to consider energy efficiency as a first-class objective. Techniques such as anytime optimization, sparse models, and thermodynamically motivated algorithm design could give rise to green metaheuristics: methods that adapt their complexity to budget constraints. This is particularly relevant for embedded systems, edge devices, or climate-conscious computation.
To make the notion of green optimization operational, it is crucial to adopt measurable definitions of energy cost. Recent work in Green AI emphasizes reporting computational efficiency alongside accuracy, e.g., energy-to-solution and carbon emission equivalents [
105]. For optimization algorithms, several concrete metrics are available: (i) hardware-level monitoring through runtime power profiling of CPU/GPU wattage and memory access patterns [
106], (ii) algorithmic proxies such as floating point operations (FLOPs), training iterations, or communication cost in distributed runs [
107], and (iii) normalized performance indicators, exemplified by frameworks like Cost-Aware Bayesian Optimization (CArBO), which measure convergence in terms of a cost budget—such as energy—rather than iteration counts [
108]. These measures can be integrated into benchmarking protocols, enabling the research community to compare optimizers not only in terms of accuracy and convergence speed, but also in terms of their ecological footprint. Such practices would allow the development of genuinely green metaheuristics, where efficiency is explicitly balanced with solution quality.
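As a simple illustration of cost-aware reporting, the wrapper below logs evaluation counts and wall-clock time alongside the best objective value so that convergence can be reported against an explicit cost budget rather than iteration counts. It is an illustrative sketch rather than a standardized protocol; in practice the time proxy would be replaced or complemented by measured energy, for example from a power meter or an emissions-tracking tool.

```python
import random
import time

class CostAwareObjective:
    """Wraps an objective function and records cost alongside solution quality."""
    def __init__(self, func):
        self.func = func
        self.evals = 0
        self.start = time.perf_counter()
        self.trace = []                        # (evaluations, seconds, best-so-far)
        self.best = float("inf")

    def __call__(self, x):
        value = self.func(x)
        self.evals += 1
        self.best = min(self.best, value)
        self.trace.append((self.evals, time.perf_counter() - self.start, self.best))
        return value

    def best_within_budget(self, seconds):
        """Best objective value reached before the given wall-clock budget elapsed."""
        feasible = [best for _, t, best in self.trace if t <= seconds]
        return feasible[-1] if feasible else None

# Usage: any optimizer simply calls the wrapped objective; here, pure random search.
wrapped = CostAwareObjective(lambda x: sum(v * v for v in x))
for _ in range(1000):
    wrapped([random.uniform(-1, 1) for _ in range(10)])
print(wrapped.best_within_budget(seconds=0.05))
```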
7.5. Open Science and Reproducibility as Infrastructure
Challenges in reproducibility and benchmarking (
Section 6.4) are increasingly limiting the pace of scientific progress in ML-enhanced metaheuristics. Moving forward, the field would benefit from infrastructure similar to OpenML or Hugging Face, but tailored to optimization—hosting optimizers, benchmark suites, training traces, and diagnostic tools. Such resources would enable standardized comparisons, facilitate reuse, and promote the transition from fragmented experimentation to cumulative, inspectable science.
Concretely, reproducibility in optimization can be advanced by adopting protocols already gaining traction in the ML community. Initiatives like MLPerf Logging advocate for machine-readable experiment descriptors that capture seeds, hardware context, and full parameter settings, ensuring exact replication across environments [
109]. Containerization solutions (Docker, Singularity) and workflow managers (Nextflow, Snakemake) have also proven effective for encapsulating dependencies and automating end-to-end pipelines [
110]. In the specific context of optimization, frameworks such as Benchopt provide collaborative, language- and hardware-agnostic benchmarking tools that facilitate experiment sharing, enforce standard protocols, and support reproducible performance tracking [
104,
111]. Proposals for standardized benchmark repositories with version control, inclusion of raw evaluation traces, and mandatory release of trained models are beginning to emerge [
104]. Embedding such practices into an open infrastructure would move beyond aspirational sharing toward a verifiable ecosystem where ML-enhanced metaheuristics can be reliably compared, extended, and reused.
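A minimal machine-readable experiment descriptor of the kind advocated above might look as follows. The field names and values are illustrative rather than prescribed by any particular standard; what matters is that seeds, full parameter settings, hardware and software context, and pointers to raw traces and trained models are captured in one versionable artifact.

```python
import json
import platform
import sys

descriptor = {
    "experiment": "de_rand_1_bin_on_bbob_f15",                # illustrative identifier
    "algorithm": {"name": "differential_evolution",
                  "parameters": {"population_size": 50, "F": 0.5, "CR": 0.9}},
    "benchmark": {"suite": "BBOB", "function": 15, "dimension": 20, "budget": 10000},
    "seeds": {"optimizer": 12345, "instance": 3},
    "environment": {"python": sys.version.split()[0],
                    "platform": platform.platform(),
                    "dependencies": {"numpy": "1.26.4"}},     # pin exact versions in practice
    "artifacts": {"raw_trace": "runs/f15_d20_seed12345.csv",
                  "trained_models": []},
}

with open("experiment_descriptor.json", "w") as fh:
    json.dump(descriptor, fh, indent=2)
```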
7.6. Real-World Applications
While much of the current literature on ML-enhanced metaheuristics has been benchmark-driven, an important future direction is the systematic application of these methods to real-world optimization problems. As the field matures, extending beyond synthetic test suites will be essential to demonstrate robustness, scalability, and impact. Promising examples already exist: Chung et al. [
52] validated surrogate-assisted frameworks on engineering design tasks; Johnn et al. [
49] applied graph-based reinforcement learning to vehicle routing in logistics; energy applications include wind farm layout and microgrid optimization [
112]; and in bioinformatics, hybrid evolutionary–deep learning methods have supported protein structure prediction and gene regulatory network inference [
113,
114]. Expanding this line of work will require closer collaboration with domain experts, integration with physical simulators and experimental pipelines, and new strategies for handling noisy, high-dimensional, and heterogeneous data typical of applied settings. Systematically advancing ML-enhanced metaheuristics from benchmark success to domain impact therefore represents one of the most important opportunities for the field.
8. Conclusions
While this review has focused on how machine learning techniques enhance metaheuristics for global optimization, it is important to acknowledge the complementary direction. Metaheuristic algorithms have also been widely applied to improve machine learning models themselves, including tasks such as hyperparameter tuning, feature selection, and data partitioning. These applications are beyond the scope of the present review but are well-documented in the literature. For surveys on this complementary perspective, see [
115,
116,
117].
This review has explored the rapidly evolving intersection between machine learning and metaheuristic optimization—a convergence that is reshaping how complex, black-box, and resource-constrained problems are approached. Motivated by the limitations of traditional metaheuristics in dynamic, high-dimensional, and data-scarce settings, researchers are increasingly incorporating learning mechanisms at every level of the optimization process.
We proposed a taxonomy that categorizes machine learning contributions according to their functional role within metaheuristics—ranging from operator selection and parameter control to surrogate modeling, landscape learning, and meta-learning. This framework helps clarify how learning enhances adaptability, scalability, and efficiency across diverse optimization scenarios. At the same time, our taxonomy should be viewed in the broader context of AI optimization. Reinforcement learning and domain-specific AI optimizers are already advancing industrial and energy applications. By focusing on the functional role of learning inside metaheuristics, this review highlights methodological innovations that can complement and eventually generalize those application-driven advances.
Throughout the paper, we reviewed a wide spectrum of techniques: surrogate models that guide search with predictive accuracy and interpretability; reinforcement learning agents that control variation and hybridize strategies; representation learning techniques that uncover useful structure in decision spaces; and meta-learned optimizers that transfer knowledge across tasks. Emerging paradigms such as diffusion models, foundation models, and neuro-symbolic hybrids signal an expansion into new modes of optimization that blend reasoning, generation, and adaptation.
Beyond performance improvements, these developments challenge long-standing boundaries: between algorithm and model, between offline configuration and online control, and between static heuristics and self-improving systems. As ML-enhanced metaheuristics become more autonomous and context-aware, they open the door to more general and intelligent optimization frameworks.
Realizing this vision will require greater collaboration across communities, along with a sustained focus on benchmarking, reproducibility, and theoretical foundations. With shared infrastructure and principled abstractions, the field can evolve from a collection of isolated heuristics into a cumulative science of adaptive optimization.
In the long run, the most impactful optimizers may not be those with the best hand-crafted rules—but those that learn how to learn and adapt how to search.