Article

Exploring Tabu Tenure Policies with Machine Learning

by Anna Konovalenko and Lars Magnus Hvattum *
Faculty of Logistics, Molde University College, 6410 Molde, Norway
* Author to whom correspondence should be addressed.
Electronics 2025, 14(13), 2642; https://doi.org/10.3390/electronics14132642
Submission received: 13 May 2025 / Revised: 26 June 2025 / Accepted: 27 June 2025 / Published: 30 June 2025
(This article belongs to the Special Issue Advances in Algorithm Optimization and Computational Intelligence)

Abstract

Tabu search is a well-known local search-based metaheuristic, widely used for tackling complex combinatorial optimization problems. As with other metaheuristics, its performance is sensitive to parameter configurations, requiring careful tuning. Among the critical parameters of tabu search is the tabu tenure. This study aims to identify key search attributes and instance characteristics that can help establish comprehensive guidelines for a robust tabu tenure policy. First, a review of different tabu tenure policies is provided. Next, critical baselines are established to understand the fundamental relationship between tabu tenure settings and solution quality. We verified that generalizable parameter selection rules provide value when implementing metaheuristic frameworks, specifically showing that a more robust tabu tenure policy can be achieved by considering whether a move is improving or non-improving. Finally, we explore the integration of machine learning techniques that exploit both dynamic search attributes and static instance characteristics to obtain effective and robust tabu tenure policies. A statistical analysis confirms that the integration of machine learning yields statistically significant performance gains, achieving a mean improvement of 12.23 (standard deviation 137.25, n = 10,000 observations) when compared to a standard randomized tabu tenure selection (p-value < 0.001). While the integration of machine learning introduces additional computational overhead, it may be justified in scenarios where heuristics are repeatedly applied to structurally similar problem instances, and even small improvements in solution quality can accumulate into large overall gains. Nonetheless, our methods have limitations. The influence of the tabu tenure parameter is difficult to detect in real time during the search process, complicating the reliable identification of when and how tenure adjustments impact search performance. Additionally, the proposed policies exhibit similar performance on the chosen instances, further complicating the evaluation and differentiation of policy effectiveness.

1. Introduction

Tabu search (TS) is a widely recognized metaheuristic used to address complex optimization problems in areas such as scheduling, routing, facility location, and resource allocation [1,2,3,4]. The metaheuristic falls under the category of local search-based algorithms that explore the solution space by iteratively making small changes to a current solution. Unlike traditional local search-based methods, TS can effectively escape local optima through its distinctive memory mechanisms, enabling the search process to be guided using historical information. A common memory element is a tabu list that stores forbidden attributes of moves or solutions. The algorithm refers to this list to avoid revisiting previously explored solutions, enabling efficient exploration of the solution space. A crucial parameter in this process, called tabu tenure, represents the number of iterations that an attribute remains in the tabu list. By determining how long the attribute remains forbidden, the tabu tenure balances between intensifying the search in promising regions and diversifying into unexplored areas.
Numerous variants and extensions of the basic TS algorithm have been proposed in the literature [5]. These extensions enhance different aspects of the search process, particularly the organization of neighborhood structures and strategies to select moves within these neighborhoods. Among these variations, methods for setting the tabu tenure parameter are particularly important, as the search direction of the algorithm relies on this parameter. Varying the tabu tenure during the search may maintain a balance between intensification and diversification of the search, allowing for a thorough examination of one region before moving to a different part of the solution space. The approaches proposed for the implementation of tabu tenure policies range from random selection within a predefined interval [5] to approaches that adapt based on the quality of the solution [6] or the impact of the last move of the search [7]. While these policies can improve search dynamics for some instances, determining general rules for effective tenure policies remains challenging, and no universally optimal tabu tenure policy has been established across different problem types and instances.
This paper provides a study of different tabu tenure policies in TS and proposes methods to identify key elements of a robust tabu tenure policy. The research is motivated by the findings of Glover and Laguna [5], who applied target analysis on TS by first establishing high-performing solutions and then evaluating how varying decision mechanisms influence search trajectories. Their analysis revealed that improving moves have a higher probability of introducing attributes of optimal solutions compared to non-improving moves. Moves made when getting closer to a local optimum tend to make the current solution more similar to optimal solutions, while moves made when moving away from a local optimum do not demonstrate this property. Therefore, we believe that the type of local search move—whether it is classified as improving or non-improving—can be used to formulate a robust tabu tenure policy. A recent study by Sugimura and Parizy [8] further strengthens the argument that tabu tenure policy should depend on specific search characteristics. They propose a new metric called “unique move”, which is based on the number of decision variables changed within a defined number of search iterations. Their analysis reveals characteristic patterns in how this metric responds to changes in tabu tenure and demonstrates that using this metric to set the tabu tenure improves the overall performance of tabu search. Our study investigates how to automatically identify the key factors that may influence optimal tabu tenure settings and proposes methods to optimize tenure policies based on these factors. To achieve this objective, we explore the integration of machine learning (ML) techniques to find an effective tabu tenure policy and identify search attributes and instance characteristics that can contribute to policy effectiveness. To test our approaches, we implement these policies within a TS algorithm for the optimum satisfiability problem (OptSAT) [9]. The problem is an optimization version of the well-known Boolean satisfiability (SAT) problem [10]. OptSAT’s mathematical formulation falls under the category of binary optimization problems [11].
The remainder of the paper is organized as follows. Section 2 presents the theoretical background of the main components of the study. Section 3 reviews related studies on tabu tenure policies in the TS algorithm and parameter tuning with ML. Section 4 discusses the methodology of our proposed ML approaches. Section 5 details our computational experiments and examines the limitations of these methods. Finally, Section 6 concludes the paper with a summary of the findings and their limitations.

2. Background

This section provides a brief summary of OptSAT, TS, and ML.

2.1. Optimum Satisfiability Problem

OptSAT was first presented by Davoine et al. [9], who at that time referred to it as the Boolean optimization problem. The problem extends the classical SAT problem by adding an objective function to be optimized. The standard SAT determines whether there exists an assignment of truth values to a set of Boolean variables that satisfies a given Boolean formula. The SAT problem focuses on finding any satisfying assignment that fulfills a Boolean formula, whereas OptSAT goes further by optimizing a specified objective function while still maintaining the Boolean constraint requirements.
The problem can be expressed as follows. Let $x_j$, $j \in \{1, \ldots, n\}$, represent Boolean variables with possible assignments in $\{true, false\}$. The problem is subject to satisfying a Boolean formula $\phi$:
$$\phi(x_1, \ldots, x_n) = C_1 \wedge \cdots \wedge C_m = true,$$
where each $C_i$, $i \in \{1, \ldots, m\}$, is called a clause. A clause is a disjunction of Boolean variables $x_j$ and their negations $\bar{x}_j$. In particular, for each clause $C_i$, we can write
$$C_i = \Big( \bigvee_{j \in P_i} x_j \Big) \vee \Big( \bigvee_{j \in N_i} \bar{x}_j \Big),$$
where $P_i \subseteq \{1, \ldots, n\}$ denotes the set of indices of variables appearing in their positive form, and $N_i \subseteq \{1, \ldots, n\}$ indicates the set of indices of variables appearing in their negated form.
In addition to satisfying the logical constraints, OptSAT introduces an optimization component. OptSAT seeks to maximize a weighted sum of the variables that are set to $true$:
$$\max z = \sum_{j=1}^{n} h(x_j),$$
where $h : \{x_1, \ldots, x_n\} \rightarrow \mathbb{R}_+$ is a function defined as follows:
$$h(x_j) = \begin{cases} c_j, & \text{if } x_j = true, \\ 0, & \text{if } x_j = false. \end{cases}$$
Here, each Boolean variable $x_j$ corresponds to a non-negative coefficient $c_j \in \mathbb{R}_+$, $j \in \{1, \ldots, n\}$, which adds to the objective function only when the variable receives the value $true$.
With the following modifications, OptSAT can be written as an integer linear programming (ILP) problem. Each clause becomes a constraint by replacing Boolean variables with binary values ($true = 1$, $false = 0$), negated variables $\bar{x}_j$ with $(1 - x_j)$, and disjunction operations with $+$. The objective function is similarly translated by using the binary representation of the variables, with each coefficient $c_j$ serving as the weight in the linear objective. This produces the following model:
$$\max z = \sum_{j=1}^{n} c_j x_j,$$
$$\sum_{j \in P_i} x_j + \sum_{j \in N_i} (1 - x_j) \geq 1, \quad i \in \{1, \ldots, m\},$$
$$x_j \in \{0, 1\}, \quad j \in \{1, \ldots, n\}.$$
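To make the formulation concrete, the following minimal sketch evaluates a candidate binary assignment against a clause set stored as index pairs $(P_i, N_i)$ and computes the objective value. The data layout and function names are our own illustration and are not taken from the benchmark instance format.

```python
# Minimal sketch (not the authors' code): evaluating an OptSAT assignment.
from typing import List, Set, Tuple

Clause = Tuple[Set[int], Set[int]]  # (P_i, N_i): positive and negated variable indices

def clause_satisfied(clause: Clause, x: List[int]) -> bool:
    """A clause holds if any positive literal is 1 or any negated literal is 0."""
    pos, neg = clause
    return any(x[j] == 1 for j in pos) or any(x[j] == 0 for j in neg)

def evaluate(x: List[int], clauses: List[Clause], c: List[float]):
    """Return (number of violated clauses, objective value sum_j c_j * x_j)."""
    violations = sum(0 if clause_satisfied(cl, x) else 1 for cl in clauses)
    objective = sum(cj * xj for cj, xj in zip(c, x))
    return violations, objective

# Example: 3 variables, clauses (x0 or x1) and (not x1 or x2), weights c.
clauses = [({0, 1}, set()), ({2}, {1})]
c = [5.0, 3.0, 2.0]
print(evaluate([1, 0, 1], clauses, c))  # -> (0, 7.0): feasible, z = 7
```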

2.2. Tabu Search

TS [12,13,14] is a local search-based metaheuristic that uses memory structures to escape local optima and explore the search space more effectively. The algorithm selects the best available move from the current solution’s neighborhood, even if it temporarily worsens the objective function value. To prevent cycling back to recently visited solutions, it maintains a tabu list that records attributes changed in recent moves. We apply the tabu search algorithm to address the OptSAT problem. Our implementation is based on a tabu search version with dynamic move evaluation previously developed by Bentsen et al. [15]. The following subsection presents the key components of the algorithm.
In the context of the OptSAT transformed into an ILP, a solution $x$ represents a complete assignment of binary values to all variables in the problem, where $x_j \in \{0, 1\}$ for $j \in \{1, \ldots, n\}$. A move is an operation that transforms one solution into another by changing the value of one variable. The algorithm uses a simple flip-neighborhood, consisting of all solutions that can be reached by changing the value of a single variable. If $x$ is a solution and $\hat{x}$ represents a neighboring solution, the flip-neighborhood $F(x)$ of solution $x$ is
$$F(x) = \Big\{ \hat{x} : \sum_{j=1}^{n} |x_j - \hat{x}_j| = 1 \Big\},$$
which ensures that any neighboring solution $\hat{x}$ can be obtained from $x$ by flipping exactly one binary variable.
When a move is executed, and some variable x j is flipped, that variable becomes tabu for a number of iterations determined by the tabu tenure. The tabu status prevents the variable from being flipped again until its tenure expires. As this mechanism is restrictive, it may prevent the exploration of high-quality solutions. Therefore, an aspiration criterion is implemented: the tabu status of a variable is ignored if flipping it creates a solution better than the best one found so far.
To enable exploration through infeasible intermediate solutions, a dynamic move evaluation function is employed. For a solution $x$, the evaluation of flipping variable $x_j$ is given by
$$\Delta_j(x) = \Delta_j^Z(x) + w \, \Delta_j^V(x),$$
where $\Delta_j^Z(x)$ represents the change in the objective function value, and $\Delta_j^V(x)$ represents the change in the violations of constraints. The parameter $w \in \mathbb{R}_+$ controls the trade-off between improving the objective function and maintaining feasibility.
The algorithm operates iteratively, maintaining a sequence of solutions $\{x^1, x^2, \ldots, x^t, \ldots\}$, where $x^t$ represents the solution at iteration $t$, and $f(x^t)$ denotes its objective function value. The algorithm begins with an initial solution $x^1$, which is generated randomly by assigning each variable $x_j^1$ a value of 0 or 1 with equal probability.
At each iteration $t$, the algorithm selects the best move $j_t^*$ as
$$j_t^* = \arg\max_{j \in \Omega(x^t)} \Delta_j(x^t),$$
where $\Omega(x^t)$ is the set of variables that can be flipped at iteration $t$ (non-tabu variables or those satisfying the aspiration criterion). The algorithm selects moves based on their dynamic move evaluation function values, choosing to flip the variable that produces the highest value. The value of the best move at iteration $t$ is denoted $\Delta_t^* = \Delta_{j_t^*}(x^t)$. The search process continues until a stopping criterion is met, such as reaching the maximum number of iterations of a search episode, with the algorithm returning the best feasible solution encountered, $x^*$, with objective value $f(x^*)$. While this description covers the basic TS elements, the complete algorithm by Bentsen et al. [15] incorporates additional components, including an exponential extrapolation memory and restart strategies, which are detailed in the original research paper.
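The basic iteration described above can be sketched as follows. This is a schematic illustration rather than the implementation of Bentsen et al. [15]: `delta_z` and `delta_v` are assumed helpers returning the change in objective value and in constraint violations from flipping variable $j$, `tenure_policy` is any of the tenure rules discussed later, and the aspiration test is simplified to comparing the resulting objective value against the incumbent.

```python
from typing import Callable, List

def tabu_search_step(
    x: List[int],                  # current binary solution
    f_x: float,                    # objective value of the current solution
    f_best: float,                 # best objective value found so far
    t: int,                        # current iteration
    tabu_until: List[int],         # iteration until which each variable stays tabu
    delta_z: Callable[[List[int], int], float],
    delta_v: Callable[[List[int], int], float],
    w: float,
    tenure_policy: Callable[[int], int],
):
    """One schematic TS iteration: pick the best admissible flip and update the tabu list."""
    best_j, best_delta = None, float("-inf")
    for j in range(len(x)):
        delta = delta_z(x, j) + w * delta_v(x, j)   # dynamic move evaluation Delta_j(x)
        aspiration = f_x + delta_z(x, j) > f_best   # flip would beat the incumbent
        if (tabu_until[j] <= t or aspiration) and delta > best_delta:
            best_j, best_delta = j, delta
    if best_j is None:                              # all moves tabu and none aspire
        return None, best_delta
    x[best_j] = 1 - x[best_j]                       # execute the chosen flip
    tabu_until[best_j] = t + tenure_policy(t)       # forbid re-flipping for the tenure duration
    return best_j, best_delta
```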

2.3. Machine Learning

ML is a subfield of artificial intelligence that provides systems with the ability to “learn” and improve their performance from experience automatically without being specifically programmed [16]. ML algorithms can find complex patterns from existing data and use these patterns to make predictions on unseen data. According to Bishop and Nasrabadi [17], ML algorithms can be categorized into supervised learning (SL), unsupervised learning, and reinforcement learning (RL). In SL, the input values (features) and their associated output values (labels) are provided in advance, with the algorithm working to discover the mapping between them to make predictions for unseen inputs, while unsupervised learning algorithms discover hidden patterns and structures in unlabeled data. RL is based on a system that learns to make good decisions through trial and error. This system operates through an agent–environment interaction, where an agent is the decision-making entity that observes the environment, selects actions, and receives feedback to improve its performance. The environment represents the problem space in which the agent operates. The framework uses Markov decision processes (MDPs) as its mathematical foundation. RL consists of several key components: environmental states representing current settings, possible agent actions, reward mechanisms providing quality feedback, and transition functions determining how states evolve after actions. The agent takes actions, observes the consequences in the form of rewards, and adjusts its strategy to maximize long-term rewards. In the end, the agent develops a policy for selecting actions that lead to the best outcomes in different situations.

3. Literature Review

3.1. Tabu Tenure

Methods for determining the tabu tenure can be classified into three categories: static, dynamic, and reactive. A static tabu tenure is one that does not change during the search process. This method is used in the original version of tabu search [12] where the tabu tenure is chosen once and remains fixed during the search. Despite its ease of implementation, static tabu tenure has notable drawbacks: small values often lead to cycling, recognizable by repeated objective function values, while large values can lead to an insufficient level of intensification in the search and thereby prevent the discovery of high-quality solutions.
In contrast to a static approach, a dynamic tabu tenure implies that the tabu tenure applied in each iteration changes throughout the search. The simplest approach is to select a value from a predefined interval at each iteration. This value can be chosen in several ways: as a random value from a fixed interval determined through experimentation [15,18], as a value from a sequence of predetermined values [5], or as a random value from a range based on instance characteristics, where the interval depends on the number of variables [19]. Due to its ease of implementation while ensuring the dynamic nature of the tenure, selecting a random value from a fixed interval is often implemented in tabu search algorithms. Another type of dynamic tenure is time-dependent, where the tabu tenure value is adjusted according to elapsed time or iteration count, typically decreasing over time to reduce the diversification level [20].
Advanced dynamic approaches consider setting the tabu tenure based on the current search state. Blöchliger [21] classifies advanced dynamic tabu tenures into two main categories. The first category determines tenure as a function of solution quality (solution-based), where the tenure is proportional to either the current solution’s constraint violations [22] or the cost function [6]. The second category sets tenure as a function of the last move’s impact (move-based), measured by the number of affected constraints or contributions to the objective function. For example, in the article by Vasquez and Hao [23], the number of iterations during which the move is classified as “tabu” is dynamically defined based on the number of constraints containing the variable of the move and the number of times the variable of the move has been flipped since the beginning of the search. Some of the literature has focused on combining solution-based and move-based approaches. Lü and Hao [24] proposed a hybrid strategy for scheduling problems where the tabu tenure adapts based on both current solution quality and move frequency. Their method normalizes the move frequency using a coefficient that relates the number of constraints involving the variable of the move to the total number of variables. Løkketangen and Olsson [7] proposed an integrated approach where the tabu tenure depends on three factors: changes in the objective value, changes in the number of constraint violations, and the objective value coefficient for the move’s variable.
The last category of tabu tenure policy is called reactive. A reactive tabu tenure represents a more sophisticated approach to determining the tabu tenure by considering the entire search history, rather than just the most recent move or solution. Introduced by Battiti and Tecchiolli [25], this approach tracks previously visited solutions, allowing the algorithm to verify after each move whether the current solution has already been found. If this is the case, the tabu tenure is increased. Otherwise, the tabu tenure value is decreased when no repetition has occurred for a sufficiently long time. Since storing all previously visited solutions is computationally expensive in terms of memory usage, some alternative schemes have been proposed in the literature. Blöchliger [21] proposed two simplified reactive tabu tenure schemes. In the first scheme, the algorithm monitors changes in the objective function value over a set number of iterations. If the difference between the maximum and minimum values falls below a threshold, the tabu tenure is increased; otherwise, it is decreased. The second scheme uses a reference solution and a distance measurement to detect cycles without storing previous solutions. When the distance between the reference and current solution is zero, the tabu tenure is increased; otherwise, it is gradually decreased. Devarenne et al. [6] proposed a reactive tabu tenure scheme that determines the tenure based on the search history of individual variables. The approach identifies loops when a variable is frequently selected within recent iterations. This method combines elements of both random and reactive tabu tenure schemes. Additionally, Glover and Hao [26] mention a reactive tabu tenure strategy based on oscillation patterns. This approach periodically increases the tenure by increments until reaching a maximum value and then returns it to its normal range, either immediately or gradually. However, no specific implementation rules are provided.
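As an illustration of the reactive idea of Battiti and Tecchiolli [25], the sketch below increases the tenure when a hashed solution signature is revisited and slowly decreases it after a long stretch without repetitions. The update factors, the patience parameter, and the bounds are illustrative choices, not values taken from the cited works.

```python
# Minimal sketch of a reactive tenure update (illustrative constants only).
def reactive_tenure(tenure, x, t, visited, last_repeat, *,
                    increase=1.1, decrease=0.9, patience=100,
                    t_min=1, t_max=50):
    key = tuple(x)                               # hashable signature of the solution
    if key in visited:                           # cycle detected: diversify more
        tenure = min(t_max, int(tenure * increase) + 1)
        last_repeat = t
    elif t - last_repeat > patience:             # long time without repetitions
        tenure = max(t_min, int(tenure * decrease))
    visited.add(key)
    return tenure, last_repeat
```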
To provide a structured overview of the literature discussed above, we summarize the various tabu tenure policies and their components in Table 1. The table is organized into two main sections. The first section (columns 2–4) identifies the fundamental types of tabu tenure categories: static, dynamic, and reactive. The second section (columns 5–9) outlines the specific components involved in defining the tabu tenure policy. These components include “Fixed” (using a constant in the policy function), “Random” (selecting random values from predetermined ranges), “Solution” (considering the solution’s quality either at the current state for dynamic or throughout the entire search for reactive policies), “Move” (incorporating characteristics related to the last move/all moves of the search), and “Time” (modifying the policy based on temporal aspects of the search). Some studies combine multiple components, providing complex functions as tabu tenure policies.
Importantly, no single tenure strategy has been shown to consistently outperform the others across all problem types or instances. The effectiveness of a particular tabu tenure policy is highly dependent on the characteristics of the solution space and the specific instance being solved. Choosing an appropriate tabu tenure strategy requires careful consideration of factors such as how well it adapts to the search landscape, the ease of implementation, and the computational resources it demands. While a static tabu tenure is easy to implement, it lacks adaptability and is highly sensitive to parameter settings. A dynamic tabu tenure offers better flexibility by adjusting to the search process, but it often requires careful parameter tuning and can behave unpredictably if it is not properly calibrated. Reactive tabu tenure strategies provide the highest level of responsiveness to the search history but are more complex to implement and typically demand greater computational resources. Ultimately, the choice and configuration of the tabu tenure play a role in guiding the balance between exploration and exploitation and have a direct impact on the efficiency and quality of the search.

3.2. Machine Learning for Parameter Setting

Any metaheuristic has a set of parameters that can be adjusted before the search process begins. Karimi-Mamaghan et al. [27] differentiated between two main ways of determining parameters: parameter tuning and parameter control. Parameter tuning identifies the appropriate parameter levels before using the algorithm to solve the problem at hand, and the parameters remain unchanged during the search. ML can then, for example, be used to predict which heuristic will perform best on a given instance of an optimization problem [28]. Parameter control, in turn, adjusts parameter levels during the search and can be developed based on the behavior of the search as different parameter settings might be more suitable at different stages of the search process.
Determining the tabu tenure during the search process can be approached through parameter control methods by defining a policy (a function that returns appropriate tabu tenure values). ML techniques can be employed to control the tabu tenure parameter by using feedback information on the performance during the search process. The underlying reason is that the techniques can learn complex relationships between search states and parameter settings from historical performance data. Among ML techniques applied to parameter control during search, RL approaches have been used most frequently [27]. The advantage of this approach is that an RL agent can approximate an optimal parameter policy from interactions with its environment where parameters evolve during the search. For example, in a study by Quevedo et al. [29], RL is used for parameter control in a genetic algorithm to solve a vehicle routing problem. In Benlic et al. [30], the authors propose a new parameter control mechanism in breakout local search for the vertex separator problem. SL techniques can also be applied for parameter control, as shown by Aleti et al. [31]. Although Niroumandrad et al. [32] used classification algorithms to guide move selection during different phases of TS, their approach also provides inspiration for classification-based parameter control.
In general, no single methodology for defining tabu tenures has been shown to be superior across all problem types and instances. Instead, the literature has defined a wide range of tabu tenure policies. In this study, our aim is to investigate whether ML techniques can contribute to establishing good policies for determining tabu tenures. To the best of our knowledge, our study is a novel contribution to the literature in terms of investigating how to set the tabu tenure policy during the search with ML techniques.

4. Methodology

This study addresses the determination of robust tabu tenure policies for TS through ML methods. Our aim is to identify factors that influence tabu tenure policy effectiveness and enhance TS performance on OptSAT problems. We first categorize search and instance characteristics that potentially influence tabu tenure policy performance and then establish critical baselines to understand how different tabu tenure settings affect solution quality. Following this, we introduce two ML methods: (1) an RL framework to formulate a self-adaptive tabu tenure policy, and (2) an SL model to predict an effective tabu tenure policy for the next search period. First, we propose that RL can be used for the task of learning an adaptive tabu tenure policy, as it specializes in sequential decision-making problems where the impact of decisions may not be immediately visible. The self-adaptive nature of this approach allows learning the policy based on feedback from search performance, while the choice of tabu tenure at each step affects not only the immediate search direction but also future search trajectories. Second, we formulate an SL approach to determine the best-performing policy for the next search stage, taking into account the characteristics of the ongoing search process and the instance. A search stage is defined as a sequence of search iterations. The problem is treated as a classification task in which the model learns to select the best-performing policy for the next search stage from a set of promising policies, based on historical performance patterns.

4.1. Input Characteristics for Tabu Policies

Identifying the key factors that influence the effectiveness of the tabu tenure is fundamental to developing adaptive policies. This helps determine which elements significantly impact policy performance and can later be generalized to other optimization problems. In ML, these factors serve as input characteristics (state representation in RL or features in SL). In this research, we propose that ML methods make predictions based on selected characteristics of the instance and the current search progress. We identify and categorize these characteristics into four groups, as shown in Table 2. They capture different aspects of the search process and instance structure that may affect the choice of tabu tenure policy.
Instance-related characteristics capture the structural aspects of the instance, including size, complexity, and constraint properties. The number of variables, $n$, represents the problem size and search space dimensionality. Larger instances typically require longer tabu tenures to help the search escape from local attractors in the search space. The number of clauses, $m$, reflects the constraint density and the tightness of the feasible region. Highly constrained problems may require shorter tabu tenures for flexibility in finding feasible solutions, while less-constrained problems may benefit from longer tenures to avoid cycling. The number of non-zeros, $z$, measures constraint matrix sparsity and indicates variable participation in constraints. Denser matrices create more variable interdependencies, suggesting that tabu decisions may have broader implications and require longer tenure periods for effective exploration. The left-hand side density, $\rho_{lhs}$, quantifies the overall constraint structure complexity as the ratio of non-zero coefficients to the total number of possible coefficients. A higher density indicates complex variable interactions that may require adaptive tenure strategies that can respond to local constraint structures. The average number of literals per clause, $l_c$, captures individual constraint complexity. Clauses with more literals create complex logical relationships that may require longer tabu tenures to prevent entrapment in locally optimal configurations.
Search-related characteristics monitor algorithm behavior and provide real-time search performance feedback. The best value of the dynamic move evaluation function, $\Delta_t^*$, indicates the quality of the best available move, helping to determine whether the current search region offers good opportunities for improvement or requires more aggressive exploration strategies. The number of local optima in recent iterations, $lo_t$, tracks search stagnation. High values may suggest that the search has become trapped, thus requiring longer tabu tenures for diversification, while low values may indicate good progress, warranting shorter tabu tenures to enable more intensification. The frequency of applying the aspiration criterion, $asp_t$, measures how often the aspiration mechanism overrides the tabu restrictions. Frequent use may suggest that lower tabu tenures should be used, while infrequent use indicates that longer tabu tenures could be possible. The number of variables currently tabu, $tab_t$, tracks how the current tabu restrictions impact the accessible search space. High numbers may imply that few alternative moves are available to the search, potentially requiring a decrease in the tabu tenure to improve search flexibility. The search progress is encoded as $t$. Early iterations of a search may benefit from shorter tabu tenures while making quick improvements, whereas later iterations might require longer tabu tenures to escape well-explored regions. The final search-related characteristic is the most recently used tabu tenure, $a_{t-1}$, which allows learning algorithms to understand temporal policy sequences and their cumulative effects. Characteristics such as the time, $t$, and the recent policy, $a_{t-1}$, have different implementations for each of the methods and are described later.
We start our search from a random initial solution using the policy proposed by Bentsen et al. [15], which selects the tabu tenure as a random value from the interval $[T_{MIN}, T_{MAX}]$ for $t_{init}$ iterations. Evaluation of our tabu search performance begins at iteration $t_{init} + 1$ of the search. This approach is designed to decrease the influence of the initial TS solution on the search process and to better evaluate the influence of the tabu tenure policy on the search.
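For illustration, the characteristics of Table 2 can be gathered into a single feature vector along the following lines; the field names are ours and merely mirror the symbols used above.

```python
from dataclasses import dataclass, asdict

@dataclass
class TenureState:
    # static instance characteristics
    n: int             # number of variables
    m: int             # number of clauses
    z: int             # number of non-zeros in the constraint matrix
    rho_lhs: float     # left-hand side density, z / (n * m)
    l_c: float         # average number of literals per clause
    # dynamic search characteristics
    delta_best: float  # best value of the dynamic move evaluation function
    lo_t: int          # local optima encountered in recent iterations
    asp_t: int         # recent applications of the aspiration criterion
    tab_t: int         # number of variables currently tabu
    t: int             # search progress (iteration or stage index)
    a_prev: int        # previously used tabu tenure (or policy)

def to_vector(state: TenureState):
    """Flatten the state into the input vector consumed by the ML models."""
    return list(asdict(state).values())
```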

4.2. Baseline Analysis of Tabu Tenure Policies

Before developing our ML approaches, we establish critical baselines to understand the fundamental relationship between tabu tenure settings and solution quality. These baselines serve two essential purposes:
1. First, we analyze how different tabu tenure ranges affect solution quality to establish baseline performance and determine promising ranges for further investigation. This analysis identifies the sensitivity of the OptSAT problem to tabu tenure parameters. The baseline follows the dynamic tabu tenure policy proposed by Bentsen et al. [15], which draws the tabu tenure randomly from an interval defined by two fixed parameters, $[T_{MIN}, T_{MAX}]$, with $T_{MIN} = 7$ and $T_{MAX} = 22$. However, this range was tuned to solve a wide variety of binary integer problems, of which OptSAT is only one example. Therefore, the parameters $T_{MIN}$ and $T_{MAX}$ are varied to check the influence of different interval ranges on performance. Without understanding this baseline performance, we would lack the context needed to evaluate whether our ML approaches offer meaningful improvements. We have also presented the best solution obtained by a commercial MIP solver to understand the potential room for improvement.
2. Second, we investigate how specific search characteristics can be leveraged to formulate better tabu tenure policies. We focus particularly on incorporating the type of move (improving vs. non-improving) into the policy formulation, motivated by the findings of the analysis made by Glover and Laguna [5]. Based on the importance of move types, we formulate a novel move-type-based policy. It also implements a dynamic tabu tenure, which is drawn randomly from an interval defined through two fixed parameters, $[T_{MIN}, T_{MAX}]$, but with different ranges depending on the move type. To find the policy, we assign different tenure ranges based on whether a move is improving or non-improving and determine which configuration gives better performance. The policy can serve as a baseline that can be compared against ML-derived policies and is potentially included among the promising policies for the SL method.

4.3. Self-Adaptive Policy with Reinforcement Learning

We develop a self-adaptive policy that automatically selects an appropriate tabu tenure value based on current search characteristics. To apply RL to this problem, we use its standard formalization as a MDP, which provides the mathematical framework for modeling sequential decision-making. The key components of our MDP formulation are the following:
  • Decision point: For this method, we propose predicting a tabu tenure value at each search iteration, where each iteration corresponds to a timestep $t \in \{1, 2, \ldots, t_{episode}\}$, with $t_{episode} + t_{init}$ being the total number of iterations.
  • State: A state $s_t \in S$ at iteration $t$ consists of the input characteristics defined in Table 2:
    $$s_t = (n, m, z, \rho_{lhs}, l_c, \Delta_t^*, lo_t, asp_t, tab_t, t, a_{t-1}),$$
    where $t$ denotes the current iteration, and $a_{t-1}$ represents the previously used tabu tenure.
  • Action: The action $a_t \in A$ is to select a tabu tenure value for the current move as a discrete value from the set $\{T_{MIN}, T_{MIN}+1, \ldots, T_{MAX}\}$.
  • Reward: The reward $r_t$ is given only at the end of the episode and reflects the quality of the solution found during the search:
    $$r_t = \begin{cases} 0 & \text{if } t < t_{episode}, \\ f(x^*)/f_{ub} & \text{if } t = t_{episode}, \end{cases}$$
    where $f(x^*)$ is the best solution value found during the episode, and $f_{ub}$ is the best known upper bound for the current problem instance.
  • Transition function: The state transition preserves the static instance parameters $(n, m, z, \rho_{lhs}, l_c)$ while updating the search parameters $(\Delta_t^*, lo_t, asp_t, tab_t)$. The previous action $a_{t-1}$ transitions to $a_t$, and the time step increases by one. Thus, the next state is as follows:
    $$s_{t+1} = (n, m, z, \rho_{lhs}, l_c, \Delta_{t+1}^*, lo_{t+1}, asp_{t+1}, tab_{t+1}, t+1, a_t).$$
To learn an effective policy for dynamically adjusting the tabu tenure, we employ a deep RL algorithm that uses neural networks to approximate the policy function [33]. The learning process involves training the agent on a set of OptSAT instances, allowing it to generalize its tabu tenure policy across different instances and search characteristics. We formulate the policy as a mapping function $\pi : S \rightarrow A$, where $S$ represents the state space containing all possible search states $s_t$, and $A$ is the action space consisting of all possible $a_t$ values. The policy selects the most appropriate tabu tenure value $a_t$ given the current state of the search $s_t$ and is represented by a neural network that takes the current state of the search as input and outputs a probability distribution over possible tenure values. The agent gradually refines its policy through iterative updates based on the rewards received, ultimately learning to adapt the tabu tenure dynamically in response to the changing search landscape.
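A schematic sketch of the resulting MDP interface is given below. It assumes a hypothetical `TabuSearch` object exposing a `step(tenure)` method, a `state_vector()` accessor for the Table 2 characteristics, and the best objective value found so far; in practice such a class would be wrapped as a Gymnasium environment so that it can be trained with Stable Baselines 3 (Section 4.5).

```python
class TenureEnv:
    """Schematic tenure-selection MDP (our illustration, not the authors' code)."""

    def __init__(self, tabu_search, t_min=1, t_max=15, t_episode=500, f_ub=1.0):
        self.ts = tabu_search
        self.t_min, self.t_max = t_min, t_max
        self.t_episode, self.f_ub = t_episode, f_ub

    def reset(self):
        self.t = 0
        self.ts.restart_from_random_solution()     # assumed helper on the TS object
        return self.ts.state_vector()              # (n, m, z, rho_lhs, l_c, ...)

    def step(self, action: int):
        tenure = self.t_min + action               # discrete action -> tenure value
        self.ts.step(tenure)                       # one TS iteration with this tenure
        self.t += 1
        done = self.t >= self.t_episode
        reward = self.ts.best_value() / self.f_ub if done else 0.0  # sparse reward
        return self.ts.state_vector(), reward, done
```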

4.4. Policy Selection with Supervised Learning

SL offers an alternative approach to finding robust tabu tenure policies. Rather than directly predicting tabu tenure values, this method predicts the most effective policy type for each search stage based on historical performance data. Figure 1 illustrates how historical observations are collected to enable policy selection. First, we extract search characteristics (features) at the beginning of a search stage. During this stage, various policies are evaluated, and the best-performing policy serves as the label for the features of that stage. Each combination of (features, label) represents an observation of the stage. The search continues for $n_{stages}$ stages, with a new observation being added to the dataset after each stage. However, when all policies achieve identical performance during a search stage, no policy can be identified as the best performer, and thus the stage does not yield an observation.
We end up with a multiclass classification model trained on the collected observations. For each new search stage, we extract the current state characteristics and input them to the trained classifier, which predicts the policy expected to perform best in the upcoming stage. The classifier outputs a probability distribution over all candidate policies, and we select the one with the highest probability. The model is trained offline using observations from various problem instances. Afterwards, the classifier makes policy decisions based on current search characteristics.
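The online use of the classifier can be sketched as follows, assuming a fitted multiclass classifier, the feature scaler fitted during training, and a feature vector extracted at the start of the upcoming stage; all names below are ours.

```python
# Hedged sketch of stage-level policy selection with a trained classifier.
POLICIES = ["random", "objective_coefficient", "frequency", "move_type"]

def select_policy_for_stage(stage_features, sl_model, scaler):
    x = scaler.transform([stage_features])    # apply the same scaling as in training
    probs = sl_model.predict_proba(x)[0]      # probability for each candidate policy
    return POLICIES[int(probs.argmax())]      # choose the most likely best performer
```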
Based on our literature review of tabu tenure policies in Section 3, together with the novel policy from Section 4.2, we formulate the policies that we believe are the most relevant to our problem. While in the literature review we classified policies into broader categories to distinguish different paradigms, here we select more specific policies tailored to our context. To formulate each policy, we focus on a single search attribute to better measure its influence. We limit the number of policies to avoid complicating the classification with too many classes. The following set of promising policies serves as labels for our classification model:
  • Random policy: The tabu tenure value for each iteration is drawn at random from a uniform distribution over $\{T_{MIN}, T_{MIN}+1, \ldots, T_{MAX}\}$.
  • Objective coefficient-based policy: The tabu tenure value is dynamically determined using the objective coefficient value $c_j$, $j \in \{1, \ldots, n\}$, of the variable that becomes tabu. The policy scales the coefficient value between the minimum and maximum of all objective coefficients in the instance and then maps it onto the range between $T_{MIN}$ and $T_{MAX}$. If the calculated value is less than the average of $T_{MIN}$ and $T_{MAX}$, the policy picks a random integer value between $T_{MIN}$ and $(T_{MIN} + T_{MAX})/2$. Otherwise, it picks a random integer value between $(T_{MIN} + T_{MAX} + 1)/2$ and $T_{MAX}$.
  • Frequency-based policy: The tabu tenure value is determined by considering how frequently each variable has been designated as tabu in the past. If a variable’s normalized tabu frequency (normalized by the maximum frequency observed across all variables) exceeds $freq_{thr}$, the policy assigns a larger tabu tenure by randomly selecting an integer value between $(T_{MIN} + T_{MAX} + 1)/2$ and $T_{MAX}$. Otherwise, it assigns a smaller tenure by randomly selecting an integer value between $T_{MIN}$ and $(T_{MIN} + T_{MAX})/2$.
  • Move-type-based policy: The tabu tenure value is based on the type of the move in the current iteration (see Section 4.2). The specific rules of this policy are established empirically through a series of experiments presented in Section 5.2.
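The four candidate policies listed above can be sketched as follows. The split points follow our reading of the descriptions (with $T_{MIN} = 1$ and $T_{MAX} = 15$ this yields the short range $[1, 8]$ and the long range $[9, 15]$ used in Section 5.2); the function names are ours.

```python
import random

T_MIN, T_MAX = 1, 15
LOW_MAX = (T_MIN + T_MAX) // 2        # upper end of the "short tenure" range (8)
HIGH_MIN = LOW_MAX + 1                # lower end of the "long tenure" range (9)

def random_policy():
    return random.randint(T_MIN, T_MAX)

def objective_coefficient_policy(c_j, c_min, c_max):
    scaled = T_MIN + (c_j - c_min) / (c_max - c_min) * (T_MAX - T_MIN)
    if scaled < (T_MIN + T_MAX) / 2:
        return random.randint(T_MIN, LOW_MAX)
    return random.randint(HIGH_MIN, T_MAX)

def frequency_policy(freq_j, freq_max, freq_thr=0.5):
    if freq_max > 0 and freq_j / freq_max > freq_thr:
        return random.randint(HIGH_MIN, T_MAX)   # frequently tabu: longer tenure
    return random.randint(T_MIN, LOW_MAX)

def move_type_policy(improving: bool):
    # ranges established empirically in Section 5.2
    return random.randint(HIGH_MIN, T_MAX) if improving else random.randint(T_MIN, LOW_MAX)
```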
For this approach, we modify the feature from Table 2 representing the best value of the dynamic move evaluation function at iteration $t$ (denoted as $\Delta_t^*$). Since we are now operating at the stage level rather than at individual iterations, this feature becomes a list of values spanning the entire stage. To reduce dimensionality and capture meaningful patterns, we extract summary statistics from these values: specifically, the mean, standard deviation, minimum, and maximum of the move evaluation function across the stage. These statistics provide comprehensive information on the search performance in the preceding stage. For the time and policy characteristics, $t$ indicates the stage number within the current episode, and $a_{t-1}$ represents the policy used in the previous stage. Before training the model with these features, we applied standard scaling to normalize all features to the same range [34].
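A possible way to assemble the stage-level feature vector and apply the scaling is sketched below; the helper and variable names are ours, and the placeholder training matrix only illustrates how scikit-learn’s StandardScaler would be fitted offline and reused online.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def stage_feature_vector(delta_values, instance_feats, lo_t, asp_t, tab_t, stage_idx, prev_policy):
    d = np.asarray(delta_values, dtype=float)        # move evaluations over the stage
    stats = [d.mean(), d.std(), d.min(), d.max()]    # summary statistics of Delta_t^*
    return np.array(list(instance_feats) + stats + [lo_t, asp_t, tab_t, stage_idx, prev_policy])

# Offline: fit the scaler on the collected observations, then reuse it online.
X_train = np.random.rand(100, 13)                    # placeholder for real observations
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
```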

4.5. Machine Learning Algorithms

For our RL-based self-adaptive approach, we use the proximal policy optimization (PPO) algorithm, a policy gradient algorithm for deep RL [35]. PPO improves the policy through direct optimization while ensuring that new policy updates do not deviate too far from previous policies. The algorithm operates through two functions that interact iteratively: the “actor” policy function that guides action selection based on states, and the “critic” value function that evaluates expected rewards for various states. Both functions are parameterized with multi-layer perceptron neural networks. The network processes the tabu search’s current state (represented as a one-dimensional vector) and produces a probability distribution across all available actions. To enhance learning efficiency, we implement state-space normalization, ensuring input features maintain comparable scales. The algorithm incorporates an exploration strategy that balances exploiting the current best-known policy and exploring better policies.
For our SL-based approach to policy selection, we implement random forest (RF) multiclass classification [36]. This ensemble learning algorithm constructs multiple decision trees during training and outputs the class prediction that is the mode of the individual trees’ predictions. RF is well suited for our multiclass classification problem as it handles non-linear relationships between features and labels. Additionally, its ability to measure feature importance provides insights into which search characteristics most strongly influence policy selection. We evaluate the performance of our SL method using accuracy, which measures the overall correctness of predictions, and the $F_1$-score, which represents the harmonic mean of the model’s ability to identify relevant cases and its ability to avoid false classifications [37].

4.6. Design of Computational Study

The experiments are conducted on a custom-built desktop workstation running Windows 11 Home (Build 22631). The system utilized an AMD Ryzen 9 5950X processor (Advanced Micro Devices, Inc., sourced in London, United Kingdom, purchased via Amazon UK) with 16 physical cores and 32 logical processors operating at a base clock of 3.4 GHz. We utilized Python 3.10.5 for coding, along with the Stable Baselines 3 library [38] for implementing the PPO algorithm and scikit-learn for the RF algorithm [39].
Our computational experiments are structured to demonstrate the influence of tabu tenure parameter on TS performance and to evaluate strategies for setting tabu tenure policies with ML as described in the methodology section. The following experiments are performed:
  • Examination of how tabu tenure parameter adjustments affect TS performance (Section 5.1);
  • Assessment of the performance of move-type-based policy (Section 5.2);
  • Development of self-adaptive policy with RL (Section 5.3);
  • Evaluation of policy selection approach with ML (Section 5.4).
Before proceeding with our experiments, we present the used instances and the parameter values.
The computational study is performed using benchmark OptSAT instances, which have been used in the existing literature [9,11]. For the experiments, five problem classes are selected: Class 25, Class 27, Class 28, Class 36, and Class 37, each containing five instances. All of these classes have $n = 200$ variables; therefore, the instance characteristic $n$ has been removed from the ML models in the experiments. Class 25 has 400 clauses with each clause containing exactly 10 literals, Class 27 has 1000 clauses with each clause containing exactly 10 literals, Class 28 has 1000 clauses with clause lengths varying between 20 and 60 literals, Class 36 has 1000 clauses with each clause containing exactly 10 literals, and Class 37 has 1000 clauses with clause lengths varying between 20 and 60 literals. The percentage of clauses containing negated variables varies across classes, with Class 25, Class 27, and Class 28 having 25% negated variables, while Class 36 and Class 37 have 50% negated variables. The training set consists of four instances from each problem class, for a total of 20 instances, while the test set consists of one instance from each problem class, resulting in five instances.
The parameters of the methods are $freq_{thr} = 0.5$, $t_{init} = 300$, and $t_{episode} \in \{500, 1000\}$ for the RL method, and $t_{episode} = 300$ and $n_{stages} = 6$ for the SL method. For the RL method, we used hyperparameters corresponding to a training batch size of 128, a learning rate of $4.0 \times 10^{-4}$, and a discount factor of 0.99, with two fully connected layers of 64 nodes each for every multi-layer perceptron model. For the SL method, the number of trees is 200, and the maximum depth of the trees is 15. Solutions from a commercial MIP solver were obtained using CPLEX 12.9 [40] with a time limit of four hours (14,400 s).
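For reference, the sketch below shows how the two models could be instantiated with these hyperparameters using Stable Baselines 3 and scikit-learn. A standard toy environment stands in for the tenure-selection environment of Section 4.3 so that the snippet runs as-is; in the actual experiments, the tabu search environment would be used instead.

```python
import gymnasium as gym
from stable_baselines3 import PPO
from sklearn.ensemble import RandomForestClassifier

# Stand-in environment so the snippet runs; replace with the tenure-selection
# environment wrapped as a Gymnasium env in the actual setup.
env = gym.make("CartPole-v1")

# RL method: PPO with the hyperparameters listed above.
rl_model = PPO(
    "MlpPolicy", env,
    learning_rate=4.0e-4, batch_size=128, gamma=0.99,
    policy_kwargs=dict(net_arch=[64, 64]),   # two layers of 64 nodes per network
)

# SL method: random forest with 200 trees of maximum depth 15,
# trained offline on the scaled stage observations (X_train_scaled, y_train).
sl_model = RandomForestClassifier(n_estimators=200, max_depth=15)
```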

5. Results and Discussion

In this section, we present our experimental results followed by a discussion of each experiment.

5.1. Influence of Tabu Tenure on Algorithm Performance

To explore the impact of tabu tenure variations, we conducted experiments with the ranges presented in Table 3. The first two columns specify the ranges for tabu tenure selection, while the remaining columns correspond to the instance classes. The reported values represent the mean of the best-found solution during an episode, with standard deviations over runs for each instance in parentheses. These values are calculated over 2000 different random initial solutions. Each TS episode runs for a fixed number of iterations. The total length of each search is 2100 iterations (with the first $t_{init} = 300$ iterations using the policy of Bentsen et al. [15] before switching to the experiment configuration). The last column presents the overall average across all test instances for each tabu tenure range configuration. Higher objective values indicate better solutions. Additionally, to establish an upper bound for potential improvement, we solved the instances using the commercial solver CPLEX. The “CPLEX” row in Table 3 reports the best solution obtained during the run, and the “Best gap” row shows the percentage difference between CPLEX and the best-performing tabu tenure interval.
The results for the range values from Bentsen et al. [15] on the selected OptSAT instances are summarized in the first row of Table 3. The values in the second and third rows are selected to explore both wider and narrower ranges of tabu tenure. Based on the results, the best-performing range, with $T_{MIN} = 1$ and $T_{MAX} = 15$, has been selected for further experiments. The overall influence of the tabu tenure on the solution quality appears relatively moderate. Even significant modifications to the tabu tenure range produced evident but not substantial improvements in objective function values. For example, by changing the interval from $[7, 22]$ to $[1, 15]$, we observe an improvement of 0.13%. However, the tabu tenure remains crucial for preventing cycling and balancing exploration versus exploitation. The modest improvements reflect the nature of our test instances, where the tabu search is highly effective and obtains near-optimal solutions even with very few search iterations, resulting in small gaps of only 0.6% to 1.28% compared to the best solutions found by CPLEX.

5.2. Move-Type-Based Policy

Table 4 presents the results of four different tabu tenure configurations tested on the test instances, based on the best-performing range from Section 5.1. The first four columns specify the ranges for tabu tenure selection: $T_{MIN}$ and $T_{MAX}$ for improving moves (columns 1–2) and for non-improving moves (columns 3–4). The experimental configurations and parameters are the same as those used for the experiments in Table 3. The results indicate that for improving moves, the tabu tenure should be larger than for non-improving moves. Therefore, we formulate the novel move-type-based policy in the following way: for non-improving moves, select a random integer value from a smaller range (with $T_{MIN} = 1$ and $T_{MAX} = 8$), while for improving moves, select a random integer value from a larger range (with $T_{MIN} = 9$ and $T_{MAX} = 15$).

5.3. Policy with Reinforcement Learning

In this section, we present the results of experiments based on the proposed RL method formulated in Section 4.3. Figure 2 shows the mean reward per episode during the training period for two different episode length settings, $t_{episode} = 500$ (green line) and $t_{episode} = 1000$ (red line). The reward level shown on the y-axis corresponds to the measure defined in our reward function, where values closer to 1 indicate solutions approaching the best known upper bound for the problem instance, while the x-axis tracks the training progress in minutes. For the longer episode duration, the training period has been increased proportionally to the increased episode length to ensure a similar number of episodes and comparable training curves. The red line maintains consistently higher performance, around the 0.80–0.85 reward level, with minimal fluctuations, while the green line operates at a lower performance level, around 0.6, with higher variance.
This performance difference can be explained by the algorithm’s ability to perform a more thorough search of the solution space during longer episodes, thus finding higher values of the objective function. Neither training curve demonstrates an upward trend over time, suggesting that no substantial learning is occurring in either configuration. There could be different reasons for this, as discussed in the following. Despite its theoretical promise, deep RL can struggle to demonstrate learning improvement over time in experiments, which is consistent with known practical challenges of RL with function approximation [41,42]. In this study, we identify several factors that probably contributed to this outcome for our problem:
1. Problem complexity: As shown in Section 5.1, the impact of using different tabu tenure policies can be relatively small. The nature of local search means that promising policies may be lost during training due to the many factors that affect search trajectories.
2. Delayed and sparse reward structure: With rewards provided only at the end of the episode, the algorithm struggles to correlate specific tabu tenure choices with their long-term effects, creating a credit assignment problem [43,44].
3. Action similarity: Adjacent tabu tenure values often produce similar search behaviors, making it difficult for the algorithm to distinguish performance differences between similar actions.
4. State representation limitations: Our state representation uses general search metrics rather than detailed move histories. This abstraction may lack information to fully characterize the search state.
Each of these factors, or a combination of them, can explain why the RL training struggles to improve over time. Hence, we did not further pursue the RL approach for determining a tabu tenure policy.

5.4. Policy with Supervised Learning

In this section, we present the results of our experiments with multiclass classification to predict the best-performing tabu tenure policy for the next search stage. The data collection process took approximately 600 min, during which we executed 200 independent search runs for each training instance, each starting from a different random initial solution. As mentioned in Section 4.4, we excluded stages where all policies achieved identical performance, since no clear winner could be identified. This resulted in a final dataset of 3120 valid observations. Each TS episode consisted of 2100 iterations in total, calculated as $t_{init} + t_{episode} \times n_{stages} = 300 + 300 \times 6 = 2100$. The class distribution of the policies in our dataset is 26% of the total dataset for the random policy, 22% for the objective coefficient-based policy, 23% for the frequency-based policy, and 27% for the move-type-based policy. We split our dataset, using 80% of the data for training and 20% for testing. The multiclass classifier achieved 28% accuracy across the four classes. To evaluate the performance of each individual class, we calculated class-specific $F_1$-scores [45]. The class representing the move-type-based policy showed the strongest performance with an $F_1$-score of 0.36, while the class representing the random policy performed with an $F_1$-score of 0.32. The model struggled more with classifying the two other policies ($F_1$-scores of 0.22 and 0.14, respectively). To understand which features most influenced our model’s predictions, we analyzed feature importance values derived from our classifier. We found that the added features related to the move evaluation function are the most influential (the mean value shows the highest importance at 16%). Following these, the search characteristics ($lo_t$, $asp_t$, and $tab_t$) show moderate importance. Instance characteristics had minimal influence on the predictions, probably due to the similar structural properties of the selected OptSAT benchmark instances.
Table 5 presents results from applying each of the tabu tenure policies on the test instances. As in the previous experiments, the corresponding values represent the mean of the best-found solution during the search (with standard deviations in parentheses). These values are calculated over 2000 different random initial solutions. The first row shows performance when using the predicted policy for each stage, while other rows display results when consistently applying a single policy from our list of promising policies throughout the search process. To ensure fair comparison with experiments in Section 5.1 and Section 5.2, we maintained a consistent total of 2100 iterations per episode during the evaluation of both the SL method and the individual policies. The computational time when using the SL model for tabu tenure policy predictions increases by approximately 1.5 times per episode due to the additional processing time required to make a prediction at each stage. The values in the table align with the class distribution observed in our dataset, where move-type-based policy and random policy represent the largest portions, followed by objective coefficient-based and frequency-based policies.
Predicting the best tabu tenure policy using a classification model shows an improvement compared to the benchmark method of always selecting the tabu tenure randomly from a fixed interval. A two-sided paired t-test was used to evaluate the significance of this improvement. The test was performed on the difference in solution quality between the two methods, yielding a mean difference of 12.23 and a standard deviation of 137.25 across 10,000 observations, resulting in a test statistic T = 8.91 and a p-value much smaller than 0.001. This indicates that the improvement is statistically significant and unlikely to have occurred by chance; a short numerical check of this test statistic is sketched after the list below. Despite the improvement, there are several factors that could hinder even greater progress:
1. Insufficient training data: The relationship between the search state and the optimal tabu tenure policy is complex, and more data are needed to capture meaningful patterns; the current dataset is likely too small to model this relationship reliably.
2. Similar performance across policies: The class distribution of the dataset shows that the different tabu tenure policies perform similarly across problem instances. This small performance gap creates a difficult classification problem in which even a correct prediction offers only a minimal benefit. As in the RL approach, the impact of the tabu tenure selection involves complex dependencies that are difficult to capture.
3. Feature representation limitations: The current feature set may lack the expressiveness needed to distinguish between states where different policies would be optimal. Additional search-specific features might be necessary to capture the information that characterizes a search state.
4. Instance similarity: The minimal influence of instance characteristics on the predictions can likewise be attributed to the similar structural properties of the selected OptSAT benchmark instances.
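For reference, the reported test statistic follows directly from the summary statistics of the paired differences, using the standard paired t-test formula:

T = d̄ / (s_d / √n) = 12.23 / (137.25 / √10,000) = 12.23 / 1.3725 ≈ 8.91.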

6. Conclusions

The idea of combining ML with metaheuristics for parameter tuning remains promising [46]. Using ML to learn a parameter policy from data gathered across multiple runs, and then applying this policy, can also potentially improve heuristic performance. However, several critical factors must be evaluated before implementing this idea. Incorporating ML methods into heuristics requires additional time and resources, affecting the overall efficiency of the heuristic method. Despite this overhead, ML integration becomes worthwhile when heuristics are applied repeatedly to solve similar problems. In such scenarios, even modest improvements in solution quality can justify the additional computational investment. Furthermore, while the ML component may increase solution times when implemented directly, research should focus on deriving generalizable rules that can be applied by other researchers to different problem instances, thereby amortizing the learning cost across multiple applications.
Our study reveals that determining a robust tabu tenure policy for OptSAT is a complex problem that extends beyond simple parameter adjustment. The challenge lies not only in identifying optimal parameter ranges but also in understanding the intricate relationships between search behavior, problem characteristics, and parameter influence. Nevertheless, our research supports and verifies the theoretical ideas established by Glover and Laguna [5], suggesting that generalizable parameter selection rules do indeed provide value when implementing metaheuristic frameworks. We demonstrated that a robust tabu tenure policy must account for move characteristics, with our empirical analysis confirming that non-improving moves require shorter tabu tenures compared to improving moves. Furthermore, our statistical analysis confirms that the performance improvements achieved through ML integration are statistically significant when compared to the benchmark approach of random tabu tenure selection. In particular, a paired t-test showed a mean improvement of 12.23 (standard deviation 137.25), resulting in a test statistic of 8.91 and a p-value < 0.001, confirming the robustness of our approach.
Several ML methods for determining a robust tabu tenure policy have been investigated, and their limitations have been identified. A key limitation is that the effects of changing tabu tenures are not immediately visible but manifest only after some time during the search. This delayed feedback makes it challenging to evaluate the impact of tenure adjustments in real time and hampers the ability to learn better policies during the search, since the algorithm cannot immediately assess whether a particular tenure modification has improved or degraded performance. Additionally, the inherent limitations of ML must be acknowledged, including sensitivity to noise and to small datasets. Limited explainability has been pinpointed as another issue when using ML [47,48], but it is also a challenge for complex heuristic algorithms [49].
While our results are specific to OptSAT instances and the tabu tenure parameter in tabu search, the methodological framework established in this study provides guidance for future research. The ML-integrated parameter tuning methodology can be extended to broader applications: for example, future work could apply the approach to combinatorial optimization problems beyond OptSAT, investigate whether robust policies can be found for other tabu search parameters using ML techniques, or apply the same methodology to alternative metaheuristics.

Author Contributions

Conceptualization, A.K.; methodology, A.K. and L.M.H.; software, A.K.; validation, A.K.; writing—original draft preparation, A.K.; writing—review and editing, L.M.H.; visualization, A.K.; supervision, L.M.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data are available within the manuscript.

Acknowledgments

The authors thank four anonymous reviewers and the editors for their valuable inputs that helped to improve the initial version of this manuscript. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. During the preparation of this work, the authors used Claude 3 Opus, a large language model provided by Anthropic, in order to improve readability and grammar. After using this service, the authors reviewed and edited the content as needed and take full responsibility for the content.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
TS      Tabu Search
ML      Machine Learning
OptSAT  Optimum Satisfiability Problem
SAT     Boolean Satisfiability Problem
ILP     Integer Linear Programming Problem
SL      Supervised Learning
RL      Reinforcement Learning
MDPs    Markov Decision Processes
PPO     Proximal Policy Optimization
RF      Random Forest

References

  1. Laguna, M.; Barnes, J.W.; Glover, F.W. Tabu search methods for a single machine scheduling problem. J. Intell. Manuf. 1991, 2, 63–73. [Google Scholar] [CrossRef]
  2. Taillard, É.; Badeau, P.; Gendreau, M.; Guertin, F.; Potvin, J.Y. A tabu search heuristic for the vehicle routing problem with soft time windows. Transp. Sci. 1997, 31, 170–186. [Google Scholar] [CrossRef]
  3. Sun, M. Solving the uncapacitated facility location problem using tabu search. Comput. Oper. Res. 2006, 33, 2563–2589. [Google Scholar] [CrossRef]
  4. Belfares, L.; Klibi, W.; Lo, N.; Guitouni, A. Multi-objectives tabu search based algorithm for progressive resource allocation. Eur. J. Oper. Res. 2007, 177, 1779–1799. [Google Scholar] [CrossRef]
  5. Glover, F.; Laguna, M. Tabu search. In Handbook of Combinatorial Optimization; Du, D.Z., Pardalos, P.M., Eds.; Springer: Boston, MA, USA, 1998; Volumes 1–3, pp. 2093–2229. [Google Scholar] [CrossRef]
  6. Devarenne, I.; Mabed, H.; Caminada, A. Adaptive tabu tenure computation in local search. In Evolutionary Computation in Combinatorial Optimization: 8th European Conference, EvoCOP 2008, Naples, Italy, 26–28 March 2008; Springer: Berlin/Heidelberg, Germany, 2008; Volume 4972, pp. 1–12. [Google Scholar] [CrossRef]
  7. Løkketangen, A.; Olsson, R. Generating meta-heuristic optimization code using ADATE. J. Heuristics 2010, 16, 911–930. [Google Scholar] [CrossRef]
  8. Sugimura, M.; Parizy, M. A3TUM: Automated Tabu Tenure Tuning by Unique Move for Quadratic Unconstrained Binary Optimization. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’24 Companion, Melbourne, Australia, 14–18 July 2024; pp. 1963–1971. [Google Scholar] [CrossRef]
  9. Davoine, T.; Hammer, P.L.; Vizvári, B. A heuristic for Boolean optimization problems. J. Heuristics 2003, 9, 229–247. [Google Scholar] [CrossRef]
  10. Schaefer, T.J. The complexity of satisfiability problems. In Proceedings of the 10th Annual ACM Symposium on Theory of Computing, San Diego, CA, USA, 1–3 May 1978; pp. 216–226. [Google Scholar]
  11. da Silva, R.; Hvattum, L.M.; Glover, F. Combining solutions of the optimum satisfiability problem using evolutionary tunneling. MENDEL 2020, 26, 23–29. [Google Scholar] [CrossRef]
  12. Glover, F. Tabu search—Part I. ORSA J. Comput. 1989, 1, 190–205. [Google Scholar] [CrossRef]
  13. Arntzen, H.; Hvattum, L.M.; Løkketangen, A. Adaptive memory search for multidemand multidimensional knapsack problems. Comput. Oper. Res. 2006, 33, 2508–2525. [Google Scholar] [CrossRef]
  14. Hvattum, L.M.; Løkketangen, A.; Glover, F. Adaptive memory search for Boolean optimization problems. Discrete Appl. Math. 2004, 142, 99–109. [Google Scholar] [CrossRef]
  15. Bentsen, H.; Hoff, A.; Hvattum, L.M. Exponential extrapolation memory for tabu search. EURO J. Comput. Optim. 2022, 10, 100028. [Google Scholar] [CrossRef]
  16. Sarker, I.H. Machine Learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160. [Google Scholar] [CrossRef] [PubMed]
  17. Bishop, C.M.; Nasrabadi, N.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006; Volume 4. [Google Scholar]
  18. Danielsen, K.; Hvattum, L.M. Solution-based versus attribute-based tabu search for binary integer programming. Int. Trans. Oper. Res. 2025, 32, 3780–3800. [Google Scholar] [CrossRef]
  19. Bachelet, V.; Talbi, E.G. COSEARCH: A co-evolutionary metaheuristic. In Proceedings of the 2000 Congress on Evolutionary Computation, La Jolla, CA, USA, 16–19 July 2000; Volume 2, pp. 1550–1557. [Google Scholar]
  20. Montemanni, R.; Moon, J.N.J.; Smith, D.H. An improved tabu search algorithm for the fixed-spectrum frequency-assignment problem. IEEE Trans. Veh. Technol. 2003, 52, 891–901. [Google Scholar] [CrossRef]
  21. Blöchliger, I. Suboptimal Colorings and Solution of Large Chromatic Scheduling Problems. Ph.D. Thesis, EPFL, Lausanne, Switzerland, 2005. [Google Scholar] [CrossRef]
  22. Galinier, P.; Hao, J.K. Hybrid evolutionary algorithms for graph coloring. J. Comb. Optim. 1999, 3, 379–397. [Google Scholar] [CrossRef]
  23. Vasquez, M.; Hao, J.K. A “logic-constrained” knapsack formulation and a tabu algorithm for the daily photograph scheduling of an earth observation satellite. Comput. Optim. Appl. 1999, 20, 137–157. [Google Scholar] [CrossRef]
  24. Lü, Z.; Hao, J.K. Adaptive tabu search for course timetabling. Eur. J. Oper. Res. 2010, 200, 235–244. [Google Scholar] [CrossRef]
  25. Battiti, R.; Tecchiolli, G. The reactive tabu search. ORSA J. Comput. 1994, 6, 126–140. [Google Scholar] [CrossRef]
  26. Glover, F.; Hao, J.K. The case for strategic oscillation. Ann. Oper. Res. 2011, 183, 163–173. [Google Scholar] [CrossRef]
  27. Karimi-Mamaghan, M.; Mohammadi, M.; Meyer, P.; Karimi-Mamaghan, A.M.; Talbi, E.G. Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: A state-of-the-art. Eur. J. Oper. Res. 2022, 296, 393–422. [Google Scholar] [CrossRef]
  28. Kärcher, J.; Meyr, H. A machine learning approach for predicting the best heuristic for a large scaled capacitated lotsizing problem. OR Spectrum 2025, 1–43. [Google Scholar] [CrossRef]
  29. Quevedo, J.; Abdelatti, M.; Imani, F.; Sodhi, M. Using reinforcement learning for tuning genetic algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference Companion, Lille, France, 10–14 July 2021; pp. 1503–1507. [Google Scholar]
  30. Benlic, U.; Epitropakis, M.G.; Burke, E.K. A hybrid breakout local search and reinforcement learning approach to the vertex separator problem. Eur. J. Oper. Res. 2017, 261, 803–818. [Google Scholar] [CrossRef]
  31. Aleti, A.; Moser, I.; Meedeniya, I.; Grunske, L. Choosing the appropriate forecasting model for predictive parameter control. Evol. Comput. 2014, 22, 319–349. [Google Scholar] [CrossRef] [PubMed]
  32. Niroumandrad, N.; Lahrichi, N.; Lodi, A. Learning tabu search algorithms: A scheduling application. Comput. Oper. Res. 2024, 170, 106751. [Google Scholar] [CrossRef]
  33. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38. [Google Scholar] [CrossRef]
  34. Zheng, A.; Casari, A. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2018. [Google Scholar]
  35. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar]
  36. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  37. Hossin, M.; Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. Int. J. Data Min. Knowl. Manag. Process 2015, 5, 1. [Google Scholar]
  38. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable reinforcement learning implementations. J. Mach. Learn. Res. 2021, 22, 1–8. [Google Scholar]
  39. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  40. CPLEX Optimizer. 2019. Available online: https://www.ibm.com/support/pages/downloading-ibm-ilog-cplex-enterprise-server-v1290 (accessed on 28 June 2025).
  41. Nikanjam, A.; Morovati, M.M.; Khomh, F.; Braiek, B.B. Faults in deep reinforcement learning programs: A taxonomy and a detection approach. Autom. Softw. Eng. 2022, 29, 8. [Google Scholar] [CrossRef]
  42. Thrun, S.; Schwartz, A. Issues in using function approximation for reinforcement learning. In Proceedings of the 1993 Connectionist Models Summer School; Psychology Press: Hove, UK, 2014; pp. 255–263. [Google Scholar]
  43. Sutton, R.S. Temporal Credit Assignment in Reinforcement Learning. Ph.D. Thesis, University of Massachusetts Amherst, Amherst, MA, USA, 1984. [Google Scholar]
  44. Devidze, R.; Kamalaruban, P.; Singla, A. Exploration-guided reward shaping for reinforcement learning under sparse rewards. Adv. Neural Inf. Process. Syst. 2022, 35, 5829–5842. [Google Scholar]
  45. Sokolova, M.; Lapalme, G. A systematic analysis of performance measures for classification tasks. Inf. Process. Manag. 2009, 45, 427–437. [Google Scholar] [CrossRef]
  46. Birattari, M.; Kacprzyk, J. Tuning Metaheuristics: A Machine Learning Perspective; Springer: Berlin/Heidelberg, Germany, 2009; Volume 197. [Google Scholar]
  47. Malik, M.M. A hierarchy of limitations in machine learning. arXiv 2020, arXiv:2002.05193. [Google Scholar]
  48. Lones, M.A. Avoiding common machine learning pitfalls. Patterns 2024, 5, 101046. [Google Scholar] [CrossRef]
  49. Yates, W.B.; Keedwell, E.C.; Kheiri, A. Explainable optimisation through online and offline hyper-heuristics. ACM Trans. Evol. Learn. 2025, 5, 1–29. [Google Scholar] [CrossRef]
Figure 1. SL policy selection for one search stage.
Figure 2. Mean episode rewards over training time.
Table 1. Classification of selected articles based on tabu tenure policy.
Study | Static | Dynamic | Reactive | Fixed | Random | Solution | Move | Time
Glover [12]
Bentsen et al. [15]
Glover and Laguna [5]
Bachelet and Talbi [19]
Montemanni et al. [20]
Galinier and Hao [22]
Devarenne et al. [6]
Vasquez and Hao [23]
Lü and Hao [24]
Løkketangen and Olsson [7]
Battiti and Tecchiolli [25]
Blöchliger [21]
Devarenne et al. [6]
Glover and Hao [26]
Table 2. Input characteristics.
Category | Characteristic | Variable
Instance | Number of variables | n
Instance | Number of clauses | m
Instance | Number of non-zeros | z
Instance | Left-hand-side density | ρ_lhs
Instance | Average number of literals in the clause | l_c
Search | The best value of the dynamic move evaluation function at iteration t | Δ_t*
Search | Number of local optima in the last n_int iterations | lo_t
Search | Number of times applying the aspiration criteria in the last n_int iterations | asp_t
Search | Number of variables currently being tabu | tab_t
Time | Identification of the search progress | t
Policy | Policy (tabu tenure) that has been recently used | a_{t-1}
Table 3. Effect of tabu tenure ranges on solution quality, with the best average performance highlighted in bold. Standard deviations are shown in parentheses.
T_MIN | T_MAX | Class 25 | Class 27 | Class 28 | Class 36 | Class 37 | Avg.
7 | 22 | 21,904 (101) | 20,456 (71) | 22,534 (413) | 20,120 (66) | 22,619 (434) | 21,527
1 | 15 | 21,940 (124) | 20,505 (58) | 22,546 (446) | 20,157 (60) | 22,624 (450) | 21,554
1 | 3 | 21,881 (270) | 20,481 (94) | 22,463 (490) | 20,148 (81) | 22,567 (480) | 21,508
CPLEX | | 22,072 | 20,598 | 22,833 | 20,255 | 22,917 | 21,735
Best gap % | | 0.60 | 0.45 | 1.26 | 0.48 | 1.28 | 0.83
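The values in the best gap row are consistent with measuring the relative difference between the CPLEX value and the best heuristic average in each column:

Best gap % = 100 × (z_CPLEX − z_best) / z_CPLEX, e.g., for Class 25: 100 × (22,072 − 21,940) / 22,072 ≈ 0.60.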
Table 4. Impact of move type in the tabu tenure policy, with the best average performance highlighted in bold. Standard deviations are shown in parentheses.
Impr. T_MIN | Impr. T_MAX | Non-impr. T_MIN | Non-impr. T_MAX | Class 25 | Class 27 | Class 28 | Class 36 | Class 37 | Avg.
1 | 8 | 9 | 15 | 21,931 (118) | 20,490 (63) | 22,536 (444) | 20,148 (63) | 22,613 (453) | 21,544
9 | 15 | 1 | 8 | 21,958 (103) | 20,521 (53) | 22,551 (448) | 20,175 (58) | 22,628 (453) | 21,567
1 | 8 | 1 | 8 | 21,960 (158) | 20,518 (62) | 22,533 (461) | 20,173 (57) | 22,610 (464) | 21,559
9 | 15 | 9 | 15 | 21,926 (90) | 20,480 (63) | 22,541 (429) | 20,137 (63) | 22,626 (450) | 21,542
Table 5. Evaluation of SL approach performance. Standard deviations are shown in parentheses.
Policy | Class 25 | Class 27 | Class 28 | Class 36 | Class 37 | Avg.
SL model | 21,959 (103) | 20,521 (54) | 22,554 (446) | 20,176 (51) | 22,628 (454) | 21,568
Random | 21,940 (124) | 20,505 (58) | 22,546 (446) | 20,157 (60) | 22,624 (450) | 21,554
Objective coefficient-based | 21,944 (146) | 20,513 (59) | 22,527 (460) | 20,167 (69) | 22,605 (464) | 21,551
Frequency-based | 21,946 (120) | 20,509 (58) | 22,540 (446) | 20,166 (60) | 22,615 (456) | 21,555
Move-type-based | 21,958 (103) | 20,521 (53) | 22,551 (448) | 20,175 (58) | 22,628 (453) | 21,567
