Student Surpasses the Teacher: Apprenticeship Learning for Quadratic Unconstrained Binary Optimisation

Cakebread, Jack; Jackson, Warren G.; Karapetyan, Daniel; Parkes, Andrew J.; Özcan, Ender

doi:10.3390/a18080516


by Jack Cakebread, Warren G. Jackson, Daniel Karapetyan *, Andrew J. Parkes and Ender Özcan *

School of Computer Science, University of Nottingham, Nottingham NG8 1DY, UK

* Authors to whom correspondence should be addressed.

Algorithms 2025, 18(8), 516; https://doi.org/10.3390/a18080516

Submission received: 7 July 2025 / Revised: 8 August 2025 / Accepted: 10 August 2025 / Published: 15 August 2025

(This article belongs to the Special Issue Advances in Algorithms Through Heuristics: Theory, Applications, and Innovations)


Abstract

This study introduces a novel train-and-test approach referred to as apprenticeship learning (AL) for generating selection hyper-heuristics to solve the Quadratic Unconstrained Binary Optimisation (QUBO) problem. The primary goal is to automate the design of hyper-heuristics by learning from a state-of-the-art expert and to evaluate whether the apprentice can outperform that expert. The proposed method collects detailed search trace data from the expert and uses them to train machine learning models with which the apprentice predicts heuristic selection and parameter settings. Multiple data filtering and class balancing techniques are explored to enhance model performance. The empirical results on unseen QUBO instances show that, indeed, the “student surpasses the teacher”: the hyper-heuristic with the generated heuristic selection not only outperforms the expert but also generalises well, solving unseen QUBO instances larger than the ones on which the apprentice was trained. These findings highlight the potential of AL to generalise expert behaviour and improve heuristic search performance.

Keywords: hyper-heuristics; metaheuristics; heuristics; machine learning; data science

1. Introduction

Many practitioners and researchers often resort to heuristic search methods rather than exact methods while tackling computationally hard optimisation problems. Hyper-heuristics have emerged as general-purpose adaptive search algorithms capable of solving instances with varying characteristics from a domain or even multiple domains [1,2]. Instead of operating directly on the problem instance in question, a selection hyper-heuristic operates on a set of heuristics (low-level heuristics). At each iteration, the hyper-heuristic selects which low-level heuristic to apply to the problem and decides whether to accept the solution generated by the application of the heuristic based on its acceptance criteria (Figure 1). A single-point-based search hyper-heuristic stores a single active solution to the optimisation problem in memory [2], which the low-level heuristics operate on in each iteration. The objective function value of the best solution found across multiple runs is commonly used when comparing the performance of different hyper-heuristics.
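
The select-apply-accept loop described above can be sketched as follows. This is a minimal illustration on a toy bit-counting problem, not the HyFlex API: the function names, the two toy low-level heuristics, and the "accept equal or improving" rule are all assumptions made for the example.

```python
import random

def run_selection_hyper_heuristic(initial, heuristics, objective, select, accept, iterations):
    """Single-point selection hyper-heuristic loop (maximisation)."""
    current, best = initial, initial
    for _ in range(iterations):
        heuristic = select(heuristics)      # heuristic selection
        candidate = heuristic(current)      # apply the low-level heuristic
        if accept(objective(candidate), objective(current)):  # move acceptance
            current = candidate
            if objective(current) > objective(best):
                best = current
    return best

rng = random.Random(0)

def flip_one_bit(x):
    """Toy low-level heuristic: flip one randomly chosen bit."""
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]

def flip_two_bits(x):
    """Toy low-level heuristic: apply the single-bit flip twice."""
    return flip_one_bit(flip_one_bit(x))

start = (0,) * 8
best = run_selection_hyper_heuristic(
    start,
    [flip_one_bit, flip_two_bits],
    objective=sum,                           # toy objective: number of ones
    select=rng.choice,                       # uniform random selection
    accept=lambda cand, cur: cand >= cur,    # accept equal or improving moves
    iterations=200,
)
```

Swapping the `select` and `accept` callables is what distinguishes one selection hyper-heuristic from another in this sketch; the low-level heuristics and the problem stay untouched.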

HyFlex [3] is a Java framework for designing and implementing selection hyper-heuristics. It provides multiple problem domains (e.g., Bin Packing, Vehicle Routing) and a set of low-level heuristics and problem instances for each domain. The Cross Domain Heuristic Search Challenge (CHeSC) [4] was a competition where competitors were tasked with developing a general selection hyper-heuristic that performed well across multiple HyFlex problem domains. The competition winner, AdapHH [5], is seen as a state-of-the-art HyFlex-implemented hyper-heuristic. A Quadratic Unconstrained Binary Optimisation (QUBO) problem domain has been implemented for the HyFlex framework [6]. QUBO is a model used to represent NP-hard optimisation problems with a wide range of applications varying from graph partitioning to finance [7,8]. Beyond this long-standing general interest, attention to QUBO has recently grown because it serves as the input language for the D-Wave machines [9], which aim to solve problems using quantum annealing [8,10]. This makes QUBO a good domain for comparisons between classical metaheuristics and quantum annealing methods.

Machine learning (ML) refers to the study of models which can learn how to carry out a task without being explicitly programmed to do so [11]. A supervised classification model takes in a training dataset, which consists of features (input variables) with labelled outputs for each data point, and it is trained to make predictions on the category of the label for unseen data points. A supervised regression model is similar, but the output is a continuous value. Apprenticeship learning (AL), a type of machine learning commonly used in robotics, is the process of learning from an expert how to conduct a certain task in a particular domain [12]. Implementing a selection hyper-heuristic requires expert knowledge and can be time-consuming to develop and improve. Additionally, there are many factors that contribute to the success of a hyper-heuristic, and these may be difficult to identify or model appropriately. This study introduces a set of AL-generated selection hyper-heuristics combining various ML techniques for the HyFlex QUBO problem domain, with at least one of them outperforming the expert. The idea is that the AL hyper-heuristics will be able to use the data automatically generated by the expert to discover hidden patterns in its search process, potentially even capturing its best behaviours and leading to the construction of a better-performing hyper-heuristic which can be used in its place.

Section 2 provides the background, covering the hyper-heuristic software framework (including the QUBO problem domain and the state-of-the-art selection hyper-heuristic), the machine learning tools used in the experiments, and related work. Section 3 introduces the proposed generative apprenticeship learning selection hyper-heuristic approach. The data collection process is then detailed in Section 4. Section 5 explains the training process, presents empirical results demonstrating the performance of the generated selection hyper-heuristics, which are learned from small instances and applied to unseen instances of the same size and larger ones, and discusses the findings. Finally, the conclusions are provided in Section 6.

2. Background

2.1. HyFlex

HyFlex [13] is a Java framework that enables rapid implementation, testing, and analyses of selection hyper-heuristic designs. Each HyFlex problem domain represents an NP-hard problem to be solved, such as Personnel Rostering, and comes with a set of low-level heuristics and problem instances. Initially, six problem domains were implemented in HyFlex, which was then extended by including three more domains [14].

HyFlex contains four types of low-level heuristics: local search, mutational, ruin–recreate, and crossover heuristics. Local search heuristics search neighbouring solutions to the current solution to exploit the region being searched; mutational heuristics randomly modify a solution to explore different regions of the search space; ruin–recreate heuristics “destroy” the assignment of variables in the solution to a certain extent and then reconstruct it; and crossover heuristics combine two existing solutions into a single solution. HyFlex allows the hyper-heuristic to set the values for the Depth of Search (DOS) and the Intensity of mutation (IOM). DOS is traditionally the extent to which a local search low-level heuristic searches neighbouring solutions to the current solution, and the IOM usually refers to the proportion of a solution that mutational/ruin–recreate/crossover heuristics change.

HyFlex has been chosen for this study as it is actively used in the hyper-heuristic research field [5,15,16,17]. Additionally, there are a variety of HyFlex-compatible hyper-heuristics that have been proposed and compared experimentally via the CHeSC competition, which allows for an informed choice to be made for the AL expert.

2.1.1. HyFlex QUBO Problem Domain

QUBO is a newly developed HyFlex problem domain [6]. Many different types of problems can be represented as a QUBO problem, such as Graph Colouring, Graph Partitioning, and the Max-Cut problem [18]. In a QUBO problem instance, a solution to the problem is represented as a binary string, and the goal is to maximise the function [19]:

f(x) = x' Q x = \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} x_{i} x_{j}

(1)

where Q is an $n \times n$ matrix of weights, x is the binary solution vector being explored, and $x_{i}$ and $x_{j}$ are individual bits of the solution that are multiplied by their respective weight, $q_{ij}$.
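
As a small worked example of Equation (1), the sketch below evaluates the double sum directly and enumerates all solutions of a three-variable instance. The matrix `Q` is a made-up toy instance, not one of the OR-Library instances, and brute-force enumeration is only feasible for very small n.

```python
from itertools import product

def qubo_objective(Q, x):
    """Evaluate f(x) = sum_i sum_j q_ij * x_i * x_j for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

# Hypothetical 3-variable weight matrix, chosen only for illustration.
Q = [
    [2, -1, 0],
    [-1, 3, -4],
    [0, -4, 1],
]

x = [1, 1, 0]
value = qubo_objective(Q, x)  # q_00 + q_01 + q_10 + q_11 = 2 - 1 - 1 + 3

# Exhaustive search over all 2^n binary vectors (tiny n only).
best_value = max(qubo_objective(Q, candidate) for candidate in product((0, 1), repeat=3))
```

Here `value` is 3, which is also the maximum over all eight candidate vectors for this toy matrix.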

In the context of HyFlex, the goal is for a hyper-heuristic to find a binary solution vector, via the use of low-level heuristics, that maximises the objective function. QUBO is an NP-hard problem where the search space grows exponentially as the size of the binary solution vector increases. In the domain, there are 2 mutational operators (bit-flip, bit-swap), 6 local search types (steepest ascent, steepest ascent ARN, random mutation, first improvement, 2-opt, tabu search), and 4 binary operators categorised as crossover (uniform crossover, improving path relinking, path relinking with simulated annealing, Wang et al. 2012 [19] path relinking), giving 12 low-level heuristics in total [6]. By default, the HyFlex QUBO problem domain implementation supports the OR-Library 250-variable two-dimensional cutting/packing instances as well as the other variations of the QUBO instances with up to 2500 variables.

The HyFlex QUBO problem domain has been chosen for this study as it has not yet been explored in an AL context, and the domain allows for multiple different types of problems to be modelled with it. This allows the AL hyper-heuristics to learn the behaviour of an expert on a single problem domain and solve multiple different types of problems, whereas the other HyFlex problem domains only allow for a single type of problem to be solved. Effectively, the AL hyper-heuristics can act as a “cross-domain” hyper-heuristic while being designed for a single problem domain.

2.1.2. LeanGIHH

AdapHH [5] was the winner of the CHeSC competition and is considered to be a state-of-the-art HyFlex-implemented hyper-heuristic. LeanGIHH [20] is an adaptation of AdapHH that has been implemented for the HyFlex framework. It was designed by applying Accidental Complexity Analysis to AdapHH, removing unnecessary complexity from the hyper-heuristic while retaining its performance.

The low-level heuristic selection mechanism used by LeanGIHH is a form of roulette wheel selection, where each low-level heuristic is assigned a probability of being selected based on data such as the number of new best solutions found using it, the time spent applying it, and the fraction of time remaining in the search.
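
Roulette wheel selection in general can be sketched as below. This shows only the generic sampling step: the weights are placeholders, and the way LeanGIHH actually derives each heuristic's probability from new best solutions, time spent, and time remaining is not reproduced here.

```python
import random

def roulette_wheel_select(weights, rng):
    """Return an index with probability proportional to its non-negative weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    threshold = rng.random() * total   # uniform in [0, total)
    cumulative = 0.0
    for index, weight in enumerate(weights):
        cumulative += weight
        if threshold < cumulative:     # zero-weight entries are never chosen
            return index
    return len(weights) - 1            # guard against floating-point round-off

rng = random.Random(42)
# Hypothetical weights for two low-level heuristics: the second is 3x as likely.
picks = [roulette_wheel_select([1.0, 3.0], rng) for _ in range(10000)]
share_of_second = picks.count(1) / len(picks)
```

With weights 1 and 3, roughly three quarters of the draws should land on the second heuristic.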

LeanGIHH also has a restart mechanism, which works alongside its acceptance mechanism. The acceptance mechanism accepts all equal and improving solutions generated by the application of a low-level heuristic; however, it only accepts worsening solutions within a threshold based on the objective function values of the solutions found after the most recent restart. This threshold is gradually relaxed as more worsening solutions are encountered, and if an improving solution is found, the threshold is reset. If enough worsening solutions are encountered without an improving solution being detected at a given stage, the restart mechanism moves the search to a different region of the search space. This ensures a balance between exploration and exploitation during the search.

Additionally, LeanGIHH handles the value for the IOM or DOS of each heuristic separately. If an improving solution is found, the value for the low-level heuristic that was applied is increased by a scalar value, and if an equal or worsening solution is found, the value is decreased. The extent to which the value is increased or decreased depends on the quality of the solution generated by the heuristic.

LeanGIHH was chosen as the AL expert as it is a state-of-the-art HyFlex hyper-heuristic, and the removal of the accidental complexity allows for only the most important elements of the expert to be learnt by the apprentice hyper-heuristics.

2.2. WEKA and C4.5 Decision Trees

WEKA [21], created by the University of Waikato, is a Java framework for training and generating machine learning models. It offers a comprehensive variety of different machine learning models, as well as tools for feature selection and data processing.

WEKA offers C4.5 decision trees, which are supervised classification models. When generating a decision tree, the dataset is split into disjoint subsets at each node based on the attribute that has the highest Information Gain. Each node represents a decision where the next branch to follow is based on the value of the attribute that the node splits on. To classify a data point, the root node of the decision tree is first considered, and the attribute of the data point that the root node splits on is evaluated, with the resultant branch being followed to the next node. The next node is then treated as the root node, with this process repeating recursively until a leaf node is found. Leaf nodes of a decision tree contain a class label that is subsequently assigned to the data point.
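
To make the splitting criterion concrete, the sketch below computes entropy and information gain for categorical attributes on a tiny made-up dataset. It follows the description above in terms of plain information gain; C4.5 implementations typically normalise this quantity into a gain ratio, which is omitted here, and the attribute names and labels are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * log2(count / total) for count in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Reduction in entropy obtained by splitting on a categorical attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset) for subset in subsets.values())
    return entropy(labels) - remainder

# Hypothetical data: attribute "a" separates the classes perfectly, "b" not at all.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = ["local_search", "local_search", "mutation", "mutation"]

gain_a = information_gain(rows, labels, "a")
gain_b = information_gain(rows, labels, "b")
best_attribute = max(("a", "b"), key=lambda name: information_gain(rows, labels, name))
```

The attribute with the largest gain (`"a"` in this toy case) would be chosen for the split at that node.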

Decision trees are powerful as they are intuitive and interpretable by design: each attribute that they split on, and the values that they choose, can be visualised. Additionally, they can be much faster at classifying data points than other classification models as they do not require any direct comparisons between the data point being classified and data points from training. Interpretability is very important in this study as it enables us to understand which features are the most important in learning the behaviour of the expert. Additionally, as decisions need to be made by a hyper-heuristic on the fly whilst it is solving a problem, it is very important that models chosen to make these decisions can make fast predictions in order to keep pace with the expert hyper-heuristic. Therefore, decision trees are used in this study.

2.3. Related Work

There is a growing body of work on the application of data science methods to improve the performance of other search and optimisation algorithms in problem solving [22], including data mining [23,24], artificial neural networks [25], tensor analysis [26,27], and reinforcement learning [28,29]. Additionally, a range of other approaches have been employed for building new effective algorithms or their components, including genetic programming [30], grammatical evolution [31], Monte Carlo tree search [32], gene expression programming [33], and conditional Markov chain search [34].

The focus of this study is the generation of selection hyper-heuristics using an apprenticeship learning (AL) approach. As an overview of previous work, Asta et al. [15] introduced the concept of using AL in combination with hyper-heuristics. In that work, the expert consisted of near-optimal policies produced by a hyper-heuristic for the Online Bin Packing problem, where the problem is to minimise the number of bins used while sorting items on the fly. A k-means classification model [35] was built, with six features describing the search state, to generate a policy which evaluated the potential actions that could be taken and picked the best one. It was found that the generalised policy “often performed better than the standard best-fit heuristic” for the problem.

In Asta and Özcan [16], an AL selection hyper-heuristic was generated for the Vehicle Routing problem domain, with the expert being AdapHH [5], the winner of the CHeSC competition. AdapHH was run for a short amount of time on a single instance of the Vehicle Routing domain, and the data collected were used to generate a low-level heuristic selection model, a model to predict the IOM or DOS for the selected low-level heuristic, and models for each low-level heuristic that decided whether or not to accept a solution generated by them. The feature set for each model contained the ID of the last low-level heuristic that was selected and the last eight changes in the objective function value between the current solution in memory and the candidate solution generated by the application of the selected low-level heuristic in the given iteration. The C4.5 algorithm was used to create decision tree classifiers for the low-level heuristic selection and acceptance models, and linear regression models were used for predicting the values of the IOM and DOS. The AL hyper-heuristic was tested on unseen instances, and it was found that it performed similarly to the expert and even outperformed it in some cases.

In Tyasnurita et al. [17], a similar study to Asta and Özcan [16] took place. However, Multilayer Perceptrons (MLPs) were used for the classification and regression tasks. MLPs are a type of feedforward Artificial Neural Network [36] that requires parameter tuning via trial and error to find the best configuration. The models were trained using the same methods as the previous study, and in the experiments, the MLP-based hyper-heuristic was compared to the C4.5 decision tree-based hyper-heuristic. It was found that in 7 out of 10 instances, the MLP hyper-heuristic outperformed the C4.5 hyper-heuristic. In that study, the information used for training the ML models was based only on objective values.

Tyasnurita et al. [37] built on previous work applying AL to tackle the Open Vehicle Routing problem. A Time Delay Neural Network (TDNN) was used as the low-level heuristic selection model. A TDNN differs from an MLP in that there are delays added to the inputs to the network, which enables information from past iterations to be carried forward. The expert hyper-heuristic chosen was Modified Choice Function All Moves (MCFAM), and the feature set was expanded to add the last eight distances between the candidate and the current solution. During the experiments, it was found that the generated hyper-heuristic performed better than MCFAM on 10 out of 14 instances.

Extending the previous studies further, Tyasnurita et al. [38] tested the idea of mixing data gathered from multiple experts for the training of the apprentice, automatically constructing a new selection hyper-heuristic using TDNN as a classifier for both heuristic selection and move acceptance. AdapHH [5] and Multi-Stage Hyper-heuristic [39] were used as the experts. The DOS and IOM parameters were fixed for the generated hyper-heuristics. The results demonstrated the success of the selection hyper-heuristic generated by the proposed approach observing both experts when compared to each constituent hyper-heuristic.

In this study, a novel framework for generating AL hyper-heuristics is introduced. The key differences between this framework and past ones are as follows:

The approach is applied to a new problem domain with new components.
A new data collection technique is proposed.
A wider range of initial features is engineered and refined via observation, feature selection, and experimental results.
Different model configurations to the previous approaches are explored.
Different data sources are created and used to generate multiple different AL hyper-heuristics.
In addition to the classifiers modelling heuristic selection, a regression model is used to predict the best IOM and DOS parameter settings.
The performance of automatically generated AL-based selection hyper-heuristics, trained on small problem instances, is tested on larger unseen problem instances (learn-from-small-apply-to-big).

3. The Proposed Approach

3.1. Raw Data from Observing the Expert

During a run of the expert, we can collect all the information that is passed through the interface between the domain and the hyper-heuristic. Specifically, for a single-point hyper-heuristic, we can extract the data described in Table 1 in each iteration.

With these data, we can train predictors for the next low-level heuristic that the expert hyper-heuristic is likely to choose at each iteration, and for the parameter values it is likely to use. By using these predictors for low-level heuristic and parameter selection, our hyper-heuristic attempts to mimic the expert. Furthermore, we can filter the data before training, for example, by keeping only the data that represent the successful phases of the search, hence biasing the predictors towards successful decisions.

3.2. Design of the AL Hyper-Heuristics

Figure 2 illustrates the structure of the AL hyper-heuristic, which embeds a generated heuristic selection method while keeping the move acceptance as it is. This selection hyper-heuristic performs a single-point-based search as usual; however, a classification model and a regression model, both trained on data from LeanGIHH, select which low-level heuristic to apply to the current solution and the value for the IOM/DOS, respectively, based on the current state of the search.

3.2.1. Low-Level Heuristic Selection Classifier

The selection of an appropriate low-level heuristic at each iteration of the search process is crucial in advancing the search towards a global optimum. Our low-level heuristic selection mechanism is implemented as a classifier. In a naive implementation, this classifier predicts in each iteration of the hyper-heuristic which low-level heuristic would be chosen by the expert hyper-heuristic.

Data about the state of the search for a given iteration, t, are represented as a feature vector, $\phi_{t}$, which is defined as:

\phi_{t} = \{ \delta_{n\_obj_{t-1}}, \dots, \delta_{n\_obj_{t-8}}, \bar{\delta}_{n\_obj}, q_{t-1}, \dots, q_{t-8}, \hat{q}, h_{t-1}, \dots, h_{t-8}, \hat{h}, v_{t-1}, \dots, v_{t-8}, \bar{v}, a_{t-1}, \dots, a_{t-8}, \hat{a}, n\_imp, n\_eq, n\_wor \}

(2)

where $t - k$ refers to the value of the given variable k iterations in the past, and $\bar{x}$ and $\hat{x}$ refer to the mean and mode of the given variable over the last 8 iterations of the search, respectively, depending on the variables’ data type. Each feature is described in Table 2.
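
The layout of Equation (2) can be sketched as below: five groups of eight lagged values, each followed by its mean or mode, plus the three normalised counters, giving 48 features. The record keys and the example values are assumptions for illustration; the exact encodings of q, h, v, and a are those given in Table 2 and are not reproduced here.

```python
from statistics import mean, mode

WINDOW = 8  # number of past iterations kept, as in Equation (2)

def build_feature_vector(history, n_imp, n_eq, n_wor):
    """Assemble phi_t from the last 8 iteration records (most recent first)."""
    if len(history) != WINDOW:
        raise ValueError("expected exactly 8 past iterations")
    features = []
    # Numeric variables are summarised by their mean, categorical ones by their mode.
    for key, summarise in (("delta", mean), ("q", mode), ("h", mode), ("v", mean), ("a", mode)):
        values = [record[key] for record in history]
        features.extend(values)              # lagged values t-1 ... t-8
        features.append(summarise(values))   # mean or mode over the window
    features.extend([n_imp, n_eq, n_wor])
    return features

# Hypothetical trace records; field encodings are placeholders.
history = [
    {"delta": 0.5 + 0.01 * k,
     "q": "equal" if k % 2 else "improving",
     "h": 3 if k < 5 else 7,
     "v": 0.2,
     "a": True}
    for k in range(WINDOW)
]
phi = build_feature_vector(history, n_imp=0.1, n_eq=0.0, n_wor=0.4)
```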

Based on the trace of the previous 8 moves, the ML tries to predict which operator the expert will pick next. However, it is not attempting to directly pick which operator would give the best immediate or long-term gain. It is merely trying to learn from the expert, except that the filtering means that it is trying to learn from the expert during the times that the expert happened to be doing well.

The change in the objective function value between the candidate and current solution at a given iteration is normalised on the fly between 0 and 1 using the formula:

\delta_{n\_obj} = \frac{(obj_{candidate} - obj_{current}) + \delta_{\max}}{2 \delta_{\max}}

(3)

where $\delta_{\max}$ is the difference between the maximum and minimum objective values observed so far during the run of the algorithm. This value is updated dynamically whenever a new maximum or minimum objective value is encountered (if $\delta_{\max} = 0$, then we set $\delta_{n\_obj} = 0.5$). Hence, the difference $(obj_{candidate} - obj_{current})$ is always bounded by $\pm \delta_{\max}$, ensuring that the normalised value of $\delta_{n\_obj}$ remains within the $[0, 1]$ interval throughout the run. Therefore, a normalised value of 0.5 indicates that there has been no change in the objective function value between the candidate and the current solution, with a value less than 0.5 indicating a worsening solution and one above 0.5 indicating an improving solution.
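
A minimal sketch of this on-the-fly normalisation is given below. The class and method names are hypothetical, and the sketch assumes that both the candidate and the current objective values are included when tracking the observed minimum and maximum.

```python
class DeltaNormaliser:
    """Normalise objective changes into [0, 1] following Equation (3)."""

    def __init__(self):
        self.min_obj = None
        self.max_obj = None

    def _observe(self, obj):
        # Track the smallest and largest objective values seen so far.
        if self.min_obj is None or obj < self.min_obj:
            self.min_obj = obj
        if self.max_obj is None or obj > self.max_obj:
            self.max_obj = obj

    def normalise(self, obj_candidate, obj_current):
        self._observe(obj_candidate)
        self._observe(obj_current)
        delta_max = self.max_obj - self.min_obj
        if delta_max == 0:
            return 0.5  # no spread observed yet: treat as "no change"
        return ((obj_candidate - obj_current) + delta_max) / (2 * delta_max)

normaliser = DeltaNormaliser()
no_change = normaliser.normalise(10, 10)   # delta_max = 0
improving = normaliser.normalise(20, 10)   # delta_max = 10, change = +10
worsening = normaliser.normalise(15, 20)   # delta_max = 10, change = -5
```

The three calls yield 0.5, 1.0, and 0.25, matching the interpretation that values above 0.5 are improving moves and values below 0.5 are worsening ones.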

The number of iterations since an event last occurred is normalised on the fly between 0 and 1 using the formula:

n\_e = \frac{x}{x_{\max}}

(4)

where x is the number of iterations since the event last occurred and $x_{\max}$ is the maximum value of x encountered so far during the run. A value close to 0 indicates that the event has occurred recently, and a value close to 1 indicates that the number of iterations since the event has occurred is approaching the maximum that has been encountered so far during the run.
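
The counter behind Equation (4) can be sketched as follows, with one instance per event type (improving, equal, worsening). The class name is hypothetical, and returning 0.0 while the maximum is still zero is an assumption, since the text does not specify that case.

```python
class EventRecencyCounter:
    """Track iterations since an event and normalise by the largest gap seen."""

    def __init__(self):
        self.since_event = 0
        self.max_since_event = 0

    def step(self, event_occurred):
        """Update the counter for one iteration and return the normalised value."""
        if event_occurred:
            self.since_event = 0
        else:
            self.since_event += 1
        self.max_since_event = max(self.max_since_event, self.since_event)
        if self.max_since_event == 0:
            return 0.0  # assumption: undefined ratio is reported as 0
        return self.since_event / self.max_since_event

counter = EventRecencyCounter()
# Hypothetical sequence: was an improving solution found at each iteration?
values = [counter.step(flag) for flag in (False, False, True, False)]
```

For this sequence the normalised values are 1.0, 1.0, 0.0, and 0.5: the counter resets when the event occurs and then climbs back towards the longest gap seen so far.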

These additional, novel features have been engineered as they are not computationally expensive to calculate during the run, and in unison, they offer extensive information about the state of the search to better capture the variation in the search behaviour of the expert. Capturing the variation in the expert’s selection of low-level heuristics is crucial as different search scenarios will require different low-level heuristics to advance the search, and these additional features allow for better separation of the classes (low-level heuristic choices) in the feature space. For example, if the mode quality of the solution is worsening, the mode acceptance decision is accepted, and the normalised number of iterations since an improving solution is 1.0, then it indicates that the search may be stuck, and the hyper-heuristic is currently accepting worsening moves. In that case, the expert may be selecting a combination of mutational and local search heuristics in order to find a new promising region of the search space.

3.2.2. IOM and DOS Regression Models

The choice of the value for the IOM or the DOS can have a large impact on the search, as they directly influence how the low-level heuristic operates on the current solution. As we knew that LeanGIHH handled the value for the IOM/DOS separately for each low-level heuristic, a design choice was made to have a model for each low-level heuristic that was trained only on data from when the given low-level heuristic was selected. The feature vector for these models, $\phi_{t}$, is defined as:

\phi_{t} = \{ \delta_{n\_obj_{t-1}}, \dots, \delta_{n\_obj_{t-8}}, \bar{\delta}_{n\_obj}, q_{t-1}, \dots, q_{t-8}, \hat{q}, a_{t-1}, \dots, a_{t-8}, \hat{a}, n\_imp, n\_eq, n\_wor \}

(5)

Compared with the feature vector for the low-level heuristic classifier, the last eight low-level heuristics chosen were not included due to LeanGIHH not taking this into consideration when deciding on the values for the IOM/DOS. Additionally, the last eight values for the IOM/DOS were not included, as these could be generated by multiple different low-level heuristics. Linear regression models were chosen for this task due to their interpretability and fast prediction time.

3.2.3. Move Acceptance and Restart Mechanism

A choice was made to not create models for the acceptance and restart mechanism of LeanGIHH. The AL hyper-heuristic just uses the LeanGIHH policy for the move acceptance and restart. This is simply because the restart and acceptance mechanisms are already tuned to work in synergy with the selection mechanism.

4. Data Collection

The performance of an AL hyper-heuristic significantly depends on the training data. By filtering the dataset, we can train the AL hyper-heuristic on the best decisions made by the expert.

To train an AL hyper-heuristic, we need a collection of 13 datasets: 12 for training the IOM/DOS models for each low-level heuristic and 1 for training the low-level heuristic selection classifier. Our data collection consisted of collecting raw data and then filtering them. To experiment with several filtering settings, we produced six dataset collections: All-15, All-10, IE-15, IE-10, Imp-15, and Imp-10. The prefix in the dataset collection name refers to the move outcome filtering (see Section 4.3), whereas the suffix refers to the number of shortest trials selected (see Section 4.4).

All the experiments in this paper were run on an Intel i5 3570K (3.4 GHz) Windows 11 machine with 32 GB of RAM. Selection hyper-heuristic algorithms were implemented in Java respecting the HyFlex API.

4.1. Raw Data Collection from LeanGIHH

The QUBO training instances arbitrarily chosen for this study were Instance 2 and Instance 7 of the 1000-variable OR-Library problem instances. The 1000-variable problem instances were chosen for training as they are complex enough that LeanGIHH does not trivially find the best solution to the problems, while small enough to keep the running times practical. Also, by using 1000-variable instances, we avoid over-fitting to the 2500-variable problem instances, enabling fair evaluation.

LeanGIHH was run for 31 trials on each training instance, with the search terminating when the best-known solution for the instance was found. The best-known objective values used in our experiments were 354,932 for Instance 2 and 371,193 for Instance 7; see, e.g., [40].

4.2. Restart and Acceptance Filtering

Once the data had been collected, each trial was filtered to remove all data up until the final run, which occurred after the last restart. Not only did the final run represent a successful search that produced the desired solution, but the parameters were also tuned by the expert due to the selection probabilities and IOM/DOS values for each heuristic being retained when restarting. As well as greatly reducing the size of the data, and thus the training times of the models, it also filtered out the noise generated from LeanGIHH while finding the best configurations in the unsuccessful runs.
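
The restart filtering step can be sketched as below. The trace format is hypothetical: each record is assumed to carry a boolean `restart` flag marking the iteration at which the expert restarted, and records strictly after the last such flag are treated as the final run.

```python
def keep_final_run(trace):
    """Return only the records that follow the last restart in a trial."""
    last_restart = -1
    for index, record in enumerate(trace):
        if record["restart"]:
            last_restart = index
    # With no restart at all, the whole trial is the final run.
    return trace[last_restart + 1:]

# Hypothetical trial with restarts at iterations 1 and 3.
restart_flags = [False, True, False, True, False, False]
trace = [{"iteration": i, "restart": flag} for i, flag in enumerate(restart_flags)]
final_run = keep_final_run(trace)
```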

4.3. Move Outcome Filtering

One of the data filtering methods we experimented with was based on the move outcome. We considered filtering out the moves that did not improve the solution quality. Specifically, we used three filtering approaches:

All: Include all the moves;
IE: Filter out the moves that made the solution quality worse (but keep the moves that did not change the solution quality);
Imp: Filter out the moves that did not improve the solution.
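
The three move outcome filters can be expressed compactly as below. The record format is an assumption: `delta` stands for the candidate objective value minus the current one, so that a positive value is an improvement under maximisation.

```python
def filter_moves(records, mode):
    """Apply the All / IE / Imp move outcome filters to a list of move records."""
    if mode == "All":
        return list(records)                             # keep every move
    if mode == "IE":
        return [r for r in records if r["delta"] >= 0]   # improving or equal
    if mode == "Imp":
        return [r for r in records if r["delta"] > 0]    # strictly improving
    raise ValueError(f"unknown filter mode: {mode}")

# Hypothetical objective changes for six consecutive moves.
records = [{"delta": d} for d in (-2, 0, 3, 0, -1, 5)]
sizes = {mode: len(filter_moves(records, mode)) for mode in ("All", "IE", "Imp")}
```

Each filter is a subset of the previous one, so the Imp datasets are the smallest and the most strongly biased towards successful moves.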

4.4. Shortest Trials Filtering

We also filtered the data based on the length of a trial (recall that we collected data for 31 trials for each instance). We selected only a few shortest trials for the training datasets. By selecting only the shortest trials, we significantly reduced the dataset size and biased the dataset to the more successful trials.
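
Selecting the shortest trials amounts to sorting by trial length, as sketched below. Trial length is taken here to be the number of recorded iterations, which is an assumption; whether the 15 or 10 shortest trials are chosen per instance or overall is not specified in this section, so the sketch simply takes k from whatever list it is given.

```python
def select_shortest_trials(trials, k):
    """Return the k trials with the fewest recorded iterations."""
    return sorted(trials, key=len)[:k]

# Hypothetical trials, each a list of iteration records (integers as placeholders).
trials = [list(range(5)), list(range(2)), list(range(9)), list(range(3))]
shortest = select_shortest_trials(trials, k=2)
lengths = [len(trial) for trial in shortest]
```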

5. Configuration and Evaluation of the AL Hyper-Heuristic Approach

In this section, we present computational experiments that explore key components of the machine learning pipeline, namely, training data filtering, class balancing techniques, and feature selection, to identify the most effective configurations. Table 3 summarises the design of the study and the components varied within the AL hyper-heuristic approach. Finally, the five best-performing generated AL hyper-heuristics, trained on small instances, are tested and evaluated on unseen instances of both the same size and larger sizes.

5.1. Baseline Performance and Modifications

We trained six baseline AL hyper-heuristics, one for each dataset collection (see Section 4). The performance of these baseline AL hyper-heuristics is reported in Table 4. For these experiments, we chose a time budget of 90 s per run. This time budget was sufficient for LeanGIHH to obtain high-quality solutions, yet short enough to distinguish between the baseline AL hyper-heuristics.

Through observation of the heuristic selection choices during the run, it was found that the hyper-heuristics heavily favoured low-level heuristic 3. One factor that contributes to this is illustrated in Figure 3, which shows the class imbalance for the All-15 low-level heuristic selection dataset. This imbalance is observed across all dataset collections. Considering that the C4.5 decision tree training algorithm uses classification accuracy as the evaluation metric, such an imbalance causes a bias towards low-level heuristic 3. Therefore, we applied dataset balancing techniques, see Section 5.2.

Additionally, several features were removed from the low-level heuristic selection datasets:

We observed that the root of the decision trees was either splitting on the mode low-level heuristic or the average IOM/DOS over the last eight iterations. To avoid an over-reliance on the previous low-level heuristics chosen, $h_{t - 2}$ to $h_{t - 8}$ and $\hat{h}$ were removed, as previous work has shown $h_{t - 1}$ to be sufficient [16,17,37].
Furthermore, after careful consideration, we removed $v_{t - 1}, \dots, v_{t - 8}, \bar{v}$ , as these features are related to IOM/DOS.
Additionally, for heuristic 3, all data points where $n\_imp = 1.0$ were removed, since, when the search got stuck, this heuristic was never able to lead to further improvement.
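The three modifications above can be sketched as a single pass over the dataset. The column names (`h_t-2`, `h_mode`, `v_mean`, `label`, and so on) are hypothetical stand-ins for the features named in the text, since the dataset schema is not shown here.

```python
# Hypothetical column names for the removed features.
DROPPED_FEATURES = (
    {f"h_t-{k}" for k in range(2, 9)}      # h_{t-2} .. h_{t-8}
    | {"h_mode"}                           # mode of the recent heuristics
    | {f"v_t-{k}" for k in range(1, 9)}    # v_{t-1} .. v_{t-8}
    | {"v_mean"}                           # mean IOM/DOS
)


def prepare_rows(rows):
    """Drop the removed features and the stalled heuristic-3 data points."""
    prepared = []
    for row in rows:
        # Heuristic 3 chosen while n_imp == 1.0 never led to an improvement.
        if row["label"] == 3 and row["n_imp"] == 1.0:
            continue
        prepared.append({k: v for k, v in row.items() if k not in DROPPED_FEATURES})
    return prepared


rows = [
    {"h_t-1": 2, "h_t-2": 3, "h_mode": 3, "v_t-1": 0.2, "v_mean": 0.3, "n_imp": 0.4, "label": 3},
    {"h_t-1": 3, "h_t-2": 3, "h_mode": 3, "v_t-1": 0.2, "v_mean": 0.3, "n_imp": 1.0, "label": 3},
    {"h_t-1": 1, "h_t-2": 0, "h_mode": 3, "v_t-1": 0.6, "v_mean": 0.5, "n_imp": 1.0, "label": 1},
]
prepared = prepare_rows(rows)
print(prepared)
```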

5.2. Class Balancing

As noted above, the C4.5 classifier is sensitive to the imbalance of classes. We experimented with the following techniques to reduce the size of the largest class (heuristic 3) and increase the sizes of the smaller classes.

Cost-sensitive: Cost-sensitive classification, where all the data points are preserved, and the cost of misclassifying a data point belonging to a minority class is higher than that of the majority class.
Under-sample largest: Under-sampling the largest class to the size of the second-largest class.
Hybrid: A hybrid technique that under-samples the largest class to the size of the second-largest class. The other classes are over-sampled to the size of the second-largest class using Synthetic Minority Over-sampling TEchnique (SMOTE), i.e., by generating synthetic data points for the remaining classes, based on the existing data points in the class. We under/over-sampled to the size of the second-largest class to keep the dataset size manageable. We chose SMOTE over simpler over-sampling techniques to avoid overfitting.
Remove largest: Completely removing the largest class.
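The "under-sample largest" and "hybrid" techniques can be sketched in plain Python as below. This is a simplified illustration only: the over-sampling step follows the basic SMOTE idea (interpolating between a point and one of its nearest same-class neighbours) for purely numerical features, whereas the actual datasets also contain categorical features. The function names, the neighbourhood size `k=5`, and the toy data are assumptions.

```python
import random


def second_largest_size(data):
    """Size of the second-largest class (data: label -> list of feature tuples)."""
    sizes = sorted((len(rows) for rows in data.values()), reverse=True)
    return sizes[1]


def undersample_largest(data, rng):
    """Randomly reduce the largest class to the size of the second-largest class."""
    target = second_largest_size(data)
    largest = max(data, key=lambda label: len(data[label]))
    balanced = {label: list(rows) for label, rows in data.items()}
    balanced[largest] = rng.sample(balanced[largest], target)
    return balanced


def smote_like(rows, target, k, rng):
    """Grow a class to `target` points by interpolating towards near neighbours."""
    result = list(rows)
    while len(result) < target:
        base = rng.choice(rows)
        # k nearest same-class neighbours of `base` (squared Euclidean distance).
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(r, base)),
        )[:k]
        if not neighbours:  # a single-point class can only be duplicated
            result.append(tuple(base))
            continue
        other = rng.choice(neighbours)
        step = rng.random()
        result.append(tuple(a + step * (b - a) for a, b in zip(base, other)))
    return result


def hybrid(data, k=5, seed=0):
    """Under-sample the largest class, over-sample the rest to the same size."""
    rng = random.Random(seed)
    target = second_largest_size(data)
    balanced = undersample_largest(data, rng)
    return {label: smote_like(rows, target, k, rng) for label, rows in balanced.items()}


# Toy data: class 3 dominates, mirroring the imbalance described above.
gen = random.Random(1)
sizes = {3: 50, 0: 10, 1: 4, 2: 6}
data = {label: [(gen.random(), gen.random()) for _ in range(n)] for label, n in sizes.items()}
balanced = hybrid(data)
print({label: len(rows) for label, rows in balanced.items()})
```

After balancing, every class has the size of the originally second-largest class, which keeps the dataset small compared with over-sampling everything up to the largest class.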

Table 5 shows the results of using each of the above techniques for the All-15 dataset. Class balancing improved the performance of the hyper-heuristics; every hyper-heuristic could consistently find the best-known solution within the time limit. Therefore, the mean time to best-known solutions was used as the main performance metric. Here and below, the reported time to best-known solutions is averaged over 31 trials.

The “cost-sensitive” technique was a clear winner according to the time-to-best-known metric, with the “hybrid” technique also showing competitive results. Interestingly, the training accuracy and the F-score of the “cost-sensitive” technique were the lowest among the four techniques.

Based on the above results, we focused on the “cost-sensitive” and “hybrid” techniques, testing them with all the dataset collections. Table 6 reports the results for all the combinations of the class-balancing techniques and the dataset collections that produced hyper-heuristics capable of reliably finding the best-known solutions within the time budget (we call such hyper-heuristics “successful”).

From the table, we can see that the “hybrid” technique was, on average, more successful than the “cost-sensitive” technique. Additionally, we can observe that the Imp-15 and Imp-10 dataset collections did not produce any successful hyper-heuristics. We hypothesise that not all good moves necessarily produced an instant result (e.g., consider mutation heuristics), and thus the equal and/or worsening moves may have been needed to advance the search.

5.3. Wrapper Feature Selection

We also experimented with wrapper feature selection to find a subset of the features that would allow the hyper-heuristics to learn the behaviour of the expert more accurately. C4.5 decision trees were chosen for wrapper feature selection, where every combination of attributes was evaluated by training a decision tree, and the feature subset that produced the best training accuracy was selected. As this process is highly computationally expensive, it was only carried out on the best-performing combination, i.e., IE-15 hybrid.
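The exhaustive wrapper search described above can be sketched as follows. To keep the example self-contained, the scorer is a simple majority-vote lookup table rather than a C4.5 decision tree; this stand-in, the tie-breaking towards smaller subsets, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations, product


def training_accuracy(rows, labels, features):
    """Stand-in scorer: training accuracy of a majority-vote lookup table."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[tuple(row[f] for f in features)][label] += 1
    correct = sum(counts.most_common(1)[0][1] for counts in table.values())
    return correct / len(labels)


def wrapper_select(rows, labels, features, score=training_accuracy):
    """Evaluate every non-empty feature subset and return the best one."""
    best_subset, best_score = (), float("-inf")
    # Subsets are enumerated from small to large, so ties keep the smaller subset.
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            current = score(rows, labels, subset)
            if current > best_score:
                best_subset, best_score = subset, current
    return best_subset, best_score


# Toy data: the label depends on features "a" and "b"; "c" is irrelevant.
rows = [{"a": a, "b": b, "c": c} for a, b, c in product((0, 1), repeat=3)]
labels = [row["a"] & row["b"] for row in rows]
best_subset, best_score = wrapper_select(rows, labels, ["a", "b", "c"])
print(best_subset, best_score)
```

The search trains one model per subset, i.e., $2^n - 1$ models for $n$ features, which is consistent with the remark that the procedure is computationally expensive.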

We found that the best classification accuracy was achieved by removing four features: $δ_{n\_obj_{t - 1}}$, $δ_{n\_obj_{t - 3}}$, $δ_{n\_obj_{t - 7}}$, and ${\bar{δ}}_{n\_obj}$. We trained new hyper-heuristics on the datasets with the reduced feature set. The results are reported in Table 7. Only the “successful” hyper-heuristics are included in the table.

Surprisingly, IE-15 hybrid was significantly less effective when trained with the reduced set of features. However, the hyper-heuristics “successful” in this experiment were notably better than their equivalents trained on the full set of features (see Table 6).

5.4. Evaluation

In the final experiments, the best AL hyper-heuristics were compared directly to LeanGIHH. We selected the top-five generated AL hyper-heuristics according to their performance on the training instances (see Table 6 and Table 7) and named them HH1, HH2, …, HH5. The details are summarised in Table 8, which maps each of the five AL hyper-heuristics (HH1–HH5) to their corresponding configurations used to generate them, including the filtering strategy, number of trials used, class balancing technique, and feature set.

To evaluate these hyper-heuristics, we tested them on the eight remaining 1000-variable OR-Lib instances (Table 9). Based on their performance on 1000-variable instances, we selected the top two of them (HH1 and HH5) and tested them on ten 2500-variable OR-Lib instances (Table 10) to assess whether the knowledge learned from smaller instances could be applied successfully to larger ones. The AL hyper-heuristics were compared to LeanGIHH and a Gurobi-based solver. The Gurobi-based solver used the quadratic formulation of QUBO (1) and termination based on the time budget, returning the best feasible solution found within the time budget. Default settings were used except for restricting Gurobi to exactly one CPU core for a fair comparison.
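As a reminder of what all the compared solvers optimise, the sketch below evaluates a QUBO objective and finds the optimum of a tiny instance by enumeration. It assumes the common maximisation form $\max x^{T} Q x$ over binary vectors $x$; formulation (1) is not reproduced in this section, so its exact form may differ in detail, and the matrix `Q` is a made-up example.

```python
from itertools import product


def qubo_value(Q, x):
    """Objective value x^T Q x for a binary vector x."""
    n = len(x)
    return sum(Q[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def brute_force_max(Q):
    """Enumerate all 2^n binary vectors; only feasible for very small n."""
    n = len(Q)
    return max((qubo_value(Q, x), x) for x in product((0, 1), repeat=n))


# Made-up symmetric 3x3 instance.
Q = [
    [3, -3, 0],
    [-3, 1, 1],
    [0, 1, -1],
]
best_value, best_x = brute_force_max(Q)
print(best_value, best_x)
```

Enumeration is hopeless for the 1000- and 2500-variable instances considered here, which is why heuristic search and time-limited exact solvers are compared instead.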

Each entry in Table 9 and Table 10 is the relative percentage gap between the obtained solution objective value ($\mathit{obtained}$) and the best-known objective value ($\mathit{best}$):

$$\mathrm{Gap} = \frac{\mathit{best} - \mathit{obtained}}{\mathit{best}} \times 100\%.$$
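As a worked example of the gap formula, the snippet below uses the best-known value of instance 1 from Table 9 together with a hypothetical obtained objective value (the value 371,371 is made up for illustration).

```python
def gap_percent(best: float, obtained: float) -> float:
    """Relative percentage gap between the best-known and the obtained objective value."""
    return (best - obtained) / best * 100.0


best = 371_438      # best-known value of instance 1 (Table 9)
obtained = 371_371  # hypothetical objective value returned by a solver
print(f"{gap_percent(best, obtained):.3f}")
```

A solver that reaches the best-known value has a gap of exactly zero, and larger gaps indicate worse solutions.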

One can see from Table 9 and Table 10 that the AL hyper-heuristics outperformed LeanGIHH on every problem instance. We also conducted a statistical analysis of the results using the Wilcoxon signed-rank test: HH1 statistically significantly outperformed LeanGIHH on all 18 testing instances at the 95% confidence level. The Gurobi-based solver performed better on average than the AL hyper-heuristics; however, the gap was relatively small, particularly for the larger instances.
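The mechanics of the Wilcoxon signed-rank test can be illustrated on the per-instance gaps of LeanGIHH and HH1 from Table 10. This is only a sketch and not a reproduction of the analysis above: it pairs the ten table entries rather than the underlying trial results, and the helper `signed_rank_sums` is a hypothetical name.

```python
def signed_rank_sums(a, b, decimals=3):
    """Return (W+, W-, n) for paired samples, using average ranks for ties."""
    diffs = [round(x - y, decimals) for x, y in zip(a, b)]
    diffs = [d for d in diffs if d != 0]  # zero differences are discarded
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of equal absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus, len(diffs)


# Gap (%) per 2500-variable instance, copied from Table 10.
lean_gihh = [0.092, 0.121, 0.105, 0.040, 0.036, 0.108, 0.144, 0.059, 0.070, 0.130]
hh1 = [0.027, 0.064, 0.044, 0.002, 0.005, 0.040, 0.069, 0.017, 0.018, 0.078]

w_plus, w_minus, n = signed_rank_sums(lean_gihh, hh1)
# When every difference has the same sign, the exact two-sided p-value is 2 * 0.5**n.
p_value = 2 * 0.5 ** n if min(w_plus, w_minus) == 0 else None
print(w_plus, w_minus, p_value)
```

Because HH1 has the smaller gap on all ten instances, one rank sum is zero and the exact two-sided p-value is $2^{-9}$, well below 0.05.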

5.5. Discussion

The proposed apprenticeship learning approach in this study follows a “learn-small-apply-big” paradigm, which carries inherent limitations. Its success relies on the assumption that the structural and behavioural patterns observed in smaller instances are representative of those in larger instances. In domains where this assumption does not hold, due to reasons such as increased complexity, different constraint interactions, or scale-induced dynamics, the learned models may fail to generalise effectively. Furthermore, the feature representations and model configurations that work well on small instances may require adaptation to maintain performance at scale. These considerations are important for researchers seeking to apply this framework to other domains or significantly larger problem instances.

It should be emphasised that the apprentice is not directly attempting to find sequences that would be better at improving the objective function; it is essentially trying to find a system that would do “what the expert would do”, but without direct mimicry. Hence, improvement in performance might be viewed as a side effect. That there is an improvement is intriguing and deserves further study. A hypothesis (and the motivation for the filtering we apply to the data) is that the expert has good patches, where it does the right thing (possibly partially by accident), and bad patches, where its choices are less informed and less consistent. The apprentice might then learn to follow the good, predictable patterns more consistently and to follow fewer of the poor patterns, as these are more random and therefore harder to learn.

Naturally, the work also raised many questions regarding how the framework can be refined and applied to further applications, with different model configurations and methodologies still to be explored. One particularly promising direction is the concept of iterative apprenticeship learning, where a second-generation apprentice is trained not only on the original expert but also on the first apprentice. This could potentially lead to a cascade of increasingly refined hyper-heuristics, each learning from the best behaviours of its predecessor while filtering out suboptimal patterns. Such an approach could yield increasingly general and powerful search strategies with minimal human intervention. Another compelling extension could be the development of ensemble AL hyper-heuristics, where multiple apprentices trained on different experts, datasets, or filtering strategies are combined. These ensembles could leverage the diversity of learned behaviours to improve robustness and adaptability across problem instances and domains [41]. Taking all these potential future research directions into account, as well as machine learning constantly evolving and advancing, this study can serve as a foundation for future work in this area.

6. Conclusions

In this study, a novel data-driven approach to generating hyper-heuristics was designed, tested, and evaluated in the context of the QUBO domain that is relevant in general and to quantum annealing in particular. In the final experiments, the best-performing configuration of the apprenticeship learning (AL) hyper-heuristic was identified as HH1. This model was trained on the All-10 dataset, which included all move types, both worsening and non-worsening, and used the ten shortest search traces. It employed a hybrid class balancing technique combining SMOTE and under-sampling, along with a reduced feature set selected via wrapper-based feature selection.

The apprentice learned from the low-level heuristic choices made by the expert and yielded impressive performance and relatively high accuracy for both models. We speculate that the apprenticeship learning framework allows the hyper-heuristics to find hidden successful patterns in the heuristic selection choices of the expert, resulting in multiple different heuristic selection behaviours depending on the source of the data and on the data preparation and machine learning techniques. Additionally, the experiments showed that this framework was able to produce hyper-heuristics that could learn complex behaviours on small instances and effectively translate them to much larger ones; that is, the learned behaviour showed good generalisation.

Author Contributions

Conceptualisation, D.K., A.J.P. and E.Ö.; Formal Analysis, A.J.P.; Investigation, J.C.; Methodology, J.C. and D.K.; Software, J.C., W.G.J. and D.K.; Supervision, E.Ö.; Writing—original draft, J.C.; Writing—Review and Editing, W.G.J., D.K., A.J.P. and E.Ö. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets presented in this article are not readily available due to technical limitations. Requests for access to HyFlex should be directed to Warren Jackson. Requests concerning the QUBO problem instances and Gurobi model/solver should be directed to Daniel Karapetyan, while requests regarding the training datasets should be directed to Jack Cakebread.

Acknowledgments

The manuscript was collaboratively written and edited using Overleaf, which comes with Writefull, an AI tool commonly used for receiving feedback on typos, grammar, vocabulary, and style.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Cowling, P.; Kendall, G.; Soubeiga, E. A Hyperheuristic Approach to Scheduling a Sales Summit. In Practice and Theory of Automated Timetabling III; Springer: Berlin/Heidelberg, Germany, 2001; pp. 176–190.
2. Drake, J.H.; Kheiri, A.; Özcan, E.; Burke, E.K. Recent advances in selection hyper-heuristics. Eur. J. Oper. Res. 2020, 285, 405–428.
3. Burke, E.; Curtois, T.; Graham, H.; Gabriela, K.; Sanja, O.; Jos, P. HyFlex: A Flexible Framework for the Design and Analysis of Hyper-heuristics. In Proceedings of the Multidisciplinary International Scheduling Conference (MISTA 2009), Dublin, Ireland, 10–12 August 2009.
4. Burke, E.; Gendreau, M.; Hyde, M.; Kendall, G.; McCollum, B.; Ochoa, G.; Parkes, A.; Petrovic, S. The Cross-Domain Heuristic Search Challenge–An International Research Competition. In LION 2011: Learning and Intelligent Optimization; Springer: Berlin/Heidelberg, Germany, 2011; pp. 631–634.
5. Mısır, M.; Verbeeck, K.; De Causmaecker, P.; Vanden Berghe, G. An Intelligent Hyper-Heuristic Framework for CHeSC 2011. In Proceedings of the Learning and Intelligent OptimizatioN, Paris, France, 16–20 January 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 461–466.
6. Warren, J.; Karapetyan, D.; Özcan, E.; Parkes, A.J. Automated Algorithm Configuration for the Quadratic Unconstrained Binary Optimisation Problem (Presented at OR62); Technical Report; 2022.
7. Kochenberger, G.; Hao, J.K.; Glover, F.; Lewis, M.; Lü, Z.; Wang, H.; Wang, Y. The unconstrained binary quadratic programming problem: A survey. J. Comb. Optim. 2014, 28, 58–81.
8. Lewis, M.; Glover, F. Quadratic unconstrained binary optimization problem preprocessing: Theory and empirical analysis. Networks 2017, 70, 79–97.
9. McGeoch, C.C. The D-Wave Platform. In Adiabatic Quantum Computation and Quantum Annealing: Theory and Practice; Springer International Publishing: Cham, Switzerland, 2014; pp. 43–57.
10. Boixo, S.; Rønnow, T.F.; Isakov, S.V.; Wang, Z.; Wecker, D.; Lidar, D.A.; Martinis, J.M.; Troyer, M. Evidence for quantum annealing with more than one hundred qubits. Nat. Phys. 2014, 10, 218–224.
11. Samuel, A.L. Programming Computers to Play Games. In Advances in Computers, 1; Elsevier: Amsterdam, The Netherlands, 1960; pp. 165–192.
12. Abbeel, P.; Ng, A.Y. Apprenticeship Learning via Inverse Reinforcement Learning. In Proceedings of the Twenty-First International Conference on Machine Learning, Banff, AB, Canada, 4–8 July 2004; Association for Computing Machinery: New York, NY, USA, 2004.
13. Ochoa, G.; Hyde, M.; Curtois, T.; Vazquez-Rodriguez, J.A.; Walker, J.; Gendreau, M.; Kendall, G.; McCollum, B.; Parkes, A.J.; Petrovic, S.; et al. HyFlex: A Benchmark Framework for Cross-Domain Heuristic Search. In Proceedings of the European Conference on Evolutionary Computation in Combinatorial Optimization, Málaga, Spain, 11–13 April 2012; Volume 7245, pp. 136–147.
14. Adriaensen, S.; Ochoa, G.; Nowé, A. A benchmark set extension and comparative study for the HyFlex framework. In Proceedings of the 2015 IEEE Congress on Evolutionary Computation (CEC), Sendai, Japan, 25–28 May 2015; pp. 784–791.
15. Asta, S.; Özcan, E.; Parkes, A.J.; Etaner-Uyar, A.Ş. Generalizing Hyper-heuristics via Apprenticeship Learning. In Proceedings of the Evolutionary Computation in Combinatorial Optimization, Vienna, Austria, 3–5 April 2013; Middendorf, M., Blum, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 169–178. Available online: https://www.researchgate.net/publication/259287080_Generalizing_Hyper-heuristics_via_Apprenticeship_Learning (accessed on 8 May 2025).
16. Asta, S.; Özcan, E. An Apprenticeship Learning Hyper-Heuristic for Vehicle Routing in HyFlex. In Proceedings of the 2014 IEEE Symposium on Evolving and Autonomous Learning Systems (EALS), Orlando, FL, USA, 9–12 December 2014.
17. Tyasnurita, R.; Özcan, E.; Shahriar, A.; John, R. Improving performance of a hyper-heuristic using a multilayer perceptron for vehicle routing. In Proceedings of the 15th Annual Workshop on Computational Intelligence, Madrid, Spain, 11–15 July 2015.
18. Ushijima-Mwesigwa, H.; Negre, C.F.A.; Mniszewski, S.M. Graph Partitioning Using Quantum Annealing on the D-Wave System. In Proceedings of the Second International Workshop on Post Moores Era Supercomputing, New York, NY, USA, 12 November 2017; PMES’17. pp. 22–29.
19. Wang, Y.; Lü, Z.; Glover, F.; Hao, J.K. Path relinking for unconstrained binary quadratic programming. Eur. J. Oper. Res. 2012, 223, 595–604.
20. Adriaensen, S.; Nowé, A. Case Study: An Analysis of Accidental Complexity in a State-of-the-art Hyper-heuristic for HyFlex. In Proceedings of the 2016 IEEE Congress on Evolutionary Computation (CEC), Vancouver, BC, Canada, 24–29 July 2016.
21. Witten, I.H.; Frank, E.; Hall, M.A.; Pal, C.J. The WEKA Workbench. Online Appendix for “Data Mining: Practical Machine Learning Tools and Techniques”, 4th ed.; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2016.
22. Song, H.; Triguero, I.; Özcan, E. A review on the self and dual interactions between machine learning and optimisation. Prog. Artif. Intell. 2019, 8, 143–165.
23. Thabtah, F.; Cowling, P. Mining the data from a hyperheuristic approach using associative classification. Expert Syst. Appl. 2008, 34, 1093–1101.
24. Zhou, Y.; Zhang, X.; Geng, N.; Jiang, Z.; Wang, S.; Zhou, M. Frequent Itemset-Driven Search for Finding Minimal Node Separators and Its Application to Air Transportation Network Analysis. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8348–8360.
25. Tapia-Avitia, J.M.; Cruz-Duarte, J.M.; Amaya, I.; Ortiz-Bayliss, J.C.; Terashima-Marin, H.; Pillay, N. A Primary Study on Hyper-Heuristics Powered by Artificial Neural Networks for Customising Population-based Metaheuristics in Continuous Optimisation Problems. In Proceedings of the 2022 IEEE Congress on Evolutionary Computation (CEC), Padua, Italy, 18–23 July 2022; pp. 1–8.
26. Asta, S.; Özcan, E. A tensor-based selection hyper-heuristic for cross-domain heuristic search. Inf. Sci. 2015, 299, 412–432.
27. Asta, S.; Özcan, E.; Curtois, T. A tensor based hyper-heuristic for nurse rostering. Knowl.-Based Syst. 2016, 98, 185–199.
28. Lin, J.; Li, Y.Y.; Song, H.B. Semiconductor final testing scheduling using Q-learning based hyper-heuristic. Expert Syst. Appl. 2022, 187, 115978.
29. Zhang, Z.Q.; Wu, F.C.; Qian, B.; Hu, R.; Wang, L.; Jin, H.P. A Q-learning-based hyper-heuristic evolutionary algorithm for the distributed flexible job-shop scheduling problem with crane transportation. Expert Syst. Appl. 2023, 234, 121050.
30. El Yafrani, M.; Martins, M.; Wagner, M.; Ahiod, B.; Delgado, M.; Lüders, R. A hyperheuristic approach based on low-level heuristics for the travelling thief problem. Genet. Program. Evolvable Mach. 2018, 19, 121–150.
31. Fontoura, V.D.; Pozo, A.T.; Santana, R. Automated design of hyper-heuristics components to solve the PSP problem with HP model. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2017), Donostia, Spain, 5–8 June 2017; pp. 1848–1855.
32. Sabar, N.R.; Kendall, G. Population based Monte Carlo tree search hyper-heuristic for combinatorial optimization problems. Inf. Sci. 2015, 314, 225–239.
33. Sabar, N.R.; Ayob, M.; Kendall, G.; Qu, R. Automatic design of a hyper-heuristic framework with gene expression programming for combinatorial optimization problems. IEEE Trans. Evol. Comput. 2015, 19, 309–325.
34. Karapetyan, D.; Punnen, A.P.; Parkes, A.J. Markov Chain methods for the bipartite Boolean quadratic programming problem. Eur. J. Oper. Res. 2017, 260, 494–506.
35. MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. 1967. Available online: https://www.scirp.org/reference/referencespapers?referenceid=2232949 (accessed on 8 May 2025).
36. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall PTR: Hoboken, NJ, USA, 1994.
37. Tyasnurita, R.; Özcan, E.; John, R. Learning heuristic selection using a Time Delay Neural Network for Open Vehicle Routing. In Proceedings of the 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia, Spain, 5–8 June 2017; pp. 1474–1481.
38. Tyasnurita, R.; Özcan, E.; Drake, J.H.; Asta, S. Constructing selection hyper-heuristics for open vehicle routing with time delay neural networks using multiple experts. Knowl.-Based Syst. 2024, 295, 111731.
39. Kheiri, A.; Özcan, E. An iterated multi-stage selection hyper-heuristic. Eur. J. Oper. Res. 2016, 250, 77–90.
40. Boros, E.; Hammer, P.; Tavares, G. Local search heuristics for Quadratic Unconstrained Binary Optimization (QUBO). J. Heuristics 2007, 13, 99–132.
41. de Santiago Júnior, V.A.; Özcan, E.; de Carvalho, V.R. Hyper-Heuristics based on Reinforcement Learning, Balanced Heuristic Selection and Group Decision Acceptance. Appl. Soft Comput. 2020, 97, 106760.

Figure 1. Flowchart showing how an iterative selection hyper-heuristic operates.

Figure 2. High-level diagram showing the general structure of an AL hyper-heuristic.

Figure 3. The class sizes in log-scale in the All-15 low-level heuristic dataset.

Table 1. Directly observable features for the low-level heuristic selection classifier.

Direct Features	Description
h	The low-level heuristic chosen (categorical).
v	The IOM/DOS chosen (numerical).
a	The acceptance decision (accepted, not accepted) (categorical).
$δ_{n\_obj}^{'}$	The change in the objective function value between the candidate solution and the current solution (numerical).

Table 2. The derived features used for the selection.

Derived Feature	Description
$δ_{n\_obj}$	The normalised change in the objective function value between the candidate solution and the current solution (numerical).
q	The quality (improving, equal, worsening) of the candidate solution (categorical).
$n\_imp$	The normalised number of iterations since an improving solution (numerical).
$n\_eq$	The normalised number of iterations since an equal solution (numerical).
$n\_wor$	The normalised number of iterations since a worsening solution (numerical).

Table 3. Summary of the computational study design, outlining the main experimental components and their corresponding sections.

Baseline performance evaluation	Assess ALHH performance on six training datasets	Section 5.1
Class balancing	Evaluate the impact of various class balancing techniques on ALHH performance	Section 5.2
Wrapper feature selection	Apply wrapper feature selection to the most successful training dataset(s)	Section 5.3
Final evaluation	Compare the most successful variants of hyper-heuristics to detect the best-performing one	Section 5.4

Table 4. The mean of the objective values that the AL hyper-heuristics produced during the baseline experiments over 31 trials, ranked in order of performance.

Baseline ALHH	Mean Objective Value $μ$ (Instance 2)	Mean Objective Value $μ$ (Instance 7)
IE-10	353,624.55	370,171.13
All-10	353,030.87	369,697.58
Imp-10	353,030.87	369,723.74
Imp-15	353,028.52	369,697.59
IE-15	353,088.94	36,544.51
All-15	352,954.00	369,594.23

Table 5. Comparison of the class balancing techniques on the All-15 dataset collection.

Class-Balancing Technique	Time to Best Known, ms (Instance 2)	Time to Best Known, ms (Instance 7)	Training Accuracy	Training F-Score
Cost-sensitive	970	3015	23.3%	0.211
Under-sample largest	4054	8934	50.7%	0.490
Hybrid	2233	3775	68.9%	0.689
Remove largest	5687	12,569	53.3%	0.513

Table 6. Comparison of the combinations of the dataset collections and class-balancing techniques. Only the “successful” hyper-heuristics are included.

Dataset	Class-Balancing	Time to Best Known, ms (Instance 2)	Time to Best Known, ms (Instance 7)	Training Accuracy	Training F-Score
All-15	Cost-sensitive	970	3015	23.3%	0.211
All-15	Hybrid	2233	3775	68.9%	0.689
IE-15	Hybrid	318	739	83.2%	0.831
All-10	Hybrid	3964	9637	78.4%	0.784
All-10	Cost-sensitive	4212	11,400	25.7%	0.213
IE-10	Hybrid	8031	18,196	88.9%	0.888

Table 7. Evaluation of the effect of the reduced feature set. Only the “successful” hyper-heuristics are included.

Dataset	Class-Balancing	Time to Best Known, ms (Instance 2)	Time to Best Known, ms (Instance 7)	Training Accuracy	Training F-Score
All-15	Cost-sensitive	659	1078	23.9%	0.219
All-15	Hybrid	720	1569	68.9%	0.689
All-10	Hybrid	1119	2243	77.5%	0.775

Table 8. The list of the five best hyper-heuristics based on their performance on the training instances.

HH ID	Dataset	Filtering Strategy	Trials Used	Class Balancing	Feature Set
HH1	All-10	All moves (worsening + non-worsening)	10 shortest	Hybrid (SMOTE + under-sampling)	Reduced (via wrapper selection)
HH2	All-15	All moves	15 shortest	Hybrid	Reduced
HH3	All-15	All moves	15 shortest	Cost-sensitive	Reduced
HH4	All-15	All moves	15 shortest	Cost-sensitive	Full
HH5	IE-15	Ignore worsening moves	15 shortest	Hybrid	Full

Table 9. Evaluation on the 1000-variable OR-Lib instances. (Instances 2 and 7 are skipped as they were used for training.) The time budget was 30 s. The best-known objective values were obtained from [40].

Instance	Best Known	Gap, % (Gurobi)	Gap, % (LeanGIHH)	Gap, % (HH1)	Gap, % (HH2)	Gap, % (HH3)	Gap, % (HH4)	Gap, % (HH5)
1	371,438	0.000	0.018	0.000	0.010	0.003	0.001	0.001
3	371,236	0.000	0.013	0.002	0.012	0.003	0.005	0.003
4	370,675	0.000	0.025	0.004	0.018	0.008	0.008	0.006
5	352,760	0.000	0.034	0.000	0.008	0.007	0.004	0.004
6	359,629	0.000	0.061	0.018	0.024	0.021	0.014	0.010
8	351,994	0.000	0.067	0.026	0.057	0.038	0.034	0.037
9	349,337	0.000	0.059	0.008	0.024	0.015	0.013	0.008
10	351,415	0.000	0.069	0.016	0.027	0.025	0.012	0.017
Average		0.000	0.043	0.009	0.022	0.015	0.012	0.011

Table 10. Evaluation on 2500-variable OR-Lib instances. The time budget was 120 s. The best-known objective values were obtained from [19].

Instance	Best Known	Gap, % (Gurobi)	Gap, % (LeanGIHH)	Gap, % (HH1)	Gap, % (HH5)
1	1,515,944	0.023	0.092	0.027	0.034
2	1,471,392	0.049	0.121	0.064	0.078
3	1,414,192	0.060	0.105	0.044	0.060
4	1,507,701	0.000	0.040	0.002	0.007
5	1,491,816	0.001	0.036	0.005	0.009
6	1,469,162	0.004	0.108	0.040	0.063
7	1,479,040	0.077	0.144	0.069	0.088
8	1,484,199	0.000	0.059	0.017	0.025
9	1,482,413	0.001	0.070	0.018	0.025
10	1,483,355	0.041	0.130	0.078	0.084
Average		0.026	0.091	0.036	0.047

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
