2.1. HyFlex
HyFlex [13] is a Java framework that enables rapid implementation, testing, and analysis of selection hyper-heuristic designs. Each HyFlex problem domain represents an NP-hard problem to be solved, such as Personnel Rostering, and comes with a set of low-level heuristics and problem instances. Initially, six problem domains were implemented in HyFlex; the framework was later extended with three more domains [14].
HyFlex contains four types of low-level heuristics: local search, mutational, ruin–recreate, and crossover heuristics. Local search heuristics search neighbouring solutions to the current solution to exploit the region being searched; mutational heuristics randomly modify a solution to explore different regions of the search space; ruin–recreate heuristics “destroy” the assignment of variables in the solution to a certain extent and then reconstruct it; and crossover heuristics combine two existing solutions into a single solution. HyFlex allows the hyper-heuristic to set the values for the Depth of Search (DOS) and the Intensity of Mutation (IOM). DOS traditionally controls the extent to which a local search low-level heuristic searches neighbouring solutions to the current solution, while IOM usually refers to the proportion of a solution that mutational/ruin–recreate/crossover heuristics change.
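To make this interface concrete, the sketch below shows how a selection hyper-heuristic typically interacts with a HyFlex problem domain: querying low-level heuristics by type, setting the DOS and IOM parameters, and applying heuristics to solutions held in the domain's memory. The method names follow the public HyFlex ProblemDomain and HyperHeuristic classes; the control loop, memory layout, and parameter values are illustrative assumptions only.

```java
import AbstractClasses.HyperHeuristic;
import AbstractClasses.ProblemDomain;

// Illustrative skeleton of a HyFlex selection hyper-heuristic.
public class SketchHyperHeuristic extends HyperHeuristic {

    public SketchHyperHeuristic(long seed) {
        super(seed);
    }

    @Override
    protected void solve(ProblemDomain problem) {
        problem.setMemorySize(2);      // slot 0: current solution, slot 1: candidate
        problem.initialiseSolution(0); // create an initial solution in slot 0

        // Heuristic indices grouped by type, as exposed by the domain.
        int[] localSearch = problem.getHeuristicsOfType(ProblemDomain.HeuristicType.LOCAL_SEARCH);
        int[] mutation = problem.getHeuristicsOfType(ProblemDomain.HeuristicType.MUTATION);

        // Illustrative parameter settings; both values lie in [0, 1].
        problem.setDepthOfSearch(0.5);
        problem.setIntensityOfMutation(0.2);

        double current = problem.getFunctionValue(0);
        while (!hasTimeExpired()) {
            // Alternate randomly between a mutation and a local search step.
            int h = rng.nextBoolean()
                    ? mutation[rng.nextInt(mutation.length)]
                    : localSearch[rng.nextInt(localSearch.length)];
            double candidate = problem.applyHeuristic(h, 0, 1);
            // In HyFlex, lower function values are better (domains are cast as minimisation).
            if (candidate <= current) {
                problem.copySolution(1, 0); // accept: the candidate becomes current
                current = candidate;
            }
        }
    }

    @Override
    public String toString() {
        return "SketchHyperHeuristic";
    }
}
```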
HyFlex has been chosen for this study as it is actively used in the hyper-heuristic research field [5,15,16,17]. Additionally, a variety of HyFlex-compatible hyper-heuristics have been proposed and compared experimentally via the CHeSC competition, which allows for an informed choice to be made for the AL expert.
2.1.1. HyFlex QUBO Problem Domain
QUBO is a newly developed HyFlex problem domain [6]. Many different types of problems can be represented as a QUBO problem, such as Graph Colouring, Graph Partitioning, and the Max-Cut problem [18]. In a QUBO problem instance, a solution to the problem is represented as a binary string, and the goal is to maximise the function [19]:

$$f(x) = x^{\top} Q x = \sum_{i=1}^{n} \sum_{j=1}^{n} q_{ij} x_i x_j,$$

where $Q$ is an $n \times n$ matrix of weights, $x$ is the binary solution vector being explored, and $x_i$ and $x_j$ are individual bits of the solution that are multiplied by their respective weight, $q_{ij}$.
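As a concrete illustration of this objective, the following sketch evaluates $f(x) = x^{\top} Q x$ for a given weight matrix and binary vector. The class and method names are ours for illustration and are not part of the HyFlex QUBO domain.

```java
// Illustrative evaluation of the QUBO objective f(x) = x^T Q x.
public final class QuboObjective {

    /**
     * Computes the sum over i,j of q[i][j] * x[i] * x[j] for a binary vector x.
     *
     * @param q n-by-n matrix of weights
     * @param x binary solution vector (each entry 0 or 1)
     */
    public static long evaluate(int[][] q, int[] x) {
        long value = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] == 0) continue;     // x[i] * x[j] contributes only when both bits are 1
            for (int j = 0; j < x.length; j++) {
                if (x[j] == 1) {
                    value += q[i][j];
                }
            }
        }
        return value;
    }

    public static void main(String[] args) {
        int[][] q = { { 2, -1 }, { -1, 3 } };
        int[] x = { 1, 1 };
        System.out.println(evaluate(q, x)); // 2 - 1 - 1 + 3 = 3
    }
}
```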
In the context of HyFlex, the goal is for a hyper-heuristic to find, via the use of low-level heuristics, a binary solution vector that maximises the objective function. QUBO is an NP-hard problem whose search space grows exponentially with the length of the binary solution vector. The domain provides 12 low-level heuristics in total [6]: 2 mutational operators (bit-flip, bit-swap), 6 local search heuristics (steepest ascent, steepest ascent ARN, random mutation, first improvement, 2-opt, tabu search), and 4 binary operators categorised as crossover (uniform crossover, improving path relinking, path relinking with simulated annealing, and the path relinking of Wang et al. [19]). By default, the HyFlex QUBO problem domain implementation supports the OR-Library 250-variable two-dimensional cutting/packing instances as well as other variations of the QUBO instances with up to 2500 variables.
The HyFlex QUBO problem domain has been chosen for this study as it has not yet been explored in an AL context, and it allows many different types of problems to be modelled. The AL hyper-heuristics can thus learn the behaviour of an expert on a single problem domain yet solve multiple types of problems, whereas each of the other HyFlex problem domains captures only a single problem type. Effectively, the AL hyper-heuristics can act as a “cross-domain” hyper-heuristic while being designed for a single problem domain.
2.1.2. LeanGIHH
AdapHH [5] was the winner of the CHeSC competition and is considered to be a state-of-the-art HyFlex-implemented hyper-heuristic. LeanGIHH [20] is an adaptation of AdapHH, also implemented for the HyFlex framework, designed to remove unnecessary complexity: Accidental Complexity Analysis was used to simplify the hyper-heuristic while retaining its performance.
The low-level heuristic selection mechanism used by LeanGIHH is a form of roulette wheel selection, where each low-level heuristic is assigned a probability of being selected based on data such as the number of new best solutions found using it, the time spent applying it, and the fraction of time remaining in the search.
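A minimal sketch of roulette wheel selection over per-heuristic scores is given below. The scoring scheme is deliberately abstracted away: LeanGIHH derives its probabilities from statistics such as those listed above, whereas here the scores are simply supplied by the caller.

```java
import java.util.Random;

// Illustrative roulette wheel selection over per-heuristic scores.
public final class RouletteWheel {

    /**
     * Picks a heuristic index with probability proportional to its score.
     * Scores are assumed non-negative and not all zero.
     */
    public static int select(double[] scores, Random rng) {
        double total = 0.0;
        for (double s : scores) {
            total += s;
        }
        double spin = rng.nextDouble() * total;
        double cumulative = 0.0;
        for (int i = 0; i < scores.length; i++) {
            cumulative += scores[i];
            if (spin < cumulative) {
                return i;
            }
        }
        return scores.length - 1; // guard against floating-point rounding
    }
}
```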
LeanGIHH also has a restart mechanism, which works alongside its acceptance mechanism. The acceptance mechanism accepts all equal and improving solutions generated by the application of a low-level heuristic; however, it only accepts worsening solutions within a threshold based on the objective function values of the solutions found after the most recent restart. This threshold is gradually relaxed as more worsening solutions are encountered, and it is reset whenever an improving solution is found. If enough worsening solutions are encountered without an improving solution being detected at a given stage, the restart mechanism moves the search to a different region of the search space. This balances exploration and exploitation during the search.
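The sketch below captures the gist of such a threshold-based acceptance scheme, assuming minimisation; the relaxation rule, reset behaviour, and restart trigger are simplified assumptions rather than LeanGIHH's exact mechanism.

```java
// Illustrative threshold acceptance with gradual relaxation, in the spirit of
// the mechanism described above; constants and update rules are assumptions.
public final class ThresholdAcceptance {

    private final double initialThreshold;
    private final double relaxStep;    // how much the threshold relaxes per worsening move
    private double threshold;
    private int worseningStreak = 0;   // worsening candidates since the last improvement

    public ThresholdAcceptance(double initialThreshold, double relaxStep) {
        this.initialThreshold = initialThreshold;
        this.relaxStep = relaxStep;
        this.threshold = initialThreshold;
    }

    /** Decides whether to accept a candidate solution and updates the state. */
    public boolean accept(double current, double candidate) {
        if (candidate <= current) {            // equal or improving: always accept
            if (candidate < current) {
                threshold = initialThreshold;  // an improvement resets the threshold
                worseningStreak = 0;
            }
            return true;
        }
        worseningStreak++;
        threshold += relaxStep;                // relax as worsening solutions accumulate
        return candidate - current <= threshold;
    }

    /** True when the search should restart in a different region. */
    public boolean shouldRestart(int maxWorseningStreak) {
        return worseningStreak >= maxWorseningStreak;
    }
}
```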
Additionally, LeanGIHH maintains a separate IOM or DOS value for each low-level heuristic. If an applied heuristic yields an improving solution, its value is increased by a scalar amount; if it yields an equal or worsening solution, the value is decreased. The extent of the increase or decrease depends on the quality of the solution generated by the heuristic.
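A minimal sketch of this per-heuristic parameter adaptation follows; the update magnitudes and the clamping of values to [0, 1] are our assumptions.

```java
import java.util.Arrays;

// Illustrative per-heuristic IOM/DOS adaptation.
public final class ParameterAdaptation {

    private final double[] param; // one IOM or DOS value per low-level heuristic

    public ParameterAdaptation(int numHeuristics, double initialValue) {
        param = new double[numHeuristics];
        Arrays.fill(param, initialValue);
    }

    /**
     * Updates heuristic h's parameter after it produced a candidate solution.
     * delta is the non-negative adjustment magnitude, which may be scaled by
     * how much the candidate improved or worsened the objective.
     */
    public void update(int h, boolean improved, double delta) {
        param[h] += improved ? delta : -delta;
        param[h] = Math.min(1.0, Math.max(0.0, param[h])); // keep within [0, 1]
    }

    public double get(int h) {
        return param[h];
    }
}
```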
LeanGIHH was chosen as the AL expert because it is a state-of-the-art HyFlex hyper-heuristic, and the removal of accidental complexity means that only the most important elements of the expert need to be learnt by the apprentice hyper-heuristics.
2.2. WEKA and C4.5 Decision Trees
WEKA [21], created by the University of Waikato, is a Java framework for training and generating machine learning models. It offers a comprehensive variety of machine learning models, as well as tools for feature selection and data processing.
WEKA offers C4.5 decision trees, which are supervised classification models. When a decision tree is generated, the dataset is split into disjoint subsets at each node based on the attribute with the highest Information Gain. Each node represents a decision where the next branch to follow depends on the value of the attribute that the node splits on. To classify a data point, the root node of the decision tree is considered first: the attribute of the data point that the root node splits on is evaluated, and the resulting branch is followed to the next node. That node is then treated as the root of a subtree, and this process recurses until a leaf node is reached. Leaf nodes contain the class label that is subsequently assigned to the data point.
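For reference, the following minimal WEKA sketch trains a C4.5 tree (WEKA's J48 implementation) on a dataset loaded from an ARFF file and classifies its first instance; the file name is a placeholder.

```java
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Minimal WEKA sketch: train a C4.5 (J48) decision tree and classify an instance.
public final class J48Example {
    public static void main(String[] args) throws Exception {
        // Load a dataset; "training.arff" is a placeholder file name.
        Instances data = DataSource.read("training.arff");
        data.setClassIndex(data.numAttributes() - 1); // last attribute is the class

        J48 tree = new J48();
        tree.setOptions(new String[] { "-C", "0.25", "-M", "2" }); // default pruning settings
        tree.buildClassifier(data);

        // Classify the first instance and print the predicted class label.
        double predicted = tree.classifyInstance(data.instance(0));
        System.out.println(data.classAttribute().value((int) predicted));

        // The learned tree can be printed for inspection (interpretability).
        System.out.println(tree);
    }
}
```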
Decision trees are intuitive and interpretable by design, as each attribute they split on and the values they choose can be visualised. Additionally, they can classify data points much faster than many other classification models, as they do not require direct comparisons between the data point being classified and the training data points. Interpretability is very important in this study, as it enables us to understand which features matter most when learning the behaviour of the expert. Additionally, as a hyper-heuristic must make decisions on the fly whilst solving a problem, the models making those decisions need to produce fast predictions to keep pace with the expert hyper-heuristic. Therefore, decision trees are used in this study.
2.3. Related Work
There is a growing body of work on the application of data science methods to improve the performance of other search and optimisation algorithms in problem solving [22], including data mining [23,24], artificial neural networks [25], tensor analysis [26,27], and reinforcement learning [28,29]. Additionally, a range of other approaches have been employed for building new effective algorithms or their components, including genetic programming [30], grammatical evolution [31], Monte Carlo tree search [32], gene expression programming [33], and conditional Markov chain search [34].
The focus of this study is the generation of selection hyper-heuristics using an apprenticeship learning (AL) approach. As an overview of previous work, Asta et al. [15] introduced the concept of combining AL with hyper-heuristics. In that instance, the expert consisted of near-optimal policies produced by a hyper-heuristic for the Online Bin Packing problem, where the goal is to minimise the number of bins used while packing items on the fly. A k-means-based model [35] was built, with six features describing the search state, to generate a policy that evaluated the potential actions and picked the best one. It was found that the generalised policy “often performed better than the standard best-fit heuristic” for the problem.
In Asta and Özcan [16], an AL selection hyper-heuristic was generated for the Vehicle Routing problem domain, with the expert being AdapHH [5], the winner of the CHeSC competition. AdapHH was run for a short amount of time on a single instance of the Vehicle Routing domain, and the data collected were used to generate a low-level heuristic selection model, a model to predict the IOM or DOS for the selected low-level heuristic, and models for each low-level heuristic deciding whether or not to accept a solution generated by it. The feature set for each model contained the ID of the last low-level heuristic that was selected and the last eight changes in the objective function value between the current solution in memory and the candidate solution generated by the application of the selected low-level heuristic in the given iteration. The C4.5 algorithm was used to create decision tree classifiers for the low-level heuristic selection and acceptance models, and linear regression models were used for predicting the values of the IOM and DOS. The AL hyper-heuristic was tested on unseen instances and was found to perform similarly to the expert, even outperforming it in some cases.
In Tyasnurita et al. [17], a study similar to Asta and Özcan [16] took place; however, Multilayer Perceptrons (MLPs) were used for the classification and regression tasks. MLPs are a type of feedforward Artificial Neural Network [36] that requires parameter tuning via trial and error to find the best configuration. The models were trained using the same methods as the previous study, and in the experiments, the MLP-based hyper-heuristic was compared to the C4.5 decision tree-based hyper-heuristic. The MLP hyper-heuristic outperformed the C4.5 hyper-heuristic in 7 out of 10 instances. In that study, the information used to train the ML models was based only on objective values.
Tyasnurita et al. [37] built on previous work by applying AL to tackle the Open Vehicle Routing problem. A Time Delay Neural Network (TDNN) was used as the low-level heuristic selection model. A TDNN differs from an MLP in that delays are added to the inputs of the network, which enables information from past iterations to be carried forward. The expert hyper-heuristic chosen was Modified Choice Function All Moves (MCFAM), and the feature set was expanded with the last eight distances between the candidate and the current solution. In the experiments, the generated hyper-heuristic performed better than MCFAM on 10 out of 14 instances.
Extending the previous studies further, Tyasnurita et al. [38] tested the idea of mixing data gathered from multiple experts for the training of the apprentice, automatically constructing a new selection hyper-heuristic using a TDNN as the classifier for both heuristic selection and move acceptance. AdapHH [5] and the Multi-Stage Hyper-heuristic [39] were used as the experts. The DOS and IOM parameters were fixed for the generated hyper-heuristics. The results demonstrated the success of the selection hyper-heuristic generated by the proposed approach, which observed both experts, when compared to each constituent hyper-heuristic.
In this study, a novel framework for generating AL hyper-heuristics is introduced. The key differences between this framework and past ones are as follows:
- The approach is applied to a new problem domain with new components.
- A new data collection technique is proposed.
- A wider range of initial features is engineered and refined via observation, feature selection, and experimental results.
- Model configurations different from those of the previous approaches are explored.
- Different data sources are created and used to generate multiple AL hyper-heuristics.
- In addition to the classifiers modelling heuristic selection, a regression model is used to predict the best IOM and DOS parameter settings.
- The performance of the automatically generated AL-based selection hyper-heuristics, trained on small problem instances, is tested on larger unseen problem instances (learn-from-small-apply-to-big).