Article

Parameter Prediction for Metaheuristic Algorithms Solving Routing Problem Instances Using Machine Learning

by Tomás Barros-Everett, Elizabeth Montero * and Nicolás Rojas-Morales
Departamento de Informática, Universidad Técnica Federico Santa María, Avenida España 1680, Valparaíso 2390123, Chile
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(6), 2946; https://doi.org/10.3390/app15062946
Submission received: 28 November 2024 / Revised: 10 February 2025 / Accepted: 1 March 2025 / Published: 9 March 2025

Abstract: Setting parameter values is crucial for the performance of metaheuristics. Tuning the parameters of a metaheuristic is a computationally costly task, made harder by the inherent stochasticity of these algorithms and their dependence on the problem instance. In this work, we explore the application of machine learning algorithms to suggest suitable parameter values. We propose a methodology that uses k-nearest neighbours and artificial neural network algorithms to predict suitable parameter values based on instance features. We evaluate our proposal on the Capacitated Vehicle Routing Problem with Time Windows (CVRPTW) using its state-of-the-art algorithm, Hybrid Genetic Search (HGS). Additionally, we use the well-known tuning algorithm ParamILS to obtain suitable parameter configurations for HGS. We use a well-known instance set with between 200 and 1000 clients. Three sets of features based on geographical distribution, time windows, and client clustering are obtained, and an in-depth exploratory analysis of the clustering features is presented. The results are promising, demonstrating that the proposed method can successfully predict suitable parameter configurations for unseen instances and suggest configurations that perform better than baseline configurations. Furthermore, we present an explainability analysis to detect which features are most relevant for the prediction of suitable parameter values.

1. Introduction

Vehicle Routing Problems (VRPs) consist of finding efficient delivery routes for a set of vehicles. The goal is to minimize travel times while dealing with constraints such as vehicle capacity and delivery time windows. The VRP family of problems arises daily in real-world settings such as food and last-mile delivery. Moreover, novel and varied problem instances must be considered as road availability changes at different hours of the day. Metaheuristics are able to approximate the optimal solution to intractable problems such as VRPs [1]. To solve an instance of a VRP using a metaheuristic algorithm, it is first necessary to define the parameter values with which the algorithm will run. These parameters have a crucial effect on the quality of the algorithm’s solutions. Moreover, no unique parameter configuration works well for all possible problem instances. Consequently, finding a quality configuration is a difficult and time-consuming task that must be repeatedly executed when new or unseen routing instances are being solved.
Machine learning algorithms have been applied to design, complement, and enhance metaheuristic algorithms [2]. In this work, we propose a methodology to use machine learning (ML) algorithms to predict suitable parameter configurations for metaheuristic algorithms solving unseen instances. The objective of this work is to avoid the need to execute parameter-tuning processes every time a new problem instance is solved. For this, the ML algorithms will be trained with parameter values obtained by the well-known ParamILS tuner [3] and algorithmically extracted instance features. Here, we analyze the Capacitated VRP with Time Windows (CVRPTW) as a case study. We perform a clustering-based feature extraction focused on understanding client distribution characteristics using a set of benchmark instances. Moreover, we use a state-of-the-art metaheuristic to solve hard problem instances of the CVRPTW named Hybrid Genetic Search [4]. We evaluate the application of k-nearest neighbours and Artificial Neural Networks for the feature-based prediction of high-quality parameter configurations. To analyze which CVRPTW features are most relevant to the prediction of parameter configurations, we present an analysis using Explainable AI methods. The main contributions of this work are:
  • The proposal of a methodology that uses a machine learning approach to suggest suitable parameter values for unseen instances.
  • The definition of a set of clustering-based CVRPTW features to describe instances.
  • The assessment of our proposal applied to the CVRPTW using current state-of-the-art algorithms.
  • The analysis of the most relevant CVRPTW features in the prediction step using Explainable AI techniques.
It is important to mention that the objective of our work is not to propose a new machine learning technique. Rather, the main idea is to propose and evaluate a simple and novel methodology to predict suitable parameter values for unseen instances of a vehicle routing problem. The structure of the article is defined as follows: Section 2 presents details of the CVRPTW and previous work in the literature where machine learning techniques are used for metaheuristic algorithms. Section 3 describes our methodology to predict suitable parameter values using ML. Section 4 details the experimental setup applied in our study. Results are discussed in Section 5. Our main conclusions and paths for future work are presented in Section 6.

2. Background

2.1. Capacitated Vehicle Routing Problem with Time Windows

The Vehicle Routing Problem (VRP) is a classic combinatorial optimization problem. It involves finding an optimal route for a set of vehicles to fulfill the demands of a set of geographically distributed customers. Classical objective functions for VRP consider minimizing the total transportation costs, distance, or time.
CVRPTW includes two additional constraints: vehicles possess a maximum carrying capacity (C), and clients may only be served during predefined time windows (TW). Figure 1 shows a representative scheme of an example problem instance with five clients. Nodes represent the depot and the five clients; edges represent connections between nodes with their respective travel times. In this case, we consider two vehicles. If the journey starts at time $t = 0$, the first vehicle visits client 1 at time $t_1 = 2$ and client 2 at time $t_2 = 7$, returning to the depot at $t = 10$. The second vehicle visits client 3 at $t_3 = 2$, client 5 at $t_5 = 7$, and client 4 at $t_4 = 10$, returning to the depot at $t = 18$. Considering a maximum capacity of three visits per vehicle, the presented solution is feasible for the problem at hand. Moreover, if each client is available at the time its corresponding vehicle arrives, the solution is also feasible with respect to time windows. The quality of this solution can be computed as the total travel time = (2 + 5 + 3) + (2 + 5 + 3 + 8) = 28.
The problem is especially interesting because of the difficulty of dealing with time-window constraints. As far as we know, the state-of-the-art algorithm for this problem is Hybrid Genetic Search (HGS).
The HGS algorithm [4] as implemented in [5] was first proposed as an algorithmic framework to address three VRP variants: multi-depot VRP, periodic VRP, and multi-depot periodic VRP with capacitated vehicles and constrained route duration. In ref. [6], this framework was extended to incorporate a large class of time-constrained VRPs including VRP with time windows, multi-depot VRP with time windows, and vehicle-site dependencies VRP with time windows. As we chose HGS to perform our experiments, further details of the algorithm will be presented in Section 4.1.

2.2. Literature Review

In recent years, machine learning techniques have been applied to assist in the resolution of intractable problems with metaheuristics [7], leveraging the high volume of data that can be obtained from solved problem instances to approximate both algorithmic decisions and design decisions. In ref. [8], supervised learning was applied to determine whether linearizing a quadratic programming problem would reduce solving time. With a similar approach, a metaheuristic’s computational load can be reduced by approximating costly calculations with a regression model. Supervised learning can also be used to evaluate parametrizations on novel instances, potentially producing a parametrization better adapted to the particular instance.
An unsupervised learning approach for combinatorial optimization is presented in [9]. Here, the authors evaluate the use of clustering in evolutionary algorithms to reduce the cost of evaluating populations (Fitness Imitation) and the use of clustering algorithms to group problem instances to obtain a set of quality metaheuristic parameters per group.
Relevant conclusions are presented in [10], where the authors propose investigating the application of machine learning techniques to predict the evolution of dynamic instances to efficiently update a metaheuristic’s parameters. The authors also reflect on the potential of using unsupervised learning to group problem instances to discriminate their underlying distributions to optimize the search of parameters.
ISAC (Instance Specific Algorithm Configuration) [9] proposes the use of a clustering algorithm to configure the parameters for each group of similar problem instances. When a new problem instance needs to be solved, the automatically configured parameters of the most similar instance group are used. Here, instances were clustered according to the similarity of their extracted features. Similarly, the authors of [11] propose the construction of a portfolio of parameter configurations. Novel instances would then be solved by sampling from said portfolio with several methods, including a reinforcement-learning agent.
For metaheuristic parameter tuning, ref. [2] presents three ways in which machine learning techniques can be applied. Here, a supervised learning approach is used, and a feature set must be constructed to train a machine-learning model for parameter prediction. While some feature sets for TSP and VRP problems have already been proposed [12,13], no current feature set includes both capacity-related and time-related features.
In the literature, there are some machine learning applications similar to our proposal, such as [14,15,16,17,18,19,20]. However, there are important differences in the selected machine learning algorithms. For example, the future work of Dobslaw [15] mentions that artificial neural networks could be considered in the design and comparison with their proposal. Moreover, no work was found that uses k-nearest neighbors to predict metaheuristic parameters.

3. Methodology

Routing problems appear daily in real-world situations. Moreover, several changes can occur in the availability of roads during the route of a vehicle, considering traffic collisions and traffic jams at specific hours/days. These road availability changes can be represented as new problem instances or as modifications of existing ones. However, the performance of algorithms can drastically change when modifications are incorporated into current instances or when new instances are evaluated. To cope with this, suitable parameter values can be sought through parameter-tuning or parameter-control methods in order to reach good solutions for these specific instances. However, finding good parameter values is a time-consuming task.
In this work, we are interested in using machine learning techniques to predict suitable parameter values for metaheuristics to solve unseen instances. Our main motivation is to suggest suitable parameter values without the need to execute a new parameter-tuning process every time a new instance has to be solved. For this, we propose an algorithm configuration pipeline that consists of the following four steps (a minimal sketch is given after this list):
  • Step 1, instance feature extraction: problem-specific metrics are computed to characterize each instance.
  • Step 2, metaheuristic parameter tuning: suitable parameter values are obtained for the target metaheuristic.
  • Step 3, training of machine learning algorithms: ML algorithms are trained on the features extracted in the first step and the parameter configurations obtained in the second step.
  • Step 4, parameter prediction for novel instances: high-quality parameter values are predicted for the target metaheuristic on unseen problem instances.
We evaluate our approach on a specific VRP considering a set of well-known problem instances and a state-of-the-art target metaheuristic. We divide the problem instances into two sets, a training set and a testing set. The training set is used for parameter tuning, while the testing set is used to evaluate the machine learning models’ generalization capability.
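The following is a minimal sketch of this pipeline. The extract_features and tune callables are hypothetical placeholders for steps one and two (a tuner such as ParamILS would back the latter), and KNN is used only as an example model:

```python
from sklearn.neighbors import KNeighborsRegressor

def build_parameter_predictor(train_instances, extract_features, tune):
    """Steps 1-3 of the pipeline. `extract_features` maps an instance
    to a feature vector (step 1); `tune` maps an instance to the best
    configuration found by a tuner such as ParamILS (step 2). Both are
    placeholders to be supplied by the user."""
    X = [extract_features(inst) for inst in train_instances]  # step 1
    y = [tune(inst) for inst in train_instances]              # step 2
    return KNeighborsRegressor(n_neighbors=5).fit(X, y)       # step 3

def predict_configuration(model, extract_features, new_instance):
    # Step 4: suggest parameter values for an unseen instance.
    return model.predict([extract_features(new_instance)])[0]
```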
In the following sub-section, we present an in-depth analysis and characterization of CVRPTW instances. Details of the metaheuristic parameter tuning and training steps will be presented in Section 4. The results obtained in the prediction step and a comparison with baseline configurations are reported in Section 5.

3.1. Instance Feature Extraction Step

The definition of features is critical for a good instance characterization, and therefore a proper learning process and high-quality predictions. In the literature, some approaches to characterizing vehicle routing problem instances can be found [12,13]. However, most previously studied features show high correlations, and have to be treated using principal component analysis, thereby losing interpretability. Here, we select a subset of the well-known vehicle routing problem features and introduce a set of new features based on clustering analysis using a variation of the OPTICS algorithm.

3.1.1. Problem Instances

In our experiments, we use the well-known CVRPTW Homberger set of 300 instances [21]. The instances consider a number of clients ranging from 200 to 1000, a number of vehicles proportional to the corresponding number of clients, and varying distributions of client locations, vehicle capacities, and time windows. The instances can be grouped into six subsets, as shown in Table 1. The main difference between the instances suffixed 1 and 2 in these subsets is their scheduling horizon. Lower vehicle capacities in C1, R1, and RC1 force a short scheduling horizon, reducing the number of clients that can be served by a single vehicle.

3.1.2. Feature Metrics

Three sets of features will be used to numerically characterize CVRPTW problem instances: Spatial, Clustering, and Time Window features.

Spatial Features

Mainly taken from [13], this set of features describes the client distribution and the level of vehicle burden of each problem instance. We define vehicle burden as the amount of work each vehicle must undertake for all clients to be served. Naturally, vehicle burden is inversely proportional to the number of vehicles and their capacity, while increasing with client demands and the number of clients. Here, we consider the following features:
  • Number of clients: Measures the instance size.
  • Distance from centroid to depot: Measures client distribution bias with regards to the depot.
  • Average distance between depot and clients: Measures the degree of clustering around the depot.
  • Average distance between centroid and clients: Measures client distribution density.
  • Coefficient of Variation (CV) of distance between centroid and clients: Measures client distribution heterogeneity. The coefficient of variation was used instead of the standard deviation, as the latter tended to be correlated with the average.
  • Ratio of average client demand to capacity: Measures vehicle burden.
  • Ratio of the standard deviation of client demands to capacity: Measures vehicle burden heterogeneity.
  • Ratio of clients to vehicles: Measures vehicle burden.
  • Average of distances to the nearest neighbor: Measures client distribution density.
  • CV of distances to the nearest neighbor: Measures uniformity of client distribution density.
All the previously defined distance-related features have been normalized by the longest possible distance in the corresponding problem instance, computed as the diagonal of the instance’s bounding box. This prevents the instance size from influencing the values of other features.
For the nearest-neighbor-based features, i.e., the average and CV of distances to the nearest neighbor, a value of k = 2 was chosen as presented in [13]. The authors of [13] initially used the average, standard deviation, skewness, kurtosis, and CV for all spatial features. They then applied principal component analysis and reduced the initial set of 386 features to seven components containing 71% of the data set’s variance. Here, we use only the average and CV metrics to mitigate the curse of dimensionality. We also avoid using principal component analysis to maintain the interpretability of our results.
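As an illustration, the following is a minimal sketch of how several of the spatial features could be computed from 2D client coordinates; the function and dictionary keys are ours, not part of any library:

```python
import numpy as np
from scipy.spatial import cKDTree

def spatial_features(coords, depot):
    """Sketch of a subset of the spatial features. `coords` is an
    (n, 2) array of client locations; `depot` is a length-2 array."""
    # Normalize all distances by the diagonal of the bounding box.
    span = coords.max(axis=0) - coords.min(axis=0)
    diag = np.hypot(*span)
    centroid = coords.mean(axis=0)
    # k = 2: the first neighbor returned by the tree is the point itself,
    # so column 1 holds the distance to the true nearest neighbor.
    dists, _ = cKDTree(coords).query(coords, k=2)
    nn = dists[:, 1] / diag
    d_centroid = np.linalg.norm(coords - centroid, axis=1) / diag
    return {
        "n_clients": len(coords),
        "centroid_to_depot": np.linalg.norm(centroid - depot) / diag,
        "avg_depot_to_clients": np.mean(np.linalg.norm(coords - depot, axis=1)) / diag,
        "avg_centroid_to_clients": d_centroid.mean(),
        "cv_centroid_to_clients": d_centroid.std() / d_centroid.mean(),
        "avg_NN_distance": nn.mean(),
        "cv_NN_distances": nn.std() / nn.mean(),
    }
```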

Clustering Features

All clustering features were computed by applying a variation of the OPTICS algorithm [22] that we call dynamicOPTICS. The advantage of OPTICS over k-means is that k-means requires the prior definition of the number of clusters, while OPTICS determines this number heuristically. In dynamicOPTICS, a unique hyperparameter that defines the minimum number of samples needed to create a cluster (min_samples) must be set. To set this value, we performed a linear search considering min_samples values between 2 and 50. For each instance, we stored the best min_samples value, i.e., the one optimizing the clustering quality measure Q in Equation (1).
$$Q = \alpha \times D_{out} - \beta \times D_{in} - \gamma \times OR \quad (1)$$
where $D_{out}$ is the average inter-cluster distance, computed by Equation (2); $D_{in}$ is the average intra-cluster distance, as shown in Equation (3); $OR$ is the Outlier Ratio, computed by Equation (4); $\alpha, \beta, \gamma \in \mathbb{R}^{+}$; $N_C$ is the number of clients in a cluster $C$; $C_{neighbour}$ is the closest cluster to $C$; and $centroid_C$ is the centroid of a cluster $C$, computed by Equation (5).
$$D_{out} = \frac{\sum_{C \in Clusters} dist(centroid_C, centroid_{C_{neighbour}})}{|Clusters|} \quad (2)$$
$$D_{in} = \frac{\sum_{i=1}^{N_C} dist(ClusterClients[i], centroid_C)}{N_C} \quad (3)$$
$$OR = \frac{|Outliers|}{|Clients|} \quad (4)$$
$$centroid_C = \frac{\sum_{i=1}^{N_C} \left(ClusterClients[i].x,\; ClusterClients[i].y\right)}{N_C} \quad (5)$$
The idea is to maximize clustering quality by minimizing the number of outliers and the distance between clients in the same cluster, while maximizing the distance between different clusters. $\alpha$, $\beta$, and $\gamma$ were manually set from preliminary experiments to 1, 2, and 1, respectively. Figure 2 shows the clusters obtained in Homberger instances.
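A minimal sketch of the min_samples search follows, using sklearn’s OPTICS as a stand-in for the dynamicOPTICS base clusterer; the quality function follows Equation (1), with the averaging details slightly simplified:

```python
import numpy as np
from sklearn.cluster import OPTICS

ALPHA, BETA, GAMMA = 1.0, 2.0, 1.0  # weights from preliminary experiments

def clustering_quality(coords, labels):
    """Q = alpha*D_out - beta*D_in - gamma*OR for one labelling."""
    clusters = [coords[labels == c] for c in set(labels) if c != -1]
    if len(clusters) < 2:
        return -np.inf
    centroids = np.array([c.mean(axis=0) for c in clusters])
    # D_in: average distance from clients to their cluster centroid.
    d_in = np.mean([np.linalg.norm(c - c.mean(axis=0), axis=1).mean()
                    for c in clusters])
    # D_out: average distance from each centroid to its closest centroid.
    pairwise = np.linalg.norm(centroids[:, None] - centroids[None], axis=-1)
    np.fill_diagonal(pairwise, np.inf)
    d_out = pairwise.min(axis=1).mean()
    outlier_ratio = np.mean(labels == -1)  # OPTICS labels outliers as -1
    return ALPHA * d_out - BETA * d_in - GAMMA * outlier_ratio

def dynamic_optics(coords):
    # Linear search over min_samples, keeping the best-quality labelling.
    best = max(range(2, 51),
               key=lambda m: clustering_quality(
                   coords, OPTICS(min_samples=m).fit(coords).labels_))
    return best, OPTICS(min_samples=best).fit(coords).labels_
```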
The clustering features considered for this work are:
  • Optimal min_samples value: Measures clusterability.
  • Cluster Ratio: Ratio of the number of clusters to the number of clients.
  • Outlier Ratio: Proportion of non-clustered clients, as in Equation (4).
  • Avg. clients per cluster: Measures cluster size (number of clients).
  • CV clients per cluster: Cluster size heterogeneity.
  • Avg. of intra-cluster distances: Cluster density.
  • Avg. of inter-cluster distances: Cluster spread.

Time Window Features

Additionally, we propose the study of four features related to the time windows of the CVRPTW instances:
  • Ratio of the highest number of overlapping windows to total: Measures the tightest time interval.
  • Ratio of average number of overlapping windows to total: Measures the average time tightness.
  • Ratio of average window length to the longest window: Measures the normalized time window length.
  • Ratio of the standard deviation of window length to the longest time window: Measures normalized time window heterogeneity.
Considering that time-window constraints greatly reduce the solution space and complicate the search for feasible and good-quality solutions, we expect instances with narrow time windows to be more computationally expensive and to produce more infeasible solutions.
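As an illustration, the following is a minimal sketch of how these four features could be computed, assuming each client’s time window is given as a (start, end) pair; counting overlaps at window endpoints is one possible interpretation, and the function and key names are ours:

```python
import numpy as np

def time_window_features(windows):
    """Sketch of the four time-window features. `windows` is a list of
    (start, end) pairs, one per client. Overlap counts are evaluated at
    window endpoints, where the overlap level changes."""
    windows = np.asarray(windows, dtype=float)
    n = len(windows)
    lengths = windows[:, 1] - windows[:, 0]
    # Number of windows covering each candidate time point.
    points = np.unique(windows.ravel())
    overlap = [np.sum((windows[:, 0] <= t) & (t < windows[:, 1]))
               for t in points]
    return {
        "max_overlap_ratio": max(overlap) / n,
        "avg_overlap_ratio": np.mean(overlap) / n,
        "avg_window_length_ratio": lengths.mean() / lengths.max(),
        "std_window_length_ratio": lengths.std() / lengths.max(),
    }
```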

3.1.3. Analysis of Extracted Features

The feature extraction process aims to describe a problem instance through numerical values. A quality feature set must be sufficient to distinguish dissimilar instances from each other and to group similar ones. As Homberger instances are grouped into subsets with different characteristics, we can validate the quality of our proposed feature set by clustering the instances with respect to different feature subsets and checking whether we obtain a grouping similar to the a priori grouping defined at the instance set’s creation.
For this purpose, k-means (Lloyd’s Algorithm) was chosen to obtain high-quality clustering results for a given number of clusters on each execution. In preliminary experiments, spectral clustering [23] was briefly tested, but was discarded because it showed noisier results. For each execution, k-means was run with 1000 random centroid initializations, and the result of minimal inertia was chosen. All feature values were normalized to a [0, 1] range before clustering to equalize the weight of each feature.
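A minimal sketch of this validation clustering, using sklearn’s KMeans (whose n_init argument internally performs the random restarts and keeps the minimal-inertia result) and min-max scaling to the [0, 1] range:

```python
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

def cluster_instances(feature_matrix, k):
    """Scale features to [0, 1], then run k-means from 1000 random
    centroid initializations and keep the minimal-inertia result."""
    X = MinMaxScaler().fit_transform(feature_matrix)
    km = KMeans(n_clusters=k, init="random", n_init=1000).fit(X)
    return km.labels_
```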
We validated the extracted features on the Homberger instance set. We first validated each feature subset independently (spatial, demand, and clustering features). Clustering with time window features was attempted, but produced noisy results due to the wide variance of the randomly allocated time windows. We conclude by presenting the clustering results using all the extracted features.
Figure 3 shows the clustering results of all 300 Homberger instances using only spatial features, with k = 3 . Both the average_distance_to_centroid and cv_distance_to_centroid features were discarded as they are perfectly correlated to depot-based features. In Figure 3, each column corresponds to a given number of clients, while each row shows five subsets of ten instances each. The instances in these subsets differ only in their time window allocation. From this, we notice a successful clustering process using spatial features to discriminate between small, medium, and large problem instances. Larger values of k do not produce more granular distinctions between instance sizes. This is likely due to the lack of inter-instance variance in depot- and centroid-related features.
Clustering with respect to the two demand-related features yields a perfect separation between the three vehicle capacity values in the subsets (see Figure 4). We notice that, for capacity = 200, subsets C1, R1, and RC1 are grouped together. Moreover, for capacity = 700, subset C2 forms its own cluster, and for capacity = 1000, subsets R2 and RC2 are clustered together.
Figure 5 shows clustering regarding client-clustering-related features. Instances in yellow are easily clustered, those in blue are difficult to cluster, and the pink ones do not present clear client clustering. Most importantly, these clustering results are robust to the exclusion of the optimal_min_samples feature, indicating that the remaining clustering features adequately measure the clusterability of the instance’s client distribution.
The clustering results of Homberger instances with all extracted features are shown in Figure 6 for k = 3 and Figure 7 for k = 4. The correct clustering of C1 instances is most noticeable, as these are the most different from all others due to their distinctive client distributions. To understand the differences between clusters 2, 3, and 4, we present Table 2, listing the most decisive features. Only features that present high inter-instance differences have been included; Low and High values are strictly relative. Cluster 4 correctly groups together instances that contain either large or sparse client clusters; these instances also present high vehicle capacities. Finally, clusters 2 and 3 present very similar values for all features except capacity, correctly distinguishing between type 1 and type 2 instances, which differ only in their maximum capacity in the case of R and RC instances. Cluster 3 does, however, incorrectly include the smaller C2 instances, indicating that the feature set was insufficient to distinguish high-capacity, small, densely clustered instances from high-capacity, large, non-clustered instances. This situation may be solved by including more client-clustering features in the feature set.

3.1.4. Exploratory Data Analysis Conclusions

The Homberger instance set was designed with metaheuristic experimentation in mind [21]. By setting a static distribution of clients and varying the number of vehicles, capacity, and time-window allocation, it is possible to evaluate the efficacy of a metaheuristic as a function of such variables. In these instances, the depot location was selected to be the center of the client bounding box. While this placement is reasonably good for reducing total travel time, it is not realistic with regard to real client-depot distributions and fares worse than algorithmically determined placements [24].
Random client distributions were obtained by uniform sampling in space; hence, skewed client distributions are not represented. The number of clients determines the number of vehicles. While this guarantees enough vehicles to satisfy the clients’ demands, it prevents analyzing instances where vehicles are either over- or under-burdened. The lack of distribution variety in published CVRPTW instances will likely reduce the effectiveness of applying machine learning models, especially concerning their ability to generalize to arbitrary instances. According to [25], an approach to train machine learning models for metaheuristic selection is to randomly generate instances by statistically varying features known to relate to instance difficulty (such as the number of vehicles, vehicle capacity, and client distribution). This idea will be considered in our future research.

3.2. Algorithms for Training and Prediction Steps

This section introduces the two machine learning algorithms used in the training and prediction steps of our proposal: K-nearest neighbors and artificial neural networks (ANN). For the training step, the ML algorithms are trained considering (1) the set of 22 extracted features for the instances in the first step and (2) the suitable parameter values per instance obtained in the second step. The ML algorithms learn a non-linear mapping between a feature vector and metaheuristic parameter values. Then, for the prediction step, the ML algorithms consider a fixed-length vector of extracted features of the unseen instances as input. The output is a fixed-length vector of the suggested parameter values.
K-nearest neighbors (KNN) is a supervised learning algorithm proposed by Evelyn Fix and J. L. Hodges, Jr. in [26]. KNN predicts the target value of an input by selecting the k closest data points from its training set and averaging their values. In our case, KNN selects the instances whose features are most similar to those of the novel instance and produces parameter values by averaging their optimal parameter values. The advantage of the KNN algorithm is that it can produce good results with small amounts of training data, as long as the unseen inputs are not too different from what the model has seen [27].
KNN can suggest parameter vectors for novel instances by finding which of the known instances are most similar. We used the KNeighborsRegressor implementation of the sklearn library [28] for the KNN experiments.
A KNN model is first trained on the training set of instances. To determine the parametrization for a novel instance, we find its nearest neighbors and use the average of their optimal parameter vectors. Depending on the metaheuristic used, averaging parameters may not make sense, as a metaheuristic’s parameter space will, in all likelihood, be non-linear. These cases are nevertheless included to provide a more robust parameter prediction that is less influenced by noise in the instance set. We evaluate KNN models with $K \in \{1, 3, 5, 7, 9, 11\}$.
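A minimal sketch of this prediction step, assuming X_train and y_train hold the training instances’ feature vectors and tuned parameter vectors, and X_test the features of unseen instances:

```python
from sklearn.neighbors import KNeighborsRegressor

# Assumed inputs: X_train/X_test are feature matrices and y_train holds
# one tuned HGS parameter vector (five values) per training instance.
for k in (1, 3, 5, 7, 9, 11):
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_train, y_train)
    # For k = 1 the prediction copies the most similar instance's
    # configuration; for k > 1 it averages the k neighbors' vectors.
    predicted_params = knn.predict(X_test)
```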
Artificial neural networks (ANNs) are computational models inspired by the interconnected neurons of the human brain [29]. Just as electrical signals propagate throughout the brain to transmit sensory and cognitive information, ANNs define propagation algorithms that process input and learn from expected output. A neuron is defined as a mathematical function that takes a numerical input and produces a numerical output. Usually, the input of a neuron is a linear combination of other neurons’ outputs. An ANN can be defined as a sequence of interconnected neuron layers. For an input vector of length I, the first layer of an ANN must be of the same length, since each feature of the input vector must be fed to its corresponding neuron. In the case of a two-layer ANN, the remaining layer corresponds to the output layer. Two-layer ANNs may prove sufficient for very simple classification problems, but their modeling capacity may be insufficient for more complex scenarios. To increase the complexity of the model, additional hidden layers can be added between the input and output layers. The addition of hidden layers, dropout units, and a variety of problem-oriented pre-processing layers allows neural networks to learn datasets of arbitrary complexity.
The artificial neural network will take a vector of features as input and produce a vector of parameters as output. We design the neural network presented in Figure 8, where each layer of the proposed design is fully connected (denoted as dense in the figure).
The network consists of a 22-neuron input layer (one neuron per instance feature), six hidden layers, and a five-neuron output layer (one neuron per tuned HGS parameter). All hidden layers use the hyperbolic tangent (tanh) as their activation function, while the output layer uses a linear function. We trained the network with an 80/20 split using five-fold cross-validation over the training set of instances. All structural network parameters, including the number of neurons per hidden layer and the activation function, were manually tuned to minimize the loss function on the validation set. The training hyperparameters were: optimizer AdamW with $\alpha = 0.0002$, 1000 epochs, batch size 30, and patience 200. We used the Mean Squared Error as the loss function.
We used the Keras library [30] for the artificial neural network tests.
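The following is a minimal sketch of such a network in Keras, assuming X_train and y_train hold the instance features and tuned parameter vectors; the hidden-layer widths are illustrative placeholders (the actual sizes are those reported in Figure 8), and the cross-validation loop is omitted for brevity:

```python
import keras
from keras import layers

# Sketch of the network in Figure 8: 22 inputs, six tanh hidden layers,
# and a 5-neuron linear output. Hidden widths below are assumptions.
model = keras.Sequential(
    [keras.Input(shape=(22,))]
    + [layers.Dense(64, activation="tanh") for _ in range(6)]
    + [layers.Dense(5, activation="linear")]
)
model.compile(optimizer=keras.optimizers.AdamW(learning_rate=2e-4),
              loss="mean_squared_error")
stop = keras.callbacks.EarlyStopping(patience=200,
                                     restore_best_weights=True)
model.fit(X_train, y_train, validation_split=0.2,
          epochs=1000, batch_size=30, callbacks=[stop])
```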

4. Experimental Setup

This section presents the experimental setup and details of the metaheuristic parameter tuning and training steps. We selected Parameter Iterated Local Search (ParamILS), proposed by [3], for the metaheuristic parameter tuning step, and the state-of-the-art VRP solver Hybrid Genetic Search (HGS) [4] as the target metaheuristic. Here, we used the PyVRP implementation [5].
For our experiments, we split the Homberger instance set into a training and a testing set. The training set contains 240 of the 300 Homberger instances (80%), with the remaining 60 instances belonging to the test set. The training set was used to perform the parameter tuning and model training, while the testing set (the remaining 20%) was used to evaluate the models’ predictions. Due to the limited size of the Homberger instance set, we manually sampled the six subgroups of instances to construct a representative testing set that can then be used to evaluate our models’ generalization capabilities.
All experiments were executed on a PC with an Intel Xeon E5-2680 v3 processor @ 2.50 GHz and 64 GB of RAM, under an Ubuntu x64 16.10 distribution.

4.1. HGS

The Hybrid Genetic Search algorithm [4] was first proposed as an algorithmic framework to address three VRP variants: multi-depot VRP, periodic VRP, and multi-depot periodic VRP with capacitated vehicles and constrained route duration. In [6], this framework was extended to incorporate time-constrained VRPs, including the VRP with time windows. PyVRP is an open-source, state-of-the-art VRP solver that implements HGS in C++ through a Python interface. The pseudocode in Algorithm 1 shows the structure of the Hybrid Genetic Search algorithm as implemented in PyVRP [5]. HGS is an evolutionary algorithm that manages both a feasible and an infeasible population. In each iteration, crossover and a local search procedure generate a novel solution. A simple Ordered Crossover (OX) is used [31]. During local search, a neighborhood improvement operator is systematically applied, followed by a repair phase when the resulting solution is infeasible. Repair increases the penalty values by a factor of 10 and calls the improvement operator again; the process is repeated with a penalty increase of 100 if the offspring remains infeasible. Three main neighborhoods are evaluated: swap and relocate, 2-opt inter-route, and 2-opt intra-route. HGS attempts to maintain a fixed proportion of feasible to infeasible solutions by repairing infeasible ones. Parent solutions are selected based on their quality and their contribution to population diversity. Whenever the population size exceeds a threshold, solutions are removed until the minimum allowed size is reached. The objective of the algorithm is to find a feasible solution with the minimum total travel time.
Algorithm 1 Hybrid Genetic Search (HGS).
Initialize population
while number of iterations without improvement < $It_{NI}$ and time < $T_{max}$ do
    Select parent solutions $P_1$ and $P_2$;
    Generate offspring C by crossing $P_1$ and $P_2$;
    Educate offspring C using local search;
    if C is infeasible then
        Insert C into infeasible subpopulation;
        Repair with probability $P_{rep}$;
    end if
    if C is feasible then
        Insert C into feasible subpopulation;
    end if
    if maximum subpopulation size is reached then
        Select survivors;
    end if
    Adjust penalty parameters for violating feasibility conditions
    if best solution has not been improved during $It_{div}$ iterations then
        Diversify population;
    end if
end while
Return best feasible solution;

4.2. Metaheuristic Parameter Tuning Step Details

In this step, we execute a tuning algorithm to obtain suitable parameter values for the HGS metaheuristic. A tuning process is individually executed on each problem instance considered in the training set. We record the best parameter configuration found per problem instance. Different time budgets for the target algorithm can require different parameter values [32,33]. Hence, in our experimental setup, we consider different computational efforts for the metaheuristic parameter tuning step in order to explore their effect on the prediction capabilities of the proposal.
This work uses ParamILS [3], a well-known iterated local search tuning method, to tune HGS. From an initial parameter configuration, ParamILS searches its neighborhood by changing one parameter value at a time. A discretization of parameter values is required to define reduced neighborhoods. ParamILS defines a specialized domination-based quality comparison considering both the average performance and the number of executions of each parameter configuration on a set of problem instances/seeds.
We evaluate three parameter tuning scenarios, considering three runtime limits for HGS: 10, 30, and 60 s. The objective is to analyze HGS’ convergence capability for different execution scenarios and its possible effects on the prediction of suitable configurations. For each time setting, we performed 10 independent runs of ParamILS per problem instance in the training set. We then chose the configuration that provided the best results according to ParamILS evaluation. The budget for each ParamILS execution was set to a maximum of 1000 evaluations of HGS. The quality of each HGS execution is measured as the total travel time of the best feasible solution found. Hence, ParamILS searches for the parameter configuration that allows HGS to find feasible solutions with the minimum travel times. The total tuning time to individually tune HGS with the 240 training instances and the three runtime limits was roughly 1400 h of computer time using 40 threads.
In our experiments, we tuned five parameters of HGS:
  • $\mu$: Minimum population size; lower bound on the number of solutions that are allowed to exist in either the feasible or infeasible subpopulation.
  • $\lambda$: Generation size; the number of solutions above $\mu$ allowed to exist. $\lambda$ solutions are eliminated through a survivor selection phase once the maximum number is reached.
  • $n_{Elite}$: Number of elite solutions considered in the fitness calculation.
  • $n_{Closest}$: Number of close solutions considered in the diversity-contribution measure of the solution fitness.
  • $\xi_{REF}$: Target proportion of feasible individuals with respect to the total population size.
While PyVRP’s HGS implementation features a few additional parameters for modulating the penalty manager, only the previous five are taken into account in our experiments, as they are the algorithm’s most influential parameters.
The parameter search space that ParamILS will search through is summarized in Table 3.
The bold values show the best parameter values defined in [5]. The ranges of values for each parameter were obtained from the original paper [4].

5. Results

In this section, we present the results of HGS considering the following parameter vectors:
A. Base parameter values for the HGS metaheuristic (as defined in [5]).
B. Parameter values predicted by KNN with $K \in \{1, 3, 5, 7, 9, 11\}$.
C. Parameter values predicted by the artificial neural network.
D. Parameter values sampled from a random distribution.
The set of base parameter values (A) is constant for all instances, as specified in the literature. We evaluate six different values of K for KNN: K = 1 borrows the parameters from the instance’s most similar neighbor, while K = 3 through K = 11 take the k nearest neighbors and average their parameter vectors. Interestingly, even though the tuned values are restricted to a discrete set (due to the discretization required by ParamILS), the KNN prediction can produce new values from the averages of the corresponding k neighbors’ values. In this way, the discretization used during tuning and the averaging performed at prediction complement each other: discretization reduces the tuning effort, while averaging recovers the ability to select intermediate parameter values.
We also consider a set of random parameter values (D) to test the metaheuristic’s sensitivity to parameter selection. Parameter values were randomly sampled for each instance, using a uniform distribution covering the discrete parameter space used for parameter prediction. The following results consider the testing set of instances. For each instance-vector pair, we perform ten independent executions of HGS to reduce the effect of stochasticity on the results. For each parameter vector, the metaheuristic was executed with runtime limits of 10, 30, and 60 s.
Table 4 shows the results of evaluating the predicted parameter configurations on HGS. We show the average quality value obtained in each tuning scenario tested for each method (Base, KNN, Neural Network, and Random) and the subgroups of instances (C1, C2, R1, R2, RC1, and RC2). The last three columns summarize the method’s performance on the set of Homberger problem instances. Bold values show the best performance per instance subgroup, and an asterisk indicates the worst values per set.
The results show that, for the 10 s execution time, the algorithm suggesting the parameter configurations that produce the best HGS performance in the largest number of subgroups was KNN with K = 7; these best results were obtained in the C1 and C2 subgroups. For the 30 s time budget, KNN with K = 1 performs best in two subgroups (C1 and RC1). On the other hand, KNN with K = 5 suggested the best parameter configurations for the 60 s time budget.
Considering the algorithms related to the lowest quality performance of HGS, for 10 s, the base configuration and KNN5 produce the worst results in two subgroups each (C1 and RC1, and C2 and R1, respectively). For 30 and 60 s, the random configuration produces the worst performance in five subgroups.
Analyzing the average of all quality values, KNN1 has the best average quality value for 10 s and KNN11 the worst. Similar behavior occurs for the 30 s time budget, where KNN1 has the best average quality value and KNN7 the worst. The base configuration reaches the best average quality value for the 60 s time budget, and KNN1 obtains the worst one.
Regarding the performance of the ANN, it is interesting to observe that its predicted parameter values are neither the worst nor the best for any subgroup of instances. However, compared to the base parameter set, for 10, 30, and 60 s of HGS execution, the ANN suggested better configurations in five out of six subgroups of instances. Compared to the KNN approaches, the ANN does not obtain better configurations on average. This situation is due to the low number of features obtained from the instances, which limits the ANN’s ability to generalize and suggest configurations for unseen instances.
The quality of the predicted parameters depends on the runtime limit, as seen in Figure 9. For a 10 s time limit, KNN with one neighbor provides the best result, with a 4% improvement over the base set. For a 30 s time limit, KNN1, KNN9, and KNN11 all provide similarly good results, though the difference between these parameter sets and the base set is nearly negligible. Finally, with a 60 s time limit, the base parameter set provides the best results, with all predicted parameter set evaluations landing within 2% of each other. When considering the three time limits together, KNN1 provides the best average results (see Figure 10).
Figure 11 shows the loss curves for the training and validation sets during the training process of the neural network that uses the data of all the problem instances. Typical learning behavior can be observed in these curves: starting from a high loss value, the process converges to low loss levels for both the training and validation data sets.
The main issue concerning predicting a metaheuristic’s optimal parameter configuration for a given instance is the stochastic nature of the tuning algorithm and the metaheuristic itself. When constructing a model to map an instance’s features to its optimal parameter vector, we must consider that an instance deterministically produces a feature vector but may produce different parameter vectors depending on the execution of the tuning method and the metaheuristic algorithm.
A significant disadvantage of our proposal is that it requires an extended period of parameter tuning to build the training data. As a tuning algorithm must run the metaheuristic thousands of times to obtain a result, increasing the metaheuristic’s execution time produces a multiplicative growth in parameter tuning time. Given the size of our improvement, computation time would likely be better spent increasing HGS’s run time. It is important to note that the base parameters used for comparison here were sourced from a per-instance tuning process whose duration was not reported, and which tuned the algorithm for run times ranging from 4 to 40 min.
The true efficiency of machine learning models is realized when we have overwhelmingly large amounts of training data at our disposal. In the theoretical scenario of having a widely spanning set of instances, the potential for parameter prediction through machine learning models could be a game-changer in tackling intractable problems. This is particularly beneficial if quick and suitable solutions are required.

Explainability Analysis

By performing an explainability analysis on our neural network model, we may determine which of our extracted features strongly correlate with the predicted parameters of HGS. We chose to apply this analysis to the neural network over the KNN models, as KNN is a relatively transparent algorithm, while ANNs function as black box models.
We applied SHapley Additive exPlanations (SHAP) [34] to our three neural network models (for 10, 30, and 60 s time limits). We considered an average of the three time limit results, to ease the visualization of the results. It is important to mention that this averaging did not substantially affect the ordering of features, as all three models presented similar results. SHAP values were computed with a sample size of 1000. Figure 12 and Figure 13 show our extracted features ordered by relevance for each predicted parameter. The SHAP charts show that the features at the top are the ones that most strongly impact model output for a particular parameter. The strength of said impact is proportional to the dispersion of points such that low-impact features will present many points clustered around the center axis, and high-impact features will present values clustered far from it. We note that some clustering features, such as the outlier ratio, cluster ratio, and CV of nearest neighbor distances (cv_NN_distances), are the most relevant for predicting the generation size parameter (see Figure 12a), though most features perform similarly well, as can be seen from each feature’s spread.
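A minimal sketch of how such SHAP values could be computed for one of the trained networks follows, assuming model is the Keras network, X the normalized feature matrix, and feature_names the list of feature labels; KernelExplainer is one model-agnostic choice, not necessarily the exact explainer used here:

```python
import shap

# Background data summarizes the feature distribution for the explainer.
background = shap.sample(X, 100)
explainer = shap.KernelExplainer(model.predict, background)
# The paper reports a SHAP sample size of 1000.
shap_values = explainer.shap_values(X, nsamples=1000)
# For multi-output models, shap_values holds one array per predicted
# HGS parameter; plot one beeswarm summary per parameter.
shap.summary_plot(shap_values[0], X, feature_names=feature_names)
```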
Both the number of close solutions (nc) and the number of elite solutions (ne) seem to be strongly related to the capacity features (the ratios of the CV and mean of client demand w.r.t. capacity), with both features strongly inversely correlated to these parameters, as can be seen in Figure 12b,c. Strong results can also be seen for features such as the CV of nearest neighbor distances, cluster ratio, and client number.
For the prediction of the minimum population size parameter (ps), Figure 12d shows the CV of nearest neighbor distances, client number, and outlier ratio features to be most relevant. Moreover, when predicting the value for the target proportion of feasible solutions (xi), the model’s output is most strongly impacted by the outlier ratio, client number, cluster ratio, and both capacity features (ratio of the CV and mean of client demand w.r.t capacity).
The features most frequently relevant across parameters are the cluster ratio, outlier ratio, client number, and both capacity features, which seem to have the largest impact on overall parameter prediction. Regarding the time-window-related features, we observe that some rank among the ten most important, alongside the classical spatial and clustering features. The time-window features appear among the top ten for gs (sixth, ninth, and tenth place), nc (fifth, sixth, seventh, and tenth place), ne (eighth place), and xi (sixth and seventh place).

6. Conclusions

This work proposes a methodology that uses ML algorithms to suggest suitable parameter configurations based on problem instance features and the values provided by a well-known tuning algorithm. Here, we use clustering algorithms to obtain instance features of the CVRPTW, HGS as the target algorithm, ParamILS to tune HGS, and KNN and ANN algorithms to predict configurations. Results show that the parameter prediction was successful, reducing the average travel time on Homberger instances compared to the base parameters proposed in [5]. It is important to mention that more accurate predictions would require a more diverse set of instances, each of which may require specific parameter configurations that machine learning algorithms could learn. An explainability analysis was performed to determine the relevance of the extracted features to parameter prediction. We determined that client clustering, client number, and vehicle capacity are the most relevant features when predicting HGS’s optimal parametrization. In addition, features related to the time windows were among the top ten most relevant for four of the five HGS parameters.
Future work we are interested in includes:
(A) Constructing a wider-spanning set of CVRPTW instances using an evolutionary algorithm, as mentioned in [25], to maximize the variance of extracted features in the instance set.
(B) Extending the methodology proposed here to other relevant combinatorial optimization problems, such as the classical Quadratic Assignment Problem [35] or job-shop scheduling applications [36]. Some more complex extensions of classical problems could also be interesting to consider [37].
(C) Applying this parameter prediction pipeline to tune the parameters of a metaheuristic for dynamically changing instances.

Author Contributions

Conceptualization, T.B.-E. and E.M.; methodology, E.M. and N.R.-M.; software, T.B.-E.; validation, T.B.-E.; formal analysis, T.B.-E., E.M. and N.R.-M.; investigation, T.B.-E., E.M. and N.R.-M.; resources, T.B.-E. and E.M.; data curation, T.B.-E.; writing—original draft preparation, T.B.-E., E.M. and N.R.-M.; writing—review and editing, E.M. and N.R.-M.; visualization, T.B.-E.; supervision, E.M. and N.R.-M.; project administration, E.M. and N.R.-M.; funding acquisition, E.M. and N.R.-M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Fund for Scientific and Technological Development (FONDECYT) [grant numbers 1230365, 11230748].

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Dataset available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Elshaer, R.; Awad, H. A taxonomic review of metaheuristic algorithms for solving the vehicle routing problem and its variants. Comput. Ind. Eng. 2020, 140, 106242. [Google Scholar] [CrossRef]
  2. Talbi, E.G. Machine learning into metaheuristics: A survey and taxonomy. ACM Comput. Surv. (CSUR) 2021, 54, 1–32. [Google Scholar] [CrossRef]
  3. Hutter, F.; Hoos, H.H.; Stützle, T. Automatic Algorithm Configuration based on Local Search. In Proceedings of the Twenty-Second Conference on Artificial Intelligence, Vancouver, BC, Canada, 22–26 July 2007. [Google Scholar]
  4. Vidal, T.; Crainic, T.G.; Gendreau, M.; Lahrichi, N.; Rei, W. A Hybrid Genetic Algorithm for Multidepot and Periodic Vehicle Routing Problems. Oper. Res. 2012, 60, 611–624. [Google Scholar] [CrossRef]
  5. Vidal, T. Hybrid genetic search for the CVRP: Open-source implementation and SWAP* neighborhood. Comput. Oper. Res. 2022, 140, 105643. [Google Scholar] [CrossRef]
  6. Vidal, T.; Crainic, T.G.; Gendreau, M.; Prins, C. A hybrid genetic algorithm with adaptive diversity management for a large class of vehicle routing problems with time-windows. Comput. Oper. Res. 2013. [Google Scholar] [CrossRef]
  7. Gasse, M.; Bowly, S.; Cappart, Q.; Charfreitag, J.; Charlin, L.; Chételat, D.; Chmiela, A.; Dumouchelle, J.; Gleixner, A.; Kazachkov, A.M.; et al. The Machine Learning for Combinatorial Optimization Competition (ML4CO): Results and Insights. In Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, Online, 6–14 December 2021; pp. 1–12. [Google Scholar]
  8. Bonami, P.; Lodi, A.; Zarpellon, G. Learning a Classification of Mixed-Integer Quadratic Programming Problems. In Integration of Constraint Programming, Artificial Intelligence, and Operations Research; Springer: Cham, Switzerland, 2018; pp. 595–604. [Google Scholar]
  9. Kadioglu, S.; Malitsky, Y.; Sellmann, M.; Tierney, K. ISAC—Instance-Specific Algorithm Configuration. In ECAI 2010; Ios Press: Amsterdam, The Netherlands, 2010; Volume 215, pp. 751–756. [Google Scholar] [CrossRef]
  10. Karimi-Mamaghan, M.; Mohammadi, M.; Meyer, P.; Karimi-Mamaghan, A.M.; Talbi, E.G. Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: A state-of-the-art. Eur. J. Oper. Res. 2022, 296, 393–422. [Google Scholar] [CrossRef]
  11. Gunawan, A.; Lau, H.C.; Misir, M. Designing a portfolio of parameter configurations for online algorithm selection. In Proceedings of the Algorithm Configuration: Papers from the 2015 AAAI Workshop, Austin, TX, USA, 25–29 January 2015; pp. 2–8. [Google Scholar]
  12. Arnold, F.; Sörensen, K. What makes a VRP solution good? The generation of problem-specific knowledge for heuristics. Comput. Oper. Res. 2019, 106, 280–288. [Google Scholar] [CrossRef]
  13. Rasku, J.; Kärkkäinen, T.J.; Musliu, N. Feature Extractors for Describing Vehicle Routing Problem Instances. In Proceedings of the Student Conference on Operational Research, Nottingham, UK, 8–10 April 2016. [Google Scholar] [CrossRef]
  14. Caserta, M.; Rico, E.Q.n. A cross entropy-Lagrangean hybrid algorithm for the multi-item capacitated lot-sizing problem with setup times. Comput. Oper. Res. 2009, 36, 530–548. [Google Scholar] [CrossRef]
  15. Dobslaw, F. A parameter-tuning framework for metaheuristics based on design of experiments and artificial neural networks. World Acad. Sci. Eng. Technol. Int. J. Aerosp. Mech. Eng. 2010, 64, 213–216. [Google Scholar]
  16. Pavón, R.; Díaz, F.; Laza, R.; Luzón, V. Automatic parameter tuning with a Bayesian case-based reasoning system. A case of study. Expert Syst. Appl. 2009, 36, 3407–3420. [Google Scholar] [CrossRef]
  17. Yasmin, A.; Haider Butt, W.; Daud, A. Ensemble effort estimation with metaheuristic hyperparameters and weight optimization for achieving accuracy. PLoS ONE 2024, 19, e0300296. [Google Scholar] [CrossRef] [PubMed]
  18. Narayanan, R.; Ganesh, N. A Comprehensive Review of Metaheuristics for Hyperparameter Optimization in Machine Learning. In Metaheuristics for Machine Learning: Algorithms and Applications; Wiley Online Library: Hoboken, NJ, USA, 2024; pp. 37–72. [Google Scholar]
  19. Tayebi, M.; El Kafhali, S. Performance analysis of metaheuristics based hyperparameters optimization for fraud transactions detection. Evol. Intell. 2024, 17, 921–939. [Google Scholar] [CrossRef]
  20. Vivek, B. Exploring The Efficiency of Metaheuristics in Optimal Hyperparameter Tuning for Ensemble Models on Varied Data Modalities. EAI Endorsed Trans. Intell. Syst. Mach. Learn. Appl. 2024, 1. [Google Scholar] [CrossRef]
  21. Gehring, H.; Homberger, J. A Parallel Hybrid Evolutionary Metaheuristic for the Vehicle Routing Problem with Time Windows. In Proceedings of EUROGEN99; Springer: Berlin/Heidelberg, Germany, 1999. [Google Scholar]
  22. Ankerst, M.; Breunig, M.; Kröger, P.; Sander, J. OPTICS: Ordering Points to Identify the Clustering Structure. In ACM SIGMOD Record; Association for Computing Machinery: New York, NY, USA, 1999; Volume 28, pp. 49–60. [Google Scholar] [CrossRef]
  23. Shi, J.; Malik, J. Normalized Cuts and Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 888–905. [Google Scholar]
  24. Salawudeen, A.; Akut, E.E.; Momoh, I.; Ibrahim, A.; Zion, M.; Yusuf, S. Depot Location Analysis for Capacitated Vehicle Routing Problem: A Case Study of Solid Waste Management. IJEEC-Int. J. Electr. Eng. Comput. 2020, 4, 132. [Google Scholar] [CrossRef]
  25. Smith-Miles, K.; van Hemert, J. Discovering the suitability of optimisation algorithms by learning from evolved instances. Ann. Math. Artif. Intell. 2011, 61, 87–104. [Google Scholar] [CrossRef]
  26. Fix, E.; Hodges, J.L. Discriminatory Analysis - Nonparametric Discrimination: Consistency Properties. Int. Stat. Rev. 1989, 57, 238. [Google Scholar] [CrossRef]
  27. Kouiroukidis, N.; Evangelidis, G. The Effects of Dimensionality Curse in High Dimensional kNN Search. In Proceedings of the 2011 15th Panhellenic Conference on Informatics, Kastoria, Greece, 30 September–2 October 2011; pp. 41–45. [Google Scholar] [CrossRef]
  28. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  29. McCulloch, W.S.; Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 1943, 5, 115–133. [Google Scholar] [CrossRef]
  30. Chollet, F. Keras. 2015. Available online: https://github.com/fchollet/keras (accessed on 1 June 2024).
  31. Prins, C. A simple and effective evolutionary algorithm for the vehicle routing problem. Comput. Oper. Res. 2004, 31, 1985–2002. [Google Scholar] [CrossRef]
  32. López-Ibáñez, M.; Stützle, T. Automatically improving the anytime behaviour of optimisation algorithms. Eur. J. Oper. Res. 2014, 235, 569–582. [Google Scholar] [CrossRef]
  33. Sae-Dan, W.; Kessaci, M.E.; Veerapen, N.; Jourdan, L. Time-dependent automatic parameter configuration of a local search algorithm. In Proceedings of the 2020 Genetic and Evolutionary Computation Conference Companion, New York, NY, USA, 8–12 July 2020; pp. 1898–1905. [Google Scholar] [CrossRef]
  34. Lundberg, S.M.; Lee, S. A unified approach to interpreting model predictions. arXiv 2017, arXiv:1705.07874. [Google Scholar]
  35. Santucci, V.; Ceberio, J. Doubly Stochastic Matrix Models for Estimation of Distribution Algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference, New York, NY, USA, 15–19 July 2023; GECCO ’23. pp. 367–374. [Google Scholar] [CrossRef]
  36. Benni, R.; Umarani, S.R.; Totad, S. A Comprehensive Study of Meta-Heuristic Algorithms for Job Shop Scheduling Optimization. In Proceedings of the 2024 15th International Conference on Computing Communication and Networking Technologies (ICCCNT), Kamand, India, 24–28 June 2024; pp. 1–10. [Google Scholar] [CrossRef]
  37. Santucci, V. An Iterative Optimization Algorithm for Planning Spacecraft Pathways Through Asteroids. Appl. Sci. 2024, 14, 10987. [Google Scholar] [CrossRef]
Figure 1. Scheme of the Capacitated Vehicle Routing Problem with Time Windows.
Figure 2. dynamicOPTICS clustering of the Homberger instances. Black points are outliers that do not belong to any cluster. The corresponding min_samples value is displayed above each problem instance.
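The clustering in Figure 2 relies on OPTICS [22] with an instance-dependent min_samples value. The following is a minimal sketch, not the authors' dynamicOPTICS implementation, of how such a clustering of client coordinates could be reproduced with scikit-learn [28]; the coordinates, the min_samples sweep, and the outlier-ratio selection criterion are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import OPTICS

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))  # stand-in for client (x, y) positions

# Sweep min_samples and keep the value with the lowest outlier ratio,
# mimicking an instance-dependent ("dynamic") choice of min_samples.
best = None
for min_samples in range(3, 15):
    labels = OPTICS(min_samples=min_samples).fit_predict(coords)
    outlier_ratio = float(np.mean(labels == -1))  # OPTICS labels outliers as -1
    if best is None or outlier_ratio < best[1]:
        best = (min_samples, outlier_ratio, labels)

min_samples, outlier_ratio, labels = best
print(f"min_samples={min_samples}, outlier ratio={outlier_ratio:.2f}, "
      f"clusters={labels.max() + 1}")
```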
Figure 3. Clustering of Homberger instances with three clusters, using only spatial features.
Figure 4. Clustering of Homberger instances with three clusters, using only demand-related features.
Figure 5. Clustering of Homberger instances with respect to client-clustering features.
Figure 6. Clustering of Homberger instances with three clusters, using only client-clustering features.
Figure 7. Clustering of Homberger instances with four clusters, using all features.
Figure 8. Design of the artificial neural network used to predict parameter configurations for HGS. Each box corresponds to a layer of the ANN, and includes information on its activation function and input and output dimensionality.
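As a companion to Figure 8, the following is a minimal sketch in Keras [30] of a feed-forward network mapping instance features to the five HGS parameters. The feature count, layer widths, and activations are assumptions for illustration; the exact architecture is the one specified in the figure.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_features = 10   # assumed number of extracted instance features
n_parameters = 5  # lambda, nClosest, nElite, mu, xi_REF

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(n_parameters, activation="linear"),  # one output per parameter
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```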
Figure 9. Bar plot of the performance of HGS using the suggested parameter configurations. For each parameter set, the average results over ten runs are presented for each running-time scenario. Blue bars show results for 10 s, orange bars for 30 s, and green bars for 60 s.
Figure 10. Bar plot of the performance of HGS using the suggested parameter configurations. The average results over the three running-time scenarios are presented.
Figure 11. Training and validation loss curves.
Figure 12. SHAP values. (a) Generation size (gs); (b) Number of close solutions (nc); (c) Number of elite solutions (ne); (d) Population size (ps).
Figure 13. SHAP values for the target proportion of feasible solutions (ξ).
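The SHAP plots of Figures 12 and 13 can be produced with the shap package [34]. The sketch below uses random stand-in data and a tree-based regressor, for which an exact explainer is available; explaining the actual ANN would instead require a model-agnostic explainer such as shap.KernelExplainer.

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))  # stand-in for instance feature vectors
y = rng.normal(size=100)        # stand-in for tuned values of one parameter

model = RandomForestRegressor(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)   # exact SHAP values for tree models
shap_values = explainer.shap_values(X)
shap.summary_plot(shap_values, X)       # beeswarm plot like Figures 12 and 13
```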
Table 1. Grouping of instance subsets by client distribution and vehicle capacity.
Set | Client Distribution | Capacity
C1  | Densely Clustered   | 200
C2  | Sparsely Clustered  | 700
R1  | Random              | 200
R2  | Random              | 1000
RC1 | Semi-Clustered      | 200
RC2 | Semi-Clustered      | 1000
Table 2. Comparison of cluster centroids considering k = 4, using all extracted features. Capacity* is measured as the inverse of ratio_mean_client_demand_capacity and was included to facilitate interpretation.
Cluster | Capacity* | NN Distance | Ratio | Outlier Ratio | Clients per Cluster | CV of Clients per Cluster | Intra-Cluster Distance | Inter-Cluster Distance
1 | Low  | Low  | Low  | Low  | Low  | Low  | Low  | Low
2 | Low  | High | High | High | Low  | High | Low  | Low
3 | High | High | High | High | Low  | High | Low  | Low
4 | High | High | Low  | Low  | High | Low  | High | High
Table 3. Discretization of the tuned parameters. The parameter values suggested in [5] are shown in bold.
Parameter | Values
λ        | 10, 20, 40, 70, 100
nClosest | 0.1, 0.2, 0.4, 0.6, 0.8
nElite   | 0.1, 0.2, 0.4, 0.6, 0.8
μ        | 5, 15, 25, 35, 45
ξ_REF    | 0.1, 0.2, 0.4, 0.6, 0.8
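For reference, the discretized search space of Table 3 can be written out programmatically. The sketch below, with parameter names transliterated from the table's notation, enumerates the 5^5 = 3125 candidate configurations available to a tuner.

```python
from itertools import product

# Discretized HGS parameter space from Table 3 (names transliterated).
param_space = {
    "lambda":   [10, 20, 40, 70, 100],      # generation size
    "nClosest": [0.1, 0.2, 0.4, 0.6, 0.8],  # number of close solutions
    "nElite":   [0.1, 0.2, 0.4, 0.6, 0.8],  # number of elite solutions
    "mu":       [5, 15, 25, 35, 45],        # population size
    "xi_REF":   [0.1, 0.2, 0.4, 0.6, 0.8],  # target proportion of feasible solutions
}

configurations = list(product(*param_space.values()))
print(f"{len(configurations)} candidate configurations")  # 5^5 = 3125
```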
Table 4. Prediction performance comparison. For each method and each instance set, we show the average quality obtained in each tuning scenario tested. The last three columns summarize each method's performance over the whole set of Homberger problem instances. Bold values show the best performance per instance subgroup; the worst values per set are marked with an asterisk (*).
Parameter Set | Instances | 10 [s] | 30 [s] | 60 [s] | Avg. 10 [s] | Avg. 30 [s] | Avg. 60 [s]
Base | C1 | 14,934 * | 14,686 | 14,608 | 16,005 | 15,366 | 14,920
 | C2 | 11,761 | 10,876 * | 10,514 | | |
 | R1 | 26,336 | 24,993 | 23,344 | | |
 | R2 | 15,220 | 14,638 | 14,363 | | |
 | RC1 | 16,523 * | 16,240 | 16,144 | | |
 | RC2 | 11,255 | 10,762 | 10,548 | | |
KNN1 | C1 | 14,728 | 14,572 | 14,543 | 15,645 | 15,231 | 15,228 *
 | C2 | 11,100 | 10,532 | 10,505 | | |
 | R1 | 26,005 | 25,216 | 25,529 * | | |
 | R2 | 14,850 | 14,377 | 14,144 | | |
 | RC1 | 16,278 | 16,076 | 16,048 | | |
 | RC2 | 10,908 | 10,610 | 10,597 | | |
KNN3 | C1 | 14,755 | 14,579 | 14,547 | 15,768 | 15,630 | 15,031
 | C2 | 11,202 | 10,550 | 10,448 | | |
 | R1 | 26,515 | 27,641 | 24,542 | | |
 | R2 | 14,904 | 14,349 | 14,123 | | |
 | RC1 | 16,307 | 16,083 | 16,051 | | |
 | RC2 | 10,925 | 10,576 | 10,473 | | |
KNN5 | C1 | 14,732 | 14,575 | 14,535 | 15,648 | 15,620 | 15,094
 | C2 | 11,061 * | 10,533 | 10,418 | | |
 | R1 | 26,054 * | 27,620 | 25,028 | | |
 | R2 | 14,816 | 14,335 | 14,128 | | |
 | RC1 | 16,303 | 16,093 | 16,031 | | |
 | RC2 | 10,920 | 10,563 | 10,422 | | |
KNN7 | C1 | 14,727 | 14,585 | 14,531 | 15,882 | 15,631 * | 15,121
 | C2 | 11,018 | 10,522 | 10,440 | | |
 | R1 | 27,485 | 27,654 * | 25,092 | | |
 | R2 | 14,875 | 14,348 | 14,175 | | |
 | RC1 | 16,295 | 16,108 | 16,023 | | |
 | RC2 | 10,892 | 10,566 | 10,466 | | |
KNN9 | C1 | 14,763 | 14,588 | 14,512 | 15,924 | 15,233 | 15,016
 | C2 | 11,205 | 10,529 | 10,432 | | |
 | R1 | 27,493 | 25,242 | 24,576 | | |
 | R2 | 14,868 | 14,379 | 14,140 | | |
 | RC1 | 16,312 | 16,100 | 15,982 | | |
 | RC2 | 10,905 | 10,559 | 10,452 | | |
KNN11 | C1 | 14,771 | 14,595 | 14,523 | 16,017 * | 15,249 | 14,938
 | C2 | 11,176 | 10,536 | 10,425 | | |
 | R1 | 27,951 | 25,254 | 24,082 | | |
 | R2 | 15,042 | 14,454 | 14,163 | | |
 | RC1 | 16,349 | 16,113 | 16,003 | | |
 | RC2 | 10,815 | 10,544 | 10,434 | | |
Neural Network | C1 | 14,742 | 14,591 | 14,526 | 15,755 | 15,414 | 14,964
 | C2 | 11,149 | 10,560 | 10,431 | | |
 | R1 | 26,482 | 26,196 | 24,064 | | |
 | R2 | 14,854 | 14,394 | 14,305 | | |
 | RC1 | 16,337 | 16,125 | 16,021 | | |
 | RC2 | 10,967 | 10,618 | 10,436 | | |
Random | C1 | 14,886 | 14,705 * | 14,617 * | 15,769 | 15,392 | 14,988
 | C2 | 11,716 | 10,952 * | 10,587 * | | |
 | R1 | 24,951 | 24,664 | 23,415 | | |
 | R2 | 15,271 * | 14,667 * | 14,493 * | | |
 | RC1 | 16,456 | 16,299 * | 16,173 * | | |
 | RC2 | 11,333 * | 11,065 * | 10,641 * | | |
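The KNN rows of Table 4 correspond to predictions with k ∈ {1, 3, 5, 7, 9, 11}. The following is a minimal sketch, assuming scikit-learn's KNeighborsRegressor [26,28], of how such per-k parameter suggestions could be generated for an unseen instance; the feature vectors and tuned configurations are random stand-ins.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 10))  # feature vectors of tuned instances (stand-ins)
y_train = rng.uniform(size=(50, 5))  # tuned values of the five HGS parameters (stand-ins)
x_new = rng.normal(size=(1, 10))     # feature vector of an unseen instance

scaler = StandardScaler().fit(X_train)
X_scaled = scaler.transform(X_train)
for k in (1, 3, 5, 7, 9, 11):        # the k values compared in Table 4
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_scaled, y_train)
    config = knn.predict(scaler.transform(x_new))[0]
    print(f"KNN{k}: suggested configuration = {np.round(config, 2)}")
```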