Article

Q-Learning-Driven Butterfly Optimization Algorithm for Green Vehicle Routing Problem Considering Customer Preference

1
College of Artificial Intelligence, Guangxi Minzu University, Nanning 530006, China
2
Guangxi Key Laboratories of Hybrid Computation and IC Design Analysis, Nanning 530006, China
*
Author to whom correspondence should be addressed.
Biomimetics 2025, 10(1), 57; https://doi.org/10.3390/biomimetics10010057
Submission received: 7 January 2025 / Revised: 13 January 2025 / Accepted: 14 January 2025 / Published: 15 January 2025

Abstract

This paper proposes a Q-learning-driven butterfly optimization algorithm (QLBOA) that integrates the Q-learning mechanism of reinforcement learning into the butterfly optimization algorithm (BOA). To improve the overall optimization ability of the algorithm, enhance its accuracy, and prevent it from falling into local optima, a Gaussian mutation mechanism with dynamic variance is introduced, and a migration mutation mechanism is also used to enhance the population diversity of the algorithm. Eighteen benchmark functions were used to compare the proposed method with five classical metaheuristic algorithms and three BOA variants. The QLBOA was then applied to the green vehicle routing problem with time windows considering customer preferences. The influence of decision makers' subjective preferences and weight factors on fuel consumption, carbon emissions, penalty cost, and total cost is analyzed. Compared with three classical optimization algorithms, the experimental results show that the proposed QLBOA has a generally superior performance.

1. Introduction

With the ongoing increase in market size, the logistics and transportation business’s position in social production activities has become more prominent. With the spread of the notion of green environmental protection, public awareness of greenhouse gas emissions in transportation has grown. However, in pursuing cost-cutting and efficiency improvements, logistics companies frequently overlook the possible environmental impact of their operations. Carbon and nitrogen oxide emissions, in particular, contribute to global climate change while also endangering human health and ecosystems. Thus, encouraging fuel-efficient vehicles and other emission-reducing technologies is essential to achieving the country’s low-carbon objective. In the area of urban distribution, daily traffic congestion in urban areas has a direct impact on the fuel consumption and carbon emissions of vehicles; in turn, this congestion state will raise vehicle fuel consumption and emissions, which will have an indirect impact on the financial and environmental costs associated with transportation [1].
The vehicle routing problem (VRP) involves a distribution center serving customers with goods in demand, meeting their needs along established routes within a specific range so as to achieve minimum cost, shortest distance, and minimum time consumption. The green vehicle routing problem (GVRP) is a developing research area defined by the optimization aims of lowering operating costs, reducing energy consumption and carbon emissions, and ensuring customer satisfaction while adhering to customer service needs and vehicle capacity limits. Economic and environmental benefits can be optimized by strategically planning vehicle departures, times, and routes [2]. The GVRP is thus a variant of the classic VRP that seeks to lessen the environmental impact, such as the fuel usage and carbon emissions generated by the distribution process. The VRP with time windows (VRPTW) additionally demands that customer service be completed within a given period. Researchers have classified the GVRP as an NP-hard problem, and several have investigated its use in reducing fuel usage and carbon emissions, applying metaheuristics to the fuel consumption problem. Internationally, research on the GVRP has yielded notable results. For instance, the pollution-routing problem (PRP), which considers the journey distance, greenhouse gas emissions, and travel time expenses, was first presented by Bektas et al. with the primary goal of lowering fuel consumption during vehicle operation; PRP mathematical models with and without time windows in mixed-integer form were also established [3]. A thorough emission model was presented by Barth et al. to determine vehicle fuel usage [4].
Demir and Bektas, drawing from the work of Bektas et al., examined a number of widely used emission models pertaining to fuel consumption and greenhouse gas emissions from road freight transportation, and compared the model outputs with data from real-world road usage [5]. Laporte proposed an adaptive large neighborhood search algorithm to solve the PRP; it employed both novel and pre-existing deletion and insertion strategies to enhance the quality of the solution, and a large number of experiments confirmed its efficacy [6]. To solve the multi-depot GVRP, Mehlawat et al. suggested a hybrid genetic algorithm and conducted experiments to confirm the technique's robustness [7]. Previous studies have shown that carbon dioxide (CO2) emissions are directly proportional to fuel consumption, a relationship significantly affected by vehicle speed. Franceschetti et al. first proposed the time-dependent pollution-routing problem (TDPRP), which extends the PRP by explicitly considering traffic congestion scenarios; the integer linear programming formulation of the TDPRP has also been described in detail [8]. Cimen et al. proposed a heuristic algorithm based on approximate dynamic programming to solve the time-dependent green vehicle routing problem (TDGVRP), considering the time dependence and randomness of vehicle speeds in time-varying road networks [9]. Kazemian et al. took greenhouse gas emissions and fuel consumption as the optimization objectives of the TDGVRP and reduced the complexity of the solution by transforming the problem with time windows into one without time windows, reducing carbon emissions and fuel consumption while controlling the total cost [10]. Rui Qi employed the QMOEA algorithm, which is based on Q-learning, to address the time-dependent green vehicle routing problem with time windows (TDGVRPTW) [11].
With carbon emissions as the minimum objective, Prakash et al. established a GVRP model with time window constraints [12]. Wang et al. proposed a bi-objective model to minimize total carbon emissions and operating costs and simultaneously implemented piecewise penalty costs for early and late arrivals to reduce waiting times and improve customer satisfaction, which was solved by the multi-objective Particle Swarm Optimization (MOPSO) algorithm [13]. Wu et al. set the sum of the driving cost, fuel consumption cost, time window penalty cost, and fixed vehicle usage cost as the optimization objective and established a time-varying GVRP model with vehicle capacity and time window constraints [14]. Zhang et al. established a mixed-integer linear programming model considering time-varying traffic conditions, customer time windows, and vehicle energy consumption functions and solved it using several different metaheuristic algorithms, which is convenient for practical application [15].
Based on a complete understanding of the environmental and logistics system properties of urban areas, a mathematical model is developed with the objective of minimizing the total cost, which includes the fixed cost of vehicles, the fuel consumption cost, the carbon emission cost, and the penalty cost. This paper examines the environmental challenges faced by road freight transport operators and proposes a systematic optimization approach in which the improved butterfly optimization algorithm (BOA) is combined with Q-learning. The effectiveness of the proposed method is verified by several computational experiments, whose results show that it effectively reduces the total distribution cost, carbon emissions, and fuel consumption while avoiding daily traffic congestion.
The butterfly optimization algorithm (BOA) is a novel metaheuristic optimization algorithm first proposed by Arora and Singh [16]. Smell is the sense that butterflies value most. The BOA offers the advantages of effectiveness and simplicity. The two stages of butterfly actions are separated by simple updating algorithms for both local search and global movement [17]. Moreover, the BOA can address issues instantly and requires fewer control parameters [18]. The algorithm’s advantages have spurred other academics to develop it. In order to address the problem of premature convergence, Elhoseny et al. used a COVID-19 dataset to assess the suggested approach after applying the BOA to a hybrid feature selection model [19]. The Flower Pollination Algorithm (FPA) and BOA were merged by Zhou et al. [20]. The multi-swarm binary BOA (MBBOA) was proposed by Nader et al. for the 0-1 MKP issue [21]. Mazaheri H. et al. introduced the concept of an intelligent throwing agent and proposed an efficient routing algorithm based on the BOA to calculate the shortest path and minimize the power of obstacle avoidance UAVs [22]. Chatterjee S. et al. proposed a population-based metaheuristic BOA to predict the secondary structure of ribonucleic acid (RNA) [23]. Bhanja S et al. proposed a fuzzy time-series optimization based on the BOA (FTSBO) algorithm to optimize all hyperparameters of the Type-2 fuzzy time-series (FTS) prediction method [24]. Alhassan A. M. et al. proposed a Gaussian Kernel Chaotic Butterfly Optimization Algorithm (TCBOAGK) to realize the automatic detection of affected areas in X-ray images [25]. Manickam S. et al. used the improved BOA (IBOA) for feature optimization and realized a convolutional neural network (CNN) fused with a long short-term memory network (LSTM) for voice identification [26].
Even though numerous academics have expanded the BOA procedure, it still has certain shortcomings. The traditional BOA cannot fully balance the global exploration and local exploitation in the iterative process and has the shortcomings of easily falling into local optima and having low solution accuracy and slow convergence speed [27,28]. To more effectively maintain the exploration and exploitation properties’ coordination and increase the optimization ability to avoid slipping into a local optimum, the BOA paired with Q-learning was developed, and this combination is known as the QLBOA.
The principal contributions of this work are summarized as follows:
(1) The BOA was fused with reinforcement learning, the reinforcement learning mechanism was introduced, and the butterfly update strategy was fused with dynamic Gaussian mutation. The variance of random state–action pairs and Gaussian random numbers were used to balance the exploration and exploitation of the algorithm.
(2) Species migration and mutation strategies were introduced to enhance the population diversity of the algorithm.
(3) Eighteen challenging benchmark functions in high dimensions and the CEC2022 test suite were chosen for the preliminary detection of several performance characteristics, including accuracy, convergence, and statistics.
(4) The green vehicle routing problem with time windows that take into account consumer needs is solved using the QLBOA. This study examines the impact of decision makers’ subjective preferences and weight variables on various optimization targets, the overall cost, and the comparison with other traditional metaheuristic algorithms.
This is how the rest of the paper is organized: A brief introduction to the BOA, Q-learning, and the adaptive Gaussian mutation mechanism is provided in Section 2. The QLBOA is thoroughly introduced in Section 3. Section 4 provides an in-depth examination of the initial simulation outcomes for reference functions. The integration of the suggested method for the green vehicle routing problem with consumer-focused time frames is shown in Section 5. Conclusions and future works are presented in Section 6.

2. Hybrid Mechanism Butterfly Optimization Algorithm

2.1. Butterfly Optimization Algorithm

The BOA was first proposed by Arora and Singh [16]. Smell is one of a butterfly's most vital senses: it is thought both to assist butterflies in finding food and to serve as a useful means of communication between them. According to the BOA, every butterfly emits a distinct scent whose strength is associated with its fitness value; in other words, the butterfly's fragrance changes as it moves. If a butterfly detects a stronger scent from another butterfly, it moves toward it; this technique is called global search. Otherwise, the butterfly enters a stage known as local search, where it moves randomly in an attempt to detect more scents.
The fragrance produced by a butterfly is modeled as follows:
f_i = c I^a        (1)
where f_i defines the scent intensity function; c represents the sensory modality coefficient; I represents the stimulus intensity, taken as the function's fitness value; and a is the power exponent in the range [0, 1].
The global search phase is shown in Equation (2):
x_i^{t+1} = x_i^t + (r^2 × g* − x_i^t) × f_i        (2)
where x_i^t signifies the solution vector of the ith individual in the tth iteration, r is a random number in [0, 1], g* represents the optimal solution among all solutions in the current iteration, and f_i represents the amount of scent created by the ith individual.
When a butterfly cannot identify the scent of another butterfly within the search region, it moves randomly. We refer to this stage as the local search phase, which makes use of the following mathematical model:
x_i^{t+1} = x_i^t + (r^2 × x_j^t − x_k^t) × f_i        (3)
where x_j^t and x_k^t represent the solution vectors of the jth and kth individuals, respectively. The choice between the global and local search procedures is governed by a fixed switching probability p ∈ [0, 1]. The sensory modality coefficient c is updated in each iteration as follows:
c^{t+1} = c^t + 0.025 / (c^t × MaxIter)        (4)
The pseudocode for the BOA is displayed in Algorithm 1:
Algorithm 1 BOA
Initialize parameters and generate the initial population of N butterflies.
Calculate the fitness and choose the best solution.
While stopping criteria are not met, do
      For each butterfly in the population, do
            Generate fragrance using Equation (1).
      End for
      Calculate the fitness and choose the best individual.
      For each butterfly in the population, do
            Set r in [0, 1] randomly.
            If r < p, then
                  Update position using Equation (2)
            Else
                  Update position using Equation (3).
            End if
      End for
End while
Output the best solution.
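Algorithm 1 can be sketched as a short, self-contained Python routine that combines Equations (1)–(4); the search bounds, population size, and parameter values below are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

def boa(obj, dim=5, n=20, iters=200, p=0.8, c=0.01, a=0.1, seed=0):
    """Minimal sketch of Algorithm 1 on the box [-10, 10]^dim.

    obj : objective to minimize; c, a, p follow common BOA defaults
    (assumed here, not taken from the paper's setup).
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(-10.0, 10.0, (n, dim))          # initial population
    fit = np.apply_along_axis(obj, 1, x)
    best = x[fit.argmin()].copy()                   # elite solution
    for t in range(iters):
        f = c * np.power(np.abs(fit), a)            # fragrance, Equation (1)
        for i in range(n):
            r = rng.random()
            if r < p:                               # global search, Equation (2)
                x[i] = x[i] + (r**2 * best - x[i]) * f[i]
            else:                                   # local search, Equation (3)
                j, k = rng.integers(0, n, 2)
                x[i] = x[i] + (r**2 * x[j] - x[k]) * f[i]
        np.clip(x, -10.0, 10.0, out=x)
        fit = np.apply_along_axis(obj, 1, x)
        if fit.min() < obj(best):                   # keep the best individual
            best = x[fit.argmin()].copy()
        c = c + 0.025 / (c * iters)                 # sensory modality, Equation (4)
    return best, obj(best)

# Toy usage on the sphere function
best, val = boa(lambda v: float(np.sum(v**2)))
```

The elitist bookkeeping (keeping `best` only when it improves) mirrors the "choose the best solution" steps of the pseudocode.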

2.2. Q-Learning

One of the best-known techniques for reinforcement learning is Q-learning, which Watkins [29] introduced. Reinforcement learning, sometimes referred to as evaluation learning, is a significant machine learning technique with several applications in the domains of analysis, prediction, and intelligent robot control [30]. The foundation of Q-learning is the idea of reward and punishment, whereby the environment provides the learner with an appropriate response as soon as a state changes; upon completion of an action, the current state transitions to the subsequent state. The Q-table, represented as Q(s, a), where s is the state and a is the action, indexes a Q-value as the cumulative reward of a state–action pair, and the reward/penalty for a particular state–action pair is dynamically updated in the Q-table. Q-learning belongs to temporal difference learning: the algorithm simulates an episode, integrates dynamic programming with the Monte Carlo (MC) algorithm [31], and estimates the value of a state before execution based on the value of the new state after one or more action steps.
The update of the Q-table in Q-learning is expressed by the following formula:
Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + α_t [r_t + γ × max_a Q_t(s_{t+1}, a) − Q_t(s_t, a_t)]        (5)
where α_t is the learning rate, which varies within [0, 1] as given in Equation (7); r_t is the reward/penalty, whose value is given in Equation (6); and γ is the discount factor within [0, 1].
r_t = {+1, reward; −1, punishment}        (6)
α_t = 1 − 0.9 × t / MaxIter        (7)
Exploration or exploitation is emphasized according to the value of the discount factor γ. If the value is close to 1, future rewards are given greater priority, so all defined states are investigated; conversely, if the value is close to 0, the immediate reward is given more weight. We set the value to 0.8 [32].

2.3. The Adaptive Gaussian Mutation Mechanism

In the early iterations, the algorithm needs to explore a wide range; in the later iterations, it needs to increase optimization accuracy and avoid premature convergence. Accordingly, the variance of a Gaussian random number can be used to balance the exploration and exploitation of the algorithm: the larger the variance, the lower the probability that the function's value lies at the mean. In the early iterations, a large standard deviation is assigned according to Equation (9) so that a butterfly can explore a large range; in the later iterations, the value is adaptively reduced so that the butterfly's search range is more accurate, the optimization accuracy is enhanced, and local optima are avoided. The effect of the standard deviation on the function is shown in Figure 1.
The Gaussian probability density function can be expressed as follows:
f(x) = (1 / (√(2π) δ)) × e^{−(x − μ)² / (2δ²)}        (8)
where μ is the mean value, and δ is the standard deviation.
δ(t) = δ_max − (δ_max − δ_min) × Iter_current / Iter_max        (9)
where δ_max is the maximum value of the standard deviation, δ_min is the minimum value, Iter_current is the current iteration number, and Iter_max is the maximum number of iterations.
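The linearly decaying schedule of Equation (9) can be sketched in a few lines; the values of δ_max and δ_min below are illustrative assumptions, not the paper's settings.

```python
def adaptive_sigma(t, max_iter, sigma_max=1.0, sigma_min=0.01):
    """Linearly decaying standard deviation, Equation (9).

    Starts at sigma_max (wide exploration) and shrinks toward
    sigma_min (fine exploitation) as iterations progress.
    sigma_max = 1.0 and sigma_min = 0.01 are assumed values.
    """
    return sigma_max - (sigma_max - sigma_min) * t / max_iter
```

A Gaussian mutation step would then draw `normalvariate(0, adaptive_sigma(t, max_iter))`, matching the β used in Section 3.1.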

3. Butterfly Optimization Algorithm with Q-Learning (QLBOA)

Due to the premature convergence of the classic BOA and its tendency to fall into local optima, this section proposes an improved BOA based on reinforcement learning. In the first phase, each individual position is updated using the two fundamental BOA update rules, and having each butterfly learn from the optimal individual improves the algorithm's exploration and exploitation capabilities. Reinforcement learning is applied once each to the local and global searches. After repeated iterations, when the algorithm reaches a local extremum, it uses species migration and mutation to escape it while retaining rapid convergence.

3.1. Move Formulation Incorporating Gaussian Mutation

The global update formula of the BOA with Gaussian mutation can be expressed as follows:
x_i^{t+1} = x_i^t × β + (r^2 × g* − x_k^t) × f_i        (10)
The local update formula is as follows:
x_i^{t+1} = x_i^t + (r^2 × x_j^t − x_k^t × β) × f_i        (11)
In the above two equations, β = normrnd(0, δ(t)), that is, a Gaussian random number with mean 0 and standard deviation δ(t).

3.2. Update Strategy of Reinforcement Learning

Combining the BOA and Q-learning algorithm, the QLBOA is proposed. The Q-learning algorithm replaces the control parameters of the BOA, that is, p, a, and c, with state–action pairs to indicate the selection of Q-values. The BOA has two stages—global search and local search—and the parameters of the algorithm determine which stage is carried out. The Q-table of the QLBOA is constructed as a 2 × 2 matrix, where the rows represent the states (st) and the columns represent the actions for each state (at), as shown in Figure 2.
The behavior of the Q-learner and the state to be learned are randomly chosen at the beginning of the search. The actions are determined and the state is updated according to the Q-table, which means that BOA search operators are selected based on past performance. Figure 2 shows the Q-table mapping of the QLBOA along with a numerical illustration: r = 1, α_t = 0.69, γ = 0.80, the current state s_t corresponds to the global search operator, and the action a_t corresponds to the local search operator. With Q(s_t, a_t) = 0.95, we update the Q-value in the Q-table according to the following equation:
Q_{t+1}(s_t, a_t) = 0.95 + 0.69 × [1 + 0.8 × max(0.50, 0.78) − 0.95] = 1.41
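The update of Equation (5) can be sketched and checked against the worked numbers above; the dict-of-dicts Q-table layout and the unused filler entry 0.30 are assumptions for illustration.

```python
def q_update(Q, s, a, s_next, r, alpha, gamma=0.8):
    """One temporal-difference update of the Q-table, Equation (5).

    Q is a dict-of-dicts {state: {action: value}}; gamma = 0.8 follows
    the paper, while the state/action encoding here is an assumption.
    """
    target = r + gamma * max(Q[s_next].values())   # best future value
    Q[s][a] += alpha * (target - Q[s][a])          # move toward the target
    return Q[s][a]

# Reproduce the worked example: Q(s_t, a_t) = 0.95, r = 1, alpha_t = 0.69.
# The 0.30 entry is an assumed filler value not used by this update.
Q = {"global": {"global": 0.30, "local": 0.95},
     "local":  {"global": 0.50, "local": 0.78}}
new_q = q_update(Q, "global", "local", "local", r=1.0, alpha=0.69)  # ≈ 1.415, which the paper rounds to 1.41
```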

3.3. Migration and Mutation Mechanisms

Inspired by the relationship between species migration and mutation, butterfly populations can evolve to a stable state without considering different species. However, when the number of predators or the amount of food changes, butterfly populations adjust in time by migrating or undergoing genetic changes to adapt to environmental changes. As shown in Figure 3, the maximum immigration rate is I when no species are present in the habitat. As the number of species increases, the habitat becomes occupied, the chance of migrants surviving decreases, and the immigration rate decreases; the immigration rate becomes zero at S_max. When the habitat contains no species, the emigration rate is zero. As the number of species increases, the habitat becomes crowded and more species can leave it; therefore, the emigration rate increases, up to its maximum value E. When the habitat contains the equilibrium number of species, the two rates are equal at S_0.
The equations of the emigration rate μ_k and the immigration rate λ_k for k species are as follows:
μ_k = E × (n / N)        (12)
λ_k = I × (1 − n / N)        (13)
where n is the current number of butterflies, N is the maximum number of butterflies allowed, E is the maximum emigration rate, and I represents the maximum immigration rate.
The mutation mechanism improves the mining ability of the algorithm and maintains the diversity of the population as much as possible. The definition of this component is as follows:
m_n = M × (1 − p_n / p_max)        (14)
where M is the predefined maximum mutation rate, p_n is the mutation probability of the nth butterfly, and p_max is the maximum mutation probability.
After calculating the fitness of each butterfly, the emigration rate, the immigration rate, and the mutation rate are updated. Non-elite individuals migrate and mutate according to these rates. The predefined best individual is saved as the elite of the next generation. Elitism prevents the best solution from being destroyed by migration. Figure 4 presents the proposed Algorithm 2 flow diagram.
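The three rates of Equations (12)–(14) can be sketched together; the values of E, I, M, p_n, and p_max below are illustrative assumptions, not calibrated parameters from the paper.

```python
def rates(n, N, E=1.0, I=1.0, M=0.1, p_n=0.5, p_max=1.0):
    """Emigration, immigration, and mutation rates, Equations (12)-(14).

    n : current number of butterflies, N : maximum number allowed.
    E, I, M, p_n, p_max are assumed illustrative values.
    """
    mu = E * n / N            # emigration rises as the habitat fills
    lam = I * (1 - n / N)     # immigration falls as the habitat fills
    m = M * (1 - p_n / p_max) # mutation rate of the n-th butterfly
    return mu, lam, m
```

At n = 0 immigration is maximal and emigration is zero; at n = N the roles reverse, and with E = I the two rates cross at the equilibrium n = N/2, matching Figure 3.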
Algorithm 2 QLBOA
Generate the initial population of N butterflies.
Calculate the fitness of each search agent.
Sort the fitness and choose the best solution.
While t < 80% of the maximum number of iterations, do
      For each butterfly in the population, do
                      Calculate fragrance
                      set Q(st, at) = 0
      End for
      For each butterfly in the population, do
                Select action and state randomly.
                Select the best action at from the Q-table.
                If action == global search mechanism, then
                      Update position using Equation (10)
                Else
                      Update position using Equation (11)
                End if
                Evaluate the butterfly individual and update
            End for
End while
While 80% of the maximum number of iterations <= t < maximum number of iterations, do
            For each butterfly in the population, do
                  Calculate the fitness value and choose the elites.
                  Perform migration and mutation operations.
            End for
Calculate the fitness of each search agent.
Sort the fitness and choose the best solution.
End while
Output the best solution.
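The two-phase control flow of Algorithm 2 (Q-learning-driven BOA updates for the first 80% of iterations, then migration and mutation refinement) can be sketched schematically; the string labels are just placeholders for the two update operators.

```python
def qlboa_phases(max_iter, threshold=0.8):
    """Phase schedule of Algorithm 2.

    Returns one label per iteration: Q-learning-driven position updates
    below the threshold, migration/mutation refinement afterwards.
    The labels are schematic stand-ins for the real operators.
    """
    schedule = []
    for t in range(max_iter):
        if t < threshold * max_iter:
            schedule.append("q_learning_update")
        else:
            schedule.append("migration_mutation")
    return schedule
```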

4. Simulation Experiments

4.1. Experimental Setup

The QLBOA’s efficacy was initially evaluated using eighteen complex functions in high dimensions. Table 1 details the parameters included in the comparison algorithms. In Table 2, the test functions are displayed. The suggested algorithm is contrasted with five traditional metaheuristics, namely, PSO, GA, ABC, DE, and BOA, and three BOA variants: IBOA [33], CBOA [34], and HPSOBOA [34].
The CEC2022 optimization function test set (Table 3) contains a total of 12 single-objective test functions with boundary constraints [35]: a unimodal function (F1), multimodal functions (F2–F5), hybrid functions (F6–F8), and composition functions (F9–F12). The test dimensions are 2, 10, and 20, the same as in the CEC2020 optimization function test set. All test functions are minimization problems. The CEC2022 set is one of the most recent benchmark suites. The results are compared against seven well-known algorithms: Differential Evolution (DE), Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), the Artificial Bee Colony algorithm (ABC), the Sine Cosine Algorithm (SCA), the Seagull Optimization Algorithm (SOA), and the BOA.
The method was implemented in MATLAB R2019b. Experiments were carried out on a Windows 10 PC with an Intel(R) Core(TM) i5 CPU clocked at 3.30 GHz. The population contained exactly 30 butterflies, and each algorithm was run for 500 iterations. To account for randomness, every algorithm was run 30 times independently on each function, and the average minimal error (mean error) and the standard deviation over the 30 runs were recorded; the former measures the algorithm's convergence accuracy, while the latter reflects its stability.

4.2. Analysis and Discussion of 18 Benchmark Functions’ Outcomes

This section compares the QLBOA with three BOA variants and five traditional metaheuristics on 18 benchmark functions. The standard deviation represents the algorithm's stability, while the mean represents its optimization accuracy. Because the original table was too wide, the pertinent findings are displayed in two separate tables. Table 4 presents the comparison between the proposed method and the other six algorithms. The results obtained using the QLBOA are the optima on most functions, with the exception of F7, F8, F17, and F18. Notably, the QLBOA reaches the theoretical optimal values on F1–F3 and F9–F15. These findings demonstrate that, in terms of accuracy and optimization power, the enhanced algorithm performs better than the original BOA. The comparison with the three BOA variants is shown in Table 5; it mainly illustrates the algorithm's exploitation capacity on unimodal functions, while the multimodal functions are suitable for testing the local and global search capabilities. The convergence accuracy of the QLBOA on F7, F8, and F17 is lower than that of ABC, PSO, and GA, respectively. On the remaining 15 functions, its convergence accuracy is excellent, which shows that the adaptive Gaussian mutation strategy can effectively improve the local exploitation ability and the convergence accuracy of the algorithm.
Figure 5 plots the convergence curves of the different algorithms on nine benchmark functions, including F2, F7, F9, F11, F12, and F14. In general, the QLBOA converges faster and with higher accuracy under the same iteration budget. On the multimodal F12, the QLBOA converges swiftly within 20–30 generations and delivers the theoretical best value. Although the QLBOA falls into a local optimum on F18 rather than reaching the ideal value, it nevertheless exhibits a comparatively fast convergence rate. Figure 6 demonstrates that the QLBOA has the best stability and the smallest error when addressing the benchmark function problems.
Additional statistical work has been performed to support the QLBOA, including the Wilcoxon rank-sum test. Table 6 shows that the majority of the p-values are significantly less than 0.05, suggesting that the QLBOA and the other six algorithms differ significantly from one another.

4.3. Analysis and Discussion of CEC2022 Outcomes

The CEC2022 suite was selected for the comparative experiments in this section, with dimension D = 10. Table 7 displays the experimental results on CEC2022. The QLBOA can jump out of local optima in larger dimensions and has a good global search capacity. The hybrid functions have many extreme points, most of which indicate how effectively an algorithm balances the exploitation and exploration stages. Most QLBOA results are evidently closer to the optimal value and more accurate. The QLBOA offers high convergence speed and accuracy on the unimodal function F1, while the other methods converge slowly and with low accuracy. None of the comparison algorithms were able to determine the exact solution on F6 except the QLBOA. Like the other algorithms, the QLBOA can easily enter a local optimum on F9 and F11, but in the middle and late iterations it successfully exits the local optimum. This is mostly due to the adaptive Gaussian mutation strategy and the migration and mutation applied to the population after each generation of evolution, which enhance population quality and hasten population convergence.
The information in the table shows that the QLBOA has certain benefits when it comes to solving hybrid functions. The optimization capabilities of the comparison algorithms are displayed individually in Figure 7. The comparison algorithms’ inaccuracy is displayed in Figure 8. It is easier to compare each method’s performance. It shows that the QLBOA and DE perform better and are more stable than the other algorithms. The comparative experiment conducted on the CEC2022 suite has demonstrated that the QLBOA offers outstanding flexibility and global search capabilities. In addition to improving the individual quality of the population, the migration and variation strategy can further expand the optimization range of the population, maintain the diversity of the population, and help the algorithm to escape local optima and find other areas where there may be excellent solutions. The reinforcement learning mechanism can maintain the balance between the exploration and development capabilities of the algorithm and improve the optimization performance of the QLBOA. In addition, the population is able to broaden its search area, effectively exit the local optimum in the local scope, and assist in locating the region where the optimal solution is most likely to exist and converge swiftly. The QLBOA is an advanced algorithm that should be adopted because it has faster convergence speed, higher optimization accuracy, and more stability than the other nine comparison algorithms on the chosen CEC2022 suite, according to an analysis of the convergence curve and the results of the aforementioned numerical experiments. A comparison of the Wilcoxon rank-sum test findings is displayed in Table 8.
Additional statistical analysis, the Wilcoxon rank-sum test, was conducted to support the QLBOA. Table 8 compares the optimization performance against the other methods; there is little discernible difference between the proposed algorithm and DE and ABC on F3, F7, F9, and F11. The majority of the p values are well below 0.05, indicating that the QLBOA differs significantly from the other six algorithms.
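For reference, the rank-sum comparison described above can be reproduced with SciPy; the two result samples below are hypothetical stand-ins for final fitness values from repeated runs of two algorithms, not the paper's data:

```python
from scipy.stats import ranksums

# Hypothetical final fitness values from 10 independent runs of two algorithms.
algo_a = [1e-20, 3e-21, 5e-20, 2e-20, 8e-21, 1e-19, 4e-20, 6e-21, 9e-20, 2e-21]
algo_b = [1e-3, 2e-3, 5e-4, 3e-3, 1e-3, 8e-4, 2e-3, 4e-3, 6e-4, 1e-3]

stat, p = ranksums(algo_a, algo_b)
print(p < 0.05)  # a p-value below 0.05 indicates a significant difference
```

Because the test is non-parametric, it makes no normality assumption about the fitness distributions, which is why it is the customary significance test in metaheuristic comparisons.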

4.4. Computational Complexity of BOA and QLBOA

The population's initialization is one of the key steps in the BOA: O(N). Sorting the initial population by fitness to select the best individuals costs O(N²). Updating the population during the exploration phase costs O(N), and during the exploitation phase O(2N). Computing the fitness values and selecting the best solution costs O(N²). The time complexity of the BOA is therefore 2 × O(N) + O(N²) + O(2N) + O(N²), i.e., O(N²) overall.
In the QLBOA, the population's initialization is determined by its size: O(N). Evaluating the initial population's fitness costs O(N). Applying the reinforcement learning strategy to the whole population costs O(2N). Updating the population during the exploration phase costs O(N + 2N²). Sorting the population to select the best individuals costs O(N²). Computing the fitness values and selecting the best solution costs O(N + N²). The time complexity of the QLBOA is therefore O(N) + O(N) + O(N²) + O(2N) + O(N + 2N²) + O(N + N²), which likewise simplifies to O(N²).

5. The QLBOA Solves the Green Vehicle Routing Problem Considering Customer Preferences

5.1. Description of the Vehicle Routing Problem

This section focuses on freight distribution and applies the QLBOA to the green vehicle routing problem with time windows, combining time-dependent speeds with piecewise penalty costs for early and late deliveries. The goal is to establish a cooperative relationship that allows idle vehicles to serve customers from any facility, transforming soft time windows into piecewise penalty costs based on arrival time and customer characteristics. Customers are categorized as loyal or general, and delayed delivery to loyal customers is assumed to have a greater negative impact on firm stability than delayed delivery to general customers. Operating costs include transportation, fuel, and penalty costs. Transportation resources within the same warehouse are shared to improve resource management, optimize routes, and reduce environmental impacts. A mixed-integer mathematical model is developed for carbon emission and cost minimization, considering the time dependence of vehicle speed and piecewise penalty costs. Two types of penalties are used: a constant waiting penalty for early arrivals that do not affect on-time delivery, and a variable delay penalty for late arrivals (Figure 9).
Customers differ in location and in service time window. After a customer places an order, the historical order demand is known, but the actual demand is not. Once the vehicle has loaded sufficient goods, it delivers the orders to each customer along the established route. The vehicle may arrive at a customer point early but must not be late; service must be carried out within the time window.
The whole distribution process should minimize the total distribution cost of the distribution center, and the model studied in this section needs to match the real situation. To present the model clearly, the following assumptions are made:
(1) Customer requirements are independent of each other and are updated only after the vehicle arrives at the customer point;
(2) Vehicles depart from and return to the distribution center;
(3) Vehicle use incurs a transportation cost, fuel cost, and penalty cost;
(4) The quantity of goods delivered can meet both the predicted and the actual demand of customers.

5.2. Problem Model

The model symbols and parameter settings are listed in Table 9.

5.2.1. Soft Time Window

After a soft time window is set, the vehicle should provide service within the agreed time. If it does not arrive within the specified time, it can still serve the customer, but a penalty proportional to the degree of earliness or lateness is incurred [44]. The relationship between the penalty cost and time is shown in Figure 10.

5.2.2. Vehicle Speed

As shown in Figure 11, traffic density is assumed to follow a normal distribution, covering the transition from free flow to severe congestion. Since severe congestion may bring traffic to a full stop, each vehicle's speed is defined to follow a normal distribution with vmin = 0 m/s; the maximum speed permitted by traffic regulations is vmax. Under time-dependent traffic conditions, a vehicle does not traverse a given road segment (i, j) at a single constant speed; consequently, the total fuel consumption on (i, j) is obtained by summing the fuel consumption over each time period. Within the time interval [tn−1, tn], the vehicle speed is represented by the following equation:
$$v_{ijk}(t_n) = \frac{v(t_n) + v(t_{n-1})}{2}$$
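The interval speed above is simply the midpoint of the two boundary speeds. A minimal sketch of applying this rule over a sequence of period boundaries (the speed profile below is hypothetical):

```python
def interval_speeds(boundary_speeds):
    """Midpoint speed for each interval [t_{n-1}, t_n],
    per v_ijk(t_n) = (v(t_n) + v(t_{n-1})) / 2."""
    return [(v_prev + v_next) / 2.0
            for v_prev, v_next in zip(boundary_speeds, boundary_speeds[1:])]

# Hypothetical boundary speeds (km/h) observed at period boundaries t_0..t_3.
speeds = interval_speeds([60.0, 40.0, 20.0, 50.0])
print(speeds)  # [50.0, 30.0, 35.0]
```

The per-interval speeds obtained this way are what the fuel-consumption terms on a road segment are summed over.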

5.2.3. Fuel Consumption

Vehicle carbon emissions are calculated from fuel consumption, so the fuel consumption rate of vehicle k on (i, j) can be derived from the vehicle's carbon emission rate. Assuming that x kg of carbon is emitted per unit of fuel consumed, the fuel consumption rate of vehicle k on road section (i, j) is
$$f_{ijk} = \frac{c_{ijk}}{x}$$
where $c_{ijk}$ represents the carbon emission rate (kg/km), expressed as
$$c_{ijk} = \frac{\varphi(v)\,\psi}{1000}$$
The carbon emission rate of vehicle k, in g/km, is
$$\varphi(v) = \omega_0 + \omega_1 v + \omega_2 v^2 + \omega_3 v^3 + \frac{\omega_4}{v} + \frac{\omega_5}{v^2} + \frac{\omega_6}{v^3}$$
Here, φ(v) is the carbon emission rate at velocity v, and ω0 to ω6 are constants. The load correction factor for the carbon emission rate is
$$\psi = \chi_0 + \chi_1 \gamma + \chi_2 \gamma^2 + \chi_3 \gamma^3 + \chi_4 v + \chi_5 v^2 + \chi_6 v^3 + \frac{\chi_7}{v}$$
Here, γ represents the ratio of the actual load and capacity of the vehicle on the road section (i, j), and χ0 to χ7 are constants.
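Putting the expressions above together, the per-kilometre fuel rate can be sketched as follows. The coefficient values below are placeholders for illustration, not the calibrated constants of the emission model used in the paper:

```python
def emission_rate_g_per_km(v, w):
    """phi(v) = w0 + w1*v + w2*v^2 + w3*v^3 + w4/v + w5/v^2 + w6/v^3 (g/km)."""
    return (w[0] + w[1]*v + w[2]*v**2 + w[3]*v**3
            + w[4]/v + w[5]/v**2 + w[6]/v**3)

def load_correction(v, gamma, chi):
    """psi = chi0 + chi1*g + chi2*g^2 + chi3*g^3 + chi4*v + chi5*v^2 + chi6*v^3 + chi7/v,
    where g is the load ratio gamma."""
    return (chi[0] + chi[1]*gamma + chi[2]*gamma**2 + chi[3]*gamma**3
            + chi[4]*v + chi[5]*v**2 + chi[6]*v**3 + chi[7]/v)

def fuel_rate_L_per_km(v, gamma, w, chi, x_kg_per_unit):
    """f = c / x, where c = phi(v) * psi / 1000 converts g/km to kg/km."""
    c = emission_rate_g_per_km(v, w) * load_correction(v, gamma, chi) / 1000.0
    return c / x_kg_per_unit

# Placeholder coefficients; real values come from a calibrated emission model.
w = [110.0, 0.1, 0.001, 1e-6, 350.0, 0.0, 0.0]
chi = [1.0, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(round(fuel_rate_L_per_km(v=50.0, gamma=0.5, w=w, chi=chi, x_kg_per_unit=2.32), 4))  # ≈ 0.0591
```

The reciprocal terms in φ(v) make the rate rise sharply at very low speeds, which is what makes congested traffic costly in the model.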

5.2.4. Penalty Costs

Delivery vehicles arriving at a customer before the earliest time limit may wait needlessly before unloading, which can affect other customers' punctual deliveries, causing delays and lower customer satisfaction. If service starts at node j within the permitted time frame, no penalty is incurred. Thus, the distribution company can improve service dependability and customer satisfaction by respecting the customers' time intervals. By adjusting vehicle departure times, a vehicle routing solution that satisfies these requirements can serve as a model for efficiently cutting operating expenses and avoiding traffic jams. A lateness penalty factor δlj is applied when the delivery vehicle arrives after the latest time limit:
$$\mu_j = \begin{cases} \delta_e \left[ ET_j - (t_{ijk} + t_i) \right], & t_{ijk} + t_i < ET_j \\ 0, & ET_j \le t_{ijk} + t_i \le LT_j \\ \delta_{lj} \left[ (t_{ijk} + t_i) - LT_j \right], & t_{ijk} + t_i > LT_j \end{cases}$$
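The piecewise penalty above can be sketched as a simple function; the penalty factors and times below are illustrative values, not the calibrated ones from the experiments:

```python
def penalty(arrival, et, lt, delta_e=10.0, delta_l=100.0):
    """Soft-time-window penalty: delta_e per unit of earliness before ET,
    delta_l per unit of lateness after LT, zero inside [ET, LT]."""
    if arrival < et:
        return delta_e * (et - arrival)
    if arrival > lt:
        return delta_l * (arrival - lt)
    return 0.0

print(penalty(8.0, et=9.0, lt=12.0))   # 10.0  (one unit early)
print(penalty(10.0, et=9.0, lt=12.0))  # 0.0   (within the window)
print(penalty(13.5, et=9.0, lt=12.0))  # 150.0 (1.5 units late)
```

Because δl is much larger than δe, lateness dominates the penalty term, matching the assumption that late deliveries hurt customer satisfaction more than early arrivals.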

5.3. Objective Function

$$\min F_1(x, y, z, \lambda_1, \lambda_2, \lambda_3) = (1-\lambda_1)\left(C_k \sum_{i \in N}\sum_{j \in N}\sum_{k \in K} x_{ijk} d_{ij} + C_v \sum_{k \in K} z_k\right) + (1-\lambda_2)\left(C_f \sum_{i \in N}\sum_{j \in N}\sum_{k \in K} x_{ijk} f_{ijk} d_{ij} + C_e \sum_{i \in N}\sum_{j \in N}\sum_{k \in K} x_{ijk} e_{ijk} d_{ij}\right) + (1-\lambda_3)\sum_{j \in N}\sum_{k \in K} \mu_j y_{jk} z_k \quad (22)$$
s.t.
$$\Pr\left\{\sum_{i \in N} q_i y_{ik} \le Q\right\} \ge \varepsilon, \quad \forall k \in K \quad (23)$$
$$\sum_{j \in N} x_{0jk} = \sum_{j \in N} x_{j0k}, \quad \forall k \in K \quad (24)$$
$$\sum_{j \in N} x_{0jk} \le 1, \quad \forall k \in K \quad (25)$$
$$\sum_{j \in N} x_{j0k} \le 1, \quad \forall k \in K \quad (26)$$
$$\sum_{k \in K}\sum_{i \in N} x_{ilk} = \sum_{k \in K}\sum_{j \in N} x_{ljk} \le K, \quad \forall l \in N \quad (27)$$
$$\sum_{j \in N}\sum_{k \in K} q_{0jk}\, x_{0jk} \ge \sum_{i \in N} \hat{q}_i \quad (28)$$
$$\sum_{k \in K} y_{ik} = 1, \quad \forall i \in N \quad (29)$$
$$\sum_{j \in N} x_{ijk} = y_{ik}, \quad \forall i \in N, \ \forall k \in K \quad (30)$$
$$\sum_{i \in N} y_{ik} = z_k, \quad \forall k \in K \quad (31)$$
$$t_{ijk} = d_{ij} / v_k, \quad \forall k \in K, \ i, j \in N \quad (32)$$
$$x_{ijk} \in \{0, 1\}, \quad \forall i, j \in N, \ i \ne j, \ \forall k \in K \quad (33)$$
$$y_{ik}, z_k \in \{0, 1\}, \quad \forall i \in N, \ \forall k \in K \quad (34)$$
Equation (22) represents the optimization of the vehicle transportation cost, fuel cost, and penalty cost with weighting factors λ1, λ2, λ3 ∈ [0, 1].
Equation (23) ensures that the probability of customer demand being less than the load in the vehicle distribution path is greater than a specified confidence level.
Equations (24)–(26) represent the departure and return of vehicles from/to the distribution center.
Equation (27) indicates that the number of vehicles passing through the distribution center or a customer point is consistent and does not exceed a maximum limit.
Equation (28) ensures the satisfaction of all customer requirements.
Equation (29) ensures that each customer is served by one vehicle only.
Equations (30) and (31) represent relationships between 0 and 1 variables.
Equation (32) represents travel time for vehicle k from customer point i to j.
Equations (33) and (34) represent 0–1 decision variables.
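A hedged sketch of evaluating the weighted objective in Equation (22) for one candidate solution. All weights and costs below are hypothetical, and the three cost components are assumed to have already been aggregated from the per-arc sums:

```python
def total_cost(transport, fuel_emission, penalty,
               lam1=0.3, lam2=0.1, lam3=0.6):
    """F1 = (1 - lam1) * transport + (1 - lam2) * fuel/emission
         + (1 - lam3) * penalty, per the weighted objective."""
    return ((1 - lam1) * transport
            + (1 - lam2) * fuel_emission
            + (1 - lam3) * penalty)

# Hypothetical aggregated costs for one candidate routing plan.
print(total_cost(transport=500.0, fuel_emission=120.0, penalty=40.0))  # ≈ 474.0
```

Changing λ1, λ2, λ3 re-weights the three components, which is exactly the sensitivity experiment carried out in Section 5.6.2.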

5.4. The Flow of the QLBOA to Solve the Problem

The upper and lower boundaries of ϕ determine where the initial search agents are located in this subsection; the placement of each QLBOA agent is determined by these upper and lower limit values. During this stage, the search agent's position is validated and checked for duplicates. The rules for converting search-agent locations (continuous values) into visiting sequences (discrete values) via the LRV rule are taken from [5,45]. LRV is a technique frequently used in combinatorial problems for converting continuous values to discrete ones. As shown in Figure 12, each search agent's position data are sorted in this phase from the maximum to the minimum. A search agent position cannot be applied if it assigns the same visiting order to two locations in the vehicle's sequence/route.
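The LRV decoding step described above can be illustrated with a rank-based transformation: each continuous position vector is converted into a visiting sequence by ranking its components from largest to smallest (a standard random-key style decoding; the position vector below is hypothetical):

```python
def lrv_decode(position):
    """Largest Ranked Value: sort component indices by value, descending,
    so the customer with the largest component is visited first."""
    return sorted(range(len(position)), key=lambda i: position[i], reverse=True)

# Hypothetical search-agent position over 5 customers (indices 0..4).
route = lrv_decode([0.42, 0.91, 0.13, 0.77, 0.55])
print(route)  # [1, 3, 4, 0, 2]
```

Because the ranking is a bijection whenever all components differ, every valid continuous position maps to exactly one visiting sequence, which is why duplicate component values must be screened out.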

5.5. Datasets and Parameter Settings

The QLBOA was used to solve the green vehicle routing problem with time windows to minimize carbon emissions and operating costs; the simulation experiments used instances from the Solomon VRPTW standard test library [46]. The instances in the standard test library cover six data types: random distributions (R1, R2), clustered distributions (C1, C2), and mixed random–clustered distributions (RC1, RC2). In this section, two datasets are randomly selected from the various data types to evaluate the algorithm. Table 10 shows the parameter settings for the problem, and the QLBOA process for solving the green vehicle routing problem with time windows is shown in Figure 13.

5.6. Response Analysis

5.6.1. Analysis of the Influence of Decision Makers’ Subjective Preferences on Goals

The subjective preferences of decision makers determine vehicle utilization, and the subjective preference value usually lies between 0 and 1. As the value increases, customers' requirements for service quality become stricter; with lower values, more attention is paid to vehicle utilization so as to reduce vehicle operating costs. This subsection keeps the other parameters unchanged and analyzes the changes in transportation cost, fuel cost, and penalty cost when ε takes three values: 0.2, 0.6, and 0.8; the number of vehicles used is also counted. Table 11 shows that when ε is small, although the planned routes use few vehicles, the fuel and vehicle-use costs are high, as is the penalty cost incurred in transit. When ε is 0.6, the cost of most routes reaches its minimum. When ε is 0.8, decision makers' subjective preference increases, the service process prioritizes customer satisfaction, and the penalty cost decreases, but the transportation and fuel costs rise again and the number of vehicles used reaches its highest value. As shown in Figure 14, the lowest total cost is marked with a green cylinder; as the subjective preference increases, the total service cost first decreases and then increases. Based on this analysis, ε is set to 0.6. The figure also shows that the total cost of the type-C instances is relatively low, while those of R and RC are relatively high. This is because the customer locations in the type-C instances are clustered, so a vehicle does not need to cross long distances to serve multiple customers within a small area. Adding vehicles to the service process increases the transportation and fuel costs, and hence the total cost.

5.6.2. Analysis of the Impact of Weight Factors on the Target

When λ1 = 0.6, λ2 = 0.3, and λ3 = 0.1, the model focuses on transportation costs, followed by fuel costs, and does not fully consider customer satisfaction. When λ1 = 0.1, λ2 = 0.6, and λ3 = 0.3, the model focuses on fuel consumption and carbon emissions in the service process, treating environmental protection as the priority. When λ1 = 0.3, λ2 = 0.1, and λ3 = 0.6, the service focuses on customer demand and strives to minimize the penalty cost. The costs obtained for these three cases are shown in Table 12. Taking transportation cost as an example, the first scheme yields the lowest cost across all instances, differing from the highest cost by at most 12.53%. For fuel cost, the second scheme is the lowest, differing from the highest cost by up to 11.24%. The service penalty cost is lowest when customer requirements are prioritized. The total cost for each case is shown in Figure 15. When customer satisfaction is not fully considered, the total cost can reach its lowest value. The reason is that the penalty factor in the penalty-cost formula is 100 and the time windows are narrow, so the penalty cost is far smaller than the transportation and fuel costs; after the weight factors are added, the penalty cost has an even smaller impact on the total cost. The model proposed in this section must consider the service path comprehensively, accounting not only for economic cost and carbon emissions but also for brand image through improved customer satisfaction; the weight factors are therefore introduced. Based on the above analysis, the set of weight values with the lowest total cost is selected as the final plan, and the resulting paths are shown in Figure 16.

5.6.3. Comparison with Other Algorithms

Table 13 shows the comparison between the QLBOA and three other metaheuristics. The improved algorithm achieves small values for transportation cost, fuel cost, and penalty cost, and relative to the basic BOA, the QLBOA greatly improves the optimization objective. As shown in Figure 17, the total cost of each algorithm is averaged over 10 runs. In instances C202 and RC102, the total cost of the QLBOA is higher than that of the GA and ACO, respectively; in all other instances, the QLBOA achieves the best result.

6. Conclusions and Future Work

This work proposes the Q-learning-driven butterfly optimization algorithm (QLBOA). The paper's innovations are the introduction of a reinforcement learning mechanism, the combination of the butterfly optimization algorithm with dynamic Gaussian mutation, and the integration of the butterfly update strategy with a species migration and mutation strategy, which increases population diversity. Premature convergence is avoided, the strategies reinforce one another, and the global optimum is found more quickly in the search space. The algorithm was tested on the CEC2022 test suite and eighteen single-objective benchmark functions against a range of traditional metaheuristic algorithms. Finally, the fuel consumption, carbon emissions, and penalty costs incurred while delivering goods and serving customers were adopted as the optimization objectives, and the QLBOA was employed to solve the green vehicle routing problem considering customer preferences. The impact of decision makers' subjective preferences and weight factors on the individual and total costs was analyzed. Carbon emissions rise as the penalty-cost weight factor increases, which is counterproductive to environmental preservation. Ultimately, the results demonstrate that the QLBOA has clear advantages over the three compared heuristic optimization strategies. Consequently, the QLBOA exhibits considerable potential for use in the industrial control domain, and it can be extended to multi-objective scenarios for increasingly complex engineering optimization problems.

Author Contributions

W.M.: experimental results analysis. Y.H.: methodology, writing—original draft. Y.Z.: writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant Nos. U21A20464 and 62066005.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in the study are included in the article. Further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Pepin, A.-S.; Desaulniers, G.; Hertz, A.; Huisman, D. A comparison of five heuristics for the multiple depot vehicle scheduling problem. J. Sched. 2009, 12, 17–30. [Google Scholar] [CrossRef]
  2. Ye, C.; He, W.; Chen, H. Electric vehicle routing models and solution algorithms in logistics distribution: A systematic review. Environ. Sci. Pollut. Res. 2022, 29, 57067–57090. [Google Scholar] [CrossRef]
  3. Bektaş, T.; Laporte, G. The pollution-routing problem. Transp. Res. Part B Methodol. 2011, 45, 1232–1250. [Google Scholar] [CrossRef]
  4. Barth, M.; Boriboonsomsin, K. Real-World Carbon Dioxide Impacts of Traffic Congestion. Transp. Res. Rec. J. Transp. Res. Board 2008, 2058, 163–171. [Google Scholar] [CrossRef]
  5. Demir, E.; Bektaş, T.; Laporte, G. A comparative analysis of several vehicle emission models for road freight transportation. Transp. Res. Part D Transp. Environ. 2011, 16, 347–357. [Google Scholar] [CrossRef]
  6. Demir, E.; Bektaş, T.; Laporte, G. An adaptive large neighborhood search heuristic for the Pollution-Routing Problem. Eur. J. Oper. Res. 2012, 223, 346–359. [Google Scholar] [CrossRef]
  7. Mehlawat, M.K.; Gupta, P.; Khaitan, A.; Pedrycz, W. A hybrid intelligent approach to integrated fuzzy multiple depot capacitated green vehicle routing problem with split delivery and vehicle selection. IEEE Trans. Fuzzy Syst. 2019, 28, 1155–1166. [Google Scholar] [CrossRef]
  8. Franceschetti, A.; Honhon, D.; Van Woensel, T.; Bektaş, T.; Laporte, G. The time-dependent pollution-routing problem. Transp. Res. Part B Methodol. 2013, 56, 265–293. [Google Scholar] [CrossRef]
  9. Çimen, M.; Soysal, M. Time-dependent green vehicle routing problem with stochastic vehicle speeds: An approximate dynamic programming algorithm. Transp. Res. Part D Transp. Environ. 2017, 54, 82–98. [Google Scholar] [CrossRef]
  10. Kazemian, I.; Rabbani, M.; Farrokhi-Asl, H. A way to optimally solve a green time-dependent vehicle routing problem with time windows. Comput. Appl. Math. 2018, 37, 2766–2783. [Google Scholar] [CrossRef]
  11. Qi, R.; Li, J.Q.; Wang, J.; Jin, H.; Han, Y.Y. QMOEA: A Q-learning-based multiobjective evolutionary algorithm for solving time-dependent green vehicle routing problems with time windows. Inf. Sci. 2022, 608, 178–201. [Google Scholar] [CrossRef]
  12. Prakash, R.; Pushkar, S. Green vehicle routing problem: Metaheuristic solution with time window. Expert Syst. 2022, 41, 13007. [Google Scholar] [CrossRef]
  13. Wang, Y.; Assogba, K.; Fan, J.; Xu, M.; Liu, Y.; Wang, H. Multi-depot green vehicle routing problem with shared transportation resource: Integration of time-dependent speed and piecewise penalty cost. J. Clean. Prod. 2019, 232, 12–29. [Google Scholar] [CrossRef]
  14. Wu, D.; Wu, C. Research on the Time-Dependent Split Delivery Green Vehicle Routing Problem for Fresh Agricultural Products with Multiple Time Windows. Agriculture 2022, 12, 793. [Google Scholar] [CrossRef]
  15. Zhang, S.; Zhou, Z.; Luo, R.; Zhao, R.; Xiao, Y.; Xu, Y. A low-carbon, fixed-tour scheduling problem with time win-dows in a time-dependent traffic environment. Int. J. Prod. Res. 2023, 61, 6177–6196. [Google Scholar] [CrossRef]
  16. Arora, S.; Singh, S. Butterfly optimization algorithm: A novel approach for global optimization. Soft Comput. 2019, 23, 715–734. [Google Scholar] [CrossRef]
  17. Sharma, S.; Saha, A.K.; Nama, S. An enhanced butterfly optimization algorithm for function optimization. In Soft Computing: Theories and Applications: Proceedings of SoCTA 2019; Springer: Singapore, 2020; pp. 593–603. [Google Scholar]
  18. Fathy, A. Butterfly optimization algorithm based methodology for enhancing the shaded photovoltaic array extracted power via reconfiguration process. Energy Convers. Manag. 2020, 220, 113115. [Google Scholar] [CrossRef]
  19. El-Hasnony, I.M.; Elhoseny, M.; Tarek, Z. A hybrid feature selection model based on butterfly optimization algorithm: COVID-19 as a case study. Expert Syst. 2022, 39, e12786. [Google Scholar] [CrossRef]
  20. Wang, Z.; Luo, Q.; Zhou, Y. Hybrid metaheuristic algorithm using butterfly and flower pollination base on mutual-ism mechanism for global optimization problems. Eng. Comput. 2021, 37, 3665–3698. [Google Scholar] [CrossRef]
  21. Shahbandegan, A.; Naderi, M. A binary butterfly optimization algorithm for the multidimensional knapsack problem. In Proceedings of the 2020 6th Iranian Conference on Signal Processing and Intelligent Systems (ICSPIS), Mashhad, Iran, 23–24 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
  22. Mazaheri, H.; Goli, S.; Nourollah, A. Path planning in three-dimensional space based on butterfly optimization algorithm. Sci. Rep. 2024, 14, 2332. [Google Scholar] [CrossRef]
  23. Chatterjee, S.; Debnath, R.; Biswas, S.; Bairagi, A.K. Prediction of RNA Secondary Structure Using Butterfly Optimization Algorithm. Hum.-Centric Intell. Syst. 2024, 4, 220–240. [Google Scholar] [CrossRef]
  24. Bhanja, S.; Das, A. An air quality forecasting method using fuzzy time series with butterfly optimization algorithm. Microsyst. Technol. 2024, 30, 613–623. [Google Scholar] [CrossRef]
  25. Alhassan, A.M. Thresholding Chaotic Butterfly Optimization Algorithm with Gaussian Kernel (TCBOGK) based seg-mentation and DeTrac deep convolutional neural network for COVID-19 X-ray images. Multimed. Tools Appl. 2024, 1–24. [Google Scholar]
  26. Gade, V.S.R.; Manickam, S. Speaker recognition using Improved Butterfly Optimization Algorithm with hybrid Long Short Term Memory network. Multimed. Tools Appl. 2024, 83, 73817–73839. [Google Scholar] [CrossRef]
  27. Sharma, S.; Saha, A.K. m-MBOA: A novel butterfly optimization algorithm enhanced with mutualism scheme. Soft Comput. 2020, 24, 4809–4827. [Google Scholar] [CrossRef]
  28. Long, W.; Jiao, J.; Liang, X.; Wu, T.; Xu, M.; Cai, S. Pinhole-imaging-based learning butterfly optimization algorithm for global optimization and feature selection. Appl. Soft Comput. 2021, 103, 107146. [Google Scholar] [CrossRef]
  29. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  30. Sinha, D.; Chakrabarty, S.P. A review of efficient Multilevel Monte Carlo algorithms for derivative pricing and risk management. MethodsX 2023, 10, 102078. [Google Scholar] [CrossRef]
  31. Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996, 4, 237–285. [Google Scholar] [CrossRef]
  32. Agushaka, J.O.; Ezugwu, A.E.; Abualigah, L.; Alharbi, S.K.; Khalifa, H.A.E.-W. Efficient Initialization Methods for Population-Based Metaheuristic Algorithms: A Comparative Study. Arch. Comput. Methods Eng. 2023, 30, 1727–1787. [Google Scholar] [CrossRef]
  33. Yuan, Z.; Wang, W.Q.; Wang, H.Y.; Khodaei, H. Improved Butterfly Optimization Algorithm for CCHP driven by PEMFC. Appl. Therm. Eng. 2020, 173, 114766. [Google Scholar]
  34. Zhang, M.; Long, D.; Qin, T.; Yang, J. A Chaotic Hybrid Butterfly Optimization Algorithm with Particle Swarm Optimization for High-Dimensional Optimization Problems. Symmetry 2020, 12, 1800. [Google Scholar] [CrossRef]
  35. Biedrzycki, R.; Arabas, J.; Warchulski, E. A Version of NL-SHADE-RSP Algorithm with Midpoint for CEC 2022 Single Objective Bound Constrained Problems. In Proceedings of the 2022 IEEE Congress on Evolutionary Computation (CEC), Padua, Italy, 18–23 July 2022; pp. 1–8. [Google Scholar]
  36. Vesterstrom, J.; Thomsen, R. A comparative study of differential evolution, particle swarm optimization, and evolutionary algorithms on numerical benchmark problems. In Proceedings of the 2004 Congress on Evolutionary Computation (IEEE Cat. No. 04TH8753), Portland, OR, USA, 19–23 June 2004; IEEE: Piscataway, NJ, USA, 2004; Volume 2, pp. 1980–1987. [Google Scholar]
  37. Lim, S.P.; Haron, H. Performance comparison of Genetic Algorithm, Differential Evolution and Particle Swarm Optimization towards benchmark functions. In Proceedings of the 2013 IEEE Conference on Open Systems (ICOS), Kuching, Malaysia, 2–4 December 2013; pp. 41–46. [Google Scholar]
  38. Li, X.; Yang, G. Artificial bee colony algorithm with memory. Appl. Soft Comput. 2016, 41, 362–372. [Google Scholar] [CrossRef]
  39. Wang, Y.; Wang, Z.; Wang, G.-G. Hierarchical learning particle swarm optimization using fuzzy logic. Expert Syst. Appl. 2023, 232, 120759. [Google Scholar] [CrossRef]
  40. Abdelrazek, M.; Elaziz, M.A.; El-Baz, A.H. CDMO: Chaotic Dwarf Mongoose Optimization Algorithm for feature selection. Sci. Rep. 2024, 14, 701. [Google Scholar] [CrossRef]
  41. Zhang, Q.; Bu, X.; Gao, H.; Li, T.; Zhang, H. A hierarchical learning based artificial bee colony algorithm for numerical global optimization and its applications. Appl. Intell. 2024, 54, 169–200. [Google Scholar] [CrossRef]
  42. Zhang, H.; Zhang, Y.; Niu, Y.; He, K.; Wang, Y. T Cell Immune Algorithm: A Novel Nature-Inspired Algorithm for Engineering Applications. IEEE Access 2023, 11, 95545–95566. [Google Scholar] [CrossRef]
  43. He, K.; Zhang, Y.; Wang, Y.K.; Zhou, R.H.; Zhang, H.Z. EABOA: Enhanced adaptive butterfly optimization algorithm for numerical optimization and engineering design problems. Alex. Eng. J. 2024, 87, 543–573. [Google Scholar] [CrossRef]
  44. Jie, K.W.; Liu, S.Y.; Sun, X.J. A hybrid algorithm for time-dependent vehicle routing problem with soft time windows and stochastic factors. Eng. Appl. Artif. Intell. 2022, 109, 104606. [Google Scholar] [CrossRef]
  45. Utama, D.M.; Widodo, D.S.; Ibrahim, M.F.; Hidayat, K.; Baroto, T.; Yurifah, A. The hybrid whale optimization algorithm: A new metaheuristic algorithm for energy-efficient on flow shop with dependent sequence setup. J. Physics Conf. Ser. 2020, 1569, 022094. [Google Scholar] [CrossRef]
  46. Solomon, M.M. Algorithms for the Vehicle Routing and Scheduling Problems with Time Window Constraints. Oper. Res. 1987, 35, 254–265. [Google Scholar] [CrossRef]
Figure 1. Result of the effect of standard deviation on the function.
Figure 2. Examples of reinforcement learning Q-tables.
Figure 3. Change in mobility.
Figure 4. The flowchart of the QLBOA.
Figure 5. Comparison of convergence curves for F2, F7, F9, F11, F12, and F14.
Figure 6. Error comparison for F2, F7, F9, F11, F12, and F14.
Figure 7. Comparison of convergence curves for F1, F2, F5, F7, F9, and F11 of CEC2022.
Figure 8. Error comparison for F1, F2, F5, F7, F9, and F11 of CEC2022.
Figure 9. Green vehicle routing problem.
Figure 10. The relationship between punishment cost and time.
Figure 11. The relationship between vehicle speed and time.
Figure 12. LRV application process.
Figure 13. QLBOA solution process for green vehicle routing problems with time windows.
Figure 14. Analysis of the impact of decision makers' subjective preferences on total costs.
Figure 15. Analysis of the impact of weight factors on the total cost.
Figure 16. QLBOA's path to solving different problem sets.
Figure 17. Comparison of total costs between QLBOA and three other algorithms.
Table 1. Parameter settings.

| Algorithm | Name | Parameter Settings |
| PSO | Particle Swarm Optimization | a = 0.3, b = 1, c = 1 |
| ABC | Artificial Bee Colony Algorithm | m = 0.2 |
| ACO | Ant Colony Optimization | c = 10−6, Q = 20, m = 1 |
| GA | Genetic Algorithm | Qc = 1, Qm = 0.01 |
| DE | Differential Evolution Algorithm | q = 0.2, α1 = 0.8, α2 = 0.2 |
| SCA | Sine Cosine Algorithm | α = 2, c1 = b − t × (b/T) |
| SOA | Seagull Optimization Algorithm | c = 1 |
| BOA | Butterfly Optimization Algorithm | p = 0.8, α = 0.1, c = 0.01 |
| CBOA | Optimization Algorithm with Cubic Map | a1 = 0.1, a2 = 0.3, c = 0.01, p = 0.6, m = 0.315, P = 0.295 |
| HPSOBOA | Hybrid PSO with BOA and Cubic Map | a1 = 0.1, a2 = 0.3, c = 0.01, p = 0.6, x = 0.315, P = 0.295, c1 = c2 = 0.5 |
| IBOA | Improved BOA | a = 0.1, c = 0.01, P = 0.6, x = 0.33, w = 4 |
| QLBOA | BOA with Q-learning | p = [0.1, 0.8], α = 0.1, c = 0.01, m = 0.1, e = 0.4 |
Table 2. Eighteen benchmark functions.

| Type | No. | Function | Search Range | Fmin |
| High-dimensional unimodal | F1 | Schwefel's Problem 1.2 | [−100, 100] | 0 |
| | F2 | Generalized Rosenbrock's Function | [−10, 10] | 0 |
| | F3 | Sphere Function | [−100, 100] | 0 |
| | F4 | Schwefel's Problem 2.21 | [−100, 100] | 0 |
| | F5 | Schwefel's Problem 2.22 | [−10, 10] | 0 |
| | F6 | Sum-of-Different-Powers Function | [−100, 100] | 0 |
| | F7 | Quartic Function, i.e., Noise | [−1.28, 1.28] | 0 |
| | F8 | Bent Cigar Function | [−10, 10] | 0 |
| | F9 | Step Function | [−100, 100] | 0 |
| | F10 | Zakharov Function | [−5, 10] | 0 |
| | F11 | Discus Function | [−5, 5] | 0 |
| High-dimensional multimodal | F12 | Generalized Rastrigin's Function | [−5.12, 5.12] | 0 |
| | F13 | Ackley's Function | [−32, 32] | 0 |
| | F14 | Generalized Griewank's Function | [−600, 600] | 0 |
| | F15 | HappyCat Function | [−50, 50] | 0 |
| | F16 | Lévy Function | [−10, 10] | 0 |
| | F17 | Katsuura Function | [−50, 50] | 0 |
| | F18 | HGBat Function | [−20, 20] | 0 |

Note: x* stands for the global optima. F is the fitness value. D = 30.
Table 3. CEC2022 test suite.

| Type | No. | Function | Fmin |
| Unimodal Functions | 1 | Shifted and Fully Rotated Zakharov Function | 300 |
| Multimodal Functions | 2 | Shifted and Fully Rotated Rosenbrock's Function | 400 |
| | 3 | Shifted and Fully Rotated Rastrigin's Function | 600 |
| | 4 | Shifted and Fully Rotated Non-Continuous Rastrigin's Function | 800 |
| | 5 | Shifted and Fully Rotated Lévy Function | 900 |
| Hybrid Functions | 6 | Hybrid Function 1 (N = 3) | 1800 |
| | 7 | Hybrid Function 2 (N = 6) | 2000 |
| | 8 | Hybrid Function 3 (N = 5) | 2200 |
| Composition Functions | 9 | Composition Function 1 (N = 5) | 2300 |
| | 10 | Composition Function 2 (N = 4) | 2400 |
| | 11 | Composition Function 3 (N = 5) | 2600 |
| | 12 | Composition Function 4 (N = 6) | 2700 |

Search range: [−100, 100]^D.
Table 4. The results are compared with those of five other algorithms on benchmark functions.

| Function | Metric | GA [36] | DE [37] | PSO [37] | ABC [38] | BOA [28] | QLBOA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| F1 | Mean | 1.4181 × 10+03 | 3.8513 × 10−03 | 6.8100 × 10−13 | 2.6770 × 10−16 | 2.5100 × 10−11 | 8.0212 × 10−231 |
| | Std | 5.9444 × 10+02 | 1.0000 × 10−02 | 5.3000 × 10−13 | 6.4934 × 10−17 | 1.9300 × 10−12 | 0.0000 × 10+00 |
| F2 | Mean | 2.4766 × 10+01 | −2.0602 × 10−00 | 2.0892 × 10−02 | 3.1462 × 10−09 | 2.3900 × 10−11 | 0.0000 × 10+00 |
| | Std | 5.2444 × 10+00 | 9.2312 × 10−08 | 1.4800 × 10−01 | 5.3864 × 10−09 | 2.2800 × 10−12 | 0.0000 × 10+00 |
| F3 | Mean | 2.2230 × 10+04 | −1.0000 × 10−00 | 1.4184 × 10−05 | 9.3412 × 10−10 | 2.2400 × 10−11 | 0.0000 × 10+00 |
| | Std | 4.4852 × 10+03 | 3.1712 × 10−06 | 5.9800 × 10+02 | 8.9224 × 10−03 | 1.8800 × 10−12 | 0.0000 × 10+00 |
| F4 | Mean | 5.1304 × 10+01 | −2.8732 × 10−00 | 1.4184 × 10−05 | 5.9962 × 10−10 | 1.1900 × 10−08 | 1.3380 × 10−249 |
| | Std | 6.4693 × 10+00 | 1.5538 × 10−12 | 8.2700 × 10−06 | 2.3114 × 10−12 | 8.3500 × 10−10 | 0.0000 × 10+00 |
| F5 | Mean | 6.9558 × 10+03 | 1.7400 × 10−01 | 3.5600 × 10+02 | 2.9732 × 10−10 | 2.8900 × 10+01 | 4.7112 × 10−02 |
| | Std | 9.7903 × 10+03 | 2.1200 × 10−01 | 2.1500 × 10+03 | 3.5514 × 10+01 | 9.5400 × 10−02 | 5.2924 × 10−02 |
| F6 | Mean | 9.5971 × 10+02 | −4.1413 × 10−00 | 4.0300 × 10−02 | 4.9872 × 10−17 | 5.1700 × 10+00 | 3.6750 × 10−03 |
| | Std | 2.5531 × 10+02 | 1.6542 × 10−02 | 3.9800 × 10−01 | 4.6481 × 10−14 | 6.3900 × 10−01 | 3.3326 × 10−03 |
| F7 | Mean | 3.5458 × 10−01 | 1.1500 × 10−00 | 1.4082 × 10−04 | 7.3670 × 10−14 | 4.0300 × 10−03 | 1.1574 × 10−04 |
| | Std | 7.3510 × 10−02 | 0.2300 × 10−00 | 1.1400 × 10−03 | 5.3882 × 10−09 | 8.7000 × 10−04 | 1.1050 × 10−04 |
| F8 | Mean | 2.8900 × 10+01 | 1.5000 × 10−02 | 7.3800 × 10−61 | 6.0962 × 10−03 | 6.5100 × 10−17 | 5.5810 × 10−02 |
| | Std | 2.5400 × 10−02 | 4.0414 × 10−02 | 3.8102 × 10−60 | 7.3131 × 10−03 | 1.3900 × 10−16 | 2.8200 × 10−01 |
| F9 | Mean | 1.5721 × 10+01 | 2.0000 × 10−08 | 1.3712 × 10−14 | 1.8780 × 10−14 | 6.3300 × 10−14 | 0.0000 × 10+00 |
| | Std | 5.1484 × 10+00 | 5.3312 × 10−08 | 4.6430 × 10−14 | 2.4251 × 10−13 | 3.4000 × 10−14 | 0.0000 × 10+00 |
| F10 | Mean | 1.4434 × 10+01 | −1.8732 × 10+02 | 9.0222 × 10−05 | 6.5802 × 10−05 | 6.7200 × 10−11 | 0.0000 × 10+00 |
| | Std | 8.4536 × 10−01 | 3.3950 × 10−04 | 1.0544 × 10−04 | 1.4841 × 10−05 | 6.9000 × 10−12 | 0.0000 × 10+00 |
| F11 | Mean | 1.5250 × 10+01 | 4.3300 × 10−03 | 9.0221 × 10−05 | 7.8280 × 10−04 | 6.7200 × 10−11 | 0.0000 × 10+00 |
| | Std | 7.3036 × 10+00 | 1.9000 × 10−02 | 1.0504 × 10−04 | 2.2000 × 10−04 | 6.9000 × 10−12 | 0.0000 × 10+00 |
| F12 | Mean | 3.0789 × 10+00 | 3.1300 × 10−03 | 8.5598 × 10−03 | 3.2000 × 10−08 | 2.5100 × 10+01 | 0.0000 × 10+00 |
| | Std | 1.8282 × 10+00 | 9.5412 × 10−03 | 4.7900 × 10−02 | 2.2112 × 10−08 | 6.5200 × 10+01 | 0.0000 × 10+00 |
| F13 | Mean | 8.7058 × 10+00 | 2.5170 × 10+76 | 5.3300 × 10−03 | 5.4671 × 10−05 | 7.6400 × 10−12 | 8.8824 × 10−16 |
| | Std | 1.2778 × 10+00 | 1.1750 × 10+77 | 7.4800 × 10−03 | 2.6163 × 10−05 | 6.9400 × 10−12 | 0.0000 × 10+00 |
| F14 | Mean | 9.9800 × 10−01 | 6.3350 × 10−01 | 1.1512 × 10−03 | 9.1182 × 10−11 | 1.9000 × 10−10 | 0.0000 × 10+00 |
| | Std | 5.6000 × 10−16 | 8.6912 × 10−01 | 9.3600 × 10−04 | 7.6752 × 10−11 | 4.3400 × 10−10 | 0.0000 × 10+00 |
| F15 | Mean | 6.1300 × 10−03 | 4.8452 × 10−04 | 4.5600 × 10+01 | 5.5400 × 10−16 | 2.5100 × 10+01 | 0.0000 × 10+00 |
| | Std | 5.3700 × 10−03 | 6.6000 × 10−04 | 1.1100 × 10+01 | 1.5300 × 10−16 | 6.5200 × 10+01 | 0.0000 × 10+00 |
| F16 | Mean | 7.7620 × 10−01 | −1.9400 × 10−00 | 4.7700 × 10−02 | 1.0210 × 10+01 | 1.1700 × 10+01 | 1.3780 × 10−05 |
| | Std | 2.0844 × 10−01 | 5.4200 × 10−07 | 6.5800 × 10−02 | 7.3521 × 10+00 | 2.6600 × 10+00 | 1.4504 × 10−02 |
| F17 | Mean | 4.6481 × 10+01 | 1.8713 × 10+03 | 1.6900 × 10+06 | 6.7600 × 10+08 | 1.0900 × 10+09 | 3.2042 × 10+04 |
| | Std | 6.7914 × 10+02 | 8.0654 × 10+05 | 1.3200 × 10+06 | 2.6100 × 10+05 | 8.1600 × 10+10 | 2.3582 × 10+03 |
| F18 | Mean | 3.0000 × 10+00 | 7.1000 × 10+05 | 5.3300 × 10+05 | 2.7110 × 10−01 | 7.6400 × 10+12 | 2.0000 × 10−01 |
| | Std | 0.0000 × 10+00 | 8.6942 × 10−05 | 7.4800 × 10+05 | 1.5000 × 10−01 | 6.9400 × 10+12 | 2.0940 × 10−07 |
Table 5. The results are compared with those of three other BOA versions on benchmark functions.

| Function | Metric | IBOA [33] | HPSO-BOA [34] | CBOA [34] | QLBOA |
| --- | --- | --- | --- | --- | --- |
| F1 | Mean | 1.6100 × 10−30 | 3.7400 × 10−104 | 1.0100 × 10−13 | 0.0000 × 10+00 |
| | Std | 3.9000 × 10−30 | 2.0500 × 10−103 | 2.1100 × 10−13 | 0.0000 × 10+00 |
| F2 | Mean | 5.1100 × 10−19 | 2.6300 × 10−22 | 1.2500 × 10−14 | 8.0212 × 10−231 |
| | Std | 1.7300 × 10−18 | 1.4400 × 10−21 | 2.1500 × 10−14 | 0.0000 × 10+00 |
| F3 | Mean | 6.1500 × 10−31 | 3.0400 × 10−71 | 6.3000 × 10−13 | 0.0000 × 10+00 |
| | Std | 1.1600 × 10−30 | 1.6700 × 10−70 | 1.3700 × 10−12 | 0.0000 × 10+00 |
| F4 | Mean | 1.3600 × 10−19 | 3.6100 × 10−46 | 2.7700 × 10−10 | 1.3380 × 10−249 |
| | Std | 1.9700 × 10−19 | 1.9700 × 10−45 | 2.9600 × 10−10 | 0.0000 × 10+00 |
| F5 | Mean | 2.8900 × 10+01 | 2.9000 × 10+01 | 2.8700 × 10+01 | 4.7112 × 10−02 |
| | Std | 3.4000 × 10−02 | 8.1800 × 10−02 | 1.3900 × 10−05 | 5.2924 × 10−02 |
| F6 | Mean | 4.4400 × 10+00 | 4.1700 × 10−02 | 8.5000 × 10−06 | 3.6750 × 10−03 |
| | Std | 8.7000 × 10−01 | 6.4000 × 10−02 | 1.0600 × 10−05 | 3.0320 × 10−03 |
| F7 | Mean | 1.2200 × 10−04 | 2.5500 × 10−04 | 2.0000 × 10−03 | 1.1774 × 10−04 |
| | Std | 8.0600 × 10−05 | 4.0000 × 10−04 | 7.8900 × 10−04 | 1.1000 × 10−04 |
| F8 | Mean | 8.4500 × 10−31 | 7.1500 × 10−15 | 2.2400 × 10−23 | 5.5810 × 10−02 |
| | Std | 2.5200 × 10−30 | 3.9200 × 10−14 | 7.5100 × 10−23 | 2.8200 × 10−01 |
| F9 | Mean | 1.3200 × 10−36 | 3.1900 × 10−118 | 6.5800 × 10−15 | 0.0000 × 10+00 |
| | Std | 4.5900 × 10−36 | 1.6800 × 10−117 | 1.1900 × 10−14 | 0.0000 × 10+00 |
| F10 | Mean | 1.1000 × 10−30 | 3.6400 × 10−78 | 2.3700 × 10−14 | 0.0000 × 10+00 |
| | Std | 2.9000 × 10−30 | 1.9900 × 10−77 | 4.2400 × 10−14 | 0.0000 × 10+00 |
| F11 | Mean | 0.0000 × 10+00 | 1.3200 × 10−136 | 1.5400 × 10−18 | 0.0000 × 10+00 |
| | Std | 0.0000 × 10+00 | 6.8400 × 10−135 | 2.8500 × 10−18 | 0.0000 × 10+00 |
| F12 | Mean | 0.0000 × 10+00 | 0.0000 × 10+00 | 0.0000 × 10+00 | 0.0000 × 10+00 |
| | Std | 0.0000 × 10+00 | 0.0000 × 10+00 | 0.0000 × 10+00 | 0.0000 × 10+00 |
| F13 | Mean | 8.2400 × 10−12 | 8.6900 × 10−11 | 1.8400 × 10−09 | 8.8824 × 10−16 |
| | Std | 0.0000 × 10+00 | 4.7300 × 10−10 | 1.7600 × 10−09 | 0.0000 × 10+00 |
| F14 | Mean | 0.0000 × 10+00 | 0.0000 × 10+00 | 1.7000 × 10−14 | 0.0000 × 10+00 |
| | Std | 0.0000 × 10+00 | 0.0000 × 10+00 | 1.8200 × 10−14 | 0.0000 × 10+00 |
| F15 | Mean | 0.0000 × 10+00 | 0.0000 × 10+00 | 2.5700 × 10−22 | 0.0000 × 10+00 |
| | Std | 0.0000 × 10+00 | 0.0000 × 10+00 | 2.2300 × 10−24 | 0.0000 × 10+00 |
| F16 | Mean | 9.8300 × 10+00 | 7.2800 × 10−02 | 4.3500 × 10−04 | 1.3780 × 10−05 |
| | Std | 2.4700 × 10+00 | 1.8700 × 10−01 | 4.6600 × 10−04 | 1.0504 × 10−02 |
| F17 | Mean | 5.8500 × 10+06 | 5.8500 × 10+04 | 2.4500 × 10+07 | 3.2042 × 10+04 |
| | Std | 3.2400 × 10+05 | 7.6200 × 10+03 | 3.2400 × 10+05 | 2.3582 × 10+03 |
| F18 | Mean | 6.1000 × 10+04 | 4.5300 × 10+02 | 3.6800 × 10+03 | 2.0000 × 10−01 |
| | Std | 5.2000 × 10+03 | 3.1200 × 10+01 | 4.2700 × 10+03 | 2.3640 × 10−07 |

Note: The best results are shown in bold.
Table 6. The Wilcoxon results of comparison algorithms.

| No. | PSO | GA | DE | ABC | BOA | CBOA | IBOA | HPSOBOA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F1 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 3.0345 × 10−11 | 1.2118 × 10−12 | 1.2118 × 10−12 |
| F2 | 3.0199 × 10−11 | 2.9802 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 |
| F3 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 2.1449 × 10−13 | 1.2118 × 10−12 | 1.2118 × 10−12 |
| F4 | 3.0199 × 10−11 | 3.0212 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 |
| F5 | 3.4742 × 10−10 | 3.9935 × 10−04 | 3.9881 × 10−04 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.1559 × 10−01 | 8.1527 × 10−11 | 8.5641 × 10−04 |
| F6 | 5.9673 × 10−09 | 2.8745 × 10−10 | 3.0199 × 10−11 | 1.3685 × 10−05 | 3.0199 × 10−11 | 3.3384 × 10−11 | 1.3289 × 10−10 | 3.0199 × 10−11 |
| F7 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 1.0139 × 10−10 | 1.9963 × 10−05 | 3.0199 × 10−11 |
| F8 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 7.6083 × 10−13 | 1.2118 × 10−12 | 1.2118 × 10−12 |
| F9 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 2.3371 × 10−01 | 1.2118 × 10−12 | 1.2118 × 10−12 |
| F10 | 1.2118 × 10−12 | 3.0199 × 10−11 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 3.3735 × 10−02 | 1.2118 × 10−12 | 1.2118 × 10−12 |
| F11 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 5.6493 × 10−13 | 1.2118 × 10−12 | 1.2118 × 10−12 |
| F12 | 1.9324 × 10−09 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 4.5336 × 10−12 | 3.9229 × 10−05 | 2.2574 × 10−04 |
| F13 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 3.0199 × 10−11 | 1.2118 × 10−12 |
| F14 | 1.2118 × 10−12 | 3.0199 × 10−11 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 4.3492 × 10−12 | 2.5474 × 10−04 | 1.2118 × 10−12 |
| F15 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 | 1.2118 × 10−12 |
| F16 | 6.7869 × 10−02 | 3.0199 × 10−11 | 1.8577 × 10−01 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.1589 × 10−10 | 1.3252 × 10−06 | 3.0199 × 10−11 |
| F17 | 1.5369 × 10−03 | 3.0199 × 10−11 | 1.366 × 10−02 | 3.5350 × 10−09 | 5.6900 × 10−08 | 1.1549 × 10−12 | 1.4333 × 10−05 | 2.0149 × 10−03 |
| F18 | 1.6490 × 10−03 | 3.0199 × 10−11 | 2.5455 × 10−05 | 4.3230 × 10−01 | 1.2118 × 10−12 | 1.2118 × 10−12 | 3.8406 × 10−03 | 1.2118 × 10−12 |

Note: Bold type indicates values > 0.05.
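The p-values in Tables 6 and 8 come from the two-sided Wilcoxon rank-sum test at the 0.05 significance level. The sketch below is a minimal self-contained version using the normal approximation with tie-averaged ranks (the usual choice for ~30 runs per algorithm); library implementations such as `scipy.stats.ranksums` apply the same statistic, and exact details of the paper's test setup are not reproduced here.

```python
import math

def ranksum_p(a, b):
    """Two-sided Wilcoxon rank-sum p-value via the normal approximation."""
    n1, n2 = len(a), len(b)
    combined = sorted((v, 0 if i < n1 else 1) for i, v in enumerate(a + b))
    vals = [v for v, _ in combined]
    ranks = [0.0] * (n1 + n2)
    i = 0
    while i < len(vals):                     # assign average ranks to ties
        j = i
        while j < len(vals) and vals[j] == vals[i]:
            j += 1
        avg = (i + j + 1) / 2.0              # 1-based ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    mu = n1 * (n1 + n2 + 1) / 2.0            # mean rank sum under H0
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (r1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
```

A p-value below 0.05 rejects the hypothesis that the two result samples come from the same distribution, i.e., the performance difference between QLBOA and the compared algorithm is significant on that function.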
Table 7. Comparison of the results of the QLBOA with others on the CEC2022 test suite.

| Function | Metric | PSO [39] | ACO [40] | ABC [41] | DE [39] | SCA [42] | SOA [42] | BOA [43] | QLBOA |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| F1 | Mean | 1.9400 × 10+03 | 1.7000 × 10+03 | 1.3900 × 10+03 | 2.4164 × 10+03 | 1.2745 × 10+03 | 1.1900 × 10+03 | 7.9116 × 10+03 | 3.1289 × 10+02 |
| | Std | 7.0900 × 10+02 | 5.7400 × 10+02 | 2.7200 × 10+02 | 2.7820 × 10+02 | 6.4150 × 10+02 | 1.8300 × 10+02 | 3.2110 × 10+03 | 5.8670 × 10+01 |
| F2 | Mean | 9.8800 × 10+02 | 5.4900 × 10+02 | 4.2200 × 10+02 | 2.5130 × 10+02 | 4.6409 × 10+02 | 4.0000 × 10+03 | 4.3443 × 10+03 | 4.1452 × 10+02 |
| | Std | 1.6800 × 10+02 | 3.9700 × 10+02 | 1.4700 × 10+02 | 7.0540 × 10+00 | 2.4199 × 10+01 | 1.4800 × 10+03 | 4.6354 × 10+02 | 2.5278 × 10+01 |
| F3 | Mean | 1.5300 × 10+03 | 1.3400 × 10+03 | 7.2100 × 10+02 | 7.3080 × 10+02 | 6.1885 × 10+03 | 9.3100 × 10+02 | 1.3365 × 10+03 | 7.1135 × 10+02 |
| | Std | 4.5200 × 10+01 | 5.0100 × 10+01 | 1.0300 × 10+01 | 7.6000 × 10+01 | 4.9100 × 10+01 | 4.9900 × 10+01 | 7.9134 × 10+01 | 4.6948 × 10+01 |
| F4 | Mean | 1.8200 × 10+03 | 1.6500 × 10+03 | 1.0300 × 10+03 | 2.6500 × 10+03 | 1.6540 × 10+03 | 1.2400 × 10+03 | 1.6562 × 10+03 | 1.0012 × 10+03 |
| | Std | 6.4400 × 10+01 | 5.5400 × 10+01 | 1.2300 × 10+01 | 4.3370 × 10+00 | 4.3850 × 10+01 | 4.9700 × 10+01 | 4.6885 × 10+01 | 4.8552 × 10+01 |
| F5 | Mean | 7.7600 × 10+03 | 4.7900 × 10+03 | 1.5000 × 10+02 | 7.4620 × 10−02 | 4.4750 × 10+04 | 2.0700 × 10+03 | 4.3724 × 10+03 | 9.1416 × 10+02 |
| | Std | 1.2200 × 10+02 | 3.6200 × 10+03 | 3.0100 × 10+02 | 8.9320 × 10−02 | 8.6800 × 10+01 | 4.5200 × 10+02 | 4.9864 × 10+03 | 7.4438 × 10+01 |
| F6 | Mean | 2.5700 × 10+03 | 2.4900 × 10+03 | 2.2500 × 10+03 | 2.9460 × 10+03 | 2.5000 × 10+04 | 6.5800 × 10+03 | 2.5074 × 10+04 | 2.0000 × 10+03 |
| | Std | 4.6200 × 10+02 | 2.8900 × 10+03 | 5.1100 × 10+02 | 5.7010 × 10+02 | 2.1400 × 10+03 | 2.6000 × 10+03 | 4.8867 × 10+03 | 2.7954 × 10+02 |
| F7 | Mean | 4.6900 × 10+03 | 4.1700 × 10+03 | 3.0800 × 10+03 | 1.1360 × 10+03 | 2.7405 × 10+03 | 1.0100 × 10+04 | 4.7423 × 10+03 | 2.0962 × 10+03 |
| | Std | 1.8700 × 10+02 | 1.4600 × 10+02 | 1.0500 × 10+02 | 5.5810 × 10+02 | 6.2500 × 10+02 | 4.1500 × 10+03 | 4.2523 × 10+02 | 8.0452 × 10+01 |
| F8 | Mean | 3.6400 × 10+03 | 3.1800 × 10+03 | 2.3300 × 10+03 | 7.7570 × 10+03 | 2.6505 × 10+03 | 2.2000 × 10+02 | 3.3674 × 10+03 | 2.6948 × 10+03 |
| | Std | 1.2600 × 10+02 | 3.2300 × 10+01 | 1.0600 × 10+01 | 5.1360 × 10+01 | 9.3450 × 10+01 | 3.9300 × 10+03 | 9.3323 × 10+01 | 1.0420 × 10+01 |
| F9 | Mean | 5.4300 × 10+03 | 4.2400 × 10+03 | 2.8400 × 10+03 | 2.1240 × 10+03 | 5.1550 × 10+03 | 3.4100 × 10+02 | 5.1562 × 10+03 | 2.6297 × 10+03 |
| | Std | 3.7800 × 10+02 | 9.4900 × 10+01 | 1.4000 × 10+01 | 2.9450 × 10−02 | 1.4700 × 10+01 | 3.5800 × 10+03 | 2.4776 × 10+02 | 9.9538 × 10+00 |
| F10 | Mean | 6.8700 × 10+03 | 4.6700 × 10+03 | 3.0600 × 10+03 | 3.2000 × 10+03 | 3.7804 × 10+03 | 6.3400 × 10+03 | 5.7883 × 10+03 | 3.0014 × 10+03 |
| | Std | 2.9500 × 10+02 | 9.6300 × 10+01 | 1.0300 × 10+01 | 1.2870 × 10+01 | 4.3100 × 10+02 | 6.8000 × 10+03 | 4.3124 × 10+02 | 1.4200 × 10+02 |
| F11 | Mean | 3.0600 × 10+04 | 1.7100 × 10+04 | 6.6400 × 10+05 | 3.9200 × 10+02 | 1.6450 × 10+04 | 3.2900 × 10+02 | 1.6478 × 10+04 | 2.9350 × 10+03 |
| | Std | 2.9300 × 10+03 | 1.7400 × 10+03 | 1.8300 × 10+05 | 7.9870 × 10+01 | 3.4607 × 10+01 | 4.3600 × 10+03 | 1.0234 × 10+03 | 3.0226 × 10+01 |
| F12 | Mean | 2.8100 × 10+04 | 1.5500 × 10+04 | 5.5700 × 10+03 | 1.0350 × 10+00 | 1.8400 × 10+04 | 1.6300 × 10+03 | 1.8402 × 10+04 | 7.6228 × 10+03 |
| | Std | 1.9200 × 10+03 | 8.3500 × 10+02 | 9.4400 × 10+01 | 1.8570 × 10−02 | 5.5370 × 10+02 | 1.1300 × 10+04 | 6.5305 × 10+02 | 1.7782 × 10+03 |
Table 8. The Wilcoxon results of comparison algorithms on the CEC2022 suite.

| No. | PSO | ACO | ABC | DE | SCA | SOA | BOA |
| --- | --- | --- | --- | --- | --- | --- | --- |
| F1 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.5105 × 10−08 | 3.0199 × 10−11 | 3.5384 × 10−11 | 3.0199 × 10−11 |
| F2 | 8.6634 × 10−05 | 3.0199 × 10−11 | 2.3168 × 10−06 | 1.1058 × 10−04 | 3.0199 × 10−11 | 2.7829 × 10−07 | 3.0199 × 10−11 |
| F3 | 3.8347 × 10−05 | 3.0199 × 10−11 | 6.4878 × 10−09 | 5.3874 × 10−02 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.8507 × 10−05 |
| F4 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 4.4645 × 10−08 | 3.4384 × 10−11 | 5.4541 × 10−11 | 3.0199 × 10−11 |
| F5 | 2.3715 × 10−10 | 3.0199 × 10−11 | 3.5201 × 10−07 | 3.0199 × 10−11 | 3.0199 × 10−11 | 4.6159 × 10−10 | 4.0772 × 10−11 |
| F6 | 3.4642 × 10−10 | 4.3374 × 10−02 | 2.6947 × 10−09 | 2.5771 × 10−07 | 8.6334 × 10−05 | 3.4542 × 10−10 | 3.0199 × 10−11 |
| F7 | 3.5923 × 10−05 | 5.4941 × 10−11 | 3.4029 × 10−01 | 4.0772 × 10−11 | 6.0658 × 10−11 | 1.6813 × 10−04 | 2.3168 × 10−06 |
| F8 | 3.0199 × 10−11 | 8.1714 × 10−10 | 3.0199 × 10−11 | 3.0199 × 10−11 | 5.7941 × 10−11 | 3.6597 × 10−11 | 3.0199 × 10−11 |
| F9 | 4.4440 × 10−07 | 3.0199 × 10−11 | 8.1975 × 10−07 | 2.3399 × 10−01 | 3.0199 × 10−11 | 1.9883 × 10−02 | 3.0199 × 10−11 |
| F10 | 1.4743 × 10−10 | 1.6480 × 10−08 | 7.3391 × 10−11 | 3.3374 × 10−11 | 8.4348 × 10−09 | 1.4810 × 10−09 | 1.4643 × 10−10 |
| F11 | 1.9527 × 10−03 | 3.0199 × 10−11 | 4.5530 × 10−01 | 2.7829 × 10−07 | 3.0199 × 10−11 | 3.4783 × 10−01 | 3.0199 × 10−11 |
| F12 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 | 3.0199 × 10−11 |

Note: Bold type indicates values > 0.05.
Table 9. Model symbols and parameter settings.

| Symbol | Meaning |
| --- | --- |
| N | Set of nodes, N = {0, 1, …, n} |
| N′ | Customer collection |
| K | Set of distribution vehicles, k ∈ K |
| Q | Maximum vehicle loading capacity |
| ti | Customer delivery time, i ∈ N |
| qijk | Load of vehicle k from customer i to customer j |
| [ETi, DTi, LTi] | The service time window at customer point i |
| δe | Waiting penalty for early arrival at customer i |
| δli | Tardiness penalty for late arrival at customer i |
| dij | The distance from customer point i to j |
| fijk | Fuel consumption rate of vehicle k on road segment (i, j) (kg/km) |
| Cv | Unit fuel consumption cost (CNY/L) |
| eijk | Carbon emission rate of vehicle k on road segment (i, j) (kg/km) |
| Ck | Unit transportation cost (CNY/km) |
| tijk | Travel time of vehicle k on road segment (i, j) |
| Cf | Charge per unit of carbon emissions (CNY/kg) |
| Ce | Vehicle fixed cost (CNY/car) |
| vk | The traveling speed of vehicle k |
| ε | Customer personal preference value |
| M | Total vehicle weight (kg) |
| g | Constant of gravity (9.81 m/s²) |
| ζ | Speed of engine |
| V | Displacement of engine |
| ξ | Diesel fuel calorific value |
| xijk | 0–1 variable, which is 1 if vehicle k is driving on road (i, j) and 0 otherwise |
| yik | 0–1 variable, 1 when customer point i is served by vehicle k and 0 otherwise |
| zk | 0–1 variable, 1 when vehicle k is used and 0 otherwise |
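The symbols δe and δli in Table 9 imply a soft time window [ETi, LTi]: early arrival incurs a waiting penalty and late arrival a tardiness penalty. A minimal sketch of such a piecewise-linear penalty is given below; the exact functional form (and any role of the preferred time DTi) in the paper's model may differ, and the penalty coefficient 100 is taken from Table 10.

```python
def time_window_penalty(arrival, et, lt, delta_e=100.0, delta_l=100.0):
    """Soft time-window penalty for one customer (illustrative form)."""
    if arrival < et:
        return delta_e * (et - arrival)   # vehicle arrives early and waits
    if arrival > lt:
        return delta_l * (arrival - lt)   # service starts late
    return 0.0                            # arrival inside [ET, LT]: no penalty
```

Summing this term over all customers on a route yields the "penalty cost" column reported in Tables 11-13.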
Table 10. Problem parameter settings.

| Parameter | Value |
| --- | --- |
| Number of vehicles K | 25 |
| Coefficient of penalty δ | 100 |
| Unit transportation cost Ck (CNY/km) | 1 |
| Unit fuel consumption cost Cv (CNY/L) | 7.5 |
| Charge per unit of carbon emissions Cf (CNY/kg) | 0.0528 |
| Vehicle fixed cost Ce (CNY/car) | 100 |
| ω0, ω1, ω2, ω3, ω4, ω5, ω6 | 110, 0, −0.0011, −0.00235, 0, 0 |
| χ0, χ1, χ2, χ3, χ4, χ5, χ6, χ7 | 1.27, 0.0614, 0, −0.0011, −0.00235, 0, 0, −1.33 |
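The cost parameters in Table 10 combine into the total delivery cost that the routing objective minimizes. The sketch below only illustrates how the four monetary terms add up; the paper's exact objective (including how fuel and emissions are derived from the ω and χ regression coefficients, vehicle load, and speed) is not reproduced here, so the function arguments are assumed aggregates.

```python
def total_cost(distance_km, fuel_l, emissions_kg, penalty, n_vehicles,
               Ck=1.0, Cv=7.5, Cf=0.0528, Ce=100.0):
    """Illustrative total cost from the Table 10 unit costs."""
    transport = Ck * distance_km    # unit transportation cost (CNY/km)
    fuel = Cv * fuel_l              # unit fuel consumption cost (CNY/L)
    carbon = Cf * emissions_kg      # carbon emission charge (CNY/kg)
    fixed = Ce * n_vehicles         # fixed cost per dispatched vehicle
    return transport + fuel + carbon + fixed + penalty
```

For example, a solution covering 100 km, burning 20 L of fuel, emitting 50 kg of CO2, with no time-window penalty and two vehicles, would cost 100 + 150 + 2.64 + 200 = 452.64 CNY under these settings.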
Table 11. Analysis of the influence of the subjective preferences of decision makers on three objectives.

| Datasets | ε | Transportation Cost | Fuel Cost | Penalty Cost | Number of Vehicles |
| --- | --- | --- | --- | --- | --- |
| C107 | 0.2 | 972.85 | 1687.28 | 256.42 | 8 |
| | 0.6 | 987.59 | 1258.63 | 278.36 | 11 |
| | 0.8 | 1008.35 | 1381.44 | 112.05 | 13 |
| C202 | 0.2 | 916.14 | 1118.11 | 2236.12 | 7 |
| | 0.6 | 909.76 | 1048.52 | 208.14 | 10 |
| | 0.8 | 1193.26 | 1634.77 | 125.02 | 11 |
| R106 | 0.2 | 1185.45 | 1607.34 | 308.25 | 11 |
| | 0.6 | 1199.10 | 1642.16 | 225.03 | 14 |
| | 0.8 | 1346.10 | 1844.26 | 175.74 | 15 |
| R201 | 0.2 | 1206.68 | 1652.32 | 227.64 | 6 |
| | 0.6 | 988.85 | 1353.29 | 198.76 | 9 |
| | 0.8 | 1326.53 | 1817.48 | 155.31 | 11 |
| RC102 | 0.2 | 1695.36 | 2322.64 | 198.52 | 14 |
| | 0.6 | 1685.74 | 2135.84 | 108.43 | 16 |
| | 0.8 | 1702.84 | 2331.78 | 89.35 | 17 |
| RC206 | 0.2 | 1589.37 | 2176.93 | 225.35 | 6 |
| | 0.6 | 1466.67 | 1906.07 | 208.93 | 10 |
| | 0.8 | 1697.85 | 2324.8 | 205.35 | 12 |
Table 12. Analysis of the influence of weight factors on three objectives.

| Datasets | Weighting Factor (λ) | Transportation Cost | Fuel Cost | Penalty Cost | Number of Vehicles |
| --- | --- | --- | --- | --- | --- |
| C107 | λ1 = 0.6, λ2 = 0.3, λ3 = 0.1 | 395.04 | 881.04 | 250.52 | 8 |
| | λ1 = 0.1, λ2 = 0.6, λ3 = 0.3 | 888.83 | 503.45 | 194.85 | 11 |
| | λ1 = 0.3, λ2 = 0.1, λ3 = 0.6 | 691.31 | 1132.77 | 111.34 | 13 |
| C202 | λ1 = 0.6, λ2 = 0.3, λ3 = 0.1 | 363.90 | 733.96 | 187.33 | 8 |
| | λ1 = 0.1, λ2 = 0.6, λ3 = 0.3 | 818.78 | 419.41 | 145.70 | 10 |
| | λ1 = 0.3, λ2 = 0.1, λ3 = 0.6 | 636.83 | 943.67 | 83.26 | 12 |
| R106 | λ1 = 0.6, λ2 = 0.3, λ3 = 0.1 | 479.64 | 1149.51 | 202.53 | 12 |
| | λ1 = 0.1, λ2 = 0.6, λ3 = 0.3 | 1079.19 | 656.86 | 157.52 | 14 |
| | λ1 = 0.3, λ2 = 0.1, λ3 = 0.6 | 839.37 | 1477.94 | 90.01 | 15 |
| R201 | λ1 = 0.6, λ2 = 0.3, λ3 = 0.1 | 395.54 | 947.30 | 178.88 | 6 |
| | λ1 = 0.1, λ2 = 0.6, λ3 = 0.3 | 889.97 | 541.32 | 139.13 | 9 |
| | λ1 = 0.3, λ2 = 0.1, λ3 = 0.6 | 692.20 | 1217.96 | 79.50 | 12 |
| RC102 | λ1 = 0.6, λ2 = 0.3, λ3 = 0.1 | 674.30 | 1495.09 | 97.59 | 11 |
| | λ1 = 0.1, λ2 = 0.6, λ3 = 0.3 | 1517.17 | 854.34 | 75.90 | 13 |
| | λ1 = 0.3, λ2 = 0.1, λ3 = 0.6 | 1180.02 | 1922.26 | 43.37 | 17 |
| RC206 | λ1 = 0.6, λ2 = 0.3, λ3 = 0.1 | 586.66 | 1334.25 | 188.04 | 6 |
| | λ1 = 0.1, λ2 = 0.6, λ3 = 0.3 | 1319.99 | 762.43 | 146.25 | 10 |
| | λ1 = 0.3, λ2 = 0.1, λ3 = 0.6 | 1026.66 | 1715.46 | 83.57 | 12 |
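The pattern in Table 12 — transportation cost is lowest when λ1 dominates, fuel cost when λ2 dominates, and penalty cost when λ3 dominates — is what a weighted-sum scalarization would produce. As a hedged sketch (any normalization of the three objectives used in the paper is omitted here):

```python
def weighted_objective(transport, fuel, penalty, lam=(0.6, 0.3, 0.1)):
    """Weighted-sum scalarization of the three cost objectives."""
    l1, l2, l3 = lam
    return l1 * transport + l2 * fuel + l3 * penalty
```

Raising one λ makes the optimizer trade the other two objectives away to shrink the emphasized term, which is exactly the behavior visible across the three weight settings for each dataset.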
Table 13. Comparison of three objectives for QLBOA and three algorithms.

| Datasets | Algorithms | Transportation Cost | Fuel Cost | Penalty Cost |
| --- | --- | --- | --- | --- |
| C107 | GA | 449.04 | 1081.05 | 270.53 |
| | ACO | 435.87 | 1103.47 | 247.95 |
| | BOA | 691.31 | 1132.77 | 311.34 |
| | QLBOA | 395.04 | 881.04 | 250.52 |
| C202 | GA | 347.68 | 678.78 | 217.63 |
| | ACO | 818.78 | 579.46 | 145.70 |
| | BOA | 635.83 | 973.68 | 285.76 |
| | QLBOA | 363.90 | 733.96 | 187.33 |
| R106 | GA | 426.57 | 1648.75 | 237.47 |
| | ACO | 1079.19 | 656.86 | 257.52 |
| | BOA | 839.37 | 1477.94 | 390.01 |
| | QLBOA | 479.64 | 1149.51 | 202.53 |
| R201 | GA | 405.55 | 997.39 | 190.08 |
| | ACO | 889.97 | 841.32 | 139.13 |
| | BOA | 692.20 | 1217.96 | 79.50 |
| | QLBOA | 395.54 | 947.30 | 178.88 |
| RC102 | GA | 1078.38 | 1585.89 | 108.96 |
| | ACO | 1017.17 | 1064.04 | 105.90 |
| | BOA | 1180.02 | 1906.27 | 243.37 |
| | QLBOA | 674.30 | 1495.09 | 97.59 |
| RC206 | GA | 1088.67 | 1054.25 | 204.33 |
| | ACO | 1386.34 | 1056.49 | 246.25 |
| | BOA | 1056.47 | 1895.44 | 283.57 |
| | QLBOA | 586.66 | 1334.25 | 188.04 |
Meng, W.; He, Y.; Zhou, Y. Q-Learning-Driven Butterfly Optimization Algorithm for Green Vehicle Routing Problem Considering Customer Preference. Biomimetics 2025, 10, 57. https://doi.org/10.3390/biomimetics10010057