Article

A Learning-Based Particle Swarm Optimizer for Solving Mathematical Combinatorial Problems

1 Escuela de Ingeniería Informática, Universidad de Valparaíso, Valparaíso 2362905, Chile
2 Escuela de Ingeniería Informática, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile
* Authors to whom correspondence should be addressed.
Axioms 2023, 12(7), 643; https://doi.org/10.3390/axioms12070643
Submission received: 18 May 2023 / Revised: 21 June 2023 / Accepted: 26 June 2023 / Published: 28 June 2023

Abstract: This paper presents a set of adaptive parameter control methods based on reinforcement learning for the particle swarm algorithm. The aim is to adjust the algorithm’s parameters during the run, providing the metaheuristic with the ability to learn and adapt dynamically to the problem and its context. The proposal integrates Q–Learning into the optimization algorithm for parameter control. The applied strategies include a shared Q–table, separate tables per parameter, and a flexible state representation. The study was evaluated on various instances of the multidimensional knapsack problem, which belongs to the NP-hard class. It can be formulated as a mathematical combinatorial problem involving a set of items with multiple attributes or dimensions, aiming to maximize the total value or utility while respecting constraints on the total capacity or available resources. Experimental and statistical tests were carried out to compare the results obtained by each of these hybridizations, concluding that they can significantly improve the quality of the solutions found compared to the native version of the algorithm.

1. Introduction

Nature–inspired optimization techniques are a set of algorithms or methods designed to adapt to and solve complex optimization problems [1]. Metaheuristics owe their adaptability to the fact that they do not depend on the mathematical structure of the problem but rely on heuristic procedures and intelligent search strategies [2,3]. A subset of metaheuristics, known as swarm intelligence methods, operates on a population of artificial individuals that cooperate with each other to find a solution to the problem. These methods find good-enough solutions within a certain time and are configured through input parameters, which dictate the internal behavior of the execution [4].
Particle swarm optimization (PSO) is probably the bio-inspired optimization algorithm most applied in the last decades [5]. This method uses an inertia weight together with social and cognitive coefficients to calculate the movement of its individuals in the search space. The parameters are so relevant to the execution and performance of the algorithm that small adjustments can directly impact the result found [6]. Based on the “No Free Lunch” theorem, we can infer that no universal configuration of this algorithm can provide the best possible solution for all optimization problems [7]. Therefore, adapting the algorithm to the problem at hand is necessary, considering that the parameters must be readjusted when facing different problems. It has been shown that parameter setting drastically affects the final result of the algorithm, and it is still a hot topic [8]. From this, the problem of parameter adjustment arises, which can itself be considered an optimization problem [9]. There are at least two ways to approach the problem of parameter tuning: (a) offline tuning, which implies identifying the best values for a problem during a testing phase and does not modify the parameters during the execution of the algorithm, and (b) online control, which adapts the values of the parameters during execution according to different strategies that can be deterministic, adaptive, or self–adaptive. Due to the lack of a single solution to this problem, the scientific community has searched for hybrid approaches inspired by different disciplines. One of these methods is learnheuristics, which combines machine learning (ML) techniques with metaheuristic algorithms [10].
This study aims to investigate and develop different online parameter control strategies for swarm intelligence algorithms at runtime through reinforcement learning [11]. The proposal contemplates integrating several variants of the Q–Learning algorithm into PSO to assist in the online control of its parameters. Each variant of Q–Learning has its own characteristics and adapts to different situations, making it possible to effectively address a variety of scenarios. The first strategy applies Q–Learning with a single Q–table that stores the new parameter values of PSO. The second one maintains one table per parameter, and the third one is state-free. In order to demonstrate that the proposed techniques are viable, and to compare the performance of each one, some of the most challenging instances of the multidimensional knapsack problem (MKP) are solved. MKP is a well-known NP-hard optimization problem consisting of items, each with a profit and an n–dimensional weight, and knapsacks to be filled [12]. The objective is to choose a subset of items with maximum total profit without exceeding the knapsack capacities. This problem was selected because it has a wide range of practical applications and continues to be a topic of interest in the operations research community [13,14,15]. For the computational experiments, 70 of the most challenging instances of the MKP taken from the OR–Library [16] were used. The results were evaluated through descriptive analysis and statistical inference, mainly hypothesis contrasts applying non–parametric tests.
The rest of the manuscript is organized as follows: Section 2 presents a review of related work on hybridizations between learning techniques and metaheuristics. Section 3 details the conceptual framework of the study. Section 4 explains how reinforcement learning techniques are applied to particle swarm optimization. Section 5 describes the phases of the experimental design, while Section 6 discusses the results achieved. Finally, the conclusions and future work are given in Section 7.

2. Related Work

In recent years, the integration between swarm intelligence algorithms and machine learning has been extensively investigated [10]. To this end, various approaches have been described to implement self-adaptive and learning capabilities in these techniques. For example, in ref. [17], the virus optimization algorithm is modified to add self-adaptive capacities to its parameters. The performance was compared on optimization instances of different sizes, and similar or better performance was observed for the improved version. Similarly, in ref. [18], the firefly algorithm was enhanced to auto-compute the parameter a that controls the balance between exploration and exploitation. In ref. [19], an analogous strategy modifies the cuckoo search algorithm to balance the intensification and diversification phases. The work published in [20] proposes improving the artificial bee colony by incorporating self-adaptive capabilities into its agents. This study aims to improve the convergence ratio by altering the parameter that controls it during the run. A comparable work can be seen in [21]. Here, the differential evolution strategy is modified by adding auto–tuning qualities to the scaling factor and the crossover ratio to increase the convergence rate. The manuscript [22] describes an improvement of the discrete particle swarm optimization algorithm, which includes an adaptive parameter control to balance social and cognitive learning. A new formulation updates the probability factor p_i in the Bernoulli distribution, which in turn updates the parameters R_1 (social learning) and R_2 (cognitive learning). Following the same line, in [23,24], self-adaptive evolutionary algorithms were proposed. The first one details an enhancement through a population of operators that change based on a punishment and reward scheme, depending on the operator’s quality. The second one presents an improvement where the crossover and mutation probability parameters are adapted to balance exploration and exploitation in the search for solutions. Both cases show outstanding results. In ref. [25], a self-adjustment was applied to the flower pollination algorithm. This proposal balances the exploration and exploitation phases during the run and uses a parameter S_p as an adaptive strategy. A recent version of the wolf pack algorithm was altered to auto-tune its parameter w, which controls prey odor perception [26]. Here, the new version intensifies the local search toward more promising zones. Finally, the work in [27] proposes integrating the autonomous search paradigm into the dolphin echolocation algorithm for population self-regulation. The paradigm is applied when stagnation at a local optimum is detected.
Integrating metaheuristics with machine learning, regression, and clustering techniques has also been the subject of several studies [28,29,30,31,32]. For example, in ref. [33], the authors propose an evolutionary algorithm that controls its parameters and operators. This is accomplished by integrating a controller module that applies learning rules, measuring the impact and assigning restarts to the parameter set. Under this same paradigm, the work reported in [34] explores the integration of the variable neighborhood search algorithm with reinforcement learning, applying reactive techniques for parameter adjustment and selecting local searches to balance the exploration and exploitation phases. In ref. [35], a machine learning model was developed using a support vector machine, which can predict the quality of solutions for a problem instance. This model then adjusts the parameters and guides the metaheuristic to more promising search regions. In refs. [36,37], the authors propose the integration of PSO with regression models and clustering techniques for population management and parameter tuning, respectively. In ref. [38], another combination of PSO and classifier algorithms is presented with the goal of deparameterizing the optimization method. In this approach, a previously trained model is used to classify the solutions found by the particles, which improves the exploration of the search space and the quality of the solutions obtained. Similar to previous works, in ref. [39], PSO is again enhanced with a learning model to control its parameters, obtaining a competitive performance compared to other parameter adaptation strategies. The manuscript [40] presents the hybridization of PSO, Gaussian process regression, and support vector machines for real–time parameter adjustment. The study concluded that the hybrid offers superior performance compared to traditional approaches. The work presented in ref. [41] integrates randomized priority search with the inductive decision tree data mining algorithm for parameter adjustment through a feedback loop. Finally, in ref. [42], the authors propose the integration of algorithms derived from ant colony optimization with fuzzy logic to control the pheromone evaporation rate, the exploration probability factor, and the number of ants when solving the feature selection problem.
More specifically, reviewing studies on integrating metaheuristics and reinforcement learning, we find many works combining these two techniques to improve the search in optimization problems [43]. For example, in [44], the authors proposed the integration of bee swarm optimization with Q–Learning to improve its local search. In this approach, the artificial individuals become intelligent agents that gain and accumulate knowledge as the algorithm progresses, thus improving the effectiveness of the search. Along the same lines, ref. [45] proposes integrating a learning–based approach into the ant colony algorithm to control its parameters. This is carried out by assigning rewards to parameter changes in the algorithm, storing them in an array, and learning the best values to apply to each parameter at runtime. In [46], another combination of a metaheuristic algorithm with reinforcement learning techniques is proposed. In this case, tabu search was integrated with Q–Learning to find promising regions in the search space when the algorithm is stuck at a local optimum. The work published in [47] also explored the application of reinforcement learning techniques in the context of an optimization problem. Here, a biased–randomized heuristic was combined with reinforcement learning techniques to account for the variations generated by changes in the rewards obtained. Finally, ref. [48] presents the implementation of a Q–Learning algorithm to assist in training neural networks to classify medical data. In this approach, a parameter tuning process was carried out on radial–basis neural networks using Stateless Q–Learning. Although the latter is not an optimization algorithm, the work is relevant to our research.
Even though machine learning techniques have already been explored in bio-inspired algorithms, it is worth continuing research on this type of hybridization. Our strategy involves reinforcement learning on PSO, using Q–Learning and new variations that have not been studied yet. In this context, we studied how Q–Learning can be modified to provide better results. This approach can be fruitful if done properly.

3. Preliminaries

3.1. Parameter Setting

Metaheuristic algorithms are characterized by a set of parameters governing their behavior. In the scientific literature, the adjustment of these parameters is an interesting issue and still an open challenge [8,9,49,50,51]. According to [8], parameter tuning can be formally defined as follows:
  • Let A be an algorithm with parameters p_1, p_2, …, p_k that affect its behavior.
  • Let C be a configuration space (parameter setting), where each configuration c ∈ C describes the values of the parameters required by A.
  • I defines a set of instances to be solved by the algorithm A.
  • m is a metric of the performance of A on an instance of the set I given a configuration c.
The goal is to find the best configuration ĉ ∈ C that results in optimal performance of A when solving an instance of the set I according to the metric m.
Parameter setting can be addressed in at least two ways: parameter tuning and parameter control. Parameter tuning is the process of finding those configurations that allow the algorithm to present the best possible performance when solving a specific type of problem. Different methods or algorithms can be used to find the desired configuration manually or automatically. These methods include F–Race, Sampling F–Race, Iterative F–Race, ParamILS, Sharpening, and Adaptive Capping. These methods perform parameter changes before the execution of the algorithm and are thus considered a lengthy offline process that requires a large number of executions for each configuration and instance [52]. On the other hand, parameter control is considered an online process because it focuses on implementing parameter changes during the execution of the algorithm [53]. Parameter control is classified into three strategies: deterministic, adaptive, and self–adaptive [54]. The deterministic strategy uses deterministic rules to change the parameters and does not have a feedback system. The adaptive strategy uses feedback to vary the direction and magnitude of the parameter change. The self–adaptive strategy encodes the parameters of each individual in the population and modifies them over a certain number of iterations based on the best solutions found up to that moment.

3.2. Reinforcement Learning

Reinforcement learning is a branch of ML whose objective is to determine the set of actions that an agent must take to maximize a given reward [11]. These agents need to figure out for themselves which actions have the highest reward return, which is accomplished through trial and error. In some cases, these actions can yield long–term rewards, known as retroactive rewards. Trial and error and retroactive rewards are the two most important features of reinforcement learning [55,56].
Reinforcement learning employs a cycle involving several elements: the agent, the environment, the value function, the reward, the policy, and the model [57]. The agent is the element that performs actions in the environment, learning and incorporating new policies to follow in future actions. Policies vary based on the reward received for performing an action. The agent’s objective is to choose actions that increase the reward in the long term. The environment is where the agent’s actions are applied and where the resulting state changes are produced. The value function uses these state changes to determine the impact of each action. The value function is the evaluator element within the algorithm. Its function is to evaluate each action carried out by the agent and its impact on the environment. This is achieved by assigning a reward to the action/state–change pair and delivering it to the agent, thus completing the cycle. The reward is the numerical value that the value function assigns to an action/state–change pair. Usually, a positive value indicates positive feedback, while a negative value indicates negative feedback. The policy is the element that gives the algorithm the ability to learn continuously, allowing the agent’s behavior to be defined during the execution of the algorithm. Generally, a policy maps the perceived states of the environment to the actions that must be taken in those states. Finally, the model is an optional element that serves as input to the agent’s decision making. Unlike policies, the model is static at runtime and only contains predefined data.
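As an illustration of this cycle, the following minimal Java sketch shows how these elements interact; the Agent and Environment interfaces and the integer state/action encoding are assumptions made here for clarity, not part of the original proposal.

// Minimal sketch of the reinforcement learning cycle described above.
// The Agent and Environment interfaces are illustrative assumptions.
interface Environment {
    int currentState();
    double applyAction(int action); // returns the reward for the resulting state change
}

interface Agent {
    int chooseAction(int state);                                      // follows the current policy
    void observe(int state, int action, double reward, int nextState); // updates the policy from feedback
}

final class LearningLoop {
    static void run(Agent agent, Environment env, int steps) {
        int state = env.currentState();
        for (int t = 0; t < steps; t++) {
            int action = agent.chooseAction(state);          // agent acts on the environment
            double reward = env.applyAction(action);         // value function assigns a reward
            int nextState = env.currentState();              // environment reports the new state
            agent.observe(state, action, reward, nextState); // agent incorporates the feedback
            state = nextState;
        }
    }
}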
Uncertainty in Q–Learning can affect the exploration phase of PSO, which can impact the convergence and quality of the solutions generated by PSO. The relationship between the uncertainty of Q–Learning and PSO is complex and depends on several factors, such as the configuration of the algorithm and the interaction between the parameters. This issue remains open in the scientific community, and optimization algorithms governed by learning methods continue to be a hot topic [58].

3.2.1. Q–Learning

Q–Learning is a model-free reinforcement learning algorithm introduced by C. Watkins in 1989 [59]. This algorithm introduces the Q function, which works as a table that stores the maximum expected rewards for each action performed in a given state. This function is defined as [60,61]:
Q_{t+1}(S_t, A_t) ← Q_t(S_t, A_t) + α [ R_{t+1} + γ max_a Q_t(S_{t+1}, a) − Q_t(S_t, A_t) ]
where A_t is the action applied to the environment, S_{t+1} is the state of the environment in the next iteration, S_t is the current state, R_{t+1} is the reward given for changing the state, α ∈ [0, 1] is the learning rate, and γ ∈ [0, 1] is the discount factor. The procedure of Q–Learning can be seen in Algorithm 1.
Algorithm 1: Q–Learning pseudocode
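As a complement to Algorithm 1, the following minimal Java sketch implements the tabular update of Equation (1); the class and variable names are illustrative assumptions rather than the authors' implementation.

// Sketch of the tabular Q-Learning update in Equation (1).
final class QLearning {
    private final double[][] q;  // Q-table: states x actions
    private final double alpha;  // learning rate, in [0, 1]
    private final double gamma;  // discount factor, in [0, 1]

    QLearning(int states, int actions, double alpha, double gamma) {
        this.q = new double[states][actions];
        this.alpha = alpha;
        this.gamma = gamma;
    }

    // Applies Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    void update(int state, int action, double reward, int nextState) {
        double maxNext = Double.NEGATIVE_INFINITY;
        for (double v : q[nextState]) maxNext = Math.max(maxNext, v);
        q[state][action] += alpha * (reward + gamma * maxNext - q[state][action]);
    }

    double value(int state, int action) { return q[state][action]; }
}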

3.2.2. Single State Q–Learning

In cases where it is difficult (or outright impossible) to determine the states of the system, there is the possibility of reducing Q–Learning to a single static state. This method is called Single State Q–Learning (or Stateless Q–Learning) [62,63]. Given this simplification of the algorithm, the table Q is reduced to an array, and Equation (1) is abbreviated to Equation (2):
Q(A_t) ← Q(A_t) + α [ R − Q(A_t) ]
where A_t is the action applied to the environment, R is the reward given for performing the action, and α ∈ [0, 1) is the learning rate.
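For comparison with the classic version, a minimal Java sketch of this stateless update (with illustrative names only) is:

// Sketch of the Single State (stateless) Q-Learning update in Equation (2):
// the Q-table collapses to a one-dimensional array indexed only by actions.
final class StatelessQLearning {
    private final double[] q;    // one value per action
    private final double alpha;  // learning rate, in [0, 1)

    StatelessQLearning(int actions, double alpha) {
        this.q = new double[actions];
        this.alpha = alpha;
    }

    // Applies Q(a) <- Q(a) + alpha * (r - Q(a)).
    void update(int action, double reward) {
        q[action] += alpha * (reward - q[action]);
    }

    double value(int action) { return q[action]; }
}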

4. Developed Solution

In this section, we detail different ways to apply Q–Learning on PSO and how this implementation allows us to improve its performance when solving NP–Complete combinatorial optimization problems.

4.1. Particle Swarm Optimization

Particle swarm optimization is a swarm intelligence algorithm inspired by group behavior that occurs in flocks of birds and schools of fish [64]. In this algorithm, each particle represents a possible solution to the problem and has a velocity vector and a position vector.
PSO consists of a cyclical process in which each particle sees its trajectory influenced by two types of learning: social learning, acquired through knowledge of the other particles in the swarm, and cognitive learning, acquired through the particle’s own experience [5,65]. In the traditional PSO, the velocity of the i–th particle is represented as a vector v_i = (v_{i1}, v_{i2}, …, v_{ij}, …, v_{in}), while its position is described as x_i = (x_{i1}, x_{i2}, …, x_{ij}, …, x_{in}). Initially, each particle’s position vector and velocity vector are randomly created. Then, during the execution of the algorithm, the particles are moved using Equations (3) and (4):
v_{ij}(t+1) = w v_{ij}(t) + r_1 φ_1 (pBest_{ij} − x_{ij}(t)) + r_2 φ_2 (gBest_j − x_{ij}(t))
x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1)
where w is the inertia weight, r_1 and r_2 are the social and cognitive learning coefficients, respectively, and φ_1 and φ_2 are uniformly distributed random values in the range [0, 1). The method needs a memory, called pBest_i, representing the best position found by the i–th particle. The best position found by the whole swarm is stored in gBest and is returned when the algorithm ends. Algorithm 2 summarizes the PSO search procedure.
Algorithm 2: PSO pseudocode
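As a complement to Algorithm 2, the following minimal Java sketch applies Equations (3) and (4) to a single particle; the method and array names mirror the notation above and are purely illustrative.

import java.util.Random;

// Sketch of the particle movement defined by Equations (3) and (4).
final class ParticleUpdate {
    // Updates one particle in place.
    static void move(double[] x, double[] v, double[] pBest, double[] gBest,
                     double w, double r1, double r2, Random rng) {
        for (int j = 0; j < x.length; j++) {
            double phi1 = rng.nextDouble();   // uniform in [0, 1)
            double phi2 = rng.nextDouble();
            v[j] = w * v[j]
                 + r1 * phi1 * (pBest[j] - x[j])   // personal-best (pBest) term
                 + r2 * phi2 * (gBest[j] - x[j]);  // global-best (gBest) term
            x[j] = x[j] + v[j];                    // position update, Equation (4)
        }
    }
}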
As a solution to the parameter optimization problem, we propose integrating the Q–Learning algorithm into a traditional PSO. The objective of this combination is for PSO to acquire the ability to adapt its parameters online, that is, during the execution of the algorithm.
The approach starts by declaring the swarm, its particles, the necessary velocity vectors, and the Q table. The normal course of a PSO algorithm is then followed. The Q–Learning module is invoked when the algorithm stagnates. We detect stagnation by applying the approximate theory of nature-inspired optimization algorithms, derived from x(t+1) − x(t) ≈ ρ Δ, with Δ = x_best − x(t) and ρ a uniform value in [0, 1). This module analyzes the environment for possible state changes and then updates the Q table with the appropriate reward. If this is the first call to the module, these steps are skipped. Subsequently, a decision must be made between two possible actions to adjust the algorithm’s parameters: one follows the policy derived from Q, while the other changes the parameters randomly. The difference between these two options is that the first one provides a better return based on the existing knowledge, whereas the second one allows previously unknown knowledge to be discovered. Finally, the Q–Learning module transfers control back to PSO to continue operating. Figure 1 depicts the flow of the proposal.
In the work, the proposed strategies allow us to integrate different levels of reinforced learning into swarm intelligence algorithms: Classic Q–Learning, Modified Q–Learning, and Single State Q–learning.
(a)
Classic Q–Learning (CQL): The first strategy directly applies the Q–Learning theory, including a single Q table. This table represents the states by combining the possible values of each parameter (discretized into intervals between 0 and 1) and the actions (transitions from one state to another). The process computes the fitness variation from the previous invocation of the module to the current call, assigning a positive or negative reward value according to the objective function. Then, the Q table is updated with respect to the action/state pair. Finally, the most favorable action is derived according to the current state and the greedy policy, applying the new set of parameters to the swarm.
(b)
Modified Q–Learning (MQL): The second strategy is a variation of classic Q–Learning. It divides the Q table by parameter and, furthermore, decreases the number of possible actions, allowing the agent only to move forward, backward, or stay in the same state. In this method, each particle can individually use the module for the training of the Q tables and the modification of the parameters. This means that, unlike the method seen above, this strategy is invoked by a single individual of the swarm instead of the entire swarm. It is important to consider that each particle must store its current state and action as attributes, just like velocity or position.
(c)
Single State Q–Learning (SSQL): The third strategy replaces the concept of a Q table with an array, removing states and looking only at the changes produced by actions. The new Q array includes actions representing the amount by which a parameter is modified. Similar to the previous version, the Q array is split by parameter, and each particle of the swarm can use it individually. These changes effectively remove the state dependency, allowing more precise parameter changes (a sketch of this per-parameter structure is given after this list).
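The following minimal Java sketch illustrates the per-parameter structure shared by MQL and SSQL in its stateless (SSQL) form; MQL additionally keeps a per-parameter state index. The step sizes, value range, and names are assumptions made for illustration, not the authors' implementation.

import java.util.Random;

// Sketch of a per-parameter controller: each controlled PSO parameter (e.g., w, r1, r2)
// keeps its own small table, and the actions only decrease, keep, or increase its value.
final class PerParameterController {
    static final double[] STEPS = {-0.1, 0.0, +0.1};       // decrease, stay, increase (illustrative)
    private final double[] q = new double[STEPS.length];   // stateless table for one parameter
    private final double alpha;                             // learning rate
    private final double min, max;                          // allowed range of the parameter

    PerParameterController(double alpha, double min, double max) {
        this.alpha = alpha; this.min = min; this.max = max;
    }

    // epsilon-greedy choice between the learned best action and a random one
    int chooseAction(double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) return rng.nextInt(STEPS.length);
        int best = 0;
        for (int a = 1; a < STEPS.length; a++) if (q[a] > q[best]) best = a;
        return best;
    }

    // applies the chosen action to the current parameter value, clamped to its range
    double apply(double value, int action) {
        return Math.min(max, Math.max(min, value + STEPS[action]));
    }

    // stateless reward update, as in Equation (2)
    void reward(int action, double r) {
        q[action] += alpha * (r - q[action]);
    }
}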

4.2. Integration

Algorithm 3 shows the steps to follow to integrate reinforcement learning into PSO.
Algorithm 3: Integration of reinforcement learning into PSO
Firstly, the procedure determines the states that represent the current condition or configuration of the PSO algorithm. These states could include the positions of the particles, their velocities, or any other relevant variables. Next, the algorithm identifies the actions that can be taken to modify the parameters of the PSO algorithm. These actions could involve changing the inertia weight, the acceleration coefficients, or any other parameter that influences the behavior of the particles. In the third step, the method uses the reward function to evaluate the performance of the PSO algorithm based on the solutions obtained. The reward function should provide feedback on how well the algorithm is performing and guide the Q–Learning process. The fourth step describes how the Q–table is created. The Q–table is a lookup table that maps state–action pairs to their corresponding Q–values, and it is initially filled with random values.
The iterative process runs for as long as PSO needs it. Here, the current state of PSO saves information such as the positions and velocities of the particles and the best solution found so far. Next, the most appropriate action is chosen based on the ε–greedy method. This method is applicable because all actions are equally available. We consider the ε–greedy method as the policy to identify the best action, using a uniform probability of 1 − ε, where ε ∈ [0, 1]. The action modifies the parameter configuration of PSO in a random way: c ← c + φψ, where φ ∈ [min(c), max(c)] and ψ ∈ {−1, 0, 1}. Then, PSO is executed to obtain the solutions. Its performance is evaluated using the reward function, which allows the Q–value of the previous state–action pair to be modified by applying the Q–Learning update rule. Finally, the previous state is updated to the current state, preparing for the next iteration of the algorithm. Over time, the Q–table is updated, and the PSO algorithm learns to select actions that lead to better solutions.
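The loop below is a minimal Java sketch of this integration, reusing the QLearning class sketched in Section 3.2.1. The PsoSwarm interface, the state/action encoding, and the fitness-improvement reward are assumptions made for illustration; they are not the authors' implementation.

import java.util.Random;

// Sketch of the integration loop tying a Q-Learning controller to a PSO run.
final class LearnheuristicLoop {
    interface PsoSwarm {
        int discretizedState();           // encodes the swarm configuration as a state index
        void applyParameters(double[] p); // sets w, r1, r2, ...
        double iterate();                 // runs one PSO iteration, returns best fitness so far
    }

    static void run(PsoSwarm swarm, QLearning q, double[] params,
                    int actions, double epsilon, int iterations, Random rng) {
        int state = swarm.discretizedState();
        double previousBest = swarm.iterate();
        for (int t = 0; t < iterations; t++) {
            // epsilon-greedy: exploit the Q-table with probability 1 - epsilon
            int action = (rng.nextDouble() < epsilon)
                    ? rng.nextInt(actions)
                    : bestAction(q, state, actions);
            applyAction(params, action, rng);      // perturb the parameter configuration
            swarm.applyParameters(params);
            double best = swarm.iterate();
            double reward = best - previousBest;   // fitness improvement as the reward signal
            int nextState = swarm.discretizedState();
            q.update(state, action, reward, nextState);
            state = nextState;
            previousBest = best;
        }
    }

    private static int bestAction(QLearning q, int state, int actions) {
        int best = 0;
        for (int a = 1; a < actions; a++) if (q.value(state, a) > q.value(state, best)) best = a;
        return best;
    }

    // c <- c + phi * psi, with psi in {-1, 0, +1}; clamping to [min(c), max(c)] omitted for brevity
    private static void applyAction(double[] params, int action, Random rng) {
        int index = action % params.length;          // which parameter the action targets
        int psi = (action / params.length) % 3 - 1;  // -1, 0 or +1
        double phi = rng.nextDouble();               // illustrative magnitude in [0, 1)
        params[index] = params[index] + phi * psi;
    }
}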
Before implementation, we analyze the time complexity of each component and of their integration. Firstly, the time complexity of PSO depends mainly on the number of particles and the number of iterations. At each iteration, the positions and velocities of the particles are updated, and the objective function is evaluated. In this case, however, these components are constant, and the algorithm really depends on the dimensionality of the problem. Therefore, the complexity of the PSO algorithm itself is O(Kn), where K represents the number of particles times the number of iterations, and n is the number of decision variables. On the other hand, the time complexity of Q–Learning is based on the size of the Q–table, which is determined by the number of possible states and actions. If the search space is large and the Q–table is large, the complexity increases. In our case, we use a value range for the parameters that remains constant during the run of PSO. We can therefore guarantee that the three proposals are efficient because none of them exceeds polynomial time.

5. Experimental Setup

In order to comprehensively evaluate the performance of the proposed hybridizations, it is crucial to conduct a robust analysis that encompasses various aspects. One essential step in this analysis is to compare the solutions obtained by each strategy with those of the classic version of PSO. By benchmarking the solutions against PSO’s results, we establish a reliable reference point for evaluating the effectiveness and efficiency of the proposed hybridizations. This enables us to gauge the extent to which the algorithmic enhancements contribute to improving solution quality and reaching optimality. Moreover, this comparative analysis serves to validate the credibility and competitiveness of the proposed hybridizations in the field. By showing that they match or surpass PSO’s results, we can support the competitiveness of our approach and its potential to outperform existing methods.
To ensure the robustness of the performance analysis, it is important to employ a diverse set of benchmark problems that accurately represent the challenges and complexities encountered in real-world scenarios. By testing the proposed hybridizations on these benchmarks, we can assess their adaptability, generalizability, and ability to handle various problem instances effectively.
Figure 2 indicates the steps taken to examine the three proposals’ performance thoroughly. In addition, we establish objectives and recommendations for the experimental phase, in order to demonstrate that the proposed approaches allow for improving the optimization of metaheuristic parameters.
The analyses include: (a) the resolution time to determine the difference produced when applying the different methods, (b) the best value found by each method, which is an important indicator to assess future results, and finally, (c) an ordinal analysis and statistical tests to determine if one method is significantly better than another.
For the experimental phase, several optimization instances were solved in order to measure the performance of the different proposed methods. These instances were taken from the OR–Library, a virtual library that J.E. Beasley first described in 1990 [16], and in which it is possible to find various test data sets. In this study, 70 binary instances of the multidimensional knapsack problem were used (from MKP1 to MKP70). Table 1 details each instance, indicating its optimal solution, the number of backpacks, and the number of objects.
For instances MKP56 to MKP70, there are no recorded optimal values because they could not be solved using exact methods. For this reason, we use “unknown” to indicate that this value has not been found to date.
Equation (5) defines the formulation of MKP:
max Σ_{j=1}^{n} p_j x_j
subject to Σ_{j=1}^{n} w_{jk} x_j ≤ b_k,  k ∈ {1, …, K}
x_j ∈ {0, 1}
where x_j describes whether or not object j is included in a backpack, and n represents the total number of objects. Each object has a real value p_j that represents its profit and is used to calculate the objective function. Finally, w_{jk} stores the weight of each object with respect to backpack k, which has a maximum capacity b_k. As can be seen, this is a combinatorial problem that deals with the dilemma of whether or not to include each object in backpacks of limited capacity.
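A minimal Java sketch of how a candidate solution can be evaluated against this formulation follows; the names mirror the notation above, and returning −1 for infeasible solutions is just one possible convention, not the authors' penalty or repair scheme.

// Sketch of evaluating a candidate MKP solution against Equation (5).
final class MkpEvaluation {
    // Returns the total profit of x if all K capacity constraints hold, or -1 otherwise.
    static double evaluate(int[] x, double[] p, double[][] w, double[] b) {
        int n = x.length, k = b.length;
        for (int dim = 0; dim < k; dim++) {
            double load = 0.0;
            for (int j = 0; j < n; j++) load += w[j][dim] * x[j]; // weight of selected items in backpack dim
            if (load > b[dim]) return -1.0;                       // capacity b_k violated
        }
        double profit = 0.0;
        for (int j = 0; j < n; j++) profit += p[j] * x[j];        // objective: total profit
        return profit;
    }
}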
To execute a metaheuristic of a continuous nature in a binary domain, it is necessary to add a binarization phase after the solution vector changes [69]. Here, a standard sigmoid function was used as the transformation function, that is, 1 / (1 + e^{−x_{ij}}) > δ, with δ a uniform random value in [0, 1). If this condition holds, we set x_{ij} ← 1 as the discretization; otherwise, we set x_{ij} ← 0.
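A minimal Java sketch of this binarization step, assuming the standard sigmoid S(x) = 1 / (1 + e^{−x}), is shown below.

import java.util.Random;

// Sketch of the sigmoid-based binarization of a continuous position component.
final class Binarization {
    // Maps a continuous position x_ij to 0 or 1 using a uniform threshold delta in [0, 1).
    static int binarize(double xij, Random rng) {
        double sigmoid = 1.0 / (1.0 + Math.exp(-xij));
        double delta = rng.nextDouble();
        return sigmoid > delta ? 1 : 0;
    }
}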
The performance of each method is evaluated after solving each of the 70 instances a total of 30 times. Once the complete set of results is obtained from all executions and instances, an outlier analysis is performed to study possible irregular results. Here, influential outliers were detected using the Tukey test, which takes as reference the difference between the first quartile (Q1) and the third quartile (Q3), that is, the interquartile range. In our case, a result is considered a mild outlier if it lies 1.5 times that distance from one of those quartiles, or an extreme outlier if it lies three times that distance. This test was implemented using a spreadsheet to calculate the statistical values automatically. All outliers were removed to avoid distortion of the samples, and new runs were performed to replace the removed solutions. Moreover, we use the relative percentage difference (RPD) between the best known solution to the problem and the best solution found. This value is calculated as (best − reached) / best.
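The two metrics above can be computed with the following minimal Java sketch; the quartile interpolation used here is one common convention and may differ slightly from the spreadsheet employed by the authors.

import java.util.Arrays;

// Sketch of the relative percentage difference (RPD) and the Tukey fences used to flag outliers.
final class ResultMetrics {
    // RPD = (best - reached) / best, where "best" is the known (or best known) optimum.
    static double rpd(double best, double reached) {
        return (best - reached) / best;
    }

    // Returns {lowerFence, upperFence} as Q1 - factor*IQR and Q3 + factor*IQR;
    // factor = 1.5 flags mild outliers, factor = 3.0 flags extreme outliers.
    static double[] tukeyFences(double[] samples, double factor) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        double q1 = percentile(sorted, 0.25);
        double q3 = percentile(sorted, 0.75);
        double iqr = q3 - q1;
        return new double[] { q1 - factor * iqr, q3 + factor * iqr };
    }

    // Simple linear-interpolation percentile over a sorted sample.
    private static double percentile(double[] sorted, double fraction) {
        double pos = fraction * (sorted.length - 1);
        int lo = (int) Math.floor(pos), hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }
}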
As a next step, a descriptive and statistical analysis of the results was carried out. For the first, metrics such as maximum and minimum values, the mean, the quasi–standard deviation, the median, and the interquartile range are used to compare the results generated by the three methods. The second analysis corresponds to statistical inference. In this analysis, two hypotheses are contrasted to reveal the one with the greatest statistical significance. The tests employed for that were: (a) the Shapiro–Wilk test for normality and (b) the Wilcoxon–Mann–Whitney test for heterogeneity. In addition, for a better understanding of the robustness of the analysis, it is essential to highlight that, given the independent nature of the instances, the results obtained in any of them do not affect the results of the others. Likewise, the repetition of an instance does not imply the need for more repetitions of the same instance.
In [70], the parameter values with the best average results in terms of swarm performance are described. Considering this, the initial values for the PSO parameters were set to MaxIter = 2000, popSize = 100, w = 0.6, and r_1 = r_2 = 0.9. A sampling phase was carried out for Q–Learning’s parameters to determine the values that offer the best results. The best initial configuration was StepSize = 6, γ = ε = 0.5, and α = 0.1. Finally, all the methods were coded in the Java 1.8 programming language and executed on a workstation running the Windows 10 Enterprise operating system with an AMD Ryzen 7 1700m 8–core 3.64 GHz processor and 16 GB of 1197.1 MHz RAM. It is important to note that parallel implementation was not required. Instances, data, and codes are available in [71,72,73].

6. Discussion

All algorithms were run 30 times for each instance in the testing phase. Results were recorded, distinguishing each method for further comparison (native PSO or NPSO, classic Q–Learning or CQL, modified Q–Learning or MQL, and single state Q–Learning or SSQL). Table 2 summarizes how many known optimums were found by each version and, in the case of the instances with unknown optimums, how many of them reached the best solution found within a limited testing time. We employ a cut–off of five minutes: if an approach exceeds this bound, it is not included in the results.
Analyzing only these results, we can see that the number of optimal values achieved by the native PSO is better than that achieved by the basic Q–Learning implementation. In contrast, both are overshadowed by our modified version of Q–Learning and by single state Q–Learning. Thus, we can preliminarily observe that: (a) the performance of basic Q–Learning is inferior to that of native PSO, (b) in terms of known optimums, a significant difference between modified Q–Learning and single state Q–Learning cannot yet be detected, and (c) in terms of the unknown optimums reached, there is a significant difference between modified Q–Learning and single state Q–Learning. In general, single state Q–Learning obtained the best results. All the results obtained by each version of the algorithm are presented in Table 3, Table 4, Table 5 and Table 6.
Now, to demonstrate more robustly which approach works best, we take more restricted instances of MKP to graph the distribution of best values generated by each strategy. These instances have many objects to select and a small number of backpacks to use.
Figure 3 shows the convergences of each method. For the MKP06 instance, we can see a similar convergence among strategies, with classic Q–Learning being the version with the latest convergence compared to the others. For the MKP35 instance, large convergence differences are observed in the four strategies. Here, PSO is the algorithm with the latest convergence, and the modified version of Q–Learning and the Single State version have the earliest convergence. For the MKP70 instance, a similar performance can be seen between the modified Q–Learning and single state Q–Learning, while the convergences between default PSO and classic Q–Learning are the latest.
Observing the distributions presented in Figure 4, it can again be concluded that, in general, the standard PSO obtains final results very similar to those of its version assisted by classic Q–Learning. With these results and those mentioned previously, a possible explanation for this phenomenon is the high time cost required to train all the action/state pairs of the Q table, which means that, by the end of the execution, the algorithm cannot find a better parametric configuration than the initial one.
This possible problem is mitigated in the other two implemented methods due to the considerable reduction and division of the Q table. Lastly, the PSO algorithms assisted by the modified Q–Learning and single state Q–Learning obtain significantly better results. Here, we observe that both algorithms train during their runtime and obtain parameter configurations adjusted to the instance.
Following up with a robust review of the results, we employed the two statistical tests mentioned in Section 5: (a) a normality assessment and (b) a contrast of hypotheses to determine whether or not the samples come from an equidistributed sequence. To determine whether the observations (runs per instance) follow a Gaussian distribution, we establish H_0 as: the samples follow a normal distribution. H_1 states the opposite.
The p–value cutoff is 0.05; results below this threshold are considered significant (H_0 is rejected). The results confirmed that the samples do not follow a normal distribution, so we employ the non-parametric Mann–Whitney–Wilcoxon test. Here, the null hypothesis H_0 states that the native method generates better values than its versions improved by Q–Learning, while H_1 suggests otherwise. In total, six tests were carried out, and the results are presented in Table 7 and Table 8. In the comparison between native PSO and classic Q–Learning, we can note that the former exceeds the 95% reliability threshold in 59 of the 70 instances, while the latter only does so in one instance.
On the other hand, in the comparison between native PSO and the modified Q–Learning, it is possible to observe that MQL surpasses the threshold in 44 instances, while NPSO does so in only two instances. Regarding the comparison between NPSO and SSQL, we observe that the latter outperforms the threshold in 53 instances, while NPSO does not exceed the threshold in any instance. Finally, in the comparison between MQL and the single state version, SSQL beats the threshold in 10 instances, while the modified version of Q–Learning only does so in six instances. Furthermore, in the remaining comparisons, MQL and SSQL exceed the 95% confidence threshold in all instances, while classic Q–Learning and NPSO do not exceed the threshold in any instance.
From all the obtained results, we can conclude that the modified Q–Learning and SSQL perform better than the classic version and the standard PSO.

7. Conclusions

This article presents an approach to improve the efficiency of a swarm intelligence algorithm when solving complex optimization problems by integrating reinforcement learning techniques. Specifically, we use Q–Learning to adjust the parameters of particle swarm optimization when solving several instances of the multidimensional knapsack problem.
The analysis of the data obtained in the testing phase shows that the algorithms assisted by reinforcement learning obtained better results in multiple aspects when compared to the native version of PSO. In particular, the single state Q–Learning assisting PSO finds solutions that, as a whole, have better quality in terms of mean, median, standard deviation, and interquartile range. In addition, it is observed that SSQL achieves earlier convergence on significant instances when compared to the other methods. Notwithstanding the preceding, it is observed that the native PSO has a slightly better general performance than PSO improved by classic Q–Learning. This is attributed to the high time cost required to train all the action/state pairs: Q–Learning cannot guarantee that the algorithm finds a better-than-initial parametric configuration by the end of each run. It is suggested to explore the performance of Q–Learning with PSO on other optimization problems beyond the multidimensional knapsack problem. More generally, the effect of different parameter settings for the Q–Learning algorithm, such as the learning rate and the discount factor, should also be explored to evaluate their effectiveness in the reinforcement learning method in conjunction with PSO.
Finally, it is suggested to explore the comparisons of using the Q–learning method with other bio-inspired algorithms, such as the gray wolf optimizer, whale optimization, bald eagle search optimization, and Harris hawks optimization, among others.
In conclusion, reinforcement learning is a promising approach for improving the performance of swarm intelligence algorithms when solving optimization problems. More research is needed to explore its effectiveness on other complex optimization problems and to compare the results obtained with other existing methods.

Author Contributions

Formal analysis, R.O., R.S. and B.C.; investigation, R.O., R.S., B.C. and D.N.; methodology, R.O., R.S. and B.C.; resources, R.S. and B.C.; software, R.O. and D.N.; validation, R.S., B.C., V.R., P.O., C.R. and S.M.; writing—original draft, R.O., V.R., P.O., C.R., S.M. and D.N.; writing—review and editing, R.O., R.S., B.C., V.R., P.O., C.R., S.M. and D.N. All authors have read and agreed to the published version of the manuscript.

Funding

Rodrigo Olivares is supported by grant ANID/FONDECYT/INICIACIÓN/11231016. Broderick Crawford is supported by Grant ANID/FONDECYT/REGULAR/1210810.

Data Availability Statement

Acknowledgments

Víctor Ríos and Pablo Olivares received scholarships REXE 2286/2022 and REXE 4054/2022, respectively. Both were from Doctorado en Ingeniería Informática Aplicada, Universidad de Valparaíso.

Conflicts of Interest

The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Du, K.L.; Swamy, M.; Du, K.L.; Swamy, M. Particle swarm optimization. In Search and Optimization by Metaheuristics: Techniques and Algorithms Inspired by Nature; Springer: Berlin/Heidelberg, Germany, 2016; pp. 153–173. [Google Scholar]
  2. Talbi, E.G. Metaheuristics: From Design to Implementation; John Wiley & Sons: Hoboken, NJ, USA, 2009. [Google Scholar]
  3. Boussaïd, I.; Lepagnot, J.; Siarry, P. A survey on optimization metaheuristics. Inf. Sci. 2013, 237, 82–117. [Google Scholar] [CrossRef]
  4. Panigrahi, B.K.; Shi, Y.; Lim, M.H. Handbook of Swarm Intelligence: Concepts, Principles and Applications; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2011; Volume 8. [Google Scholar]
  5. Shami, T.M.; El-Saleh, A.A.; Alswaitti, M.; Al-Tashi, Q.; Summakieh, M.A.; Mirjalili, S. Particle Swarm Optimization: A Comprehensive Survey. IEEE Access 2022, 10, 10031–10061. [Google Scholar] [CrossRef]
  6. Bansal, J.C. Particle swarm optimization. In Evolutionary and Swarm Intelligence Algorithms; Springer: Berlin/Heidelberg, Germany, 2019; pp. 11–23. [Google Scholar]
  7. Wolpert, D.H.; Macready, W.G. No free lunch theorems for optimization. IEEE Trans. Evol. Comput. 1997, 1, 67–82. [Google Scholar] [CrossRef]
  8. Hoos, H.H. Automated algorithm configuration and parameter tuning. In Autonomous Search; Springer: Berlin/Heidelberg, Germany, 2012; pp. 37–71. [Google Scholar]
  9. Huang, C.; Li, Y.; Yao, X. A Survey of Automatic Parameter Tuning Methods for Metaheuristics. IEEE Trans. Evol. Comput. 2019, 24, 201–216. [Google Scholar] [CrossRef]
  10. Calvet, L.; Armas, J.D.; Masip, D.; Juan, A.A. Learnheuristics: Hybridizing metaheuristics with machine learning for optimization with dynamic inputs. Open Math. 2017, 15, 261–280. [Google Scholar] [CrossRef]
  11. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  12. Skackauskas, J.; Kalganova, T. Dynamic Multidimensional Knapsack Problem benchmark datasets. Syst. Soft Comput. 2022, 4, 200041. [Google Scholar] [CrossRef]
  13. Liu, J.; Wu, C.; Cao, J.; Wang, X.; Teo, K.L. A binary differential search algorithm for the 0–1 multidimensional knapsack problem. Appl. Math. Model. 2016, 40, 9788–9805. [Google Scholar] [CrossRef]
  14. Cacchiani, V.; Iori, M.; Locatelli, A.; Martello, S. Knapsack problems-An overview of recent advances. Part II: Multiple, multidimensional, and quadratic knapsack problems. Comput. Oper. Res. 2022, 143, 105693. [Google Scholar] [CrossRef]
  15. Rezoug, A.; Bader-El-Den, M.; Boughaci, D. Application of supervised machine learning methods on the multidimensional knapsack problem. Neural Process. Lett. 2022, 54, 871–890. [Google Scholar] [CrossRef]
  16. Beasley, J.E. OR-Library: Distributing test problems by electronic mail. J. Oper. Res. Soc. 1990, 41, 1069–1072. [Google Scholar] [CrossRef]
  17. Liang, Y.C.; Cuevas Juarez, J.R. A self-adaptive virus optimization algorithm for continuous optimization problems. Soft Comput. 2020, 24, 13147–13166. [Google Scholar] [CrossRef]
  18. Olamaei, J.; Moradi, M.; Kaboodi, T. A new adaptive modified firefly algorithm to solve optimal capacitor placement problem. In Proceedings of the 18th Electric Power Distribution Conference, Kermanshah, Iran, 30 April–1 May 2013; pp. 1–6. [Google Scholar]
  19. Li, X.; Yin, M. Modified cuckoo search algorithm with self adaptive parameter method. Inf. Sci. 2015, 298, 80–97. [Google Scholar] [CrossRef]
  20. Li, X.; Yin, M. Self-adaptive constrained artificial bee colony for constrained numerical optimization. Neural Comput. Appl. 2014, 24, 723–734. [Google Scholar] [CrossRef]
  21. Cui, L.; Li, G.; Zhu, Z.; Wen, Z.; Lu, N.; Lu, J. A novel differential evolution algorithm with a self-adaptation parameter control method by differential evolution. Soft Comput. 2018, 22, 6171–6190. [Google Scholar] [CrossRef]
  22. de Barros, J.B.; Sampaio, R.C.; Llanos, C.H. An adaptive discrete particle swarm optimization for mapping real-time applications onto network-on-a-chip based MPSoCs. In Proceedings of the 32nd Symposium on Integrated Circuits and Systems Design, Sao Paulo, Brazil, 26–30 August 2019; pp. 1–6. [Google Scholar]
  23. Cruz-Salinas, A.F.; Perdomo, J.G. Self-adaptation of genetic operators through genetic programming techniques. In Proceedings of the Genetic and Evolutionary Computation Conference, Berlin, Germany, 15–19 July 2017; pp. 913–920. [Google Scholar]
  24. Kavoosi, M.; Dulebenets, M.A.; Abioye, O.F.; Pasha, J.; Wang, H.; Chi, H. An augmented self-adaptive parameter control in evolutionary computation: A case study for the berth scheduling problem. Adv. Eng. Inform. 2019, 42, 100972. [Google Scholar] [CrossRef]
  25. Nasser, A.B.; Zamli, K.Z. Parameter free flower algorithm based strategy for pairwise testing. In Proceedings of the 2018 7th international conference on software and computer applications, Kuantan Malaysia, 8–10 February 2018; pp. 46–50. [Google Scholar]
  26. Zhang, L.; Chen, H.; Wang, W.; Liu, S. Improved Wolf Pack Algorithm for Solving Traveling Salesman Problem. In FSDM; IOS Press: Amsterdam, The Netherlands, 2018; pp. 131–140. [Google Scholar]
  27. Soto, R.; Crawford, B.; Olivares, R.; Carrasco, C.; Rodriguez-Tello, E.; Castro, C.; Paredes, F.; de la Fuente-Mella, H. A reactive population approach on the dolphin echolocation algorithm for solving cell manufacturing systems. Mathematics 2020, 8, 1389. [Google Scholar] [CrossRef]
  28. Karimi-Mamaghan, M.; Mohammadi, M.; Meyer, P.; Karimi-Mamaghan, A.M.; Talbi, E.G. Machine learning at the service of meta-heuristics for solving combinatorial optimization problems: A state-of-the-art. Eur. J. Oper. Res. 2022, 296, 393–422. [Google Scholar] [CrossRef]
  29. Gómez-Rubio, Á.; Soto, R.; Crawford, B.; Jaramillo, A.; Mancilla, D.; Castro, C.; Olivares, R. Applying Parallel and Distributed Models on Bio–Inspired Algorithms via a Clustering Method. Mathematics 2022, 10, 274. [Google Scholar] [CrossRef]
  30. Caselli, N.; Soto, R.; Crawford, B.; Valdivia, S.; Olivares, R. A self–adaptive cuckoo search algorithm using a machine learning technique. Mathematics 2021, 9, 1840. [Google Scholar] [CrossRef]
  31. Soto, R.; Crawford, B.; Molina, F.G.; Olivares, R. Human behaviour based optimization supported with self–organizing maps for solving the S–box design Problem. IEEE Access 2021, 2021, 1–14. [Google Scholar] [CrossRef]
  32. Valdivia, S.; Soto, R.; Crawford, B.; Caselli, N.; Paredes, F.; Castro, C.; Olivares, R. Clustering–based binarization methods applied to the crow search algorithm for 0/1 combinatorial problems. Mathematics 2020, 8, 1070. [Google Scholar] [CrossRef]
  33. Maturana, J.; Lardeux, F.; Saubion, F. Autonomous operator management for evolutionary algorithms. J. Heuristics 2010, 16, 881–909. [Google Scholar] [CrossRef]
  34. dos Santos, J.P.Q.; de Melo, J.D.; Neto, A.D.D.; Aloise, D. Reactive search strategies using reinforcement learning, local search algorithms and variable neighborhood search. Expert Syst. Appl. 2014, 41, 4939–4949. [Google Scholar] [CrossRef]
  35. Zennaki, M.; Ech-Cherif, A. A new machine learning based approach for tuning metaheuristics for the solution of hard combinatorial optimization problems. J. Appl. Sci. 2010, 10, 1991–2000. [Google Scholar] [CrossRef]
  36. Lessmann, S.; Caserta, M.; Arango, I.M. Tuning metaheuristics: A data mining based approach for particle swarm optimization. Expert Syst. Appl. 2011, 38, 12826–12838. [Google Scholar] [CrossRef]
  37. Liang, X.; Li, W.; Zhang, Y.; Zhou, M. An adaptive particle swarm optimization method based on clustering. Soft Comput. 2015, 19, 431–448. [Google Scholar] [CrossRef]
  38. Harrison, K.R.; Ombuki-Berman, B.M.; Engelbrecht, A.P. A parameter-free particle swarm optimization algorithm using performance classifiers. Inf. Sci. 2019, 503, 381–400. [Google Scholar] [CrossRef]
  39. Dong, W.; Zhou, M. A supervised learning and control method to improve particle swarm optimization algorithms. IEEE Trans. Syst. Man Cybern. Syst. 2016, 47, 1135–1148. [Google Scholar] [CrossRef]
  40. Kurek, M.; Luk, W. Parametric reconfigurable designs with machine learning optimizer. In Proceedings of the 2012 International Conference on Field-Programmable Technology, Seoul, Republic of Korea, 10–12 December 2012; pp. 109–112. [Google Scholar]
  41. Al-Duoli, F.; Rabadi, G. Data mining based hybridization of meta-RaPS. Procedia Comput. Sci. 2014, 36, 301–307. [Google Scholar] [CrossRef]
  42. Wang, G.; Chu, H.E.; Zhang, Y.; Chen, H.; Hu, W.; Li, Y.; Peng, X. Multiple parameter control for ant colony optimization applied to feature selection problem. Neural Comput. Appl. 2015, 26, 1693–1708. [Google Scholar] [CrossRef]
  43. Seyyedabbasi, A.; Aliyev, R.; Kiani, F.; Gulle, M.U.; Basyildiz, H.; Shah, M.A. Hybrid algorithms based on combining reinforcement learning and metaheuristic methods to solve global optimization problems. Knowl.-Based Syst. 2021, 223, 107044. [Google Scholar] [CrossRef]
  44. Sadeg, S.; Hamdad, L.; Remache, A.R.; Karech, M.N.; Benatchba, K.; Habbas, Z. Qbso-fs: A reinforcement learning based bee swarm optimization metaheuristic for feature selection. In Proceedings of the Advances in Computational Intelligence: 15th International Work-Conference on Artificial Neural Networks, IWANN 2019, Gran Canaria, Spain, 12–14 June 2019; Proceedings, Part II 15. Springer: Berlin/Heidelberg, Germany, 2019; pp. 785–796. [Google Scholar]
  45. Sagban, R.; Ku-Mahamud, K.R.; Bakar, M.S.A. Nature-inspired parameter controllers for ACO-based reactive search. Res. J. Appl. Sci. Eng. Technol. 2015, 11, 109–117. [Google Scholar] [CrossRef]
  46. Nijimbere, D.; Zhao, S.; Gu, X.; Esangbedo, M.O.; Dominique, N. Tabu search guided by reinforcement learning for the max-mean dispersion problem. J. Ind. Manag. Optim. 2020, 17, 3223–3246. [Google Scholar] [CrossRef]
  47. Reyes-Rubiano, L.; Juan, A.; Bayliss, C.; Panadero, J.; Faulin, J.; Copado, P. A biased-randomized learnheuristic for solving the team orienteering problem with dynamic rewards. Transp. Res. Procedia 2020, 47, 680–687. [Google Scholar] [CrossRef]
  48. Kusy, M.; Zajdel, R. Stateless Q-learning algorithm for training of radial basis function based neural networks in medical data classification. In Intelligent Systems in Technical and Medical Diagnostics; Springer: Berlin/Heidelberg, Germany, 2014; pp. 267–278. [Google Scholar]
  49. Eiben, Á.E.; Hinterding, R.; Michalewicz, Z. Parameter control in evolutionary algorithms. IEEE Trans. Evol. Comput. 1999, 3, 124–141. [Google Scholar] [CrossRef]
  50. Rastegar, R. On the optimal convergence probability of univariate estimation of distribution algorithms. Evol. Comput. 2011, 19, 225–248. [Google Scholar] [CrossRef] [PubMed]
  51. Skakov, E.S.; Malysh, V.N. Parameter meta-optimization of metaheuristics of solving specific NP-hard facility location problem. J. Phys. Conf. Ser. 2018, 973, 012063. [Google Scholar] [CrossRef]
  52. López-Ibáñez, M.; Dubois-Lacoste, J.; Cáceres, L.P.; Birattari, M.; Stützle, T. The irace package: Iterated racing for automatic algorithm configuration. Oper. Res. Perspect. 2016, 3, 43–58. [Google Scholar] [CrossRef]
  53. Soto, R.; Crawford, B.; Olivares, R.; Galleguillos, C.; Castro, C.; Johnson, F.; Paredes, F.; Norero, E. Using autonomous search for solving constraint satisfaction problems via new modern approaches. Swarm Evol. Comput. 2016, 30, 64–77. [Google Scholar] [CrossRef]
  54. Soto, R.; Crawford, B.; Olivares, R.; Niklander, S.; Johnson, F.; Paredes, F.; Olguín, E. Online control of enumeration strategies via bat algorithm and black hole optimization. Nat. Comput. 2017, 16, 241–257. [Google Scholar] [CrossRef]
  55. Kaelbling, L.P.; Littman, M.L.; Moore, A.W. Reinforcement learning: A survey. J. Artif. Intell. Res. 1996, 4, 237–285. [Google Scholar] [CrossRef]
  56. Huotari, T.; Savolainen, J.; Collan, M. Deep Reinforcement Learning Agent for S&P 500 Stock Selection. Axioms 2020, 9, 130. [Google Scholar] [CrossRef]
  57. Van Otterlo, M.; Wiering, M. Reinforcement learning and markov decision processes. In Reinforcement Learning: State-of-the-Art; Springer: Berlin/Heidelberg, Germany, 2012; pp. 3–42. [Google Scholar]
  58. Imran, M.; Khushnood, R.A.; Fawad, M. A hybrid data-driven and metaheuristic optimization approach for the compressive strength prediction of high-performance concrete. Case Stud. Constr. Mater. 2023, 18, e01890. [Google Scholar] [CrossRef]
  59. Watkins, C.J.C.H.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292. [Google Scholar] [CrossRef]
  60. Zhang, L.; Tang, L.; Zhang, S.; Wang, Z.; Shen, X.; Zhang, Z. A Self-Adaptive Reinforcement-Exploration Q-Learning Algorithm. Symmetry 2021, 13, 1057. [Google Scholar] [CrossRef]
  61. Melo, F.S.; Ribeiro, M.I. Convergence of Q-learning with linear function approximation. In Proceedings of the 2007 European Control Conference (ECC), Kos, Greece, 2–5 July 2007; pp. 2671–2678. [Google Scholar]
  62. Claus, C.; Boutilier, C. The dynamics of reinforcement learning in cooperative multiagent systems. AAAI/IAAI 1998, 1998, 2. [Google Scholar]
  63. McGlohon, M.; Sen, S. Learning to cooperate in multi-agent systems by combining Q-learning and evolutionary strategy. Int. J. Lateral Comput. 2005, 1, 58–64. [Google Scholar]
  64. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; Volume 4, pp. 1942–1948. [Google Scholar]
  65. Piotrowski, A.P.; Napiorkowski, J.J.; Piotrowska, A.E. Population size in Particle Swarm Optimization. Swarm Evol. Comput. 2020, 58, 100718. [Google Scholar] [CrossRef]
  66. Dammeyer, F.; Voß, S. Dynamic tabu list management using the reverse elimination method. Ann. Oper. Res. 1993, 41, 29–46. [Google Scholar] [CrossRef]
  67. Drexl, A. A simulated annealing approach to the multiconstraint zero-one knapsack problem. Computing 1988, 40, 1–8. [Google Scholar] [CrossRef]
  68. Khuri, S.; Bäck, T.; Heitkötter, J. The zero/one multiple knapsack problem and genetic algorithms. In Proceedings of the 1994 ACM Symposium on Applied Computing, New York, NY, USA, 6 April 1994; pp. 188–193. [Google Scholar]
  69. Crawford, B.; Soto, R.; Astorga, G.; García, J.; Castro, C.; Paredes, F. Putting Continuous Metaheuristics to Work in Binary Search Spaces. Complexity 2017, 2017, 8404231. [Google Scholar] [CrossRef]
  70. Eberhart, R.C.; Shi, Y. Comparison between genetic algorithms and particle swarm optimization. In Proceedings of the Evolutionary Programming VII: 7th International Conference, EP98, San Diego, CA, USA, 25–27 March 1998; Proceedings 7. Springer: Berlin/Heidelberg, Germany, 1998; pp. 611–616. [Google Scholar]
  71. Universidad de Valparaíso. Implementations. 2021. Available online: https://figshare.com/articles/dataset/PSOQLAV_Parameter_Test/14999874 (accessed on 27 June 2023).
  72. Universidad de Valparaíso. Test Instances. 2021. Available online: https://figshare.com/articles/dataset/Test_Instances/14999907 (accessed on 27 June 2023).
  73. Universidad de Valparaíso. Data and Results. 2021. Available online: https://figshare.com/articles/dataset/PSOQL_Test_Data/14995374 (accessed on 27 June 2023).
Figure 1. Scheme of the proposal integrating a learning-based approach to the swarm intelligence method.
Figure 2. Schema of the experimental phase applied in this work.
Figure 3. Overall performance comparison for instances MKP06, MKP35, and MKP70 between NPSO and its improvements with classic, modified, and single state Q–Learning: convergence of the strategies.
Figure 4. Overall performance comparison for instances MKP06, MKP35, and MKP70 between NPSO and its improvements with classic, modified, and single state Q–Learning: distributions of the strategies.
Table 1. Instances of the multidimensional knapsack problem.
Instance | Name | Best | Knap. | Obj. | Instance | Name | Best | Knap. | Obj.
MKP01 | - | 3800 | 10 | 6 | MKP36 | WEISH19 [66,67] | 7698 | 5 | 70
MKP02 | - | 8706.1 | 10 | 10 | MKP37 | WEISH20 [66,67] | 9450 | 5 | 70
MKP03 | - | 4015 | 10 | 15 | MKP38 | WEISH21 [66,67] | 9074 | 5 | 70
MKP04 | - | 6120 | 10 | 20 | MKP39 | WEISH22 [66,67] | 8947 | 5 | 80
MKP05 | - | 12,400 | 10 | 28 | MKP40 | WEISH23 [66,67] | 8344 | 5 | 80
MKP06 | - | 10,618 | 5 | 39 | MKP41 | WEISH24 [66,67] | 10,220 | 5 | 80
MKP07 | - | 16,537 | 5 | 50 | MKP42 | WEISH25 [66,67] | 9939 | 5 | 80
MKP08 | SENTO1 [66,67,68] | 7772 | 30 | 60 | MKP43 | WEISH26 [66,67] | 9584 | 5 | 90
MKP09 | SENTO2 [66,67,68] | 8722 | 30 | 60 | MKP44 | WEISH27 [66,67] | 9819 | 5 | 90
MKP10 | WEING1 [66,67,68] | 141,278 | 2 | 28 | MKP45 | WEISH28 [66,67] | 9492 | 5 | 90
MKP11 | WEING2 [66,67,68] | 130,883 | 2 | 28 | MKP46 | WEISH29 [66,67] | 9410 | 5 | 90
MKP12 | WEING3 [66,67,68] | 95,677 | 2 | 28 | MKP47 | WEISH30 [66,67] | 11,191 | 5 | 90
MKP13 | WEING4 [66,67,68] | 119,337 | 2 | 28 | MKP48 | PB1 [66,67] | 3090 | 4 | 27
MKP14 | WEING5 [66,67,68] | 98,796 | 2 | 28 | MKP49 | PB2 [66,67] | 3186 | 4 | 34
MKP15 | WEING6 [66,67,68] | 130,623 | 2 | 28 | MKP50 | PB4 [66,67] | 95,168 | 2 | 29
MKP16 | WEING7 [66,67,68] | 1,095,445 | 2 | 105 | MKP51 | PB5 [66,67] | 2139 | 10 | 20
MKP17 | WEING8 [66,67,68] | 624,319 | 2 | 105 | MKP52 | PB6 [66,67] | 776 | 30 | 40
MKP18 | WEISH01 [66,67] | 4554 | 5 | 30 | MKP53 | PB7 [66,67] | 1035 | 30 | 37
MKP19 | WEISH02 [66,67] | 4536 | 5 | 30 | MKP54 | HP1 [66,67] | 3418 | 4 | 28
MKP20 | WEISH03 [66,67] | 4115 | 5 | 30 | MKP55 | HP2 [66,67] | 3186 | 4 | 35
MKP21 | WEISH04 [66,67] | 4561 | 5 | 30 | MKP56 | - | unknown | 5 | 100
MKP22 | WEISH05 [66,67] | 4514 | 5 | 30 | MKP57 | - | unknown | 5 | 100
MKP23 | WEISH06 [66,67] | 5557 | 5 | 40 | MKP58 | - | unknown | 5 | 100
MKP24 | WEISH07 [66,67] | 5567 | 5 | 40 | MKP59 | - | unknown | 5 | 100
MKP25 | WEISH08 [66,67] | 5605 | 5 | 40 | MKP60 | - | unknown | 5 | 100
MKP26 | WEISH09 [66,67] | 5246 | 5 | 40 | MKP61 | - | unknown | 5 | 100
MKP27 | WEISH10 [66,67] | 6339 | 5 | 50 | MKP62 | - | unknown | 5 | 100
MKP28 | WEISH11 [66,67] | 5643 | 5 | 50 | MKP63 | - | unknown | 5 | 100
MKP29 | WEISH12 [66,67] | 6339 | 5 | 50 | MKP64 | - | unknown | 5 | 100
MKP30 | WEISH13 [66,67] | 6159 | 5 | 50 | MKP65 | - | unknown | 5 | 100
MKP31 | WEISH14 [66,67] | 6954 | 5 | 60 | MKP66 | - | unknown | 5 | 100
MKP32 | WEISH15 [66,67] | 7486 | 5 | 60 | MKP67 | - | unknown | 5 | 100
MKP33 | WEISH16 [66,67] | 7289 | 5 | 60 | MKP68 | - | unknown | 5 | 100
MKP34 | WEISH17 [66,67] | 8633 | 5 | 60 | MKP69 | - | unknown | 5 | 100
MKP35 | WEISH18 [66,67] | 9580 | 5 | 70 | MKP70 | - | unknown | 5 | 100
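For reference, the instances in Table 1 follow the standard 0–1 multidimensional knapsack formulation. The notation below is ours: c_j is the profit of object j, a_{ij} its consumption of resource i, and b_i the capacity of knapsack i.
\[
\max \sum_{j=1}^{n} c_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{n} a_{ij} x_j \le b_i, \; i = 1, \dots, m, \qquad x_j \in \{0,1\}, \; j = 1, \dots, n,
\]
where n corresponds to the Obj. column and m to the Knap. column of Table 1.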
Table 2. Summary of the best values achieved.
Table 2. Summary of the best values achieved.
Approach | Best Value Reached (Known) | Best Value Reached (Unknown) | Best Value Reached (Overall)
Native PSO (NPSO) | 25 (45.45%) | 0 (0%) | 25 (35.71%)
Classic Q–Learning (CQL) | 16 (29.09%) | 0 (0%) | 16 (22.85%)
Modified Q–Learning (MQL) | 46 (83.63%) | 6 (40%) | 53 (75.71%)
Single state Q–Learning (SSQL) | 46 (83.63%) | 11 (73.33%) | 57 (81.42%)
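As a reading aid, the percentages in Table 2 are the hit counts divided by the size of each group, using the split visible in Table 1 (55 instances with a known optimum, 15 with an unknown one, 70 overall); this is our arithmetic for the SSQL row, not a formula stated by the authors:
\[
\frac{46}{55} \approx 83.63\%, \qquad \frac{11}{15} \approx 73.33\%, \qquad \frac{46+11}{70} = \frac{57}{70} \approx 81.42\%.
\]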
Table 3. Experimental results for the best, minimum, median, and maximum values obtained from Instances MKP01–MKP35.
ID | Best–Values (NPSO, CQL, MQL, SSQL) | Minimum–Value (NPSO, CQL, MQL, SSQL) | Median–Value (NPSO, CQL, MQL, SSQL) | Maximum–Value (NPSO, CQL, MQL, SSQL)
MKP0130303030380038003800380038003800380038003800380038003800
MKP02303030308706.18706.18706.18706.18706.18706.18706.18706.18706.18706.18706.18706.1
MKP0330303030401540154015401540154015401540154015401540154015
MKP0430303030612061206120612061206120612061206120612061206120
MKP053030303012,40012,40012,40012,40012,40012,40012,40012,40012,40012,40012,40012,400
MKP06001110,52610,47210,48110,61810,58410,52310,58510,58510,58810,58510,61810,618
MKP07000016,34016,22316,37416,51816,40216,31616,44216,44116,47116,44516,49416,518
MKP08111518769276607707777277197692776577727772777277727772
MKP090011866186258679872286768651.5870287028709870387228722
MKP102543030141,258141,148141,278141,278141,278141,168141,278141,278141,278141,278141,278141,278
MKP111343030130,773130,103130,883130,883130,773130,748130,883130,883130,883130,883130,883130,883
MKP1272302195,00795,00795,67795,67795,51795,00795,67795,67795,67795,67795,67795,677
MKP1330303030119,337119,337119,337119,337119,337119,337119,337119,337119,337119,337119,337119,337
MKP140016798,39698,39698,50698,79698,41698,39698,79698,50698,63198,49598,79698,796
MKP152461923130,163130,103130,233130,623130,623130,123130,623130,623130,623130,623130,623130,623
MKP1600001,054,3041,053,0331,090,9451,095,2321,062,6461,066,3721,093,8421,093,9361,073,2251,089,6911,095,2061,095,232
MKP170000617,055612,176618,360623,952618,875615,860621,239623,092621,086621,086623,952623,952
MKP1830273030455445494554455445544554455445544554455445544554
MKP1930302930453645364531453645364536453645364536453645364536
MKP2040107410640564106411541064106410641064115410641154115
MKP2130303030456145614561456145614561456145614561456145614561
MKP2230303030451445144514451445144514451445144514451445144514
MKP233015135499543255235557551754805550.555425557551555575557
MKP24001314550254345517556755265479.5554955465550555055675567
MKP250012754995427555056055556.55507.55597.555925592554256055605
MKP26002729519451335192524651945160524652465210519452465246
MKP27002424624762166260633963016247633963396323630163396339
MKP28002428555555075562564355555555564356435605556256435643
MKP29102527624761756304633963016216633963396339630163396339
MKP30102828602859746083615960725991615961596159612261596159
MKP310020683567066870692368506807690268856850685069546923
MKP321051073917283742174867412.57376742474427486740874867486
MKP3300017130706072107289718071177252.572467187718572887289
MKP34007184668366856786338499.58445.58610.58616.58591857586338633
MKP35000192399150939995809287.59212948595159349938595739580
Table 4. Experimental results for the best, minimum, median, and maximum values obtained from Instances MKP36–MKP70.
ID | Best–Values (NPSO, CQL, MQL, SSQL) | Minimum–Value (NPSO, CQL, MQL, SSQL) | Median–Value (NPSO, CQL, MQL, SSQL) | Maximum–Value (NPSO, CQL, MQL, SSQL)
MKP3600217403712573727698747572837557.575557586746576987698
MKP3700126929391799383945093449283944594459410940594509450
MKP38002014889487868966907489598883907490489048893690749074
MKP390024857083388722894786618519.58827.588338777871389478947
MKP4000007902763279728341800678398124.581528137810583418341
MKP4100119761964910,06310,22098389747.510,13010,1389936996110,22010,220
MKP42005196379543984399399730962398949890.59825974399399939
MKP430000914889239253952292249054.5946494649354924495279522
MKP440000926891099568976493999248966696599562950997649764
MKP4500108998873191109433909088989276.59286.59272912594929433
MKP4600008898867090809310900687919208.592099178901693109310
MKP47001210,80110,67111,05711,19110,90510,77811,14311,14611,04010,98211,19111,191
MKP481013304230323045309030763056306030763090307730903090
MKP490024311130633089318631503099.53141.53167.53169317331863186
MKP5020242491,93591,93591,93595,16892,73893,65095,16895,16895,16894,80195,16895,168
MKP5130252930213920962122213921392139213921392139213921392139
MKP521702022732702723776776723776776776762776776
MKP530077101210041009103510291017102310331033103310351035
MKP541043338033473339341834043383339634043418340434183418
MKP550031309430373082318631313086.531403147.53159314431863186
MKP56003123,63923,48423,79324,21123,83723,64024,01523,97724,10923,88124,21124,211
MKP57001124,08923,90623,99524,27424,15324,05424,12424,18224,25824,18224,27424,274
MKP58000123,34423,19623,33223,55123,49423,35023,49423,49423,49423,49423,49423,551
MKP59000022,82522,67822,75723,22723,00722,84223,01423,06123,14023,09223,15323,227
MKP60000123,72223,54723,65923,94723,81723,69523,81723,84323,93923,93923,93923,947
MKP61000124,28924,18324,30424,60124,41324,27124,41124,41724,47724,47424,55524,601
MKP62001125,10024,90425,09125,52125,28825,09925,27425,32425,42025,49025,52125,521
MKP63000123,01022,81423,03023,32023,15923,00123,17723,23023,28523,27923,30523,320
MKP64001023,73023,51223,68724,10923,89823,73924,04124,04524,09123,99124,13524,109
MKP65002224,13824,05724,08824,34224,20624,14024,22524,22024,28724,26424,34224,342
MKP66001141,76741,58442,13042,65642,06741,87442,40442,39142,24342,20442,65642,656
MKP67000041,56441,29341,87442,38841,78841,60942,21142,21842,02642,00242,36442,388
MKP68000140,92740,68041,26041,93441,12440,96841,59141,60841,59541,35841,84441,934
MKP69001044,04243,68244,36444,90544,21244,01944,73544,78844,63144,50045,04744,905
MKP70000141,48741,27541,73142,19241,58641,55441,98341,99241,88941,86042,12342,192
Table 5. Experimental results for the average, standard deviation, interquartile range, and RPD values obtained from Instances MKP01–MKP35.
ID | Average Values (NPSO, CQL, MQL, SSQL) | Standard Deviation Values (NPSO, CQL, MQL, SSQL) | Interquartile Range Values (NPSO, CQL, MQL, SSQL) | RPD Values (NPSO, CQL, MQL, SSQL)
MKP013800380038003800000038003800380038000.000.000.000.00
MKP028706.18706.18706.18706.100008706.18706.18706.18706.10.000.000.000.00
MKP034015401540154015000040154015401540150.000.000.000.00
MKP046120612061206120000061206120612061200.000.000.000.00
MKP0512,40012,40012,40012,400000012,40012,40012,40012,4000.000.000.000.00
MKP0610,57410,52910,57910,58116.64262316.4310,58510,54210,58810,5880.000.000.000.00
MKP0716,40416,31416,44216,45026.6652.813834.2116,41716,35116,47416,4710.000.000.000.00
MKP087726.277017754.577757.41923.9522.842177397709777277720.000.000.000.00
MKP098678.5386568701870011.3618.2111118679.758664.758708.2587040.000.000.000.00
MKP10141,275141,188141,278141,2787.5852.7500141,278141,255141,278141,2780.000.000.000.00
MKP11130,821130,627130,883130,88355.44237.5600130,883130,773130,883130,8830.000.000.000.00
MKP1295,41495,09995,67795,662263.27213.7802495,62495,00795,67795,6770.000.000.000.00
MKP13119,337119,337119,337119,3370000119,337119,337119,337119,3370.000.000.000.00
MKP1498,44698,40198,70298,60369.7318.53109.14119.8898,50398,39698,79698,6310.000.000.000.00
MKP15130,541130,241130,480130,532166.51199.59191.15167.77130,623130,233130,623130,6230.000.000.000.00
MKP161,062,9501,066,1401,093,8721,093,90647717221.58897.41889.891,065,8271,069,5651,094,5611,094,6860.000.000.000.00
MKP17618,826616,378621,910622,3581325.812723.541724.741553.86619,863618,456623,749623,9470.000.000.000.00
MKP1845544553.54554455401.530045544554455445540.000.000.000.00
MKP19453645364535.834536000.91045364536453645360.000.000.000.00
MKP204107.24100.9341094108.13.1113.874.323.8741064106411541060.000.000.000.00
MKP214561456145614561000045614561456145610.000.000.000.00
MKP224514451445144514000045144514451445140.000.000.000.00
MKP235523.454785548.45540.2717.2121.99.4718.2355345493.75555755570.000.000.000.00
MKP245525.635484.235552.475553.7714.642913.9412.685537.255494556755670.000.000.000.00
MKP255556.835500.275594.635593.173127.7114.8415.265590.55517560556030.000.000.000.00
MKP265194.535166.735241.45244.872.9224.1214.246.2151945194524652460.000.000.000.00
MKP276296.336255.536331.536334.7317.22519.119.6363016263.75633963390.000.000.000.00
MKP285559.45547.2356365639.679.6918.3417.851555625555564356430.000.000.000.00
MKP296300.56242.136334.16335.7312.254211.21063016301633963390.000.000.000.00
MKP306077.46018.576155.236155.232448.3415.2215.2260726064.5615961590.000.000.000.00
MKP3168446793.236901.876892.87.4746.36179.8868506814690269020.000.000.000.00
MKP327413.27366.87439.237450.317.4633.7424.2127.8574137385744974860.000.000.000.00
MKP337177.27123.37255.13724512.3930.522319.6671807126.757272.57250.50.000.000.000.00
MKP348501.68452.638609.738613.2328.5952.6418.712.348512.258489.58621.7586210.000.000.000.00
MKP359293.89228.379489.679519.5328.6959.7648.6633.1693179262.759523.595500.000.000.000.00
Table 6. Experimental results for the average, standard deviation, interquartile range, and RPD values obtained from Instances MKP36–MKP70.
ID | Average Values (NPSO, CQL, MQL, SSQL) | Standard Deviation Values (NPSO, CQL, MQL, SSQL) | Interquartile Range Values (NPSO, CQL, MQL, SSQL) | RPD Values (NPSO, CQL, MQL, SSQL)
MKP367484.737284.537553.477551.8750.7286.1588.4273.187524.257330.57591.757598.50.000.000.000.00
MKP379351.59284.279441.679438.125.963.121516.249360.59341.75945094450.000.000.000.00
MKP388963.28869.5790569057.5347.8242.2231.272189728901.5907490740.000.000.000.00
MKP398665.98503.78842884054.190.4353.8665.188699.58563.758879.758873.750.000.000.000.00
MKP408014.678378138.278160.675910188.8292.88055.75787581908192.250.000.000.000.00
MKP4198479772.510,14010,13243.8382.8642.3244.489874.59816.510,17810,1630.000.000.000.00
MKP429738.839641.379901.59888.140.8759.2823.1918.119758.596949920.259895.50.000.000.000.00
MKP439223.3390599439.779451.7749.7386.6280.5945.129251.7591129499.7594680.000.000.000.00
MKP4494079256.279682.59660.1376.6896.8672.4672.119455.759320.7597649717.750.000.000.000.00
MKP459093.38895.19294.59291.2367.9492.9290.1373.59912289379331.2593390.000.000.000.00
MKP469006.48815.59192.739193.366.298.1468.7771.729049.2588719226.7592520.000.000.000.00
MKP4710,90310,79011,13611,1426372.4828.8424.1310,94710,81711,15011,1600.000.000.000.00
MKP483072.43052.53065.6330729.412.7711.159.6330763059.75307630760.000.000.000.00
MKP4931483103.333135.833159.8313.125.6926.55203158.753115.753147.753171.750.000.000.000.00
MKP5093,27393,36394,71795,0261397.171103.441112.71588.8994,80193,97595,16895,1680.000.000.000.00
MKP5121392135.32138.43213909.463.1021392139213921390.000.000.000.00
MKP52767728.93767.73772.2712.5312.8415.46.37767327767760.000.000.000.00
MKP531027.71018.91024.710295.437.788.176.5110331025.25103310340.000.000.000.00
MKP543398.23381.233394.933402.179.2912.1416.258.4934043387.5340434040.000.000.000.00
MKP553130.473088.533134.573142.6719.4426.4129.5121.913146.2531073150.753155.250.000.000.000.00
MKP5623,86623,65024,02923,969104.24114.85112114.4523,89723,71424,11124,0210.000.000.000.00
MKP5724,15324,05424,14024,17342.8766.4958.4554.624,18224,08924,18224,1840.000.000.000.00
MKP5823,46923,35423,46723,48542.7374.4647.6629.9523,49423,37423,49423,4940.000.000.000.00
MKP5922,99322,85123,00023,06078.95114104.4383.4523,03522,89923,06923,0730.000.000.000.00
MKP6023,83223,69723,80423,85555.286.376662.9223,85523,73123,83223,9090.000.000.000.00
MKP6124,40724,29024,42424,41641.371.1964.8168.1424,42224,34024,47324,4410.000.000.000.00
MKP6225,27625,09925,26725,34695.4812496.1677.925,32425,14625,32425,4060.000.000.000.00
MKP6323,15322,99823,18123,20663.5120.8475.7384.7423,18723,07023,21923,2780.000.000.000.00
MKP6423,90123,73424,00324,024110.3116.57114.4269.9123,98323,80324,08424,0840.000.000.000.00
MKP6524,21124,14424,23324,23938.7854.18644424,22524,18424,26424,2640.000.000.000.00
MKP6642,04641,89642,40242,397133.29170.34131.5109.5142,15842,04242,50542,4520.000.000.000.00
MKP6741,79341,59642,19942,235110.21171.92127.9105.2541,82841,69142,29242,3100.000.000.000.00
MKP6841,15140,98141,59541,595156.22180131.67118.5541,25741,06641,68841,6630.000.000.000.00
MKP6944,24644,01544,71644,771136.13163.89144.8990.8144,30244,10844,80144,8350.000.000.000.00
MKP7041,62041,55741,96741,99897.55156104.1596.8841,66841,67442,02542,0580.000.000.000.00
Table 7. Wilcoxon–Mann–Whitney test for PSO and its enhanced versions solving Instances MKP01–MKP35.
ID | NPSO vs. CQL (NPSO, CQL) | NPSO vs. MQL (NPSO, MQL) | NPSO vs. SSQL (NPSO, SSQL) | MQL vs. SSQL (MQL, SSQL) | CQL vs. MQL (CQL, MQL) | CQL vs. SSQL (CQL, SSQL)
MKP010.50.50.50.50.50.50.50.50.50.50.50.5
MKP020.50.50.50.50.50.50.50.50.50.50.50.5
MKP030.50.50.50.50.50.50.50.50.50.50.50.5
MKP040.50.50.50.50.50.50.50.50.50.50.50.5
MKP050.50.50.50.50.50.50.50.50.50.50.50.5
MKP06010.980.020.980.020.490.511010
MKP070110100.770.231010
MKP080110100.780.221010
MKP090110100.410.591010
MKP10010.990.010.990.010.50.51010
MKP110110100.50.51010
MKP12011010011010
MKP130.50.50.50.50.50.50.50.50.50.50.50.5
MKP14011010011010
MKP15010.140.860.460.540.870.131010
MKP160.970.0310100.540.461010
MKP170110100.860.141010
MKP180.040.960.50.50.50.50.50.50.960.040.960.04
MKP190.50.50.160.840.50.50.840.160.160.840.50.5
MKP20010.970.030.840.160.20.81010
MKP210.50.50.50.50.50.50.50.50.50.50.50.5
MKP220.50.50.50.50.50.50.50.50.50.50.50.5
MKP230110100.070.931010
MKP240110100.710.291010
MKP250110100.160.841010
MKP260110100.860.141010
MKP270110100.550.451010
MKP280110100.930.071010
MKP290110100.750.251010
MKP300110100.50.51010
MKP31011010011010
MKP320110100.890.111010
MKP330110100.040.961010
MKP340110100.770.231010
MKP350110100.990.011010
Table 8. Wilcoxon–Mann–Whitney test for PSO and its enhanced versions solving Instances MKP36–MKP70.
ID | NPSO vs. CQL (NPSO, CQL) | NPSO vs. MQL (NPSO, MQL) | NPSO vs. SSQL (NPSO, SSQL) | MQL vs. SSQL (MQL, SSQL) | CQL vs. MQL (CQL, MQL) | CQL vs. SSQL (CQL, SSQL)
MKP360110100.390.611010
MKP370110100.060.941010
MKP380110100.140.861010
MKP390110100.450.551010
MKP400110100.760.241010
MKP410110100.360.641010
MKP420110100.010.991010
MKP430110100.440.561010
MKP440110100.120.881010
MKP450110100.550.451010
MKP460110100.530.471010
MKP470110100.760.241010
MKP48010.010.990.420.580.990.011010
MKP49010.010.9910101010
MKP500.560.4410100.60.41010
MKP510.010.990.160.840.50.50.840.160.960.040.990.01
MKP52010.750.250.950.050.780.221010
MKP53010.080.920.930.070.970.031010
MKP54010.280.720.980.020.980.021010
MKP55010.620.380.990.010.90.11010
MKP560110100.010.991010
MKP57010.270.730.990.010.990.011010
MKP58010.50.50.920.080.930.071010
MKP59010.730.27100.970.031010
MKP60010.110.890.960.04101010
MKP61010.610.390.710.290.380.621010
MKP62010.320.6810101010
MKP63010.90.1100.910.091010
MKP640110100.540.461010
MKP65010.940.060.990.010.710.291010
MKP660110100.420.581010
MKP670110100.790.211010
MKP680110100.40.61010
MKP690110100.960.041010
MKP700.050.9510100.830.171010
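The pairwise values reported in Tables 7 and 8 come from comparing, for each instance, the independent 30-run result samples of two strategies. The following Python sketch shows how such a comparison can be carried out with SciPy's one-sided Mann–Whitney U test; the function name compare_strategies, the example vectors, and the reporting convention are our assumptions for illustration, not taken from the authors' implementation (available at [71]).

# Minimal sketch (assumption: per-instance results are lists of the 30 best
# objective values found by each strategy, as in the shared datasets [73]).
from scipy.stats import mannwhitneyu

def compare_strategies(sample_a, sample_b):
    """Return one-sided p-values for the hypotheses A > B and B > A."""
    # H1: strategy A tends to reach larger objective values (maximization).
    p_a_greater = mannwhitneyu(sample_a, sample_b, alternative="greater").pvalue
    # H1: strategy B tends to reach larger objective values.
    p_b_greater = mannwhitneyu(sample_b, sample_a, alternative="greater").pvalue
    return p_a_greater, p_b_greater

# Hypothetical 30-run samples for a single instance (illustrative values only).
npso_runs = [10526, 10584, 10588] * 10
ssql_runs = [10618, 10585, 10588] * 10
print(compare_strategies(npso_runs, ssql_runs))

A small p-value in one direction (e.g., below 0.05) indicates that the corresponding strategy reaches significantly better objective values than its counterpart on that instance.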