Article

A Reinforcement Learning-Based Bi-Population Nutcracker Optimizer for Global Optimization

1 School of Aeronautics and Astronautics, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518000, China
2 School of Electronics and Communication Engineering, Shenzhen Campus of Sun Yat-sen University, Shenzhen 518000, China
* Author to whom correspondence should be addressed.
Biomimetics 2024, 9(10), 596; https://doi.org/10.3390/biomimetics9100596
Submission received: 23 August 2024 / Revised: 28 September 2024 / Accepted: 29 September 2024 / Published: 1 October 2024

Abstract

The nutcracker optimizer algorithm (NOA) is a metaheuristic method proposed in recent years. This algorithm simulates the behavior of nutcrackers searching for and storing food in nature to solve optimization problems. However, the traditional NOA struggles to balance global exploration and local exploitation effectively, making it prone to getting trapped in local optima when solving complex problems. To address these shortcomings, this study proposes a reinforcement learning-based bi-population nutcracker optimizer algorithm called RLNOA. In the RLNOA, a bi-population mechanism is introduced to better balance global and local optimization capabilities. At the beginning of each iteration, the raw population is divided into an exploration sub-population and an exploitation sub-population based on the fitness value of each individual. The exploration sub-population is composed of individuals with poor fitness values. An improved foraging strategy based on random opposition-based learning is designed as the update method for the exploration sub-population to enhance diversity. Meanwhile, Q-learning serves as an adaptive selector for exploitation strategies, enabling optimal adjustment of the exploitation sub-population’s behavior across various problems. The performance of the RLNOA is evaluated using the CEC-2014, CEC-2017, and CEC-2020 benchmark function sets, and it is compared against nine state-of-the-art metaheuristic algorithms. Experimental results demonstrate the superior performance of the proposed algorithm.

Graphical Abstract

1. Introduction

Metaheuristic algorithms, as a class of optimization techniques, are specifically engineered to address complex optimization problems that are challenging or infeasible to solve using traditional methods [1,2]. These algorithms are inspired by natural phenomena, biological processes, physical systems, or social behaviors, offering flexible frameworks for identifying solutions within a reasonable time frame [3,4]. Their advantages, such as simple structure, ease of implementation, and robustness to initial values, have led to their widespread application across various fields, such as power system optimization [5,6], industrial design [7,8], path planning [9,10], and parameter optimization [11,12].
Metaheuristic algorithms rely on two fundamental concepts: exploration and exploitation [13]. These concepts are essential for effectively navigating the search space to identify optimal or near-optimal solutions for complex optimization problems [14]. Exploration involves the algorithm’s capability to explore the broader search space and uncover new regions that may harbor promising solutions [15]. Exploration aims to avoid local optima and ensure a broad investigation of various areas within the search space. In contrast, exploitation involves focusing the search on specific regions that have previously shown promise, with the goal of refining solutions and converging toward the optimal solution by thoroughly searching near high-quality solutions [16]. A significant challenge in the design of metaheuristic algorithms is achieving an appropriate balance between exploration and exploitation [17].
In recent years, numerous metaheuristic algorithms have been proposed, including the grey wolf optimizer (GWO) [18], snake optimizer (SO) [19], white shark optimizer (WSO) [20,21], reptile search algorithm (RSA) [22], crested porcupine optimizer (CPO) [23], and nutcracker optimizer algorithm (NOA) [24]. Among these, the NOA mimics the search, caching, and recovery behaviors of nutcrackers, incorporating two exploration strategies and two exploitation strategies that enhance its fast convergence and robust search capabilities. However, in NOA, the transition between search strategies is governed by random numbers. When applied to complex problems, the NOA encounters limitations, such as an inadequate balance between exploration and exploitation and a propensity to become trapped in local optima.
Several techniques have been adopted to improve the performance of metaheuristic algorithms. The local search method focuses on exploring the neighborhood of a solution to find improvements, enabling the metaheuristic algorithms to escape local optima and continue the search for a global optimum. The authors of ref. [25] proposed a novel local search strategy to improve the particle swarm optimization (PSO) algorithm. After optimizing the population in each iteration, a local search strategy is introduced to enhance the present individuals in the population to accelerate the searching process and prevent becoming trapped in local optima. To improve the population diversity and convergence ability, ref. [26] proposed a variant of GWO with the fusion of a stochastic local search technique, evolutionary operators, and a memory mechanism. The stochastic local search can check the neighborhood of each individual to promote GWO’s exploitation performance. The authors of ref. [27] presented a local search and chaos mapping-based binary group teaching optimization algorithm called BGTOALC. Local search was introduced to increase exploitation. The authors of ref. [28] proposed an oppositional chaotic local search strategy to improve the aquila optimizer. Local search techniques play a critical role in refining solutions within metaheuristic algorithms. However, their embedding may cause the optimizer to perform more exploitation operations during the iterative process. This could exacerbate the imbalance between exploitation and exploration in metaheuristic algorithms.
An elite mechanism is a technique used to preserve the best-performing individuals across the iterations of metaheuristic algorithms. The authors of ref. [29] proposed an elite symbiotic organism search algorithm called Elite-SOS. The global convergence ability was enhanced by using the evolutionary information of elite individuals. The authors of ref. [30] built an elite gene pool to guide the reproduction operator and acquire superior offspring. To improve the optimization performance of PSO, ref. [31] built three types of elite archives to save elite individuals with different ranks. Elite individuals could be retained directly during the iteration process, which makes full use of the whole population’s information. The authors of ref. [32] introduced an elite-guided hierarchical mutation strategy to improve the performance of the differential evolution (DE) algorithm. Elite individuals were scheduled for a local search, and the remaining individuals performed a global search guided by the former. The elite mechanism speeds up convergence by ensuring that the information of the best solutions persists across generations. However, by focusing on the best solutions, the algorithm might overly emphasize exploitation at the cost of exploration. This imbalance can result in the algorithm getting trapped in local optima.
Incorporating supervised learning into metaheuristic algorithms is an emerging area of research that uses knowledge learned from past data to assist the search for optimal solutions during the iterative process. The authors of ref. [33] proposed a kernelized autoencoder that can learn from past search experiences to speed up the optimization process. The authors of ref. [34] presented an autoencoding approach to predict the movement of the optimal solutions. To solve the problems of parameter setting and strategy selection, ref. [35] proposed an adaptive distributed DE algorithm. The individual and population parameters were updated adaptively based on the best solutions and historically successful experience. The authors of ref. [36] introduced a learning-aided evolutionary optimization framework that learns knowledge from the historical optimization process by using artificial neural networks. The learned knowledge can help metaheuristic algorithms to better approach the global optimum. While supervised learning can guide the search process more effectively, the training phase requires additional computational resources, which makes this kind of strategy less suitable for time-constrained problems. In addition, the limited generalization of the learned models restricts its application scenarios.
Reinforcement learning (RL) is a subfield of machine learning in which an agent learns to make decisions by taking actions within an environment to maximize cumulative rewards [37,38]. Due to its strong environmental interaction capabilities, RL has been increasingly employed by researchers to guide the selection of search strategies in metaheuristic algorithms. The authors of ref. [39] introduced an inverse reinforcement learning-based moth-flame optimization algorithm, IRLMFO, to solve large-scale optimization problems. RL was utilized to select effective search strategies based on historical data from the strategy pool established by IRLMFO. To overcome the drawbacks of getting trapped in local optima easily, ref. [40] presented a reinforcement learning-based RSA known as RLNSA, where RL managed the switching between exploration and exploitation strategies. Additionally, refs. [41,42] applied RL to address mutation strategy selection within the evolutionary process of differential evolution algorithms. The authors of ref. [43] embedded RL in the teaching–learning-based optimization algorithm (RLTLBO) to solve optimization problems. The authors of ref. [44] proposed a reinforcement learning-based memetic particle swarm optimization algorithm called RLMPSO. The selection of five search operations is controlled by the RL algorithm. The authors of ref. [45] designed a reinforcement learning-based comprehensive learning grey wolf optimizer (RLCGWO) to adaptively adjust strategies. Although the introduction of RL can enable metaheuristic algorithms to adaptively select exploration and exploitation strategies, it does not always effectively enhance algorithm performance. Typically, exploration is enhanced through methods such as large step sizes, random perturbations, or probabilistic jumps, which enable the algorithm to search beyond the current solutions [46]. Consequently, in RL-based metaheuristic algorithms, exploration strategies often receive rewards mainly during the early optimization stages, leading RL to favor exploitation strategies as optimization progresses. This tendency can cause existing RL-based metaheuristic algorithms to struggle in escaping local optima, as exploration strategies are less frequently selected.
To overcome the aforementioned problems, this paper introduces an RL-based bi-population NOA called RLNOA. The RLNOA introduces a bi-population mechanism to better balance exploration and exploitation in the optimization process. At the beginning of each iteration, the population is divided into an exploration sub-population and an exploitation sub-population. Individuals with poor fitness in the raw population form the exploration sub-population. A random opposition-based learning (ROBL)-based foraging method is proposed as the update strategy for the exploration sub-population to avoid local optima. The remaining, better-performing individuals of the raw population form the exploitation sub-population, which uses Q-learning within RL to adaptively select between the NOA’s two exploitation strategies (storage and recovery) to accelerate convergence and improve generalization. The division of these sub-populations is based on fitness ranking and optimization progress. Experimental results show that the RLNOA achieves superior optimization performance compared to current state-of-the-art algorithms. The primary contributions of this paper are as follows:
  • An RL-based bi-population nutcracker optimizer algorithm (RLNOA) is developed to solve complex optimization problems;
  • The foraging strategy of the NOA is enhanced using ROBL, improving its ability to search for feasible solutions;
  • Q-learning is utilized to control the selection of the most appropriate exploitation strategy for each iteration, dynamically improving the refinement of the optimal solution.
The remainder of this paper is organized as follows. Section 2 describes the NOA and RL methods. Section 3 explains the detailed implementation of the proposed RLNOA. The comparison experiments are finished in Section 4. Section 5 summarizes the conclusions.

2. Preliminaries

2.1. Nutcracker Optimization Algorithm

The NOA is a metaheuristic algorithm inspired by the natural behavior of nutcrackers [24]. It solves optimization problems by simulating the nutcracker’s behavior in collecting, storing, and searching for food. The optimization process in the NOA is carried out through four strategies: foraging, storage, cache search, and recovery. Table A1 summarizes the nomenclature of this study.

2.1.1. Foraging and Storage Strategies

During the foraging phase, individuals start searching for potential food sources within the search space. This behavior is mathematically modeled as follows:
$$x_{i,j}^{t+1,\,\text{FS-new1}} = \begin{cases} x_{i,j}^{t}, & \text{if } rand_1 < rand_2 \\ \hat{x}_{i,j}^{t}, & \text{otherwise} \end{cases} \tag{1}$$
$$\hat{x}_{i,j}^{t} = \begin{cases} x_{m,j}^{t} + \varepsilon \left( x_{A,j}^{t} - x_{B,j}^{t} \right) + \mu \left( r_1^{2} \left( U_j - L_j \right) \right), & \text{if } t < T_{\max}/2 \\ x_{C,j}^{t} + \mu \left( x_{A,j}^{t} - x_{B,j}^{t} \right) + \mu \left( r_2 < \delta \right) \left( r_3^{2} \left( U_j - L_j \right) \right), & \text{otherwise} \end{cases} \tag{2}$$
where $x_{i,j}^{t+1,\,\text{FS-new1}}$ is the new position of the ith individual generated in the foraging phase; $x_{i,j}^{t}$ is the jth dimension of the ith individual in iteration $t$; $\hat{x}_{i,j}^{t}$ is the candidate position produced by Equation (2); $x_{m,j}^{t}$ is the mean of the jth dimension over the current population in iteration $t$; $T_{\max}$ indicates the maximum number of generations; $L_j$ and $U_j$ are the lower and upper bounds of the optimization problem in the jth dimension; A, B, and C are three different integers randomly selected in the range [0, NP]; NP is the population size; $\varepsilon$ is a parameter generated by the levy flight; $rand_1$, $rand_2$, $r_1$, $r_2$, and $r_3$ are random numbers selected within the range [0, 1]; $\delta$ is a control parameter; $\left( r_2 < \delta \right)$ evaluates to 1 if the condition holds and to 0 otherwise; and $\mu$ is a parameter chosen among $\tau_1$ (a random number between zero and one), $\tau_2$ (drawn from the normal distribution), and $\tau_3$ (generated by the levy flight), as follows:
$$\mu = \begin{cases} \tau_1, & \text{if } rand_1 < rand_2 \\ \tau_2, & \text{if } rand_2 < rand_3 \\ \tau_3, & \text{if } rand_1 < rand_3 \end{cases} \tag{3}$$
where $rand_1$, $rand_2$, and $rand_3$ are random numbers selected within the range [0, 1].
At the storage phase, individuals store food as follows:
$$x_{i}^{t+1,\,\text{FS-new2}} = \begin{cases} x_{i}^{t} + \mu \left( x_{best}^{t} - x_{i}^{t} \right) \lambda + r_1 \left( x_{A}^{t} - x_{B}^{t} \right), & \text{if } rand_1 < rand_2 \\ x_{best}^{t} + \mu \left( x_{A}^{t} - x_{B}^{t} \right), & \text{if } rand_1 < rand_3 \\ x_{best}^{t}\, \xi, & \text{otherwise} \end{cases} \tag{4}$$
where $x_{i}^{t+1,\,\text{FS-new2}}$ is the new position of the ith individual generated in the storage phase; $x_{best}^{t}$ is the best solution found so far; $\lambda$ is a parameter generated by the levy flight; $r_1$, $rand_1$, $rand_2$, and $rand_3$ are random numbers selected within the range [0, 1]; and $\xi$ is a parameter that decreases linearly from 1 to 0 during the optimization process.
The exchange between the foraging and storage strategies is used to balance the exploration and exploitation phases as follows:
$$x_{i}^{t+1,\,FS} = \begin{cases} x_{i}^{t+1,\,\text{FS-new1}}, & \text{if } rand_1 < P_{a1} \\ x_{i}^{t+1,\,\text{FS-new2}}, & \text{otherwise} \end{cases} \tag{5}$$
where $rand_1$ is a random number selected within the range [0, 1] and $P_{a1}$ is a parameter that decreases linearly from 1 to 0 during the optimization process.
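To make the foraging/storage switch concrete, the following Python/NumPy sketch applies Equations (1)–(5) to a single individual. It is an illustration of the update rules above rather than the authors' MATLAB implementation; the helper names (`levy`, `foraging_storage_step`) and the simplified handling of μ and ξ are our own assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def levy(size, beta=1.5):
    # Mantegna-style Levy step; the paper only states "levy flight", so this helper is illustrative.
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2) /
             (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, size)
    v = rng.normal(0.0, 1.0, size)
    return u / np.abs(v) ** (1 / beta)

def foraging_storage_step(pop, i, best, lb, ub, t, t_max, delta=0.05, pa1=0.5):
    # One foraging/storage update of individual i, following Eqs. (1)-(5).
    NP, d = pop.shape
    A, B, C = rng.choice(NP, size=3, replace=False)
    mu = rng.choice([rng.random(), rng.standard_normal(), levy(1)[0]])  # Eq. (3), simplified
    xi = 1.0 - t / t_max                         # parameter decreasing linearly from 1 to 0
    x = pop[i]

    if rng.random() < pa1:                       # Eq. (5): foraging branch (Eqs. (1)-(2))
        if rng.random() < rng.random():
            new = x.copy()                       # keep the current position
        elif t < t_max / 2:
            new = (pop.mean(axis=0) + levy(d) * (pop[A] - pop[B])
                   + mu * rng.random() ** 2 * (ub - lb))
        else:
            new = (pop[C] + mu * (pop[A] - pop[B])
                   + mu * float(rng.random() < delta) * rng.random() ** 2 * (ub - lb))
    else:                                        # Eq. (5): storage branch (Eq. (4))
        if rng.random() < rng.random():
            new = x + mu * (best - x) * levy(1)[0] + rng.random() * (pop[A] - pop[B])
        elif rng.random() < rng.random():
            new = best + mu * (pop[A] - pop[B])
        else:
            new = best * xi
    return np.clip(new, lb, ub)

# Usage example on a toy 5-dimensional problem.
lb, ub = -10.0, 10.0
pop = rng.uniform(lb, ub, size=(20, 5))
best = pop[np.argmin(np.sum(pop ** 2, axis=1))]
child = foraging_storage_step(pop, i=0, best=best, lb=lb, ub=ub, t=10, t_max=100)
```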

2.1.2. Cache Search and Recovery Strategies

At the cache search phase, individuals locate their caches through two reference points:
$$RP_{i,1}^{t} = \begin{cases} x_{i}^{t} + \alpha \cos\theta \left( x_{A}^{t} - x_{B}^{t} \right) + \beta \, RP_{i,\,randi(1,2)}^{t}, & \text{if } \theta = \pi/2 \\ x_{i}^{t} + \alpha \cos\theta \left( x_{A}^{t} - x_{B}^{t} \right), & \text{otherwise} \end{cases} \tag{6}$$
$$RP_{i,2}^{t} = \begin{cases} x_{i}^{t} + \alpha \cos\theta \left( \left( U - L \right) r_1 + L \right) + \alpha \, RP_{i,\,randi(1,2)}^{t} \, \xi, & \text{if } \theta = \pi/2 \\ x_{i}^{t} + \alpha \cos\theta \left( \left( U - L \right) r_1 + L \right) \xi, & \text{otherwise} \end{cases} \tag{7}$$
and
$$\xi = \begin{cases} 1, & \text{if } rand_1 < P_{rp} \\ 0, & \text{otherwise} \end{cases} \tag{8}$$
where $\theta$ is a parameter chosen in the range [0, $\pi$]; $randi(1,2)$ is an integer chosen randomly between zero and one; $r_1$ and $rand_1$ are random numbers selected within the range [0, 1]; $P_{rp}$ is a global exploration threshold; and $\beta$ is a convergence parameter that can be acquired as follows:
$$\beta = \begin{cases} \left( 1 - \dfrac{t}{T_{\max}} \right)^{\frac{2t}{T_{\max}}}, & \text{if } rand_1 < rand_2 \\ \left( \dfrac{t}{T_{\max}} \right)^{\frac{2}{t}}, & \text{otherwise} \end{cases} \tag{9}$$
where $rand_1$ and $rand_2$ are random numbers selected within the range [0, 1]. The new position of the individual during the cache search phase can be acquired as follows:
$$x_{i,j}^{t+1,\,\text{CR-new1}} = \begin{cases} x_{i,j,1}^{t}, & \text{if } rand_1 < rand_2 \\ x_{i,j,2}^{t}, & \text{otherwise} \end{cases} \tag{10}$$
$$x_{i,j,1}^{t+1} = \begin{cases} x_{i,j}^{t}, & \text{if } rand_1 < rand_2 \\ x_{i,j}^{t} + r_1 \left( x_{best,j}^{t} - x_{i,j}^{t} \right) + r_2 \left( RP_{i,j,1}^{t} - x_{C,j}^{t} \right), & \text{otherwise} \end{cases} \tag{11}$$
$$x_{i,j,2}^{t+1} = \begin{cases} x_{i,j}^{t}, & \text{if } rand_1 < rand_2 \\ x_{i,j}^{t} + r_1 \left( x_{best,j}^{t} - x_{i,j}^{t} \right) + r_2 \left( RP_{i,j,2}^{t} - x_{C,j}^{t} \right), & \text{otherwise} \end{cases} \tag{12}$$
where $RP_{i,j,1}^{t}$ and $RP_{i,j,2}^{t}$ are the jth components of $RP_{i,1}^{t}$ and $RP_{i,2}^{t}$, respectively; $rand_1$, $rand_2$, $r_1$, and $r_2$ are random numbers selected within the range [0, 1]; and C is the index of a solution selected randomly from the population.
During the recovery phase, nutcrackers find the hidden caches and retrieve the buried pine seeds. The new position of a nutcracker is obtained using the following equation:
$$x_{i}^{t+1,\,\text{CR-new2}} = \begin{cases} RP_{i,1}^{t}, & \text{if } f\!\left( RP_{i,1}^{t} \right) < f\!\left( RP_{i,2}^{t} \right) \text{ and } f\!\left( RP_{i,1}^{t} \right) < f\!\left( x_{i}^{t} \right) \\ RP_{i,2}^{t}, & \text{if } f\!\left( RP_{i,2}^{t} \right) < f\!\left( RP_{i,1}^{t} \right) \text{ and } f\!\left( RP_{i,2}^{t} \right) < f\!\left( x_{i}^{t} \right) \\ x_{i}^{t}, & \text{otherwise} \end{cases} \tag{13}$$
Finally, the exchange between the cache search and recovery strategies is applied according to the following formula:
$$x_{i}^{t+1,\,CR} = \begin{cases} x_{i}^{t+1,\,\text{CR-new1}}, & \text{if } rand_1 > P_{a2} \\ x_{i}^{t+1,\,\text{CR-new2}}, & \text{otherwise} \end{cases} \tag{14}$$
where $rand_1$ is a random number selected within the range [0, 1] and $P_{a2}$ represents a probability value.
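The switch in Equations (13) and (14) can be expressed compactly in code. The sketch below assumes the two reference points have already been computed by Equations (6)–(8); it is an illustrative Python fragment, not the authors' implementation, and the function name `cache_or_recover` is our own.

```python
import numpy as np

rng = np.random.default_rng(1)

def cache_or_recover(x_i, rp1, rp2, f, cache_search, pa2=0.2):
    # Eq. (14): with probability 1 - P_a2 run the cache-search update (Eqs. (10)-(12)),
    # otherwise apply the recovery rule of Eq. (13).
    if rng.random() > pa2:
        return cache_search(x_i, rp1, rp2)   # user-supplied Eqs. (10)-(12) update
    # Recovery, Eq. (13): keep whichever of RP1, RP2, and the old position has the lowest fitness.
    candidates = (rp1, rp2, x_i)
    return min(candidates, key=f)

# Toy usage on a sphere objective; the cache-search update is stubbed out for brevity.
f = lambda x: float(np.sum(np.asarray(x) ** 2))
x_new = cache_or_recover(np.array([1.0, 2.0]), np.array([0.5, 0.5]),
                         np.array([2.0, 2.0]), f, cache_search=lambda x, a, b: x)
```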

2.2. Reinforcement Learning

RL has been applied across various domains due to its effectiveness in problem-solving [47,48]. In RL, the agent interacts with the environment to learn how to perform optimal actions. As a representative RL method, Q-learning uses a Q-table to control the agent’s actions. The Q-table is an m × n matrix, where m represents the number of states and n represents the number of actions available to the agent. By making decisions in each state based on the Q-table values, the agent ultimately maximizes its cumulative reward. The Q-table is dynamically updated as follows:
$$Q\!\left( s_{t+1}, a_{t+1} \right) = Q\!\left( s_{t}, a_{t} \right) + \alpha \left[ R_{t+1} + \gamma \max_{a} Q\!\left( s_{t+1}, a \right) - Q\!\left( s_{t}, a_{t} \right) \right] \tag{15}$$
where $s_{t}$ and $s_{t+1}$ represent the current and next states, respectively; $a_{t}$ and $a_{t+1}$ represent the current and next actions, respectively; $R_{t+1}$ is the reward acquired after performing action $a_{t}$; $\alpha$ is the learning rate; $\gamma$ is the discount factor; and $Q$ represents the corresponding value in the Q-table.
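A minimal tabular Q-learning agent implementing the update of Equation (15) is sketched below in Python. The class name `QTable` and the 32-state/2-action sizing (the configuration later used by the RLNOA in Section 3.3) are illustrative; the learning rate and discount factor follow the values listed in Table 1.

```python
import numpy as np

class QTable:
    """Tabular Q-learning agent following Eq. (15)."""
    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.5):
        self.q = np.zeros((n_states, n_actions))
        self.alpha = alpha    # learning rate
        self.gamma = gamma    # discount factor

    def update(self, s, a, reward, s_next):
        # Q(s,a) <- Q(s,a) + alpha * [R + gamma * max_a' Q(s',a') - Q(s,a)]
        td_target = reward + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.alpha * (td_target - self.q[s, a])

    def greedy_action(self, s):
        return int(np.argmax(self.q[s]))

# Example: 32 states and 2 actions (storage / recovery).
agent = QTable(n_states=32, n_actions=2)
agent.update(s=25, a=1, reward=1.0, s_next=25)
print(agent.greedy_action(25))   # -> 1, the rewarded action
```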

3. The Development of the Proposed Algorithm

3.1. Overview

The traditional NOA is limited to specific problems and is prone to getting trapped in local optima. To overcome these limitations, this paper introduces a hybrid strategy called the RLNOA. In each iteration of the RLNOA, the population is segmented into two groups based on fitness ranking: the exploration sub-population and the exploitation sub-population. The exploration sub-population consists of individuals with poor fitness in the current population. A ROBL-based foraging strategy is employed as the update strategy for individuals in the exploration sub-population, with a sine-based perturbation introduced to adjust the size of the exploration sub-population to ensure convergence. The remaining individuals form the exploitation sub-population, which implements two types of exploitation strategies: storage and recovery. The selection of the exploitation strategy is governed by Q-learning.
Furthermore, the RLNOA utilizes a single Q-table to map individual states to actions. States in the RLNOA are encoded by relative changes in fitness value and local diversity, while actions correspond to exploitation strategies. The exploitation sub-population updates based on the strategy selected by each individual, generating offspring. Rewards are assigned based on the selection process outcomes, and the Q-table is subsequently updated for the next generation. Figure 1 illustrates the flowchart of the RLNOA, with the main steps detailed in Algorithm 1.
Algorithm 1: The pseudocode of the RLNOA
Input: population size NP, maximum number of generations T_max, learning rate α, discount factor γ, P_rp, δ
Output: the best solution x_best and its corresponding fitness value f(x_best)
Set the initial Q-table: Q(s, a) = 0
Set t = 1
Initialize the population positions: x_i^t, i = 1, 2, …, NP
Calculate the fitness value of each individual: f(x_i^t)
Calculate the local diversity of each individual: D_i^t
Set D_i^(t−1) = D_i^t, f(x_i^(t−1)) = f(x_i^t)
While t < T_max do
    Acquire the sub-populations by Equation (19)
    For each individual x_i^t
        If x_i^t belongs to the exploration sub-population
            Perform the ROBL-based foraging strategy on x_i^t by Equation (16)
        Else
            Determine the state of the exploitation sub-population by Equations (20) and (21)
            Choose the best action a for the current state s from the Q-table
            Switch action
                Case 1: Storage
                    Perform the storage strategy on x_i^t by Equation (4)
                Case 2: Recovery
                    Perform the recovery strategy on x_i^t by Equation (13)
            End Switch
            Set the reward by Equation (25)
        End if
        Calculate the fitness f(x_i^t) of x_i^t
        Update the position of x_i^t if its fitness is improved
    End for
    Calculate the relative changes of fitness and local diversity for the population
    Update Q-table by the exploitation sub-population
    t = t + 1
End While
Return results
Terminate

3.2. ROBL-Based Foraging Strategy

In the RLNOA, an improved foraging strategy based on the ROBL method is introduced to define the exploration behavior of the exploration sub-population [18]. The offspring in the exploration sub-population can be generated as follows:
$$x_{i,j}^{t+1,\,\text{new1}} = \begin{cases} \hat{x}_{i,j}^{t}, & \text{if } t < T_{\max}/2 \\ \tilde{x}_{i,j}^{t}, & \text{otherwise} \end{cases} \tag{16}$$
$$\hat{x}_{i,j}^{t} = \begin{cases} x_{m,j}^{t} + \varepsilon \left( x_{A,j}^{t} - x_{B,j}^{t} \right) + \mu \left( r_1^{2} \left( U_j - L_j \right) \right), & \text{if } rand_1 < rand_2 \\ L_j + U_j - r_1 x_{i,j}^{t}, & \text{otherwise} \end{cases} \tag{17}$$
$$\tilde{x}_{i,j}^{t} = \begin{cases} x_{C,j}^{t} + \mu \left( x_{A,j}^{t} - x_{B,j}^{t} \right) + \mu \left( r_2 < \delta \right) \left( r_3^{2} \left( U_j - L_j \right) \right), & \text{if } rand_1 < rand_2 \\ L_j + U_j - r_2 x_{i,j}^{t}, & \text{otherwise} \end{cases} \tag{18}$$
where $x_{i,j}^{t+1,\,\text{new1}}$ is the new position of the ith individual generated in the foraging phase; $x_{i,j}^{t}$ is the jth dimension of the ith individual in iteration $t$; $x_{m,j}^{t}$ is the mean of the jth dimension over the current population in iteration $t$; $T_{\max}$ indicates the maximum number of generations; $L_j$ and $U_j$ are the lower and upper bounds of the optimization problem in the jth dimension; A, B, and C are three different integers randomly selected in the range [0, NP]; NP is the population size; $\varepsilon$ is a parameter generated by the levy flight; $rand_1$, $rand_2$, $r_1$, $r_2$, and $r_3$ are random numbers selected within the range [0, 1]; $\delta$ is a control parameter; and $\mu$ is a parameter generated based on Equation (3). The size of the exploration sub-population can be calculated as follows:
$$N_{exploration} = \mathrm{round}\!\left( \frac{NP}{2} \left( 1 - \sin^{\zeta}\!\left( \frac{\pi}{2} \cdot \frac{t}{T_{\max}} \right) \right) \right) \tag{19}$$
where $\zeta$ is a control parameter. At the beginning of each iteration, the $N_{exploration}$ individuals with the poorest fitness values are chosen from the total population to form the exploration sub-population. For example, assuming $T_{\max} = 10$, $NP = 20$, and $\zeta = 8$, the variation of $N_{exploration}$ is illustrated in Figure 2. To ensure population diversity and prevent premature convergence, the value of $N_{exploration}$ decreases slowly in the early stages of the optimization process. In the later stages, $N_{exploration}$ decays rapidly, allowing most individuals to focus on local exploitation.
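The following Python sketch illustrates the two building blocks of this subsection: the random opposition step used in Equations (17) and (18) and the exploration sub-population size of Equation (19). It is an illustration only; the placement of the exponent ζ in Equation (19) follows the reconstruction above and should be checked against the authors' code, and the function names are our own.

```python
import numpy as np

rng = np.random.default_rng(2)

def robl_opposite(x, lb, ub):
    # Random opposition-based learning term of Eqs. (17)-(18): L + U - r * x, with r ~ U(0, 1).
    return lb + ub - rng.random(np.shape(x)) * x

def exploration_size(np_total, t, t_max, zeta=1.0):
    # Eq. (19): number of poor-fitness individuals assigned to the exploration sub-population.
    return int(round(np_total / 2 * (1.0 - np.sin(np.pi / 2 * t / t_max) ** zeta)))

print(robl_opposite(np.array([3.0, -2.0]), lb=-10.0, ub=10.0))

# Reproducing the Figure 2 setting: T_max = 10, NP = 20, zeta = 8.
print([exploration_size(20, t, 10, zeta=8) for t in range(1, 11)])
# The size stays near NP/2 in early iterations and drops quickly toward 0 at the end.
```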

3.3. Q-Learning-Based Exploitation Behavior

To ensure dynamic optimization of benefits at different stages for solving the optimization problem, Q-learning is employed as the selector for the exploitation sub-population to control the switch between storage (Equation (4)) and recovery (Equation (13)) strategies. The settings for Q-learning are specified as follows:

3.3.1. State Encoding

The state of each individual is encoded as the relative changes of local diversity and fitness values, which are defined as follows:
$$ld_{i}^{t} = D_{i}^{t} / D_{i}^{t-1} \tag{20}$$
$$fit_{i}^{t} = f\!\left( x_{i}^{t} \right) / f\!\left( x_{i}^{t-1} \right) \tag{21}$$
where $ld_{i}^{t}$ is the relative change in local diversity; $fit_{i}^{t}$ is the relative change in fitness value; and $D_{i}^{t}$ represents the local diversity of individual $x_{i}^{t}$, calculated as follows:
$$D_{i}^{t} = \frac{1}{k} \sum_{l=1}^{k} \sum_{j=1}^{d} \left( x_{l,j}^{t} - \bar{x}_{j}^{t} \right)^{2} \tag{22}$$
$$\bar{x}_{j}^{t} = \frac{1}{k} \sum_{l=1}^{k} x_{l,j}^{t} \tag{23}$$
where $\left\{ x_{l}^{t} \right\}_{l=1}^{k}$ is the neighborhood set of $x_{i}^{t}$; $d$ is the dimension of the search space; and $k$ is the number of near neighbors. As shown in Figure 3, the exploitation population has 32 states in total. The dimension of $ld$ is divided into eight intervals: [0, 0.25), [0.25, 0.5), [0.5, 0.75), [0.75, 1.0), [1, 1.5), [1.5, 2), [2, 3), and [3, +∞). The dimension of $fit$ is divided into four intervals: [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1.0].
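A possible encoding of these 32 states is sketched below in Python. The bin edges follow the intervals listed above; the specific ordering of states (which interval combination maps to s1, s2, …) is not spelled out in the text, so the index formula here is an assumption for illustration.

```python
import numpy as np

LD_EDGES = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0]   # 8 intervals for relative local diversity
FIT_EDGES = [0.25, 0.5, 0.75]                       # 4 intervals for relative fitness change

def encode_state(ld, fit):
    # Map (ld_i^t, fit_i^t) to an integer state index in [0, 31].
    ld_bin = int(np.digitize(ld, LD_EDGES))          # 0..7
    fit_bin = int(np.digitize(fit, FIT_EDGES))       # 0..3
    return ld_bin * 4 + fit_bin

# Example: diversity grew slightly (ld = 1.2) and fitness improved (fit = 0.6).
print(encode_state(1.2, 0.6))   # -> 18 with this particular ordering
```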

3.3.2. Action Options

The action of each individual is encoded as the selection of exploitation strategies. The probability of selecting each action is computed using the SoftMax function:
$$\pi_{q}\!\left( s_{t}, a_{j} \right) = \frac{\exp\!\left( Q_{t}\!\left( s_{t}, a_{j} \right) \right)}{\sum_{j=1}^{n} \exp\!\left( Q_{t}\!\left( s_{t}, a_{j} \right) \right)} \tag{24}$$
where $Q_{t}\!\left( s_{t}, a_{j} \right)$ is the corresponding value in the Q-table in iteration $t$; $a_{j}$ is the jth action; and $n$ is the total number of actions. Each individual samples its exploitation strategy according to the probability assigned to each action.
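A numerically stable version of the SoftMax selection in Equation (24) can be written as follows; the function name and the example Q-values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax_action(q_row):
    # Eq. (24): turn the Q-values of the current state into selection probabilities.
    z = q_row - q_row.max()              # subtract the maximum for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(q_row), p=p)), p

# Two actions: index 0 = storage strategy, index 1 = recovery strategy.
action, probs = softmax_action(np.array([0.4, -0.2]))
print(action, probs)   # the storage strategy is chosen more often here
```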

3.3.3. Reward Options

The reward is determined based on the selection results of each generation, reflecting the performance of the current optimization strategy. If the fitness value of the new position x i t + 1 is better than the old position x i t , the individual is rewarded with 1. Otherwise, the individual is punished with −1. The reward settings are defined as follows:
$$R = \begin{cases} 1, & \text{if } f\!\left( x_{i}^{t+1} \right) < f\!\left( x_{i}^{t} \right) \\ -1, & \text{otherwise} \end{cases} \tag{25}$$
Based on the above settings, the Q-table can be updated by Equation (15). The pseudocode of the RLNOA is shown in Algorithm 1.

3.4. Time Complexity

As can be seen from the pseudocode of the RLNOA in Algorithm 1, the proposed algorithm mainly consists of the following parts.
(1) Initializing the population and updating the fitness values and local diversity for each individual, with a time complexity of O(NP × d).
(2) Acquiring the sub-populations, with a time complexity of O(T_max × NP).
(3) Updating the positions of the exploration and exploitation sub-populations, with a time complexity of O(T_max × NP × d).
(4) Calculating the relative changes in fitness and local diversity for the population, with a time complexity of O(T_max × NP × d).
(5) Updating the Q-table using the exploitation sub-population, with a time complexity of O(T_max × (NP − N_exploration) × d).
Therefore, the maximum computational complexity of the RLNOA is O(T_max × NP × d), which is the same as that of the NOA.

4. Experimental Results

In this section, we perform a series of experiments on publicly available benchmark problems to assess the effectiveness of the RLNOA. The results are compared and analyzed against other state-of-the-art methods that have shown promising performances in the literature.

4.1. Test Conditions

The performance of the proposed RLNOA was tested on three global optimization test suites, including CEC-2014, CEC-2017, and CEC-2020 [49]. These test suites consist of unimodal, multimodal, hybrid, and composition functions, each with only one global optimum. The proposed RLNOA was compared with the NOA [24], SO [19], RSA [22], crested porcupine optimizer (CPO) [23], GWO [10], PSO [3], RLTLBO [43], RLMPSO [44], and RLCGWO [45]. The PSO and GWO are classical algorithms, while the NOA, SO, RSA, and CPO are well-known and recently proposed algorithms. The RLTLBO, RLMPSO, and RLCGWO are RL-based and recently proposed algorithms.
The common parameters for the experimental algorithms are presented in Table 1. The maximum number of iterations is set to 1000. To assess the experimental results, several performance metrics are used, including the average (Ave) and standard deviation (Std) of fitness values from 30 independent runs, and a ranking metric to assess the order of each method according to its average fitness value. Additionally, to highlight the significant differences between the RLNOA’s results and those of competing algorithms, convergence curves and box plots are utilized. The experiments were implemented in MATLAB R2024a on a device with Intel(R) Core(TM) i7-14700KF CPU @ 3.40 GHz and 64 GB RAM.

4.2. Comparison over CEC-2014

We performed optimization experiments on the CEC-2014 test suite to verify the effectiveness of the proposed RLNOA. The CEC-2014 test suite is a diverse collection of 30 test functions, including three unimodal functions, 13 multimodal functions, six hybrid functions, and eight composite functions. Each category is designed to test different aspects of an optimization algorithm’s capability, such as its ability to handle multiple local optima, locate the global optimum, and efficiently explore and exploit the search space. These functions are characterized by various levels of difficulty, dimensionality, and complexity, making them comprehensive tools for assessing the robustness, efficiency, and accuracy of optimization algorithms.
Table 2 shows the experimental results of various algorithms applied to the CEC-2014 test suite. As shown in the table, except for F3 and F30, the proposed RLNOA ranks first among all comparative algorithms in the remaining functions. Additionally, the second-to-last row in Table 2 confirms that the RLNOA achieves the highest average ranking, with a value of 1.0345. The second highest is the NOA, with a value of 2.1724, while the SO algorithm performs the worst.
Figure 4 illustrates the convergence curves of different algorithms applied to the CEC-2014 test suite. It can be seen from the figure that the convergence speed of the RLNOA is generally not the fastest. This is primarily because RL allows the population to learn the best exploitation strategy in the current state during the early stages of the search. As a result, the convergence speed of the RLNOA in the early stages is not the best, but its overall convergence performance remains competitive compared to the other algorithms. Moreover, due to RL combined with a dynamic exploration mechanism, the RLNOA can avoid local optima. Particularly for functions F5, F6, F8, F9, F10, F11, and F16, the RLNOA achieves better convergence results. Figure 5 shows the box plots of different algorithms applied to the CEC-2014 test suite, where the RLNOA achieves superior results.

4.3. Comparison over CEC-2017

In this experiment, the ability of the RLNOA to solve the optimization problems within the CEC-2017 test suite is evaluated, with the results presented in Table 3. The CEC-2017 benchmark suite consists of 30 test functions, categorized into unimodal functions, basic multimodal functions, expanded multimodal functions, hybrid functions, and composition functions. These categories encompass a wide range of optimization challenges, assessing the ability of algorithms to locate global optima, avoid local optima, and effectively explore the search space. It is also noted that the CEC-2017-F2 function was excluded from the test suite due to its unstable behavior.
As shown in Table 3, the proposed RLNOA obtained the most optimal values, ranking first overall. For the five functions where the RLNOA did not achieve the optimal value, it ranked second. The RSA performed the worst when applied to the CEC-2017 test suite, ranking last. Figure 6 illustrates the convergence curves of the different algorithms applied to the CEC-2017 test suite. It can be seen that the RLNOA converges faster than other methods for functions such as F16, F20, and F24. In most cases, its convergence speed surpasses that of algorithms such as the SO and RSA. Figure 7 presents the box plots of different algorithms on the CEC-2017 test suite, indicating that the RLNOA consistently achieved superior results.

4.4. Comparison over CEC-2020

To verify the effectiveness of the proposed algorithm for problems with enhanced complexity and realism, we performed optimization experiments on the CEC-2020 test suite. This suite comprises 10 benchmark functions and places a greater emphasis on dynamic and noisy functions, reflecting the evolving nature of real-world problems. Such a focus allows for a comprehensive evaluation of an algorithm’s performance under more variable and unpredictable conditions.
Table 4 presents the results of different algorithms applied to the CEC-2020 test suite. It can be seen that the RLNOA ranks first in eight out of the 10 functions. For functions F4 and F9, the RLNOA ranks second, but the gap between its results and the first-ranked results is minimal. The average ranking and final ranking of each algorithm across all test functions are shown in the last two rows of Table 4. The RLNOA performs the best, with an average ranking of 1.2222 and a final ranking of 1. Figure 8 and Figure 9, respectively, show the convergence curves and box plots of different algorithms applied to the CEC-2020 test suite. These figures confirm that the RLNOA consistently demonstrates superior performance across all benchmark functions.

4.5. Analysis of the Q-Table

We take test function F1 in CEC-2014 as an example to illustrate the Q-table update process. As shown in Figure 10, the states s_1, s_2, …, s_32 represent different ranges of the relative changes in local diversity and fitness values. Actions a_1 and a_2 represent the storage strategy and recovery strategy, respectively. The Q-table is initialized as a zero matrix. After five iterations, the Q-values for different states in the Q-table change. An individual executes the action corresponding to the highest Q-value in its current state. For example, the Q-value of the storage strategy (action a_1) in state s_26 is −0.93, which is much lower than the Q-value of the recovery strategy (action a_2), which is 1.32. When an individual is in state s_26 during the next iteration, it will therefore execute the recovery strategy. In the later stages of the optimization process, when the population has converged close to the global optimum, selecting any exploitation strategy is unlikely to yield better results. As a result, almost all Q-values in the Q-table become negative.

4.6. Analysis of the RLNOA’s Parameters

The RLNOA contains several parameters that affect its performance, as do other metaheuristic algorithms. Some suggestions for selecting and tuning these parameters are given below; an illustrative configuration sketch follows the list.
(1) Population size NP
A larger population size generally allows for better exploration of the solution space, as more diverse solutions are maintained. However, increasing the population size typically leads to higher computational costs, as more solutions need to be evaluated during each iteration. This can slow down the algorithm, especially for complex or large-scale problems. Therefore, population sizes often range from a few dozen to several hundred individuals, depending on the problem’s complexity and the specific metaheuristic used. Reasonable values are within [20, 200].
(2) Maximum iterations T max
More iterations allow the algorithm to refine and improve solutions gradually. However, there may be diminishing returns after a certain number of iterations, where further improvements become minimal. Reasonable values are within [100, 2000].
(3) Learning rate α
This parameter determines how much new information overrides old information. A high learning rate means the agent learns quickly, but it may also make the learning process unstable. A low learning rate ensures stability but can slow down the learning process. Reasonable values are within [0.1, 0.9].
(4) Discount factor γ
This factor determines the importance of future rewards. A value close to 0 makes the agent prioritize immediate rewards, while a value close to 1 encourages the agent to consider long-term rewards. The discount factor helps balance immediate versus future gains, influencing the agent’s overall strategy. Reasonable values are within [0.1, 0.9].
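For reference, the settings used in the experiments of Section 4 (see Table 1), together with the ranges suggested above, can be collected in a configuration structure such as the following; the dictionary keys are our own naming and not part of the authors' code.

```python
# Illustrative RLNOA configuration; values follow Table 1, comments give the suggested ranges.
rlnoa_params = {
    "NP": 100,       # population size, suggested range [20, 200]
    "T_max": 1000,   # maximum iterations, suggested range [100, 2000]
    "alpha": 0.5,    # Q-learning learning rate, suggested range [0.1, 0.9]
    "gamma": 0.5,    # Q-learning discount factor, suggested range [0.1, 0.9]
    "P_rp": 0.2,     # global exploration threshold
    "delta": 0.05,   # control parameter of the foraging strategy
    "k": 20,         # number of near neighbors used for local diversity
    "zeta": 1,       # control parameter of the exploration sub-population size
}
```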
We also studied the sensitivity of the RLNOA to several of its parameters. This analysis helps determine the impact of small changes in these parameters on the performance of the proposed algorithm. The functions used in these experiments are F1, F4, F17, and F23 from CEC-2014, which represent unimodal, multimodal, hybrid, and composite functions, respectively. The maximum number of iterations is set to 1000. The population size is set to 100. The average (Ave) of the fitness values from 30 independent runs is used to obtain the sensitivity analysis results. The experimental results are as follows:
  • Global exploration threshold P_rp: to verify the effect of P_rp on the efficiency of the RLNOA, experiments are performed for several values of P_rp, taken as 0.2, 0.4, 0.6, and 0.8, while the other parameters are unchanged. As shown in Table 5, the RLNOA is largely insensitive to this parameter. The results for F17 indicate that the RLNOA performs best when P_rp is set to 0.2.
  • Control parameter δ : experiments are performed for several values of δ , taken as 0.05, 0.1, 0.2, and 0.5, while other parameters are unchanged. As shown in Table 6, the RLNOA is not sensitive to small changes in the parameter δ .
  • Number of near neighbors k : Table 7 shows the results of the RLNOA with different values for the parameter k . It is evident from Table 7 that the RLNOA is not sensitive to small changes in the parameter k .
  • Control parameter ζ: to explore the sensitivity of the RLNOA to the parameter ζ, experiments are carried out for different values of ζ, as shown in Table 8. It is apparent that the RLNOA is sensitive to ζ. This is primarily because ζ controls the variation trend of the exploration sub-population size during the optimization process. Table 8 also shows that the RLNOA acquires the best results when the value of ζ is set to 1.

5. Conclusions

In this paper, we propose an RL-based bi-population nutcracker optimizer algorithm. We developed a bi-population mechanism that uses fitness ranking to separate the raw population into exploration and exploitation sub-populations at the start of each iteration. The exploration sub-population, comprising individuals with lower fitness, employs a foraging strategy based on ROBL to maintain diversity. The exploitation sub-population includes two strategies, storage and recovery, with the selection of strategy controlled by Q-learning in RL. Experiments were conducted on the CEC-2014, CEC-2017, and CEC-2020 benchmark suites. The results, including optimization performance, convergence curves, and box plots, demonstrate that the proposed algorithm outperforms nine other comparative algorithms.
However, the proposed algorithm still has limitations. First, the exploration strategy needs to be further enhanced to improve the algorithm’s ability to escape local optima. Second, there is some redundancy in the boundary states of the Q-learning process. In the future, based on the existing RLNOA framework, the design of search strategies and the encoding of states will be studied further to improve optimization performance. Additionally, complex engineering applications will be used to evaluate the algorithm in future work.

Author Contributions

Conceptualization, Y.L. and Y.Z.; methodology, Y.L.; software, Y.L.; validation, Y.L. and Y.Z.; formal analysis, Y.L.; investigation, Y.L.; resources, Y.L.; data curation, Y.L.; writing—original draft preparation, Y.L.; writing—review and editing, Y.L. and Y.Z.; visualization, Y.L.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. The nomenclature of this study.
Indices
A, B, and C: integers randomly selected in the range [0, NP]
i: index of individuals in the population
j: index of dimensions
l: index of individuals in the neighborhood set
t: index of iterations
Sets
{x_i^t}_{i=1}^{NP}: the population in iteration t
{x_l^t}_{l=1}^{k}: the neighborhood set of x_i^t
Parameters
α: the learning rate
γ: the discount factor
δ and ζ: control parameters
ε, λ, and τ_3: parameters generated by the levy flight
θ: a parameter chosen in the range [0, π]
ξ and P_a1: parameters that decrease linearly from 1 to 0
τ_1: a random number selected within the range [0, 1]
τ_2: a random number generated based on a normal distribution
d: the dimension of the search space
k: the number of near neighbors
rand and r: random numbers selected within the range [0, 1]
randi(1,2): an integer chosen randomly between zero and one
L_j: the lower bound of the optimization problem in the jth dimension
U_j: the upper bound of the optimization problem in the jth dimension
NP: the population size
P_a2: a probability value
P_rp: a global exploration threshold
T_max: the maximum number of iterations
Variables
a_t and a_{t+1}: the current and next actions, respectively
fit_i^t: the relative change in fitness value
ld_i^t: the relative change in local diversity
s_t and s_{t+1}: the current and next states, respectively
x_{i,j}^{t+1,FS-new1}: the new position of the ith individual generated in the foraging phase
x_{i,j}^t: the jth dimension of the ith individual in iteration t
x_i^{t+1,FS-new2}: the new position of the ith individual generated in the storage phase
x_{m,j}^t: the mean of the jth dimension over the current population in iteration t
D_i^t: the local diversity of individual x_i^t
N_exploration: the size of the exploration sub-population
Q: the corresponding value in the Q-table
R_{t+1}: the reward acquired after performing action a_t

References

  1. Hubálovsky, S.; Hubálovská, M.; Matousová, I. A New Hybrid Particle Swarm Optimization-Teaching-Learning-Based Optimization for Solving Optimization Problems. Biomimetics 2024, 9, 8. [Google Scholar] [CrossRef] [PubMed]
  2. Wang, R.T.; Zhang, S.S.; Zou, G.Y. An Improved Multi-Strategy Crayfish Optimization Algorithm for Solving Numerical Optimization Problems. Biomimetics 2024, 9, 361. [Google Scholar] [CrossRef] [PubMed]
  3. Pardo, X.C.; González, P.; Banga, J.R.; Doallo, R. Population based metaheuristics in Spark: Towards a general framework using PSO as a case study. Swarm Evol. Comput. 2024, 85, 101483. [Google Scholar] [CrossRef]
  4. Tatsis, V.A.; Parsopoulos, K.E. Reinforcement learning for enhanced online gradient-based parameter adaptation in metaheuristics. Swarm Evol. Comput. 2023, 83, 101371. [Google Scholar] [CrossRef]
  5. Wang, Y.; Xiong, G.J.; Xu, S.P.; Suganthan, P.N. Large-scale power system multi-area economic dispatch considering valve point effects with comprehensive learning differential evolution. Swarm Evol. Comput. 2024, 89, 101620. [Google Scholar] [CrossRef]
  6. Zhang, Z.; Zhang, H.F.; Tian, Y.Z.; Li, C.W.; Yue, D. Cooperative constrained multi-objective dual-population evolutionary algorithm for optimal dispatching of wind-power integrated power system. Swarm Evol. Comput. 2024, 87, 101525. [Google Scholar] [CrossRef]
  7. Feng, X.; Pan, A.Q.; Ren, Z.Y.; Hong, J.C.; Fan, Z.P.; Tong, Y.H. An adaptive dual-population based evolutionary algorithm for industrial cut tobacco drying system. Appl. Soft Comput. 2023, 144, 110446. [Google Scholar] [CrossRef]
  8. Luo, T.; Xie, J.P.; Zhang, B.T.; Zhang, Y.; Li, C.Q.; Zhou, J. An improved levy chaotic particle swarm optimization algorithm for energy-efficient cluster routing scheme in industrial wireless sensor networks. Expert Syst. Appl. 2024, 241, 122780. [Google Scholar] [CrossRef]
  9. Qu, C.Z.; Gai, W.D.; Zhang, J.; Zhong, M.Y. A novel hybrid grey wolf optimizer algorithm for unmanned aerial vehicle (UAV) path planning. Knowl.-Based Syst. 2020, 194, 105530. [Google Scholar] [CrossRef]
  10. Qu, C.Z.; Gai, W.D.; Zhong, M.Y.; Zhang, J. A novel reinforcement learning based grey wolf optimizer algorithm for unmanned aerial vehicles (UAVs) path planning. Appl. Soft Comput. 2020, 89, 106099. [Google Scholar] [CrossRef]
  11. Qu, C.; Zhang, Y.; Ma, F.; Huang, K. Parameter optimization for point clouds denoising based on no-reference quality assessment. Measurement 2023, 211, 112592. [Google Scholar] [CrossRef]
  12. Chauhan, D.; Yadav, A. An archive-based self-adaptive artificial electric field algorithm with orthogonal initialization for real-parameter optimization problems. Appl. Soft Comput. 2024, 150, 111109. [Google Scholar] [CrossRef]
  13. Li, G.Q.; Zhang, W.W.; Yue, C.T.; Wang, Y.R. Balancing exploration and exploitation in dynamic constrained multimodal multi-objective co-evolutionary algorithm. Swarm Evol. Comput. 2024, 89, 101652. [Google Scholar] [CrossRef]
  14. Ahadzadeh, B.; Abdar, M.; Safara, F.; Khosravi, A.; Menhaj, M.B.; Suganthan, P.N. SFE: A Simple, Fast, and Efficient Feature Selection Algorithm for High-Dimensional Data. IEEE Trans. Evol. Comput. 2023, 27, 1896–1911. [Google Scholar] [CrossRef]
  15. Fu, S.; Huang, H.; Ma, C.; Wei, J.; Li, Y.; Fu, Y. Improved dwarf mongoose optimization algorithm using novel nonlinear control and exploration strategies. Expert Syst. Appl. 2023, 233, 120904. [Google Scholar] [CrossRef]
  16. Li, J.; Li, G.; Wang, Z.; Cui, L. Differential evolution with an adaptive penalty coefficient mechanism and a search history exploitation mechanism. Expert Syst. Appl. 2023, 230, 120530. [Google Scholar] [CrossRef]
  17. Hu, C.; Zeng, S.; Li, C. A framework of global exploration and local exploitation using surrogates for expensive optimization. Knowl.-Based Syst. 2023, 280, 111018. [Google Scholar] [CrossRef]
  18. Chang, D.; Rao, C.; Xiao, X.; Hu, F.; Goh, M. Multiple strategies based Grey Wolf Optimizer for feature selection in performance evaluation of open-ended funds. Swarm Evol. Comput. 2024, 86, 101518. [Google Scholar] [CrossRef]
  19. Hashim, F.A.; Hussien, A.G. Snake Optimizer: A novel meta-heuristic optimization algorithm. Knowl.-Based Syst. 2022, 242, 108320. [Google Scholar] [CrossRef]
  20. Braik, M.; Hammouri, A.; Atwan, J.; Al-Betar, M.A.A.; Awadallah, M.A. White Shark Optimizer: A novel bio-inspired meta-heuristic algorithm for global optimization problems. Knowl.-Based Syst. 2022, 243, 108457. [Google Scholar] [CrossRef]
  21. Kumar, S.; Sharma, N.K.; Kumar, N. WSOmark: An adaptive dual-purpose color image watermarking using white shark optimizer and Levenberg-Marquardt BPNN. Expert Syst. Appl. 2023, 226, 120137. [Google Scholar] [CrossRef]
  22. Abualigah, L.; Abd Elaziz, M.; Sumari, P.; Geem, Z.W.; Gandomi, A.H. Reptile Search Algorithm (RSA): A nature-inspired meta-heuristic optimizer. Expert Syst. Appl. 2022, 191, 116158. [Google Scholar] [CrossRef]
  23. Abdel-Basset, M.; Mohamed, R.; Abouhawwash, M. Crested Porcupine Optimizer: A new nature-inspired metaheuristic. Knowl.-Based Syst. 2024, 284, 111257. [Google Scholar] [CrossRef]
  24. Abdel-Basset, M.; Mohamed, R.; Jameel, M.; Abouhawwash, M. Nutcracker optimizer: A novel nature-inspired metaheuristic algorithm for global optimization and engineering design problems. Knowl.-Based Syst. 2023, 262, 110248. [Google Scholar] [CrossRef]
  25. Qaraad, M.; Amjad, S.; Hussein, N.K.; Farag, M.A.; Mirjalili, S.; Elhosseini, M.A. Quadratic interpolation and a new local search approach to improve particle swarm optimization: Solar photovoltaic parameter estimation. Expert Syst. Appl. 2024, 236, 121417. [Google Scholar] [CrossRef]
  26. Ahmed, R.; Rangaiah, G.P.; Mahadzir, S.; Mirjalili, S.; Hassan, M.H.; Kamel, S. Memory, evolutionary operator, and local search based improved Grey Wolf Optimizer with linear population size reduction technique. Knowl.-Based Syst. 2023, 264, 110297. [Google Scholar] [CrossRef]
  27. Khosravi, H.; Amiri, B.; Yazdanjue, N.; Babaiyan, V. An improved group teaching optimization algorithm based on local search and chaotic map for feature selection in high-dimensional data. Expert Syst. Appl. 2022, 204, 117493. [Google Scholar] [CrossRef]
  28. Ekinci, S.; Izci, D.; Abualigah, L.; Abu Zitar, R. A Modified Oppositional Chaotic Local Search Strategy Based Aquila Optimizer to Design an Effective Controller for Vehicle Cruise Control System. J. Bionic Eng. 2023, 20, 1828–1851. [Google Scholar] [CrossRef]
  29. Xiao, J.; Wang, Y.J.; Xu, X.K. Fuzzy Community Detection Based on Elite Symbiotic Organisms Search and Node Neighborhood Information. IEEE Trans. Fuzzy Syst. 2022, 30, 2500–2514. [Google Scholar] [CrossRef]
  30. Zhu, Q.L.; Lin, Q.Z.; Li, J.Q.; Coello, C.A.C.; Ming, Z.; Chen, J.Y.; Zhang, J. An Elite Gene Guided Reproduction Operator for Many-Objective Optimization. IEEE Trans. Cybern. 2021, 51, 765–778. [Google Scholar] [CrossRef]
  31. Zhang, Y.Y. Elite archives-driven particle swarm optimization for large scale numerical optimization and its engineering applications. Swarm Evol. Comput. 2023, 76, 101212. [Google Scholar] [CrossRef]
  32. Zhong, X.X.; Cheng, P. An elite-guided hierarchical differential evolution algorithm. Appl. Intell. 2021, 51, 4962–4983. [Google Scholar] [CrossRef]
  33. Zhou, L.; Feng, L.; Gupta, A.; Ong, Y.S. Learnable Evolutionary Search Across Heterogeneous Problems via Kernelized Autoencoding. IEEE Trans. Evol. Comput. 2021, 25, 567–581. [Google Scholar] [CrossRef]
  34. Feng, L.; Zhou, W.; Liu, W.C.; Ong, Y.S.; Tan, K.C. Solving Dynamic Multiobjective Problem via Autoencoding Evolutionary Search. IEEE Trans. Cybern. 2022, 52, 2649–2662. [Google Scholar] [CrossRef]
  35. Zhan, Z.H.; Wang, Z.J.; Jin, H.; Zhang, J. Adaptive Distributed Differential Evolution. IEEE Trans. Cybern. 2020, 50, 4633–4647. [Google Scholar] [CrossRef]
  36. Zhan, Z.H.; Li, J.Y.; Kwong, S.; Zhang, J. Learning-Aided Evolution for Optimization. IEEE Trans. Evol. Comput. 2023, 27, 1794–1808. [Google Scholar] [CrossRef]
  37. Zabihi, Z.; Moghadam, A.M.E.; Rezvani, M.H. Reinforcement Learning Methods for Computation Offloading: A Systematic Review. Acm Comput. Surv. 2024, 56, 1–41. [Google Scholar] [CrossRef]
  38. Wang, D.; Gao, N.; Liu, D.; Li, J.; Lewis, F.L. Recent Progress in Reinforcement Learning and Adaptive Dynamic Programming for Advanced Control Applications. IEEE-CAA J. Autom. Sin. 2024, 11, 18–36. [Google Scholar] [CrossRef]
  39. Zhao, F.; Wang, Q.; Wang, L. An inverse reinforcement learning framework with the Q-learning mechanism for the metaheuristic algorithm. Knowl.-Based Syst. 2023, 265, 110368. [Google Scholar] [CrossRef]
  40. Ghetas, M.; Issa, M. A novel reinforcement learning-based reptile search algorithm for solving optimization problems. Neural Comput. Appl. 2023, 36, 533–568. [Google Scholar] [CrossRef]
  41. Li, Z.; Shi, L.; Yue, C.; Shang, Z.; Qu, B. Differential evolution based on reinforcement learning with fitness ranking for solving multimodal multiobjective problems. Swarm Evol. Comput. 2019, 49, 234–244. [Google Scholar] [CrossRef]
  42. Tan, Z.; Li, K. Differential evolution with mixed mutation strategy based on deep reinforcement learning. Appl. Soft Comput. 2021, 111, 107678. [Google Scholar] [CrossRef]
  43. Wu, D.; Wang, S.; Liu, Q.; Abualigah, L.; Jia, H. An Improved Teaching-Learning-Based Optimization Algorithm with Reinforcement Learning Strategy for Solving Optimization Problems. Comput. Intell. Neurosci. 2022, 2022, 1535957. [Google Scholar] [CrossRef] [PubMed]
  44. Samma, H.; Lim, C.P.; Saleh, J.M. A new Reinforcement Learning-Based Memetic Particle Swarm Optimizer. Appl. Soft Comput. 2016, 43, 276–297. [Google Scholar] [CrossRef]
  45. Hu, Z.P.; Yu, X.B. Reinforcement learning-based comprehensive learning grey wolf optimizer for feature selection. Appl. Soft Comput. 2023, 149, 110959. [Google Scholar] [CrossRef]
  46. Li, J.; Dong, H.; Wang, P.; Shen, J.; Qin, D. Multi-objective constrained black-box optimization algorithm based on feasible region localization and performance-improvement exploration. Appl. Soft Comput. 2023, 148, 110874. [Google Scholar] [CrossRef]
  47. Wang, Z.; Yao, S.; Li, G.; Zhang, Q. Multiobjective Combinatorial Optimization Using a Single Deep Reinforcement Learning Model. IEEE Trans. Cybern. 2024, 54, 1984–1996. [Google Scholar] [CrossRef]
  48. Huang, L.; Dong, B.; Xie, W.; Zhang, W. Offline Reinforcement Learning with Behavior Value Regularization. IEEE Trans. Cybern. 2024, 54, 3692–3704. [Google Scholar] [CrossRef]
  49. Abdel-Basset, M.; El-Shahat, D.; Jameel, M.; Abouhawwash, M. Exponential distribution optimizer (EDO): A novel math-inspired algorithm for global optimization and engineering problems. Artif. Intell. Rev. 2023, 56, 9329–9400. [Google Scholar] [CrossRef]
Figure 1. Illustration of the RLNOA.
Figure 2. The variation of N_exploration when T_max = 10, NP = 20, and ζ = 8.
Figure 3. The illustration of states in the exploitation population.
Figure 4. The convergence curves of different algorithms applied to the CEC-2014 test suite.
Figure 5. The box plots of different algorithms applied to the CEC-2014 test suite.
Figure 6. The convergence curves of different algorithms applied to the CEC-2017 test suite.
Figure 7. The box plots of different algorithms applied to the CEC-2017 test suite.
Figure 8. The convergence curves of different algorithms applied to the CEC-2020 test suite.
Figure 9. The box plots of different algorithms applied to the CEC-2020 test suite.
Figure 10. Q-table update process for the F1 function.
Table 1. The common parameters of the experimental algorithms.
Algorithm | Specifications | Population Size NP
RLNOA | Learning rate α = 0.5, discount factor γ = 0.5, P_rp = 0.2, δ = 0.05, k = 20, ζ = 1 | 100
NOA | P_a1 decreases linearly from 2 to 0, P_a2 = 0.2, P_rp = 0.2, δ = 0.05 | 100
SO | c_1 = 0.5, c_2 = 0.05, c_3 = 2 | 100
RSA | α = 0.1, β = 0.005 | 100
CPO | Number of cycles T = 2, convergence rate α = 0.2, T_f = 0.8, N_min = 20 | 100
GWO | Convergence constant a decreases linearly from 2 to 0 | 100
PSO | ω = 1, c_1 = 1.5, c_2 = 2 | 100
RLTLBO | Learning rate α = 0.5, discount factor γ = 0.5 | 33 (this algorithm has three main stages)
RLMPSO | Learning rate α = 0.5, discount factor γ = 0.5; ω = 0.9 for exploration and ω = 0.4 for convergence; c_1 = 2.5, c_2 = 0.5 for exploration and c_1 = 0.5, c_2 = 2.5 for convergence; [V_min, V_max] set to 0.2 of the search range | 100
RLCGWO | Learning rate α = 0.5, discount factor γ = 0.5, scaling factor e = 0.5, minimum learning probability a = 0.1, maximum learning probability b = 0.5 | 100
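To make the reinforcement-learning settings in Table 1 concrete, the sketch below applies the standard tabular Q-learning update with the learning rate α = 0.5 and discount factor γ = 0.5 shared by RLNOA, RLTLBO, RLMPSO, and RLCGWO. Only the update rule and these two hyperparameter values come from the table; the state/action dimensions, the example transition, and the reward are placeholder assumptions, not the RLNOA's actual state-action design.

```python
import numpy as np

# Standard tabular Q-learning update using alpha and gamma from Table 1.
ALPHA, GAMMA = 0.5, 0.5
N_STATES, N_ACTIONS = 4, 3          # assumed sizes, for illustration only

q_table = np.zeros((N_STATES, N_ACTIONS))

def q_update(state, action, reward, next_state):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + GAMMA * q_table[next_state].max()
    q_table[state, action] += ALPHA * (td_target - q_table[state, action])

# Example: the exploitation strategy chosen in state 0 improved the best
# fitness, so it receives a positive reward (the reward scheme is assumed).
q_update(state=0, action=1, reward=1.0, next_state=2)
print(q_table)
```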
Table 2. Results of the CEC-2014 test suite.
Fun | Metrics | RLNOA | NOA | SO | RSA | CPO | GWO | PSO | RLTLBO | RLMPSO | RLCGWO
F1Ave1.00 × 1021.04 × 1023.86 × 1071.02 × 1081.64 × 1024.57 × 1061.24 × 1042.39 × 1043.28 × 1051.86 × 107
Std4.65 × 10−32.55 × 1002.05 × 1074.09 × 1074.73 × 1013.34 × 1061.33 × 1043.63 × 1042.44 × 1051.79 × 107
Rank12910374568
F2Ave2.00 × 1022.00 × 1022.16 × 1095.89 × 1092.00 × 1026.15 × 1031.10 × 1035.83 × 1022.04 × 1032.60 × 109
Std2.61 × 10−61.15 × 10−39.40 × 1081.50 × 1091.03 × 10−54.49 × 1031.17 × 1036.06 × 1022.90 × 1031.26 × 109
Rank13810275469
F3Ave3.00 × 1023.00 × 1021.54 × 1048.05 × 1033.00 × 1024.01 × 1033.39 × 1023.89 × 1024.27 × 1034.97 × 104
Std1.52 × 10−81.60 × 10−64.95 × 1033.32 × 1031.12 × 10−83.55 × 1036.59 × 1011.18 × 1021.70 × 1032.18 × 104
Rank23981645710
F4Ave4.00 × 1024.05 × 1028.49 × 1021.37 × 1034.09 × 1024.29 × 1024.28 × 1024.16 × 1024.31 × 1025.15 × 102
Std3.68 × 10−31.14 × 1012.81 × 1026.97 × 1021.51 × 1011.17 × 1011.34 × 1011.63 × 1018.74 × 1005.83 × 101
Rank12910365478
F5Ave5.13 × 1025.18 × 1025.21 × 1025.20 × 1025.17 × 1025.20 × 1025.20 × 1025.20 × 1025.19 × 1025.20 × 102
Std8.38 × 1006.18 × 1001.16 × 10−18.23 × 10−27.38 × 1005.93 × 10−21.06 × 10−36.58 × 10−24.55 × 1001.13 × 10−1
Rank13109275846
F6Ave6.00 × 1026.00 × 1026.10 × 1026.09 × 1026.00 × 1026.01 × 1026.02 × 1026.02 × 1026.03 × 1026.09 × 102
Std6.70 × 10−61.89 × 10−26.27 × 10−11.09 × 1004.34 × 10−26.05 × 10−11.47 × 1009.28 × 10−11.34 × 1001.19 × 100
Rank13109245678
F7Ave7.00 × 1027.00 × 1027.48 × 1027.79 × 1027.00 × 1027.01 × 1027.00 × 1027.00 × 1027.00 × 1027.23 × 102
Std8.66 × 10−32.32 × 10−21.69 × 1012.48 × 1013.57 × 10−21.03 × 1007.46 × 10−28.26 × 10−28.76 × 10−27.51 × 100
Rank12910374568
F8Ave8.00 × 1028.00 × 1028.58 × 1028.69 × 1028.00 × 1028.07 × 1028.15 × 1028.11 × 1028.16 × 1028.60 × 102
Std0.00 × 1003.58 × 10−131.10 × 1019.38 × 1002.83 × 10−103.19 × 1007.35 × 1006.43 × 1006.83 × 1009.87 × 100
Rank12810346579
F9Ave9.02 × 1029.04 × 1029.61 × 1029.58 × 1029.12 × 1029.11 × 1029.15 × 1029.22 × 1029.21 × 1029.59 × 102
Std6.20 × 10−11.14 × 1001.27 × 1015.78 × 1002.46 × 1004.78 × 1007.19 × 1001.26 × 1017.73 × 1008.83 × 100
Rank12108435769
F10Ave1.00 × 1031.00 × 1032.51 × 1031.97 × 1031.01 × 1031.30 × 1031.35 × 1031.52 × 1031.29 × 1032.01 × 103
Std5.16 × 10−29.51 × 10−12.37 × 1022.07 × 1023.48 × 1002.15 × 1021.25 × 1021.93 × 1021.88 × 1022.48 × 102
Rank12108356749
F11Ave1.22 × 1031.56 × 1033.18 × 1032.43 × 1032.16 × 1031.72 × 1031.80 × 1031.73 × 1032.70 × 1034.04 × 103
Std4.64 × 1011.65 × 1022.95 × 1021.80 × 1021.26 × 1023.94 × 1022.84 × 1022.64 × 1021.41 × 1024.54 × 102
Rank12976354810
F12Ave1.20 × 1031.20 × 1031.20 × 1031.20 × 1031.20 × 1031.20 × 1031.20 × 1031.20 × 1031.20 × 1031.20 × 103
Std1.74 × 10−25.36 × 10−24.70 × 10−12.00 × 10−16.69 × 10−24.50 × 10−11.23 × 10−13.44 × 10−11.55 × 10−13.60 × 10−1
Rank13109542786
F13Ave1.30 × 1031.30 × 1031.30 × 1031.30 × 1031.30 × 1031.30 × 1031.30 × 1031.30 × 1031.30 × 1031.30 × 103
Std8.47 × 10−31.91 × 10−26.58 × 10−18.19 × 10−12.70 × 10−26.50 × 10−27.76 × 10−28.02 × 10−25.51 × 10−22.29 × 10−1
Rank12910543678
F14Ave1.40 × 1031.40 × 1031.41 × 1031.41 × 1031.40 × 1031.40 × 1031.40 × 1031.40 × 1031.40 × 1031.40 × 103
Std1.29 × 10−23.13 × 10−25.57 × 1003.50 × 1003.96 × 10−21.66 × 10−15.65 × 10−21.02 × 10−14.79 × 10−21.93 × 100
Rank12109453768
F15Ave1.50 × 1031.50 × 1032.77 × 1032.68 × 1031.50 × 1031.50 × 1031.50 × 1031.50 × 1031.50 × 1032.14 × 103
Std8.24 × 10−21.35 × 10−11.83 × 1031.25 × 1032.30 × 10−16.79 × 10−13.66 × 10−14.93 × 10−19.82 × 10−19.17 × 102
Rank12109543678
F16Ave1.60 × 1031.60 × 1031.60 × 1031.60 × 1031.60 × 1031.60 × 1031.60 × 1031.60 × 1031.60 × 1031.60 × 103
Std2.10 × 10−12.95 × 10−11.86 × 10−11.87 × 10−12.46 × 10−14.42 × 10−15.19 × 10−13.45 × 10−14.04 × 10−12.30 × 10−1
Rank12109436578
F17Ave1.71 × 1031.73 × 1032.74 × 1054.34 × 1051.79 × 1034.06 × 1044.15 × 1032.57 × 1037.96 × 1034.44 × 105
Std2.72 × 1001.17 × 1012.38 × 1051.39 × 1052.59 × 1011.04 × 1052.11 × 1036.71 × 1023.88 × 1038.66 × 105
Rank12893754610
F18Ave1.80 × 1031.80 × 1031.95 × 1058.88 × 1041.80 × 1039.13 × 1031.25 × 1044.15 × 1039.55 × 1033.19 × 104
Std1.66 × 10−15.94 × 10−13.26 × 1051.83 × 1057.55 × 10−16.34 × 1037.24 × 1031.78 × 1036.03 × 1033.89 × 104
Rank12109357468
F19Ave1.90 × 1031.90 × 1031.91 × 1031.91 × 1031.90 × 1031.90 × 1031.90 × 1031.90 × 1031.90 × 1031.91 × 103
Std5.49 × 10−21.78 × 10−14.63 × 1002.64 × 1001.98 × 10−17.61 × 10−11.02 × 1005.08 × 10−11.15 × 1007.90 × 10−1
Rank12910356478
F20Ave2.00 × 1032.00 × 1031.22 × 1041.05 × 1042.00 × 1034.80 × 1032.67 × 1032.11 × 1032.34 × 1031.62 × 104
Std4.59 × 10−21.65 × 10−17.65 × 1033.22 × 1033.20 × 10−13.80 × 1038.27 × 1023.41 × 1015.31 × 1021.26 × 104
Rank12983764510
F21Ave2.10 × 1032.10 × 1031.02 × 1052.04 × 1052.11 × 1038.61 × 1032.27 × 1032.26 × 1036.38 × 1031.21 × 104
Std1.49 × 10−15.73 × 10−11.72 × 1051.76 × 1051.96 × 1004.53 × 1031.29 × 1029.87 × 1014.71 × 1031.13 × 104
Rank12910375468
F22Ave2.20 × 1032.20 × 1032.39 × 1032.35 × 1032.21 × 1032.28 × 1032.29 × 1032.23 × 1032.28 × 1032.29 × 103
Std6.58 × 10−26.00 × 10−18.30 × 1014.06 × 1011.22 × 1005.28 × 1016.40 × 1012.86 × 1015.90 × 1013.40 × 101
Rank12109367458
F23Ave2.50 × 1032.50 × 1032.67 × 1032.50 × 1032.50 × 1032.63 × 1032.63 × 1032.50 × 1032.63 × 1032.69 × 103
Std0.00 × 1000.00 × 1002.48 × 1010.00 × 1000.00 × 1003.56 × 1002.95 × 10−130.00 × 1002.77 × 10−71.17 × 101
Rank12934865710
F24Ave2.50 × 1032.50 × 1032.59 × 1032.60 × 1032.52 × 1032.52 × 1032.52 × 1032.59 × 1032.53 × 1032.58 × 103
Std2.76 × 1003.85 × 1001.62 × 1018.64 × 1002.98 × 1005.84 × 1008.79 × 1002.88 × 1019.68 × 1001.37 × 101
Rank12810435967
F25Ave2.61 × 1032.62 × 1032.70 × 1032.70 × 1032.64 × 1032.70 × 1032.68 × 1032.70 × 1032.64 × 1032.70 × 103
Std3.29 × 1003.40 × 1007.49 × 1001.10 × 10−22.25 × 1011.46 × 1012.73 × 1010.00 × 1001.74 × 1017.56 × 100
Rank12683759410
F26Ave2.70 × 1032.70 × 1032.70 × 1032.70 × 1032.70 × 1032.70 × 1032.70 × 1032.70 × 1032.70 × 1032.70 × 103
Std9.37 × 10−31.53 × 10−25.68 × 10−15.20 × 10−12.72 × 10−23.39 × 10−26.53 × 10−27.06 × 10−24.19 × 10−21.82 × 10−1
Rank12910634578
F27Ave2.70 × 1032.70 × 1033.04 × 1032.96 × 1032.72 × 1033.05 × 1032.99 × 1032.81 × 1032.97 × 1033.14 × 103
Std2.19 × 10−13.35 × 10−11.44 × 1021.33 × 1028.89 × 1013.79 × 1011.27 × 1021.01 × 1021.73 × 1021.29 × 102
Rank12853974610
F28Ave3.00 × 1033.00 × 1033.19 × 1033.27 × 1033.04 × 1033.26 × 1033.31 × 1033.00 × 1033.19 × 1033.26 × 103
Std0.00 × 1000.00 × 1002.19 × 1021.40 × 1028.85 × 1018.53 × 1017.12 × 1010.00 × 1001.08 × 1025.94 × 101
Rank12594710368
F29Ave3.05 × 1033.10 × 1031.14 × 1054.50 × 1043.12 × 1033.68 × 1033.24 × 1033.15 × 1031.98 × 1054.18 × 105
Std6.18 × 1001.85 × 1011.75 × 1051.10 × 1051.22 × 1015.20 × 1029.86 × 1011.56 × 1025.98 × 1058.48 × 105
Rank12873654910
F30Ave3.45 × 1033.49 × 1038.35 × 1037.21 × 1033.64 × 1034.10 × 1034.74 × 1033.37 × 1034.34 × 1033.92 × 103
Std4.79 × 1011.63 × 1013.30 × 1032.05 × 1033.27 × 1017.90 × 1024.97 × 1022.72 × 1027.15 × 1022.90 × 102
Rank23109468175
Mean Ranking | 1.0345 | 2.1724 | 8.8966 | 8.6897 | 3.4483 | 5.4828 | 5.1379 | 5.3103 | 6.3103 | 8.5172
Final Rank | 1 | 2 | 10 | 9 | 3 | 6 | 4 | 5 | 7 | 8
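The Mean Ranking and Final Rank rows appear to be obtained in the usual way: each algorithm's per-function ranks are averaged, and the algorithms are then re-ranked by those averages (a smaller mean rank is better). The sketch below reproduces that two-step summary; the small rank matrix in the example is made up purely for illustration and is not data from Table 2.

```python
import numpy as np

# Generic mean-ranking summary: average each algorithm's per-function rank,
# then rank the averages. The matrix below is illustrative dummy data.
ranks = np.array([        # rows = benchmark functions, columns = algorithms
    [1, 3, 2],
    [1, 2, 3],
    [2, 1, 3],
])

mean_ranking = ranks.mean(axis=0)                  # e.g. [1.3333, 2.0, 2.6667]
final_rank = mean_ranking.argsort().argsort() + 1  # 1 = best (ties broken arbitrarily)

print("Mean ranking:", np.round(mean_ranking, 4))
print("Final rank:  ", final_rank)
```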
Table 3. Results of the CEC-2017 test suite.
Fun | Metrics | RLNOA | NOA | SO | RSA | CPO | GWO | PSO | RLTLBO | RLMPSO | RLCGWO
F1Ave1.00 × 1021.00 × 1023.65 × 1099.70 × 1091.00 × 1021.30 × 1061.25 × 1032.12 × 1033.10 × 1031.95 × 109
Std8.32 × 10−51.59 × 10−21.44 × 1092.68 × 1093.70 × 10−44.96 × 1062.07 × 1032.45 × 1033.34 × 1036.61 × 108
Rank13910274568
F3Ave3.00 × 1023.00 × 1021.14 × 1047.33 × 1033.00 × 1027.61 × 1023.00 × 1023.00 × 1027.22 × 1021.42 × 104
Std1.32 × 10−115.52 × 10−83.10 × 1032.01 × 1039.79 × 10−78.18 × 1024.12 × 10−143.04 × 10−91.65 × 1026.44 × 103
Rank24985713610
F4Ave4.00 × 1024.00 × 1026.44 × 1029.52 × 1024.01 × 1024.08 × 1024.01 × 1024.07 × 1024.08 × 1025.30 × 102
Std8.57 × 10−67.44 × 10−31.22 × 1024.15 × 1024.06 × 10−12.34 × 1007.61 × 10−11.46 × 1011.36 × 1018.14 × 101
Rank12910364578
F5Ave5.03 × 1025.04 × 1025.70 × 1025.78 × 1025.12 × 1025.14 × 1025.17 × 1025.16 × 1025.22 × 1025.61 × 102
Std7.16 × 10−11.42 × 1001.13 × 1011.38 × 1011.70 × 1008.36 × 1006.65 × 1007.73 × 1008.66 × 1001.07 × 101
Rank12910346578
F6Ave6.00 × 1026.00 × 1026.36 × 1026.45 × 1026.00 × 1026.00 × 1026.01 × 1026.00 × 1026.05 × 1026.30 × 102
Std4.16 × 10−124.59 × 10−89.52 × 1007.07 × 1001.04 × 10−65.11 × 10−17.29 × 10−17.53 × 10−13.59 × 1007.44 × 100
Rank12910356478
F7Ave7.14 × 1027.15 × 1028.16 × 1028.01 × 1027.23 × 1027.26 × 1027.21 × 1027.31 × 1027.44 × 1028.58 × 102
Std6.13 × 10−11.58 × 1001.82 × 1011.26 × 1012.83 × 1009.37 × 1006.15 × 1009.14 × 1001.08 × 1014.68 × 101
Rank12984536710
F8Ave8.02 × 1028.04 × 1028.58 × 1028.51 × 1028.11 × 1028.10 × 1028.13 × 1028.16 × 1028.23 × 1028.78 × 102
Std7.86 × 10−19.54 × 10−19.68 × 1007.96 × 1002.33 × 1005.53 × 1006.70 × 1005.15 × 1009.85 × 1001.21 × 101
Rank12984356710
F9Ave9.00 × 1029.00 × 1021.46 × 1031.46 × 1039.00 × 1029.06 × 1029.00 × 1029.02 × 1029.08 × 1022.40 × 103
Std0.00 × 1002.61 × 10−142.65 × 1022.16 × 1020.00 × 1001.43 × 1014.52 × 10−141.48 × 1007.89 × 1005.80 × 102
Rank12893645710
F10Ave1.04 × 1031.19 × 1032.89 × 1032.50 × 1031.55 × 1031.58 × 1031.64 × 1031.47 × 1031.80 × 1032.11 × 103
Std1.95 × 1018.33 × 1012.00 × 1021.85 × 1021.44 × 1023.29 × 1022.47 × 1023.29 × 1022.30 × 1023.47 × 102
Rank12109456378
F11Ave1.10 × 1031.10 × 1036.78 × 1034.83 × 1031.10 × 1031.12 × 1031.11 × 1031.11 × 1031.72 × 1039.36 × 104
Std2.58 × 10−11.11 × 1001.02 × 1042.47 × 1034.60 × 10−11.74 × 1016.89 × 1009.19 × 1004.22 × 1021.38 × 105
Rank12983645710
F12Ave1.24 × 1031.36 × 1037.05 × 1073.70 × 1081.55 × 1036.79 × 1051.19 × 1041.16 × 1042.81 × 1054.66 × 107
Std1.38 × 1014.92 × 1016.17 × 1074.42 × 1088.08 × 1019.50 × 1059.58 × 1037.96 × 1031.10 × 1064.11 × 107
Rank12910375468
F13Ave1.30 × 1031.31 × 1032.15 × 1052.03 × 1071.31 × 1039.55 × 1037.69 × 1033.28 × 1031.25 × 1043.91 × 104
Std7.95 × 10−11.35 × 1002.41 × 1052.30 × 1072.99 × 1004.31 × 1035.51 × 1031.77 × 1038.14 × 1032.96 × 104
Rank12910365478
F14Ave1.40 × 1031.40 × 1038.12 × 1034.26 × 1031.41 × 1032.63 × 1031.46 × 1031.43 × 1031.53 × 1031.98 × 103
Std1.03 × 10−17.23 × 10−11.11 × 1042.20 × 1031.82 × 1001.65 × 1033.58 × 1011.10 × 1012.99 × 1016.48 × 102
Rank12109385467
F15Ave1.50 × 1031.50 × 1031.03 × 1048.90 × 1031.50 × 1033.18 × 1031.57 × 1031.55 × 1032.04 × 1035.38 × 103
Std4.70 × 10−22.08 × 10−14.77 × 1035.49 × 1032.71 × 10−11.66 × 1034.52 × 1013.08 × 1013.62 × 1024.70 × 103
Rank12109375468
F16Ave1.60 × 1031.60 × 1032.04 × 1032.09 × 1031.60 × 1031.72 × 1031.83 × 1031.64 × 1031.73 × 1031.75 × 103
Std1.11 × 10−13.35 × 10−11.11 × 1021.29 × 1023.50 × 10−11.19 × 1021.20 × 1026.58 × 1011.24 × 1026.74 × 101
Rank12910358467
F17Ave1.70 × 1031.70 × 1031.86 × 1031.82 × 1031.71 × 1031.74 × 1031.75 × 1031.74 × 1031.76 × 1031.86 × 103
Std3.84 × 10−11.65 × 1004.44 × 1012.78 × 1013.25 × 1001.90 × 1012.26 × 1011.22 × 1013.39 × 1017.27 × 101
Rank12108356479
F18Ave1.80 × 1031.80 × 1034.46 × 1061.53 × 1071.80 × 1032.64 × 1044.97 × 1034.05 × 1032.02 × 1045.60 × 104
Std1.09 × 10−15.46 × 10−16.83 × 1063.52 × 1071.12 × 1001.63 × 1044.20 × 1031.92 × 1031.66 × 1041.81 × 104
Rank12910375468
F19Ave1.90 × 1031.90 × 1032.85 × 1044.99 × 1051.90 × 1036.71 × 1032.19 × 1031.94 × 1032.53 × 1032.61 × 104
Std3.39 × 10−29.22 × 10−23.37 × 1045.34 × 1052.28 × 10−15.57 × 1036.10 × 1022.65 × 1011.16 × 1031.40 × 104
Rank12910375468
F20Ave2.00 × 1032.00 × 1032.18 × 1032.22 × 1032.00 × 1032.05 × 1032.07 × 1032.03 × 1032.10 × 1032.19 × 103
Std2.20 × 10−122.30 × 10−16.39 × 1013.92 × 1011.30 × 10−13.26 × 1016.49 × 1011.34 × 1016.33 × 1016.47 × 101
Rank12810356479
F21Ave2.20 × 1032.20 × 1032.35 × 1032.29 × 1032.21 × 1032.30 × 1032.29 × 1032.25 × 1032.26 × 1032.34 × 103
Std5.94 × 10−91.38 × 10−23.05 × 1015.86 × 1012.49 × 1014.07 × 1015.33 × 1015.36 × 1016.20 × 1014.49 × 101
Rank12107386459
F22Ave2.25 × 1032.21 × 1032.61 × 1032.87 × 1032.30 × 1032.35 × 1032.30 × 1032.30 × 1032.30 × 1032.50 × 103
Std5.13 × 1012.39 × 1011.68 × 1022.02 × 1025.58 × 10−11.81 × 1022.02 × 1011.12 × 1001.83 × 1019.39 × 101
Rank21910473658
F23Ave2.60 × 1032.61 × 1032.69 × 1032.69 × 1032.61 × 1032.62 × 1032.62 × 1032.62 × 1032.62 × 1032.64 × 103
Std1.08 × 1001.13 × 1002.25 × 1019.98 × 1003.18 × 1008.09 × 1001.31 × 1016.32 × 1006.65 × 1006.20 × 100
Rank12910346578
F24Ave2.50 × 1032.50 × 1032.82 × 1032.85 × 1032.62 × 1032.74 × 1032.72 × 1032.73 × 1032.72 × 1032.78 × 103
Std9.21 × 10−133.92 × 10−83.07 × 1015.95 × 1011.22 × 1021.11 × 1017.72 × 1015.34 × 1019.35 × 1016.29 × 100
Rank12910375648
F25Ave2.90 × 1032.88 × 1033.11 × 1033.30 × 1032.90 × 1032.93 × 1032.92 × 1032.92 × 1032.92 × 1033.04 × 103
Std9.33 × 10−136.65 × 1011.04 × 1021.13 × 1021.66 × 1011.60 × 1013.27 × 1012.39 × 1012.47 × 1013.27 × 101
Rank21910374658
F26Ave2.81 × 1032.83 × 1033.66 × 1034.01 × 1032.90 × 1033.02 × 1032.88 × 1032.99 × 1032.96 × 1033.29 × 103
Std1.30 × 1021.08 × 1022.85 × 1022.96 × 1022.24 × 1013.03 × 1027.36 × 1017.85 × 1012.28 × 1024.75 × 102
Rank12910473658
F27Ave3.09 × 1033.09 × 1033.16 × 1033.19 × 1033.09 × 1033.09 × 1033.10 × 1033.10 × 1033.10 × 1033.10 × 103
Std9.42 × 10−18.96 × 10−12.43 × 1015.12 × 1011.51 × 1003.54 × 1001.67 × 1011.20 × 1011.52 × 1011.10 × 101
Rank12910348657
F28Ave3.10 × 1033.10 × 1033.53 × 1033.73 × 1033.10 × 1033.33 × 1033.22 × 1033.20 × 1033.30 × 1033.33 × 103
Std9.46 × 10−125.12 × 10−51.13 × 1029.77 × 1017.76 × 10−91.00 × 1021.41 × 1021.00 × 1021.75 × 1027.95 × 101
Rank13910285467
F29Ave3.14 × 1033.15 × 1033.39 × 1033.33 × 1033.17 × 1033.18 × 1033.21 × 1033.17 × 1033.21 × 1033.25 × 103
Std1.75 × 1007.83 × 1007.79 × 1017.80 × 1016.72 × 1003.31 × 1015.19 × 1011.92 × 1014.20 × 1016.61 × 101
Rank12109356478
F30Ave3.41 × 1033.51 × 1031.35 × 1073.61 × 1063.69 × 1034.29 × 1052.96 × 1051.00 × 1052.73 × 1054.49 × 105
Std3.52 × 1005.74 × 1011.27 × 1074.38 × 1061.57 × 1027.89 × 1054.63 × 1052.25 × 1054.75 × 1054.78 × 105
Rank12109376458
Mean Ranking | 1.1071 | 2.0714 | 9.1429 | 9.3571 | 3.1786 | 6.0000 | 4.9643 | 4.6429 | 6.2143 | 8.3214
Final Rank | 1 | 2 | 9 | 10 | 3 | 6 | 5 | 4 | 7 | 8
Table 4. Results of the CEC-2020 test suite.
Fun | Metrics | RLNOA | NOA | SO | RSA | CPO | GWO | PSO | RLTLBO | RLMPSO | RLCGWO
F1 | Ave | 1.00 × 10^2 | 1.44 × 10^2 | 1.70 × 10^10 | 2.91 × 10^10 | 1.04 × 10^2 | 1.40 × 10^8 | 3.58 × 10^3 | 3.41 × 10^3 | 6.56 × 10^3 | 1.57 × 10^10
F1 | Std | 2.14 × 10^−1 | 3.57 × 10^1 | 5.57 × 10^9 | 4.45 × 10^9 | 3.33 × 10^0 | 3.75 × 10^8 | 3.13 × 10^3 | 3.52 × 10^3 | 4.52 × 10^3 | 2.63 × 10^9
F1 | Rank | 1 | 3 | 9 | 10 | 2 | 7 | 5 | 4 | 6 | 8
F2 | Ave | 1.36 × 10^3 | 1.96 × 10^3 | 5.71 × 10^3 | 5.53 × 10^3 | 2.70 × 10^3 | 2.54 × 10^3 | 2.45 × 10^3 | 2.61 × 10^3 | 3.49 × 10^3 | 5.53 × 10^3
F2 | Std | 1.14 × 10^2 | 1.62 × 10^2 | 3.81 × 10^2 | 2.06 × 10^2 | 2.39 × 10^2 | 5.79 × 10^2 | 3.68 × 10^2 | 5.94 × 10^2 | 6.67 × 10^2 | 4.30 × 10^2
F2 | Rank | 1 | 2 | 10 | 9 | 6 | 4 | 3 | 5 | 7 | 8
F3 | Ave | 7.31 × 10^2 | 7.42 × 10^2 | 1.04 × 10^3 | 1.01 × 10^3 | 7.63 × 10^2 | 7.67 × 10^2 | 7.53 × 10^2 | 7.99 × 10^2 | 8.26 × 10^2 | 1.74 × 10^3
F3 | Std | 3.04 × 10^0 | 1.01 × 10^1 | 3.51 × 10^1 | 2.94 × 10^1 | 4.60 × 10^0 | 2.22 × 10^1 | 1.03 × 10^1 | 3.05 × 10^1 | 3.73 × 10^1 | 1.90 × 10^2
F3 | Rank | 1 | 2 | 9 | 8 | 4 | 5 | 3 | 6 | 7 | 10
F4 | Ave | 1.90 × 10^3 | 1.90 × 10^3 | 1.54 × 10^5 | 3.54 × 10^5 | 1.91 × 10^3 | 1.91 × 10^3 | 1.90 × 10^3 | 1.92 × 10^3 | 1.91 × 10^3 | 3.63 × 10^4
F4 | Std | 3.02 × 10^−1 | 4.32 × 10^−1 | 1.13 × 10^5 | 1.81 × 10^5 | 6.18 × 10^−1 | 3.06 × 10^0 | 6.75 × 10^−1 | 1.04 × 10^1 | 3.31 × 10^0 | 2.18 × 10^4
F4 | Rank | 2 | 3 | 9 | 10 | 4 | 5 | 1 | 7 | 6 | 8
F5 | Ave | 2.12 × 10^3 | 2.40 × 10^3 | 3.16 × 10^6 | 5.01 × 10^6 | 3.02 × 10^3 | 3.95 × 10^5 | 8.93 × 10^4 | 6.03 × 10^4 | 2.74 × 10^5 | 2.75 × 10^6
F5 | Std | 5.42 × 10^1 | 1.28 × 10^2 | 1.41 × 10^6 | 2.14 × 10^6 | 2.16 × 10^2 | 6.93 × 10^5 | 5.25 × 10^4 | 3.58 × 10^4 | 1.55 × 10^5 | 2.52 × 10^6
F5 | Rank | 1 | 2 | 9 | 10 | 3 | 7 | 5 | 4 | 6 | 8
F6 | Ave | 1.60 × 10^3 | 1.60 × 10^3 | 2.88 × 10^3 | 3.12 × 10^3 | 1.61 × 10^3 | 1.86 × 10^3 | 1.88 × 10^3 | 1.75 × 10^3 | 1.94 × 10^3 | 2.38 × 10^3
F6 | Std | 1.82 × 10^−1 | 4.85 × 10^−1 | 3.61 × 10^2 | 4.69 × 10^2 | 4.30 × 10^0 | 1.49 × 10^2 | 1.58 × 10^2 | 1.10 × 10^2 | 1.58 × 10^2 | 2.12 × 10^2
F6 | Rank | 1 | 2 | 9 | 10 | 3 | 5 | 6 | 4 | 7 | 8
F7 | Ave | 2.25 × 10^3 | 2.39 × 10^3 | 1.33 × 10^6 | 3.21 × 10^6 | 2.69 × 10^3 | 1.36 × 10^5 | 5.24 × 10^4 | 1.85 × 10^4 | 9.30 × 10^4 | 1.04 × 10^6
F7 | Std | 3.46 × 10^1 | 8.91 × 10^1 | 1.32 × 10^6 | 3.63 × 10^6 | 1.06 × 10^2 | 8.51 × 10^4 | 7.68 × 10^4 | 1.91 × 10^4 | 7.48 × 10^4 | 6.58 × 10^5
F7 | Rank | 1 | 2 | 9 | 10 | 3 | 7 | 5 | 4 | 6 | 8
F8 | Ave | 2.30 × 10^3 | 2.30 × 10^3 | 4.78 × 10^3 | 5.30 × 10^3 | 2.30 × 10^3 | 2.71 × 10^3 | 2.40 × 10^3 | 2.30 × 10^3 | 2.30 × 10^3 | 5.47 × 10^3
F8 | Std | 1.43 × 10^1 | 1.44 × 10^−4 | 9.68 × 10^2 | 7.07 × 10^2 | 4.04 × 10^−6 | 7.46 × 10^2 | 4.47 × 10^2 | 5.71 × 10^0 | 1.23 × 10^0 | 1.08 × 10^3
F8 | Rank | 1 | 3 | 8 | 9 | 2 | 7 | 6 | 5 | 4 | 10
F9 | Ave | 2.81 × 10^3 | 2.81 × 10^3 | 3.13 × 10^3 | 3.19 × 10^3 | 2.86 × 10^3 | 2.85 × 10^3 | 2.86 × 10^3 | 2.86 × 10^3 | 2.88 × 10^3 | 2.94 × 10^3
F9 | Std | 4.14 × 10^0 | 6.80 × 10^1 | 6.79 × 10^1 | 1.85 × 10^2 | 1.01 × 10^1 | 3.68 × 10^1 | 3.03 × 10^1 | 1.97 × 10^1 | 3.45 × 10^1 | 1.33 × 10^1
F9 | Rank | 2 | 1 | 9 | 10 | 6 | 3 | 4 | 5 | 7 | 8
F10 | Ave | 2.91 × 10^3 | 2.91 × 10^3 | 4.26 × 10^3 | 4.82 × 10^3 | 2.93 × 10^3 | 2.95 × 10^3 | 2.93 × 10^3 | 2.98 × 10^3 | 2.95 × 10^3 | 4.15 × 10^3
F10 | Std | 4.49 × 10^0 | 3.87 × 10^−2 | 5.13 × 10^2 | 7.40 × 10^2 | 2.54 × 10^1 | 3.01 × 10^1 | 3.04 × 10^1 | 3.81 × 10^1 | 3.41 × 10^1 | 4.77 × 10^2
F10 | Rank | 1 | 2 | 9 | 10 | 4 | 5 | 3 | 7 | 6 | 8
Mean Ranking | 1.2222 | 2.2222 | 9.0000 | 9.5556 | 3.6667 | 5.5556 | 4.2222 | 4.8889 | 6.2222 | 8.4444
Final Rank | 1 | 2 | 9 | 10 | 3 | 6 | 4 | 5 | 7 | 8
Table 5. Results of the RLNOA with different values for the parameter P_rp.
P_rp | F1 | F4 | F17 | F23
0.8 | 1.00 × 10^2 | 4.00 × 10^2 | 1.72 × 10^3 | 2.50 × 10^3
0.6 | 1.00 × 10^2 | 4.00 × 10^2 | 1.72 × 10^3 | 2.50 × 10^3
0.4 | 1.00 × 10^2 | 4.00 × 10^2 | 1.72 × 10^3 | 2.50 × 10^3
0.2 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
Table 6. Results of the RLNOA with different values for the parameter δ.
δ | F1 | F4 | F17 | F23
0.5 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
0.2 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
0.1 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
0.05 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
Table 7. Results of the RLNOA with different values for the parameter k.
k | F1 | F4 | F17 | F23
5 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
10 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
20 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
50 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
Table 8. Results of the RLNOA with different values for the parameter ζ.
ζ | F1 | F4 | F17 | F23
1 | 1.00 × 10^2 | 4.00 × 10^2 | 1.71 × 10^3 | 2.50 × 10^3
2 | 1.01 × 10^2 | 4.00 × 10^2 | 1.72 × 10^3 | 2.50 × 10^3
4 | 1.04 × 10^2 | 4.00 × 10^2 | 1.72 × 10^3 | 2.50 × 10^3
8 | 1.17 × 10^2 | 4.00 × 10^2 | 1.72 × 10^3 | 2.50 × 10^3
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
