Article

Q-Learning-Based Multi-Strategy Topology Particle Swarm Optimization Algorithm

1 School of Mechanical and Automation Engineering, Wuyi University, Jiangmen 529020, China
2 School of Mechanical Engineering, Dongguan University of Technology, Dongguan 523808, China
* Author to whom correspondence should be addressed.
Algorithms 2025, 18(11), 672; https://doi.org/10.3390/a18110672
Submission received: 22 August 2025 / Revised: 1 October 2025 / Accepted: 13 October 2025 / Published: 22 October 2025

Abstract

In response to the issues of premature convergence and insufficient parameter control in Particle Swarm Optimization (PSO) for high-dimensional complex optimization problems, this paper proposes a Multi-Strategy Topological Particle Swarm Optimization algorithm (MSTPSO). The method builds upon a reinforcement learning-driven topological switching framework, where Q-learning dynamically selects among fully informed topology, small-world topology, and exemplar-set topology to achieve an adaptive balance between global exploration and local exploitation. Furthermore, the algorithm integrates differential evolution perturbations and a global optimal restart strategy based on stagnation detection, together with a dual-layer experience replay mechanism to enhance population diversity at multiple levels and strengthen the ability to escape local optima. Experimental results on 29 CEC2017 benchmark functions, compared against various PSO variants and other advanced evolutionary algorithms, show that MSTPSO achieves superior fitness performance and exhibits stronger stability on high-dimensional and complex functions. Ablation studies further validate the critical contribution of the Q-learning-based multi-topology control and stagnation detection mechanisms to performance improvement. Overall, MSTPSO demonstrates significant advantages in convergence accuracy and global search capability.

1. Introduction

Optimization techniques play a central role in fields such as engineering design, machine learning, artificial intelligence, and operations research. From parameter optimization of complex mechanical structures to hyperparameter tuning of deep learning models, efficient optimization algorithms are the key to solving practical problems. Current mainstream optimization algorithms include differential evolution (DE) [1,2], Genetic Algorithm (GA) [3], Bayesian Optimization (BO) [4], and Artificial Bee Colony (ABC) algorithm [5], among which Particle Swarm Optimization (PSO) has been widely applied due to its simple structure, easy parameter adjustment, and convenient implementation [6,7,8]. Since its introduction by Kennedy and Eberhart in 1995, PSO has simulated the social behavior of biological populations to perform optimization search and has spawned numerous improved variants [6]. However, traditional PSO still suffers from significant limitations: most variants are effective only for specific problem types, PSO tends to fall into local optima in high-dimensional multimodal functions, and fixed topology structures limit population diversity [9,10]. To overcome these bottlenecks, researchers have systematically improved PSO in several directions.
First, hybrid strategies integrate the strengths of different algorithms to enhance search performance. Recent studies include GWOPSO, which combines Grey Wolf Optimization (GWO) with PSO by leveraging PSO’s population collaboration mechanism and GWO’s hierarchical hunting strategy [11]; ECCSPSOA, which integrates chaotic crow search with PSO to improve the diversity of initial solutions for feature selection problems [12]; and HGSPSO, which merges gravitational search with PSO to optimize particle interactions via a gravity model [13]. In particular, hybridization of PSO and differential evolution (DE) has become a research hotspot: heterogeneous DE-PSO improves exploitation through local search [14], while Mutual Learning DE-PSO enhances global exploration by knowledge sharing among swarms [15]. These hybrid approaches have demonstrated significant advantages in complex function optimization tasks, as validated on CEC2017 benchmarks [16,17].
Second, learning strategies focus on building efficient guidance mechanisms. Liu et al. proposed simplified PSO models with cognitive, social, and temporal-hierarchical strategies (THSPSO), which dynamically update learning patterns to improve optimization efficiency and significantly shorten convergence time compared with standard PSO in complex function optimization [10]. Huang et al. introduced the LRMODE algorithm, which incorporates ruggedness analysis of the fitness landscape to refine particle learning directions [18]. Lu et al. employed reinforcement learning to guide particle behavior, integrating historical successful experiences into wastewater treatment control optimization [19]. Zhao et al. proposed the mean-hierarchy PSO (MHPSO), which stratifies particles according to population mean fitness and uses stochastic elite neighbor selection to maintain diversity [20]. Wang et al. developed the SCDLPSO algorithm with self-correction and dimension-wise learning ability to strengthen links between individual and global bests for solving complex multimodal problems [21]. Other studies have enhanced adaptability at different stages by introducing adaptive inertia weights and learning factors [22,23]. These methods dynamically adjust learning exemplars to strengthen PSO’s adaptability to complex problems.
Third, neighborhood topology innovations are critical to improving PSO performance. While traditional global topology risks premature convergence and local topology converges slowly [24], dynamic topologies provide balance. For example, SWD-PSO employs small-world networks with dense short-range links and random long-range connections to accelerate global information exchange [25]. DNSPSO adapts neighborhood range via dynamic switching mechanisms [26], while MNPSO applies distance-adaptive neighborhoods to improve nonlinear equation solving [27]. Pyramid PSO (PPSO) enhances population diversity through competition-cooperation strategies and multi-behavior learning [28]. Liang et al. proposed NRLPSO, a neighborhood differential mutation PSO guided by reinforcement learning, with a dynamic oscillatory inertia weight to adapt particle motion [29]. Jiang et al. developed HPSO-TS, a hybrid PSO with time-varying topology and search perturbation, which uses K-medoids clustering to dynamically divide the swarm into heterogeneous subgroups, facilitating intra-group information flow and global-local search transitions [30]. Zeng et al. further proposed a dynamic-neighborhood switching PSO (DNSPSO) with a novel velocity update rule and switching strategy, showing excellent performance on multimodal optimization problems [26]. These topology improvements have greatly enhanced search capability in high-dimensional spaces.
Finally, adaptive parameter control in PSO has evolved from heuristic rules to intelligent learning strategies. Tatsis et al. introduced an online adaptive parameter method that combines performance estimation with gradient search, significantly improving robustness in complex environments [31]. Chen et al. developed MACPSO, which applies multiple Actor-Critic networks to optimize parameters in continuous space, balancing global exploration and local exploitation [16]. Liu et al. proposed QLPSO, where Q-learning trains parameters and Q-tables guide particle actions based on states, improving search efficiency in complex solution spaces [32]. A deep reinforcement learning-based PSO (DRLPSO) [33] employs neural networks to learn state-action mappings and dynamically adjust parameters for high-dimensional tasks. Hamad et al. proposed QLESCA, which introduces Q-learning into the sine cosine algorithm (SCA) to achieve adaptive tuning of key parameters, significantly improving convergence speed and solution accuracy. Subsequent studies applied QLESCA to high-dimensional COVID-19 feature selection, verifying its potential in high-dimensional problems; however, the overall optimization strategy remains primarily parameter-focused [34,35]. RLPSO algorithm employs a reinforcement learning strategy to adaptively control the hierarchical structure of the swarm, adjusting search modes across different levels to balance exploration and exploitation [36]. Most existing RL-PSO methods still concentrate on parameter-level adaptation. Although this improves convergence performance to some extent, little work has addressed adaptive control of the swarm topology. As topology determines the interaction pattern among particles, it plays a crucial role in maintaining population diversity and balancing exploration and exploitation. Therefore, introducing reinforcement learning for dynamic topology selection remains an area requiring further investigation. Based on this background, the main contributions of this paper are as follows:
  • A reinforcement learning-based topology switching strategy is proposed, enabling particles to dynamically select among FIPS, small-world, and exemplar-set topologies to balance global exploration and local exploitation.
  • A dual-layer Q-learning experience replay mechanism is designed, integrating short-term and long-term memories to stabilize parameter control and improve learning efficiency.
  • A stagnation detection mechanism is constructed and combined with differential evolution perturbations and a global restart strategy to enhance population diversity and improve the ability to escape local optima.
Compared with existing RL-PSO methods that mainly focus on parameter adaptation, the proposed MSTPSO applies reinforcement learning to adaptive control of swarm topology, optimizing the information exchange pattern at the structural level and thereby improving convergence efficiency and robustness. The remainder of this paper is organized as follows: Section 2 reviews PSO and reinforcement learning–related theories. Section 3 describes the design and implementation of MSTPSO. Section 4 presents performance validation on the CEC2017 benchmark and ablation analysis of each component. Section 5 concludes the study and outlines future research directions.

2. Background Information

2.1. Particle Swarm Optimization

The Particle Swarm Optimization (PSO) algorithm was proposed by Kennedy and Eberhart in 1995 [7], inspired by the foraging behavior of bird flocks. In PSO, each individual in the population is abstracted as a “particle”, and each particle’s position corresponds to a potential solution to the optimization problem. Through cooperation and information sharing within the swarm, global optimal search can be achieved. In a standard PSO, a particle swarm consisting of N particles is described by two vector quantities: velocity vector v and position vector x. For a D-dimensional search space, the velocity and position of the i-th particle are defined as
$$v_i = (v_{i1}, v_{i2}, \ldots, v_{iD}) \in \mathbb{R}^D, \quad i = 1, 2, \ldots, N.$$
$$x_i = (x_{i1}, x_{i2}, \ldots, x_{iD}) \in \mathbb{R}^D, \quad i = 1, 2, \ldots, N.$$
Equation (1) represents the velocity vector of the particle in D-dimensional space, while Equation (2) denotes its position in the search space. Each particle has a memory ability and can store the best position it has visited so far, referred to as the individual best position p b i ( h ) . Among all individual bests p b i ( h ) , the best one is selected as the global best position g b ( h ) . The velocity and position of particles are updated iteratively using the following equations:
$$v_i(h+1) = \omega v_i(h) + c_1 r_1 \big(pb_i(h) - x_i(h)\big) + c_2 r_2 \big(gb(h) - x_i(h)\big).$$
$$x_i(h+1) = x_i(h) + v_i(h+1).$$
In the above: ω is the inertia weight, which maintains the particle’s previous momentum and balances exploration and exploitation; c 1 and c 2 are cognitive and social acceleration coefficients, representing the particle’s self-learning and swarm-learning abilities, respectively; r 1 , r 2 [ 0 ,   1 ] are random numbers uniformly distributed within [ 0 ,   1 ] , introducing stochasticity to avoid premature convergence; h is the iteration index; v i ( h ) and x i ( h ) represent the velocity and position of the i-th particle at the h-th iteration, respectively. The PSO mechanism is driven by a dual principle of “individual experience + group cooperation”, enabling particles to update their positions by considering both their historical best positions p b i ( h ) and the global best position g b ( h ) , thereby forming a self-organizing search process. This mechanism allows PSO to dynamically balance global exploration and local exploitation through a simple structure and parameter configuration. However, traditional PSO often uses fixed parameter settings, which limits its adaptability. This paper designs an adaptive multi-strategy inertia weight adjustment mechanism based on particle state feedback to achieve dynamic balance. Moreover, coefficients such as c 1 and c 2 are linearly adjusted to enhance convergence capability and address the parameter sensitivity issue in standard PSO.
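For illustration, the following is a minimal NumPy sketch of the standard velocity and position update in Equations (3) and (4); the array shapes, boundary clipping, and default parameter values are assumptions made for the example rather than the paper's implementation.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=2.0, c2=2.0, bounds=(-100.0, 100.0)):
    """One iteration of the standard PSO update (Eqs. (3)-(4)).
    x, v, pbest: (N, D) arrays; gbest: (D,) array."""
    N, D = x.shape
    r1 = np.random.rand(N, D)                          # r1, r2 ~ U(0, 1), drawn per dimension
    r2 = np.random.rand(N, D)
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    x_new = np.clip(x + v_new, bounds[0], bounds[1])   # keep particles inside the search range
    return x_new, v_new
```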

2.2. Reinforcement Learning

Reinforcement learning (RL) is a key branch of machine learning that focuses on enabling intelligent agents to make sequential decisions in dynamic environments. Its core objective is to learn optimal or near-optimal policies through interactions with the environment to maximize expected cumulative long-term rewards. This process can be modeled as a Markov Decision Process (MDP), formally defined as a quadruple $(S, A, P, R)$ [37], where $S$ represents the set of states, describing all possible situations the agent may encounter; $A$ denotes the action set available to the agent; $P: S \times A \times S \to [0, 1]$ is the state transition probability function, describing the probability of transitioning from one state to another given an action; and $R: S \times A \to \mathbb{R}$ is the reward function, quantifying the immediate feedback received by the agent for performing an action in a given state.
Through continuous interaction with the environment, the agent receives a sequence of observations and generates an experience trajectory $\{s, a, r, s', \ldots\}$, where the cumulative discounted return $R_t$ at time $t$ is defined as
$$R_t = \sum_{i=t}^{T} \gamma^{\,i-t} r_i, \quad \gamma \in [0, 1],$$
where γ is the discount factor, used to balance the importance between immediate and future rewards. The core objective of RL is to find the optimal policy π * ( a | s ) that maximizes the expected cumulative return
$$\pi^*(a \mid s) = \arg\max_{a} \mathbb{E}\big[R_t \mid s, a\big].$$
To achieve this, RL commonly defines a state-value function V * ( s ) and an action-value function Q * ( s , a ) , as follows:
$$V^*(s) = \mathbb{E}_t\big[r_{t+1} + \gamma V^*(s_{t+1}) \mid s_t = s\big].$$
$$Q^*(s, a) = \mathbb{E}_t\big[r_{t+1} + \gamma Q^*(s_{t+1}, a_{t+1}) \mid s_t = s, a_t = a\big].$$
When the optimal policy π * is achieved, the action-value function Q * ( s , a ) satisfies the Bellman optimality condition, which forms the theoretical foundation of the Q-learning algorithm [38].
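As a small worked illustration of the discounted return in Equation (5), the sketch below sums rewards with geometrically decaying weights; the reward sequence and the value $\gamma = 0.9$ are arbitrary example choices.

```python
def discounted_return(rewards, gamma=0.9):
    """Discounted return of Eq. (5), evaluated from the first element of `rewards`:
    R = r_0 + gamma * r_1 + gamma^2 * r_2 + ..."""
    return sum(gamma ** i * r for i, r in enumerate(rewards))

# Example: four rewards observed along one trajectory
print(discounted_return([1.0, 0.0, 0.5, 1.0]))  # 1.0 + 0.9*0.0 + 0.81*0.5 + 0.729*1.0
```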

2.3. Q-Learning

Q-learning, a representative algorithm in reinforcement learning (RL), aims to learn the optimal action-selection policy in an environment by maximizing the expected cumulative reward. RL typically models the learning process through continuous interactions between the agent and the environment, based on three key elements: state, action, and reward. The theoretical basis of Q-learning was proposed by Watkins in 1989 [39], which focuses on estimating the value (Q-value) of performing an action in a specific state without requiring a model of the environment’s dynamics. Q-learning uses an iterative update rule to approximate the optimal Q-value function, with the update equation given as
$$Q(s_t, a_t, i) \leftarrow (1 - \alpha)\, Q(s_t, a_t, i) + \alpha \Big[ R(s_t, a_t) + \gamma \max_{a} Q(s_{t+1}, a, i) \Big],$$
where $\alpha \in [0, 1]$ is the learning rate, controlling how much newly acquired information overrides the old one; $\gamma \in [0, 1]$ is the discount factor, controlling the importance of future rewards; $R(s_t, a_t)$ is the immediate reward received after performing action $a_t$ in state $s_t$; and $\max_{a} Q(s_{t+1}, a, i)$ represents the maximum estimated future return from the next state $s_{t+1}$, assuming the best action is taken.
This update process is an off-policy method that uses the greedy policy to update Q-values while allowing exploratory behavior during the learning process through strategies such as ϵ -greedy selection. The algorithm bootstraps its Q-value estimations, progressively improving policy performance through value iteration. The core architecture of the Q-learning model is illustrated in Figure 1, and its workflow can be summarized as a closed-loop cycle of state perception → action selection → environment interaction → reward acquisition → Q-table update. This mechanism aligns well with the dynamic parameter adjustment requirements of Particle Swarm Optimization (PSO); by defining the search states of PSO (such as particle distribution and fitness variation) as RL states and defining parameter combinations as RL actions, Q-learning enables online optimization of PSO parameters through its adaptive updating capability [40].
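The closed-loop cycle above can be condensed into a short tabular sketch; the learning rate, discount factor, and ε value below are placeholder choices, and the state/action sizes in the usage lines anticipate the design of Section 3 (12 combined states, 3 topology actions) purely for illustration.

```python
import numpy as np

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Tabular Q-learning update (Eq. (9)): blend the old estimate with the
    bootstrapped target r + gamma * max_a' Q(s', a')."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target

def epsilon_greedy(Q, s, epsilon=0.1):
    """Off-policy behaviour: explore with probability epsilon, otherwise act greedily."""
    if np.random.rand() < epsilon:
        return np.random.randint(Q.shape[1])
    return int(np.argmax(Q[s]))

# Example usage with a 12-state x 3-action table
Q = np.zeros((12, 3))
a = epsilon_greedy(Q, s=0)
q_update(Q, s=0, a=a, r=1.0, s_next=4)
```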

3. Multi-Strategy Topology PSO

This paper proposes a multi-topology Particle Swarm Optimization algorithm integrated with reinforcement learning, and its workflow is illustrated in Figure 2. In each iteration, the Q-learning agent selects the topology and parameter configuration according to the current state, after which particles update their velocities and positions and perform fitness evaluation. Stagnation detection is then carried out: if swarm stagnation is detected, a partial restart based on the global best is triggered, while individual stagnation activates a differential evolution–based perturbation. The Q-table is updated in every iteration using immediate feedback, whereas prioritized experience replay from the dual-layer experience pool is triggered periodically to further optimize the strategy.

3.1. State

Under the Q-learning framework, the state space needs to simultaneously reflect both the convergence trend of individual particles and the diversity of the entire swarm, thereby characterizing the dynamic balance between exploration and exploitation. In this paper, the state is designed with two complementary components: the fitness improvement state and the diversity shrinkage state. For particle i at generation t, the change in fitness is defined as
$$\Delta f_i^t = f_i^{t-1} - f_i^t,$$
where $f_i^t$ denotes the fitness of particle $i$ at generation $t$. According to the value of $\Delta f_i^t$, the base state of particle $i$, denoted as $\text{base\_state}_i$, is categorized as
$$\text{base\_state}_i = \begin{cases} 1, & \Delta f_i^t < -10^{-8} \\ 2, & -10^{-8} < \Delta f_i^t < 0 \\ 3, & \Delta f_i^t = 0 \\ 4, & \Delta f_i^t > 0 \end{cases}$$
where the threshold $10^{-8}$ is commonly used in swarm intelligence optimization to distinguish between significant and non-significant changes, ensuring the stability and robustness of numerical judgment. The states correspond respectively to significant degradation, slight degradation, no change, and improvement. To quantify the diversity of the swarm distribution, Shannon entropy is introduced,
$$H_i^t = -\sum_{j=1}^{M} p_j \cdot \log_2(p_j),$$
where $M = 10$ denotes the number of partitions of the search space and $p_j$ represents the proportion of particles falling into the $j$-th interval. According to the entropy value, the diversity level $\text{entropy\_state}_i$ is categorized into three levels, corresponding to low, medium, and high diversity, respectively. The base state and the diversity level are then combined into a single state index,
$$s_i = (\text{entropy\_state}_i - 1) \times 4 + \text{base\_state}_i.$$
Shannon entropy has been widely applied in PSO to measure the distribution and convergence behavior of the swarm [32].
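A compact sketch of this state encoding is given below. The entropy thresholds used to split low, medium, and high diversity (0.5 and 0.8 of the maximum entropy $\log_2 M$) and the choice of binning a single dimension are assumptions made for the example, since the paper does not list them explicitly.

```python
import numpy as np

def particle_state(delta_f, positions, dim=0, M=10, bounds=(-100.0, 100.0)):
    """State index of Section 3.1: base state from the fitness change and a
    diversity level from Shannon entropy, combined as in Eq. (14)."""
    # Base state: significant degradation, slight degradation, no change, improvement
    if delta_f < -1e-8:
        base = 1
    elif delta_f < 0:
        base = 2
    elif delta_f == 0:
        base = 3
    else:
        base = 4
    # Shannon entropy of the swarm distribution along one dimension, M bins
    counts, _ = np.histogram(positions[:, dim], bins=M, range=bounds)
    p = counts / counts.sum()
    p = p[p > 0]
    H = -np.sum(p * np.log2(p))
    # Assumed thresholds splitting low / medium / high diversity
    H_max = np.log2(M)
    entropy_state = 1 if H < 0.5 * H_max else (2 if H < 0.8 * H_max else 3)
    return (entropy_state - 1) * 4 + base        # combined index in {1, ..., 12}
```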

3.2. Action Design

This study selects three representative topologies as reinforcement learning actions: Fully-Informed Particle Swarm Topology (FIPS) [41], small-world topology [25], and exemplar-set topology [42]. Figure 3 shows the three topological structures adopted in this work. These topologies have been demonstrated to significantly enhance global searchability and convergence efficiency in multimodal and complex optimization problems. Their corresponding velocity update mechanisms are described below.
Under the FIPS topology, the velocity of particle i is influenced by all its neighborhood members with additional weights. Its update equation is
$$v_{i,d}^{t+1} = \omega v_{i,d}^{t} + \phi \sum_{j \in N_i} r_{j,d}\big(p_{j,d}^{best} - x_{i,d}^{t}\big),$$
where $N_i$ denotes the set of neighbors of particle $i$, $\phi$ is an additional weight factor, and $r_{j,d} \sim U(0, 1)$ is a random weight. The inertia weight $\omega$ is linearly decreased,
$$\omega = \omega_{\max} - (\omega_{\max} - \omega_{\min}) \frac{t}{T},$$
where T is the maximum number of iterations, and t is the current iteration index. This design balances global exploration in early iterations and local exploitation in later stages. Under the small-world topology, particles mainly interact with nearby neighbors while introducing long-distance links with a small probability. Its velocity update is
$$v_{i,d}^{t+1} = \omega v_{i,d}^{t} + c_1 r_1 \big(p_{i,d}^{best} - x_{i,d}^{t}\big) + c_2 r_2 \big(gs_{i,d}^{t} - x_{i,d}^{t}\big),$$
where g s i , d t is the best solution among neighbors, and the learning factors c 1 , c 2 vary linearly,
$$c_1 = c_{1,\max} - (c_{1,\max} - c_{1,\min}) \frac{t}{T}.$$
$$c_2 = c_{2,\max} - (c_{2,\max} - c_{2,\min}) \frac{t}{T}.$$
Under the exemplar-set topology, each particle dynamically selects exemplars from the constructed exemplar set as reference points to enhance diversity and effectively avoid premature convergence. Its velocity update is
$$v_{i,d}^{t+1} = \omega v_{i,d}^{t} + c_1 r_1 \big(p_{i,d}^{best} - x_{i,d}^{t}\big) + c_2 r_2 \big(ps_{i,d}^{best} - x_{i,d}^{t}\big),$$
where $ps_{i,d}^{best}$ denotes the best individual selected from the exemplar set. This strategy introduces diverse references, strengthening swarm diversity and improving the ability to escape local optima.
In summary, combining the three topological structures with the dynamic adjustment mechanism of ω , c 1 and c 2 provides Q-learning with a set of structural actions, achieving a balance between exploration and exploitation. The FIPS topology strengthens fully informed aggregation to improve global convergence; the small-world topology balances information transmission and convergence speed, enhancing search robustness; and the exemplar-set topology maintains swarm diversity through diversified reference individuals, reducing the risk of premature convergence.
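To make the three structural actions concrete, the sketch below dispatches the per-particle velocity update according to the selected topology (Eqs. (15), (17), and (20)); the neighbour bookkeeping and the value $\phi = 4.1$ are assumptions for illustration, and the FIPS branch follows the summed form of Eq. (15) without further normalisation.

```python
import numpy as np

def velocity_update(topology, x_i, v_i, pbest_i, nbr_pbest, gbest_nbr, exemplar,
                    w, c1, c2, phi=4.1):
    """Velocity update for one particle under the three topology actions.
    x_i, v_i, pbest_i, gbest_nbr, exemplar: (D,) arrays; nbr_pbest: (|N_i|, D)
    array of the neighbours' personal bests."""
    D = x_i.shape[0]
    if topology == "fips":
        # Fully informed: every neighbour pulls the particle with a random weight
        r = np.random.rand(nbr_pbest.shape[0], D)
        return w * v_i + phi * np.sum(r * (nbr_pbest - x_i), axis=0)
    if topology == "small_world":
        # Best of the (short- and long-range) neighbourhood replaces the global best
        r1, r2 = np.random.rand(D), np.random.rand(D)
        return w * v_i + c1 * r1 * (pbest_i - x_i) + c2 * r2 * (gbest_nbr - x_i)
    # Exemplar-set topology: learn from an individual drawn from the exemplar set
    r1, r2 = np.random.rand(D), np.random.rand(D)
    return w * v_i + c1 * r1 * (pbest_i - x_i) + c2 * r2 * (exemplar - x_i)
```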

3.3. Reward Function

In the MSTPSO algorithm, the reward function is designed to quantify the search performance after executing the current strategy, and it serves as the updating signal for the Q-learning module. The design of the reward comprehensively considers both the improvement of particle fitness and the maintenance of population diversity, thereby guiding the search in the desired direction and avoiding premature convergence and local stagnation. Specifically, the reward of a particle in each generation consists of two parts: (1) the fitness improvement, which reflects the enhancement of a particle’s solution quality, and (2) the diversity maintenance, which encourages exploration behavior. The reward is defined as follows:
$$r_i^t = w_1 \cdot \Delta f_i^t + w_2 \cdot H_i^t,$$
where $\Delta f_i^t = f_i^{t-1} - f_i^t$ represents the fitness improvement between the current solution and the historical best of particle $i$, and $H_i^t$ denotes the diversity contribution of the swarm. In terms of weight setting, this study refers to the related work of Liu et al. [32], in which a reinforcement learning-based PSO parameter control adopts a uniform weighting strategy to balance different optimization objectives. Accordingly, this paper sets $w_1 = w_2 = 0.5$, thereby establishing a stable balance between fitness improvement and diversity maintenance.
The reward is then used to update the Q-value function, following the rule
$$Q(s_i, a_i) \leftarrow (1 - \alpha)\, Q(s_i, a_i) + \alpha \Big[ r_i + \gamma \max_{a} Q(s_i', a) \Big],$$
where α is the learning rate and γ is the discount factor.
To balance the rapid response to new information and the long-term stability derived from past experiences, MSTPSO adopts a dual experience replay mechanism. The short-term memory (STM) stores the most recent samples of particle state transitions for rapid adaptation, while the long-term memory (LTM) preserves older and more stable samples to enhance the robustness of strategy learning. Each sample consists of ( s t , a t , r t , s t + 1 ) and its corresponding priority. The prioritized update rule is defined as
$$\omega_i = \alpha_{TD} \cdot |TD_i| + \alpha_f \cdot \Delta f_i^t + \alpha_H \cdot \Delta H_i^t + \lambda,$$
where $\alpha_{TD}$, $\alpha_f$, and $\alpha_H$ are coefficients corresponding to the TD error, fitness improvement, and diversity contribution, respectively, $TD_i$ denotes the temporal-difference (TD) error of the $i$-th sample, and $\lambda = 10^{-6}$ is a smoothing term to avoid zero priority. The coefficients are set to $0.7$, $0.5$, and $0.3$, respectively.
During the search, STM and LTM are updated at fixed intervals (every 500 evaluations). A replacement strategy with a ratio of 0.7:0.3 is applied, where 70% of new samples replace those in STM, while 30% are promoted to LTM. In addition, the top 10% of high-priority samples in STM are transferred into LTM to preserve valuable experience, ensuring long-term diversity maintenance in strategy learning.
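A minimal sketch of the dual-layer replay is given below, with buffer sizes taken from Section 3.5 (STM of size 2N, LTM of size 5N) and priorities from the weighted rule above; the priority-proportional sampling shown here is an assumption, as the paper does not specify the exact sampling scheme.

```python
import random
from collections import deque

class DualReplay:
    """Short-term and long-term experience memories with priority-based replay."""
    def __init__(self, n_particles, a_td=0.7, a_f=0.5, a_h=0.3, lam=1e-6):
        self.stm = deque(maxlen=2 * n_particles)   # recent transitions for fast adaptation
        self.ltm = deque(maxlen=5 * n_particles)   # older, stable samples for robustness
        self.a_td, self.a_f, self.a_h, self.lam = a_td, a_f, a_h, lam

    def priority(self, td_error, delta_f, delta_h):
        # Weighted TD error, fitness improvement, and diversity change plus smoothing term
        return self.a_td * abs(td_error) + self.a_f * delta_f + self.a_h * delta_h + self.lam

    def store(self, transition, td_error, delta_f, delta_h):
        # transition = (s, a, r, s_next)
        self.stm.append((transition, self.priority(td_error, delta_f, delta_h)))

    def promote_top(self, ratio=0.1):
        # Transfer the top 10% of high-priority STM samples into LTM
        k = max(1, int(ratio * len(self.stm)))
        for item in sorted(self.stm, key=lambda t: t[1], reverse=True)[:k]:
            self.ltm.append(item)

    def sample(self, batch_size):
        # Draw a replay batch from both memories, proportional to priority
        pool = list(self.stm) + list(self.ltm)
        weights = [p for _, p in pool]
        return random.choices(pool, weights=weights, k=min(batch_size, len(pool)))
```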

3.4. Stagnation Detection and Hierarchical Response Mechanism

To address the potential issues of premature convergence and search stagnation during the iterative process of Particle Swarm Optimization, this paper proposes a hierarchical response mechanism based on stagnation detection. The mechanism first determines the level at which stagnation occurs (population or individual) and subsequently triggers either a partial restart strategy based on the global best neighborhood or a differential evolution (DE)-based perturbation mechanism, thereby enhancing global exploration capability while maintaining local exploitation performance.
In each iteration, if the improvement of the global-best fitness over 20 consecutive iterations is less than $10^{-6}$, or if the velocity magnitude of all particles falls below the lower bound $v_{\min} = 10^{-5}$, the population is deemed stagnant and the restart strategy is executed. If an individual particle's personal best position remains unchanged for 10 consecutive generations, the particle is considered individually stagnant, and the DE-based perturbation mechanism is activated.
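The two tests can be expressed as a short check, sketched below; the bookkeeping structures (a history of global-best fitness values and per-particle no-improvement counters) are assumptions made for the example.

```python
import numpy as np

def detect_stagnation(gbest_history, velocities, pbest_stall,
                      eps=1e-6, window=20, v_min=1e-5, ind_window=10):
    """Population- and individual-level stagnation tests of Section 3.4.
    gbest_history: list of global-best fitness values per iteration;
    velocities: (N, D) array; pbest_stall: length-N counters of generations
    without personal-best improvement."""
    pop_stagnant = False
    if len(gbest_history) > window:
        pop_stagnant = (gbest_history[-window - 1] - gbest_history[-1]) < eps
    if np.all(np.linalg.norm(velocities, axis=1) < v_min):
        pop_stagnant = True
    stagnant_particles = [i for i, c in enumerate(pbest_stall) if c >= ind_window]
    return pop_stagnant, stagnant_particles
```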

3.4.1. Global-Best-Oriented Restart Strategy

When population stagnation is detected, a partial restart strategy centered on the global-best neighborhood is employed: 20% of the particles are randomly selected for position and velocity reinitialization within an adaptive radius around the global best, thereby restoring search diversity. The new position is generated as
$$R_d = \lambda \cdot \max_i \left| x_{i,d} - g_d \right|, \quad \lambda \in (0, 1),$$
$$x_{i,d} \leftarrow g_d + \text{rand}(-R_d, R_d),$$
where g d denotes the coordinate of the global best in dimension d, and rand ( · ) represents a uniformly distributed random number. This method redistributes particles around the global best within an adaptive radius, helping the swarm escape stagnation caused by local clustering and velocity decay.
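A sketch of this restart is shown below; the fraction of reinitialised particles (20%) follows the text, while $\lambda = 0.5$ and the small velocity reinitialisation scale are assumptions for the example.

```python
import numpy as np

def partial_restart(x, v, gbest, frac=0.2, lam=0.5, v_scale=0.1):
    """Global-best-oriented partial restart: reinitialise a fraction of particles
    inside an adaptive radius around the global best. x, v: (N, D); gbest: (D,)."""
    N, D = x.shape
    idx = np.random.choice(N, size=max(1, int(frac * N)), replace=False)
    R = lam * np.max(np.abs(x - gbest), axis=0)          # per-dimension adaptive radius R_d
    x[idx] = gbest + np.random.uniform(-R, R, size=(len(idx), D))
    v[idx] = np.random.uniform(-v_scale * R, v_scale * R, size=(len(idx), D))
    return x, v
```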

3.4.2. DE-Based Perturbation Mechanism

For particles identified as individually stagnant, a DE-based perturbation mechanism is integrated into the stagnation detection framework to enhance diversity and promote escape from local optima. As shown in Figure 4, the specific operation is as follows:
(1)
Global-best region perturbation.
Based on Euclidean distance, the closest G near and farthest G far particles from the global best G best are selected to generate a new candidate,
$$G_{new} = G_{near} + F \cdot (G_{far} - G_{near}),$$
where F denotes the differential evolution scaling factor, set to F = 0.6 in our experiments. If the fitness of G new is better than that of G best , the latter is replaced by the new one.
(2)
Individual-level perturbation.
For each stagnant particle, the nearest and farthest neighbors p near , i and p far , i relative to p best , i are selected,
$$P_{new,i} = P_{near,i} + F \cdot (P_{far,i} - P_{near,i}).$$
If P new , i is better, it updates p best , i ; otherwise, it replaces the farthest neighbor to increase local diversity.
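The two perturbation levels can be sketched as follows, using F = 0.6 from the text; treating the personal-best positions as the pool from which near and far individuals are chosen, and evaluating candidates with a user-supplied objective (minimisation assumed), are assumptions of the example.

```python
import numpy as np

def de_perturbation(pbest, pbest_fit, gbest, gbest_fit, stagnant, objective, F=0.6):
    """DE-based perturbation of Section 3.4.2. pbest: (N, D) personal bests with
    fitness values pbest_fit; stagnant: indices of individually stagnant particles."""
    # (1) Global-best region perturbation
    dist = np.linalg.norm(pbest - gbest, axis=1)
    g_near, g_far = pbest[np.argmin(dist)], pbest[np.argmax(dist)]
    g_new = g_near + F * (g_far - g_near)
    f_g_new = objective(g_new)
    if f_g_new < gbest_fit:                       # replace the global best if improved
        gbest, gbest_fit = g_new.copy(), f_g_new
    # (2) Individual-level perturbation for each stagnant particle
    for i in stagnant:
        d = np.linalg.norm(pbest - pbest[i], axis=1)
        far = int(np.argmax(d))                   # farthest neighbour (self has distance 0)
        d[i] = np.inf
        near = int(np.argmin(d))                  # nearest neighbour other than itself
        p_new = pbest[near] + F * (pbest[far] - pbest[near])
        f_new = objective(p_new)
        if f_new < pbest_fit[i]:
            pbest[i], pbest_fit[i] = p_new, f_new
        else:                                      # otherwise replace the farthest neighbour
            pbest[far], pbest_fit[far] = p_new, f_new
    return pbest, pbest_fit, gbest, gbest_fit
```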

3.5. Summary and Complexity Analysis

The MSTPSO algorithm begins with the initialization of particles, PSO parameters, the Q-table, and short-term and long-term experience buffers. In each iteration, particle states are obtained, topology actions are selected through the policy on the Q-table, and particle velocities and positions are updated accordingly. Fitness is then evaluated, personal and global bests are updated, and rewards with transitions are stored into replay buffers. The Q-table is updated regularly with prioritized experience replay. Stagnation detection is performed, and restart or DE-based perturbation strategies are applied when necessary. After completing the iterations, the algorithm returns the global best solution.
As shown in Algorithm 1, the computational complexity of MSTPSO consists of five components: fitness evaluation, Q-learning-based topology selection and particle swarm update, diversity entropy calculation, restart and DE perturbation, and the dual-layer experience replay mechanism.
Fitness evaluation is performed for $N$ particles and costs $O(N C_f)$, where $N$ is the swarm size and $C_f$ is the cost of a single fitness computation. On CEC benchmark functions, $C_f = O(D)$, with $D$ denoting the problem dimension. In the Q-learning-controlled particle swarm update, each particle selects one of three topologies in each generation. For the FIPS and small-world topologies, the update is $O(ND)$. When the exemplar-set topology is chosen, exemplars must be selected from $msl = [0.1N]$ candidates, incurring an additional $O(N^2)$ cost. The diversity entropy module relies on Shannon entropy statistics: interval frequency counting for $N$ particles over $D$ dimensions is $O(ND)$. Restart and DE perturbation are triggered only when stagnation occurs, with amortized complexity $O(ND)$. The dual-layer experience replay comprises a short-term memory (STM, size $2N$) and a long-term memory (LTM, size $5N$), with storage complexity $O(N)$. Each generation writes $N$ transitions into STM and computes priorities in constant time, giving $O(N)$ per generation. When replay is triggered, $B = 3N$ samples are drawn and the Q-table is updated, costing $O(N)$. Amortized over all generations, this remains linear and negligible compared with the main terms. Considering the entire evolutionary process, the overall time complexity of MSTPSO is $O(N^2 D)$.
Algorithm 1 MSTPSO Algorithm
Input: population size N, dimension D, maximum iterations T
Output: global best solution G best
1: Initialize particle positions, velocities, and the Q-table.
2: Create dual-layer replay buffers.
3: Evaluate initial fitness and determine G best .
4: for  t = 1 to T do
5:   for each particle do
6:      Select topology using Q-learning policy.
7:      Update x i , d t and v i , d t according to the selected topology by (15), (17), and (20).
8:   end for
9:   Evaluate fitness and update personal and global bests.
10:   If stagnation is detected, apply perturbation or restart strategy.
11: end for
12: Return G best .

4. Numerical Experiments

4.1. Experimental Settings and Comparison Methods

In this experiment, the performance of the proposed MSTPSO algorithm is evaluated using 30 benchmark functions from the CEC2017 [40] test suite. Since function f 2 shows instability in high-dimensional cases, it is excluded from the analysis. Table 1 lists the detailed information of these functions, which are divided into four categories: unimodal functions ( f 1 , f 3 ), multimodal functions ( f 4 f 10 ), hybrid functions ( f 11 f 20 ), and composition functions ( f 21 f 30 ). Extensive numerical experiments are conducted to validate the effectiveness of MSTPSO.
For comparative studies, MSTPSO is benchmarked against nine state-of-the-art algorithms, comprising PSO variants and other advanced metaheuristics: DQNPSO [43], APSO_SAC [44], EPSO [45], KGPSO [46], XPSO [22], MSORL [47], DRA [48], RFO [49], and RLACA [50]. The detailed information of these algorithms is given in Table 2. All functions are independently executed 51 times according to the CEC standard, with the termination condition set as the maximum number of evaluations equal to $D \times 10^4$, where $D$ is the problem dimension.
To comprehensively evaluate the effectiveness of MSTPSO, experiments are conducted on the CEC2017 test suite in 10, 30, 50 and 100 dimensions, and the results are compared with the above-mentioned algorithms. Table 3, Table 4, Table 5 and Table 6 present the mean and standard deviation of the solution errors obtained after 51 runs. In addition, as shown in Table 7, the results of the Wilcoxon signed-rank test demonstrate the statistical significance of the algorithm. The data indicate that MSTPSO demonstrates significant advantages over existing PSO variants, confirming its superior performance.

4.2. Performance Comparison Across Different Dimensions

In the 10-dimensional tests, Table 3 reports the mean and standard deviation of solution errors after 51 independent runs. For the unimodal functions f 1 and f 3 , the proposed algorithm ranks first on f 3 and second on f 1 , with only a small gap from the top, reflecting its effectiveness in solving unimodal problems. For the multimodal functions ( f 4 f 10 ), MSTPSO achieves the best results on f 5 f 10 , which not only demonstrates its excellent global search ability but also highlights the effectiveness of its reinforcement learning strategy and perturbation mechanism in escaping local optima. In the hybrid ( f 11 f 20 ) and composition functions ( f 21 f 30 ), MSTPSO also achieves first place on f 11 , f 14 , f 16 , f 17 , f 20 f 23 , f 28 , and f 29 .
In the 30-dimensional case, Table 4 presents the mean and standard deviation of solution errors after 51 runs. MSTPSO ranks second on the unimodal functions f 1 and f 3 , slightly behind RLACA. For the multimodal functions ( f 4 f 10 ), MSTPSO maintains stable rankings, demonstrating its competitiveness in handling multimodal optimization problems. For the hybrid and composition functions ( f 11 f 30 ), MSTPSO shows significant improvement; however, f 14 experiences a performance drop, indicating its sensitivity to dimensionality, while other functions achieve notable gains. Specifically, MSTPSO ranks first on f 12 , f 13 , f 15 , f 26 and f 30 , underscoring its strong global search ability and remarkable performance in complex hybrid problems.
In the 50-dimensional case, Table 5 shows the mean and standard deviation of solution errors after 51 runs. MSTPSO achieves a significant improvement on unimodal functions, rising to first place on f 1 . For the multimodal functions ( f 4 f 10 ), MSTPSO obtains the best results on six functions ( f 5 f 10 ) and ranks third on f 4 , reflecting its robustness in escaping local optima under complex conditions. For the hybrid and composition functions ( f 11 f 30 ), MSTPSO achieves overall improvement, ranking first on 14 out of 20 functions, which demonstrates its powerful global search capability in high-dimensional settings.
In the 100-dimensional case, Table 6 presents the mean and standard deviation of solution errors, as well as the average rankings over 51 independent runs. MSTPSO achieves the best result on the unimodal function f 1 . For multimodal functions, MSTPSO maintains optimal performance on f 5 to f 10 and ranks second on f 4 . In hybrid and composition functions, MSTPSO maintains stable rankings, while other RL-PSO variants show significant performance degradation, fully demonstrating the robustness and stability of the topology structure in high-dimensional complex optimization tasks. Overall, compared with several state-of-the-art PSO algorithms, the results indicate that MSTPSO ranks first across the 10-D, 30-D, 50-D, and 100-D CEC 2017 test suites, highlighting its superior performance.

4.3. Convergence Curves and Dynamic Performance Analysis

To further evaluate the convergence performance of MSTPSO, convergence curves are plotted. It should be noted that due to minor differences in initialization strategies, the early-stage convergence behavior of different algorithms may vary. Figure 5 illustrates the convergence processes for functions f 1 f 17 in 50 dimensions. The proposed method achieves the fastest convergence on f 1 , demonstrating significant advantages in solving unimodal functions through multi-agent reinforcement learning-based parameter training. For multimodal functions, MSTPSO consistently shows the fastest convergence on f 5 f 10 . For hybrid functions, MSTPSO also converges quickly on f 11 f 13 , f 15 f 17 , with strong capability to escape local optima. These results confirm that MSTPSO possesses strong global search ability and robustness in solving complex hybrid optimization problems.

4.4. Boxplot Analysis and Robustness Evaluation

As shown in Figure 6, the boxplot analysis indicates that MSTPSO demonstrates varying performance across functions. For unimodal function f 1 , it shows strong convergence and stability, approaching the optimal solution quickly with almost no outliers. For multimodal functions f 4 , f 6 , f 7 , and f 10 , MSTPSO effectively avoids local optima, although a few outliers appear on f 10 , indicating occasional influence by local traps. For hybrid functions f 12 , f 16 , f 19 , and f 20 , MSTPSO maintains good solution quality and stability, with only minor fluctuations on f 19 and few outliers overall. For composition functions f 25 and f 30 , especially under high-dimensional complex conditions, MSTPSO shows strong global search capability, successfully escaping local optima with fast convergence and fewer outliers. In summary, MSTPSO demonstrates excellent performance across different optimization problems, particularly excelling on unimodal, multimodal, and composition functions.

4.5. Significance Test Analysis (Wilcoxon Rank-Sum Test)

To further verify the statistical superiority of MSTPSO, Table 7 presents the Wilcoxon signed-rank test results of ten algorithms on the CEC2017 benchmark functions under 10-, 30-, and 50-dimensional settings. A significance level of 0.05 was used to determine whether the performance differences are statistically meaningful, and the p-values obtained from the Wilcoxon test indicate the degree of significance. Here, “+” denotes that MSTPSO significantly outperforms the compared algorithm, “−” indicates significant inferiority, and “≈” represents no significant difference. Compared with advanced variants such as DQNPSO, APSO_SAC, XPSO, and DRA, MSTPSO achieved statistically superior results on nearly all test functions, with complete wins across all dimensions against XPSO and DRA. Against EPSO, KGPSO, and MSORL, MSTPSO also maintained clear statistical superiority with only a few non-significant cases. Compared with RFO and RLACA, MSTPSO is slightly better at low dimensions and achieves greater superiority at 30-D and 50-D. Overall, the Wilcoxon results confirm that MSTPSO significantly outperforms most representative PSO variants across the majority of test functions and dimensions, verifying the statistical reliability of its performance gains.
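For reference, the per-function comparisons summarized in Table 7 can be reproduced with a standard signed-rank test; the sketch below uses synthetic error values in place of the actual 51-run results, so the numbers are purely illustrative.

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical solution errors over 51 runs for MSTPSO and one competitor on a
# single function; in practice these come from the experiments behind Tables 3-6.
rng = np.random.default_rng(0)
errors_mstpso = rng.lognormal(mean=1.0, sigma=0.3, size=51)
errors_other = rng.lognormal(mean=1.3, sigma=0.3, size=51)

stat, p = wilcoxon(errors_mstpso, errors_other)      # paired signed-rank test
if p < 0.05:
    mark = "+" if np.median(errors_mstpso) < np.median(errors_other) else "-"
else:
    mark = "≈"
print(f"p = {p:.3g}, result: {mark}")
```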

4.6. Ablation Study on Topology Structures

4.6.1. Multi-Topology Ablation

To verify the contribution of each topology in MSTPSO, three ablation versions are tested: MSTPSO1 removes the FIPS topology, MSTPSO2 removes the small-world topology, and MSTPSO3 removes the exemplar topology. Their optimization performance on the 10- and 30-dimensional CEC2017 functions is compared with the full MSTPSO, as shown in Table 8.
The results show that MSTPSO achieves the best average ranking in both dimensions, confirming the effectiveness of the multi-topology integration in balancing global search and local convergence. The average rankings of MSTPSO are 2.31 in 10-D and 2.22 in 30-D, significantly better than the ablation versions. MSTPSO1 shows a clear decline in 10-D, especially on unimodal and multimodal functions, indicating the critical role of the FIPS topology in maintaining diversity and avoiding premature convergence. MSTPSO2 performs worse in multiple functions, particularly in 30-D, with an average rank of 3.81, demonstrating the importance of the small-world topology for global exploration. MSTPSO3 shows performance similar to MSTPSO2 but still inferior to the full version, especially on complex functions. Overall, MSTPSO’s superior performance validates the importance of multi-topology integration.

4.6.2. Ablation of Stagnation Detection and Hierarchical Response Mechanism

As shown in Table 9, the average ranking results indicate that introducing stagnation detection and the hierarchical response mechanism further improves overall performance in different dimensions. The complete algorithm achieves average ranks of 1.33, 1.43, and 1.47 in 10-D, 30-D, and 50-D, respectively, outperforming the version without this module. The improvement is particularly evident in high-complexity problems. On unimodal functions, both versions converge quickly, but MSTPSO ranks significantly higher in 10-D and 50-D, showing faster convergence and better escape ability. For multimodal functions, stagnation detection enhances global exploration, and in 30-D and 50-D, the average rankings are much better than the baseline, indicating higher solution quality and stability in complex spaces. For hybrid and composition functions, the advantage is most significant, with the complete MSTPSO achieving markedly better average rankings in 10-D and 30-D. These results confirm that stagnation detection and hierarchical response are key modules for improving MSTPSO’s robustness and adaptability in high-dimensional optimization.
To evaluate the impact of the stagnation detection thresholds on the stability of the algorithm, a parameter sensitivity study was conducted on the CEC2017 30-D and 50-D test suites, with the Friedman average rankings shown in Table 10, Table 11 and Table 12. The population stagnation threshold $T_p$, the individual stagnation threshold $T_i$, and the lower velocity bound $V_{\min}$ were set within the ranges $\{10, 15, 20, 30\}$, $\{5, 10, 15, 20\}$, and $\{10^{-4}, 10^{-5}, 10^{-6}\}$, respectively. The experimental results show that the performance of MSTPSO remains stable within these parameter ranges, with only slight fluctuations in the average rankings. The configuration $T_p = 20$, $T_i = 10$, and $V_{\min} = 10^{-5}$ achieves the best average rankings in both the 30-D and 50-D cases.

5. Conclusions

This paper proposes a multi-topology Particle Swarm Optimization algorithm controlled by Q-learning (MSTPSO). The method introduces a reinforcement learning–driven adaptive topology selection mechanism into the standard PSO framework, enabling particles to dynamically switch among FIPS, small-world, and exemplar-set topologies to balance global exploration and local exploitation. Meanwhile, a diversity-entropy regulation mechanism is designed to maintain population diversity, and, based on stagnation detection, DE perturbations combined with a global best restart strategy are incorporated to avoid premature convergence and enhance local search capability. Unlike most existing RL-PSO variants, which mainly use reinforcement learning to adapt inertia weight or learning factors, the proposed method focuses on dynamic topology selection and improves performance through the combination of dual-layer experience replay, stagnation detection, and DE perturbations, thus complementing existing approaches in mechanism design. Systematic experiments on the CEC2017 benchmark functions demonstrate that MSTPSO achieves superior performance on unimodal, multimodal, hybrid, and composite functions across four dimensionalities (10, 30, 50, and 100). Compared with various advanced PSO variants, the proposed algorithm attains faster convergence, higher solution accuracy, and better stability, showing particularly strong robustness on high-dimensional complex functions. Further ablation studies reveal that the Q-learning-based topology selection mechanism significantly improves overall convergence quality, while the stagnation-driven DE perturbation and global restart strategy effectively enhance the ability to escape local optima. Nevertheless, some limitations remain. First, the Q-learning parameter settings may require task-specific adjustment and show certain sensitivity. Second, computing diversity entropy and pairwise distance matrices incurs higher computational cost for large populations and high-dimensional problems. Finally, current experiments are limited to benchmark functions, and the applicability of the algorithm to complex constrained optimization and real engineering problems requires further investigation. Future work may proceed in several directions: (1) optimizing the reinforcement learning control strategy to reduce parameter sensitivity and improve efficiency on large-scale, high-dimensional problems; (2) extending the algorithm to more complex scenarios such as constrained, dynamic, and multiobjective optimization to further verify its generality; (3) exploring parallel computing and hardware acceleration methods, such as GPU computing and distributed architectures, to enhance scalability; and (4) applying the algorithm to real engineering tasks—including industrial scheduling, image and vision inspection, and path planning—to validate its practical value.

Author Contributions

X.H., conceptualization; work preparation; S.W., conceptualization; methodology; software; validation; formal analysis; investigation; resources; data curation; visualization; writing—original draft preparation; writing—review and editing; X.L., work preparation; data curation; T.W., conceptualization; supervision; writing—review and editing; project administration; funding acquisition; G.Q., conceptualization; supervision; Z.Z., writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article; our code is released at https://github.com/shenweiwang767-ai/MSTPSO.git, accessed on 15 October 2025.

Acknowledgments

The authors gratefully acknowledge the support from the Scientific Research Project of the Department of Education of Guangdong Province (Grant No. 2022ZDZX3034) and the Key Platform and Research Project for Ordinary Universities of Guangdong Province (Grant No. 2024ZDZX1009). The authors have reviewed and edited the output and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tanabe, R.; Fukunaga, A. Success-history based parameter adaptation for differential evolution. In Proceedings of the 2013 IEEE Congress on Evolutionary Computation, Cancun, Mexico, 20–23 June 2013; pp. 71–78. [Google Scholar]
  2. Liu, Y.; Liu, J.; Ding, J.; Yang, S.; Jin, Y. A surrogate-assisted differential evolution with knowledge transfer for expensive incremental optimization problems. IEEE Trans. Evol. Comput. 2023, 28, 1039–1053. [Google Scholar] [CrossRef]
  3. Lambora, A.; Gupta, K.; Chopra, K. Genetic algorithm—A literature review. In Proceedings of the 2019 International Conference on Machine Learning, Big Data, Cloud and Parallel Computing (COMITCon), Faridabad, India, 14–16 February 2019; pp. 380–384. [Google Scholar]
  4. Ngo, L.; Ha, H.; Chan, J.; Nguyen, V.; Zhang, H. High-dimensional Bayesian optimization via covariance matrix adaptation strategy. arXiv 2024, arXiv:2402.03104. [Google Scholar] [CrossRef]
  5. Xiao, S.; Wang, H.; Wang, W.; Huang, Z.; Zhou, X.; Xu, M. Artificial bee colony algorithm based on adaptive neighborhood search and Gaussian perturbation. Appl. Soft Comput. 2021, 100, 106955. [Google Scholar] [CrossRef]
  6. Shi, Y.; Eberhart, R. A modified particle swarm optimizer. In Proceedings of the 1998 IEEE International Conference on Evolutionary Computation Proceedings, San Diego, CA, USA, 25–27 March 1998; IEEE World Congress on Computational Intelligence (Cat. No. 98TH8360). pp. 69–73. [Google Scholar]
  7. Eberhart, R.; Shi, Y. Particle swarm optimization and its applications to VLSI design and video technology. In Proceedings of the 2005 IEEE International Workshop on VLSI Design and Video Technology, Suzhou, China, 28–30 May 2005; p. xxiii. [Google Scholar]
  8. Ratnaweera, A.; Halgamuge, S.K.; Watson, H.C. Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients. IEEE Trans. Evol. Comput. 2004, 8, 240–255. [Google Scholar] [CrossRef]
  9. Chen, Q.; Li, C.; Guo, W. Railway passenger volume forecast based on IPSO-BP neural network. In Proceedings of the 2009 International Conference on Information Technology and Computer Science, Kiev, Ukraine, 25–26 July 2009; Volume 2, pp. 255–258. [Google Scholar]
  10. Liu, H.R.; Cui, J.C.; Lu, Z.D.; Liu, D.Y.; Deng, Y.J. A hierarchical simple particle swarm optimization with mean dimensional information. Appl. Soft Comput. 2019, 76, 712–725. [Google Scholar] [CrossRef]
  11. El-Kenawy, E.S.; Eid, M. Hybrid gray wolf and particle swarm optimization for feature selection. Int. J. Innov. Comput. Inf. Control 2020, 16, 831–844. [Google Scholar]
  12. Adamu, A.; Abdullahi, M.; Junaidu, S.B.; Hassan, I.H. An hybrid particle swarm optimization with crow search algorithm for feature selection. Mach. Learn. Appl. 2021, 6, 100108. [Google Scholar] [CrossRef]
  13. Khan, T.A.; Ling, S.H. A novel hybrid gravitational search particle swarm optimization algorithm. Eng. Appl. Artif. Intell. 2021, 102, 104263. [Google Scholar] [CrossRef]
  14. Lin, A.; Liu, D.; Li, Z.; Hasanien, H.M.; Shi, Y. Heterogeneous differential evolution particle swarm optimization with local search. Complex Intell. Syst. 2023, 9, 6905–6925. [Google Scholar] [CrossRef]
  15. Lin, A.; Li, S.; Liu, R. Mutual learning differential particle swarm optimization. Egypt. Inform. J. 2022, 23, 469–481. [Google Scholar] [CrossRef]
  16. Chen, H.; Shen, L.Y.; Wang, C.; Tian, L.; Zhang, S. Multi Actors-Critic based particle swarm optimization algorithm. Neurocomputing 2025, 624, 129460. [Google Scholar] [CrossRef]
  17. Zhongxing, L.; Fangxi, Z.; Bingchen, L.; Shaohua, X. Quality monitoring of tractor hydraulic oil based on improved PSO-BPNN. J. Chin. Agric. Mech. 2024, 45, 140. [Google Scholar]
  18. Huang, Y.; Li, W.; Tian, F.; Meng, X. A fitness landscape ruggedness multiobjective differential evolution algorithm with a reinforcement learning strategy. Appl. Soft Comput. 2020, 96, 106693. [Google Scholar] [CrossRef]
  19. Lu, L.; Zheng, H.; Jie, J.; Zhang, M.; Dai, R. Reinforcement learning-based particle swarm optimization for sewage treatment control. Complex Intell. Syst. 2021, 7, 2199–2210. [Google Scholar] [CrossRef]
  20. Janson, S.; Middendorf, M. A hierarchical particle swarm optimizer and its adaptive variant. IEEE Trans. Syst. Man Cybern. Part 2005, 35, 1272–1282. [Google Scholar] [CrossRef] [PubMed]
  21. Zhang, J.; Zhang, J.; Ji, W.; Sun, X.; Zhang, L. Particle swarm optimization algorithm with self-correcting and dimension by dimension learning capabilities. J. Chin. Comput. Syst. 2021, 42, 919–926. [Google Scholar]
  22. Xia, X.; Gui, L.; He, G.; Wei, B.; Zhang, Y.; Yu, F.; Wu, H.; Zhan, Z.H. An expanded particle swarm optimization based on multi-exemplar and forgetting ability. Inf. Sci. 2020, 508, 105–120. [Google Scholar] [CrossRef]
  23. Jiang, S.; Ding, J.; Zhang, L. A personalized recommendation algorithm based on weighted information entropy and particle swarm optimization. Mob. Inf. Syst. 2021, 2021, 3209140. [Google Scholar] [CrossRef]
  24. Engelbrecht, A.P. Computational Intelligence: An Introduction; Wiley Online Library: Hoboken, NJ, USA, 2007; Volume 2. [Google Scholar]
  25. Liu, Q.; Van Wyk, B.J.; Sun, Y. Small world network based dynamic topology for particle swarm optimization. In Proceedings of the 2015 11th International Conference on Natural Computation (ICNC), Zhangjiajie, China, 15–17 August 2015; pp. 289–294. [Google Scholar]
  26. Zeng, N.; Wang, Z.; Liu, W.; Zhang, H.; Hone, K.; Liu, X. A dynamic neighborhood-based switching particle swarm optimization algorithm. IEEE Trans. Cybern. 2020, 52, 9290–9301. [Google Scholar] [CrossRef]
  27. Pan, L.; Zhao, Y.; Li, L. Neighborhood-based particle swarm optimization with discrete crossover for nonlinear equation systems. Swarm Evol. Comput. 2022, 69, 101019. [Google Scholar] [CrossRef]
  28. Li, T.; Shi, J.; Deng, W.; Hu, Z. Pyramid particle swarm optimization with novel strategies of competition and cooperation. Appl. Soft Comput. 2022, 121, 108731. [Google Scholar] [CrossRef]
  29. Li, W.; Liang, P.; Sun, B.; Sun, Y.; Huang, Y. Reinforcement learning-based particle swarm optimization with neighborhood differential mutation strategy. Swarm Evol. Comput. 2023, 78, 101274. [Google Scholar] [CrossRef]
  30. Jiang, C.; Wang, J. Time domain waveform synthesis method of shock response spectrum based on PSO-LSN algorithm. J. Vib. Shock 2024, 43, 102–107. [Google Scholar]
  31. Tatsis, V.A.; Parsopoulos, K.E. Dynamic parameter adaptation in metaheuristics using gradient approximation and line search. Appl. Soft Comput. 2019, 74, 368–384. [Google Scholar] [CrossRef]
  32. Liu, Y.; Lu, H.; Cheng, S.; Shi, Y. An adaptive online parameter control algorithm for particle swarm optimization based on reinforcement learning. In Proceedings of the 2019 IEEE Congress on Evolutionary Computation (CEC), Wellington, New Zealand, 10–13 June 2019; pp. 815–822. [Google Scholar]
  33. Yin, S.; Jin, M.; Lu, H.; Gong, G.; Mao, W.; Chen, G.; Li, W. Reinforcement-learning-based parameter adaptation method for particle swarm optimization. Complex Intell. Syst. 2023, 9, 5585–5609. [Google Scholar] [CrossRef]
  34. Hamad, Q.S.; Samma, H.; Suandi, S.A.; Mohamad-Saleh, J. Q-learning embedded sine cosine algorithm (QLESCA). Expert Syst. Appl. 2022, 193, 116417. [Google Scholar] [CrossRef]
  35. Hamad, Q.S.; Samma, H.; Suandi, S.A. Feature selection of pre-trained shallow CNN using the QLESCA optimizer: COVID-19 detection as a case study. Appl. Intell. 2023, 53, 18630–18652. [Google Scholar] [CrossRef]
  36. Wang, F.; Wang, X.; Sun, S. A reinforcement learning level-based particle swarm optimization algorithm for large-scale optimization. Inf. Sci. 2022, 602, 298–312. [Google Scholar] [CrossRef]
  37. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, UK, 1998; Volume 1. [Google Scholar]
38. Watkins, C.J.; Dayan, P. Q-learning. Mach. Learn. 1992, 8, 279–292.
39. Qiang, W.; Zhongli, Z. Reinforcement learning model, algorithms and its application. In Proceedings of the 2011 International Conference on Mechatronic Science, Electric Engineering and Computer (MEC), Jilin, China, 19–22 August 2011; pp. 1143–1146.
40. Awad, N.H.; Ali, M.Z.; Suganthan, P.N. Ensemble sinusoidal differential covariance matrix adaptation with Euclidean neighborhood for solving CEC2017 benchmark problems. In Proceedings of the 2017 IEEE Congress on Evolutionary Computation (CEC), Donostia/San Sebastian, Spain, 5–8 June 2017; pp. 372–379.
41. Mendes, R.; Kennedy, J.; Neves, J. The fully informed particle swarm: Simpler, maybe better. IEEE Trans. Evol. Comput. 2004, 8, 204–210.
42. Wu, X.; Han, J.; Wang, D.; Gao, P.; Cui, Q.; Chen, L.; Liang, Y.; Huang, H.; Lee, H.P.; Miao, C.; et al. Incorporating surprisingly popular algorithm and euclidean distance-based adaptive topology into PSO. Swarm Evol. Comput. 2023, 76, 101222.
43. Aoun, O. Deep Q-network-enhanced self-tuning control of particle swarm optimization. Modelling 2024, 5, 1709–1728.
44. von Eschwege, D.; Engelbrecht, A. Soft actor-critic approach to self-adaptive particle swarm optimisation. Mathematics 2024, 12, 3481.
45. Yuan, Y.L.; Hu, C.M.; Li, L.; Mei, Y.; Wang, X.Y. Regional-modal optimization problems and corresponding normal search particle swarm optimization algorithm. Swarm Evol. Comput. 2023, 78, 101257.
46. Zhang, D.; Ma, G.; Deng, Z.; Wang, Q.; Zhang, G.; Zhou, W. A self-adaptive gradient-based particle swarm optimization algorithm with dynamic population topology. Appl. Soft Comput. 2022, 130, 109660.
47. Wang, X.; Wang, F.; He, Q.; Guo, Y. A multi-swarm optimizer with a reinforcement learning mechanism for large-scale optimization. Swarm Evol. Comput. 2024, 86, 101486.
48. Mozhdehi, A.T.; Khodadadi, N.; Aboutalebi, M.; El-kenawy, E.S.M.; Hussien, A.G.; Zhao, W.; Nadimi-Shahraki, M.H.; Mirjalili, S. Divine Religions Algorithm: A novel social-inspired metaheuristic algorithm for engineering and continuous optimization problems. Clust. Comput. 2025, 28, 253.
49. Braik, M.; Al-Hiary, H. Rüppell’s fox optimizer: A novel meta-heuristic approach for solving global optimization problems. Clust. Comput. 2025, 28, 1–77.
50. Liu, X.; Wang, T.; Zeng, Z.; Tian, Y.; Tong, J. Three stage based reinforcement learning for combining multiple metaheuristic algorithms. Swarm Evol. Comput. 2025, 95, 101935.
Figure 1. The model of Q-learning.
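For readers unfamiliar with the tabular Q-learning model depicted in Figure 1 and introduced by Watkins and Dayan [38], the following minimal Python sketch illustrates epsilon-greedy action selection and the one-step value update. The state and action encoding, reward, learning rate, discount factor, and exploration rate shown here are illustrative assumptions, not the settings used by MSTPSO.

```python
import random

def epsilon_greedy(q_row, epsilon=0.2):
    """Choose a random action with probability epsilon, otherwise the current greedy action."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(Q, s, a, reward, s_next, alpha=0.1, gamma=0.9):
    """One-step Q-learning update: Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    td_target = reward + gamma * max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])

# Toy usage: 3 hypothetical swarm states and 3 hypothetical actions (e.g., candidate topologies).
Q = [[0.0] * 3 for _ in range(3)]
state = 0
action = epsilon_greedy(Q[state])
q_update(Q, state, action, reward=1.0, s_next=1)
```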
Figure 2. Algorithmic flow diagram.
Figure 3. Topology comparison chart.
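Figure 3 contrasts the neighborhood structures used by the algorithm. As a rough illustration of how a small-world neighborhood can be generated, the sketch below builds a ring lattice and rewires each edge with probability p in the spirit of the Watts–Strogatz model; the swarm size, neighborhood size k, and rewiring probability p are placeholder values and are not the parameters reported in the paper.

```python
import random

def small_world_neighbors(n, k=4, p=0.1, seed=None):
    """Ring lattice with k nearest neighbors per particle; each edge is rewired with probability p."""
    rng = random.Random(seed)
    edges = []
    for i in range(n):
        for j in range(1, k // 2 + 1):
            edges.append((i, (i + j) % n))
    neighbors = {i: set() for i in range(n)}
    for u, v in edges:
        if rng.random() < p:                      # rewire the far endpoint to a random particle
            v = rng.randrange(n)
            while v == u or v in neighbors[u]:
                v = rng.randrange(n)
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors

# Each particle would then exchange personal-best information only within its neighborhood.
nbrs = small_world_neighbors(n=30, k=4, p=0.1, seed=1)
print(nbrs[0])
```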
Figure 4. Differential evolution perturbations.
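Figure 4 sketches the differential-evolution perturbation. For illustration only, the snippet below applies the classic DE/rand/1 mutation with binomial crossover inside the [−100, 100] box of Table 1; the scale factor F, crossover rate CR, and the choice of this particular DE variant are assumptions, not necessarily the exact operator used by MSTPSO.

```python
import numpy as np

def de_rand_1_perturb(pop, i, F=0.5, CR=0.9, lower=-100.0, upper=100.0):
    """Build a trial vector for particle i via DE/rand/1 mutation and binomial crossover."""
    n, dim = pop.shape
    r1, r2, r3 = np.random.choice([j for j in range(n) if j != i], size=3, replace=False)
    mutant = pop[r1] + F * (pop[r2] - pop[r3])        # differential mutation
    cross = np.random.rand(dim) < CR                  # binomial crossover mask
    cross[np.random.randint(dim)] = True              # guarantee at least one mutated dimension
    trial = np.where(cross, mutant, pop[i])
    return np.clip(trial, lower, upper)               # keep the trial inside the search box

# Usage: perturb particle 0 of a random 30-particle, 10-dimensional swarm.
swarm = np.random.uniform(-100, 100, size=(30, 10))
trial = de_rand_1_perturb(swarm, 0)
```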
Figure 5. Convergence curves of MSTPSO and the compared PSO variants on the CEC2017 benchmark suite (50-D).
Figure 6. Stability comparison of MSTPSO and the compared PSO variants on the CEC2017 benchmark suite (50-D).
Table 1. The information of the test suite on CEC2017.
No. | Function Name | Range
f1 | Shifted and Rotated Bent Cigar Function | [−100, 100]
f3 | Shifted and Rotated Zakharov Function | [−100, 100]
f4 | Shifted and Rotated Rosenbrock’s Function | [−100, 100]
f5 | Shifted and Rotated Rastrigin’s Function | [−100, 100]
f6 | Shifted and Rotated Expanded Scaffer’s Function | [−100, 100]
f7 | Shifted and Rotated Lunacek Bi-Rastrigin’s Function | [−100, 100]
f8 | Shifted and Rotated Non-Continuous Rastrigin’s Function | [−100, 100]
f9 | Shifted and Rotated Levy Function | [−100, 100]
f10 | Shifted and Rotated Schwefel’s Function | [−100, 100]
f11 | Hybrid Function 1 (N = 3) | [−100, 100]
f12 | Hybrid Function 2 (N = 3) | [−100, 100]
f13 | Hybrid Function 3 (N = 3) | [−100, 100]
f14 | Hybrid Function 4 (N = 4) | [−100, 100]
f15 | Hybrid Function 5 (N = 4) | [−100, 100]
f16 | Hybrid Function 6 (N = 4) | [−100, 100]
f17 | Hybrid Function 7 (N = 5) | [−100, 100]
f18 | Hybrid Function 8 (N = 5) | [−100, 100]
f19 | Hybrid Function 9 (N = 5) | [−100, 100]
f20 | Hybrid Function 10 (N = 6) | [−100, 100]
f21 | Composition Function 1 (N = 3) | [−100, 100]
f22 | Composition Function 2 (N = 3) | [−100, 100]
f23 | Composition Function 3 (N = 4) | [−100, 100]
f24 | Composition Function 4 (N = 4) | [−100, 100]
f25 | Composition Function 5 (N = 5) | [−100, 100]
f26 | Composition Function 6 (N = 5) | [−100, 100]
f27 | Composition Function 7 (N = 6) | [−100, 100]
f28 | Composition Function 8 (N = 6) | [−100, 100]
f29 | Composition Function 9 (N = 3) | [−100, 100]
f30 | Composition Function 10 (N = 3) | [−100, 100]
Table 2. Parameter settings for the nine compared algorithms.
Algorithm | Year | Parameter Settings
DQNPSO | 2024 | c1 = 2, c2 = 2, ω = 0.5, γ = 0.001
APSO_SAC | 2024 | β = 1, τ = 0.005, γ = 0.001
EPSO | 2023 | c1 = 2, c2 = 2, ω = 0.5
KGPSO | 2020 | c1 = 2.13–1.13, c2 = [0.93, 1.75], c3 = [0.6, 1.3], ω = 0.8–0.35
XPSO | 2023 | η = 0.2, Stag_max = 5, p = 0.2, ω = [0.4, 0.9]
MSORL | 2025 | ϕ1 = ϕ2 = 0.4, α = 0.4, γ = 0.8
DRA | 2025 | N = 50, BSP = 0.5, RP = 0.2
RFO | 2025 | L = 100, β = 10^4
RLACA | 2023 | c1 = 2.2, c2 = 1.8, ω = [0, 1]
Table 3. The comparative results of MSTPSO and advanced algorithms on 10-D using the CEC2017 benchmark suite.
Function | 10-D | MSTPSO | DQNPSO | APSO_SAC | EPSO | KGPSO | XPSO | MSORL | DRA | RFO | RLACA
f1Mean6.07 × 1022.15 × 1086.67 × 1081.02 × 1039.45 × 1076.76 × 1067.66 × 1071.20 × 10106.44 × 1052.77 × 102
Std2.65 × 10−21.09 × 1082.581.03 × 10−31.08 × 1064.83 × 1094.09 × 10−25.66 × 1087.01 × 10−52.27 × 10−2
f3Mean01.061.156.49 × 10−101.09 × 1021.50 × 1024.07 × 1021.50 × 1041.64 × 1020
Std2.10 × 10−18.37 × 1051.16 × 1025.82 × 10−112.54 × 10−133.50 × 1096.31 × 1031.66 × 1031.85 × 10−80
f4Mean7.11 × 10−21.95 × 1013.80 × 1013.711.14 × 1017.062.97 × 1019.12 × 1028.600
Std5.78 × 10−48.35 × 1017.08 × 10−11.42 × 10−34.871.13 × 1034.85 × 10−81.29 × 1025.77 × 10−110
f5Mean2.201.45 × 1012.20 × 1013.06 × 1019.832.48 × 1014.69 × 1019.43 × 1012.04 × 1011.88 × 101
Std3.54 × 10−22.39 × 1011.561.543.27 × 1013.87 × 1012.64 × 1011.50 × 1015.02 × 10−112.48 × 10−7
f6Mean4.78 × 10−75.32 × 10−18.51 × 10−11.251.86 × 10−12.112.41 × 1015.23 × 1014.788.05 × 10−1
Std1.10 × 10−11.35 × 10−11.64 × 10−22.80 × 10−58.37 × 10−91.27 × 1013.48 × 1017.812.36 × 10−71.27 × 10−9
f7Mean1.18 × 1011.81 × 1012.39 × 1014.04 × 1012.38 × 1014.83 × 1012.31 × 1011.19 × 1022.99 × 1012.37 × 101
Std3.35 × 10−47.71 × 1013.032.71 × 10−13.32 × 1013.58 × 1013.091.22 × 1013.27 × 10−109.71 × 10−14
f8Mean1.581.16 × 1012.02 × 1011.56 × 1016.902.25 × 1011.92 × 1016.06 × 1011.73 × 1011.91 × 101
Std3.88 × 10−12.54 × 1011.281.60 × 10−13.29 × 1012.41 × 1013.10 × 1018.066.25 × 10−111.99 × 10−9
f9Mean01.62 × 10−11.65 × 1015.53 × 10103.162.85 × 1017.33 × 1022.97 × 1011.78 × 10−2
Std2.96 × 10−21.49 × 10−118.37 × 10−24.37 × 10−19.72 × 10−144.71 × 1021.05 × 1033.00 × 1022.12 × 10−101.38 × 10−14
f10Mean1.54 × 1024.53 × 1025.64 × 1028.77 × 1024.05 × 1028.28 × 1021.86 × 1031.75 × 1036.96 × 1026.63 × 102
Std7.36 × 10−125.44 × 1026.02 × 1018.533.31 × 1026.93 × 1023.45 × 1022.24 × 1023.93 × 10−91.18
f11Mean7.48 × 10−12.65 × 1014.66 × 1011.24 × 1019.311.63 × 1014.86 × 1011.87 × 1035.65 × 1011.77 × 101
Std1.76 × 10−42.23 × 1071.832.81 × 10−11.52 × 1012.98 × 1063.39 × 10−11.33 × 1042.01 × 10−104.66 × 10−7
f12Mean1.71 × 1046.71 × 1052.75 × 1067.51 × 1036.58 × 1054.65 × 1051.92 × 1042.99 × 1081.91 × 1038.35 × 103
Std3.63 × 10−14.72 × 1082.38 × 1055.21 × 10−23.19 × 1071.25 × 1091.97 × 10−31.52 × 1085.73 × 10−73.33
f13Mean1.89 × 1039.03 × 1031.04 × 1048.01 × 1036.25 × 1037.36 × 1034.36 × 1025.24 × 1042.64 × 1021.73 × 102
Std2.48 × 1045.32 × 1083.47 × 1032.44 × 10−11.51 × 1073.18 × 1082.59 × 1041.82 × 1076.71 × 10−81.08 × 10−1
f14Mean4.00 × 1011.17 × 1021.93 × 1021.13 × 1035.88 × 1017.48 × 1014.49 × 1012.29 × 1023.03 × 1015.56 × 101
Std1.59 × 1021.84 × 1087.48 × 1025.55 × 10−16.54 × 1063.65 × 1084.03 × 1042.63 × 1042.90 × 10−92.36 × 10−2
f15Mean2.36 × 1029.14 × 1024.52 × 1021.38 × 1036.83 × 1021.70 × 1027.61 × 1015.62 × 1035.38 × 1014.18 × 101
Std4.75 × 1041.98 × 1092.35 × 1034.38 × 10−12.58 × 1076.49 × 1082.05 × 1053.41 × 1047.83 × 10−89.40 × 10−1
f16Mean9.10 × 10−17.53 × 1018.93 × 1011.32 × 1021.38 × 1027.68 × 1013.62 × 1024.81 × 1026.40 × 1011.11 × 102
Std1.99 × 10−25.48 × 1021.22 × 1011.58 × 1012.08 × 1026.45 × 1022.12 × 1021.46 × 1027.40 × 10−81.76 × 10−1
f17Mean1.88 × 1014.73 × 1017.12 × 1014.61 × 1014.60 × 1014.21 × 1011.14 × 1021.32 × 1024.48 × 1015.02 × 101
Std1.501.05 × 1047.551.121.63 × 1028.43 × 1021.70 × 1027.43 × 1015.22 × 10−91.33 × 10−3
f18Mean2.82 × 1031.12 × 1042.22 × 1041.17 × 1041.73 × 1041.39 × 1041.24 × 1028.39 × 1068.42 × 1012.56 × 102
Std7.89 × 1041.75 × 1092.98 × 1043.111.62 × 1083.15 × 1091.42 × 10−12.82 × 1087.11 × 10−81.22 × 10−2
f19Mean8.71 × 1012.76 × 1031.25 × 1035.81 × 1034.92 × 1031.40 × 1022.36 × 1014.52 × 1031.98 × 1012.46 × 101
Std4.30 × 1049.72 × 1082.47 × 1034.65 × 10−19.36 × 1071.66 × 1093.00 × 1042.97 × 1076.42 × 10−102.12 × 10−3
f20Mean1.39 × 1011.94 × 1013.87 × 1013.77 × 1017.36 × 1016.04 × 1012.96 × 1022.06 × 1024.56 × 1017.54 × 101
Std2.38 × 10−18.52 × 1012.416.294.74 × 1013.47 × 1021.68 × 1026.90 × 1013.83 × 10−81.10 × 10−3
f21Mean1.55 × 1022.10 × 1022.15 × 1022.16 × 1021.99 × 1022.12 × 1022.35 × 1022.40 × 1022.10 × 1021.57 × 102
Std4.292.68 × 1011.253.70 × 10−13.07 × 1015.88 × 1016.00 × 1012.85 × 1014.78 × 10−118.87 × 10−5
f22Mean1.00 × 1021.32 × 1021.18 × 1021.03 × 1021.14 × 1021.11 × 1021.43 × 1021.09 × 1031.08 × 1021.02 × 102
Std8.10 × 10−11.57 × 1016.76 × 10−13.60 × 10−22.981.69 × 1021.25 × 1039.00 × 1013.93 × 10−103.96 × 10−13
f23Mean3.01 × 1023.23 × 1023.36 × 1023.22 × 1023.24 × 1023.25 × 1024.13 × 1024.45 × 1023.23 × 1023.20 × 102
Std2.05 × 10−11.83 × 1016.27 × 10−11.316.758.53 × 1014.01 × 1022.69 × 1011.15 × 10−101.77 × 10−2
f24Mean3.15 × 1023.38 × 1023.67 × 1023.38 × 1023.21 × 1023.34 × 1022.82 × 1024.39 × 1023.35 × 1023.42 × 102
Std7.65 × 10−12.87 × 1011.028.03 × 10−11.66 × 1018.27 × 1011.33 × 1022.47 × 1019.55 × 10−111.00 × 10−2
f25Mean4.32 × 1024.39 × 1024.50 × 1024.29 × 1024.30 × 1024.33 × 1024.35 × 1021.08 × 1034.34 × 1024.25 × 102
Std1.04 × 10−12.74 × 1017.72 × 10−17.40 × 10−32.31 × 10−27.22 × 1024.886.35 × 1018.83 × 10−111.53 × 10−3
f26Mean3.00 × 1025.16 × 1025.45 × 1023.59 × 1023.66 × 1022.98 × 1026.95 × 1021.50 × 1035.34 × 1023.51 × 102
Std4.49 × 10−11.02 × 1012.995.38 × 10−67.336.92 × 1025.69 × 1021.49 × 1023.00 × 10−104.71 × 10−4
f27Mean3.95 × 1024.08 × 1024.14 × 1023.96 × 1024.16 × 1024.01 × 1025.04 × 1024.81 × 1024.07 × 1023.75 × 102
Std1.812.03 × 1019.94 × 10−22.73 × 10−24.05 × 10−28.65 × 1023.11 × 1022.98 × 1015.35 × 10−116.27 × 10−1
f28Mean3.00 × 1025.71 × 1025.21 × 1025.13 × 1025.12 × 1025.04 × 1025.85 × 1029.46 × 1025.09 × 1024.02 × 102
Std8.681.94 × 1016.76 × 10−28.13 × 10−37.84 × 10−131.22 × 1033.79 × 1015.00 × 1016.02 × 10−112.62 × 101
f29Mean2.52 × 1022.88 × 1023.13 × 1023.08 × 1022.87 × 1022.92 × 1023.83 × 1025.27 × 1022.75 × 1022.83 × 102
Std4.69 × 10−22.10 × 1039.916.912.19 × 1025.59 × 1052.91 × 1031.34 × 1023.73 × 10−94.64 × 10−1
f30Mean1.00 × 1054.68 × 1055.62 × 1055.39 × 1052.97 × 1054.01 × 1056.26 × 1048.06 × 1063.53 × 1052.13 × 103
Std3.29 × 1062.68 × 1081.17 × 1041.20 × 1021.12 × 1077.71 × 1082.40 × 1071.12 × 1073.01 × 10−91.16 × 105
Table 4. The comparative results of MSTPSO and advanced algorithms on 30-D using the CEC2017 benchmark suite.
Function | 30-D | MSTPSO | DQNPSO | APSO_SAC | EPSO | KGPSO | XPSO | MSORL | DRA | RFO | RLACA
f1Mean1.42 × 1037.01 × 1096.54 × 1093.80 × 1031.06 × 1091.34 × 1091.33 × 10106.40 × 10102.69 × 10101.66 × 10−2
Std1.42 × 10−11.47 × 1082.19 × 1034.46 × 10−34.79 × 1036.91 × 1096.85 × 10−34.06 × 1074.71 × 10−28.71 × 10−8
f3Mean1.58 × 1036.79 × 1034.57 × 1045.59 × 1036.64 × 1028.48 × 1033.03 × 1048.50 × 1046.48 × 1040
Std3.19 × 1076.95 × 10132.89 × 1036.295.16 × 10−42.05 × 10132.80 × 10−75.61 × 1039.33 × 10−77.15 × 10−14
f4Mean8.66 × 1011.06 × 1037.13 × 1027.89 × 1011.72 × 1022.59 × 1022.25 × 1031.63 × 1044.18 × 1035.02 × 101
Std2.46 × 1025.39 × 1011.132.34 × 10−36.553.01 × 1041.51 × 10−77.43 × 1011.07 × 10−86.33 × 10−1
f5Mean1.23 × 1011.22 × 1021.66 × 1021.47 × 1024.67 × 1011.87 × 1022.43 × 1024.49 × 1022.55 × 1021.11 × 102
Std6.84 × 10−12.762.821.056.62 × 1015.66 × 1015.68 × 1011.13 × 1011.43 × 10−91.02 × 10−4
f6Mean1.88 × 10−31.18 × 1011.68 × 1012.63 × 1011.011.60 × 1015.69 × 1019.21 × 1014.86 × 1018.75
Std1.12 × 10−64.69 × 10−11.09 × 10−11.78 × 10−41.46 × 10−89.762.24 × 1013.605.00 × 10−107.64 × 10−8
f7Mean3.89 × 1012.18 × 1022.32 × 1022.35 × 1021.02 × 1023.10 × 1023.17 × 1027.85 × 1025.63 × 1021.29 × 102
Std6.043.461.47 × 1012.128.00 × 1019.00 × 1012.13 × 10−11.19 × 1015.94 × 10−91.48 × 10−13
f8Mean1.14 × 1011.21 × 1021.38 × 1021.25 × 1024.01 × 1011.83 × 1021.77 × 1023.65 × 1022.17 × 1021.08 × 102
Std3.752.023.681.836.54 × 1015.13 × 1015.92 × 1017.891.35 × 10−93.60 × 10−3
f9Mean6.91 × 10−21.64 × 1032.16 × 1032.98 × 1036.47 × 1016.70 × 1022.65 × 1031.04 × 1044.14 × 1032.23 × 102
Std4.64 × 10−63.32 × 1035.82 × 1023.70 × 1015.22 × 1012.07 × 1031.28 × 1031.31 × 1031.84 × 10−71.52 × 10−2
f10Mean1.77 × 1033.22 × 1033.69 × 1033.63 × 1032.30 × 1036.67 × 1036.78 × 1038.25 × 1035.08 × 1033.37 × 103
Std1.201.09 × 1037.65 × 1024.721.74 × 1031.10 × 1038.34 × 1022.07 × 1023.76 × 10−81.81 × 101
f11Mean2.77 × 1013.80 × 1027.50 × 1021.21 × 1029.27 × 1013.62 × 1026.48 × 1027.86 × 1031.36 × 1031.25 × 102
Std1.91 × 1014.42 × 1041.016.75 × 10−53.231.29 × 1081.15 × 10−51.05 × 1041.73 × 10−81.76 × 10−1
f12Mean2.33 × 1043.50 × 1082.70 × 1081.52 × 1056.60 × 1071.11 × 1089.22 × 1081.63 × 10109.87 × 1081.55 × 105
Std1.02 × 1054.09 × 1071.02 × 1064.16 × 1015.21 × 1062.57 × 1092.15 × 10−36.72 × 1075.46 × 10−31.02 × 104
f13Mean5.45 × 1031.64 × 1085.68 × 1081.19 × 1044.71 × 1071.21 × 1072.75 × 1079.49 × 1091.63 × 1051.39 × 104
Std1.37 × 1011.67 × 1084.36 × 1012.97 × 10−13.08 × 1016.75 × 1097.35 × 10−13.74 × 1082.30 × 10−52.99 × 10−1
f14Mean2.60 × 1032.65 × 1046.00 × 1041.16 × 1049.28 × 1032.73 × 1043.21 × 1033.80 × 1063.11 × 1023.11 × 102
Std2.313.29 × 1084.91 × 1045.56 × 10−16.88 × 1064.60 × 1083.19 × 10−85.38 × 1066.18 × 10−87.95 × 10−2
f15Mean3.15 × 1036.16 × 1046.42 × 1042.35 × 1036.17 × 1035.24 × 1051.31 × 1041.02 × 1091.91 × 1043.43 × 103
Std8.838.60 × 1072.92 × 1042.42 × 10−36.41 × 1052.56 × 1094.04 × 10−73.43 × 1075.06 × 10−61.08 × 10−2
f16Mean2.12 × 1028.88 × 1021.34 × 1031.14 × 1036.14 × 1021.39 × 1031.86 × 1034.43 × 1031.24 × 1039.41 × 102
Std1.95 × 1014.61 × 1021.09 × 1011.86 × 1013.70 × 1022.89 × 1034.03 × 1023.81 × 1028.63 × 10−91.45
f17Mean3.05 × 1014.64 × 1027.04 × 1024.14 × 1022.65 × 1023.18 × 1028.90 × 1023.64 × 1034.95 × 1024.69 × 102
Std1.14 × 10−29.03 × 1032.00 × 1014.40 × 1011.93 × 1023.27 × 1042.31 × 1023.34 × 1023.99 × 10−92.13
f18Mean8.13 × 1042.42 × 1051.57 × 1061.26 × 1052.35 × 1058.33 × 1051.35 × 1053.51 × 1074.53 × 1042.38 × 104
Std4.28 × 1011.02 × 1092.89 × 1055.509.20 × 1071.98 × 1094.79 × 10−72.79 × 1073.24 × 10−61.85 × 104
f19Mean5.31 × 1031.94 × 1066.53 × 1074.85 × 1039.62 × 1031.02 × 1063.11 × 1059.23 × 1082.28 × 1051.23 × 103
Std5.07 × 10−29.31 × 1062.82 × 1025.34 × 10−48.94 × 1051.48 × 1095.08 × 1013.37 × 1071.70 × 10−52.96 × 10−2
f20Mean1.38 × 1023.00 × 1024.91 × 1024.02 × 1022.32 × 1024.01 × 1027.35 × 1021.03 × 1034.18 × 1025.16 × 102
Std2.78 × 10−22.35 × 1023.69 × 1011.84 × 1011.62 × 1025.72 × 1023.99 × 1021.32 × 1025.76 × 10−91.36
f21Mean2.12 × 1023.27 × 1023.55 × 1023.16 × 1022.58 × 1023.78 × 1024.52 × 1026.69 × 1024.48 × 1023.11 × 102
Std5.96 × 10−11.034.332.756.82 × 1016.36 × 1017.99 × 1011.50 × 1011.32 × 10−91.85 × 10−2
f22Mean1.00 × 1022.55 × 1033.76 × 1033.57 × 1028.81 × 1026.13 × 1023.45 × 1037.71 × 1034.58 × 1031.47 × 103
Std1.34 × 10−36.98 × 1023.59 × 1021.02 × 10−87.62 × 1011.03 × 1032.82 × 1031.40 × 1022.64 × 10−88.03
f23Mean3.48 × 1025.86 × 1026.79 × 1024.93 × 1025.69 × 1025.43 × 1029.94 × 1021.27 × 1037.15 × 1025.29 × 102
Std2.826.21 × 10−12.242.236.942.84 × 1026.45 × 1025.20 × 1011.41 × 10−91.28 × 10−1
f24Mean4.23 × 1027.15 × 1027.31 × 1025.60 × 1026.58 × 1026.05 × 1029.45 × 1021.40 × 1037.64 × 1026.23 × 102
Std1.331.49 × 1016.483.682.17 × 1012.14 × 1022.35 × 1013.12 × 1011.20 × 10−96.97 × 10−2
f25Mean3.88 × 1026.64 × 1026.70 × 1023.98 × 1023.99 × 1024.95 × 1028.71 × 1023.04 × 1031.49 × 1033.80 × 102
Std8.403.67 × 1018.41 × 10−14.81 × 10−135.13 × 10−132.24 × 1039.03 × 10−132.14 × 1012.98 × 10−99.24 × 10−2
f26Mean9.17 × 1023.47 × 1033.47 × 1032.14 × 1031.65 × 1032.13 × 1035.25 × 1039.73 × 1035.13 × 1031.87 × 103
Std1.00 × 10−22.80 × 1011.44 × 1011.10 × 1017.20 × 1012.39 × 1039.83 × 1017.23 × 1011.42 × 10−86.39 × 10−1
f27Mean5.17 × 1026.04 × 1026.04 × 1025.30 × 1025.88 × 1025.64 × 1021.09 × 1031.71 × 1036.55 × 1025.00 × 102
Std6.225.28 × 10−13.68 × 10−14.83 × 10−51.81 × 10−11.49 × 1034.55 × 1027.85 × 1015.51 × 10−109.05 × 101
f28Mean3.57 × 1021.19 × 1031.05 × 1034.01 × 1024.97 × 1025.74 × 1021.35 × 1035.23 × 1032.03 × 1033.66 × 102
Std2.84 × 1015.91 × 10−44.59 × 10−12.24 × 10−31.70 × 1015.63 × 1034.56 × 10−127.742.88 × 10−91.58 × 10−1
f29Mean4.43 × 1021.04 × 1031.16 × 1038.33 × 1026.55 × 1021.07 × 1032.32 × 1036.62 × 1031.40 × 1037.72 × 102
Std2.54 × 10−13.62 × 1046.943.86 × 1011.20 × 1026.77 × 1051.37 × 1024.12 × 1025.56 × 10−92.41 × 101
f30Mean4.16 × 1034.87 × 1061.85 × 1064.96 × 1039.40 × 1044.27 × 1061.32 × 1071.81 × 1092.16 × 1061.17 × 104
Std1.58 × 1031.19 × 1063.77 × 1033.21 × 10−21.67 × 1059.84 × 1088.786.46 × 1074.54 × 10−52.85 × 105
Table 5. The comparative results of MSTPSO and advanced algorithms on 50-D using the CEC2017 benchmark suite.
Function | 50-D | MSTPSO | DQNPSO | APSO_SAC | EPSO | KGPSO | XPSO | MSORL | DRA | RFO | RLACA
f1Mean1.97 × 1033.59 × 10102.14 × 10101.62 × 1033.64 × 1096.26 × 1094.81 × 10101.21 × 10118.59 × 10103.43 × 103
Std2.27 × 10−32.97 × 1081.75 × 1061.02 × 10−44.95 × 1011.04 × 10108.56 × 10−34.19 × 1068.60 × 10−24.35 × 10−5
f3Mean1.60 × 1043.14 × 1041.35 × 1053.25 × 1043.69 × 1032.47 × 1049.15 × 1041.81 × 1051.67 × 1050
Std1.11 × 1037.63 × 10148.05 × 1038.242.00 × 1076.11 × 10141.09 × 1052.06 × 1052.27 × 10−67.01 × 10−13
f4Mean1.84 × 1024.37 × 1032.85 × 1031.16 × 1023.56 × 1027.01 × 1029.18 × 1034.37 × 1041.78 × 1048.65 × 101
Std1.59 × 1021.54 × 1022.567.57 × 10−42.19 × 1011.59 × 1042.86 × 10−106.562.77 × 10−88.86 × 10−1
f5Mean2.45 × 1012.77 × 1023.40 × 1022.70 × 1028.98 × 1013.91 × 1024.55 × 1027.29 × 1025.39 × 1022.48 × 102
Std1.06 × 10−21.93 × 10−12.864.56 × 10−36.03 × 1016.10 × 1015.65 × 1014.132.17 × 10−94.24 × 10−9
f6Mean3.77 × 10−23.09 × 1013.52 × 1014.04 × 1011.882.43 × 1017.02 × 1011.05 × 1026.90 × 1012.11 × 101
Std6.42 × 10−21.11 × 10−12.71 × 10−18.75 × 10−53.08 × 10−88.041.83 × 1012.875.57 × 10−101.23 × 10−7
f7Mean6.70 × 1018.42 × 1027.46 × 1025.36 × 1021.48 × 1026.55 × 1028.06 × 1021.44 × 1031.32 × 1033.25 × 102
Std7.842.56 × 1015.714.521.06 × 1021.31 × 1022.98 × 10−108.978.97 × 10−91.77 × 10−13
f8Mean2.65 × 1012.90 × 1023.57 × 1022.78 × 1029.09 × 1013.87 × 1024.71 × 1027.48 × 1025.57 × 1022.41 × 102
Std1.32 × 10−13.35 × 10−11.251.436.28 × 1016.35 × 1015.53 × 1018.062.14 × 10−91.14 × 10−2
f9Mean7.86 × 10−17.51 × 1031.20 × 1049.69 × 1038.36 × 1022.74 × 1031.42 × 1043.78 × 1042.07 × 1042.19 × 103
Std7.19 × 10−108.59 × 1037.00 × 1033.03 × 1017.03 × 1027.86 × 1034.15 × 1032.41 × 1039.71 × 10−77.20 × 10−2
f10Mean3.35 × 1036.33 × 1037.09 × 1036.85 × 1034.32 × 1031.25 × 1041.20 × 1041.41 × 1041.06 × 1046.55 × 103
Std1.001.51 × 1031.28 × 1031.22 × 1022.25 × 1031.42 × 1031.46 × 1032.69 × 1025.34 × 10−84.04 × 101
f11Mean6.72 × 1018.17 × 1021.83 × 1032.03 × 1022.01 × 1029.85 × 1024.49 × 1032.62 × 1041.16 × 1042.75 × 102
Std1.07 × 1021.91 × 1087.83 × 1012.98 × 10−51.522.29 × 1093.38 × 10−104.86 × 1021.10 × 10−75.25 × 10−2
f12Mean1.82 × 1056.56 × 1098.12 × 1099.43 × 1052.46 × 1091.22 × 1091.41 × 10101.01 × 10112.40 × 10102.21 × 106
Std4.44 × 1059.40 × 1072.34 × 1075.74 × 1015.54 × 1071.19 × 10105.73 × 10−36.81 × 1074.94 × 10−23.08 × 103
f13Mean2.24 × 1031.77 × 1092.00 × 1094.13 × 1037.36 × 1081.95 × 1082.55 × 1096.29 × 10102.75 × 1091.74 × 104
Std1.78 × 1021.43 × 1083.06 × 1078.48 × 10−22.349.68 × 1092.591.58 × 1081.65 × 10−22.97 × 10−1
f14Mean1.68 × 1042.42 × 1051.60 × 1064.24 × 1042.01 × 1054.01 × 1051.04 × 1069.07 × 1075.39 × 1043.68 × 103
Std1.19 × 1062.17 × 1082.04 × 1051.918.98 × 1068.12 × 1082.73 × 10−77.65 × 1062.47 × 10−61.62 × 101
f15Mean1.06 × 1037.50 × 1071.25 × 1088.53 × 1031.41 × 1062.15 × 1075.53 × 1071.25 × 10107.00 × 1059.03 × 103
Std9.52 × 1011.68 × 1081.05 × 1018.85 × 10−51.61 × 1012.87 × 1093.72 × 10−41.14 × 1079.66 × 10−57.55 × 10−2
f16Mean5.49 × 1022.02 × 1032.26 × 1031.92 × 1031.05 × 1032.74 × 1033.83 × 1039.23 × 1033.14 × 1031.74 × 103
Std2.962.36 × 1026.42 × 1014.72 × 1012.28 × 1021.99 × 1034.46 × 1021.98 × 1021.60 × 10−82.33
f17Mean3.02 × 1021.73 × 1032.10 × 1031.49 × 1038.10 × 1021.99 × 1032.00 × 1032.45 × 1041.81 × 1031.28 × 103
Std5.47 × 10−74.15 × 1064.11 × 1013.89 × 1016.12 × 1027.09 × 1041.91 × 1021.43 × 1021.14 × 10−88.05 × 10−1
f18Mean7.53 × 1041.97 × 1065.41 × 1062.00 × 1058.94 × 1053.65 × 1066.00 × 1062.78 × 1084.55 × 1057.15 × 104
Std3.60 × 1048.49 × 1089.47 × 1052.32 × 1014.09 × 1072.27 × 1092.87 × 10−47.53 × 1061.01 × 10−53.60 × 102
f19Mean8.05 × 1031.09 × 1075.09 × 1071.62 × 1041.53 × 1051.15 × 1079.31 × 1064.97 × 1092.09 × 1075.33 × 103
Std2.66 × 10−21.32 × 1071.56 × 1011.59 × 10−38.18 × 10−11.06 × 1093.00 × 10−21.66 × 1073.91 × 10−43.49 × 10−2
f20Mean1.34 × 1021.02 × 1031.25 × 1031.04 × 1034.18 × 1021.38 × 1031.34 × 1032.53 × 1031.08 × 1031.12 × 103
Std7.74 × 10−56.70 × 1022.26 × 1026.07 × 1017.06 × 1026.61 × 1026.27 × 1021.11 × 1028.26 × 10−91.15
f21Mean2.26 × 1025.24 × 1025.59 × 1024.21 × 1023.33 × 1025.78 × 1027.81 × 1021.19 × 1037.91 × 1024.43 × 102
Std8.27 × 10−54.40 × 10−18.115.034.83 × 1016.63 × 1011.70 × 1021.76 × 1012.03 × 10−93.17 × 10−2
f22Mean1.70 × 1026.63 × 1037.60 × 1037.92 × 1033.84 × 1039.63 × 1031.30 × 1041.50 × 1041.07 × 1046.48 × 103
Std2.13 × 10−31.33 × 1031.03 × 1036.09 × 1011.27 × 1031.58 × 1031.34 × 1032.86 × 1024.87 × 10−82.37 × 101
f23Mean4.39 × 1021.08 × 1031.13 × 1037.24 × 1029.82 × 1028.31 × 1021.81 × 1032.40 × 1031.36 × 1038.00 × 102
Std1.46 × 10−21.559.60 × 10−15.315.334.34 × 1028.50 × 1025.80 × 1012.43 × 10−91.79 × 10−1
f24Mean5.02 × 1021.20 × 1031.17 × 1038.01 × 1021.06 × 1038.87 × 1021.68 × 1032.41 × 1031.39 × 1039.54 × 102
Std5.71 × 10−104.972.723.313.88 × 1013.72 × 1021.71 × 1013.55 × 1011.73 × 10−91.89 × 10−1
f25Mean5.65 × 1023.11 × 1031.79 × 1035.71 × 1025.74 × 1021.02 × 1035.28 × 1031.45 × 1049.42 × 1034.65 × 102
Std1.47 × 1024.30 × 1012.462.81 × 10−41.52 × 1015.20 × 1039.92 × 10−113.161.33 × 10−88.57 × 10−1
f26Mean1.24 × 1037.65 × 1037.98 × 1033.76 × 1032.62 × 1033.79 × 1031.08 × 1041.55 × 1041.23 × 1042.97 × 103
Std4.43 × 10−43.85 × 1012.06 × 1012.91 × 1011.10 × 1023.43 × 1032.911.38 × 1012.03 × 10−81.39 × 10−1
f27Mean5.68 × 1021.23 × 1031.13 × 1037.79 × 1021.01 × 1037.86 × 1023.33 × 1034.11 × 1031.47 × 1035.00 × 102
Std2.07 × 1013.631.075.06 × 10−52.86 × 10−14.92 × 1039.51 × 1021.16 × 1021.49 × 10−91.44 × 102
f28Mean4.70 × 1025.80 × 1034.40 × 1035.14 × 1029.21 × 1027.81 × 1024.74 × 1031.24 × 1047.32 × 1034.96 × 102
Std1.16 × 1029.289.75 × 10−17.34 × 10−41.30 × 1014.41 × 1034.42 × 10−115.367.14 × 10−91.19 × 102
f29Mean4.70 × 1022.51 × 1032.61 × 1031.47 × 1031.03 × 1032.34 × 1035.75 × 1031.83 × 1054.63 × 1031.24 × 103
Std5.871.14 × 1045.472.10 × 1011.77 × 1011.16 × 1071.34 × 1014.28 × 1031.48 × 10−81.88 × 101
f30Mean1.22 × 1061.14 × 1083.55 × 1071.11 × 1064.66 × 1061.50 × 1083.46 × 1089.48 × 1092.21 × 1088.17 × 104
Std1.45 × 1054.47 × 1061.31 × 1052.332.20 × 1065.37 × 1097.35 × 1023.78 × 1071.59 × 10−31.32 × 106
Table 6. The comparative results of MSTPSO and advanced algorithms on 100-D using the CEC2017 benchmark suite.
Function | 100-D | MSTPSO | DQNPSO | APSO_SAC | EPSO | KGPSO | XPSO | MSORL | DRA | RFO | RLACA
f1Mean4.16 × 1031.77 × 10119.16 × 10105.48 × 1031.38 × 10106.26 × 1094.81 × 10102.81 × 10112.47 × 10111.72 × 104
Std2.54 × 10−36.49 × 1087.57 × 1073.23 × 10−24.06 × 1071.04 × 10108.56 × 10−36.35 × 1051.55 × 10−15.74 × 10−3
f3Mean1.40 × 1051.93 × 1056.58 × 1051.61 × 1053.18 × 1042.47 × 1049.15 × 1043.47 × 1054.17 × 1052.56 × 10−1
Std6.65 × 10112.66 × 10164.44 × 1041.52 × 1011.42 × 10126.11 × 10141.09 × 1053.04 × 1055.09 × 10−63.69 × 10−6
f4Mean2.97 × 1023.15 × 1041.35 × 1042.71 × 1021.47 × 1037.01 × 1029.18 × 1031.24 × 1057.04 × 1042.37 × 102
Std1.84 × 1023.06 × 1011.72 × 1011.10 × 10−39.071.59 × 1042.86 × 10−104.508.30 × 10−86.23 × 10−1
f5Mean6.77 × 1019.76 × 1029.62 × 1027.09 × 1022.48 × 1023.91 × 1024.55 × 1021.67 × 1031.44 × 1036.66 × 102
Std1.56 × 10−51.587.191.815.39 × 1016.10 × 1015.65 × 1014.103.16 × 10−98.89 × 10−10
f6Mean3.53 × 10−16.06 × 1015.87 × 1015.08 × 1013.842.43 × 1017.02 × 1011.16 × 1029.22 × 1014.43 × 101
Std3.32 × 10−91.53 × 10−23.43 × 10−18.74 × 10−58.16 × 10−98.041.83 × 1017.65 × 10−14.52 × 10−106.09 × 10−6
f7Mean1.53 × 1023.93 × 1033.18 × 1031.64 × 1033.22 × 1026.55 × 1028.06 × 1023.43 × 1033.73 × 1031.13 × 103
Std1.64 × 10−125.632.26 × 1011.681.07 × 1021.31 × 1022.98 × 10−109.421.28 × 10−84.44 × 10−13
f8Mean6.95 × 1011.03 × 1031.05 × 1037.16 × 1022.90 × 1023.87 × 1024.71 × 1021.87 × 1031.57 × 1037.08 × 102
Std8.81 × 10−132.591.901.13 × 1014.10 × 1016.35 × 1015.53 × 1014.533.17 × 10−99.84 × 10−10
f9Mean1.44 × 1012.41 × 1044.35 × 1042.21 × 1044.62 × 1032.74 × 1031.42 × 1048.38 × 1045.88 × 1041.25 × 104
Std1.77 × 10−102.64 × 1041.87 × 1041.23 × 1022.36 × 1037.86 × 1034.15 × 1032.29 × 1032.08 × 10−69.58 × 10−8
f10Mean8.51 × 1031.50 × 1041.58 × 1041.60 × 1041.13 × 1041.25 × 1041.20 × 1043.25 × 1042.59 × 1041.50 × 104
Std1.59 × 10−61.78 × 1032.51 × 1038.29 × 1014.59 × 1031.42 × 1031.46 × 1032.18 × 1027.82 × 10−81.17 × 101
f11Mean5.18 × 1021.40 × 1048.85 × 1041.50 × 1033.44 × 1039.85 × 1024.49 × 1032.32 × 1051.66 × 1051.40 × 103
Std2.16 × 1022.98 × 1099.38 × 1031.81 × 10−19.98 × 10−12.29 × 1093.38 × 10−104.89 × 1058.76 × 10−76.83 × 10−3
f12Mean5.38 × 1054.56 × 10102.74 × 10101.98 × 1068.21 × 1091.22 × 1091.41 × 10102.22 × 10111.24 × 10111.42 × 107
Std5.78 × 1062.63 × 1082.55 × 1071.21 × 1021.82 × 1081.19 × 10105.73 × 10−32.92 × 1061.27 × 10−12.54 × 104
f13Mean1.67 × 1034.80 × 1094.70 × 1096.25 × 1031.14 × 1091.95 × 1082.55 × 1095.36 × 10102.43 × 10101.43 × 104
Std4.41 × 1011.62 × 1083.81 × 1061.43 × 10−31.61 × 1089.68 × 1092.591.86 × 1063.66 × 10−21.78 × 10−4
f14Mean5.39 × 1041.20 × 1079.72 × 1062.45 × 1054.92 × 1054.01 × 1051.04 × 1061.39 × 1088.13 × 1064.39 × 103
Std1.98 × 1011.36 × 1071.06 × 1061.61 × 1012.26 × 1078.12 × 1082.73 × 10−71.31 × 1068.34 × 10−57.40 × 10−4
f15Mean5.34 × 1021.50 × 1091.66 × 1092.25 × 1033.32 × 1082.15 × 1075.53 × 1072.95 × 10105.92 × 1096.16 × 103
Std6.47 × 1014.33 × 1071.15 × 1072.68 × 10−53.71 × 1072.87 × 1093.72 × 10−42.07 × 1061.73 × 10−24.59 × 10−4
f16Mean1.83 × 1036.03 × 1036.18 × 1034.37 × 1032.84 × 1032.74 × 1033.83 × 1032.59 × 1041.17 × 1043.78 × 103
Std2.26 × 10−62.96 × 1023.76 × 1015.30 × 1011.37 × 1021.99 × 1034.46 × 1021.09 × 1023.33 × 10−81.48
f17Mean9.62 × 1022.90 × 1041.11 × 1043.52 × 1033.27 × 1031.99 × 1032.00 × 1031.88 × 1079.93 × 1043.58 × 103
Std4.45 × 10−63.90 × 1051.38 × 1012.51 × 1012.96 × 1027.09 × 1041.91 × 1026.41 × 1046.47 × 10−72.27
f18Mean2.02 × 1051.48 × 1071.22 × 1075.09 × 1051.02 × 1063.65 × 1066.00 × 1063.70 × 1081.61 × 1072.47 × 104
Std5.01 × 1011.26 × 1072.26 × 1064.74 × 1011.96 × 1072.27 × 1092.87 × 10−41.23 × 1061.25 × 10−44.39 × 10−2
f19Mean9.17 × 1021.14 × 1099.68 × 1081.65 × 1033.93 × 1081.15 × 1079.31 × 1062.93 × 10106.21 × 1094.43 × 103
Std7.82 × 10−53.45 × 1075.41 × 1062.16 × 10−21.89 × 1081.06 × 1093.00 × 10−22.62 × 1061.71 × 10−21.12 × 10−3
f20Mean9.85 × 1023.13 × 1033.65 × 1033.25 × 1032.06 × 1031.38 × 1031.34 × 1035.84 × 1033.63 × 1032.99 × 103
Std1.64 × 10−41.55 × 1036.81 × 1023.75 × 1011.63 × 1036.61 × 1026.27 × 1021.80 × 1021.98 × 10−89.64
f21Mean2.92 × 1021.44 × 1031.34 × 1038.60 × 1028.66 × 1025.78 × 1027.81 × 1022.93 × 1032.17 × 1039.61 × 102
Std3.85 × 10−51.252.991.46 × 1013.11 × 1016.63 × 1011.70 × 1024.16 × 1013.30 × 10−91.08 × 10−2
f22Mean4.46 × 1031.66 × 1041.75 × 1041.85 × 1041.24 × 1049.63 × 1031.30 × 1043.30 × 1042.69 × 1041.59 × 104
Std3.85 × 10−52.25 × 1031.86 × 1031.46 × 1013.11 × 1011.58 × 1031.34 × 1032.26 × 1027.59 × 10−81.08 × 10−2
f23Mean6.49 × 1022.44 × 1032.37 × 1031.31 × 1032.24 × 1038.31 × 1021.81 × 1034.22 × 1032.83 × 1031.55 × 103
Std9.13 × 10−11.021.681.44 × 1023.36 × 1034.34 × 1028.50 × 1026.59 × 1012.75 × 10−92.16 × 101
f24Mean8.98 × 1023.79 × 1033.62 × 1031.92 × 1033.31 × 1038.87 × 1021.68 × 1037.90 × 1034.65 × 1032.18 × 103
Std4.33 × 10−23.573.728.77 × 10−14.20 × 10−23.72 × 1021.71 × 1016.73 × 1013.27 × 10−92.16 × 10−1
f25Mean8.66 × 1022.14 × 1047.96 × 1038.33 × 1021.07 × 1031.02 × 1035.28 × 1032.91 × 1042.38 × 1048.27 × 102
Std5.37 × 10−21.02 × 1025.67 × 1019.434.15 × 10−15.20 × 1039.92 × 10−113.53 × 10−12.58 × 10−86.30 × 10−1
f26Mean3.46 × 1032.83 × 1042.62 × 1041.18 × 1041.08 × 1043.79 × 1031.08 × 1045.32 × 1044.21 × 1041.17 × 104
Std2.31 × 1023.09 × 1011.55 × 1011.32 × 10−31.16 × 1013.43 × 1032.914.09 × 1013.48 × 10−85.42 × 10−1
f27Mean7.13 × 1022.24 × 1031.67 × 1039.60 × 1021.49 × 1037.86 × 1023.33 × 1031.23 × 1043.92 × 1035.00 × 102
Std6.39 × 10−46.728.09 × 10−13.65 × 10−31.98 × 10−54.92 × 1039.51 × 1029.82 × 1013.85 × 10−98.90 × 10−1
f28Mean6.66 × 1022.15 × 1041.61 × 1046.23 × 1022.19 × 1037.81 × 1024.74 × 1032.96 × 1042.87 × 1045.72 × 102
Std5.944.78 × 1011.09 × 1011.17 × 10−41.84 × 10−14.41 × 1034.42 × 10−111.75 × 1012.12 × 10−82.55 × 102
f29Mean1.55 × 1031.28 × 1048.68 × 1034.15 × 1032.98 × 1032.34 × 1035.75 × 1031.15 × 1062.26 × 1043.51 × 103
Std1.27 × 1027.49 × 1027.443.58 × 10−33.161.16 × 1071.34 × 1012.14 × 1038.80 × 10−84.80 × 10−1
f30Mean1.22 × 1043.68 × 1094.40 × 1099.88 × 1031.15 × 1091.50 × 1083.46 × 1084.86 × 10101.09 × 10101.26 × 105
Std8.83 × 1054.65 × 1072.61 × 1046.94 × 10−22.15 × 1035.37 × 1097.35 × 1023.05 × 1062.30 × 10−24.87 × 105
Table 7. Wilcoxon signed-rank test results of the nine compared algorithms versus MSTPSO on CEC2017 (counts over functions).
Algorithm | MSTPSO vs. | 10-D | 30-D | 50-D
DQNPSO | + | 27 | 28 | 29
 | − | 0 | 0 | 0
 | = | 2 | 1 | 0
APSO_SAC | + | 26 | 29 | 29
 | − | 1 | 0 | 0
 | = | 2 | 0 | 0
EPSO | + | 26 | 24 | 25
 | − | 2 | 3 | 3
 | = | 1 | 2 | 1
KGPSO | + | 25 | 28 | 28
 | − | 2 | 1 | 1
 | = | 2 | 0 | 0
XPSO | + | 29 | 29 | 29
 | − | 0 | 0 | 0
 | = | 0 | 0 | 0
MSORL | + | 26 | 27 | 29
 | − | 2 | 1 | 0
 | = | 1 | 1 | 0
DRA | + | 29 | 29 | 29
 | − | 0 | 0 | 0
 | = | 0 | 0 | 0
RFO | + | 21 | 27 | 28
 | − | 7 | 2 | 0
 | = | 0 | 0 | 1
RLACA | + | 16 | 20 | 22
 | − | 11 | 7 | 5
 | = | 2 | 2 | 2
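Table 7 condenses pairwise Wilcoxon signed-rank tests into win/loss/tie counts over the benchmark functions. The sketch below shows one common way such counts can be produced from per-function, per-run errors using scipy; the 0.05 significance level and the use of mean errors to assign the sign of a significant difference are illustrative assumptions rather than the paper's exact protocol.

```python
import numpy as np
from scipy.stats import wilcoxon

def win_loss_tie(errors_a, errors_b, alpha=0.05):
    """Count functions where A is significantly better (+), significantly worse (-), or not different (=)."""
    plus = minus = equal = 0
    for runs_a, runs_b in zip(errors_a, errors_b):    # one array of independent-run errors per function
        try:
            _, p = wilcoxon(runs_a, runs_b)
        except ValueError:                            # raised when all paired differences are zero
            p = 1.0
        if p < alpha and np.mean(runs_a) < np.mean(runs_b):
            plus += 1
        elif p < alpha:
            minus += 1
        else:
            equal += 1
    return plus, minus, equal

# Usage: 29 functions x 30 runs of recorded final errors for two algorithms.
errs_a = [np.random.rand(30) for _ in range(29)]
errs_b = [np.random.rand(30) * 1.2 for _ in range(29)]
print(win_loss_tie(errs_a, errs_b))
```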
Table 8. Experiments on the MSTPSO algorithm and the changed-action variants (MSTPSO1–MSTPSO3) on 10-D and 30-D.
Function type | MSTPSO (10-D) | MSTPSO1 (10-D) | MSTPSO2 (10-D) | MSTPSO3 (10-D) | MSTPSO (30-D) | MSTPSO1 (30-D) | MSTPSO2 (30-D) | MSTPSO3 (30-D)
Average | 2.31 | 2.82 | 2.51 | 2.35 | 2.22 | 2.18 | 3.15 | 2.43
Unimodal | 2.34 | 2.90 | 2.36 | 2.40 | 1.98 | 3.81 | 2.15 | 2.05
Multimodal | 2.30 | 2.42 | 2.26 | 2.31 | 2.18 | 3.24 | 2.38 | 2.41
Hybrid and composition | 2.35 | 2.68 | 2.60 | 2.35 | 2.12 | 3.32 | 3.29 | 2.16
Table 9. Friedman ranking of four individual stagnation threshold (T_I) settings.
Function type | T_I = 10 (30-D) | T_I = 15 (30-D) | T_I = 20 (30-D) | T_I = 30 (30-D) | T_I = 10 (50-D) | T_I = 15 (50-D) | T_I = 20 (50-D) | T_I = 30 (50-D)
Average | 2.53 | 2.50 | 2.48 | 2.49 | 2.53 | 2.49 | 2.46 | 2.52
Unimodal | 2.45 | 2.66 | 2.33 | 2.56 | 2.51 | 2.42 | 2.36 | 2.71
Multimodal | 2.57 | 2.33 | 2.56 | 2.54 | 2.46 | 2.52 | 2.48 | 2.54
Hybrid and composition | 2.52 | 2.54 | 2.47 | 2.47 | 2.55 | 2.49 | 2.46 | 2.50
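The Friedman rankings reported in these ablation tables are obtained by ranking the candidate configurations on every benchmark function (lower error gives a lower rank) and averaging the ranks over functions, so that lower mean ranks are better. A minimal sketch of that averaging with scipy's rankdata follows; the random matrix only stands in for real per-function results.

```python
import numpy as np
from scipy.stats import rankdata

def friedman_mean_ranks(results):
    """results: (n_functions, n_configs) array of errors; returns the mean rank of each configuration."""
    ranks = np.apply_along_axis(rankdata, 1, results)  # rank configurations within each function (ties averaged)
    return ranks.mean(axis=0)

# Usage: 29 functions evaluated under 4 candidate settings (e.g., four stagnation thresholds).
results = np.random.rand(29, 4)
print(friedman_mean_ranks(results))
```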
Table 10. Experiments on the MSTPSO algorithm with and without the stagnation detection module on 10-D, 30-D, and 50-D.
Function type | Stagnation (10-D) | No Stagnation (10-D) | Stagnation (30-D) | No Stagnation (30-D) | Stagnation (50-D) | No Stagnation (50-D)
Average | 1.33 | 1.67 | 1.43 | 1.57 | 1.47 | 1.53
Unimodal | 1.73 | 1.27 | 1.17 | 1.32 | 1.38 | 1.62
Multimodal | 1.33 | 1.67 | 1.37 | 1.77 | 1.52 | 1.55
Hybrid and composition | 1.30 | 1.70 | 1.44 | 1.55 | 1.49 | 1.51
Table 11. Friedman ranking of the stagnation threshold parameter (T_I) for four settings.
Function type | T_I = 5 (30-D) | T_I = 10 (30-D) | T_I = 15 (30-D) | T_I = 20 (30-D) | T_I = 5 (50-D) | T_I = 10 (50-D) | T_I = 15 (50-D) | T_I = 20 (50-D)
Average | 2.53 | 2.48 | 2.51 | 2.48 | 2.53 | 2.45 | 2.49 | 2.53
Unimodal | 2.80 | 2.33 | 2.35 | 2.51 | 2.55 | 2.41 | 2.31 | 2.72
Multimodal | 2.50 | 2.51 | 2.53 | 2.45 | 2.47 | 2.48 | 2.44 | 2.62
Hybrid and composition | 2.51 | 2.49 | 2.51 | 2.49 | 2.48 | 2.43 | 2.54 | 2.55
Table 12. Friedman ranking of the velocity lower-bound parameter (V_min) for three settings.
Function type | V_min = e^4 (30-D) | V_min = e^5 (30-D) | V_min = e^6 (30-D) | V_min = e^4 (50-D) | V_min = e^5 (50-D) | V_min = e^6 (50-D)
Average | 2.57 | 2.33 | 2.56 | 2.46 | 2.52 | 2.48
Unimodal | 2.52 | 2.54 | 2.47 | 2.55 | 2.49 | 2.46
Multimodal | 2.53 | 2.50 | 2.48 | 2.53 | 2.49 | 2.46
Hybrid and composition | 2.45 | 2.66 | 2.33 | 2.51 | 2.42 | 2.36
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
