1. Introduction
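Particle swarm optimization (PSO) was introduced by Kennedy and Eberhart [1] as a population-based stochastic search method driven by social interaction. In its canonical form, each particle $i$ maintains a position $x_t^i$ and a velocity $v_t^i$, and evolves according to:
$$v_{t+1}^i = \omega v_t^i + c_1 r_{1,t}^i \,(p_t^i - x_t^i) + c_2 r_{2,t}^i \,(g_t - x_t^i), \tag{1}$$
$$x_{t+1}^i = x_t^i + v_{t+1}^i, \tag{2}$$
where $\omega$ is the inertia weight, $c_1, c_2$ are the cognitive and social parameters, $r_{1,t}^i, r_{2,t}^i$ are independent random multipliers, $p_t^i$ is the personal best of particle $i$, and $g_t$ is the global best over the swarm (all-to-all topology). The best positions satisfy the best-so-far monotonicity:
$$f(p_{t+1}^i) \le f(p_t^i), \qquad f(g_{t+1}) \le f(g_t),$$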
where
.
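The canonical update described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes one scalar multiplier pair per particle drawn uniformly from [0, 1], and the function name `pso_velocity_position_step` and the parameter values are hypothetical.

```python
import numpy as np

def pso_velocity_position_step(x, v, p, g, omega, c1, c2, rng):
    """One canonical global-best PSO step for the whole swarm.

    x, v, p: arrays of shape (n_particles, dim); g: array of shape (dim,).
    Assumption: one scalar multiplier pair per particle, drawn from U(0, 1).
    """
    n = x.shape[0]
    r1 = rng.uniform(0.0, 1.0, size=(n, 1))  # cognitive multiplier
    r2 = rng.uniform(0.0, 1.0, size=(n, 1))  # social multiplier
    v_next = omega * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)
    x_next = x + v_next
    return x_next, v_next

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
v = np.zeros((4, 3))
p = x.copy()          # personal bests start at the initial positions
g = np.zeros(3)       # hypothetical global best, for illustration only
c2 = 1.5
x_next, v_next = pso_velocity_position_step(x, v, p, g, 0.7, 1.5, c2, rng)
```

With zero initial velocity and personal bests equal to the positions, only the social term is active, so each particle moves along the segment toward `g` by a random fraction of at most `c2`.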
PSO is important because it combines a very simple update rule with robust performance in derivative-free optimization, engineering design, control tuning, parameter identification, and simulation-based search problems where gradients are unavailable, unreliable, or too expensive to compute. This practical success has motivated a large theoretical literature, but a complete convergence theory for the standard stochastic multi-particle algorithm remains difficult. The main obstacle is that PSO is neither a classical stochastic gradient method nor a purely deterministic dynamical system: the update contains inertial dynamics, random acceleration coefficients, and history-dependent memory variables whose locations change according to best-so-far rules.
Existing convergence analyses have clarified many aspects of PSO stability, including parameter restrictions, moment stability, stochastic-process convergence, Lyapunov stability, and continuous-time or mean-field limits. However, these analyses usually work directly with PSO-specific recursions or with simplified stagnation models. In contrast, modern stochastic optimization theory provides mature Lyapunov templates for momentum methods on smooth strongly convex objectives; see, for example, [2]. The question addressed in this paper is whether the standard global-best PSO recursion can be represented in a form close enough to stochastic momentum to permit a comparable Lyapunov drift analysis while retaining the personal-best and global-best memory structure.
For each particle, we define the quadratic surrogate:
so that
which is exactly the mean attraction term in (
1). Writing
with
gives:
Thus, PSO can be viewed as a stochastic heavy-ball recursion on a time-varying surrogate determined by the current memory variables. The convergence theorem in this paper follows from a composite Lyapunov function coupling the momentum potential with the best-value gaps $f(p_t^i) - f(x^\star)$ and $f(g_t) - f(x^\star)$, where $x^\star$ denotes the minimizer of $f$.
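As a worked illustration of this reduction, the following sketch writes out one consistent instance. It assumes the multipliers are uniform on $[0,1]$ (so their mean is $1/2$) and uses a unit surrogate scaling; the normalization used in the paper's equations (3) and (4) may differ, for example by a step-size factor. The symbols $\xi_{1,t}$, $\xi_{2,t}$, and $\zeta_t$ are introduced here for illustration, and the particle index is suppressed.

```latex
\begin{align*}
U_t(x) &= \frac{c_1}{4}\,\|x - p_t\|^2 + \frac{c_2}{4}\,\|x - g_t\|^2,
&
\nabla U_t(x_t) &= \frac{c_1}{2}\,(x_t - p_t) + \frac{c_2}{2}\,(x_t - g_t), \\
r_{k,t} &= \tfrac{1}{2} + \xi_{k,t},
&
\mathbb{E}\left[\xi_{k,t} \mid \mathcal{F}_t\right] &= 0, \qquad k = 1, 2, \\
v_{t+1} &= \omega v_t - \nabla U_t(x_t) + \zeta_t,
&
\zeta_t &= c_1 \xi_{1,t}\,(p_t - x_t) + c_2 \xi_{2,t}\,(g_t - x_t).
\end{align*}
```

Under these assumptions the mean attraction term equals $-\nabla U_t(x_t)$, and $\zeta_t$ is conditionally zero-mean, which is the structure a stochastic heavy-ball analysis requires.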
The scope of the result is deliberately explicit. The mean-square convergence statement is derived from the PSO dynamics after imposing smooth strong convexity, bounded trajectories, and a mean improvement condition for the best-value gaps. The first two assumptions control the surrogate gradient and noise moments. The third assumption is not a consequence of monotonicity alone; it rules out stagnation of the personal and global bests away from the minimizer. Therefore, the theorem should be read as a conditional convergence result: the stochastic-momentum reduction and Lyapunov drift are derived from the PSO recursion, whereas boundedness and systematic improvement in best-value gaps are imposed structural hypotheses.
Contributions.
We give an equation-level reduction of the standard global-best (all-to-all) multi-particle PSO recursion (1) and (2) to a stochastic heavy-ball method on the quadratic surrogate (3), with an explicit martingale-difference noise term (4).
We construct a composite Lyapunov function that incorporates both momentum-style error terms and PSO memory variables via best-value gaps, leveraging the monotonicity of personal and global best values while making clear which additional improvement condition is required.
Under smoothness and strong convexity of f, boundedness of the trajectories, and mean improvement in the best-value gaps, we establish mean-square convergence of particle positions and convergence of the personal-best and global-best objective gaps.
We include a numerical illustration on high-dimensional strongly convex quadratics to connect the theoretical stabilization mechanism with observed behavior under a standard stable parameter regime.
2. Related Work
PSO convergence theory has developed along several complementary lines. Jiang et al. [3] analyzed the standard PSO algorithm as a stochastic process and derived convergence and parameter-selection conditions accounting for the randomness in the update coefficients. Chen and Li [4] proposed a modified PSO structure with an additional exploration component and established convergence through Lyapunov arguments for stochastic processes. Kadirkamanathan et al. [5] treated particle dynamics using control-theoretic and Lyapunov stability tools, deriving sufficient conditions for boundedness and stability of trajectories under stochastic updates of best positions.
Moment-based analyses provide another important perspective. Poli [6] characterized the evolution of the sampling distribution of particle states and identified parameter regions for stability of the first and second moments under simplifying assumptions. Bonyadi and Michalewicz [7] studied PSO stability without imposing a stagnation assumption on the best positions, deriving conditions for convergence of mean and variance under broad distributions of the bests. These results are closely related to the present paper because they emphasize the role of parameter regimes and memory variables in the stability of the canonical recursion.
Probabilistic convergence analyses have used Markov, martingale, and metric-space tools. Xu and Yu [8] constructed supermartingale sequences tied to the swarm’s best fitness value. Hu et al. [9] developed almost-sure convergence results for stochastic PSO models without stagnation assumptions. Dong and Zhang [10] proposed a composite drift–diffusion model and obtained Lyapunov moment bounds controlling diffusion effects. More recently, weak-convergence viewpoints have been developed for PSO trajectories and swarm-level sampling regimes [11], while modified velocity-control schemes such as constriction-based PSO have continued to motivate convergence analyses for practically used variants [12].
Continuous-time, mean-field, and structure-based analyses are also relevant. Huang, Qiu, and Riedl [13] established global convergence results through continuous-time and mean-field modeling, proving consensus formation through variance dissipation and linking the consensus point to a global minimizer under additional assumptions. Cui [14] proposed a symmetry-based framework for PSO variants, deriving relationships between hyperparameters and noise characteristics that guarantee convergence under stated structural assumptions. These works differ from the present paper in their modeling level and assumptions, but they confirm the current interest in deriving rigorous stability statements for particle-based optimization methods.
The present contribution is narrower but more explicit in a different direction. Rather than replacing PSO by a continuous-time or mean-field model, and rather than analyzing only a stagnated recursion, we rewrite the discrete-time global-best PSO update itself as a stochastic heavy-ball recursion on a time-varying quadratic surrogate. This allows us to import a Lyapunov drift template from stochastic momentum analysis while preserving the personal-best and global-best terms. The price is that the final convergence theorem is conditional: boundedness and mean improvement in best-value gaps are assumed, not derived. This positioning separates the part of the proof that follows algebraically from the PSO dynamics from the part that relies on structural assumptions about successful improvement in the memory variables.
3. Problem Setting and PSO Dynamics
We consider:
$$\min_{x \in \mathbb{R}^d} f(x),$$
where $f:\mathbb{R}^d \to \mathbb{R}$ satisfies the following assumption. The goal is to determine whether the particle positions generated by the standard global-best PSO recursion converge in mean square to the unique minimizer $x^\star$, and whether the personal-best and global-best objective gaps converge to zero. We do not aim to prove global convergence for arbitrary nonconvex objectives; the analysis is restricted to smooth strongly convex objectives so that distance-to-solution and objective-gap estimates can be connected through standard curvature inequalities.
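Assumption 1 (strong convexity and smoothness). The function $f$ is $\mu$-strongly convex and $L$-smooth: for all $x, y \in \mathbb{R}^d$,
$$\frac{\mu}{2}\,\|y - x\|^2 \le f(y) - f(x) - \langle \nabla f(x),\, y - x\rangle \le \frac{L}{2}\,\|y - x\|^2.$$
In particular, $f$ has a unique minimizer $x^\star$ with $\nabla f(x^\star) = 0$, and $f^\star = f(x^\star)$.

Definition 1 (global-best PSO (all-to-all, unprojected)). Let $(\mathcal{F}_t)_{t \ge 0}$ be the natural filtration generated by all particle variables up to time $t$. For each particle $i \in \{1, \dots, N\}$, the updates are:
$$v_{t+1}^i = \omega v_t^i + c_1 r_{1,t}^i \,(p_t^i - x_t^i) + c_2 r_{2,t}^i \,(g_t - x_t^i), \tag{5}$$
$$x_{t+1}^i = x_t^i + v_{t+1}^i, \tag{6}$$
where $r_{1,t}^i, r_{2,t}^i$ are i.i.d. and independent of $\mathcal{F}_t$. Personal bests are updated by:
$$p_{t+1}^i = \begin{cases} x_{t+1}^i, & f(x_{t+1}^i) < f(p_t^i), \\ p_t^i, & \text{otherwise.} \end{cases}$$
Define the global-best index with a deterministic tie-break rule:
$$i_{t+1}^\star \in \arg\min_{1 \le i \le N} f(p_{t+1}^i),$$
and update by the best-so-far rule:
$$g_{t+1} = \begin{cases} p_{t+1}^{i_{t+1}^\star}, & f\big(p_{t+1}^{i_{t+1}^\star}\big) < f(g_t), \\ g_t, & \text{otherwise,} \end{cases}$$
so that $f(g_{t+1}) \le f(g_t)$ holds pathwise. The best-so-far property gives, for all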
:
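The memory update of Definition 1 can be sketched as follows. This is a hedged illustration: the helper name `update_memory` is hypothetical, strict improvement is assumed for accepting a new best, and the deterministic tie-break is taken to be "smallest particle index", which is what `np.argmin` returns.

```python
import numpy as np

def update_memory(f, x_next, p, g):
    """Best-so-far update of personal bests p and global best g.

    f maps an array of shape (n, dim) to objective values of shape (n,).
    """
    fx, fp = f(x_next), f(p)
    improved = fx < fp                          # accept strict improvements only
    p_next = np.where(improved[:, None], x_next, p)
    fp_next = np.where(improved, fx, fp)
    i_star = int(np.argmin(fp_next))            # first minimizer: deterministic tie-break
    fg = f(g[None, :])[0]
    g_next = p_next[i_star].copy() if fp_next[i_star] < fg else g.copy()
    return p_next, g_next

def sphere(z):
    return np.sum(z ** 2, axis=1)

p = np.array([[1.0, 0.0], [0.0, 2.0]])
g = np.array([1.0, 0.0])
x_next = np.array([[0.5, 0.0], [3.0, 3.0]])
p_next, g_next = update_memory(sphere, x_next, p, g)
```

In this small example the first particle improves and becomes the new global best, while the second particle keeps its previous personal best, so the best values are nonincreasing by construction.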
Assumption 2 (bounded trajectories). There exist constants such that for all particles and all :

Remark 1 (role of the boundedness assumption). Assumption 2 is imposed as a technical condition to control the surrogate gradients and the second moments of the stochastic perturbation in the unprojected recursion of Definition 1. It is not derived from the PSO dynamics in this paper. In practical implementations, boundedness is often enforced by box constraints, absorbing or reflecting boundaries, and/or velocity clamping. Such mechanisms make the assumption plausible for implemented algorithms, but they also modify the exact recursion and may introduce projection or clamping effects not included in the present proof. Thus, all estimates below should be interpreted as conditional on bounded trajectories for the stated recursion.
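The practical mechanisms mentioned in Remark 1 can be sketched as below. These helpers (`clamp_velocity`, `reflect_into_box`) are hypothetical illustrations of component-wise velocity clamping and reflecting boundaries; as the remark notes, applying them changes the recursion, so they are outside the unprojected setting analyzed in the proofs.

```python
import numpy as np

def clamp_velocity(v, v_max):
    """Component-wise velocity clamping to [-v_max, v_max]."""
    return np.clip(v, -v_max, v_max)

def reflect_into_box(x, v, lower, upper):
    """Reflect positions that left [lower, upper] and flip the matching velocities."""
    below, above = x < lower, x > upper
    x_ref = np.where(below, 2.0 * lower - x, x)
    x_ref = np.where(above, 2.0 * upper - x_ref, x_ref)
    x_ref = np.clip(x_ref, lower, upper)        # guard against very large overshoots
    v_ref = np.where(below | above, -v, v)
    return x_ref, v_ref

x = np.array([[-12.0, 3.0], [4.0, 11.0]])
v = np.array([[-5.0, 1.0], [2.0, 7.0]])
v_clamped = clamp_velocity(v, 4.0)
x_ref, v_ref = reflect_into_box(x, v_clamped, -10.0, 10.0)
```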
Remark 2 (deterministic PSO and applicability of the reformulation). The stochastic-momentum reformulation applies to the system (5) and (6). If the random multipliers are replaced by their means, then the update becomes a deterministic recursion with the same surrogate as in (3). Hence, the deterministic PSO recursion is exactly the noise-free special case of the system analyzed below. In this case, the martingale-difference term is identically zero and the Lyapunov drift statements hold pathwise rather than only after taking conditional expectations.

4. PSO as Stochastic Momentum on a Quadratic Surrogate
Fix a particle index $i$ and suppress it in notation (the analysis is particle-wise, with the global best $g_t$ shared across particles). Write $\mathbb{E}_t[\cdot]$ for $\mathbb{E}[\cdot \mid \mathcal{F}_t]$.
Definition 2 (quadratic surrogate)
. Lemma 1. Letand definetogether withThen, for all ,and henceMoreover, the velocity update (5) can be written as follows:and Proof. The gradient formula follows immediately from:
and the definitions of
. This yields:
hence
Next, write:
Substituting this into (
5), we obtain:
Finally, since
and the random variables are conditionally independent of
, we get
. □
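The decomposition in Lemma 1 can be checked numerically. The sketch below assumes uniform multipliers on [0, 1] and the unit-scaled surrogate $U(x) = \frac{c_1}{4}\|x - p\|^2 + \frac{c_2}{4}\|x - g\|^2$; this scaling and the helper names are assumptions made for illustration and may differ from the paper's normalization.

```python
import numpy as np

def surrogate_grad(x, p, g, c1, c2):
    # Gradient of U(x) = (c1/4)||x - p||^2 + (c2/4)||x - g||^2 (assumed scaling).
    return 0.5 * c1 * (x - p) + 0.5 * c2 * (x - g)

def velocity_update(v, x, p, g, omega, c1, c2, r1, r2):
    return omega * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)

rng = np.random.default_rng(1)
x, v, p, g = (rng.normal(size=3) for _ in range(4))
omega, c1, c2 = 0.7, 1.5, 1.5

r1, r2 = rng.uniform(size=2)
xi1, xi2 = r1 - 0.5, r2 - 0.5               # centred multipliers, mean zero
noise = c1 * xi1 * (p - x) + c2 * xi2 * (g - x)

direct = velocity_update(v, x, p, g, omega, c1, c2, r1, r2)
heavy_ball = omega * v - surrogate_grad(x, p, g, c1, c2) + noise
# Deterministic PSO: multipliers replaced by their mean 1/2, so the noise vanishes.
deterministic = velocity_update(v, x, p, g, omega, c1, c2, 0.5, 0.5)
```

The arrays `direct` and `heavy_ball` agree to rounding error, and `deterministic` equals the heavy-ball step with the noise term removed.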
5. Lyapunov Function and Structural Bounds
Define the base Lyapunov function:
Lemma 3 (conditional drift of
)
. Proof. Expand squares using (
8) and condition on
; the cross term with
vanishes since
. □
Proof. Expand and similarly for and subtract . □
Proof. Use and . □
Proof. Conditioned on , are independent, zero-mean, and satisfy . Thus, cross terms vanish by independence and zero mean, and Use and similarly for , then scale by . □
The bounds in this section isolate the terms that must be controlled in the Lyapunov drift. Lemma 3 gives the exact one-step identity for the IMA potential, while Lemmas 4–6 bound the surrogate gap, gradient norm, and stochastic variance in terms of distances to the true minimizer and to the PSO memory variables. These estimates are the ingredients used in the drift closure below.
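To make the noise-variance bound concrete, the following sketch compares a Monte Carlo estimate of the conditional second moment of the perturbation with a closed form. It assumes independent scalar multipliers uniform on [0, 1] (variance 1/12) and the unscaled noise term $c_1 \xi_1 (p - x) + c_2 \xi_2 (g - x)$; the constants in Lemma 6 may carry additional factors not reproduced here.

```python
import numpy as np

def noise_second_moment(x, p, g, c1, c2):
    # Closed form for independent U(0,1) multipliers, each with variance 1/12.
    return (c1 ** 2 * np.sum((p - x) ** 2) + c2 ** 2 * np.sum((g - x) ** 2)) / 12.0

rng = np.random.default_rng(2)
x, p, g = (rng.normal(size=5) for _ in range(3))
c1 = c2 = 1.5

xi = rng.uniform(size=(200_000, 2)) - 0.5    # centred multipliers
zeta = c1 * xi[:, :1] * (p - x) + c2 * xi[:, 1:] * (g - x)

mc_mean = zeta.mean(axis=0)                  # should be close to zero
mc_second = np.mean(np.sum(zeta ** 2, axis=1))
exact = noise_second_moment(x, p, g, c1, c2)
```

The cross term between the cognitive and social perturbations drops out by independence and zero mean, which is why the closed form is a sum of two squared distances.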
6. Drift Inequality and Convergence
From (
9):
hence:
Substitute (
6) into (
3) and denote
:
Since
is convex:
hence
and therefore
Lemma 7. Under Assumptions 1 and 2, there exist constants and (depending only on ) such that for all , Proof. Take full expectation in (
11) and apply Lemma 4 to
and
. Bound the cross terms by Young’s inequality and control the remaining gradient and noise terms via Lemmas 5 and 6. Collect coefficients. □
Composite Lyapunov and Closure of the Drift
The drift inequality in Lemma 7 involves the memory variables $p_t^i$ and $g_t$. To close the recursion, we convert their distance terms into best-value gaps and incorporate these gaps into the Lyapunov function.
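Lemma 8. Under Assumption 1, for any random variable $y$ taking values in $\mathbb{R}^d$, with $x^\star$ the minimizer of $f$,
$$\mathbb{E}\,\|y - x^\star\|^2 \le \frac{2}{\mu}\,\mathbb{E}\big[f(y) - f(x^\star)\big].$$
Proof. Strong convexity implies $f(y) - f(x^\star) \ge \frac{\mu}{2}\,\|y - x^\star\|^2$. □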
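The distance-to-gap conversion can be checked on a diagonal quadratic. In this sketch the eigenvalues, the dimension, and the minimizer at the origin are hypothetical choices; the inequality tested is the standard strong-convexity bound $\|y - x^\star\|^2 \le (2/\mu)\,(f(y) - f(x^\star))$.

```python
import numpy as np

rng = np.random.default_rng(3)
eigs = np.linspace(1.0, 5.0, 6)        # diagonal Hessian entries (hypothetical)
mu = eigs.min()                        # strong convexity constant
x_star = np.zeros(6)                   # minimizer of the quadratic below

def f(y):
    return 0.5 * np.sum(eigs * y ** 2, axis=-1)

y = rng.normal(size=(1000, 6))
dist_sq = np.sum((y - x_star) ** 2, axis=1)
gap = f(y) - f(x_star)
bound_holds = bool(np.all(dist_sq <= 2.0 / mu * gap + 1e-12))
```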
Lemma 9. Under Assumptions 1 and 2, there exist constants , , and such that for all , Proof. Apply Lemma 7 and then Lemma 8 to and . □
The monotonicity of $f(p_t^i)$ and $f(g_t)$ alone does not guarantee that these quantities approach the optimal value $f(x^\star)$; PSO may stagnate. We therefore impose an explicit improvement condition.
Assumption 3 (mean improvement in best-value gaps). There exist constants such that for all ,

Assumption 3 is stronger than best-so-far monotonicity. Monotonicity only gives nonincreasing nonnegative sequences $f(p_t^i) - f(x^\star)$ and $f(g_t) - f(x^\star)$; such sequences may converge to a positive value if the swarm stagnates. The constants in Assumption 3 encode a mean geometric improvement in the memory variables. This type of condition is most natural in settings where the current swarm distribution keeps a nonzero probability of sampling a sufficiently better point whenever the best value is not yet close to $f(x^\star)$. It is not expected to hold uniformly for arbitrary nonconvex landscapes, deceptive multimodal objectives, or implementations whose diversity collapses prematurely. The theorem below therefore identifies a sufficient convergence mechanism rather than a universal PSO convergence guarantee.
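The difference between geometric improvement and stagnation can be illustrated with a simple diagnostic. The helper `empirical_improvement_factor` below is hypothetical and is not the assumption itself: it only estimates, from a recorded sequence of positive best-value gaps, the average one-step contraction ratio, which is below one under geometric improvement and equal to one for a stagnated sequence.

```python
import numpy as np

def empirical_improvement_factor(gaps):
    """Geometric-mean one-step ratio gaps[t+1] / gaps[t] of a positive sequence."""
    gaps = np.asarray(gaps, dtype=float)
    ratios = gaps[1:] / gaps[:-1]
    return float(np.exp(np.mean(np.log(ratios))))

improving = 2.0 * 0.9 ** np.arange(50)   # synthetic gaps shrinking by 10% per step
stagnating = np.full(50, 0.3)            # monotone but stuck away from zero

factor_improving = empirical_improvement_factor(improving)
factor_stagnating = empirical_improvement_factor(stagnating)
```

Both synthetic sequences are nonincreasing, so monotonicity cannot distinguish them; only the first is consistent with a mean geometric improvement.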
Definition 3 (composite Lyapunov (augmented))
. Fix weights and a constant , and define: Lemma 10. Under Assumptions 1–3, there exist constants and a choice of and such that for all , Remark 3 (qualitative dependence of the weights)
. The constants inherit the noise scaling and therefore typically grow as due to the factor in Lemma 6. Hence the required weights scale as and .
Proof. Start from Lemma 9 and add
to both sides. Choose
so that the lagged term is dominated, yielding a net negative coefficient on
. Next, apply Assumption 3 to obtain:
Pick
and
to absorb the remainder terms from Lemma 9. Collecting terms yields the stated drift inequality. □
The preceding lemmas show how the proof is assembled. Lemma 7 gives a distance-level drift for the momentum potential but leaves positive terms involving the PSO memory variables. Lemma 9 converts these memory terms into objective gaps using strong convexity. Assumption 3 then supplies the missing decrease in the memory gaps, and the augmented Lyapunov function absorbs the remaining positive terms through the weights A and B. Thus, Section 4 and Section 5 should be read as a closure argument: the stochastic-momentum representation gives the basic negative drift in the current particle position, while the improvement assumption closes the recursion through the personal-best and global-best variables.
Theorem 1 (mean-square convergence of global-best PSO). Suppose Assumptions 1–3 hold. Assume the parameters of the PSO recursion (5) and (6) lie in a stability region such that the constants in Lemma 7 satisfy the drift condition. Then, for each particle $i$, the positions converge in mean square to the unique minimizer $x^\star$:
$$\lim_{t \to \infty} \mathbb{E}\,\|x_t^i - x^\star\|^2 = 0.$$
Moreover,
$$\lim_{t \to \infty} \mathbb{E}\big[f(p_t^i) - f(x^\star)\big] = 0, \qquad \lim_{t \to \infty} \mathbb{E}\big[f(g_t) - f(x^\star)\big] = 0,$$
and by strong convexity,
$$\lim_{t \to \infty} \mathbb{E}\,\|p_t^i - x^\star\|^2 = 0, \qquad \lim_{t \to \infty} \mathbb{E}\,\|g_t - x^\star\|^2 = 0.$$

Remark 4 (interpretation of the stability condition). The stability condition is the point at which the negative drift generated by the attraction toward the surrogate centers dominates the lagged-position term, the gradient-bound remainder, and the stochastic noise variance. It is a conservative sufficient condition, not a sharp characterization of the practical PSO stability region. In the notation of Appendix E, one obtains explicit coefficients after fixing the Young-inequality parameters; a sufficient check is positivity of the resulting drift coefficient, together with finite memory coefficients that can be absorbed by the augmented Lyapunov weights. This interpretation is consistent with standard PSO parameter practice. Larger inertia $\omega$ increases the inertia-dependent factor in the noise bound and increases the lag coefficient, so the sufficient condition becomes harder to satisfy as $\omega$ approaches one. Larger acceleration parameters $c_1$ and $c_2$ strengthen the mean attraction toward $p_t^i$ and $g_t$, but they also increase the stochastic variance. The stability inequality therefore formalizes the familiar trade-off in PSO parameter selection: inertia and acceleration must be large enough to move the swarm, but not so large that oscillation and noise dominate the contraction mechanism. The commonly used stable regimes discussed in [7] are compatible with this qualitative balance, although the present bound is intentionally conservative.

Proof. By Lemma 10,
has a negative drift with respect to
,
, and
. Summing from
to
T and telescoping yields:
Since the summands are nonnegative, this implies
and
,
. Finally, strong convexity gives
and
by Lemma 8. □
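The telescoping step in the proof follows a standard pattern, which can be written out generically. In the sketch below, $\Psi_t \ge 0$ stands for the composite Lyapunov function, $e_t \ge 0$ for any of the nonnegative error terms it controls, and $\kappa > 0$ for the net drift coefficient; these placeholder symbols are assumptions for illustration and are not the paper's notation.

```latex
\begin{align*}
\mathbb{E}[\Psi_{t+1}] &\le \mathbb{E}[\Psi_t] - \kappa\,\mathbb{E}[e_t]
  && \text{(one-step drift)} \\
\kappa \sum_{t=0}^{T} \mathbb{E}[e_t]
  &\le \mathbb{E}[\Psi_0] - \mathbb{E}[\Psi_{T+1}] \le \mathbb{E}[\Psi_0]
  && \text{(sum over } t = 0, \dots, T \text{ and use } \Psi_{T+1} \ge 0\text{)} \\
\sum_{t=0}^{\infty} \mathbb{E}[e_t] &\le \frac{\mathbb{E}[\Psi_0]}{\kappa} < \infty
  \;\Longrightarrow\; \lim_{t \to \infty} \mathbb{E}[e_t] = 0
  && \text{(terms of a convergent nonnegative series vanish).}
\end{align*}
```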
Remark 5 (deterministic PSO)
. If , then and the above proof holds pathwise.
7. Numerical Illustration on a High-Dimensional Convex Quadratic
To supplement the theoretical analysis, we report one additional experiment on a high-dimensional convex objective. The purpose of this experiment is not to provide an extensive empirical comparison, but rather to illustrate that the predicted stabilization and convergence behavior is also observed in a simple large-scale convex setting.
We consider the quadratic objective:
where
and
is diagonal and positive definite. The diagonal entries are chosen deterministically as follows:
with eigenvalues linearly spaced between 1 and 5. Hence,
f is smooth and strongly convex, and its unique minimizer is
.
We run the canonical global-best PSO recursion with:
using 1000 particles. The PSO parameters are chosen in accordance with the stability analysis of Bonyadi and Michalewicz [
7]. The experiment is repeated for dimensions:
Particles are initialized randomly in the box
, and velocities are clamped component-wise to the interval
. The random seed is fixed as 20260408 to make the reported trajectories reproducible. We track the best objective gap:
where
denotes the global-best position found by the swarm up to iteration
t. The run is continued until the objective gap reaches the tolerance:
or until the maximum budget of 100,000 iterations is reached. For the run reported here, the tolerance was reached for every tested dimension. The required iteration counts were:
where
denotes the first recorded block endpoint at which the stopping tolerance is met.
Figure 1 shows that the best objective gap decreases steadily over many orders of magnitude for all tested dimensions, with every run reaching the tolerance
within the prescribed iteration budget. In particular, the trajectories do not exhibit variance explosion or visible instability under the parameter regime considered here. The nearly linear decay on the logarithmic scale after the initial transient is consistent with the mean-square stabilization mechanism established in the main text.
Figure 2 shows the early iterations of the same experiment. Since the particles are initialized in the large box
, the initial objective gaps are far from zero. Thus, the observed convergence is not an artifact of starting close to the minimizer.
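A small-scale version of this kind of experiment can be sketched as follows. It is not the configuration reported above: the swarm size, iteration budget, box, clamping interval, seed, and the parameter values ($\omega = 0.7298$, $c_1 = c_2 = 1.49618$, a commonly used setting) are assumptions chosen for a quick run, and the objective is taken to be $f(x) = \frac{1}{2} x^\top A x$ with minimizer at the origin.

```python
import numpy as np

def run_pso(dim, n_particles=40, iters=300, omega=0.7298, c=1.49618,
            box=10.0, v_max=10.0, seed=0):
    """Global-best PSO on a diagonal quadratic; returns the best-gap history."""
    rng = np.random.default_rng(seed)
    eigs = np.linspace(1.0, 5.0, dim)            # eigenvalues linearly spaced in [1, 5]

    def f(z):
        return 0.5 * np.sum(eigs * z ** 2, axis=1)

    x = rng.uniform(-box, box, size=(n_particles, dim))
    v = np.zeros_like(x)
    p, fp = x.copy(), f(x)
    best = int(np.argmin(fp))
    g, fg = p[best].copy(), fp[best]
    gaps = [fg]                                  # optimal value is 0, so gap = f(g)
    for _ in range(iters):
        r1 = rng.uniform(size=(n_particles, 1))
        r2 = rng.uniform(size=(n_particles, 1))
        v = omega * v + c * r1 * (p - x) + c * r2 * (g - x)
        v = np.clip(v, -v_max, v_max)            # component-wise velocity clamping
        x = x + v
        fx = f(x)
        improved = fx < fp
        p[improved], fp[improved] = x[improved], fx[improved]
        best = int(np.argmin(fp))
        if fp[best] < fg:
            g, fg = p[best].copy(), fp[best]
        gaps.append(fg)
    return np.array(gaps)

gaps = run_pso(dim=5)
```

Plotting `gaps` on a logarithmic axis gives the kind of decay curve described for Figure 1, at a much smaller scale.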
This experiment is intended only as a supplementary illustration. The main contribution of this paper is theoretical: the stochastic-momentum reformulation of PSO, the associated Lyapunov drift analysis, and the resulting mean-square convergence guarantee. Different objective environments can change the observed behavior substantially. Strongly convex quadratics provide a controlled setting in which the assumptions are most transparent. Ill-conditioned convex functions may slow the decrease in the best-value gap; nonsmooth functions can break the smoothness estimates used in the drift proof; multimodal nonconvex functions can violate the mean-improvement condition through premature stagnation; and constrained or noisy simulation-based problems can introduce boundary and sampling effects that are not represented in the unprojected recursion. These cases require additional analysis beyond the theorem proved here.
8. Conclusions
This paper connects the canonical global-best PSO recursion to modern stochastic optimization theory by identifying an explicit stochastic-momentum structure. In particular, the velocity update can be decomposed into an inertial term, a deterministic drift term corresponding to a gradient step on a time-varying quadratic surrogate centered at the personal and global best positions, and a conditionally zero-mean perturbation induced by the random acceleration coefficients. An explicit heavy-ball to iterate-moving-average (IMA) transformation then yields a one-step recursion for an auxiliary sequence, which makes it possible to carry out a Lyapunov drift analysis in the spirit of standard stochastic momentum frameworks.
The main emphasis of this paper is theoretical rather than empirical. Our primary contribution is a mean-square convergence analysis of PSO based on this stochastic-momentum representation, together with explicit drift inequalities and a Lyapunov function that captures both the momentum dynamics and the memory terms. Under smoothness and strong convexity of the objective, together with standard boundedness assumptions on the trajectories (consistent with box constraints and/or velocity clamping used in practice), the resulting drift inequality implies summability of the mean-square error and hence mean-square convergence of particle positions to the unique minimizer, as well as convergence of the personal-best and global-best objective gaps (see Theorem 1). The deterministic PSO dynamics obtained by replacing the random multipliers with their expectations is recovered as a special case of the same framework.
From this perspective, the numerical experiments play only an illustrative role. They are included to confirm that the predicted behavior is consistent with the observed dynamics, rather than to provide a comprehensive benchmarking study (see Section 7 for an additional high-dimensional convex experiment).
The present analysis also makes clear both the essential ingredients of the proof and its current limitations. On the one hand, the argument relies crucially on the surrogate-gradient representation, the martingale structure of the noise term, and a Lyapunov function compatible with the memory updates. On the other hand, the current setting is restricted to smooth strongly convex objectives, boundedness is assumed rather than derived from the dynamics, and the result is formulated only for constant parameters and the global-best (all-to-all) topology.
Several natural directions remain for future work. These include deriving explicit parameter regions that certify the drift condition and comparing them quantitatively with existing PSO stability boundaries, weakening or removing the boundedness assumption, extending the analysis to time-varying parameters and neighborhood topologies, and investigating whether related Lyapunov constructions can yield convergence rates or guarantees beyond the strongly convex regime.
Author Contributions
Conceptualization, B.B.; Methodology, B.B.; Formal Analysis, G.V.; Investigation, G.V.; Writing—Original Draft, G.V.; Writing—Review & Editing, B.B.; Funding Acquisition, B.B. All authors have read and agreed to the published version of the manuscript.
Funding
The work of the first author was partially supported by a grant from the scientific program of Chinese universities “Program to support the stability of Higher Education” (section Shenzhen 2022—Commission on Science, Technology and Innovation of the Shenzhen Municipality 20220819092520001).
Data Availability Statement
The original contributions presented in this study are included in this article. Further inquiries can be directed to the corresponding author.
Acknowledgments
The authors gratefully acknowledge the support of the National Key Research and Development Program of China (Grant No. 2025YFE0113400).
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. IMA Transformation: Full Algebra
This appendix provides a detailed derivation of Lemma 2. Starting from the stochastic heavy-ball form (
7):
assume
is constant and define:
Equivalently:
Subtract the corresponding identity at time
:
Now use (
7) to substitute
:
Combine (
A2) and (
A3) and divide by
:
This is exactly (
8). Finally, (
A1) is (
9). If
, then
.
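The change of variables behind the IMA transformation can be checked numerically. The sketch below uses one common normalization, $z_t = x_t + \frac{\omega}{1 - \omega} v_t$, which satisfies $z_{t+1} = z_t + \frac{1}{1 - \omega} d_t$ when $v_{t+1} = \omega v_t + d_t$ and $x_{t+1} = x_t + v_{t+1}$; the exact scaling in (8) and (9) may differ, and the random vector `d` merely stands in for the surrogate-gradient step plus noise.

```python
import numpy as np

rng = np.random.default_rng(4)
omega, steps, dim = 0.6, 50, 3
x, v = rng.normal(size=dim), rng.normal(size=dim)
z = x + omega / (1.0 - omega) * v            # averaged iterate (assumed normalization)

max_err = 0.0
for _ in range(steps):
    d = rng.normal(size=dim)                 # stands in for -grad U_t(x_t) + noise
    v = omega * v + d                        # heavy-ball velocity update
    x = x + v                                # position update
    z = z + d / (1.0 - omega)                # one-step recursion without momentum
    z_from_xv = x + omega / (1.0 - omega) * v
    max_err = max(max_err, float(np.max(np.abs(z - z_from_xv))))
```

Setting `omega = 0` in this sketch makes the averaged iterate coincide with the position, matching the special case noted at the end of the appendix.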
Appendix B. Lyapunov Drift Expansion: Full Calculation
This appendix expands the drift identity behind Lemma 3. Define:
From the IMA recursion (
8):
Let
. Then:
Subtract
and condition on
:
Since
are
-measurable and
:
Thus:
which is (
3).
Appendix C. Explicit Bounds for $\|\nabla U_t(x_t)\|^2$
This appendix justifies Lemma 5 with explicit constants. From Lemma 1:
Using
:
Next,
and similarly for
, hence:
which is (
5).
Appendix D. Explicit Bounds for the Noise Variance
This appendix gives details for Lemma 6. Recall:
Conditioned on
,
are independent, mean zero, and
. Hence:
Using
and similarly for
:
Finally,
implies:
yielding (
10) (with the same constants, scaled by
).
Appendix E. Detailed Proof of the Distance-Level Drift Inequality
This appendix expands the “collect coefficients” step in Lemma 7. Start from (
11):
Using Lemma 4 at
,
Apply Young’s inequality: for any
:
Thus:
Similarly, apply Lemma 4 and Young to
to get an upper bound of the form:
for arbitrary
.
Plug (
A5) and (
A6) into (
A4). Then, bound the remaining terms by Lemmas 5 and 6:
where (from Lemmas 5 and 6):
Collecting coefficients gives a bound of the form:
with
Take full expectation to obtain the unconditional version. Choosing, for instance,
makes all coefficients explicit and finite. On the role of
, under the surrogate scaling
and
with
, the leading terms in
involve products of the form
and
, for which
cancels. Therefore, positivity of
is governed primarily by the parameter combination
(and the auxiliary constants
) under this reduction;
does not act as an independent “small step-size” knob in the usual sense.
References
- Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks; IEEE: New York, NY, USA, 1995; Volume 4, pp. 1942–1948.
- Garrigos, G.; Gower, R.M. Handbook of Convergence Theorems for (Stochastic) Gradient Methods. arXiv 2024, arXiv:2301.11235.
- Jiang, M.; Luo, Y.; Yang, S. Stochastic convergence analysis and parameter selection of the standard PSO algorithm. Inf. Process. Lett. 2007, 102, 8–16.
- Chen, X.; Li, Y. A modified PSO structure resulting in high exploration ability with convergence guaranteed. IEEE Trans. Syst. Man Cybern. B Cybern. 2007, 37, 1271–1289.
- Kadirkamanathan, V.; Selvarajah, K.; Fleming, P. Stability analysis of the particle dynamics in particle swarm optimizer. IEEE Trans. Evol. Comput. 2006, 10, 245–255.
- Poli, R. Dynamics and stability of the sampling distribution of PSO via moment analysis. J. Artif. Evol. Appl. 2008, 2008, 761459.
- Bonyadi, M.R.; Michalewicz, Z. Stability Analysis of the Particle Swarm Optimization Without Stagnation Assumption. IEEE Trans. Evol. Comput. 2016, 20, 814–819.
- Xu, G.; Yu, X. On convergence analysis of particle swarm optimization algorithm. J. Comput. Appl. Math. 2018, 333, 65–73.
- Hu, D.; Qiu, X.; Liu, Y.; Zhou, X. Probabilistic convergence analysis of the stochastic PSO model without the stagnation assumption. Inf. Sci. 2021, 547, 996–1007.
- Dong, W.; Zhang, R. Stochastic stability analysis of composite dynamic system for particle swarm optimization. Inf. Sci. 2022, 592, 227–243.
- Bruned, V.; Mas, A.; Wlodarczyk, S. Weak convergence of particle swarm optimization. arXiv 2018, arXiv:1811.04924.
- Tarekegn Nigatu, D.; Gemechu Dinka, T.; Luleseged Tilahun, S. Convergence analysis of particle swarm optimization algorithm by a velocity control method. Front. Appl. Math. Stat. 2024, 10, 1304268.
- Huang, X.; Qiu, H.; Riedl, M. On the Global Convergence of Particle Swarm Optimization Methods. Appl. Math. Optim. 2023, 88, 30.
- Cui, X. Symmetry-Based Convergence Theory for PSO: From Heuristic to Provably Convergent Optimization. Symmetry 2026, 18, 28.