Mean-Square Convergence of Particle Swarm Optimization via Stochastic Momentum Analysis

Budak, Boris; Vorontsov, Georgii

doi:10.3390/math14122107

by Boris Budak ^1,2 and Georgii Vorontsov ^1,*

^1 Faculty of Computational Mathematics and Cybernetics, Shenzhen MSU-BIT University, Shenzhen 518172, China

^2 Faculty of Computational Mathematics and Cybernetics, Lomonosov Moscow State University, Moscow 119991, Russia

^* Author to whom correspondence should be addressed.

Mathematics 2026, 14(12), 2107; https://doi.org/10.3390/math14122107 (registering DOI)

Submission received: 29 April 2026 / Revised: 2 June 2026 / Accepted: 9 June 2026 / Published: 12 June 2026

(This article belongs to the Section E: Applied Mathematics)

Abstract

We analyze the standard multi-particle particle swarm optimization (PSO) algorithm with global-best (all-to-all) topology and constant hyperparameters on smooth strongly convex objectives. By rewriting the PSO velocity recursion as a stochastic heavy-ball method acting on a time-varying quadratic surrogate defined by the personal and global bests, and by applying a Lyapunov drift argument in the style of stochastic momentum analyses, we obtain mean-square convergence of particle positions to the unique minimizer and convergence of the best-so-far objective gaps. The deterministic PSO obtained by fixing the random coefficients at their mean values appears as a noise-free special case of the same Lyapunov framework.

Keywords:

particle swarm optimization; stochastic momentum; Lyapunov drift; mean-square convergence; strong convexity

MSC:

90C15; 90C26

1. Introduction

Particle swarm optimization (PSO) was introduced by Kennedy and Eberhart [1] as a population-based stochastic search method driven by social interaction. In its canonical form, each particle

i \in {1, \dots, N}

maintains a position

x_{i, t} \in R^{d}

and a velocity

v_{i, t} \in R^{d}

, and evolves according to:

v_{i, t + 1} = ω v_{i, t} + c_{1} r_{1, i, t} (p_{i, t} - x_{i, t}) + c_{2} r_{2, i, t} (g_{t} - x_{i, t}),

(1)

x_{i, t + 1} = x_{i, t} + v_{i, t + 1},

(2)

where

ω \in (0, 1)

is the inertia weight,

c_{1}, c_{2} > 0

are the cognitive and social parameters,

r_{1, i, t}, r_{2, i, t} \sim Unif (0, 1)

are independent random multipliers,

p_{i, t}

is the personal best of particle i, and

g_{t}

is the global best over the swarm (all-to-all topology). The best positions satisfy the best-so-far monotonicity:

f (p_{i, t + 1}) \leq f (p_{i, t}), f (g_{t + 1}) \leq f (g_{t}), f (p_{i, t}), f (g_{t}) \geq f^{★},

where

f^{★} : = {min}_{x} f (x)

.

PSO is important because it combines a very simple update rule with robust performance in derivative-free optimization, engineering design, control tuning, parameter identification, and simulation-based search problems where gradients are unavailable, unreliable, or too expensive to compute. This practical success has motivated a large theoretical literature, but a complete convergence theory for the standard stochastic multi-particle algorithm remains difficult. The main obstacle is that PSO is neither a classical stochastic gradient method nor a purely deterministic dynamical system: the update contains inertial dynamics, random acceleration coefficients, and history-dependent memory variables whose locations change according to best-so-far rules.

Existing convergence analyses have clarified many aspects of PSO stability, including parameter restrictions, moment stability, stochastic-process convergence, Lyapunov stability, and continuous-time or mean-field limits. However, these analyses usually work directly with PSO-specific recursions or with simplified stagnation models. In contrast, modern stochastic optimization theory provides mature Lyapunov templates for momentum methods on smooth strongly convex objectives; see, for example, [2]. The question addressed in this paper is whether the standard global-best PSO recursion can be represented in a form close enough to stochastic momentum to permit a comparable Lyapunov drift analysis while retaining the personal-best and global-best memory structure.

For each particle and a fixed scaling parameter η > 0, we define the quadratic surrogate:

U_{i, t} (x) : = α ∥ x - p_{i, t} ∥^{2} + β {∥ x - g_{t} ∥}^{2}, α : = \frac{c_{1}}{4 η}, β : = \frac{c_{2}}{4 η},

(3)

so that

- η \nabla U_{i, t} (x_{i, t}) = \frac{c_{1}}{2} (p_{i, t} - x_{i, t}) + \frac{c_{2}}{2} (g_{t} - x_{i, t}),

which is exactly the mean attraction term in (1). Writing

r_{k, i, t} = \frac{1}{2} + ξ_{k, i, t}

with

E [ξ_{k, i, t} ∣ F_{t}] = 0

gives:

v_{i, t + 1} = ω v_{i, t} - η \nabla U_{i, t} (x_{i, t}) + ε_{i, t + 1}, E [ε_{i, t + 1} ∣ F_{t}] = 0 .

(4)

Thus, PSO can be viewed as a stochastic heavy-ball recursion on a time-varying surrogate determined by the current memory variables. The convergence theorem in this paper follows from a composite Lyapunov function coupling the momentum potential with the best-value gaps

f (p_{i, t}) - f^{★}

and

f (g_{t}) - f^{★}

.

The scope of the result is deliberately explicit. The mean-square convergence statement is derived from the PSO dynamics after imposing smooth strong convexity, bounded trajectories, and a mean improvement condition for the best-value gaps. The first two assumptions control the surrogate gradient and noise moments. The third assumption is not a consequence of monotonicity alone; it rules out stagnation of the personal and global bests away from the minimizer. Therefore, the theorem should be read as a conditional convergence result: the stochastic-momentum reduction and Lyapunov drift are derived from the PSO recursion, whereas boundedness and systematic improvement in best-value gaps are imposed structural hypotheses.

Contributions.

We give an equation-level reduction of the standard global-best (all-to-all) multi-particle PSO recursion (1) and (2) to a stochastic heavy-ball method on the quadratic surrogate (3), with an explicit martingale-difference noise term (4).
We construct a composite Lyapunov function that incorporates both momentum-style error terms and PSO memory variables via best-value gaps, leveraging the monotonicity of personal and global best values while making clear which additional improvement condition is required.
Under smoothness and strong convexity of f, boundedness of the trajectories, and mean improvement in the best-value gaps, we establish mean-square convergence of particle positions and convergence of the personal-best and global-best objective gaps.
We include a numerical illustration on high-dimensional strongly convex quadratics to connect the theoretical stabilization mechanism with observed behavior under a standard stable parameter regime.

2. Related Work

PSO convergence theory has developed along several complementary lines. Jiang et al. [3] analyzed the standard PSO algorithm as a stochastic process and derived convergence and parameter-selection conditions accounting for the randomness in the update coefficients. Chen and Li [4] proposed a modified PSO structure with an additional exploration component and established convergence through Lyapunov arguments for stochastic processes. Kadirkamanathan et al. [5] treated particle dynamics using control-theoretic and Lyapunov stability tools, deriving sufficient conditions for boundedness and stability of trajectories under stochastic updates of best positions.

Moment-based analyses provide another important perspective. Poli [6] characterized the evolution of the sampling distribution of particle states and identified parameter regions for stability of the first and second moments under simplifying assumptions. Bonyadi and Michalewicz [7] studied PSO stability without imposing a stagnation assumption on the best positions, deriving conditions for convergence of mean and variance under broad distributions of the bests. These results are closely related to the present paper because they emphasize the role of parameter regimes and memory variables in the stability of the canonical recursion.

Probabilistic convergence analyses have used Markov, martingale, and metric-space tools. Xu and Yu [8] constructed supermartingale sequences tied to the swarm’s best fitness value. Hu et al. [9] developed almost-sure convergence results for stochastic PSO models without stagnation assumptions. Dong and Zhang [10] proposed a composite drift–diffusion model and obtained Lyapunov moment bounds controlling diffusion effects. More recently, weak-convergence viewpoints have been developed for PSO trajectories and swarm-level sampling regimes [11], while modified velocity-control schemes such as constriction-based PSO have continued to motivate convergence analyses for practically used variants [12].

Continuous-time, mean-field, and structure-based analyses are also relevant. Huang, Qiu, and Riedl [13] established global convergence results through continuous-time and mean-field modeling, proving consensus formation through variance dissipation and linking the consensus point to a global minimizer under additional assumptions. Cui [14] proposed a symmetry-based framework for PSO variants, deriving relationships between hyperparameters and noise characteristics that guarantee convergence under stated structural assumptions. These works differ from the present paper in their modeling level and assumptions, but they confirm the current interest in deriving rigorous stability statements for particle-based optimization methods.

The present contribution is narrower but more explicit in a different direction. Rather than replacing PSO by a continuous-time or mean-field model, and rather than analyzing only a stagnated recursion, we rewrite the discrete-time global-best PSO update itself as a stochastic heavy-ball recursion on a time-varying quadratic surrogate. This allows us to import a Lyapunov drift template from stochastic momentum analysis while preserving the personal-best and global-best terms. The price is that the final convergence theorem is conditional: boundedness and mean improvement in best-value gaps are assumed, not derived. This positioning separates the part of the proof that follows algebraically from the PSO dynamics from the part that relies on structural assumptions about successful improvement in the memory variables.

3. Problem Setting and PSO Dynamics

We consider:

min_{x \in R^{d}} f (x),

where

f : R^{d} \to R

satisfies Assumption 1 below. The question is whether the particle positions generated by the standard global-best PSO recursion converge in mean square to the unique minimizer

x^{★}

, and whether the personal-best and global-best objective gaps converge to zero. We do not aim to prove global convergence for arbitrary nonconvex objectives; the analysis is restricted to smooth strongly convex objectives so that distance-to-solution and objective-gap estimates can be connected through standard curvature inequalities.

Assumption 1 (strong convexity and smoothness).

The function f is μ–strongly convex and L–smooth: for all

x, y \in R^{d}

,

\begin{matrix} f (y) & \geq f (x) + 〈 \nabla f (x), y - x 〉 + \frac{μ}{2} {∥ y - x ∥}^{2}, \\ ∥ \nabla f (x) - \nabla f (y) ∥ & \leq L ∥ x - y ∥ . \end{matrix}

In particular, f has a unique minimizer

x^{★}

with

\nabla f (x^{★}) = 0

, and

f^{★} : = f (x^{★})

.

Definition 1 (global-best PSO (all-to-all, unprojected)).

Let

{(F_{t})}_{t \geq 0}

be the natural filtration generated by all particle variables up to time t. For each particle

i \in {1, \dots, N}

, the updates are:

v_{i, t + 1} = ω v_{i, t} + c_{1} r_{1, i, t} (p_{i, t} - x_{i, t}) + c_{2} r_{2, i, t} (g_{t} - x_{i, t}),

(5)

x_{i, t + 1} = x_{i, t} + v_{i, t + 1},

(6)

where

r_{1, i, t}, r_{2, i, t} \sim Unif (0, 1)

are i.i.d. and independent of

F_{t}

. Personal bests are updated by:

p_{i, t + 1} = \{\begin{matrix} x_{i, t + 1}, & f (x_{i, t + 1}) < f (p_{i, t}), \\ p_{i, t}, & otherwise . \end{matrix}

Define the global-best index with a deterministic tie-break rule:

J_{t} : = min \{j \in {1, \dots, N} : f (p_{j, t}) = min_{1 \leq k \leq N} f (p_{k, t})\}, g_{t} : = p_{J_{t}, t},

and update

g_{t + 1}

by the best-so-far rule:

g_{t + 1} : = \{\begin{matrix} p_{J_{t + 1}, t + 1}, & min_{1 \leq j \leq N} f (p_{j, t + 1}) < f (g_{t}), \\ g_{t}, & otherwise, \end{matrix}

so that

f (g_{t + 1}) \leq f (g_{t})

holds pathwise.

The best-so-far property gives, for all

i, t

:

f (p_{i, t + 1}) \leq f (p_{i, t}), f (g_{t + 1}) \leq f (g_{t}), f (p_{i, t}), f (g_{t}) \geq f^{★} .
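The recursion of Definition 1 can be sketched in a few lines of standard-library Python. This is an illustrative sketch, not the authors' implementation: the test objective `sphere`, the helper names, the zero initial velocities, and the initialization box are all assumptions made here for the example. It draws one scalar multiplier pair per particle, uses the smallest-index tie-break, and records the global-best value so that the pathwise monotonicity can be inspected.

```python
import random

def sphere(x):
    """Illustrative strongly convex objective f(x) = 0.5 * ||x||^2 (an assumption)."""
    return 0.5 * sum(xk * xk for xk in x)

def global_best_index(pbest, f):
    """Smallest index attaining the minimum personal-best value (deterministic tie-break)."""
    values = [f(p) for p in pbest]
    return values.index(min(values))  # list.index returns the first match

def pso_step(x, v, pbest, g, f, omega, c1, c2, rng):
    """One iteration of (5)-(6) and the best-so-far updates; returns the new global best."""
    for i in range(len(x)):
        r1, r2 = rng.random(), rng.random()  # scalar multipliers for particle i
        v[i] = [omega * vk + c1 * r1 * (pk - xk) + c2 * r2 * (gk - xk)
                for xk, vk, pk, gk in zip(x[i], v[i], pbest[i], g)]
        x[i] = [xk + vk for xk, vk in zip(x[i], v[i])]
        if f(x[i]) < f(pbest[i]):            # accept strict improvements only
            pbest[i] = list(x[i])
    j = global_best_index(pbest, f)
    return list(pbest[j]) if f(pbest[j]) < f(g) else g

def run_pso(f, dim=3, n_particles=8, steps=50, omega=0.6, c1=1.7, c2=1.7, seed=0):
    """Run the sketch and return the sequence f(g_0), f(g_1), ..., f(g_steps)."""
    rng = random.Random(seed)
    x = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]  # zero initial velocity (assumption)
    pbest = [list(xi) for xi in x]
    g = list(pbest[global_best_index(pbest, f)])
    history = [f(g)]
    for _ in range(steps):
        g = pso_step(x, v, pbest, g, f, omega, c1, c2, rng)
        history.append(f(g))
    return history
```

Calling `run_pso(sphere)` returns a nonincreasing list of global-best values, which is the pathwise monotonicity stated above; whether the values approach the minimum is a separate question addressed by the later assumptions.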

Assumption 2 (bounded trajectories).

There exist constants

R, R_{v} > 0

such that for all particles

i \in {1, \dots, N}

and all

t \geq 0

:

∥ x_{i, t} ∥ \leq R, ∥ v_{i, t} ∥ \leq R_{v}, ∥ p_{i, t} ∥ \leq R, ∥ g_{t} ∥ \leq R .

Remark 1 (role of the boundedness assumption).

Assumption 2 is imposed as a technical condition to control the surrogate gradients and the second moments of the stochastic perturbation in the unprojected recursion of Definition 1. It is not derived from the PSO dynamics in this paper. In practical implementations, boundedness is often enforced by box constraints, absorbing or reflecting boundaries, and/or velocity clamping. Such mechanisms make the assumption plausible for implemented algorithms, but they also modify the exact recursion and may introduce projection or clamping effects not included in the present proof. Thus, all estimates below should be interpreted as conditional on bounded trajectories for the stated recursion.

Remark 2 (deterministic PSO and applicability of the reformulation).

The stochastic-momentum reformulation applies to the system (5) and (6). If the random multipliers are replaced by their means,

r_{1, i, t} \equiv r_{2, i, t} \equiv \frac{1}{2}

, then the update becomes:

v_{i, t + 1} = ω v_{i, t} + \frac{c_{1}}{2} (p_{i, t} - x_{i, t}) + \frac{c_{2}}{2} (g_{t} - x_{i, t}) = ω v_{i, t} - η \nabla U_{i, t} (x_{i, t}),

with the same surrogate

U_{i, t}

as in (3). Hence, the deterministic PSO recursion is exactly the noise-free special case of the system analyzed below. In this case, the martingale-difference term is identically zero and the Lyapunov drift statements hold pathwise rather than only after taking conditional expectations.

4. PSO as Stochastic Momentum on a Quadratic Surrogate

Fix a particle index i and suppress it in notation (the analysis is particle-wise, with

g_{t}

shared across particles). Write

(x_{t}, v_{t}, p_{t})

for

(x_{i, t}, v_{i, t}, p_{i, t})

.

Definition 2 (quadratic surrogate).

Fix

η > 0

and define:

α : = \frac{c_{1}}{4 η}, β : = \frac{c_{2}}{4 η}, U_{t} (x) : = α ∥ x - p_{t} ∥^{2} + β {∥ x - g_{t} ∥}^{2} .

Lemma 1.

Let

U_{t} (x) = α ∥ x - p_{t} ∥^{2} + β {∥ x - g_{t} ∥}^{2},

and define

ξ_{k, t} : = r_{k, t} - \frac{1}{2}, k = 1, 2,

together with

ε_{t + 1} : = c_{1} ξ_{1, t} (p_{t} - x_{t}) + c_{2} ξ_{2, t} (g_{t} - x_{t}) .

Then, for all

x \in R^{d}

,

\nabla U_{t} (x) = 2 α (x - p_{t}) + 2 β (x - g_{t}),

and hence

\frac{c_{1}}{2} (p_{t} - x) + \frac{c_{2}}{2} (g_{t} - x) = - η \nabla U_{t} (x) .

Moreover, the velocity update (5) can be written as follows:

v_{t + 1} = ω v_{t} - η \nabla U_{t} (x_{t}) + ε_{t + 1},

and

E [ε_{t + 1} ∣ F_{t}] = 0 .

Proof.

The gradient formula follows immediately from:

\nabla_{x} {∥ x - a ∥}^{2} = 2 (x - a)

and the definitions of

α, β

. This yields:

\nabla U_{t} (x) = 2 α (x - p_{t}) + 2 β (x - g_{t}),

hence

- η \nabla U_{t} (x) = \frac{c_{1}}{2} (p_{t} - x) + \frac{c_{2}}{2} (g_{t} - x) .

Next, write:

r_{k, t} = \frac{1}{2} + ξ_{k, t}, k = 1, 2 .

Substituting this into (5), we obtain:

v_{t + 1} = ω v_{t} - η \nabla U_{t} (x_{t}) + ε_{t + 1} .

Finally, since

E [ξ_{k, t} ∣ F_{t}] = 0

with the multipliers independent of the past while the vectors p_{t}, g_{t}, and x_{t} are measurable with respect to

F_{t}

, we get

E [ε_{t + 1} ∣ F_{t}] = 0

. □
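The decomposition in Lemma 1 is purely algebraic, so it can be checked numerically for arbitrary vectors. The sketch below (helper names and test values are assumptions chosen for illustration) evaluates the velocity update once as written in (5) and once as ω v − η ∇U + ε, and reports the largest componentwise difference. It also illustrates that η cancels: any η > 0 gives the same velocity.

```python
import random

def velocity_direct(x, v, p, g, omega, c1, c2, r1, r2):
    """Velocity update exactly as written in (5)."""
    return [omega * vk + c1 * r1 * (pk - xk) + c2 * r2 * (gk - xk)
            for xk, vk, pk, gk in zip(x, v, p, g)]

def velocity_momentum(x, v, p, g, omega, c1, c2, r1, r2, eta):
    """Same update written as omega*v - eta*grad U_t(x) + eps, following Lemma 1."""
    alpha, beta = c1 / (4.0 * eta), c2 / (4.0 * eta)
    grad = [2.0 * alpha * (xk - pk) + 2.0 * beta * (xk - gk)
            for xk, pk, gk in zip(x, p, g)]
    xi1, xi2 = r1 - 0.5, r2 - 0.5  # centered multipliers
    eps = [c1 * xi1 * (pk - xk) + c2 * xi2 * (gk - xk)
           for xk, pk, gk in zip(x, p, g)]
    return [omega * vk - eta * gr + e for vk, gr, e in zip(v, grad, eps)]

def max_gap(eta, dim=4, seed=0):
    """Largest componentwise difference between the two forms for random inputs."""
    rng = random.Random(seed)
    draw = lambda: [rng.uniform(-5.0, 5.0) for _ in range(dim)]
    x, v, p, g = draw(), draw(), draw(), draw()
    r1, r2 = rng.random(), rng.random()
    a = velocity_direct(x, v, p, g, 0.6, 1.7, 1.7, r1, r2)
    b = velocity_momentum(x, v, p, g, 0.6, 1.7, 1.7, r1, r2, eta)
    return max(abs(s - t) for s, t in zip(a, b))
```

For example, `max_gap(0.1)` and `max_gap(7.5)` should both be at the level of floating-point rounding.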

Heavy-Ball and Iterate-Moving-Average (IMA) Forms (Fully Explicit)

Define

m_{t} : = - v_{t}

. Then, (5) and (6) are equivalent to:

\begin{matrix} m_{t + 1} & = ω m_{t} + η \nabla U_{t} (x_{t}) - ε_{t + 1}, \\ x_{t + 1} & = x_{t} - m_{t + 1} . \end{matrix}

Eliminating

m_{t}

using

m_{t} = x_{t - 1} - x_{t}

yields the stochastic heavy-ball form:

x_{t + 1} = x_{t} - η \nabla U_{t} (x_{t}) + ω (x_{t} - x_{t - 1}) + ε_{t + 1} .

(7)

Lemma 2.

Let

ω \in [0, 1)

and define:

λ : = \frac{ω}{1 - ω}, z_{t} : = \frac{x_{t + 1} - ω x_{t}}{1 - ω} = x_{t + 1} + λ (x_{t + 1} - x_{t}) .

Then, (7) is equivalent to:

\begin{matrix} z_{t} & = z_{t - 1} - \tilde{η} \nabla U_{t} (x_{t}) + {\tilde{ε}}_{t + 1}, \end{matrix}

(8)

\begin{matrix} x_{t + 1} & = \frac{λ}{λ + 1} x_{t} + \frac{1}{λ + 1} z_{t} = ω x_{t} + (1 - ω) z_{t}, \end{matrix}

(9)

where

\tilde{η} : = η / (1 - ω)

and

{\tilde{ε}}_{t + 1} : = ε_{t + 1} / (1 - ω)

. Moreover,

E [{\tilde{ε}}_{t + 1} ∣ F_{t}] = 0

.

Proof.

From the definition of z_{t},

z_{t} - z_{t - 1} = \frac{x_{t + 1} - (1 + ω) x_{t} + ω x_{t - 1}}{1 - ω}

. Using (7), the numerator equals

- η \nabla U_{t} (x_{t}) + ε_{t + 1}

, giving (8). Solving the definition of z_{t} for

x_{t + 1}

gives

x_{t + 1} = ω x_{t} + (1 - ω) z_{t}

, which is (9). □

Note that

z_{t}

depends on

x_{t + 1}

and is therefore

F_{t + 1}

-measurable, while

(x_{t}, p_{t}, g_{t})

are

F_{t}

-measurable.
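The equivalence in Lemma 2 can be checked along a simulated scalar trajectory. In the sketch below the memory values p_t and g_t are drawn arbitrarily at each step, which is an assumption made only to show that the identity does not depend on how the bests evolve. The function and variable names are hypothetical. It returns the largest residuals of (8) and (9).

```python
import random

def ima_residuals(steps=25, omega=0.6, c1=1.7, c2=1.7, eta=0.5, seed=1):
    """Simulate the scalar velocity form and measure how well (8) and (9) hold."""
    rng = random.Random(seed)
    x, v = rng.uniform(-3.0, 3.0), rng.uniform(-1.0, 1.0)
    z_prev = None
    worst8 = worst9 = 0.0
    for _ in range(steps):
        p, g = rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)  # arbitrary memory values
        r1, r2 = rng.random(), rng.random()
        grad = (c1 / (2.0 * eta)) * (x - p) + (c2 / (2.0 * eta)) * (x - g)
        eps = c1 * (r1 - 0.5) * (p - x) + c2 * (r2 - 0.5) * (g - x)
        v = omega * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)   # velocity update
        x_next = x + v                                          # position update
        z = (x_next - omega * x) / (1.0 - omega)                # IMA variable z_t
        if z_prev is not None:
            # (8): z_t = z_{t-1} - eta_tilde * grad + eps_tilde
            predicted = z_prev - eta / (1.0 - omega) * grad + eps / (1.0 - omega)
            worst8 = max(worst8, abs(z - predicted))
        # (9): x_{t+1} = omega * x_t + (1 - omega) * z_t
        worst9 = max(worst9, abs(x_next - (omega * x + (1.0 - omega) * z)))
        z_prev, x = z, x_next
    return worst8, worst9
```

The check of (8) starts at the second step because z_{t-1} requires two consecutive positions generated by the recursion.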

5. Lyapunov Function and Structural Bounds

Define the base Lyapunov function:

L_{t} : = {∥ z_{t} - x^{★} ∥}^{2} .

Lemma 3 (conditional drift of

L_{t}

).

For all

t \geq 1

:

E [L_{t} - L_{t - 1} ∣ F_{t}] = - 2 \tilde{η} 〈 \nabla U_{t} (x_{t}), z_{t - 1} - x^{★} 〉 + {\tilde{η}}^{2} ∥ \nabla U_{t} (x_{t}) ∥^{2} + E [∥ {\tilde{ε}}_{t + 1} ∥^{2} ∣ F_{t}] .

Proof.

Expand squares using (8) and condition on

F_{t}

; the cross term with

{\tilde{ε}}_{t + 1}

vanishes since

E [{\tilde{ε}}_{t + 1} ∣ F_{t}] = 0

. □
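For readers who want the expansion behind Lemma 3 written out, the following worked step shows where each term comes from. It uses only (8) and the definition of L_t; no new quantities are introduced.

```latex
\begin{aligned}
L_t &= \bigl\| (z_{t-1} - x^{\star}) - \tilde{\eta}\, \nabla U_t(x_t) + \tilde{\varepsilon}_{t+1} \bigr\|^2 \\
    &= L_{t-1}
       - 2 \tilde{\eta}\, \langle \nabla U_t(x_t),\, z_{t-1} - x^{\star} \rangle
       + \tilde{\eta}^{2}\, \| \nabla U_t(x_t) \|^2
       + \| \tilde{\varepsilon}_{t+1} \|^2 \\
    &\quad + 2\, \bigl\langle \tilde{\varepsilon}_{t+1},\, z_{t-1} - x^{\star} - \tilde{\eta}\, \nabla U_t(x_t) \bigr\rangle .
\end{aligned}
```

Since z_{t-1}, x_t, p_t, and g_t are F_t-measurable, the second factor of the last inner product is F_t-measurable, so its conditional expectation given F_t is zero and the stated identity follows.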

Lemma 4.

For any

x \in R^{d}

:

U_{t} (x) - U_{t} (x^{★}) = (α + β) {∥ x - x^{★} ∥}^{2} + 2 α 〈 x - x^{★}, x^{★} - p_{t} 〉 + 2 β 〈 x - x^{★}, x^{★} - g_{t} 〉 .

Proof.

Expand

∥ x - p_{t} ∥^{2} = {∥ x - x^{★} + x^{★} - p_{t} ∥}^{2}

and similarly for

g_{t}

and subtract

U_{t} (x^{★})

. □

Lemma 5.

For all t:

∥ \nabla U_{t} (x_{t}) ∥^{2} \leq \frac{2 c_{1}^{2} + 2 c_{2}^{2}}{η^{2}} ∥ x_{t} - x^{★} ∥^{2} + \frac{2 c_{1}^{2}}{η^{2}} ∥ p_{t} - x^{★} ∥^{2} + \frac{2 c_{2}^{2}}{η^{2}} {∥ g_{t} - x^{★} ∥}^{2} .

Proof.

Use

\nabla U_{t} (x_{t}) = \frac{c_{1}}{2 η} (x_{t} - p_{t}) + \frac{c_{2}}{2 η} (x_{t} - g_{t})

and

{(a + b)}^{2} \leq 2 a^{2} + 2 b^{2}

. □

Lemma 6.

For all t:

\begin{matrix} E [∥ {\tilde{ε}}_{t + 1} ∥^{2} ∣ F_{t}] & \leq \frac{c_{1}^{2}}{6 {(1 - ω)}^{2}} (∥ x_{t} - x^{★} ∥^{2} + {∥ p_{t} - x^{★} ∥}^{2}) \\ + \frac{c_{2}^{2}}{6 {(1 - ω)}^{2}} (∥ x_{t} - x^{★} ∥^{2} + {∥ g_{t} - x^{★} ∥}^{2}) . \end{matrix}

(10)

Proof.

Conditioned on

F_{t}

,

ξ_{1, t}, ξ_{2, t}

are independent, zero-mean, and satisfy

E [ξ_{k, t}^{2} ∣ F_{t}] = 1 / 12

. Thus, cross terms vanish by independence and zero mean, and

E [∥ ε_{t + 1} ∥^{2} ∣ F_{t}] = \frac{1}{12} (∥ c_{1} (p_{t} - x_{t}) ∥^{2} + ∥ c_{2} (g_{t} - x_{t}) ∥^{2}) .

Use

∥ p_{t} - x_{t} ∥^{2} \leq 2 (∥ p_{t} - x^{★} ∥^{2} + ∥ x_{t} - x^{★} ∥^{2})

and similarly for

g_{t} - x_{t}

, then scale by

{(1 - ω)}^{- 2}

. □

The bounds in this section isolate the terms that must be controlled in the Lyapunov drift. Lemma 3 gives the exact one-step identity for the IMA potential, while Lemmas 4–6 bound the surrogate gap, gradient norm, and stochastic variance in terms of distances to the true minimizer and to the PSO memory variables. These estimates are the ingredients used in the drift closure below.
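The three structural estimates can be sanity-checked numerically at randomly drawn points. The sketch below is an illustration under assumed parameter values. The reference point `x_star` is drawn at random because Lemmas 4 to 6 hold for any reference point. For Lemma 6 the code uses the exact conditional second moment from the proof rather than sampling.

```python
import random

def sq_norm(a):
    return sum(t * t for t in a)

def sub(a, b):
    return [s - t for s, t in zip(a, b)]

def dot(a, b):
    return sum(s * t for s, t in zip(a, b))

def structural_checks(dim=4, c1=1.7, c2=1.7, eta=0.5, omega=0.6, seed=2):
    """Return (Lemma 4 residual, ||grad||^2, Lemma 5 bound, noise moment, Lemma 6 bound)."""
    rng = random.Random(seed)
    draw = lambda: [rng.uniform(-3.0, 3.0) for _ in range(dim)]
    x, p, g, x_star = draw(), draw(), draw(), draw()
    alpha, beta = c1 / (4.0 * eta), c2 / (4.0 * eta)
    U = lambda y: alpha * sq_norm(sub(y, p)) + beta * sq_norm(sub(y, g))
    d = sub(x, x_star)
    # Lemma 4: exact identity for the surrogate gap.
    rhs4 = ((alpha + beta) * sq_norm(d)
            + 2.0 * alpha * dot(d, sub(x_star, p))
            + 2.0 * beta * dot(d, sub(x_star, g)))
    residual4 = U(x) - U(x_star) - rhs4
    # Lemma 5: bound on the squared surrogate gradient.
    grad = [2.0 * alpha * (a - b) + 2.0 * beta * (a - c) for a, b, c in zip(x, p, g)]
    dx, dp, dg = sq_norm(d), sq_norm(sub(p, x_star)), sq_norm(sub(g, x_star))
    bound5 = ((2 * c1**2 + 2 * c2**2) * dx + 2 * c1**2 * dp + 2 * c2**2 * dg) / eta**2
    # Lemma 6: exact second moment of the scaled noise versus its bound.
    noise = (c1**2 * sq_norm(sub(p, x)) + c2**2 * sq_norm(sub(g, x))) / (12.0 * (1.0 - omega) ** 2)
    bound6 = (c1**2 * (dx + dp) + c2**2 * (dx + dg)) / (6.0 * (1.0 - omega) ** 2)
    return residual4, sq_norm(grad), bound5, noise, bound6
```

The first returned value should vanish up to rounding, while the gradient and noise quantities should sit below their respective bounds.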

6. Drift Inequality and Convergence

From (9):

z_{t - 1} = \frac{x_{t} - ω x_{t - 1}}{1 - ω} = (λ + 1) x_{t} - λ x_{t - 1},

hence:

z_{t - 1} - x^{★} = (λ + 1) (x_{t} - x^{★}) - λ (x_{t - 1} - x^{★}) .

Substitute this identity into the drift identity of Lemma 3 and denote

s_{t} : = \nabla U_{t} (x_{t})

:

\begin{matrix} E [L_{t} - L_{t - 1} ∣ F_{t}] & = - 2 \tilde{η} (λ + 1) 〈 s_{t}, x_{t} - x^{★} 〉 \\ + 2 \tilde{η} λ 〈 s_{t}, x_{t - 1} - x^{★} 〉 \\ + {\tilde{η}}^{2} ∥ s_{t} ∥^{2} + E [∥ {\tilde{ε}}_{t + 1} ∥^{2} ∣ F_{t}] . \end{matrix}

Since

U_{t}

is convex:

〈 s_{t}, x_{t} - x^{★} 〉 \geq U_{t} (x_{t}) - U_{t} (x^{★}), 〈 s_{t}, x_{t - 1} - x_{t} 〉 \leq U_{t} (x_{t - 1}) - U_{t} (x_{t}),

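Regrouping the two inner products in the drift identity as

(λ + 1) 〈 s_{t}, x_{t} - x^{★} 〉 - λ 〈 s_{t}, x_{t - 1} - x^{★} 〉 = 〈 s_{t}, x_{t} - x^{★} 〉 - λ 〈 s_{t}, x_{t - 1} - x_{t} 〉

and applying both inequalities, we obtain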

\begin{matrix} E [L_{t} - L_{t - 1} ∣ F_{t}] & \leq - 2 \tilde{η} ((λ + 1) (U_{t} (x_{t}) - U_{t} (x^{★})) - λ (U_{t} (x_{t - 1}) - U_{t} (x^{★}))) \\ + {\tilde{η}}^{2} ∥ s_{t} ∥^{2} + E [∥ {\tilde{ε}}_{t + 1} ∥^{2} ∣ F_{t}] . \end{matrix}

(11)
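The deterministic part of (11) relies only on the convexity of the surrogate, so it can be tested at random points. The sketch below (function name and parameter values are assumptions for illustration) evaluates the two inner-product terms of the drift identity and the surrogate-gap expression that bounds them in (11). The gradient and noise terms are identical on both sides and are omitted.

```python
import random

def drift_sides(x_t, x_prev, x_star, p, g, c1=1.7, c2=1.7, eta=0.5, omega=0.6):
    """Return (inner-product terms, surrogate-gap bound) from the derivation of (11)."""
    alpha, beta = c1 / (4.0 * eta), c2 / (4.0 * eta)
    lam, eta_t = omega / (1.0 - omega), eta / (1.0 - omega)
    U = lambda y: (alpha * sum((a - b) ** 2 for a, b in zip(y, p))
                   + beta * sum((a - b) ** 2 for a, b in zip(y, g)))
    s = [2.0 * alpha * (a - b) + 2.0 * beta * (a - c) for a, b, c in zip(x_t, p, g)]
    inner = lambda u, w: sum(sk * (a - b) for sk, a, b in zip(s, u, w))  # <s, u - w>
    lhs = -2.0 * eta_t * (lam + 1.0) * inner(x_t, x_star) + 2.0 * eta_t * lam * inner(x_prev, x_star)
    rhs = -2.0 * eta_t * ((lam + 1.0) * (U(x_t) - U(x_star)) - lam * (U(x_prev) - U(x_star)))
    return lhs, rhs

def worst_violation(trials=200, dim=3, seed=4):
    """Largest value of lhs - rhs over random draws; should not be positive beyond rounding."""
    rng = random.Random(seed)
    draw = lambda: [rng.uniform(-3.0, 3.0) for _ in range(dim)]
    worst = float("-inf")
    for _ in range(trials):
        lhs, rhs = drift_sides(draw(), draw(), draw(), draw(), draw())
        worst = max(worst, lhs - rhs)
    return worst
```

A nonpositive `worst_violation()` is consistent with the inequality; it is a spot check, not a proof.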

Lemma 7.

Under Assumptions 1 and 2, there exist constants

a > 0

and

b \geq 0

(depending only on

μ, L, ω, c_{1}, c_{2}, η

) such that for all

t \geq 1

,

E [L_{t} - L_{t - 1}] \leq - a E ∥ x_{t} - x^{★} ∥^{2} + b E ∥ x_{t - 1} - x^{★} ∥^{2} + b E ∥ p_{t} - x^{★} ∥^{2} + b E {∥ g_{t} - x^{★} ∥}^{2} .

Proof.

Take full expectation in (11) and apply Lemma 4 to

x_{t}

and

x_{t - 1}

. Bound the cross terms by Young’s inequality and control the remaining gradient and noise terms via Lemmas 5 and 6. Collect coefficients. □

Composite Lyapunov and Closure of the Drift

The drift inequality in Lemma 7 involves the memory variables

p_{t}

and

g_{t}

. To close the recursion, we convert their distance terms into best-value gaps and incorporate these gaps into the Lyapunov function.

Lemma 8.

Under Assumption 1, for any random variable y taking values in

R^{d}

,

∥ y - x^{★} ∥^{2} \leq \frac{2}{μ} (f (y) - f^{★}) .

Proof.

Strong convexity implies

f (y) - f^{★} \geq \frac{μ}{2} {∥ y - x^{★} ∥}^{2}

. □
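As a quick illustration of Lemma 8, the sketch below checks the inequality on a diagonal quadratic with minimizer at the origin. The objective and the helper name are assumptions chosen for the example; μ is the smallest diagonal entry and f^★ = 0.

```python
import random

def quadratic(y, q):
    """f(y) = 0.5 * sum_k q_k * y_k^2, strongly convex with modulus min(q)."""
    return 0.5 * sum(qk * yk * yk for qk, yk in zip(q, y))

def distance_bound_holds(y, q, slack=1e-12):
    """Check ||y - x*||^2 <= (2 / mu) * (f(y) - f*) with x* = 0 and f* = 0."""
    mu = min(q)
    return sum(yk * yk for yk in y) <= (2.0 / mu) * quadratic(y, q) + slack

rng = random.Random(0)
q = [1.0, 2.0, 3.5, 5.0]
samples = [[rng.uniform(-10.0, 10.0) for _ in q] for _ in range(100)]
all_ok = all(distance_bound_holds(y, q) for y in samples)
```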

Lemma 9.

Under Assumptions 1 and 2, there exist constants

a > 0

,

b \geq 0

, and

{\tilde{b}}_{p}, {\tilde{b}}_{g} \geq 0

such that for all

t \geq 1

,

\begin{matrix} E [L_{t} - L_{t - 1}] \leq & - a E ∥ x_{t} - x^{★} ∥^{2} + b E {∥ x_{t - 1} - x^{★} ∥}^{2} \\ + {\tilde{b}}_{p} E [f (p_{t}) - f^{★}] + {\tilde{b}}_{g} E [f (g_{t}) - f^{★}] . \end{matrix}

Proof.

Apply Lemma 7 and then Lemma 8 to

p_{t}

and

g_{t}

. □

The monotonicity of

f (p_{t})

and

f (g_{t})

alone does not guarantee that these quantities approach

f^{★}

; PSO may stagnate. We therefore impose an explicit improvement condition.

Assumption 3 (mean improvement in best-value gaps).

There exist constants

ρ_{p}, ρ_{g} \in (0, 1]

such that for all

t \geq 0

,

\begin{matrix} E [f (p_{i, t + 1}) - f^{★} ∣ F_{t}] \leq (1 - ρ_{p}) (f (p_{i, t}) - f^{★}), \\ E [f (g_{t + 1}) - f^{★} ∣ F_{t}] \leq (1 - ρ_{g}) (f (g_{t}) - f^{★}) . \end{matrix}

Assumption 3 is stronger than best-so-far monotonicity. Monotonicity only gives nonincreasing nonnegative sequences

f (p_{i, t}) - f^{★}

and

f (g_{t}) - f^{★}

; such sequences may converge to a positive value if the swarm stagnates. The constants

ρ_{p}

and

ρ_{g}

encode a mean geometric improvement in the memory variables. This type of condition is most natural in settings where the current swarm distribution keeps a nonzero probability of sampling a sufficiently better point whenever the best value is not yet close to

f^{★}

. It is not expected to hold uniformly for arbitrary nonconvex landscapes, deceptive multimodal objectives, or implementations whose diversity collapses prematurely. The theorem below therefore identifies a sufficient convergence mechanism rather than a universal PSO convergence guarantee.
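A direct consequence of Assumption 3, obtained by taking total expectations and iterating, is geometric decay of the expected best-value gaps. The worked step below makes this explicit for the global best; the same argument applies to each personal best with ρ_p in place of ρ_g.

```latex
\mathbb{E}\bigl[f(g_t) - f^{\star}\bigr]
  \;\le\; (1 - \rho_g)\, \mathbb{E}\bigl[f(g_{t-1}) - f^{\star}\bigr]
  \;\le\; \dots
  \;\le\; (1 - \rho_g)^{t}\, \mathbb{E}\bigl[f(g_0) - f^{\star}\bigr] .
```

This shows how strong the hypothesis is: it already encodes linear convergence of the memory values in expectation, which is why it cannot follow from monotonicity alone.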

Definition 3 (composite Lyapunov (augmented)).

Fix weights

A, B > 0

and a constant

c \geq 0

, and define:

{\tilde{V}}_{t} : = E [L_{t}] + c E {∥ x_{t} - x^{★} ∥}^{2} + A E [f (p_{t}) - f^{★}] + B E [f (g_{t}) - f^{★}] .

Lemma 10.

Under Assumptions 1–3, and provided the constants of Lemma 7 satisfy a > b, there exist constants

γ_{x}, γ_{p}, γ_{g} > 0

and a choice of

A, B > 0

and

c \geq 0

such that for all

t \geq 1

,

{\tilde{V}}_{t} - {\tilde{V}}_{t - 1} \leq - γ_{x} E {∥ x_{t} - x^{★} ∥}^{2} - γ_{p} E [f (p_{t}) - f^{★}] - γ_{g} E [f (g_{t}) - f^{★}] .

Remark 3 (qualitative dependence of the weights).

The constants

{\tilde{b}}_{p}, {\tilde{b}}_{g}

inherit the noise scaling and therefore typically grow as

O ({(1 - ω)}^{- 2})

due to the factor

{(1 - ω)}^{- 2}

in Lemma 6. Hence the required weights scale as

A = Ω ({(1 - ω)}^{- 2} ρ_{p}^{- 1})

and

B = Ω ({(1 - ω)}^{- 2} ρ_{g}^{- 1})

.

Proof.

Start from Lemma 9 and add

c (E ∥ x_{t} - x^{★} ∥^{2} - E {∥ x_{t - 1} - x^{★} ∥}^{2})

to both sides. Choose

c \in (b, a)

so that the lagged term is dominated, yielding a net negative coefficient on

E ∥ x_{t} - x^{★} ∥^{2}

. Next, apply Assumption 3 to obtain:

\begin{matrix} E [f (p_{t}) - f^{★}] - E [f (p_{t - 1}) - f^{★}] & \leq - ρ_{p} E [f (p_{t - 1}) - f^{★}], \\ E [f (g_{t}) - f^{★}] - E [f (g_{t - 1}) - f^{★}] & \leq - ρ_{g} E [f (g_{t - 1}) - f^{★}] . \end{matrix}

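Pick

A > {\tilde{b}}_{p} / ρ_{p}

and

B > {\tilde{b}}_{g} / ρ_{g}

so that, using the best-so-far monotonicity of the best values, the remainder terms from Lemma 9 are absorbed with strictly negative net coefficients. Collecting terms yields the stated drift inequality. □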
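One way to read the bookkeeping in this proof is sketched below. The shorthand D_t, G^p_t, and G^g_t is introduced here only for this display and is not notation from the paper: D_t denotes the expected squared distance of x_t to the minimizer, and G^p_t, G^g_t denote the expected personal-best and global-best gaps.

```latex
\begin{aligned}
\tilde{V}_t - \tilde{V}_{t-1}
 &\le -(a - c)\, D_t - (c - b)\, D_{t-1}
      + \tilde{b}_p\, G^{p}_t - A \rho_p\, G^{p}_{t-1}
      + \tilde{b}_g\, G^{g}_t - B \rho_g\, G^{g}_{t-1} \\
 &\le -(a - c)\, D_t - (A \rho_p - \tilde{b}_p)\, G^{p}_t - (B \rho_g - \tilde{b}_g)\, G^{g}_t .
\end{aligned}
```

The second line drops the nonpositive term in D_{t-1} (since c > b) and uses G^p_t ≤ G^p_{t-1} and G^g_t ≤ G^g_{t-1}. Under this reading one may take γ_x = a − c, γ_p = A ρ_p − \tilde{b}_p, and γ_g = B ρ_g − \tilde{b}_g, which are positive when c < a, A ρ_p > \tilde{b}_p, and B ρ_g > \tilde{b}_g.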

The preceding lemmas show how the proof is assembled. Lemma 7 gives a distance-level drift for the momentum potential but leaves positive terms involving the PSO memory variables. Lemma 9 converts these memory terms into objective gaps using strong convexity. Assumption 3 then supplies the missing decrease in the memory gaps, and the augmented Lyapunov function absorbs the remaining positive terms through the weights A and B. Thus, Section 5 and Section 6 should be read as a closure argument: the stochastic-momentum representation gives the basic negative drift in the current particle position, while the improvement assumption closes the recursion through the personal-best and global-best variables.

Theorem 1 (mean-square convergence of global-best PSO).

Suppose Assumptions 1–3 hold. Assume that the parameters

(ω, c_{1}, c_{2})

of the PSO recursion (5) and (6) lie in a stability region such that the constants in Lemma 7 satisfy

a > b

. Then, for each particle i:

E ∥ x_{i, t} - x^{★} ∥^{2} \to 0 as t \to \infty .

Moreover,

E [f (p_{i, t}) - f^{★}] \to 0, E [f (g_{t}) - f^{★}] \to 0,

and by strong convexity,

E ∥ p_{i, t} - x^{★} ∥^{2} \to 0, E {∥ g_{t} - x^{★} ∥}^{2} \to 0 .

Remark 4 (interpretation of the stability condition).

The condition

a > b

is the point at which the negative drift generated by the attraction toward the surrogate centers dominates the lagged-position term, the gradient-bound remainder, and the stochastic noise variance. It is a conservative sufficient condition, not a sharp characterization of the practical PSO stability region. In the notation of Appendix E, one obtains explicit coefficients

a_{0}, b_{0}, e_{0}, f_{0}

after choosing Young-inequality parameters such as

δ_{p} = δ_{g} = δ_{p}^{'} = δ_{g}^{'} = 1 / 4

; a sufficient check is

a_{0} > b_{0}

together with finite memory coefficients that can be absorbed by the augmented Lyapunov weights.

This interpretation is consistent with standard PSO parameter practice. Larger inertia ω increases the factor

{(1 - ω)}^{- 2}

in the noise bound and increases the lag coefficient through

λ = ω / (1 - ω)

, so the sufficient condition becomes harder to satisfy as ω approaches one. Larger acceleration parameters

c_{1}

and

c_{2}

strengthen the mean attraction toward

p_{t}

and

g_{t}

, but they also increase the stochastic variance. The stability inequality therefore formalizes the familiar trade-off in PSO parameter selection: inertia and acceleration must be large enough to move the swarm, but not so large that oscillation and noise dominate the contraction mechanism. The commonly used stable regimes discussed in [7] are compatible with this qualitative balance, although the present bound is intentionally conservative.
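To make the dependence on the inertia weight concrete, the following small sketch tabulates the two ω-dependent factors named above: the lag coefficient λ = ω / (1 − ω) and the noise scaling (1 − ω)^{−2}. The function name is hypothetical, and the code only evaluates these two formulas; it does not compute the constants a and b of Lemma 7.

```python
def momentum_factors(omega):
    """Return (lambda, noise factor) = (omega / (1 - omega), (1 - omega)^-2)."""
    if not 0.0 <= omega < 1.0:
        raise ValueError("omega must lie in [0, 1)")
    lam = omega / (1.0 - omega)
    noise_factor = 1.0 / (1.0 - omega) ** 2
    return lam, noise_factor

# Print the factors for a few inertia weights to show how quickly they grow.
for omega in (0.4, 0.6, 0.8, 0.9):
    lam, noise_factor = momentum_factors(omega)
    print(f"omega={omega:.1f}  lambda={lam:.2f}  noise factor={noise_factor:.2f}")
```

Both factors increase monotonically in ω and diverge as ω approaches one, which is the mechanism behind the remark that the sufficient condition becomes harder to satisfy for large inertia.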

Proof.

By Lemma 10,

{\tilde{V}}_{t}

has a negative drift with respect to

E ∥ x_{t} - x^{★} ∥^{2}

,

E [f (p_{t}) - f^{★}]

, and

E [f (g_{t}) - f^{★}]

. Summing from

t = 1

to T, telescoping, and letting T \to \infty yields:

\sum_{t = 1}^{\infty} E {∥ x_{t} - x^{★} ∥}^{2} < \infty, \sum_{t = 1}^{\infty} E [f (p_{t}) - f^{★}] < \infty, \sum_{t = 1}^{\infty} E [f (g_{t}) - f^{★}] < \infty .

Since the summands are nonnegative, this implies

E ∥ x_{t} - x^{★} ∥^{2} \to 0

and

E [f (p_{t}) - f^{★}]

,

E [f (g_{t}) - f^{★}] \to 0

. Finally, strong convexity gives

E ∥ p_{t} - x^{★} ∥^{2} \to 0

and

E ∥ g_{t} - x^{★} ∥^{2} \to 0

by Lemma 8. □
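The telescoping step in this proof can be written out as follows. The display uses only the drift inequality of Lemma 10 and the nonnegativity of the composite Lyapunov function; finiteness of the initial value follows from the boundedness in Assumption 2.

```latex
\sum_{t=1}^{T} \Bigl( \gamma_x\, \mathbb{E}\|x_t - x^{\star}\|^2
      + \gamma_p\, \mathbb{E}[f(p_t) - f^{\star}]
      + \gamma_g\, \mathbb{E}[f(g_t) - f^{\star}] \Bigr)
  \;\le\; \tilde{V}_0 - \tilde{V}_T
  \;\le\; \tilde{V}_0 \;<\; \infty .
```

Because the bound on the right does not depend on T, each of the three series converges, and a convergent series of nonnegative terms has terms tending to zero.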

Remark 5 (deterministic PSO).

If

r_{1, i, t} \equiv r_{2, i, t} \equiv \frac{1}{2}

, then

ε_{t + 1} \equiv 0

and the above proof holds pathwise.

7. Numerical Illustration on a High-Dimensional Convex Quadratic

To supplement the theoretical analysis, we report one experiment on a high-dimensional convex objective. The purpose of this experiment is not to provide an extensive empirical comparison, but rather to illustrate that the predicted stabilization and convergence behavior is also observed in a simple large-scale convex setting.

We consider the quadratic objective:

f (x) = \frac{1}{2} x^{⊤} Q x,

where

x \in R^{d}

and

Q = diag (q_{1}, \dots, q_{d})

is diagonal and positive definite. The diagonal entries are chosen deterministically as follows:

q_{i} \in [μ, L], μ = 1, L = 5,

with eigenvalues linearly spaced between 1 and 5. Hence, f is smooth and strongly convex, and its unique minimizer is

x^{★} = 0

.

We run the canonical global-best PSO recursion with:

ω = 0.6, c_{1} = 1.7, c_{2} = 1.7,

using 1000 particles. The PSO parameters are chosen in accordance with the stability analysis of Bonyadi and Michalewicz [7]. The experiment is repeated for dimensions:

d = 49, 50, \dots, 58 .

Particles are initialized randomly in the box

{[- 100, 100]}^{d}

, and velocities are clamped component-wise to the interval

[- 1, 1]

. The random seed is fixed as 20260408 in order to make the reported trajectories reproducible. We track the best objective gap:

f (g_{t}) - f^{★},

(12)

where

g_{t}

denotes the global-best position found by the swarm up to iteration t. The run is continued until the objective gap reaches the tolerance:

f (g_{t}) - f^{★} \leq 10^{- 200},

or until the maximum budget of 100,000 iterations is reached. For the run reported here, the tolerance was reached for every tested dimension. The required iteration counts were:

\begin{matrix} d & 49 & 50 & 51 & 52 & 53 & 54 & 55 & 56 & 57 & 58 \\ t_{tol} & 6970 & 7160 & 7350 & 7530 & 7600 & 8060 & 8100 & 8430 & 8540 & 8910 \end{matrix}

where

t_{tol}

denotes the first recorded block endpoint at which the stopping tolerance is met.
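The setup described above can be sketched at a much smaller scale in standard-library Python. This is not the authors' code and will not reproduce the reported iteration counts: the dimension, swarm size, iteration budget, and tolerance are reduced, the random number generator differs, and zero initial velocities are an assumption (the text does not state how velocities are initialized). The diagonal entries are linearly spaced in [1, 5] and velocities are clamped component-wise to [−1, 1], as in the description.

```python
import random

def run_quadratic_experiment(dim=5, n_particles=30, max_iter=400, tol=1e-12,
                             omega=0.6, c1=1.7, c2=1.7, box=100.0, v_max=1.0,
                             seed=20260408):
    """Global-best PSO on f(x) = 0.5 * x^T Q x; returns the best-gap sequence."""
    rng = random.Random(seed)
    # Diagonal of Q: linearly spaced between mu = 1 and L = 5.
    q = [1.0 + 4.0 * k / (dim - 1) for k in range(dim)] if dim > 1 else [1.0]
    f = lambda y: 0.5 * sum(qk * yk * yk for qk, yk in zip(q, y))
    x = [[rng.uniform(-box, box) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]  # assumption: zero initial velocity
    pbest = [list(xi) for xi in x]
    pval = [f(xi) for xi in x]
    j = pval.index(min(pval))                      # smallest-index tie-break
    g, gval = list(pbest[j]), pval[j]
    gaps = [gval]                                  # f* = 0, so the gap equals f(g_t)
    for _ in range(max_iter):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            for k in range(dim):
                vk = (omega * v[i][k] + c1 * r1 * (pbest[i][k] - x[i][k])
                      + c2 * r2 * (g[k] - x[i][k]))
                v[i][k] = max(-v_max, min(v_max, vk))  # component-wise clamping
                x[i][k] += v[i][k]
            fx = f(x[i])
            if fx < pval[i]:
                pbest[i], pval[i] = list(x[i]), fx
        j = pval.index(min(pval))
        if pval[j] < gval:                         # best-so-far rule for the global best
            g, gval = list(pbest[j]), pval[j]
        gaps.append(gval)
        if gval <= tol:
            break
    return gaps
```

Plotting the returned list on a logarithmic axis gives a small-scale analogue of the best-gap curves discussed in this section. Note that the clamping step modifies the unprojected recursion analyzed in the theory.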

Figure 1 shows that the best objective gap decreases steadily over many orders of magnitude for all tested dimensions, with every run reaching the tolerance

10^{- 200}

within the prescribed iteration budget. In particular, the trajectories do not exhibit variance explosion or visible instability under the parameter regime considered here. The nearly linear decay on the logarithmic scale after the initial transient is consistent with the mean-square stabilization mechanism established in the main text.

Figure 2 shows the early iterations of the same experiment. Since the particles are initialized in the large box

{[- 100, 100]}^{d}

, the initial objective gaps are far from zero. Thus, the observed convergence is not an artifact of starting close to the minimizer.

This experiment is intended only as a supplementary illustration. The main contribution of this paper is theoretical: the stochastic-momentum reformulation of PSO, the associated Lyapunov drift analysis, and the resulting mean-square convergence guarantee. Different objective environments can change the observed behavior substantially. Strongly convex quadratics provide a controlled setting in which the assumptions are most transparent. Ill-conditioned convex functions may slow the decrease in the best-value gap; nonsmooth functions can break the smoothness estimates used in the drift proof; multimodal nonconvex functions can violate the mean-improvement condition through premature stagnation; and constrained or noisy simulation-based problems can introduce boundary and sampling effects that are not represented in the unprojected recursion. These cases require additional analysis beyond the theorem proved here.

8. Conclusions

This paper connects the canonical global-best PSO recursion to modern stochastic optimization theory by identifying an explicit stochastic-momentum structure. In particular, the velocity update can be decomposed into an inertial term, a deterministic drift term corresponding to a gradient step on a time-varying quadratic surrogate centered at the personal and global best positions, and a conditionally zero-mean perturbation induced by the random acceleration coefficients. An explicit heavy-ball to iterate-moving-average (IMA) transformation then yields a one-step recursion for an auxiliary sequence, which makes it possible to carry out a Lyapunov drift analysis in the spirit of standard stochastic momentum frameworks.

The main emphasis of this paper is theoretical rather than empirical. Our primary contribution is a mean-square convergence analysis of PSO based on this stochastic-momentum representation, together with explicit drift inequalities and a Lyapunov function that captures both the momentum dynamics and the memory terms. Under smoothness and strong convexity of the objective, together with standard boundedness assumptions on the trajectories (consistent with box constraints and/or velocity clamping used in practice), the resulting drift inequality implies summability of the mean-square error and hence mean-square convergence of particle positions to the unique minimizer, as well as convergence of the personal-best and global-best objective gaps (see Theorem 1). The deterministic PSO dynamics obtained by replacing the random multipliers with their expectations is recovered as a special case of the same framework.

From this perspective, the numerical experiments play only an illustrative role. They are included to confirm that the predicted behavior is consistent with the observed dynamics, rather than to provide a comprehensive benchmarking study (see Section 7 for an additional high-dimensional convex experiment).

The present analysis also makes clear both the essential ingredients of the proof and its current limitations. On the one hand, the argument relies crucially on the surrogate-gradient representation, the martingale structure of the noise term, and a Lyapunov function compatible with the memory updates. On the other hand, the current setting is restricted to smooth strongly convex objectives, boundedness is assumed rather than derived from the dynamics, and the result is formulated only for constant parameters and the global-best (all-to-all) topology.

Several natural directions remain for future work. These include deriving explicit parameter regions that certify the drift condition and comparing them quantitatively with existing PSO stability boundaries, weakening or removing the boundedness assumption, extending the analysis to time-varying parameters and neighborhood topologies, and investigating whether related Lyapunov constructions can yield convergence rates or guarantees beyond the strongly convex regime.

Author Contributions

Conceptualization, B.B.; Methodology, B.B.; Formal Analysis, G.V.; Investigation, G.V.; Writing—Original Draft, G.V.; Writing—Review & Editing, B.B.; Funding Acquisition, B.B. All authors have read and agreed to the published version of the manuscript.

Funding

The work of the first author was partially supported by a grant from the scientific program of Chinese universities “Program to support the stability of Higher Education” (section Shenzhen 2022—Commission on Science, Technology and Innovation of the Shenzhen Municipality 20220819092520001).

Data Availability Statement

The original contributions presented in this study are included in this article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors gratefully acknowledge the support of the National Key Research and Development Program of China (Grant No. 2025YFE0113400).

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. IMA Transformation: Full Algebra

This appendix provides a detailed derivation of Lemma 2. Starting from the stochastic heavy-ball form (7):

x_{t + 1} = x_{t} - η \nabla U_{t} (x_{t}) + ω (x_{t} - x_{t - 1}) + ε_{t + 1},

assume

ω \in [0, 1)

is constant and define:

λ : = \frac{ω}{1 - ω}, z_{t} : = \frac{x_{t + 1} - ω x_{t}}{1 - ω} .

Equivalently:

(1 - ω) z_{t} = x_{t + 1} - ω x_{t} ⟺ x_{t + 1} = ω x_{t} + (1 - ω) z_{t} .

(A1)

Subtract the corresponding identity at time

t - 1

:

\begin{matrix} (1 - ω) (z_{t} - z_{t - 1}) & = (x_{t + 1} - ω x_{t}) - (x_{t} - ω x_{t - 1}) \\ & = x_{t + 1} - (1 + ω) x_{t} + ω x_{t - 1} . \end{matrix}

(A2)

Now use (7) to substitute

x_{t + 1}

:

\begin{matrix} x_{t + 1} - (1 + ω) x_{t} + ω x_{t - 1} & = [x_{t} - η \nabla U_{t} (x_{t}) + ω (x_{t} - x_{t - 1}) + ε_{t + 1}] - (1 + ω) x_{t} + ω x_{t - 1} \\ & = - η \nabla U_{t} (x_{t}) + ε_{t + 1} . \end{matrix}

(A3)

Combine (A2) and (A3) and divide by

(1 - ω)

:

z_{t} = z_{t - 1} - \tilde{η} \nabla U_{t} (x_{t}) + {\tilde{ε}}_{t + 1}, \tilde{η} : = \frac{η}{1 - ω}, {\tilde{ε}}_{t + 1} : = \frac{ε_{t + 1}}{1 - ω} .

This is exactly (8). Finally, (A1) is (9). If

E [ε_{t + 1} ∣ F_{t}] = 0

, then

E [{\tilde{ε}}_{t + 1} ∣ F_{t}] = 0

.
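The algebra above can be checked numerically along a simulated path. The following sketch makes several assumptions for illustration: a scalar state, a fixed stand-in surrogate U(x) = κx²/2 in place of the time-varying U_t, bounded uniform noise, and the hypothetical function name `ima_residual`. It iterates the heavy-ball form (7), builds z_t from its definition, and records the largest deviation from (8) and from (A1).

```python
import random


def ima_residual(omega=0.6, eta=0.5, kappa=1.3, steps=50, seed=0):
    """Largest violation of the IMA recursion (8) and of identity (A1) along one path."""
    rng = random.Random(seed)

    def grad(x):
        return kappa * x                 # gradient of the stand-in surrogate kappa * x**2 / 2

    eta_tilde = eta / (1.0 - omega)
    x_prev, x_curr = 1.0, 0.7            # arbitrary x_{t-1}, x_t
    worst = 0.0
    for _ in range(steps):
        eps = rng.uniform(-0.1, 0.1)     # noise eps_{t+1}
        # Heavy-ball step (7)
        x_next = x_curr - eta * grad(x_curr) + omega * (x_curr - x_prev) + eps
        z_prev = (x_curr - omega * x_prev) / (1.0 - omega)   # z_{t-1}
        z_curr = (x_next - omega * x_curr) / (1.0 - omega)   # z_t
        # Right-hand side of (8) with the rescaled step and noise
        rhs = z_prev - eta_tilde * grad(x_curr) + eps / (1.0 - omega)
        worst = max(worst, abs(z_curr - rhs))
        # (A1): x_{t+1} = omega * x_t + (1 - omega) * z_t
        worst = max(worst, abs(x_next - (omega * x_curr + (1.0 - omega) * z_curr)))
        x_prev, x_curr = x_curr, x_next
    return worst
```

The returned residual should be at the level of floating-point rounding, since (8) and (A1) are exact identities and do not depend on the particular form of the surrogate or the noise.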

Appendix B. Lyapunov Drift Expansion: Full Calculation

This appendix expands the drift identity behind Lemma 3. Define:

L_{t} : = {∥ z_{t} - x^{★} ∥}^{2} .

From the IMA recursion (8):

z_{t} = z_{t - 1} - \tilde{η} \nabla U_{t} (x_{t}) + {\tilde{ε}}_{t + 1} .

Let

s_{t} : = \nabla U_{t} (x_{t})

. Then:

\begin{matrix} L_{t} & = {∥ z_{t - 1} - x^{★} - \tilde{η} s_{t} + {\tilde{ε}}_{t + 1} ∥}^{2} \\ & = {∥ z_{t - 1} - x^{★} ∥}^{2} - 2 \tilde{η} 〈 s_{t}, z_{t - 1} - x^{★} 〉 + {\tilde{η}}^{2} {∥ s_{t} ∥}^{2} \\ & \quad + 2 〈 {\tilde{ε}}_{t + 1}, z_{t - 1} - x^{★} - \tilde{η} s_{t} 〉 + {∥ {\tilde{ε}}_{t + 1} ∥}^{2} . \end{matrix}

Subtract

L_{t - 1} = {∥ z_{t - 1} - x^{★} ∥}^{2}

and condition on

F_{t}

:

\begin{matrix} E [L_{t} - L_{t - 1} ∣ F_{t}] & = - 2 \tilde{η} 〈 s_{t}, z_{t - 1} - x^{★} 〉 + {\tilde{η}}^{2} {∥ s_{t} ∥}^{2} \\ & \quad + 2 E [〈 {\tilde{ε}}_{t + 1}, z_{t - 1} - x^{★} - \tilde{η} s_{t} 〉 ∣ F_{t}] + E [{∥ {\tilde{ε}}_{t + 1} ∥}^{2} ∣ F_{t}] . \end{matrix}

Since

z_{t - 1}, s_{t}

are

F_{t}

-measurable and

E [{\tilde{ε}}_{t + 1} ∣ F_{t}] = 0

:

E [〈 {\tilde{ε}}_{t + 1}, z_{t - 1} - x^{★} - \tilde{η} s_{t} 〉 ∣ F_{t}] = 〈E [{\tilde{ε}}_{t + 1} ∣ F_{t}], z_{t - 1} - x^{★} - \tilde{η} s_{t}〉 = 0 .

Thus:

E [L_{t} - L_{t - 1} ∣ F_{t}] = - 2 \tilde{η} 〈 s_{t}, z_{t - 1} - x^{★} 〉 + {\tilde{η}}^{2} ∥ s_{t} ∥^{2} + E [∥ {\tilde{ε}}_{t + 1} ∥^{2} ∣ F_{t}],

which is (3).
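The drift identity can be verified exactly in a toy setting. In the sketch below, the conditional law of the noise is replaced by an assumed two-point distribution: the rescaled noise equals +e or -e with probability 1/2 each, so that it has conditional mean zero and second moment ∥e∥². The function names `dot` and `drift_identity_gap` and the test vectors are illustrative and not taken from the paper.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def drift_identity_gap(z_prev, x_star, s, e, eta_tilde):
    """|LHS - RHS| of the drift identity when the noise is +e or -e with probability 1/2."""
    diff = [zp - xs for zp, xs in zip(z_prev, x_star)]        # z_{t-1} - x*
    L_prev = dot(diff, diff)
    lhs = 0.0
    for sign in (1.0, -1.0):
        # z_t - x* for this noise outcome, from the IMA recursion (8)
        z = [di - eta_tilde * si + sign * ei for di, si, ei in zip(diff, s, e)]
        lhs += 0.5 * (dot(z, z) - L_prev)                     # exact conditional expectation
    rhs = (-2.0 * eta_tilde * dot(s, diff)
           + eta_tilde ** 2 * dot(s, s)
           + dot(e, e))                                       # E||eps_tilde||^2 = ||e||^2
    return abs(lhs - rhs)
```

For instance, `drift_identity_gap([1.0, -2.0], [0.5, 0.5], [0.3, -0.7], [0.2, 0.1], 1.25)` evaluates both sides at one arbitrary configuration. The cross term cancels between the two noise outcomes, which is the finite-sample analogue of the vanishing conditional expectation used above.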

Appendix C. Explicit Bounds for ∥∇ U_t (x_t)∥^2

This appendix justifies Lemma 5 with explicit constants. From Lemma 1:

\nabla U_{t} (x_{t}) = 2 α (x_{t} - p_{t}) + 2 β (x_{t} - g_{t}) = \frac{c_{1}}{2 η} (x_{t} - p_{t}) + \frac{c_{2}}{2 η} (x_{t} - g_{t}) .

Using

{∥ a + b ∥}^{2} \leq 2 {∥ a ∥}^{2} + 2 {∥ b ∥}^{2}

:

\begin{matrix} ∥ \nabla U_{t} (x_{t}) ∥^{2} & \leq \frac{2}{4 η^{2}} (c_{1}^{2} ∥ x_{t} - p_{t} ∥^{2} + c_{2}^{2} {∥ x_{t} - g_{t} ∥}^{2}) = \frac{1}{2 η^{2}} (c_{1}^{2} ∥ x_{t} - p_{t} ∥^{2} + c_{2}^{2} {∥ x_{t} - g_{t} ∥}^{2}) . \end{matrix}

Next,

∥ x_{t} - p_{t} ∥^{2} \leq 2 ∥ x_{t} - x^{★} ∥^{2} + 2 {∥ p_{t} - x^{★} ∥}^{2}

and similarly for

g_{t}

, hence:

\begin{matrix} {∥ \nabla U_{t} (x_{t}) ∥}^{2} & \leq \frac{1}{2 η^{2}} (c_{1}^{2} (2 {∥ x_{t} - x^{★} ∥}^{2} + 2 {∥ p_{t} - x^{★} ∥}^{2}) + c_{2}^{2} (2 {∥ x_{t} - x^{★} ∥}^{2} + 2 {∥ g_{t} - x^{★} ∥}^{2})) \\ & = \frac{c_{1}^{2} + c_{2}^{2}}{η^{2}} {∥ x_{t} - x^{★} ∥}^{2} + \frac{c_{1}^{2}}{η^{2}} {∥ p_{t} - x^{★} ∥}^{2} + \frac{c_{2}^{2}}{η^{2}} {∥ g_{t} - x^{★} ∥}^{2} \\ & \leq \frac{2 c_{1}^{2} + 2 c_{2}^{2}}{η^{2}} {∥ x_{t} - x^{★} ∥}^{2} + \frac{2 c_{1}^{2}}{η^{2}} {∥ p_{t} - x^{★} ∥}^{2} + \frac{2 c_{2}^{2}}{η^{2}} {∥ g_{t} - x^{★} ∥}^{2}, \end{matrix}

which is (5).
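A quick numerical sanity check of this bound is sketched below. It draws random positions, memory points, and parameters, evaluates the surrogate gradient from Lemma 1 with α = c_1/(4η) and β = c_2/(4η), and compares its squared norm with the right-hand side built from the constants C_x, C_p, C_g listed in Appendix E. The function names, sampling ranges, and dimension are assumptions made for illustration.

```python
import random


def sq_norm(u):
    return sum(ui * ui for ui in u)


def gradient_bound(c1, c2, eta, x, p, g, x_star):
    """Return (lhs, rhs): squared surrogate-gradient norm and the bound with C_x, C_p, C_g."""
    alpha, beta = c1 / (4.0 * eta), c2 / (4.0 * eta)
    grad = [2.0 * alpha * (xj - pj) + 2.0 * beta * (xj - gj)
            for xj, pj, gj in zip(x, p, g)]
    dx = sq_norm([xj - sj for xj, sj in zip(x, x_star)])
    dp = sq_norm([pj - sj for pj, sj in zip(p, x_star)])
    dg = sq_norm([gj - sj for gj, sj in zip(g, x_star)])
    c_x = (2.0 * c1 ** 2 + 2.0 * c2 ** 2) / eta ** 2
    c_p = 2.0 * c1 ** 2 / eta ** 2
    c_g = 2.0 * c2 ** 2 / eta ** 2
    return sq_norm(grad), c_x * dx + c_p * dp + c_g * dg


def check_random_draws(trials=1000, d=4, seed=0):
    """True if the bound holds on every random draw (a sanity check, not a proof)."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, p, g, x_star = ([rng.uniform(-5.0, 5.0) for _ in range(d)] for _ in range(4))
        c1, c2, eta = (rng.uniform(0.1, 3.0) for _ in range(3))
        lhs, rhs = gradient_bound(c1, c2, eta, x, p, g, x_star)
        if lhs > rhs * (1.0 + 1e-12):
            return False
    return True
```

Such a check cannot replace the derivation, but it is a cheap way to catch a misplaced factor when the constants are transcribed into code.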

Appendix D. Explicit Bounds for the Noise Variance

This appendix gives details for Lemma 6. Recall:

ε_{t + 1} = c_{1} ξ_{1, t} (p_{t} - x_{t}) + c_{2} ξ_{2, t} (g_{t} - x_{t}), ξ_{k, t} : = r_{k, t} - \frac{1}{2} .

Conditioned on

F_{t}

,

ξ_{1, t}, ξ_{2, t}

are independent, mean zero, and

E [ξ_{k, t}^{2} ∣ F_{t}] = Var (r_{k, t}) = 1 / 12

. Hence:

\begin{matrix} E [{∥ ε_{t + 1} ∥}^{2} ∣ F_{t}] & = E [{∥ c_{1} ξ_{1, t} (p_{t} - x_{t}) + c_{2} ξ_{2, t} (g_{t} - x_{t}) ∥}^{2} ∣ F_{t}] \\ & = c_{1}^{2} E [ξ_{1, t}^{2} ∣ F_{t}] {∥ p_{t} - x_{t} ∥}^{2} + c_{2}^{2} E [ξ_{2, t}^{2} ∣ F_{t}] {∥ g_{t} - x_{t} ∥}^{2} \\ & = \frac{1}{12} (c_{1}^{2} {∥ p_{t} - x_{t} ∥}^{2} + c_{2}^{2} {∥ g_{t} - x_{t} ∥}^{2}) . \end{matrix}

Using

∥ p_{t} - x_{t} ∥^{2} \leq 2 ∥ p_{t} - x^{★} ∥^{2} + 2 {∥ x_{t} - x^{★} ∥}^{2}

and similarly for

g_{t}

:

\begin{matrix} E [{∥ ε_{t + 1} ∥}^{2} ∣ F_{t}] & \leq \frac{1}{12} (c_{1}^{2} (2 {∥ p_{t} - x^{★} ∥}^{2} + 2 {∥ x_{t} - x^{★} ∥}^{2}) + c_{2}^{2} (2 {∥ g_{t} - x^{★} ∥}^{2} + 2 {∥ x_{t} - x^{★} ∥}^{2})) \\ & = \frac{c_{1}^{2}}{6} ({∥ x_{t} - x^{★} ∥}^{2} + {∥ p_{t} - x^{★} ∥}^{2}) + \frac{c_{2}^{2}}{6} ({∥ x_{t} - x^{★} ∥}^{2} + {∥ g_{t} - x^{★} ∥}^{2}) . \end{matrix}

Finally,

{\tilde{ε}}_{t + 1} = ε_{t + 1} / (1 - ω)

implies:

E [∥ {\tilde{ε}}_{t + 1} ∥^{2} ∣ F_{t}] = \frac{1}{{(1 - ω)}^{2}} E [∥ ε_{t + 1} ∥^{2} ∣ F_{t}],

yielding (10) (with the same constants, scaled by

{(1 - ω)}^{- 2}

).
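The exact second-moment formula above can be reproduced without sampling. Since ∥ε_{t+1}∥² is a polynomial of degree two in the independent uniform variables r_{1,t} and r_{2,t}, a two-point Gauss-Legendre rule on [0, 1] in each variable integrates it exactly; its nodes sit at 1/2 ± 1/(2√3), so the centered coefficients take the values ±1/(2√3) with weight 1/2 each. The sketch below uses this fact; the function names and test vectors are illustrative assumptions.

```python
import math


def noise_second_moment(c1, c2, x, p, g):
    """E||eps||^2 for eps = c1*xi1*(p - x) + c2*xi2*(g - x), xi_k = r_k - 1/2, r_k ~ U[0, 1].

    Computed with a 2-point Gauss-Legendre rule per variable, exact for this quadratic.
    """
    h = 1.0 / (2.0 * math.sqrt(3.0))      # centered quadrature nodes are -h and +h
    total = 0.0
    for xi1 in (-h, h):
        for xi2 in (-h, h):
            eps = [c1 * xi1 * (pj - xj) + c2 * xi2 * (gj - xj)
                   for xj, pj, gj in zip(x, p, g)]
            total += 0.25 * sum(e * e for e in eps)   # product weight (1/2) * (1/2)
    return total


def noise_closed_form(c1, c2, x, p, g):
    """(1/12) * (c1^2 * ||p - x||^2 + c2^2 * ||g - x||^2)."""
    dp = sum((pj - xj) ** 2 for xj, pj in zip(x, p))
    dg = sum((gj - xj) ** 2 for xj, gj in zip(x, g))
    return (c1 ** 2 * dp + c2 ** 2 * dg) / 12.0
```

The two functions should agree up to rounding for any inputs. The cross term proportional to ξ_{1,t} ξ_{2,t} cancels over the four node combinations, mirroring the independence and zero-mean argument used in the derivation.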

Appendix E. Detailed Proof of the Distance-Level Drift Inequality

This appendix expands the “collect coefficients” step in Lemma 7. Start from (11):

\begin{matrix} E [L_{t} - L_{t - 1} ∣ F_{t}] & \leq - 2 \tilde{η} ((λ + 1) (U_{t} (x_{t}) - U_{t} (x^{★})) - λ (U_{t} (x_{t - 1}) - U_{t} (x^{★}))) \\ & \quad + {\tilde{η}}^{2} {∥ s_{t} ∥}^{2} + E [{∥ {\tilde{ε}}_{t + 1} ∥}^{2} ∣ F_{t}] . \end{matrix}

(A4)

Using Lemma 4 at

x = x_{t}

,

\begin{matrix} U_{t} (x_{t}) - U_{t} (x^{★}) & = (α + β) ∥ x_{t} - x^{★} ∥^{2} + 2 α 〈 x_{t} - x^{★}, x^{★} - p_{t} 〉 + 2 β 〈 x_{t} - x^{★}, x^{★} - g_{t} 〉 . \end{matrix}

Apply Young’s inequality: for any

δ_{p}, δ_{g} > 0

:

\begin{matrix} 2 α | 〈 x_{t} - x^{★}, x^{★} - p_{t} 〉 | & \leq α δ_{p} ∥ x_{t} - x^{★} ∥^{2} + α δ_{p}^{- 1} {∥ p_{t} - x^{★} ∥}^{2}, \\ 2 β | 〈 x_{t} - x^{★}, x^{★} - g_{t} 〉 | & \leq β δ_{g} ∥ x_{t} - x^{★} ∥^{2} + β δ_{g}^{- 1} {∥ g_{t} - x^{★} ∥}^{2} . \end{matrix}

Thus:

\begin{matrix} U_{t} (x_{t}) - U_{t} (x^{★}) & \geq ((α + β) - α δ_{p} - β δ_{g}) {∥ x_{t} - x^{★} ∥}^{2} \\ & \quad - α δ_{p}^{- 1} {∥ p_{t} - x^{★} ∥}^{2} - β δ_{g}^{- 1} {∥ g_{t} - x^{★} ∥}^{2} . \end{matrix}

(A5)

Similarly, apply Lemma 4 and Young’s inequality to

x_{t - 1}

to get an upper bound of the form:

\begin{matrix} U_{t} (x_{t - 1}) - U_{t} (x^{★}) & \leq ((α + β) + α δ_{p}^{'} + β δ_{g}^{'}) {∥ x_{t - 1} - x^{★} ∥}^{2} \\ & \quad + α {(δ_{p}^{'})}^{- 1} {∥ p_{t} - x^{★} ∥}^{2} + β {(δ_{g}^{'})}^{- 1} {∥ g_{t} - x^{★} ∥}^{2} . \end{matrix}

(A6)

for arbitrary

δ_{p}^{'}, δ_{g}^{'} > 0

.

Plug (A5) and (A6) into (A4). Then, bound the remaining terms by Lemmas 5 and 6:

\begin{matrix} {\tilde{η}}^{2} {∥ s_{t} ∥}^{2} \leq {\tilde{η}}^{2} (C_{x} ∥ x_{t} - x^{★} ∥^{2} + C_{p} ∥ p_{t} - x^{★} ∥^{2} + C_{g} {∥ g_{t} - x^{★} ∥}^{2}), \\ E [∥ {\tilde{ε}}_{t + 1} ∥^{2} ∣ F_{t}] \leq V_{x} ∥ x_{t} - x^{★} ∥^{2} + V_{p} ∥ p_{t} - x^{★} ∥^{2} + V_{g} {∥ g_{t} - x^{★} ∥}^{2}, \end{matrix}

where (from Lemmas 5 and 6):

\begin{matrix} C_{x} = \frac{2 c_{1}^{2} + 2 c_{2}^{2}}{η^{2}}, C_{p} = \frac{2 c_{1}^{2}}{η^{2}}, C_{g} = \frac{2 c_{2}^{2}}{η^{2}}, V_{x} = \frac{c_{1}^{2} + c_{2}^{2}}{6 {(1 - ω)}^{2}}, \\ V_{p} = \frac{c_{1}^{2}}{6 {(1 - ω)}^{2}}, V_{g} = \frac{c_{2}^{2}}{6 {(1 - ω)}^{2}} . \end{matrix}

Collecting coefficients gives a bound of the form:

E [L_{t} - L_{t - 1} ∣ F_{t}] \leq - a_{0} ∥ x_{t} - x^{★} ∥^{2} + b_{0} ∥ x_{t - 1} - x^{★} ∥^{2} + e_{0} ∥ p_{t} - x^{★} ∥^{2} + f_{0} {∥ g_{t} - x^{★} ∥}^{2},

with

\begin{matrix} a_{0} & : = 2 \tilde{η} (λ + 1) ((α + β) - α δ_{p} - β δ_{g}) - {\tilde{η}}^{2} C_{x} - V_{x}, \\ b_{0} & : = 2 \tilde{η} λ ((α + β) + α δ_{p}^{'} + β δ_{g}^{'}), \\ e_{0} & : = 2 \tilde{η} (λ + 1) α δ_{p}^{- 1} + 2 \tilde{η} λ α {(δ_{p}^{'})}^{- 1} + {\tilde{η}}^{2} C_{p} + V_{p}, \\ f_{0} & : = 2 \tilde{η} (λ + 1) β δ_{g}^{- 1} + 2 \tilde{η} λ β {(δ_{g}^{'})}^{- 1} + {\tilde{η}}^{2} C_{g} + V_{g} . \end{matrix}

Take full expectation to obtain the unconditional version. Choosing, for instance,

δ_{p} = δ_{g} = \frac{1}{4}, δ_{p}^{'} = δ_{g}^{'} = \frac{1}{4},

makes all coefficients explicit and finite. On the role of

η

, under the surrogate scaling

α = c_{1} / (4 η)

and

β = c_{2} / (4 η)

with

\tilde{η} = η / (1 - ω)

, the leading terms in

a_{0}

involve products of the form

\tilde{η} (α + β)

and

{\tilde{η}}^{2} C_{x}

, for which

η

cancels. Therefore, positivity of

a_{0}

is governed primarily by the parameter combination

(ω, c_{1}, c_{2})

(and the auxiliary constants

δ_{\cdot}, λ, \dots

) under this reduction;

η

does not act as an independent “small step-size” knob in the usual sense.
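The coefficient formulas above are straightforward to evaluate, which also gives a direct way to observe the cancellation of η. The sketch below computes a_0, b_0, e_0, f_0 from (ω, c_1, c_2, η) with the auxiliary constants defaulting to 1/4. The function name `drift_coefficients` and its argument names are hypothetical, and the code simply transcribes the definitions of this appendix under the scaling α = c_1/(4η), β = c_2/(4η).

```python
def drift_coefficients(omega, c1, c2, eta,
                       delta_p=0.25, delta_g=0.25, delta_p2=0.25, delta_g2=0.25):
    """Return (a0, b0, e0, f0) of the distance-level drift bound.

    delta_p2 and delta_g2 stand for the primed constants used for x_{t-1}.
    """
    lam = omega / (1.0 - omega)
    eta_tilde = eta / (1.0 - omega)
    alpha, beta = c1 / (4.0 * eta), c2 / (4.0 * eta)

    # Constants from Lemma 5 (gradient bound) and Lemma 6 (noise variance)
    c_x = (2.0 * c1 ** 2 + 2.0 * c2 ** 2) / eta ** 2
    c_p = 2.0 * c1 ** 2 / eta ** 2
    c_g = 2.0 * c2 ** 2 / eta ** 2
    v_x = (c1 ** 2 + c2 ** 2) / (6.0 * (1.0 - omega) ** 2)
    v_p = c1 ** 2 / (6.0 * (1.0 - omega) ** 2)
    v_g = c2 ** 2 / (6.0 * (1.0 - omega) ** 2)

    a0 = (2.0 * eta_tilde * (lam + 1.0) * ((alpha + beta) - alpha * delta_p - beta * delta_g)
          - eta_tilde ** 2 * c_x - v_x)
    b0 = 2.0 * eta_tilde * lam * ((alpha + beta) + alpha * delta_p2 + beta * delta_g2)
    e0 = (2.0 * eta_tilde * (lam + 1.0) * alpha / delta_p
          + 2.0 * eta_tilde * lam * alpha / delta_p2
          + eta_tilde ** 2 * c_p + v_p)
    f0 = (2.0 * eta_tilde * (lam + 1.0) * beta / delta_g
          + 2.0 * eta_tilde * lam * beta / delta_g2
          + eta_tilde ** 2 * c_g + v_g)
    return a0, b0, e0, f0
```

Evaluating `drift_coefficients(0.6, 0.1, 0.1, eta)` for two very different values of `eta` returns the same four numbers up to rounding, in line with the cancellation described above. The sign of a_0 depends on (ω, c_1, c_2) and on the chosen δ constants, so it is worth evaluating for the parameter values of interest rather than assuming it.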

References

1. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks; IEEE: New York, NY, USA, 1995; Volume 4, pp. 1942–1948.
2. Garrigos, G.; Gower, R.M. Handbook of Convergence Theorems for (Stochastic) Gradient Methods. arXiv 2024, arXiv:2301.11235.
3. Jiang, M.; Luo, Y.; Yang, S. Stochastic convergence analysis and parameter selection of the standard PSO algorithm. Inf. Process. Lett. 2007, 102, 8–16.
4. Chen, X.; Li, Y. A modified PSO structure resulting in high exploration ability with convergence guaranteed. IEEE Trans. Syst. Man Cybern. B Cybern. 2007, 37, 1271–1289.
5. Kadirkamanathan, V.; Selvarajah, K.; Fleming, P. Stability analysis of the particle dynamics in particle swarm optimizer. IEEE Trans. Evol. Comput. 2006, 10, 245–255.
6. Poli, R. Dynamics and stability of the sampling distribution of PSO via moment analysis. J. Artif. Evol. Appl. 2008, 2008, 761459.
7. Bonyadi, M.R.; Michalewicz, Z. Stability Analysis of the Particle Swarm Optimization Without Stagnation Assumption. IEEE Trans. Evol. Comput. 2016, 20, 814–819.
8. Xu, G.; Yu, X. On convergence analysis of particle swarm optimization algorithm. J. Comput. Appl. Math. 2018, 333, 65–73.
9. Hu, D.; Qiu, X.; Liu, Y.; Zhou, X. Probabilistic convergence analysis of the stochastic PSO model without the stagnation assumption. Inf. Sci. 2021, 547, 996–1007.
10. Dong, W.; Zhang, R. Stochastic stability analysis of composite dynamic system for particle swarm optimization. Inf. Sci. 2022, 592, 227–243.
11. Bruned, V.; Mas, A.; Wlodarczyk, S. Weak convergence of particle swarm optimization. arXiv 2018, arXiv:1811.04924.
12. Tarekegn Nigatu, D.; Gemechu Dinka, T.; Luleseged Tilahun, S. Convergence analysis of particle swarm optimization algorithm by a velocity control method. Front. Appl. Math. Stat. 2024, 10, 1304268.
13. Huang, X.; Qiu, H.; Riedl, M. On the Global Convergence of Particle Swarm Optimization Methods. Appl. Math. Optim. 2023, 88, 30.
14. Cui, X. Symmetry-Based Convergence Theory for PSO: From Heuristic to Provably Convergent Optimization. Symmetry 2026, 18, 28.

Figure 1. Best objective gap

f (g_{t}) - f^{★}

versus iteration number for PSO on strongly convex diagonal quadratic objectives with dimensions

d = 49, \dots, 58

. The parameters are

ω = 0.6

,

c_{1} = 1.7

, and

c_{2} = 1.7

, with 1000 particles initialized in

{[- 100, 100]}^{d}

and componentwise velocity clamp

[- 1, 1]

.

Figure 2. Initial phase of the same experiment as in Figure 1. The swarm is initialized far from the minimizer, and the plot shows that the decrease in the objective gap is not caused by immediate initialization near the solution.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
