A Stochastic Convergence Result for the Nelder–Mead Simplex Method

Óbuda University, 1034 Budapest, Hungary
Mathematics 2023, 11(9), 1998; https://doi.org/10.3390/math11091998
Submission received: 2 March 2023 / Revised: 5 April 2023 / Accepted: 20 April 2023 / Published: 23 April 2023

Abstract

We prove that the Nelder–Mead simplex method converges in the sense that the simplex vertices converge to a common limit point with a probability of one. The result may explain the practical usefulness of the Nelder–Mead method.

1. Introduction

The Nelder–Mead (NM) simplex method [1] is a direct method for the solution of the minimization problem
$$ f(x) \to \min \qquad (f : \mathbb{R}^n \to \mathbb{R}), $$
where f is continuous. It is an “incredibly popular method” (see [2]) in derivative-free optimization [3,4,5,6,7] and in various application areas (see, e.g., [8]). The Nelder–Mead simplex method became especially popular in computational chemistry, as shown by the book [3] and the references therein. The original Nelder–Mead paper [1] has 36,809 references in Google Scholar as of 26 February 2023, showing a great variety of applications and a great number of mainly heuristic variants, occasionally combined with other techniques. The Nelder–Mead algorithm can be found in many software libraries and systems as well, such as IMSL, NAG, Matlab, Scilab, Python SciPy and R [9]. The popularity of the method is due to its observed good performance in practice. In spite of this, only a few theoretical results are known about its convergence (see, e.g., [2,10,11,12]).
The counterexample of McKinnon [13] is a strictly convex function $f : \mathbb{R}^2 \to \mathbb{R}$ with continuous derivatives on which the Nelder–Mead algorithm converges to a nonstationary point of f. For strictly convex functions $f : \mathbb{R}^2 \to \mathbb{R}$ with bounded level sets, Lagarias, Reeds, Wright and Wright [10] proved that the function values at all simplex vertices converge to the same value and that the diameters of the simplices converge to zero. Kelley [4,14] gave a sufficient-decrease condition for the average of the objective function values (evaluated at the simplex vertices) and proved that if this condition is satisfied during the process, then any accumulation point of the simplices is a critical point of f. Han and Neumann [15] investigated the convergence to 0 and the effect of dimensionality for the function $f(x) = x^T x$ ($x \in \mathbb{R}^n$). For the restricted Nelder–Mead algorithm, Lagarias, Poonen and Wright [11] later proved that if $f : \mathbb{R}^2 \to \mathbb{R}$ is a twice-continuously differentiable function with bounded level sets and an everywhere positive definite Hessian, then the algorithm converges to the unique minimizer of f.
If the objective function f does not satisfy the conditions of [10] or [11], then a number of counterexamples show that the Nelder–Mead method may have different types of convergence behavior. It is possible that the function values at the simplex vertices converge to a common value, while the function f has no finite minimum and the simplex sequence is unbounded (Examples 1 and 2 of [16]). It is also possible that the simplex vertices converge to the same point, but the limit point is not a stationary point of f ([13], Examples 3 and 4 of [16]). Other examples indicate that the simplex sequence may converge to a limit simplex of positive diameter, resulting in different limit values of f at the vertices of the limit simplex (Examples 4 and 5 of [16]).
Here, we study the convergence of the simplex vertices to a common limit point. In papers [16,17], we proved this type of convergence under sufficient conditions for $1 \le n \le 3$ and $1 \le n \le 8$, respectively. However, the key assumption of papers [16,17] was related to an algorithmically undecidable problem and required ways to circumvent it.
In this paper, we prove two new theorems for the convergence of the Nelder–Mead method in low-dimensional spaces ($n = 2, 3, 4, 5, 6$). Theorem 1 replaces the key assumption of [16,17] with an algorithmically computable one. It is the basis of Theorem 2 of Section 6, which proves that the Nelder–Mead method converges with a probability of one. This result may explain the good behavior of the Nelder–Mead method experienced in practice. The case of two-dimensional strictly convex functions is considered in Remarks 2 and 4, respectively.

2. The Nelder–Mead Simplex Method

There are several forms and variants of the Nelder–Mead method. We use the version of Lagarias, Reeds, Wright and Wright [10]. The vertices of the initial simplex $S^0$ are denoted by $x_1^0, x_2^0, \ldots, x_{n+1}^0 \in \mathbb{R}^n$. It is assumed that the vertices $x_1^0, x_2^0, \ldots, x_{n+1}^0$ are ordered such that
$$ f(x_1^0) \le f(x_2^0) \le \cdots \le f(x_{n+1}^0), $$
and this condition is maintained during the iterations of the Nelder–Mead algorithm. The simplex of iteration k is denoted by $S^k = [x_1^k, x_2^k, \ldots, x_{n+1}^k] \in \mathbb{R}^{n \times (n+1)}$. Define $x_c^k = \frac{1}{n} \sum_{i=1}^n x_i^k$ and $x^k(\lambda) = (1+\lambda)\, x_c^k - \lambda\, x_{n+1}^k$. The reflection, expansion and contraction points of simplex $S^k$ are defined by
$$ x_r^k = x^k(1), \qquad x_e^k = x^k(2), \qquad x_{oc}^k = x^k(\tfrac{1}{2}), \qquad x_{ic}^k = x^k(-\tfrac{1}{2}), $$
respectively. The function values at the vertices $x_j^k$ and at the points $x_r^k$, $x_e^k$, $x_{oc}^k$ and $x_{ic}^k$ are denoted by $f(x_j^k) = f_j^k$ ($j = 1, \ldots, n+1$), $f_r^k = f(x_r^k)$, $f_e^k = f(x_e^k)$, $f_{oc}^k = f(x_{oc}^k)$ and $f_{ic}^k = f(x_{ic}^k)$, respectively.
The Nelder–Mead simplex method (Algorithm 1) is a nonstationary iterative method where the kernel of the iteration loop consists of ordering the simplex vertices and exactly one of four possible operations. The logical conditions for these operations (reflection, expansion, contraction and shrinking) are mutually exclusive.
For the order operation, there are two rules that apply to reindexing after each iteration. If a nonshrink step occurs, then $x_{n+1}^k$ is replaced by a new point $v \in \{ x_r^k, x_e^k, x_{oc}^k, x_{ic}^k \}$. The following cases are possible:
$$ f(v) < f(x_1^k), \qquad f(x_1^k) \le f(v) \le f(x_n^k), \qquad f(v) < f(x_{n+1}^k). $$
If
$$ j = \begin{cases} 1, & \text{if } f(v) < f(x_1^k), \\ \max \left\{ 2 \le \ell \le n+1 : f(x_{\ell-1}^k) \le f(v) \right\}, & \text{otherwise}, \end{cases} \tag{2} $$
then the new simplex vertices are
$$ x_i^{k+1} = x_i^k \ (1 \le i \le j-1), \qquad x_j^{k+1} = v, \qquad x_i^{k+1} = x_{i-1}^k \ (i = j+1, \ldots, n+1). $$
This rule inserts v into the ordering with the highest possible index. If shrinking occurs, then
$$ z_1 = x_1^k, \qquad z_i = \left( x_i^k + x_1^k \right)/2 \quad (i = 2, \ldots, n+1), $$
plus a reordering takes place. If $f(x_1^k) \le f(z_i)$ ($i = 2, \ldots, n+1$), then by convention $x_1^{k+1} = x_1^k$. Hence, it is guaranteed that
$$ f(x_1^k) \le f(x_2^k) \le \cdots \le f(x_{n+1}^k) \qquad (k \ge 0). \tag{3} $$
The insertion rule (2) implies that if the function f is bounded below on $\mathbb{R}^n$ and only a finite number of shrink iterations occur, then each sequence $\{ f_i^k \}_k$ converges to some limit $f_i^\ast$ for $i = 1, \ldots, n+1$ (see Lemma 3.3 of [10]).
Algorithm 1: Nelder–Mead algorithm
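The iteration kernel described above can be sketched in a few lines of Python. The following minimal version uses the coefficients of [10] (reflection 1, expansion 2, contractions ±1/2, shrink 1/2); the tie-breaking insertion rule (2) is simplified here to a plain stable re-sort, so this is an illustrative sketch rather than the exact algorithm of the paper.

```python
# One Nelder-Mead iteration with the standard coefficients of Lagarias
# et al.: x(lam) = (1 + lam) * x_c - lam * x_{n+1}.

def nm_step(f, simplex):
    """One iteration; `simplex` is a list of n+1 points (lists of floats)."""
    n = len(simplex) - 1
    simplex = sorted(simplex, key=f)      # order: f(x_1) <= ... <= f(x_{n+1})
    best, worst = simplex[0], simplex[-1]
    centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

    def x(lam):                           # x(lam) = (1+lam) x_c - lam x_{n+1}
        return [(1 + lam) * c - lam * w for c, w in zip(centroid, worst)]

    xr = x(1.0)
    fr, f1, fn, fworst = f(xr), f(best), f(simplex[-2]), f(worst)

    if f1 <= fr < fn:                     # reflection accepted
        return sorted(simplex[:-1] + [xr], key=f)
    if fr < f1:                           # expansion attempt
        xe = x(2.0)
        v = xe if f(xe) < fr else xr
        return sorted(simplex[:-1] + [v], key=f)
    if fn <= fr < fworst:                 # outside contraction
        xoc = x(0.5)
        if f(xoc) <= fr:
            return sorted(simplex[:-1] + [xoc], key=f)
    else:                                 # fr >= fworst: inside contraction
        xic = x(-0.5)
        if f(xic) < fworst:
            return sorted(simplex[:-1] + [xic], key=f)
    # shrink toward the best vertex: z_i = (x_i + x_1)/2
    shrunk = [best] + [[(b + pi) / 2 for b, pi in zip(best, p)]
                       for p in simplex[1:]]
    return sorted(shrunk, key=f)
```

Iterating `nm_step` on a convex quadratic such as $f(x, y) = x^2 + y^2$ contracts the simplex toward the minimizer, which is the convergence behavior studied in the rest of the paper.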

3. A Matrix Form of the Nelder–Mead Method

Assume that the simplex $S^k = [x_1^k, x_2^k, \ldots, x_{n+1}^k]$ is such that condition (3) holds. If the incoming vertex v is of the form
$$ v = \frac{1+\alpha}{n} \sum_{i=1}^n x_i^k - \alpha\, x_{n+1}^k = x^k(\alpha) $$
for some $\alpha \in \{ 1, 2, \tfrac{1}{2}, -\tfrac{1}{2} \}$, we can define the transformation matrix
$$ T_\alpha = \begin{bmatrix} I_n & \frac{1+\alpha}{n}\, e \\ 0 & -\alpha \end{bmatrix} \qquad \left( e = [1, 1, \ldots, 1]^T \right). $$
Since $S^k T_\alpha = [x_1^k, \ldots, x_n^k, x^k(\alpha)]$, we have to reorder the matrix columns according to the insertion rule (2). Define the permutation matrix
$$ P_j = [e_1, \ldots, e_{j-1}, e_{n+1}, e_j, \ldots, e_n] \in \mathbb{R}^{(n+1) \times (n+1)} \qquad (j = 1, \ldots, n+1). $$
Then, $S^k T_\alpha P_j$ is the new simplex $S^{k+1}$. The following cases are possible:

Operation | New simplex
Reflection ($v = x_r^k$) | $S^{k+1} = S^k T_1 P_j$ ($j = 2, \ldots, n$)
Expansion ($v = x_e^k$) | $S^{k+1} = S^k T_2 P_1$
Expansion ($v = x_r^k$) | $S^{k+1} = S^k T_1 P_1$
Outside contraction ($v = x_{oc}^k$) | $S^{k+1} = S^k T_{1/2} P_j$ ($j = 1, \ldots, n+1$)
Inside contraction ($v = x_{ic}^k$) | $S^{k+1} = S^k T_{-1/2} P_j$ ($j = 1, \ldots, n+1$)
For shrinking, the new simplex is
$$ S^{k+1} = S^k T_{shr} P, $$
where
$$ T_{shr} = \frac{1}{2} I_{n+1} + \frac{1}{2} e_1 e^T, $$
the permutation matrix $P \in \mathcal{P}_{n+1}$ is defined by the ordering condition (3), and $\mathcal{P}_{n+1}$ is the set of all permutation matrices of order $n+1$.
Hence, for $k \ge 1$,
$$ S^k = S^{k-1} T_k P_k = S^0 B_k, $$
where
$$ B_k = \prod_{i=1}^k T_i P_i \qquad (T_i P_i \in \mathcal{T}) $$
and
$$ \mathcal{T} = \left\{ T_\alpha P_j : \alpha \in \{ \tfrac{1}{2}, -\tfrac{1}{2} \},\ j = 1, \ldots, n+1 \right\} \cup \left\{ T_{shr} P : P \in \mathcal{P}_{n+1} \right\} \cup \left\{ T_1 P_j : j = 1, \ldots, n \right\} \cup \left\{ T_2 P_1 \right\}. $$
Note that $\mathcal{T}$ contains $3n + 3 + (n+1)!$ matrices.
Observe that the transformation matrices $T_\alpha$, $T_{shr}$, $T_\alpha P$ and $T_{shr} P$ ($P \in \mathcal{P}_{n+1}$) are nonsingular and have the property that their column sums are equal to one (see, e.g., [16,17]). The latter property implies that $\| T P \| \ge 1$ ($T P \in \mathcal{T}$) holds in any induced matrix norm.
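These structural properties are easy to verify numerically. The sketch below (pure Python, an illustration rather than code from the paper) builds $T_\alpha$, $T_{shr}$ and $P_j$ for $n = 2$ and checks the unit column sums, which in particular force $\|TP\|_1 \ge 1$ for the induced 1-norm.

```python
# Transformation matrices of Section 3 as (n+1)x(n+1) nested lists.

def t_alpha(n, alpha):
    """T_alpha: identity in the first n columns; last column has
    (1+alpha)/n in the first n rows and -alpha in the last row."""
    T = [[1.0 if i == j else 0.0 for j in range(n + 1)] for i in range(n + 1)]
    for i in range(n):
        T[i][n] = (1 + alpha) / n
    T[n][n] = -alpha
    return T

def t_shrink(n):
    """T_shr = (1/2) I_{n+1} + (1/2) e_1 e^T."""
    return [[0.5 * (i == j) + 0.5 * (i == 0) for j in range(n + 1)]
            for i in range(n + 1)]

def perm(n, j):
    """P_j = [e_1, ..., e_{j-1}, e_{n+1}, e_j, ..., e_n] (1-based j)."""
    cols = list(range(j - 1)) + [n] + list(range(j - 1, n))
    return [[1.0 if cols[c] == r else 0.0 for c in range(n + 1)]
            for r in range(n + 1)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def col_sums(A):
    return [sum(row[j] for row in A) for j in range(len(A[0]))]

n = 2
M = matmul(t_alpha(n, 1.0), perm(n, 2))   # a reflection step T_1 P_2
print(col_sums(M))                        # → [1.0, 1.0, 1.0]
```

Since every column of $T_\alpha P_j$ sums to one, the all-ones vector is a left eigenvector with eigenvalue 1, so no induced norm of these products can drop below one.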
We seek conditions that guarantee that
$$ \lim_{k \to \infty} x_i^k = \hat{x} \qquad (i = 1, 2, \ldots, n+1) $$
holds for some vector $\hat{x}$. If so, then $\lim_{k \to \infty} S^k = \hat{x} e^T$ and, for $k \to \infty$, both $f_i^k \to f(\hat{x})$ ($i = 1, 2, \ldots, n+1$) and $\mathrm{diam}(S^k) \to 0$ follow. In fact, we prove the convergence of the right infinite matrix product
$$ \prod_{i=1}^{\infty} T_i P_i \qquad (T_i P_i \in \mathcal{T}) $$
to a rank-one matrix of the form $B_\infty = w e^T$, from which $S^k = S^0 B_k \to \hat{x} e^T$ ($\hat{x} = S^0 w$) and the speed estimate
$$ \mathrm{diam}(S^k) \le 2\, \| S^0 \|\, \| B_k - B_\infty \| $$
also follow (see also [16,17]).
Counterexamples of [16,17,18] show that even if the simplex sequence $S^k$ converges to some limit $S^\infty$, it may happen that $\mathrm{diam}(S^\infty) > 0$. If $\mathrm{diam}(S^\infty) = 0$ holds, that is, $S^\infty = \hat{x} e^T$ for some vector $\hat{x}$, it may happen that $\hat{x}$ is not a stationary or minimum point (see McKinnon [13] and also [16,17,18]).

4. Properties of the Transformation Matrices

The spectra of the transformation matrices $T_\alpha P_j$ and $T_{shr} P$ are fully characterized in Section 3 of [16]. Furthermore, these matrices have a common similarity form (9). Define the matrix
$$ F = \begin{bmatrix} 1 & -e^T \\ 0 & I_n \end{bmatrix}. $$
Lemma 1 
([16,17]). For all $T_i P_i \in \mathcal{T}$, the matrix $F^{-1} T_i P_i F$ has the form
$$ F^{-1} T_i P_i F = \begin{bmatrix} 1 & 0 \\ b_i & C_i \end{bmatrix}, \tag{9} $$
where $b_i \in \mathbb{R}^n$ and $C_i \in \mathbb{R}^{n \times n}$ depend on $T_i P_i$.
For a more general result, see Hartfiel [19]. Note that a constant $\gamma > 0$ exists such that $\| b_i \| \le \gamma$ holds for all $T_i P_i \in \mathcal{T}$. For later use, we introduce the following numbering of the elements $T_s P_s \in \mathcal{T}$ and their corresponding matrices $C_s$:
$T_s P_s \in \mathcal{T}$ | $C_s$
$T_1 P_{j+1}$ | $C_j$ ($j = 1, \ldots, n-1$)
$T_2 P_1$ | $C_n$
$T_1 P_1$ | $C_{n+1}$
$T_{1/2} P_j$ | $C_{n+1+j}$ ($j = 1, \ldots, n+1$)
$T_{-1/2} P_j$ | $C_{2n+2+j}$ ($j = 1, \ldots, n+1$)
$T_{shr} P$ ($P \in \mathcal{P}_{n+1}$) | $C_{3n+3+j}$ ($j = 1, \ldots, (n+1)!$)
where the numbering of the permutations $P \in \mathcal{P}_{n+1}$ follows the perms function of Matlab (in actual computations).

5. An Improved Convergence Result

Here, we prove a new convergence theorem whose key condition is numerically checkable, at least in low dimensions, unlike in [16,17], where the key assumption was algorithmically undecidable and required ways to circumvent this problem. The new result is the basis of the stochastic convergence result of Section 6. The latter theorem may explain, in a sense, the experienced good behavior of the Nelder–Mead method in practice.
Formula (9) implies that
$$ B_k = \prod_{j=1}^k T_{i_j} P_{i_j} = F L_k F^{-1} \qquad (T_{i_j} P_{i_j} \in \mathcal{T}), $$
where
$$ L_k = \prod_{j=1}^k \begin{bmatrix} 1 & 0 \\ b_{i_j} & C_{i_j} \end{bmatrix}, \tag{11} $$
and $B_k$ is convergent if and only if $L_k$ converges to some matrix
$$ \begin{bmatrix} 1 & 0 \\ \tilde{x} & \tilde{Y} \end{bmatrix}. $$
We use the following simple result (see, e.g., [16,17]). For $i \ge 1$, let
$$ A_i = \begin{bmatrix} 1 & 0 \\ b_i & C_i \end{bmatrix} \in \mathbb{R}^{(n+1) \times (n+1)} \qquad (C_i \in \mathbb{R}^{n \times n}). $$
Lemma 2. 
Assume that $\left\| \prod_{j=1}^k C_j \right\| \le c_k$, that $\sum_{k=1}^{\infty} c_k$ is convergent ($< \infty$) and that $\| b_k \| \le \gamma$ for all k. Then, $L_k = \prod_{j=1}^k A_j$ converges and
$$ \lim_{k \to \infty} L_k = \begin{bmatrix} 1 & 0 \\ \tilde{x} & 0 \end{bmatrix} $$
for some $\tilde{x}$.
Proof. 
It is easy to see that
$$ L_k = \prod_{j=1}^k A_j = \begin{bmatrix} 1 & 0 \\ \sum_{i=1}^k \left( \prod_{j=1}^{i-1} C_j \right) b_i & \prod_{j=1}^k C_j \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ x_k & \prod_{j=1}^k C_j \end{bmatrix}. \tag{15} $$
If $\sum_{k=1}^{\infty} c_k$ is convergent, then $c_k \to 0$. Hence, $\prod_{j=1}^k C_j \to 0$ as $k \to \infty$. Since $s_k = \sum_{j=1}^k c_j$ is convergent, for any $\varepsilon > 0$ there is a number $k_0 = k_0(\varepsilon)$ such that $s_m - s_k < \varepsilon$ for $m > k \ge k_0$. Thus, for $m > k \ge k_0$, we obtain
$$ \| x_m - x_k \| \le \sum_{i=k+1}^m \left\| \prod_{j=1}^{i-1} C_j \right\| \| b_i \| \le \gamma \sum_{i=k+1}^m c_{i-1} \le \gamma \varepsilon. \tag{16} $$
Hence, $x_k \to \tilde{x}$ for some $\tilde{x}$. □
If $\| C_j \| \le q < 1$ for $j \ge 1$, then $\left\| \prod_{j=1}^k C_j \right\| \le q^k$ and the series $\sum_{i=1}^{\infty} q^i$ is convergent.
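The mechanism of Lemma 2 can be illustrated with a toy instance (the data below are assumed for illustration, not taken from the paper): with identical factors $A_j = A$, a fixed contraction block C with $\|C\| < 1$ and $b = (1, -1)^T$, the products $L_k = A^k$ converge to $[\,1\ 0;\ \tilde{x}\ 0\,]$ with $\tilde{x} = (I - C)^{-1} b$.

```python
# Toy illustration of Lemma 2 with identical block lower-triangular
# factors A_j = A: here L_k = A^k = [[1, 0], [sum_{i<k} C^i b, C^k]]
# converges to [[1, 0], [x~, 0]] with x~ = (I - C)^{-1} b.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[ 1.0, 0.0, 0.0],
     [ 1.0, 0.4, 0.1],    # row = [b_1, C_11, C_12]
     [-1.0, 0.0, 0.3]]    # row = [b_2, C_21, C_22]

L = [[float(i == j) for j in range(3)] for i in range(3)]
for _ in range(100):
    L = matmul(L, A)

# x~ = (I - C)^{-1} b, solved directly for the upper triangular
# I - C = [[0.6, -0.1], [0.0, 0.7]]:
x2 = -1.0 / 0.7
x1 = (1.0 + 0.1 * x2) / 0.6
print([L[1][0], L[2][0]], [x1, x2])
```

Since the eigenvalues of this C are 0.4 and 0.3, the C-block of $L_k$ decays geometrically and the b-column telescopes to the Neumann series $(I - C)^{-1} b$, exactly the structure exploited in the proof above.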
Assume now that $n \ge 2$ and consider all possible products of the $T_{i_j} P_{i_j}$ of fixed length $\ell$ ($\ell \ge 2$). For each product $\prod_{j=1}^{\ell} T_{i_j} P_{i_j}$, there is a corresponding product $\prod_{j=1}^{\ell} C_{i_j}$. Define the sets $\mathcal{C} = \{ C_i : T_i P_i \in \mathcal{T} \}$ and
$$ \mathcal{C}^{\ell} = \left\{ \prod_{j=1}^{\ell} C_{i_j} : C_{i_j} \in \mathcal{C} \right\} = \left\{ Y_i : i = 1, 2, \ldots, N \right\}, $$
where $N = \left( 3n + 3 + (n+1)! \right)^{\ell}$. Consider the norms of the elements of $\mathcal{C}^{\ell}$ and decompose the set $\mathcal{C}^{\ell}$ in the form $\mathcal{C}^{\ell} = \mathcal{C}^{\ell}_q \cup \mathcal{C}^{\ell}_Q$, where
$$ \mathcal{C}^{\ell}_q = \left\{ \prod_{j=1}^{\ell} C_{i_j} \in \mathcal{C}^{\ell} : \left\| \prod_{j=1}^{\ell} C_{i_j} \right\| \le q \right\} \qquad (q < 1), $$
$$ \mathcal{C}^{\ell}_Q = \left\{ \prod_{j=1}^{\ell} C_{i_j} \in \mathcal{C}^{\ell} : q < \left\| \prod_{j=1}^{\ell} C_{i_j} \right\| \le Q \right\} \qquad (Q > 1), $$
$0 < q < 1$ is a fixed number, and for simplicity, Q is selected such that $\| C_i \| \le Q$ also holds for all $C_i \in \mathcal{C}$. That such a q exists and that $\mathcal{C}^{\ell}_Q$ and $\mathcal{C}^{\ell}_q$ are not empty follows from [16]. This fact is also indicated by Equation (21).
We investigate the product (11). For any $k \ge \ell$, write $k = \ell m + r$ with $m, r \in \mathbb{N}$ and $0 \le r < \ell$. Note that $m = \lfloor k/\ell \rfloor$, where $\lfloor \cdot \rfloor$ stands for the floor function. Then,
$$ \prod_{j=1}^k C_{i_j} = \left( \prod_{j=1}^m \left( C_{i_{\ell(j-1)+1}} \cdots C_{i_{\ell j}} \right) \right) C_{i_{\ell m + 1}} \cdots C_{i_{\ell m + r}} $$
and
$$ \left\| \prod_{j=1}^k C_{i_j} \right\| \le \left( \prod_{j=1}^m \left\| C_{i_{\ell(j-1)+1}} \cdots C_{i_{\ell j}} \right\| \right) Q^r. $$
Assume that $r_1(m)$ of the ℓ-products belong to $\mathcal{C}^{\ell}_q$ and $r_2(m)$ of the ℓ-products belong to $\mathcal{C}^{\ell}_Q$. Clearly, $r_1(m) + r_2(m) = m$. Then, $\left\| \prod_{j=1}^k C_{i_j} \right\| \le q^{r_1(m)} Q^{r_2(m)+1}$. There exists an integer $\kappa \ge 1$ such that $q^{1-\kappa} \le Q \le q^{-\kappa}$. Hence, $\left\| \prod_{j=1}^k C_{i_j} \right\| \le Q\, q^{r_1(m) - \kappa r_2(m)}$. Moreover, assume that $r_1(m) - \kappa\, r_2(m) \ge \mu m$ for some $\mu \in (0, 1)$. This assumption guarantees that the elements from $\mathcal{C}^{\ell}_q$ counterbalance the effect of those from $\mathcal{C}^{\ell}_Q$. It also implies the density inequalities
$$ r_1(m) \ge \frac{\mu + \kappa}{1 + \kappa}\, m > \frac{1 - \mu}{1 + \kappa}\, m \ge r_2(m). $$
It follows that
$$ \left\| \prod_{j=1}^k C_{i_j} \right\| \le Q \left( q^{\mu} \right)^m =: c_k. $$
Now, $q^{\mu} < 1$ and $\frac{k}{\ell} - 1 < m = \frac{k - r}{\ell} \le \frac{k}{\ell}$. Hence,
$$ c_k := Q \left( q^{\mu} \right)^m \le Q \left( q^{\mu} \right)^{\frac{k}{\ell} - 1} = \frac{Q}{q^{\mu}} \left( q^{\frac{\mu}{\ell}} \right)^k = \Gamma_1 \left( q^{\frac{\mu}{\ell}} \right)^k \to 0, $$
since $q^{\mu/\ell} < 1$, and $\sum_{k=1}^{\infty} c_k \le \Gamma_1 \sum_{k=1}^{\infty} \left( q^{\frac{\mu}{\ell}} \right)^k < \infty$. Hence, by Lemma 2,
$$ \lim_{k \to \infty} L_k = \begin{bmatrix} 1 & 0 \\ \tilde{x} & 0 \end{bmatrix} = \tilde{L} $$
holds for some vector $\tilde{x}$. Since there exists a constant $\gamma > 0$ such that $\| b_i \| \le \gamma$,
$$ \| \tilde{x} - x_k \| = \left\| \sum_{i=k+1}^{\infty} \left( \prod_{j=1}^{i-1} C_{i_j} \right) b_i \right\| \le \gamma\, \Gamma_1 \sum_{i=k}^{\infty} \left( q^{\frac{\mu}{\ell}} \right)^i \le \Gamma_2 \left( q^{\frac{\mu}{\ell}} \right)^k, $$
$$ \| L_k - \tilde{L} \| \le \Gamma_3 \left( q^{\frac{\mu}{\ell}} \right)^k $$
holds with a suitable constant $\Gamma_3 > 0$. It follows that
$$ B_k \to F \begin{bmatrix} 1 & 0 \\ \tilde{x} & 0 \end{bmatrix} F^{-1} = \begin{bmatrix} 1 - e^T \tilde{x} \\ \tilde{x} \end{bmatrix} e^T = w e^T =: B_{\infty} $$
and
$$ \| B_k - B_{\infty} \| \le \Gamma_4\, \mathrm{cond}(F) \left( q^{\frac{\mu}{\ell}} \right)^k. $$
We can summarize the obtained results in the following theorem.
Theorem 1. 
Assume that $n \ge 2$, $S^0$ is nondegenerate, $\ell \ge 2$ is fixed and $\mathcal{C}^{\ell}_q$ is not empty. Let $r_1(k)$ be the number of ℓ-products that belong to $\mathcal{C}^{\ell}_q$ and $r_2(k)$ be the number of those ℓ-products that belong to $\mathcal{C}^{\ell}_Q$ during the first k iterations of the Nelder–Mead method. Moreover, assume that for $\kappa \in \mathbb{N}$, $q^{1-\kappa} \le Q \le q^{-\kappa}$ and that for some $\mu \in (0, 1)$, $r_1(k) \ge \mu k + \kappa\, r_2(k)$ holds ($k \ge k_0$). Then, the Nelder–Mead algorithm converges in the sense that
$$ \lim_{k \to \infty} x_j^k = \hat{x} \qquad (j = 1, \ldots, n+1) $$
with a convergence speed $O\!\left( \left( q^{\mu/\ell} \right)^k \right)$. If f is continuous at $\hat{x}$, then
$$ \lim_{k \to \infty} f(x_j^k) = f(\hat{x}) \qquad (j = 1, \ldots, n+1) $$
holds as well.
A simple but time-consuming computation of the elements of $\mathcal{C}^{\ell}$ shows the feasibility of the assumptions of Theorem 1. Equation (21) shows the ratios $|\mathcal{C}^{\ell}_q| / |\mathcal{C}^{\ell}|$ for the specified $(n, \ell)$ pairs, in the case of the spectral norm and $q = 0.99$:

        ℓ = 2    ℓ = 3    ℓ = 4    ℓ = 5
n = 2   0.7111   0.8361   0.9020   0.9409
n = 3   0.8518   0.9374   0.9738   0.9891
n = 4   0.8507   0.9760   0.9956
n = 5   0.9641   0.9973
n = 6   0.9935
(21)
The greater the ratio $|\mathcal{C}^{\ell}_q| / |\mathcal{C}^{\ell}|$, the better the chance for convergence, since more elements can be selected from $\mathcal{C}^{\ell}_q$ than from $\mathcal{C}^{\ell}_Q$. For any $n \ge 2$, there are problems on which the NM algorithm does not converge in the above sense (see [16]). Hence, in the general case, the density assumption seems to be necessary.
Corollary 1. 
For n = 2 , 3 , 4 , 5 , 6 , Equation (21) implies the convergence of the Nelder–Mead method under the density condition of Theorem 1.
For strictly convex functions $f : \mathbb{R}^n \to \mathbb{R}$, Lagarias et al. proved ([10], Lemma 3.5) that no shrinking occurs when the Nelder–Mead algorithm is applied to f. The following observation has some importance if we can rule out certain operations or steps of the Nelder–Mead method when it is applied to a function f.
Remark 1. 
Theorem 1 remains true if the sets $\mathcal{T}$, $\mathcal{C}^{\ell}$, $\mathcal{C}^{\ell}_q$ and $\mathcal{C}^{\ell}_Q$ are replaced by the subsets $\tilde{\mathcal{T}} \subset \mathcal{T}$, $\tilde{\mathcal{C}} = \{ C_i : T_i P_i \in \tilde{\mathcal{T}} \}$ and $\tilde{\mathcal{C}}^{\ell} = \left\{ \prod_{j=1}^{\ell} C_{i_j} : C_{i_j} \in \tilde{\mathcal{C}} \right\}$, such that $\tilde{\mathcal{C}}^{\ell} = \tilde{\mathcal{C}}^{\ell}_q \cup \tilde{\mathcal{C}}^{\ell}_Q$,
$$ \tilde{\mathcal{C}}^{\ell}_q = \left\{ \prod_{j=1}^{\ell} C_{i_j} \in \tilde{\mathcal{C}}^{\ell} : \left\| \prod_{j=1}^{\ell} C_{i_j} \right\| \le q \right\}, $$
$$ \tilde{\mathcal{C}}^{\ell}_Q = \left\{ \prod_{j=1}^{\ell} C_{i_j} \in \tilde{\mathcal{C}}^{\ell} : q < \left\| \prod_{j=1}^{\ell} C_{i_j} \right\| \le Q \right\} $$
and $\tilde{\mathcal{C}}^{\ell}_q$ is nonempty.
Remark 2. 
Assume that $n = 2$ and the operation set $\mathcal{T}$ is restricted to
$$ \tilde{\mathcal{T}} = \mathcal{T} \setminus \left\{ T_{shr} P : P \in \mathcal{P}_{n+1} \right\} $$
(no shrinking occurs). For $q = 0.99$, the ratios $|\tilde{\mathcal{C}}^{\ell}_q| / |\tilde{\mathcal{C}}^{\ell}|$ are given in the next equation:

ℓ                                    2        3        4        5        6        7
$|\tilde{\mathcal{C}}^{\ell}_q| / |\tilde{\mathcal{C}}^{\ell}|$   0.3456   0.4691   0.5468   0.6143   0.6715   0.7187

Hence, we have convergence for strictly convex functions $f : \mathbb{R}^2 \to \mathbb{R}$ under the conditions of Theorem 1.
Theorem 1 is based on the behavior of ℓ consecutive steps of the Nelder–Mead method. The difference between Theorem 1 and Theorem 9 of [16] is the following. In the case of Theorem 9 of [16], we identified a subset $W_1 \subset \mathcal{T}$ and constructed a matrix norm $\| \cdot \|_{\vartheta}$ such that $\| C_i \|_{\vartheta} \le q < 1$ held for every $T_i P_i \in W_1$. The existence of such a norm was related to an algorithmically undecidable problem. The ratio of operations from $W_1$ and $\mathcal{T} \setminus W_1$ then decided the convergence. Here, in Theorem 1, we avoided the construction of the matrix norm $\| \cdot \|_{\vartheta}$ by computing the (spectral) norms of ℓ consecutive steps (products of operators) and sorting them into the two subsets $\mathcal{C}^{\ell}_q$ and $\mathcal{C}^{\ell}_Q$. The rest of the proof was quite similar to that of Theorem 9 of [16]. The norms of ℓ-products can be easily computed, although the computation time and required memory quickly increase with n and ℓ.
If one can show that for a given function f the NM method takes only steps that belong to $\mathcal{C}^{\ell}_q$ for some ℓ, then the convergence immediately follows. If this is not the case, then one either has to make some additional assumption, as in Theorem 1, or seek some statistical characterization such as Theorem 2 of the next section.

6. A Random Convergence Result

The counterexamples of [16,17,18] showed that there is no sure convergence for the NM method. One may ask, however, if Theorem 1 is sharp enough in some sense. Here, we study a simple random model using the proof of Theorem 1. Formulas (15) and (16) imply that it is enough to study the infinite product
$$ \prod_{j=1}^{\infty} C_{i_j} = \prod_{j=1}^{\infty} \left( C_{i_{\ell(j-1)+1}} \cdots C_{i_{\ell j}} \right) = \prod_{j=1}^{\infty} Y_{i_j} \qquad (Y_{i_j} \in \mathcal{C}^{\ell}). $$
Assume that the ℓ-products $Y_i \in \mathcal{C}^{\ell}$ are randomly chosen with probabilities $p_i \ge 0$, $\sum_{i=1}^N p_i = 1$. Moreover, assume that the subsequent ℓ-products $C_{i_{\ell(j-1)+1}} \cdots C_{i_{\ell j}}$ (that is, $T_{i_{\ell(j-1)+1}} \cdots T_{i_{\ell j}}$) are randomly chosen independently of each other.
Let $a_i = \| Y_i \|_2$ ($Y_i \in \mathcal{C}^{\ell}$, $i = 1, \ldots, N$). Note that $a_i > 0$ for $1 \le i \le N$ and that there are numbers $a_j$ such that $a_j > 1$ (see also Section 3 of [16]). Let X be a random variable that assumes the values $a_1, a_2, \ldots, a_N$, and let $P(X = a_i) = p_i$ ($1 \le i \le N$) be the probability distribution of X. Assume that the expected value of X satisfies $\mu = E(X) = \sum_{i=1}^N p_i a_i < 1$.
If X is uniformly distributed, the expected values $\mu = E(X)$ belonging to the cases of Equation (21) are given in the following equation (with four-decimal-digit precision):

        ℓ = 2    ℓ = 3    ℓ = 4    ℓ = 5
n = 2   0.8512   0.6515   0.4931   0.3725
n = 3   0.7435   0.4891   0.3202   0.2093
n = 4   0.5963   0.3305   0.1825
n = 5   0.5704   0.2918
n = 6   0.6015
(23)

Hence, the condition $\mu = E(X) < 1$ holds in these cases.
We need the following simple results.
Lemma 3. 
Let the positive random variables $\{ X_i \}_{i=1}^{\infty}$ be independent and identically distributed with the same distribution as X, and assume that $\mu = E(X) < 1$. Then, $Z_k = \prod_{i=1}^k X_i \to 0$ holds with a probability of one. Moreover, there exist numbers $\{ c_k \}_{k=1}^{\infty}$ such that $\sum_{k=1}^{\infty} c_k < \infty$ and only a finite number of the events $\{ Z_k \ge c_k \}$ can occur with a probability of one.
Proof. 
The independence of the $X_i$'s implies that $E(Z_k) = \prod_{i=1}^k E(X_i) = \mu^k$. For any $c_k > 0$, it follows from the Markov inequality that $P(Z_k \ge c_k) \le E(Z_k)/c_k = \mu^k / c_k$. Select $c_k = \mu^{k/2}$ ($k \ge 1$). Then,
$$ \sum_{k=1}^{\infty} P(Z_k \ge c_k) \le \sum_{k=1}^{\infty} c_k = \frac{\sqrt{\mu}}{1 - \sqrt{\mu}} < \infty. $$
It follows from the Borel–Cantelli lemma (see, e.g., Theorem 1.5.1 of [20] or Borovkov [21]) that $Z_k \to 0$ with a probability of one and only a finite number of the events $\{ Z_k \ge c_k \}$ ($k = 1, 2, \ldots$) can occur. □
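The almost-sure statement of Lemma 3 is easy to visualize by simulation. The sketch below uses an illustrative two-point distribution chosen here (not taken from the paper) with $E(X) = 0.3 \cdot 1.2 + 0.7 \cdot 0.5 = 0.71 < 1$: although the factor 1.2 > 1 occurs with positive probability, the products $Z_k$ collapse to zero.

```python
# Simulation sketch for Lemma 3: products of i.i.d. positive factors
# with mean below one tend to zero, even though single factors may
# exceed one.  The two-point distribution is an assumed toy example.
import random

random.seed(7)

def sample_product(k):
    """Z_k = X_1 * ... * X_k with P(X = 1.2) = 0.3, P(X = 0.5) = 0.7,
    so mu = E(X) = 0.71 < 1."""
    z = 1.0
    for _ in range(k):
        z *= 1.2 if random.random() < 0.3 else 0.5
    return z

trials = [sample_product(200) for _ in range(1000)]
print(max(trials))   # even the largest of 1000 products is tiny
```

Note that $E(\log X) < 0$ here as well, so $\log Z_k$ drifts to $-\infty$ linearly; the Markov/Borel–Cantelli argument of the proof turns this into the almost-sure bound $Z_k < \mu^{k/2}$ for all large k.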
Corollary 2. 
Under the conditions of Lemma 3, there exist numbers $\tilde{c}_k > 0$ ($k \ge 1$) such that $Z_k \le \tilde{c}_k$ ($k \ge 1$) and $\sum_{k=1}^{\infty} \tilde{c}_k < \infty$.
Proof. 
There is a random index $n_{\omega}$ such that $P(n_{\omega} < \infty) = 1$ and $Z_k < c_k$ if $k \ge n_{\omega}$. Since $\tilde{a} = \max_i a_i > 1$ and, for any k, $P(Z_k = \tilde{a}^k) \ge P(X_1 = \tilde{a}, \ldots, X_k = \tilde{a}) > 0$, there is no fixed number $n_0$ such that $Z_k < \varepsilon$ for $k \ge n_0$ and for all $\varepsilon > 0$. However, $Z_k \le \tilde{c}_k := \tilde{a}^k$ for $k < n_{\omega}$, and $Z_k < \tilde{c}_k := c_k$ for $k \ge n_{\omega}$. Thus, it follows that $Z_k \le \tilde{c}_k$ ($k \ge 1$) and $\sum_{k=1}^{\infty} \tilde{c}_k < \infty$. □
Theorem 2. 
Assume that $n \ge 2$, $S^0$ is nondegenerate, $\ell \ge 2$ is fixed, $\mathcal{C}^{\ell}_q \ne \emptyset$, and the ℓ-products $Y_i \in \mathcal{C}^{\ell}$ are randomly chosen with probabilities $p_i \ge 0$, $\sum_{i=1}^N p_i = 1$. Furthermore, assume that the subsequent ℓ-products are randomly chosen independently of each other. If $\mu = \sum_{i=1}^N a_i p_i < 1$, then for the Nelder–Mead algorithm,
$$ \lim_{k \to \infty} x_j^k = \hat{x} \qquad (j = 1, \ldots, n+1) $$
holds with a probability of one. If, in addition, f is continuous at $\hat{x}$, then
$$ \lim_{k \to \infty} f(x_j^k) = f(\hat{x}) \qquad (j = 1, \ldots, n+1) $$
also holds with a probability of one.
Proof. 
Let X be a random variable defined on the positive numbers $\{ a_i \}_{i=1}^N$ with the probability distribution $\{ p_i \}_{i=1}^N$. By assumption, $\mu = E(X) = \sum_{i=1}^N a_i p_i < 1$. Lemma 3 and Corollary 2 imply that
$$ \left\| \prod_{j=1}^{\ell k} C_{i_j} \right\| \le \prod_{j=1}^{k} \left\| C_{i_{\ell(j-1)+1}} \cdots C_{i_{\ell j}} \right\| = \prod_{j=1}^k a_{s_j} \le \tilde{c}_k \qquad (k \ge 1) $$
and $\sum_{k=1}^{\infty} \tilde{c}_k < \infty$. The result now follows from Lemma 2, the proof of Theorem 1 and the “continuity theorem” (see, e.g., Borovkov [21], Theorem 6.1.4, p. 134). □
Note that the assumption $r_1(k) \ge \mu k + \kappa\, r_2(k)$ does not occur here. Instead, we use the assumption $\mu = E(X) < 1$. Furthermore, note that the smaller the $\mu$, the faster the convergence.
If we assume a uniform distribution for X, then we have the following.
Corollary 3. 
Under the assumption of a uniform distribution, Equation (23) implies that Theorem 2 applies for $n = 2, 3, 4, 5, 6$.
Theorem 2 and Corollary 3 ensure the convergence of the NM algorithm with a probability of one, although the limit point is not necessarily a minimum point of f. Concerning the speed of convergence, we note that some classical stochastic approximation algorithms (the Robbins–Monro and Kiefer–Wolfowitz methods) also converge with a probability of one (see, e.g., [22]).
Remark 3. 
Theorem 2 remains true if the sets $\mathcal{C}^{\ell}_q$ and $\mathcal{C}^{\ell}$ are replaced by the subsets $\tilde{\mathcal{C}}^{\ell}_q$ and $\tilde{\mathcal{C}}^{\ell}$, respectively, and the probability assumptions also hold for these.
Remark 4. 
Assume that $n = 2$ and the operation set $\mathcal{T}$ is restricted to
$$ \tilde{\mathcal{T}} = \mathcal{T} \setminus \left\{ T_{shr} P : P \in \mathcal{P}_{n+1} \right\} $$
(no shrinking occurs). For $q = 0.99$ and a uniform distribution, the expected values μ are given in the next equation:

ℓ   2        3        4        5        6        7
μ   1.2961   1.2139   1.1215   1.0334   0.9489   0.8701

Hence, Theorem 2 holds for $\ell = 6, 7$. It follows that for strictly convex functions $f : \mathbb{R}^2 \to \mathbb{R}$, the Nelder–Mead method converges with a probability of one under the assumption of uniform distribution.

7. Conclusions

Although the convergence of the Nelder–Mead simplex method to a minimum point of a function f cannot be guaranteed in general, the main result indicates that, in the stochastic sense, it converges to a point at which the value of f is less than the best function value at the start. It may also explain the good behavior of the Nelder–Mead method on average.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Acknowledgments

The author is highly indebted to László Szeidl for his help and comments on Section 6. The author is also indebted to the unknown referees for their observations and remarks that improved the paper.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313. [Google Scholar] [CrossRef]
  2. Larson, J.; Menickelly, M.; Wild, S. Derivative-free optimization methods. Acta Numer. 2019, 28, 287–404. [Google Scholar] [CrossRef]
  3. Walters, F.; Morgan, S.; Parker, L.P.L.; Deming, S. Sequential Simplex Optimization; CRC Press LLC: Boca Raton, FL, USA, 1991. [Google Scholar]
  4. Kelley, C. Iterative Methods for Optimization; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 1999. [Google Scholar] [CrossRef]
  5. Conn, A.; Scheinberg, K.; Vicente, L. Introduction to Derivative-Free Optimizations; Society for Industrial and Applied Mathematics (SIAM): Philadelphia, PA, USA, 2009. [Google Scholar] [CrossRef]
  6. Audet, C.; Hare, W. Derivative-Free and Blackbox Optimization; Springer: Berlin/Heidelberg, Germany, 2017. [Google Scholar] [CrossRef]
  7. Kochenderfer, M.; Wheeler, T. Algorithms for Optimization; The MIT Press: Cambridge, MA, USA, 2019. [Google Scholar]
  8. Tekile, H.; Fedrizzi, M.; Brunelli, M. Constrained Eigenvalue Minimization of Incomplete Pairwise Comparison Matrices by Nelder-Mead Algorithm. Algorithms 2021, 14, 222. [Google Scholar] [CrossRef]
  9. Nash, C.J. On Best Practice Optimization Methods in R. J. Stat. Softw. 2014, 60, 1–14. [Google Scholar] [CrossRef]
  10. Lagarias, J.; Reeds, J.; Wright, M.; Wright, P. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J. Optimiz. 1998, 9, 112–147. [Google Scholar] [CrossRef]
  11. Lagarias, J.; Poonen, B.; Wright, M. Convergence of the restricted Nelder-Mead algorithm in two dimensions. SIAM J. Optimiz. 2012, 22, 501–532. [Google Scholar] [CrossRef]
  12. Wright, M. Nelder, Mead, and the other simplex method. Extra Volume: Optimization Stories. Doc. Math. 2012, 271–276. [Google Scholar]
  13. McKinnon, K. Convergence of the Nelder-Mead simplex method to a nonstationary point. SIAM J. Optimiz. 1998, 9, 148–158. [Google Scholar] [CrossRef]
  14. Kelley, C. Detection and remediation of stagnation in the Nelder-Mead algorithm using a sufficient decrease condition. SIAM J. Optimiz. 1999, 10, 43–55. [Google Scholar] [CrossRef]
  15. Han, L.; Neumann, M. Effect of dimensionality on the Nelder-Mead simplex method. Optim. Method. Softw. 2006, 21, 1–16. [Google Scholar] [CrossRef]
  16. Galántai, A. Convergence of the Nelder-Mead method. Numer. Algorithms 2022, 90, 1043–1072. [Google Scholar] [CrossRef]
  17. Galántai, A. Convergence theorems for the Nelder-Mead method. J. Comput. Appl. Mech. 2020, 15, 115–133. [Google Scholar] [CrossRef]
  18. Galántai, A. A convergence analysis of the Nelder-Mead simplex method. Acta Polytech. Hung. 2021, 18, 93–105. [Google Scholar] [CrossRef]
  19. Hartfiel, D. Nonhomogeneous Matrix Products; World Scientific: Singapore, 2002. [Google Scholar] [CrossRef]
  20. Chandra, T. The Borel—Cantelli Lemma; Springer: Berlin/Heidelberg, Germany, 2012. [Google Scholar] [CrossRef]
  21. Borovkov, A. Probability Theory; Springer: Berlin/Heidelberg, Germany, 2013. [Google Scholar] [CrossRef]
  22. Kushner, H.; Clark, D. Stochastic Approximation Methods for Constrained and Unconstrained Systems; Springer: New York, NY, USA, 1978. [Google Scholar]