The Nelder–Mead Simplex Algorithm Is Sixty Years Old: New Convergence Results and Open Questions

Galántai, Aurél

doi:10.3390/a17110523

by

Aurél Galántai

Software Engineering Institute, John von Neumann Faculty of Informatics, Óbuda University, 1034 Budapest, Hungary

Algorithms 2024, 17(11), 523; https://doi.org/10.3390/a17110523

Submission received: 7 October 2024 / Revised: 10 November 2024 / Accepted: 13 November 2024 / Published: 14 November 2024

(This article belongs to the Special Issue Numerical Optimization and Algorithms: 2nd Edition)

Abstract

We investigate and compare two versions of the Nelder–Mead simplex algorithm for function minimization. Two types of convergence are studied: the convergence of function values at the simplex vertices and the convergence of the simplex sequence. For the first type of convergence, we generalize the main result of Lagarias, Reeds, Wright and Wright (1998). For the second type of convergence, we also improve recent results which indicate that the Lagarias et al. version of the Nelder–Mead algorithm has better convergence properties than the original Nelder–Mead method. This paper concludes with some open questions.

Keywords:

Nelder–Mead simplex methods; convergence; comparison

MSC:

65K10; 90C56

1. Introduction

The original Nelder–Mead (NM) simplex algorithm was constructed in 1965 by Nelder and Mead [1] as an improved version of the simplex method of Spendley, Hext and Himsworth [2]. In the almost six decades since, the Nelder–Mead algorithm has become a truly popular and widely used method for the minimization problem

f (x) \to min (f : R^{n} \to R)

in derivative-free optimization (see, e.g., [3,4,5,6,7,8,9,10,11]).

In spite of its widespread use, only a few convergence results are known. They are of essentially two types. Results of the first type concern the convergence of function values at the vertices of the simplex sequence; they say nothing about the vertices themselves. The most famous results of this type are due to Lagarias, Reeds, Wright and Wright [12]. Results of the second type concern the convergence of the simplex vertices or simplices. The first result of this type is due to Kelley [13], who proved that under a certain (Armijo-type) descent condition, the accumulation points of the simplex sequence are critical points of f. Recent results on the convergence of the simplex sequence to a common limit point are given in [14,15,16].

Wright [17,18] raised several questions concerning the Nelder–Mead method:

(a): Do the function values at all vertices necessarily converge to the same value?
(b): Do all vertices of the simplices converge to the same point?
(c): Why is it sometimes so effective (compared to other direct search methods) in obtaining a rapid improvement in f?
(d): One failure mode is known (McKinnon [19])—but are there other failure modes?
(e): Why, despite its apparent simplicity, should the Nelder–Mead method be difficult to analyze mathematically?

In fact, there are several recent examples that indicate different types of convergence behavior:

The function values at the simplex vertices may converge to a common limit value, while the function f has no finite minimum and the simplex sequence is unbounded (Example 5 of [14]; Examples 1 and 2 of [15]; Example 4 of [20]).
The simplex vertices may converge to a common limit point, but it is not a stationary point of f (McKinnon [19] and Example 3 of [15]).
The simplex sequence may converge to a limit simplex of a positive diameter, resulting in different limit values of f at the vertices at the limit simplex (Theorem 1 of [21]; Example 4 of [14]; Examples 4 and 5 of [15]; Examples 1, 2 and 3 of [20]).
The function values at the simplex vertices may converge to a common value, while the simplex sequence converges to a limit simplex with a positive diameter (Example 5 of [20]).

These examples answer questions (a), (b) and (d) of Wright negatively.

There are several variants of the original Nelder–Mead method [1]. Here, we study two main versions of the Nelder–Mead algorithm: the ‘original’ (unordered) method of Nelder and Mead [1] (Algorithm 1) and the ‘ordered’ version of Lagarias, Reeds, Wright and Wright [12] (Algorithm 2). In Section 2, we introduce both algorithms, while Section 3 contains their matrix representations. In Section 4, we study the connection and spectral properties of these matrices. In Section 5, we prove convergence of the function values at the simplex vertices for both algorithms. These results are more general than those of Lagarias et al. [12], while their proofs are quite simple.

In Section 6, we study the convergence of the generated simplex sequence. Here, we give necessary and sufficient conditions for the convergence of the simplex sequence (for both algorithms). This relates the convergence to the convergence of an infinite matrix product. We pay particular attention to the case when the simplex sequence converges to a common limit point. The results show why the convergence problem (questions (c) and (e) of Wright) is difficult. This paper closes with computational results, conclusions and open questions.

2. Two Versions of the Nelder–Mead Method

We first define the original Nelder–Mead simplex method in the following form (see [1,6,22]).

The simplex of iteration k is denoted by

S^{(k)} = [x_{1}^{(k)}, \dots, x_{n + 1}^{(k)}] \in R^{n \times (n + 1)}

, where

x_{i}^{(k)}

denotes vertex i of iteration k. The function values at the vertices are denoted by

f_{i}^{(k)} = f (x_{i}^{(k)})

(

i = 1, \dots, n + 1

). We define the index of the worst vertex (

h_{k}

), the best vertex (

ℓ_{k}

) and the second worst vertex (

m_{k}

) as follows:

h_{k} : f_{h_{k}}^{(k)} = max_{1 \leq i \leq n + 1} f (x_{i}^{(k)}), ℓ_{k} : f_{ℓ_{k}}^{(k)} = min_{1 \leq i \leq n + 1} f (x_{i}^{(k)}),

m_{k} : f_{m_{k}}^{(k)} = max_{1 \leq i \leq n + 1, i \neq h_{k}} f (x_{i}^{(k)}) .

Note that

1 \leq ℓ_{k}, m_{k}, h_{k} \leq n + 1

, and they are not necessarily unique. Define the center

x_{c}^{(k)}

and straight line

x^{(k)} (α)

as follows:

x_{c}^{(k)} = \frac{1}{n} \sum_{i = 1, i \neq h_{k}}^{n + 1} x_{i}^{(k)}, x^{(k)} (α) = (1 + α) x_{c}^{(k)} - α x_{h_{k}}^{(k)} .

The reflection, expansion, outside and inside contraction points of simplex

S^{(k)}

are defined as

x_{r}^{(k)} = x^{(k)} (1), x_{e}^{(k)} = x^{(k)} (2), x_{o c}^{(k)} = x^{(k)} (\frac{1}{2}), x_{i c}^{(k)} = x^{(k)} (- \frac{1}{2}) .

The corresponding function values are denoted by

f_{r}^{(k)} = f (x_{r}^{(k)})

,

f_{e}^{(k)} = f (x_{e}^{(k)})

,

f_{o c}^{(k)} = f (x_{o c}^{(k)})

and

f_{i c}^{(k)} = f (x_{i c}^{(k)})

, respectively.
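As an illustration of these definitions, the following minimal Python sketch computes the center and the four trial points for a given simplex. The function name `trial_points`, the 0-based index `h` and the sample simplex are assumptions made for this example only; in the algorithm itself the worst vertex is selected by its function value.

```python
import numpy as np

def trial_points(S, h):
    """Return the center x_c and the trial points x(alpha) of simplex S.

    S is an n x (n+1) array whose columns are the vertices.
    h is the 0-based column index of the worst vertex (given as input here).
    """
    n = S.shape[0]
    xc = (S.sum(axis=1) - S[:, h]) / n            # center of the other n vertices
    line = lambda a: (1 + a) * xc - a * S[:, h]   # the straight line x(alpha)
    return xc, {"r": line(1.0), "e": line(2.0),
                "oc": line(0.5), "ic": line(-0.5)}

# Illustrative simplex in R^2 (columns are vertices); the last vertex plays the worst one.
S = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
xc, pts = trial_points(S, h=2)
print(xc, pts)
```

The outside contraction point is the midpoint of the center and the reflection point, and the inside contraction point is the midpoint of the center and the worst vertex, which is how these points are used in the proofs of Section 5.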

The original Nelder–Mead simplex method is given by Algorithm 1.

Algorithm 1: Original Nelder–Mead algorithm.

The “decision boundaries” are the same as in the ordered Nelder–Mead method of Lagarias et al. [12], which is given in Algorithm 2. Suppose that

f (x_{1}^{(0)}) \leq f (x_{2}^{(0)}) \leq \dots \leq f (x_{n + 1}^{(0)})

(1)

holds for the initial simplex vertices and that this ordering is kept during the subsequent iterations. Here,

ℓ_{k} = 1

,

m_{k} = n

and

h_{k} = n + 1

for all k.

There are two rules of reindexing after each iteration. For a nonshrink step,

x_{n + 1}^{(k)}

is replaced by a new point

v \in \{x_{r}^{(k)}, x_{e}^{(k)}, x_{o c}^{(k)}, x_{i c}^{(k)}\}

. The following cases are possible:

f (v) < f_{1}^{(k)}

,

f_{1}^{(k)} \leq f (v) < f_{n}^{(k)}

and

f_{n}^{(k)} \leq f (v) < f_{n + 1}^{(k)}

. Define

j^{(k)} = \{\begin{matrix} 1, & if f (v) < f_{1}^{(k)} \\ \{j : f_{j - 1}^{(k)} \leq f (v) < f_{j}^{(k)}, 2 \leq j \leq n + 1\}, & otherwise \end{matrix} .

(2)

Then, the new simplex vertices are

x_{i}^{(k + 1)} = \{\begin{matrix} x_{i}^{(k)} & (1 \leq i \leq j^{(k)} - 1), \\ v & (i = j^{(k)}), \\ x_{i - 1}^{(k)} & (i = j^{(k)} + 1, \dots, n + 1) . \end{matrix}

(3)

This ‘insertion rule’ puts v into the ordering with the highest possible index, guaranteeing that

f_{i}^{(k + 1)} \leq f_{i}^{(k)}

(

i = 1, \dots, n + 1

) and

f_{1}^{(k + 1)} \leq f_{2}^{(k + 1)} \leq \dots \leq f_{n + 1}^{(k + 1)} (k \geq 0) .

(4)

If shrinking occurs, then the new vertices must be ordered such that

x_{1}^{(k + 1)} = x_{1}^{(k)}

if

f (z_{1}) \leq f (z_{i})

(

i = 2, \dots, n + 1

).

For a given

S^{(0)}

and f, the sequence

\{S^{(k)}\}

is uniquely defined by Algorithm 2. Clearly, this is not true for Algorithm 1.
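A minimal Python sketch of one iteration of the ordered method may help to fix ideas. It is written under assumptions: the decision rules are taken in the form in which they are commonly stated for the method of [12] (the shrink conditions agree with those quoted in the proof of Lemma 8), the new point is placed by the insertion rule (2) and (3), and after a shrink step the vertices are reordered by a stable sort so that the best vertex keeps index 1 on ties. The name `ordered_nm_step` and the test function are hypothetical.

```python
import numpy as np

def ordered_nm_step(S, f):
    """One iteration of an ordered Nelder-Mead step (sketch).

    S is n x (n+1); its columns must satisfy f(x_1) <= ... <= f(x_{n+1}).
    """
    n = S.shape[0]
    fv = np.array([f(S[:, i]) for i in range(n + 1)])
    xc = S[:, :n].mean(axis=1)                    # center of the n best vertices
    x = lambda a: (1 + a) * xc - a * S[:, n]      # x(alpha)
    xr = x(1.0)
    fr = f(xr)
    v = None
    if fr < fv[0]:                                # expansion attempt
        xe = x(2.0)
        v = xe if f(xe) < fr else xr
    elif fr < fv[n - 1]:                          # reflection
        v = xr
    elif fr < fv[n]:                              # outside contraction
        xoc = x(0.5)
        if f(xoc) <= fr:
            v = xoc
    else:                                         # inside contraction
        xic = x(-0.5)
        if f(xic) < fv[n]:
            v = xic
    if v is None:                                 # shrink toward the best vertex
        Z = 0.5 * (S + S[:, [0]])
        fz = np.array([f(Z[:, i]) for i in range(n + 1)])
        return Z[:, np.argsort(fz, kind="stable")]
    # Insertion rule: v gets the highest index compatible with the ordering.
    j = np.searchsorted(fv[:n], f(v), side="right")
    return np.column_stack([S[:, :j], v, S[:, j:n]])

f = lambda x: float(x @ x)                        # illustrative convex function
S = np.array([[1.0, 2.0, 1.0],
              [1.0, 1.0, 3.0]])                   # ordered: f = 2, 5, 10
for _ in range(25):
    S = ordered_nm_step(S, f)
print([f(S[:, i]) for i in range(3)])
```

Because every tie is resolved by a fixed rule, repeated calls with the same starting simplex always produce the same sequence, which is the uniqueness property stated above.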

Algorithm 2: Ordered Nelder–Mead algorithm.

3. Matrix Representations for the Two Algorithms

The steps of both algorithms can be easily described by transformation matrices. We first consider Algorithm 1. If the incoming vertex

v = x^{(k)} (α_{k})

(

α_{k} \in \{\pm \frac{1}{2}, 1, 2\}

) belongs to

\{x_{r}^{(k)}, x_{e}^{(k)}, x_{o c}^{(k)}, x_{i c}^{(k)}\}

, then

S^{(k + 1)} = [x_{1}^{(k)}, \dots, x_{h_{k} - 1}^{(k)}, v, x_{h_{k} + 1}^{(k)}, \dots, x_{n + 1}^{(k)}],

where v replaces vertex

x_{h_{k}}^{(k)}

. For

j = 1, \dots, n + 1

, define the vector

c_{j} (α) = {[\frac{1 + α}{n}, \dots, \frac{1 + α}{n}, - α, \frac{1 + α}{n}, \dots, \frac{1 + α}{n}]}^{T} \in R^{n + 1}

(5)

with

- α

at the position j and the matrix

T_{j} (α) = I + (c_{j} (α) - e_{j}) e_{j}^{T},

(6)

where

e_{j}

is the j-th unit vector. Then,

v = S^{(k)} c_{h_{k}} (α_{k})

and

S^{(k + 1)} = S^{(k)} T_{h_{k}} (α_{k}) (α_{k} \in \{\pm \frac{1}{2}, 1, 2\}) .

(7)

The shrinking iteration can be written as

S^{(k + 1)} = S^{(k)} T_{ℓ_{k}}^{s h r} (T_{ℓ_{k}}^{s h r} = \frac{1}{2} I + \frac{1}{2} e_{ℓ_{k}} e^{T}),

(8)

where

e = {[1, 1, \dots, 1]}^{T}

. It follows that

S^{(k + 1)} = S^{(k)} T^{(k)} = S^{(0)} \prod_{i = 0}^{k} T^{(i)} = S^{(0)} B_{k} (T^{(i)} \in \tilde{T}),

(9)

where

\tilde{T} = \{T_{ℓ} (α) : 1 \leq ℓ \leq n + 1, α = \pm \frac{1}{2}, 1, 2\} \cup \{T_{j}^{s h r} : 1 \leq j \leq n + 1\} .

(10)

The set

\tilde{T}

has exactly

5 (n + 1)

nonsingular matrices. Note that

e^{T} T_{j} (α) = e^{T}

and

e^{T} T_{ℓ_{k}}^{s h r} = e^{T}

.
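The matrices (5), (6) and (8) are easy to build and check numerically. The sketch below is only an illustration: the helper names `c_vec`, `T_mat` and `T_shr`, the 1-based index convention and the random test simplex are assumptions of this example.

```python
import numpy as np

def c_vec(j, alpha, n):
    """Vector c_j(alpha) of (5); j is 1-based."""
    c = np.full(n + 1, (1 + alpha) / n)
    c[j - 1] = -alpha
    return c

def T_mat(j, alpha, n):
    """T_j(alpha) = I + (c_j(alpha) - e_j) e_j^T, i.e. I with column j replaced."""
    T = np.eye(n + 1)
    T[:, j - 1] = c_vec(j, alpha, n)
    return T

def T_shr(l, n):
    """Shrink matrix (1/2) I + (1/2) e_l e^T of (8); l is 1-based."""
    T = 0.5 * np.eye(n + 1)
    T[l - 1, :] += 0.5
    return T

n, h, l = 3, 2, 4
rng = np.random.default_rng(0)
S = rng.standard_normal((n, n + 1))          # illustrative simplex
xc = (S.sum(axis=1) - S[:, h - 1]) / n
S_ref = S @ T_mat(h, 1.0, n)                 # reflection step, as in (7)
S_shr = S @ T_shr(l, n)                      # shrink step, as in (8)
print(np.allclose(S_ref[:, h - 1], 2 * xc - S[:, h - 1]))
print(np.allclose(S_shr, 0.5 * (S + S[:, [l - 1]])))
```

Only column h changes in a nonshrink step, and all columns of these matrices sum to one, which is the relation e^T T = e^T noted above.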

For Algorithm 2, we have the following (see, e.g., [14,15]). Assume that simplex

S^{(k)}

satisfies condition (4). Then, for nonshrinking iterations, the new simplex is given by

S^{(k + 1)} = S^{(k)} T (α) P_{j^{(k)}} (α \in \{\pm \frac{1}{2}, 1, 2\}),

(11)

where

T (α) = [\begin{matrix} I_{n} & \frac{1 + α}{n} e \\ 0 & - α \end{matrix}] \in R^{(n + 1) \times (n + 1)}

(12)

and

P_{j} = [e_{1}, \dots, e_{j - 1}, e_{n + 1}, e_{j}, \dots, e_{n}] \in R^{(n + 1) \times (n + 1)} (j = 1, \dots, n + 1)

(13)

is a permutation matrix. If shrinking occurs, the new simplex is

S^{(k + 1)} = S^{(k)} T_{s h r} P (T_{s h r} = \frac{1}{2} I_{n + 1} + \frac{1}{2} e_{1} e^{T}),

where the permutation matrix

P \in P_{n + 1}

is defined by the ordering condition (4) and

P_{n + 1}

is the set of all possible permutation matrices of order

n + 1

.

Define the sets

T_{1} = \{T (α) P_{j} : α \in \{- \frac{1}{2}, \frac{1}{2}\}, j = 1, \dots, n + 1\},

T_{2} = \{T (1) P_{j} : j = 1, \dots, n\} \cup \{T (2) P_{1}\}, T_{3} = \{T_{s h r} P : P \in P_{n + 1}\}

and

T = T_{1} \cup T_{2} \cup T_{3} .

(14)

It follows that

S^{(k + 1)} = S^{(k)} T_{k} P^{(k)} = S^{(0)} \prod_{i = 0}^{k} T_{i} P^{(i)} = S^{(0)} B_{k} (T_{i} P^{(i)} \in T, k \geq 1),

(15)

where

B_{k} = \prod_{i = 0}^{k} T_{i} P^{(i)} (T_{i} P^{(i)} \in T) .

(16)

The set

T

consists of

3 n + 3 + (n + 1)!

nonsingular matrices. For any

T_{i} P^{(i)} \in T

,

e^{T} T_{i} P^{(i)} = e^{T}

.

Observe that both algorithms are nonstationary iterations. Note that the number of shrinking matrices for Algorithm 2 increased to

(n + 1)!

compared to the

n + 1

shrinking matrices of Algorithm 1.
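The action of T(α)P_j can be checked in the same way. In the following sketch (helper names `T_alpha` and `P_mat` and the sample data are assumptions of this example), the product S T(1) P_j is compared with the insertion rule (3), and the cardinality of the set in (14) is counted for n = 3.

```python
import numpy as np
from math import factorial

def T_alpha(alpha, n):
    """T(alpha) of (12): identity with the last column replaced."""
    T = np.eye(n + 1)
    T[:n, n] = (1 + alpha) / n
    T[n, n] = -alpha
    return T

def P_mat(j, n):
    """P_j = [e_1..e_{j-1}, e_{n+1}, e_j..e_n] of (13); j is 1-based."""
    order = list(range(j - 1)) + [n] + list(range(j - 1, n))
    return np.eye(n + 1)[:, order]

n, j = 3, 2
S = np.arange(1.0, 13.0).reshape(n, n + 1)   # arbitrary illustrative vertices
v = 2 * S[:, :n].mean(axis=1) - S[:, n]      # reflection point x(1)
S_next = S @ T_alpha(1.0, n) @ P_mat(j, n)   # columns: x_1, v, x_2, x_3

# 2(n+1) contraction matrices, n+1 reflection/expansion matrices, (n+1)! shrink matrices
size_T = 2 * (n + 1) + (n + 1) + factorial(n + 1)
print(S_next, size_T)
```

The permutation moves the new point from the last column to position j and shifts the displaced vertices one place to the right, exactly as in (3).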

4. Connections and Properties

Here, we investigate the connection between the transformation matrices of both algorithms, their consequences and the spectral properties of the matrices.

The corresponding transformation matrices of Algorithms 1 and 2 are connected by orthogonal similarities. For any

1 \leq s \leq n + 1

,

P^{T} T_{s} (α) P = T_{n + 1} (α) = T (α)

(17)

with

P = [e_{1}, \dots, e_{s - 1}, e_{s + 1}, \dots, e_{n + 1}, e_{s}]

and

Q^{T} T_{s}^{s h r} Q = \frac{1}{2} I + \frac{1}{2} e_{1} e^{T} = T_{s h r}

(18)

with

Q = [e_{s}, e_{1}, \dots, e_{s - 1}, e_{s + 1}, \dots, e_{n + 1}]

.
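Relations (17) and (18) can be verified numerically. The sketch below does so for n = 4 and α = 1/2; the helper names and the chosen parameters are assumptions of this example.

```python
import numpy as np

def T_mat(j, alpha, n):
    """T_j(alpha): identity with column j (1-based) replaced by c_j(alpha)."""
    T = np.eye(n + 1)
    T[:, j - 1] = (1 + alpha) / n
    T[j - 1, j - 1] = -alpha
    return T

def T_shr(l, n):
    """(1/2) I + (1/2) e_l e^T; l is 1-based."""
    T = 0.5 * np.eye(n + 1)
    T[l - 1, :] += 0.5
    return T

def P_sim(s, n):
    """P = [e_1..e_{s-1}, e_{s+1}..e_{n+1}, e_s]."""
    order = [i for i in range(n + 1) if i != s - 1] + [s - 1]
    return np.eye(n + 1)[:, order]

def Q_sim(s, n):
    """Q = [e_s, e_1..e_{s-1}, e_{s+1}..e_{n+1}]."""
    order = [s - 1] + [i for i in range(n + 1) if i != s - 1]
    return np.eye(n + 1)[:, order]

n, alpha = 4, 0.5
checks = []
for s in range(1, n + 2):
    P, Q = P_sim(s, n), Q_sim(s, n)
    checks.append(np.allclose(P.T @ T_mat(s, alpha, n) @ P, T_mat(n + 1, alpha, n)))
    checks.append(np.allclose(Q.T @ T_shr(s, n) @ Q, T_shr(1, n)))
print(all(checks))
```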

Let

{\tilde{S}}^{(k)} = [{\tilde{x}}_{1}^{(k)}, \dots, {\tilde{x}}_{n + 1}^{(k)}]

and

S^{(k)} = [x_{1}^{(k)}, \dots, x_{n + 1}^{(k)}]

denote the kth simplices of Algorithms 1 and 2, respectively. If a permutation matrix

Q_{k} = [e_{r_{1}}, \dots, e_{r_{n + 1}}] \in R^{(n + 1) \times (n + 1)}

exists such that

{\tilde{S}}^{(k)} Q_{k} = S^{(k)}

,

x_{1}^{(k)} = {\tilde{x}}_{ℓ_{k}}^{(k)}

and

x_{n + 1}^{(k)} = {\tilde{x}}_{h_{k}}^{(k)}

, then we can ask if

{\tilde{S}}^{(k + 1)}

and

S^{(k + 1)}

define the same simplex. The assumption implies that

e_{r_{1}} = e_{ℓ_{k}}

and

e_{r_{n + 1}} = e_{h_{k}}

. Hence,

{\tilde{S}}^{(k)} = S^{(k)} Q_{k}^{T}

and

{\tilde{S}}^{(k + 1)} = {\tilde{S}}^{(k)} T_{h_{k}} (α_{k}) = S^{(k)} (Q_{k}^{T} + (Q_{k}^{T} c_{h_{k}} (α_{k}) - Q_{k}^{T} e_{h_{k}}) e_{h_{k}}^{T}) .

Since

e_{h_{k}} = Q_{k} e_{n + 1}

,

Q_{k}^{T} e_{h_{k}} = e_{n + 1}

and

Q_{k}^{T} c_{h_{k}} (α_{k}) = {[e_{r_{1}}, \dots, e_{r_{n + 1}}]}^{T} c_{h_{k}} (α_{k}) = c_{n + 1} (α_{k})

, we have

\begin{matrix} {\tilde{S}}^{(k + 1)} & = S^{(k)} (I + (c_{n + 1} (α_{k}) - e_{n + 1}) e_{n + 1}^{T}) Q_{k}^{T} = S^{(k)} T (α_{k}) Q_{k}^{T} \\  & = S^{(k + 1)} P_{j^{(k)}}^{T} Q_{k}^{T} . \end{matrix}

In the case of shrinking,

{\tilde{S}}^{(k + 1)} = {\tilde{S}}^{(k)} T_{ℓ_{k}}^{s h r} = S^{(k)} (\frac{1}{2} Q_{k}^{T} + \frac{1}{2} Q_{k}^{T} e_{ℓ_{k}} e^{T}) .

Note that

Q_{k}^{T} e_{ℓ_{k}} = Q_{k}^{T} Q_{k} e_{1} = e_{1}

. Hence,

{\tilde{S}}^{(k + 1)} = S^{(k)} (\frac{1}{2} Q_{k}^{T} + \frac{1}{2} e_{1} e^{T}) = S^{(k)} (\frac{1}{2} I + \frac{1}{2} e_{1} e^{T} Q_{k}) Q_{k}^{T} .

Since

e^{T} Q_{k} = e^{T}

, we have that

{\tilde{S}}^{(k + 1)} = S^{(k)} T_{s h r} Q_{k}^{T} = S^{(k + 1)} {[P^{(k)}]}^{T} Q_{k}^{T} .

Hence, the simplices of iteration

k + 1

are the same up to the ordering of vertices. Since the continuation of the original Nelder–Mead method is not uniquely determined, unlike in the case of Algorithm 2, we can expect similar although not identical results for both algorithms.

The following identity is easily verified. For any

1 \leq j \leq n + 1

,

T_{j} (α) T_{j} (β) = T_{j} (- α β) .
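The product rule can be confirmed by a short numerical check over the parameter values used by the algorithm. The helper `T_mat` and the choice n = 3, j = 2 are assumptions of this example.

```python
import numpy as np

def T_mat(j, alpha, n):
    """T_j(alpha): identity with column j (1-based) replaced by c_j(alpha)."""
    T = np.eye(n + 1)
    T[:, j - 1] = (1 + alpha) / n
    T[j - 1, j - 1] = -alpha
    return T

n, j = 3, 2
params = [-0.5, 0.5, 1.0, 2.0]
ok = all(
    np.allclose(T_mat(j, a, n) @ T_mat(j, b, n), T_mat(j, -a * b, n))
    for a in params for b in params
)
print(ok)
```

With α = β = 1 the rule gives T_j(1) T_j(1) = T_j(-1) = I, so a reflection applied twice to the same vertex restores the simplex.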

Lemma 1.

The eigenvalues of

T_{j} (α)

are

λ_{i} (T_{j} (α)) = 1

(

i = 1, \dots, n

) and

λ_{n + 1} (T_{j} (α)) = - α

(

j = 1, \dots, n + 1

).

T_{j}^{s h r}

has the eigenvalues

λ_{1} (T_{j}^{s h r}) = 1

and

λ_{i} (T_{j}^{s h r}) = \frac{1}{2}

(

i = 2, \dots, n + 1

) for

j = 1, \dots, n + 1

. Both

T_{j} (α)

and

T_{j}^{s h r}

have a diagonal Jordan form.

Corollary 1.

Sequences

{\{T_{j} {(α)}^{k}\}}_{k = 1}^{\infty}

and

{\{{(T_{j}^{s h r})}^{k}\}}_{k = 1}^{\infty}

are uniformly bounded for

|α| \leq 1

and convergent for

- 1 \leq α < 1

(see, e.g., [23,24,25]).

Note that

{∥T_{j} (α)∥}_{2} = {∥T (α)∥}_{2} = {∥T (α) P_{j}∥}_{2} > 1

and

{∥T_{j}^{s h r}∥}_{2} = {∥T_{s h r}∥}_{2} = {∥T_{s h r} P∥}_{2} > 1

, where

{∥T (α)∥}_{2} = {(s_{n} + \sqrt{s_{n}^{2} - α^{2}})}^{\frac{1}{2}} (s_{n} = \frac{1 + α^{2}}{2} + \frac{{(1 + α)}^{2}}{2 n})

(19)

and

{∥T_{s h r}∥}_{2} = {(\frac{n + 5}{8} + \sqrt{\frac{{(n + 5)}^{2}}{64} - \frac{1}{4}})}^{\frac{1}{2}} .

(20)

The first expression follows from a result on the singular values of companion matrices (see, e.g., Barnett [26] or Kittaneh [27]), while the second follows from a result of Montano, Salas and Soto [28].
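Formulas (19) and (20) can be compared with numerically computed spectral norms. The following sketch does this for a few small n; the helper names are assumptions of this example, and the check is an illustration rather than a proof.

```python
import numpy as np

def T_alpha(alpha, n):
    """T(alpha) of (12)."""
    T = np.eye(n + 1)
    T[:n, n] = (1 + alpha) / n
    T[n, n] = -alpha
    return T

def T_shr(n):
    """T_shr = (1/2) I + (1/2) e_1 e^T."""
    T = 0.5 * np.eye(n + 1)
    T[0, :] += 0.5
    return T

def norm19(alpha, n):
    """Right-hand side of (19)."""
    s = (1 + alpha**2) / 2 + (1 + alpha)**2 / (2 * n)
    return np.sqrt(s + np.sqrt(s**2 - alpha**2))

def norm20(n):
    """Right-hand side of (20)."""
    a = (n + 5) / 8
    return np.sqrt(a + np.sqrt(a**2 - 0.25))

for n in range(2, 6):
    for alpha in (-0.5, 0.5, 1.0, 2.0):
        print(n, alpha, np.linalg.norm(T_alpha(alpha, n), 2), norm19(alpha, n))
    print(n, "shrink", np.linalg.norm(T_shr(n), 2), norm20(n))
```

Since every 2-norm exceeds one, no single step is a contraction in this norm, which is one reason why products of these matrices need a separate analysis.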

Matrices

T_{s h r}

,

T_{s h r} P

,

T_{j}^{s h r}

,

T_{j} (α)

,

T (α) P_{j}

(

- 1 < α < 0

) are left-stochastic. Hence,

{∥T_{s h r}∥}_{1} = {∥T_{s h r} P∥}_{1} = {∥T_{j}^{s h r}∥}_{1} = {∥T_{j} (α)∥}_{1} = {∥T (α) P_{j}∥}_{1} = 1 (- 1 < α < 0) .

We note that

T_{j} (1)

(

1 \leq j \leq n + 1

) is an involution. If

T (1)

is multiplied by a permutation matrix

P_{j}

(

1 \leq j \leq n

), this property is lost. However, for

n = 2

,

T (1) P_{2}

is a 6-involutory matrix, which is exploited in proving Theorem 3 (for k-involutory matrices, see Trench [29]).
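For n = 2 the matrix T(1)P_2 has integer entries, so its 6-involutory property can be checked exactly. The sketch below writes the matrix out by hand; the variable names are assumptions of this example.

```python
import numpy as np

# n = 2: T(1) has last column (1, 1, -1)^T and P_2 = [e_1, e_3, e_2].
T1 = np.array([[1, 0, 1],
               [0, 1, 1],
               [0, 0, -1]])
P2 = np.eye(3, dtype=int)[:, [0, 2, 1]]
M = T1 @ P2

I3 = np.eye(3, dtype=int)
powers_equal_identity = [
    bool(np.array_equal(np.linalg.matrix_power(M, k), I3)) for k in range(1, 7)
]
print(M)
print(powers_equal_identity)
```

In this check only the sixth power equals the identity, so six consecutive steps of this type would return the simplex to its starting position.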

The multiplication of

T (α)

or

T_{s h r}

by a permutation matrix

P_{j}

or P makes a significant change in the eigenstructure. The following results hold (see Theorem 3 and Corollary 1 of Galántai [15]).

Theorem 1.

(i): $T (α) P_{1}$ has at least one eigenvalue $λ = 1$ .
(ii): $T (1) P_{j}$ ( $1 \leq j \leq n + 1$ ) has at least two eigenvalues $λ = 1$ .
(iii): If $α > - 1$ and $j \geq 2$ , then $T (α) P_{j}$ has exactly $j - 1$ eigenvalues $λ = 1$ .
(iv): For $|α| < 1$ , $T (α) P_{1}$ has exactly one eigenvalue $λ_{1} = 1$ , while the remaining n eigenvalues are in the open unit disk.
(v): If $|α| < 1$ and $2 \leq j \leq n + 1$ , then $T (α) P_{j}$ has exactly $j - 1$ eigenvalues $λ = 1$ , while $n + 2 - j$ eigenvalues are in the open unit disk.
(vi): If $α = 1$ and $1 \leq j \leq n + 1$ , then all the eigenvalues of $T (1) P_{j}$ are on the unit circle.
(vii): If $α > 1$ and $j = 1$ , then $T (α) P_{1}$ has a second eigenvalue in the interval $(1, 1 + \frac{1 + α}{n})$ and all its eigenvalues are in the annulus $1 \leq |λ| \leq 1 + α$ .

Corollary 2.

The eigenvalues of

T_{s h r} P

(

P \in P_{n + 1}

) are

λ_{1} (T_{s h r} P) = 1

and

|λ_{i} (T_{s h r} P)| = \frac{1}{2}

for

i = 2, \dots, n + 1

.

The change in the eigenstructure of

T (α) P_{j}

is the most striking in contrast to that of

T_{j} (α)

. It gives better convergence properties for Algorithm 2, as shown in Section 6 and Section 7.
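Parts (iv) and (v) of Theorem 1 can be illustrated numerically for small n. The sketch below counts the eigenvalues of T(α)P_j equal to 1 and reports the largest modulus among the remaining ones; the helper names, the tolerance and the choice n = 2 are assumptions of this example.

```python
import numpy as np

def T_alpha(alpha, n):
    """T(alpha) of (12)."""
    T = np.eye(n + 1)
    T[:n, n] = (1 + alpha) / n
    T[n, n] = -alpha
    return T

def P_mat(j, n):
    """P_j of (13); j is 1-based."""
    order = list(range(j - 1)) + [n] + list(range(j - 1, n))
    return np.eye(n + 1)[:, order]

def unit_eigen_count(alpha, j, n, tol=1e-9):
    """Number of eigenvalues equal to 1 and the largest modulus of the others."""
    lam = np.linalg.eigvals(T_alpha(alpha, n) @ P_mat(j, n))
    ones = np.abs(lam - 1.0) < tol
    rest = float(np.max(np.abs(lam[~ones]), initial=0.0))
    return int(ones.sum()), rest

n = 2
for alpha in (-0.5, 0.5):
    for j in range(1, n + 2):
        print(alpha, j, unit_eigen_count(alpha, j, n))
```

For the contraction parameters the number of unit eigenvalues grows with the insertion index j, while all other eigenvalues stay strictly inside the unit disk.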

5. Convergence of the Function Values at the Vertices

Lagarias et al. [12] studied the convergence of function values at the simplex vertices for Algorithm 2. We cite their two most important results before proving more general results for both algorithms.

Lemma 2

(Lemma 3.3 of [12]). If function f is bounded from below on

R^{n}

and only a finite number of shrink iterations occur, then each sequence

{\{f_{i}^{(k)}\}}_{k = 0}^{\infty}

(generated by Algorithm 2) converges to some limit

f_{i}^{*}

for

i = 1, \dots, n + 1

and

f_{1}^{*} \leq f_{2}^{*} \leq \dots \leq f_{n + 1}^{*}

.

Lemma 2 guarantees the convergence under rather general conditions. In general, the limit values

{\{f_{i}^{*}\}}_{i = 1}^{n + 1}

can be different as shown by Examples 1 and 2 of [20] (also see [15,21]).

Theorem 2

(Theorem 5.1 of [12]). Assume that f is a strictly convex function on

R^{2}

with bounded level sets. Assume that

S^{(0)}

is non-degenerate. Then, the Nelder–Mead simplex method (Algorithm 2) converges in the sense that

f_{1}^{*} = f_{2}^{*} = f_{3}^{*} = f^{*}

.

Theorem 2 is generally considered the main result of [12]. In addition, it can be proved (see [20]) that if f is strictly convex on

R^{2}

with bounded level sets and

S^{(0)}

is non-degenerate (affinely independent), then any accumulation point

S^{'}

of the simplex sequence

\{S^{(k)}\}

(generated by Algorithm 2) has the form

S^{'} = [x^{'}, x^{'}, x^{'}]

.

We need the following concept and results.

Definition 1.

Let f be defined on a convex set

S \subset R^{n}

. The function f is said to be quasiconvex on S if

f (λ x_{1} + (1 - λ) x_{2}) \leq max \{f (x_{1}), f (x_{2})\}

for every

x_{1}, x_{2} \in S

and for every

λ \in [0, 1]

.

Convexity implies quasiconvexity.

Lemma 3

(Lemma 6 of [20]). If f is quasiconvex and bounded below on

R^{n}

, then each sequence

{\{f_{i}^{(k)}\}}_{k = 0}^{\infty}

is monotone decreasing and converges to some limit

f_{i}^{*}

for

i = 1, \dots, n + 1

. Furthermore,

f_{1}^{*} \leq f_{2}^{*} \leq \dots \leq f_{n + 1}^{*}

.

Corollary 3

(Corollary 2 of [20]). If f is convex on

R^{n}

and an index

1 \leq j \leq n

exists such that

f_{j}^{(k)} < f_{n + 1}^{(k)}

, then

f_{i c}^{(k)} < f_{n + 1}^{(k)}

and no shrinking occurs in iteration k.

Lemma 4

(Lemma 10 of [20]). Assume that f is convex and bounded below on

R^{n}

. If for some integer ℓ (

1 \leq ℓ \leq n

),

f_{ℓ}^{*} < f_{ℓ + 1}^{*},

(21)

then there is an index

K > 0

such that for all

k \geq K

,

j^{(k)} > ℓ

, that is, the first ℓ vertices do not change.

Lemma 5

(Lemma 11 of [20]). If f is convex and bounded below on

R^{n}

, then

f_{n}^{*} = f_{n + 1}^{*}

.

The following generalization of Theorem 2 of Lagarias et al. was proved in [20]. Here, we present it with a new short proof.

Theorem 3.

Assume that f is a convex function on

R^{2}

and is bounded below. Then, the Nelder–Mead simplex method (Algorithm 2) converges in the sense that

f_{1}^{*} = f_{2}^{*} = f_{3}^{*}

.

Proof.

It follows (Lemmas 3 and 5) that

f_{1}^{*} \leq f_{2}^{*} = f_{3}^{*}

. Assume that

f_{1}^{*} < f_{2}^{*}

. It follows from Lemma 4 that there exists an index

K > 0

such that for

k \geq K

,

x_{1}^{(k)} = x_{1}^{(K)}

. Hence, only

x_{2}^{(k)}

and

x_{3}^{(k)}

may change. The insertion rule and the impossibility of shrinking (Corollary 3) imply that only the following cases are possible:

(i): $f_{1}^{(k)} \leq f_{r}^{(k)} < f_{2}^{(k)}$ ;
(ii): $f_{2}^{(k)} \leq f_{r}^{(k)} < f_{3}^{(k)}$ and $f_{1}^{(k)} \leq f_{o c}^{(k)} \leq f_{r}^{(k)}$ ;
(iii): $f_{3}^{(k)} \leq f_{r}^{(k)}$ and $f_{1}^{(k)} \leq f_{i c}^{(k)} < f_{3}^{(k)}$ .

We assume that

K > 0

is big enough, so that for

k \geq K

,

f_{1}^{*} \leq f_{1}^{(k)} < f_{1}^{*} + ε

,

f_{2}^{*} \leq f_{i}^{(k)} < f_{2}^{*} + ε

(

i = 2, 3

), where

ε > 0

is such that

f_{1}^{*} + 4 ε < f_{2}^{*}

. Depending on the selected case,

f_{r}^{(k)} \geq f_{2}^{*}

,

f_{o c}^{(k)} \geq f_{2}^{*}

or

f_{i c}^{(k)} \geq f_{2}^{*}

must hold. In case (i),

x_{r}^{(k)}

replaces

x_{2}^{(k)}

and

S^{(k + 1)} = S^{(k)} T (1) P_{2}

. In case (ii), since

x_{o c}^{(k)} = \frac{1}{2} (x_{c}^{(k)} + x_{r}^{(k)})

, we have

f_{o c}^{(k)} \leq \frac{1}{4} f_{1}^{(k)} + \frac{1}{4} f_{2}^{(k)} + \frac{1}{2} f_{r}^{(k)} \leq \frac{1}{4} (f_{1}^{*} + ε) + \frac{3}{4} (f_{2}^{*} + ε) < f_{2}^{*} .

Hence, case (ii) cannot occur. Similarly, in case (iii),

f_{i c}^{(k)} \leq \frac{1}{4} f_{1}^{(k)} + \frac{1}{4} f_{2}^{(k)} + \frac{1}{2} f_{3}^{(k)} \leq \frac{1}{4} (f_{1}^{*} + ε) + \frac{3}{4} (f_{2}^{*} + ε) < f_{2}^{*}

showing that case (iii) cannot occur. Hence, we can only have the simplex sequence

S^{(k + K)} = S^{(K)} {[T (1) P_{2}]}^{k}, k \geq 0 .

Since

T (1) P_{2}

is a 6-involutory matrix (

{[T (1) P_{2}]}^{6} = I_{3}

),

\{S^{(k + K)}\}

is a periodic sequence and

S^{(6 + K)} = S^{(K)}

, which is impossible by the insertion rule. This is a contradiction; thus,

f_{1}^{*} = f_{2}^{*} = f_{3}^{*}

. □

For Algorithm 1, we prove the following results. First, observe the following facts. For nonshrink steps, we have

(i): $x_{i}^{(k + 1)} = x_{i}^{(k)}$ for $i \neq h_{k}$ , $x_{h_{k}}^{(k + 1)} = v$ ;
(ii): $f_{i}^{(k + 1)} = f_{i}^{(k)}$ ( $i \neq h_{k}$ ) and $f_{h_{k}}^{(k + 1)} < f_{h_{k}}^{(k)}$ ;
(iii): $f_{ℓ_{k}}^{(k)} = {min}_{i} f_{i}^{(k)} \leq f_{i}^{(k)} \leq {max}_{i} f_{i}^{(k)} = f_{h_{k}}^{(k)}$ ;
(iv): $f_{ℓ_{k + 1}}^{(k + 1)} \leq f_{ℓ_{k}}^{(k)}$ and $f_{h_{k + 1}}^{(k + 1)} \leq f_{h_{k}}^{(k)}$ .

Hence,

f_{i}^{(k + 1)} \leq f_{i}^{(k)}

(

i = 1, \dots, n + 1

).

Lemma 6

(Algorithm 1). If

f : R^{n} \to R

is bounded below and only a finite number of shrink operations occur, then there exist finite limit values

f_{i}^{*}

such that

f_{i}^{(k)} \to f_{i}^{*}

(

i = 1, 2, \dots, n + 1

).

The only difference from Lemma 2 is that the function values

f_{i}^{(k)}

and their limits are not ordered.

Lemma 7

(Algorithm 1). If

f : R^{n} \to R

is quasiconvex on

R^{n}

and bounded below, then there exist finite limit values

f_{i}^{*}

such that

f_{i}^{(k)} \to f_{i}^{*}

(

i = 1, 2, \dots, n + 1

).

Proof.

If f is quasiconvex, then in the case of shrinking, the inequality

f_{i}^{(k + 1)} = f (\frac{1}{2} (x_{i}^{(k)} + x_{ℓ_{k}}^{(k)})) \leq max \{f_{i}^{(k)}, f_{ℓ_{k}}^{(k)}\} = f_{i}^{(k)} (i = 1, \dots, n + 1)

also holds. Hence, shrink steps do not increase the function values either, and the conclusion of Lemma 6 follows. □

Lemma 8

(Algorithm 1). If

f : R^{n} \to R

is strictly convex, then no shrinking occurs.

Proof.

Shrinking happens if

f_{m_{k}}^{(k)} \leq f_{r}^{(k)} < f_{h_{k}}^{(k)} \land f_{o c}^{(k)} > f_{r}^{(k)}

or

f_{i c}^{(k)} \geq f_{h_{k}}^{(k)} \land f_{r}^{(k)} \geq f_{h_{k}}^{(k)}

holds. In general,

f_{c}^{(k)} < \frac{1}{n} \sum_{i = 1, i \neq h_{k}}^{n + 1} f_{i}^{(k)} \leq f_{m_{k}}^{(k)}

. If

f_{m_{k}}^{(k)} \leq f_{r}^{(k)} < f_{h_{k}}^{(k)}

, then

f_{o c}^{(k)} = f (\frac{1}{2} (x_{r}^{(k)} + x_{c}^{(k)})) < \frac{1}{2} (f_{r}^{(k)} + f_{m_{k}}^{(k)}) \leq f_{r}^{(k)}

. If

f_{r}^{(k)} \geq f_{h_{k}}^{(k)}

, then

f_{i c}^{(k)} = f (\frac{1}{2} (x_{c}^{(k)} + x_{h_{k}}^{(k)})) < \frac{1}{2} (f_{c}^{(k)} + f_{h_{k}}^{(k)}) \leq f_{h_{k}}^{(k)}

. □

Remark 1

(Algorithm 1). Assume that

f : R^{n} \to R

is convex. Then,

f_{c}^{(k)} \leq f_{m_{k}}^{(k)}

. If

f_{m_{k}}^{(k)} \leq f_{r}^{(k)} < f_{h_{k}}^{(k)}

, then

f_{o c}^{(k)} \leq f_{r}^{(k)}

. If

f_{r}^{(k)} \geq f_{h_{k}}^{(k)}

, then

f_{i c}^{(k)} \leq f_{h_{k}}^{(k)}

. However, if at least for one

j \neq h_{k}

,

f_{j}^{(k)} < f_{h_{k}}^{(k)}

holds, then

f_{c}^{(k)} \leq \frac{1}{n} \sum_{i = 1, i \neq h_{k}}^{n + 1} f_{i}^{(k)} < f_{h_{k}}^{(k)}

and

f_{i c}^{(k)} < f_{h_{k}}^{(k)}

. In such a case, there is no shrinking.

Assume that

f : R^{n} \to R

is bounded below and only a finite number of shrink operations occur. Then, we have finite numbers

f_{i}^{*}

such that

f_{i}^{(k)} \to f_{i}^{*}

as

k \to \infty

, and each

\{f_{i}^{(k)}\}

is monotone decreasing (

i = 1, \dots, n + 1

). The values

f_{i}^{*}

are unordered. However, there is a permutation

\{i_{1}, i_{2}, \dots, i_{n + 1}\}

of the indices

\{1, 2, \dots, n + 1\}

such that

f_{i_{1}}^{*} \leq f_{i_{2}}^{*} \leq \dots \leq f_{i_{n}}^{*} \leq f_{i_{n + 1}}^{*} .

(22)

Lemma 9

(Algorithm 1). If f is convex and bounded below on

R^{n}

, then

f_{i_{n}}^{*} = f_{i_{n + 1}}^{*}

.

Proof.

Assume that

f_{i_{n}}^{*} < f_{i_{n + 1}}^{*}

. Lemma 6 and Remark 1 imply that for

0 < ε < f_{i_{n + 1}}^{*} - f_{i_{n}}^{*}

, there exists an index

K > 0

such that for

k \geq K

,

f_{i_{j}}^{*} \leq f_{i_{j}}^{(k)} \leq f_{i_{j}}^{*} + ε (j = 1, \dots, n + 1) .

Note that

f_{i_{j}}^{*} \leq f_{i_{j}}^{(k)} \leq f_{i_{n}}^{*} + ε < f_{i_{n + 1}}^{*} \leq f_{i_{n + 1}}^{(k)}

(

j = 1, \dots, n

). Hence, there cannot be shrinking, and only the worst vertex

x_{i_{n + 1}}^{(k)}

can change for

k \geq K

(

f (v) \geq f_{i_{n + 1}}^{*}

). Clearly,

i_{n + 1} = h_{k}

for

k \geq K

. If

α = 1, 2

, then

f (x_{r}^{(k)}) < f_{m_{k}}^{(k)} \leq f_{i_{n}}^{*} + ε < f_{i_{n + 1}}^{*}

by definition. Hence, only the cases

α = \pm \frac{1}{2}

are possible. For simplicity, let

η = i_{n + 1}

. Then,

S^{(k + 1)} = S^{(k)} T_{η} (α) (α \in \{- \frac{1}{2}, \frac{1}{2}\})

and

S^{(K + j)} = S^{(K)} \prod_{i = 1}^{j} T_{η} (α_{i}) (α_{i} \in \{- \frac{1}{2}, \frac{1}{2}\}) .

Since

T_{j} (α) T_{j} (β) = T_{j} (- α β)

(

j = 1, \dots, n + 1

), it follows that

\prod_{i = 1}^{j} T_{η} (α_{i}) = T_{η} ({(- 1)}^{j + 1} \prod_{i = 1}^{j} α_{i}) \to T_{η} (0) = [\begin{matrix} I_{η - 1} & \frac{1}{n} e & 0 \\ 0 & 0 & 0 \\ 0 & \frac{1}{n} e & I_{n + 1 - η} \end{matrix}],

S^{(K + j)} \to S^{(K)} [\begin{matrix} I_{η - 1} & \frac{1}{n} e & 0 \\ 0 & 0 & 0 \\ 0 & \frac{1}{n} e & I_{n + 1 - η} \end{matrix}] = S^{*}

and

x_{i_{n + 1}}^{(k)} \to x_{i_{n + 1}}^{*} = \frac{1}{n} \sum_{j = 1, j \neq η}^{n + 1} x_{j}^{(K)} = \frac{1}{n} \sum_{j = 1}^{n} x_{i_{j}}^{(K)} .

Since f is convex, we have

f_{i_{n + 1}}^{*} = f (x_{i_{n + 1}}^{*}) = f (\frac{1}{n} \sum_{j = 1}^{n} x_{i_{j}}^{(K)}) \leq \frac{1}{n} \sum_{j = 1}^{n} f_{i_{j}}^{(K)} < f_{i_{n + 1}}^{*},

which is a contradiction. Hence,

f_{i_{n}}^{*} = f_{i_{n + 1}}^{*}

holds under convexity. □
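The limit used in this proof can be observed numerically: a long product of matrices T_η(±1/2) is practically indistinguishable from T_η(0). In the sketch below, the helper `T_mat`, the random sign pattern and the test simplex are assumptions of this example.

```python
import numpy as np

def T_mat(j, alpha, n):
    """T_j(alpha): identity with column j (1-based) replaced by c_j(alpha)."""
    T = np.eye(n + 1)
    T[:, j - 1] = (1 + alpha) / n
    T[j - 1, j - 1] = -alpha
    return T

n, eta = 3, 2
rng = np.random.default_rng(0)
alphas = rng.choice([-0.5, 0.5], size=60)    # arbitrary sequence of contractions
B = np.eye(n + 1)
for a in alphas:
    B = B @ T_mat(eta, a, n)

limit = T_mat(eta, 0.0, n)                   # T_eta(0)
S_K = rng.standard_normal((n, n + 1))        # illustrative simplex S^(K)
S_star = S_K @ limit
print(np.max(np.abs(B - limit)))
```

Column η of the limit simplex is the center of the remaining n vertices, which is the point at which convexity is applied in the last step of the proof.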

Theorem 4.

If f is convex and bounded below on

R^{2}

, then Algorithm 1 converges in the sense that

f_{1}^{*} = f_{2}^{*} = f_{3}^{*}

.

Proof.

Lemmas 6 and 9 imply that

f_{i_{1}}^{*} \leq f_{i_{2}}^{*} = f_{i_{3}}^{*}

for some permutation

i_{1}, i_{2}, i_{3}

of

\{1, 2, 3\}

. Assume that

f_{i_{1}}^{*} < f_{i_{2}}^{*}

. Then, there exists an index K such that for

k \geq K

,

f_{i_{1}}^{*} \leq f_{i_{1}}^{(k)} < f_{i_{1}}^{*} + ε < f_{i_{2}}^{*} \leq f_{m_{k}}^{(k)} \leq f_{h_{k}}^{(k)} < f_{i_{2}}^{*} + ε

(

f_{i_{1}}^{*} + 4 ε < f_{i_{2}}^{*}

) holds with

m_{k}, h_{k} \in \{i_{2}, i_{3}\}

. Since

f_{i_{1}}^{(k)} < f_{i_{2}}^{*}

, only vertices

x_{i_{2}}^{(k)}

and

x_{i_{3}}^{(k)}

can change for

k \geq K

. Also, note that

f_{i_{1}}^{(k)} < f_{i_{2}}^{*}

implies that no shrinking may occur. It also follows that only

f_{r}^{(k)}, f_{o c}^{(k)}, f_{i c}^{(k)} \geq f_{i_{2}}^{*}

are possible. Consider now the following ‘possible’ cases:

(i): $f_{i_{2}}^{*} \leq f_{r}^{(k)} < f_{m_{k}}^{(k)}$ ;
(ii): $f_{m_{k}}^{(k)} \leq f_{r}^{(k)} < f_{h_{k}}^{(k)}$ and $f_{o c}^{(k)} \leq f_{r}^{(k)}$ ;
(iii): $f_{h_{k}}^{(k)} \leq f_{r}^{(k)}$ and $f_{i c}^{(k)} < f_{h_{k}}^{(k)}$ .

In the first case,

x_{m_{k + 1}}^{(k + 1)} = x_{h_{k}}^{(k + 1)} = x_{r}^{(k)}

and

x_{h_{k + 1}}^{(k + 1)} = x_{m_{k}}^{(k)}

. Hence, the operation

S^{(k + 1)} = S^{(k)} T_{h_{k}} (1)

must be followed by

S^{(k + 2)} = S^{(k + 1)} T_{m_{k}} (α)

(

h_{k} \neq m_{k}

) for some

α \in \{- \frac{1}{2}, \frac{1}{2}, 1\}

. In the second case,

f_{o c}^{(k)} \leq \frac{1}{4} f_{i_{1}}^{(k)} + \frac{1}{4} f_{m_{k}}^{(k)} + \frac{1}{2} f_{r}^{(k)} \leq \frac{1}{4} (f_{i_{1}}^{*} + ε) + \frac{3}{4} (f_{i_{2}}^{*} + ε) < f_{i_{2}}^{*},

which makes this particular step impossible. Similarly, in the third case,

f_{i c}^{(k)} \leq \frac{1}{4} f_{i_{1}}^{(k)} + \frac{1}{4} f_{m_{k}}^{(k)} + \frac{1}{2} f_{h_{k}}^{(k)} \leq \frac{1}{4} (f_{i_{1}}^{*} + ε) + \frac{3}{4} (f_{i_{2}}^{*} + ε) < f_{i_{2}}^{*},

so this step is impossible as well. Hence, the only possible step is

S^{(k + 1)} = S^{(k)} T_{h_{k}} (1)

with

m_{k + 1} = h_{k}

and

h_{k + 1} = m_{k}

. In fact, we have

S^{(k + 2)} = S^{(k)} T_{h_{k}} (1) T_{m_{k}} (1)

for a pair of

m_{k}

and

h_{k}

, and this is repeated. Consider the following possible cases:

(m_{k}, h_{k}) \in J = \{(2, 3), (3, 2), (1, 3), (3, 1), (1, 2), (2, 1)\} .

It is easy to check that for every

(h_{k}, m_{k}) \in J

,

{[T_{h_{k}} (1) T_{m_{k}} (1)]}^{6} = I .

Here, we have a periodicity,

S^{(k + 12)} = S^{(k)} {[T_{h_{k}} (1) T_{m_{k}} (1)]}^{6} = S^{(k)}

, which is impossible by the condition

f_{r}^{(k)} < f_{m_{k}}^{(k)} < f_{h_{k}}^{(k)}

. Hence,

f_{1}^{*} = f_{2}^{*} = f_{3}^{*}

must hold. □
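The periodicity claim is a finite check for n = 2, where the reflection matrices have integer entries. The sketch below verifies it exactly for all six ordered pairs; the helper name `T_reflect` is an assumption of this example.

```python
import numpy as np
from itertools import permutations

def T_reflect(j):
    """T_j(1) for n = 2: identity with column j (1-based) replaced by c_j(1)."""
    T = np.eye(3, dtype=int)
    T[:, j - 1] = 1          # (1 + alpha) / n = 1 for alpha = 1, n = 2
    T[j - 1, j - 1] = -1     # -alpha
    return T

I3 = np.eye(3, dtype=int)
results = {
    (h, m): bool(np.array_equal(
        np.linalg.matrix_power(T_reflect(h) @ T_reflect(m), 6), I3))
    for h, m in permutations((1, 2, 3), 2)
}
print(results)
```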

Theorems 3 and 4 are generalizations of Theorem 2 of Lagarias et al. [12], since they require only convexity and boundedness from below. Moreover, their proofs are much simpler.

6. Convergence of the Simplex Sequences

Here, we study the convergence of the simplex sequence

\{S^{(k)}\}

generated by both Nelder–Mead methods. If

S^{(k)} \to S^{\infty} = [x_{1}^{*}, \dots, x_{n + 1}^{*}]

, then (provided that f is continuous)

f_{i}^{(k)} \to f (x_{i}^{*})

for

i = 1, \dots, n + 1

as

k \to \infty

. The limit vertices

x_{i}^{*}

can be different (see, e.g., Examples 1 and 2 of [20]).

Ideally,

S^{(k)}

should converge to some limit of the form

S^{\infty} = [x^{*}, \dots, x^{*}] = x^{*} e^{T}

, where

x^{*} \in R^{n}

is a stationary point of

f

. McKinnon’s example [19] and Example 3 of [15] show that

x^{*}

is not always a stationary point.

If

B_{k}

converges to a rank one matrix of the form

B^{\infty} = w e^{T}

(

w \in R^{n + 1}

), then

S^{(k)} = S^{(0)} B_{k} \to S^{(0)} w e^{T} = \hat{x} e^{T}

and diameter

(S^{(k)}) \to 0

. There are several examples (see, e.g., [14,15,20,21]), where the simplex sequence

\{S^{(k)}\}

converges to a limit

S^{\infty}

with diameter

(S^{\infty}) > 0

and rank

(S^{\infty})

, rank

(B^{\infty}) \geq 2

.

The following necessary and sufficient result holds for both algorithms.

Lemma 10.

Assume that

S^{(0)}

is non-degenerate (affinely independent). Then,

\{S^{(k)}\}

(generated by Algorithm 1 or 2) converges to some

S^{\infty} \in R^{n \times (n + 1)}

if and only if

\{B_{k}\}

converges to some

B^{\infty}

.

Proof.

In both cases

S^{(k)} = S^{(0)} \prod_{i = 0}^{k} X_{i} = S^{(0)} B_{k} (X_{i} \in T)

, where

T = \tilde{T}

(in case of Algorithm 1) or

T = T

(in case of Algorithm 2). If

B_{k} \to B^{\infty}

, then

S^{(k)} = S^{(0)} B_{k} \to S^{(0)} B^{\infty} : = S^{\infty}

. Since

e^{T} B_{k} = e^{T}

, we have

[\begin{matrix} e^{T} \\ S^{(k)} \end{matrix}] = [\begin{matrix} e^{T} \\ S^{(0)} \end{matrix}] B_{k} .

By assumption, the first matrix on the right is invertible and so

B_{k} = {[\begin{matrix} e^{T} \\ S^{(0)} \end{matrix}]}^{- 1} [\begin{matrix} e^{T} \\ S^{(k)} \end{matrix}] .

If

S^{(k)} \to S^{\infty}

, then

B_{k} = {[\begin{matrix} e^{T} \\ S^{(0)} \end{matrix}]}^{- 1} [\begin{matrix} e^{T} \\ S^{(k)} \end{matrix}] \to {[\begin{matrix} e^{T} \\ S^{(0)} \end{matrix}]}^{- 1} [\begin{matrix} e^{T} \\ S^{\infty} \end{matrix}] = B^{\infty} .

□
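The second half of the proof of Lemma 10 is constructive: B_k can be recovered from S^(k) by inverting the augmented matrix. The following sketch illustrates this with exact rational arithmetic for n = 2. The helper names are hypothetical, and the test matrix is the shrinking matrix (1/2) I + (1/2) e_1 e^T that appears in Lemma 14.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def inverse(m):
    """Gauss-Jordan inverse over exact fractions.

    Fails with StopIteration if no pivot is found (singular matrix).
    """
    n = len(m)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def augment(s):
    """Stack the row e^T = [1, ..., 1] on top of the simplex matrix S."""
    return [[1] * len(s[0])] + [list(row) for row in s]


def recover_b(s0, sk):
    """B_k = [e^T; S^(0)]^{-1} [e^T; S^(k)], valid for non-degenerate S^(0)."""
    return matmul(inverse(augment(s0)), augment(sk))
```

For a non-degenerate S^(0), the augmented matrix is square and invertible, so the map between S^(k) and B_k is a bijection that preserves convergence in both directions.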

From now on, we assume that

S^{(0)}

is always non-degenerate (its columns are affinely independent). Lemma 10 is formally independent of f. However, the decision mechanism of the NM algorithm determines the next simplex and the resulting matrix product. Certain configurations clearly cannot occur. Such is the case of strictly convex functions, when no shrinking can occur or

T (1) P_{2}

cannot be repeated in a sequence more than five times (for the case

n = 2

). In general, it is hard to tell what sequences (products) are possible for a given f.

Lemma 11.

If

B_{k} = \prod_{i = 0}^{k} X_{i} \to B^{\infty}

(

X_{i} \in T

,

T = \tilde{T}

or

T = T

), then rank

(B^{\infty}) \leq n

.

Proof.

Assume that rank

(B^{\infty}) = n + 1

. Then,

B^{\infty}

is invertible. Since every

X_{i} \in T

is invertible,

{lim}_{k \to \infty} B_{k - 1}^{- 1} = {(B^{\infty})}^{- 1}

. As

X_{k} = B_{k - 1}^{- 1} B_{k}

,

lim_{k \to \infty} X_{k} = lim_{k \to \infty} (B_{k - 1}^{- 1}) lim_{k \to \infty} (B_{k}) = {(B^{\infty})}^{- 1} B^{\infty} = I .

Hence,

X_{k} \to I

is necessary for an invertible

B^{\infty}

. Since

T = \tilde{T}

or

T = T

is a finite set and its elements are different from I, this cannot happen and rank

(B^{\infty}) \leq n

. □

Sylvester’s theorem on rank (see, e.g., Mirsky [30]) implies that

rank (S^{\infty}) \geq rank (S^{(0)}) + rank (B^{\infty}) - (n + 1) .

Since, by assumption, rank

(S^{(0)}) = n

, we have

n \geq

rank

(S^{\infty}) \geq

rank

(B^{\infty}) - 1

. In Example 4 of [15], rank

(S^{(0)}) = n

, rank

(B^{\infty}) = n

and rank

(S^{\infty}) = n - 1

.

The following simple result characterizes the possible limits of infinite matrix products.

Lemma 12.

Assume that

B_{k} = \prod_{i = 0}^{k} X_{i} \to B^{\infty}

(

X_{i} \in T

) and

X_{s}

occurs infinitely often in the product

\prod_{i = 0}^{\infty} X_{i}

, then every nonzero row of

B^{\infty}

is a left eigenvector of

X_{s}

belonging to the eigenvalue

λ = 1

.

Proof.

Since

X_{s}

appears infinitely many times in the product

\prod_{i = 0}^{\infty} X_{i}

, there is a subsequence of

\{B_{i_{j}}\}

with rightmost factor

X_{s}

, say

B_{i_{1}} X_{s}, B_{i_{2}} X_{s}, \dots,

where the

B_{i_{j}}

s are products of

X_{i}

s. Since

B_{i_{j}} \to B^{\infty}

, we have

B_{i_{j}} X_{s} \to B^{\infty} X_{s} = B^{\infty}

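, where the last equality holds because the products

B_{i_{j}} X_{s}

themselves form a subsequence of

\{B_{k}\}

. □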

Define the (left) 1-eigenspace of matrix A by

E_{1} (A) = \{x : x A = x\}

. The rows of

B^{\infty}

belong to

E_{1} (X_{s})

. If several

X_{s}

, say

X_{s_{i}}

(

i = 1, \dots, m

), occur infinitely often in the product

\prod_{i = 0}^{\infty} X_{i}

, then the rows of

B^{\infty}

belong to

\cap_{i = 1}^{m} E_{1} (X_{s_{i}})

.

Since

e^{T} \in E_{1} (T_{i} P^{(i)})

for all

T_{i} P^{(i)} \in T

and

e^{T} \in E_{1} (T^{(i)})

for all

T^{(i)} \in \tilde{T}

, we have

e^{T} \in \cap_{i = 1}^{m} E_{1} (T_{s_{i}} P^{(s_{i})})

and

e^{T} \in \cap_{i = 1}^{m} E_{1} (T^{(s_{i})})

, respectively. If

\cap_{i = 1}^{m} E_{1} (T_{s_{i}} P^{(s_{i})}) = \{λ e^{T} : λ \in R\}

or

\cap_{i = 1}^{m} E_{1} (T^{(s_{i})}) = \{λ e^{T} : λ \in R\}

, then

B^{\infty}

has the form

w e^{T}

for some

w \in R^{n + 1}

.

Note that for any

T \in \tilde{T}

or

T \in T

,

e^{T} T = e^{T}

. Hence, if

B_{k} \to B^{\infty}

, then

e^{T} B_{k} = e^{T} \to e^{T} B^{\infty}

, that is,

e^{T} = e^{T} B^{\infty}

. If

B^{\infty} = w e^{T}

, then

e^{T} B^{\infty} = e^{T} w e^{T} = e^{T}

implies that

e^{T} w = 1

.

We recall the following definitions and results (see, e.g., Hartfiel [31,32]).

A right infinite matrix product is an expression

A_{1} A_{2} \dots A_{k} A_{k + 1} \dots

. A set

M

of

n \times n

matrices has the right convergence property (RCP), if all possible right infinite products

\prod_{i = 1}^{\infty} A_{i}

(

A_{i} \in M

) converge.

A set

M

of

n \times n

matrices is product-bounded if there is a constant

β > 0

such that

∥A_{1} \dots A_{k}∥ \leq β

for all k and all

A_{1}, \dots, A_{k} \in M

.

If

M

is an RCP set, then

M

is also product-bounded.

Lemma 13.

If

M

is an RCP set,

A_{1}, \dots, A_{k} \in M

and λ is an eigenvalue of

A_{1} A_{2} \dots A_{k}

, then

|λ| < 1

or

λ = 1

, and this eigenvalue is simple. Hence, each matrix of

M

must satisfy this condition.

The matrices

T (2) P_{1}

and

T_{j} (2)

have at least one eigenvalue greater than 1, and

∥T (2) P_{1}∥ > 1

and

∥T_{j} (2)∥ > 1

in any induced matrix norm. Hence,

\{{[T (2) P_{1}]}^{k}\}

and

\{T_{j} {(2)}^{k}\}

are unbounded. There are also examples when the Nelder–Mead algorithm produces unbounded simplex sequences

S^{(k)} = S^{(0)} {[T (2) P_{1}]}^{k}

or

S^{(k)} = S^{(0)} {[T (1) P_{1}]}^{k}

(see, e.g., [14,15,20]). It is clear that the whole matrix set

\tilde{T}

or

T

is not an RCP set.

Hence, we must look for subsets

M

of

\tilde{T}

or

T

, which are RCP sets. However, Blondel and Tsitsiklis [33] proved that the product boundedness of a finite matrix set

M

is algorithmically undecidable, and that it remains undecidable even in the special case where

M

consists of only two matrices. Product boundedness is a weaker property than the RCP (it follows from the RCP), and it is already algorithmically undecidable. Hence, it seems difficult to decide the RCP property in general. This might answer question (e) of Wright.

However, it is relatively easy to identify one particular RCP set. For both algorithms, the corresponding shrinking matrices form an RCP set.

Lemma 14.

Both

M_{1} = \{T_{j}^{s h r} : 1 \leq j \leq n + 1\}

and

M_{2} = \{T_{s h r} P : P \in P_{n + 1}\}

are RCP sets.

Proof.

By using induction on k, we can prove that for

k \geq 1

,

\prod_{i = 1}^{k} T_{j_{i}}^{s h r} = \prod_{i = 1}^{k} (\frac{1}{2} I + \frac{1}{2} e_{j_{i}} e^{T}) = \frac{1}{2^{k}} I + \sum_{i = 1}^{k} \frac{1}{2^{i}} e_{j_{i}} e^{T}

and

\prod_{i = 1}^{k} T_{s h r} P^{(j_{i})} = \prod_{i = 1}^{k} (\frac{1}{2} I + \frac{1}{2} e_{1} e^{T}) P^{(j_{i})} = \frac{1}{2^{k}} \prod_{i = 1}^{k} P^{(j_{i})} + \sum_{i = 1}^{k} \frac{1}{2^{i}} e_{ℓ_{i}} e^{T} (ℓ_{1} = 1) .

For the last formula, note that if P is a permutation matrix, then

e^{T} P = e^{T}

and

P e_{i} = e_{j}

for some j. In element-wise partial ordering,

T_{j}^{s h r} \geq 0

holds for

j = 1, \dots, n + 1

and

0 \leq \prod_{i = 1}^{k} T_{j_{i}}^{s h r} = \frac{1}{2^{k}} I + \sum_{i = 1}^{k} \frac{1}{2^{i}} e_{j_{i}} e^{T} \leq e e^{T} .

Here,

\frac{1}{2^{k}} I \to 0

as

k \to \infty

, and the sequence

{\{\sum_{i = 1}^{k} \frac{1}{2^{i}} e_{j_{i}} e^{T}\}}_{k = 1}^{\infty}

is monotone increasing in component-wise partial ordering. Hence,

\prod_{i = 1}^{\infty} T_{j_{i}}^{s h r}

is convergent. We also have that

0 \leq \prod_{i = 1}^{k} T_{s h r} P^{(j_{i})} \leq e e^{T}

,

\frac{1}{2^{k}} \prod_{i = 1}^{k} P^{(j_{i})} \to 0

as

k \to \infty

, and

{\{\sum_{i = 1}^{k} \frac{1}{2^{i}} e_{ℓ_{i}} e^{T}\}}_{k = 1}^{\infty}

is monotone increasing in partial ordering. Hence,

\prod_{i = 1}^{\infty} T_{s h r} P^{(j_{i})}

is convergent. □
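The closed form used for the products of the matrices T_j^{shr} can be verified with exact arithmetic. The sketch below builds T_j^{shr} = (1/2) I + (1/2) e_j e^T as in the proof and compares the directly computed product with the formula. Indices are counted from 0, and the function names are illustrative.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def shrink_matrix(j, size):
    """T_j^shr = (1/2) I + (1/2) e_j e^T, with j counted from 0."""
    half = Fraction(1, 2)
    return [[half * int(r == c) + half * int(r == j) for c in range(size)]
            for r in range(size)]


def product(indices, size):
    """Right product T_{j_1}^shr T_{j_2}^shr ... T_{j_k}^shr."""
    result = [[Fraction(int(r == c)) for c in range(size)] for r in range(size)]
    for j in indices:
        result = matmul(result, shrink_matrix(j, size))
    return result


def closed_form(indices, size):
    """2^{-k} I + sum_{i=1}^{k} 2^{-i} e_{j_i} e^T for the sequence j_1, ..., j_k."""
    k = len(indices)
    result = [[Fraction(int(r == c), 2 ** k) for c in range(size)]
              for r in range(size)]
    for i, j in enumerate(indices, start=1):
        for c in range(size):
            result[j][c] += Fraction(1, 2 ** i)
    return result
```

Every row of the closed form is a multiple of e^T up to the vanishing term 2^{-k} I, which is consistent with a limit of the form w e^T.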

Remark 2.

Corollary 2 implies that

E_{1} (T_{s h r} P) = \{λ e^{T} : λ \in R\}

for all

P \in P_{n + 1}

. If matrices

T_{s h r} P^{(s_{i})}

(

i = 1, \dots, m

) occur infinitely often in the product

\prod_{i = 1}^{\infty} T_{s h r} P^{(j_{i})}

and

\prod_{i = 1}^{k} T_{s h r} P^{(j_{i})} \to B^{\infty}

(

k \to \infty

), then the nonzero rows of

B^{\infty}

belong to

\cap_{i = 1}^{m} E_{1} (T_{s h r} P^{(s_{i})}) = \{λ e^{T} : λ \in R\}

. Hence,

\prod_{i = 1}^{k} T_{s h r} P^{(j_{i})} \to w e^{T}

for some w. Lemma 1 implies that

E_{1} (T_{j}^{s h r}) = \{λ e^{T} : λ \in R\}

for all

j = 1, \dots, n + 1

. It also follows from the previous argument that

\prod_{i = 1}^{k} T_{j_{i}}^{s h r} \to w e^{T}

for some w.

The convergence

S^{(k)} \to \hat{x} e^{T}

for the infinitely repeated shrinking process also follows from Cantor’s intersection theorem.

Lemma 15.

Define the matrix

F = [\begin{matrix} 1 & - e^{T} \\ 0 & I_{n} \end{matrix}] (F^{- 1} = [\begin{matrix} 1 & e^{T} \\ 0 & I_{n} \end{matrix}]) .

(23)

Then, for all

T \in T

(

T = T

or

T = \tilde{T}

), the matrix

F^{- 1} T F

has the common block lower triangular form

F^{- 1} T F = [\begin{matrix} 1 & 0 \\ b (T) & C (T) \end{matrix}],

(24)

where

b (T) \in R^{n}

and

C (T) \in R^{n \times n}

.

The result was proved for

T \in T

in [14,15]. The proof of case

\tilde{T}

is similar. Since both

\tilde{T}

and

T

are finite sets, there exist two constants

γ_{b}, γ_{C} > 0

such that

∥b (T)∥ \leq γ_{b}

and

∥C (T)∥ \leq γ_{C}

for all

T \in \tilde{T}

or

T \in T

. For

T_{j} (α) \in \tilde{T}

, the corresponding

C (T_{j} (α))

has

n - 1

eigenvalues of

λ = 1

and

∥C (T_{j} (α))∥ \geq 1

. For

T (α) P_{j} \in T

(

j > 2

),

∥C (T (α) P_{j})∥ \geq 1

. However, for

T (α) P_{j} \in T

with

|α| < 1

and

j = 1, 2

, the corresponding

C (T (α) P_{j})

s have all their eigenvalues in the open unit disk (see Theorem 1). Hence, they can be elements of an RCP set.
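As a small illustration of Lemma 15, the following sketch computes F^{-1} T F for a shrinking matrix T_j^{shr} = (1/2) I + (1/2) e_j e^T and extracts the blocks b(T) and C(T). Only the shrinking matrices are used because their explicit form is given in the proof of Lemma 14. The helper names are hypothetical.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def f_matrices(n):
    """Return (F, F^{-1}) of (23): identity with first row [1, -e^T] or [1, e^T]."""
    size = n + 1
    f = [[Fraction(int(r == c)) for c in range(size)] for r in range(size)]
    f_inv = [row[:] for row in f]
    for c in range(1, size):
        f[0][c] = Fraction(-1)
        f_inv[0][c] = Fraction(1)
    return f, f_inv


def shrink_matrix(j, size):
    """T_j^shr = (1/2) I + (1/2) e_j e^T, with j counted from 0."""
    half = Fraction(1, 2)
    return [[half * int(r == c) + half * int(r == j) for c in range(size)]
            for r in range(size)]


def reduced_form(t):
    """Return (first_row, b, C) of the matrix F^{-1} T F."""
    n = len(t) - 1
    f, f_inv = f_matrices(n)
    m = matmul(matmul(f_inv, t), f)
    b = [row[0] for row in m[1:]]
    c = [row[1:] for row in m[1:]]
    return m[0], b, c
```

For the shrinking matrices, C(T) = (1/2) I, so its norm is 1/2 in any induced norm, in agreement with the observation that they can belong to an RCP set.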

If

A_{i} = [\begin{matrix} 1 & 0 \\ b_{i} & C_{i} \end{matrix}] \in R^{(n + 1) \times (n + 1)} (C_{i} \in R^{n \times n}, i \geq 1),

(25)

then

L_{k} = \prod_{i = 1}^{k} A_{i} = [\begin{matrix} 1 & 0 \\ \sum_{i = 1}^{k} (\prod_{j = 1}^{i - 1} C_{j}) b_{i} & \prod_{i = 1}^{k} C_{i} \end{matrix}] = [\begin{matrix} 1 & 0 \\ x_{k} & \prod_{i = 1}^{k} C_{i} \end{matrix}] .

(26)
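Formula (26) can be checked on a small example. The sketch below multiplies block lower triangular matrices of the form (25) directly and compares the result with the blocks x_k and C_1 ... C_k accumulated by a running prefix product. The function names and the test data are illustrative only.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def block(b, c):
    """Assemble A = [[1, 0], [b, C]] from a vector b and a square matrix C."""
    n = len(b)
    return [[1] + [0] * n] + [[b[r]] + list(c[r]) for r in range(n)]


def direct_product(bs, cs):
    """L_k = A_1 A_2 ... A_k computed by plain matrix multiplication."""
    result = block(bs[0], cs[0])
    for b, c in zip(bs[1:], cs[1:]):
        result = matmul(result, block(b, c))
    return result


def blocks_of_product(bs, cs):
    """Return (x_k, C_1 ... C_k) with x_k = sum_i (C_1 ... C_{i-1}) b_i."""
    n = len(bs[0])
    prefix = [[int(r == c) for c in range(n)] for r in range(n)]
    x = [0] * n
    for b, c in zip(bs, cs):
        # add (C_1 ... C_{i-1}) b_i, then extend the prefix product by C_i
        x = [x[r] + sum(prefix[r][k] * b[k] for k in range(n)) for r in range(n)]
        prefix = matmul(prefix, c)
    return x, prefix
```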

If the infinite matrix product

\prod_{i = 1}^{\infty} T_{i}

(

T_{i} \in T

(

i \geq 1

) or

T_{i} \in \tilde{T}

(

i \geq 1

)) converges to a rank-one matrix of the form

w e^{T}

, that is, if

\prod_{i = 1}^{k} T_{i} \to w e^{T}

(

w^{T} = [w_{1}, {\hat{w}}^{T}]

,

\hat{w} \in R^{n}

), then it follows that

\prod_{i = 1}^{k} [\begin{matrix} 1 & 0 \\ b (T_{i}) & C (T_{i}) \end{matrix}] \to F^{- 1} w e^{T} F = [\begin{matrix} e^{T} w & 0 \\ \hat{w} & 0_{n \times n} \end{matrix}] .

(27)
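The right-hand side of (27) follows from e^T F = [1, 0, ..., 0] and from the fact that the first entry of F^{-1} w is e^T w. A minimal sketch that confirms this for a sample vector w is given below. The names are hypothetical and the arithmetic is exact.

```python
from fractions import Fraction


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def f_matrices(n):
    """Return (F, F^{-1}) of (23): identity with first row [1, -e^T] or [1, e^T]."""
    size = n + 1
    f = [[Fraction(int(r == c)) for c in range(size)] for r in range(size)]
    f_inv = [row[:] for row in f]
    for c in range(1, size):
        f[0][c] = Fraction(-1)
        f_inv[0][c] = Fraction(1)
    return f, f_inv


def reduced_rank_one(w):
    """Compute F^{-1} (w e^T) F for a vector w of length n + 1."""
    size = len(w)
    f, f_inv = f_matrices(size - 1)
    w_et = [[w[r]] * size for r in range(size)]  # the rank-one matrix w e^T
    return matmul(matmul(f_inv, w_et), f)
```

In the example of the test data, e^T w = 1, so the (1, 1) entry equals 1 and the lower right block vanishes, which is exactly the limit shape required by Lemma 16.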

Lemma 16.

For the convergence of the simplex sequence to a common point, it is necessary and sufficient that both

\prod_{i = 1}^{k} C (T_{i}) \to 0 (k \to \infty)

(28)

and

\sum_{i = 1}^{k} (\prod_{j = 1}^{i - 1} C (T_{j})) b (T_{i}) \to \tilde{x} (k \to \infty)

(29)

hold for some vector

\tilde{x}

.

It is clear that the infinite products of

M_{1}

or

M_{2}

satisfy these conditions. We recall the following simple result.

Lemma 17

([14,15,16]). Assume that

∥\prod_{j = 1}^{k} C_{j}∥ \leq c_{k}

,

\sum_{k = 1}^{\infty} c_{k}

is convergent (

< \infty

) and that

∥b_{k}∥ \leq γ

for all k. Then,

L_{k} = \prod_{j = 1}^{k} A_{j}

converges and

lim_{k \to \infty} L_{k} = [\begin{matrix} 1 & 0 \\ \tilde{x} & 0 \end{matrix}]

(30)

for some

\tilde{x}

.

If

∥C_{i}∥ \leq q < 1

for all i, then the speed of convergence is linear (geometric). This follows from

∥\prod_{i = 1}^{k} C_{i}∥ \leq q^{k}

and the inequality

∥\tilde{x} - x_{k}∥ \leq γ \sum_{i = k + 1}^{m} ∥\prod_{j = 1}^{i - 1} C_{j}∥ \leq γ \frac{q^{k}}{1 - q} (m > k)

with

m \to \infty

.
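The geometric bound can be illustrated in the scalar case, where it is attained. The sketch assumes the hypothetical data C_i = q and b_i = γ for all i, so that x_k is a partial sum of a geometric series with limit γ / (1 - q).

```python
from fractions import Fraction


def partial_sum(q, gamma, k):
    """x_k = sum_{i=1}^{k} q^{i-1} * gamma for constant scalar C_i = q, b_i = gamma."""
    return sum(gamma * q ** (i - 1) for i in range(1, k + 1))


def error_bound(q, gamma, k):
    """Right-hand side gamma * q^k / (1 - q) of the estimate."""
    return gamma * q ** k / (1 - q)


def limit(q, gamma):
    """Limit of x_k as k tends to infinity, valid for 0 <= q < 1."""
    return gamma / (1 - q)
```

With q = 1/2 and γ = 3, the limit is 6 and the error after k terms equals 6 / 2^k, which coincides with the bound.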

We recall the following result (see Hartfiel [31], Corollary 6.4, Daubechies and Lagarias [34]).

Lemma 18.

Let

M

be a compact matrix set. Then, every infinite product, taken from

M

, converges to 0 if and only if there is a norm

∥\cdot∥

such that

∥A∥ \leq q

,

q < 1

, for all

A \in M

.

Hence, there must be a norm

∥\cdot∥

such that

∥C_{i}∥ \leq q < 1

, for all

T_{s h r} P_{j}

(or

T_{j}^{s h r}

). Note that Lemma 18 is not constructive. However, for the convergence of

\prod_{i = 1}^{\infty} T_{i} \to w e^{T}

(for all

T_{i} \in M

), the condition

∥C (T_{i})∥ \leq q < 1

(

T_{i} \in M

) is necessary and sufficient.

For Algorithm 1,

M_{1}

is the only subset that satisfies this requirement. For Algorithm 2, define the set

{\hat{M}}_{2} = \{T (α) P_{j} : α = \pm \frac{1}{2}, j = 1, 2\} \cup M_{2}

and assume that

(A): there exists a matrix norm ${∥\cdot∥}_{ϑ}$ (induced by a vector norm ${∥\cdot∥}_{ϑ}$ ) such that if $T_{i} P^{(i)} \in {\hat{M}}_{2}$ , then ${∥C (T_{i} P^{(i)})∥}_{ϑ} < 1$ .

If (A) holds, then

{\hat{M}}_{2}

is an RCP set and every infinite product

\prod_{i = 1}^{\infty} T_{i} P^{(i)}

(

T_{i} P^{(i)} \in {\hat{M}}_{2}

) has the form

w e^{T}

for some w. It follows from Theorem 1 that for

T_{i} P^{(i)} \in T ∖ {\hat{M}}_{2}

,

{∥C (T_{i} P^{(i)})∥}_{ϑ} \geq 1

.

The existence of such a norm

{∥\cdot∥}_{ϑ}

of the form

{∥A∥}_{ϑ} = {∥S^{- 1} A S∥}_{2}

was proved in [14] for

n = 1, 2, 3

and in [15] for

n = 1, 2, \dots, 8

. For the cases

n = 9, 10

, the corresponding matrices S, found experimentally by direct search methods, are given in Appendix A. Since the eigenvalues of

T (α) P_{j}

(

|α| < 1

,

j = 1, 2

) that lie inside the unit disk converge to 1 as

n \to \infty

(see Lemma 3 of [15]), it is hard to find a norm with property (A) for greater n values.

The obtained RCP sets

M_{1}

and

{\hat{M}}_{2}

are quite narrow. However, we can keep the convergence of any infinite product from

M_{1}

or

{\hat{M}}_{2}

by inserting an infinite number of matrices from

T

under proper conditions (for Algorithm 2, see [14,15]). For Algorithm 2, it was also observed [16] that several fixed-length matrix products have spectral norms less than 1. Based on these observations, the convergence set of both algorithms can be significantly expanded.

Assume again that

T = \tilde{T}

or

T = T

. Moreover, introduce the notation

T^{ℓ} = \{T_{i_{1}} \dots T_{i_{ℓ}} : T_{i_{j}} \in T, j = 1, 2, \dots, ℓ\}

(31)

and consider the product

\prod_{j = 1}^{m ℓ} T_{i_{j}} = \prod_{j = 1}^{m} (\prod_{t = 1}^{ℓ} T_{i_{(j - 1) ℓ + t}})

, where

{\hat{T}}_{j} = \prod_{t = 1}^{ℓ} T_{i_{(j - 1) ℓ + t}} \in T^{ℓ} (j = 1, \dots, m) .

Let

F^{- 1} T_{i_{j}} F = [\begin{matrix} 1 & 0 \\ b (T_{i_{j}}) & C (T_{i_{j}}) \end{matrix}] = [\begin{matrix} 1 & 0 \\ b_{i_{j}} & C_{i_{j}} \end{matrix}] .

Then,

F^{- 1} (\prod_{j = 1}^{m ℓ} T_{i_{j}}) F = [\begin{matrix} 1 & 0 \\ \sum_{j = 1}^{m ℓ} (\prod_{t = 1}^{j - 1} C_{i_{t}}) b_{i_{j}} & \prod_{j = 1}^{m ℓ} C_{i_{j}} \end{matrix}]

(32)

and

\begin{matrix} F^{- 1} {\hat{T}}_{j} F & = [\begin{matrix} 1 & 0 \\ \sum_{r = 1}^{ℓ} (\prod_{t = 1}^{r - 1} C_{i_{(j - 1) ℓ + t}}) b_{i_{(j - 1) ℓ + r}} & \prod_{τ = 1}^{ℓ} C_{i_{(j - 1) ℓ + τ}} \end{matrix}] \\ = [\begin{matrix} 1 & 0 \\ {\hat{b}}_{j} & {\hat{C}}_{j} \end{matrix}] = [\begin{matrix} 1 & 0 \\ b ({\hat{T}}_{j}) & C ({\hat{T}}_{j}) \end{matrix}], \end{matrix}

where

∥{\hat{b}}_{j}∥ \leq (\sum_{r = 1}^{ℓ} γ_{C}^{r - 1}) γ_{b} = : {\hat{γ}}_{\hat{b}}

and

∥C ({\hat{T}}_{j})∥ \leq γ_{C}^{ℓ} = : {\hat{γ}}_{\hat{C}}

are uniformly bounded.

Assume that

k = ℓ m

and consider the expression

F^{- 1} (\prod_{j = 1}^{m} {\hat{T}}_{j}) F = [\begin{matrix} 1 & 0 \\ \sum_{j = 1}^{m} (\prod_{t = 1}^{j - 1} {\hat{C}}_{t}) {\hat{b}}_{j} & \prod_{j = 1}^{m} {\hat{C}}_{j} \end{matrix}] .

(33)

This expression is clearly equal to (32).

For

0 < q < 1

, define the sets

T_{q}^{ℓ} = \{T_{i_{1}} \dots T_{i_{ℓ}} \in T^{ℓ} : ∥C (T_{i_{1}} \dots T_{i_{ℓ}})∥ \leq q\}, T_{Q}^{ℓ} = T^{ℓ} ∖ T_{q}^{ℓ} .

(34)

For any

{\hat{T}}_{j} \in T_{q}^{ℓ}

, we have the estimates

∥{\hat{b}}_{j}∥ \leq {\hat{γ}}_{\hat{b}}

and

∥{\hat{C}}_{j}∥ \leq q < 1

. Hence, by Lemma 17, the set

T_{q}^{ℓ}

is an RCP set, and the limits of the infinite products

\prod_{j = 1}^{\infty} {\hat{T}}_{j}

are of the form

w e^{T}

.

Now, consider the product

\prod_{j = 1}^{k} T_{i_{j}} = \prod_{j = 1}^{m ℓ + r} T_{i_{j}} = (\prod_{j = 1}^{m ℓ} T_{i_{j}}) (\prod_{τ = m ℓ + 1}^{m ℓ + r} T_{i_{τ}}) = (\prod_{j = 1}^{m} {\hat{T}}_{j}) (\prod_{τ = m ℓ + 1}^{m ℓ + r} T_{i_{τ}})

where

{\hat{T}}_{j} \in T_{q}^{ℓ}

(

j = 1, \dots, m

),

T_{i_{τ}} \in T

(

τ = m ℓ + 1, \dots, m ℓ + r

) and

0 \leq r < ℓ

are fixed (

ℓ \geq 1

). The corresponding reduced form is

F^{- 1} (\prod_{j = 1}^{k} T_{i_{j}}) F = [\begin{matrix} 1 & 0 \\ x_{k} & (\prod_{j = 1}^{m} {\hat{C}}_{j}) (\prod_{τ = m ℓ + 1}^{m ℓ + r} C_{i_{τ}}) \end{matrix}]

Consider the product

\prod_{j = 1}^{m} {\hat{T}}_{j}

(

{\hat{T}}_{j} \in T^{ℓ}

), where the number of factors

{\hat{T}}_{j}

that belong to

T_{q}^{ℓ}

is

r_{1} (m)

. Clearly,

0 \leq r_{1} (m) \leq m

. There exists a

κ \in N

such that

\frac{1}{q^{κ - 1}} \leq {\hat{γ}}_{C} \leq \frac{1}{q^{κ}}

. Assume that there is a number

μ \in (0, 1]

such that

(1 + κ) r_{1} (m) - κ m \geq μ m

. Then,

\begin{matrix} ∥(\prod_{j = 1}^{m} {\hat{C}}_{j}) (\prod_{τ = m ℓ + 1}^{m ℓ + r} C_{i_{τ}})∥ & \leq q^{r_{1} (m)} {\hat{γ}}_{C}^{m - r_{1} (m)} γ_{C}^{r} \\ & \leq γ_{C}^{ℓ - 1} q^{r_{1} (m) - κ (m - r_{1} (m))} = γ_{C}^{ℓ - 1} q^{(1 + κ) r_{1} (m) - κ m} \\ & \leq γ_{C}^{ℓ - 1} q^{μ m} \leq γ_{C}^{ℓ - 1} q^{μ (\frac{k}{ℓ} - 1)} \leq \frac{1}{q} γ_{C}^{ℓ - 1} {(q^{\frac{μ}{ℓ}})}^{k} . \end{matrix}

Since

\hat{q} = q^{\frac{μ}{ℓ}} < 1

and the conditions of Lemma 17 hold, we proved the following result.

Theorem 5.

Assume that

n \geq 2

,

S^{(0)}

is non-degenerate,

ℓ \geq 1

is fixed and

T_{q}^{ℓ}

is not empty. Let

r_{1} (⌊\frac{k}{ℓ}⌋)

be the number of ℓ products that belong to

T_{q}^{ℓ}

during the first k iterations of the Nelder–Mead method (Algorithm 2). Moreover, assume that for an integer

κ \in N

,

\frac{1}{q^{κ - 1}} \leq {\hat{γ}}_{C} \leq \frac{1}{q^{κ}}

and for some

μ \in (0, 1]

,

r_{1} (⌊\frac{k}{ℓ}⌋) \geq \frac{μ + κ}{1 + κ} ⌊\frac{k}{ℓ}⌋

. Then, the Nelder–Mead method (Algorithm 2) converges in the sense that

lim_{k \to \infty} x_{j}^{(k)} = \hat{x} (j = 1, 2, \dots, n + 1)

(35)

with a convergence speed proportional to

O ({(q^{\frac{μ}{ℓ}})}^{k})

.
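The quantities in Theorem 5 are easy to evaluate for given data. The sketch below computes an admissible κ from q and the bound on the norms of the non-contracting blocks (assumed to be at least 1), and tests the density condition in the equivalent form (1 + κ) r_1(m) - κ m ≥ μ m. The function names and the numerical values in the usage note are hypothetical.

```python
from fractions import Fraction


def kappa_for(q, gamma_hat):
    """Smallest integer kappa >= 1 with gamma_hat <= 1 / q**kappa.

    Assumes 0 < q < 1 and gamma_hat >= 1, so that
    1 / q**(kappa - 1) <= gamma_hat <= 1 / q**kappa can hold.
    """
    kappa = 1
    while gamma_hat > 1 / q ** kappa:
        kappa += 1
    return kappa


def contraction_exponent(r1, m, kappa):
    """Exponent (1 + kappa) * r1 - kappa * m of q in the norm estimate."""
    return (1 + kappa) * r1 - kappa * m


def density_condition_holds(r1, m, kappa, mu):
    """True if r1 >= (mu + kappa) / (1 + kappa) * m, written without division."""
    return contraction_exponent(r1, m, kappa) >= mu * m
```

For example, with q = 1/2 and a norm bound of 3 we obtain κ = 2, and for m = 10 blocks with μ = 2/5 at least 8 of them must belong to the contracting set; 7 contracting blocks do not suffice.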

Note that assumption

r_{1} (m) \geq \frac{μ + κ}{1 + κ} m

, which is a density condition, still allows an infinite number of factors from the set

T_{Q}^{ℓ}

as

m \to \infty

. The larger the ratio

|T_{q}^{ℓ}| / |T^{ℓ}|

, the wider the convergence set. Theorem 5 is clearly true for any pair of subsets

W^{ℓ}

and

W_{q}^{ℓ}

such that

\emptyset \neq W_{q}^{ℓ} \subset W^{ℓ}

and

W \subseteq T

.

For Algorithm 1 and some

q < 1

,

{\tilde{T}}_{q}^{1} = M_{1}

, to which we cannot add more elements, since

∥C (T_{j} (α))∥ \geq 1

in any induced matrix norm. For Algorithm 2,

T_{q}^{1} = {\hat{M}}_{2}

under Assumption (A).

Theorem 5 was first proved for Algorithm 2 for

ℓ = 1

in [14,15] and for

ℓ > 1

in [16] in a slightly different way and form. The conditions were verified for the spectral norm and

q = 0.99

up to dimension

n = 6

.

The computational results of Section 7 show that for

ℓ > 1

,

M_{1}^{ℓ} ⫋ {\tilde{T}}_{q}^{ℓ}

and

{\hat{M}}_{2}^{ℓ} ⫋ T_{q}^{ℓ}

(see also [16]). Hence, Theorem 5 indeed extends the convergence sets

M_{1}

and

{\hat{M}}_{2}

(quite significantly).

On the basis of the earlier version of Theorem 5, a stochastic convergence result was proved for Algorithm 2 in [16]. Since Theorem 5 holds for both algorithms, this result can be extended to Algorithm 1 as well. However, this extension will be published later for reasons discussed in Section 7.

7. Computational Results and Conclusions

Here, we consider the cardinality of set

T_{q}^{ℓ}

for both algorithms with and without shrinking operations.

Define the quantities

\tilde{r} (n, ℓ) = |{\tilde{T}}_{q}^{ℓ}| / |{\tilde{T}}^{ℓ}|

and

r (n, ℓ) = |T_{q}^{ℓ}| / |T^{ℓ}|

. The larger the ratio

\tilde{r} (n, ℓ)

or

r (n, ℓ)

, the wider the convergence set. Table 1 and Table 2 contain the computed values

\tilde{r} (n, ℓ)

or

r (n, ℓ)

, respectively.

Note that

\tilde{r} (n, ℓ) \geq 1 / 5^{ℓ}

and

r (n, ℓ) \geq {[\frac{(n + 1)!}{3 n + 3 + (n + 1)!}]}^{ℓ}

. The ratio

r (n, ℓ)

cannot achieve 1 (see the examples within Section 1 and Section 6). For increasing n, the

(n + 1)!

shrinking matrices somehow distort the real situation, since it is unlikely that all possible shrinking steps occur for one particular function.
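The two lower bounds stated above are simple to evaluate and can be compared with the tabulated ratios. The following sketch computes them exactly; the function names are illustrative, and the bounds themselves are taken from the text.

```python
from fractions import Fraction
from math import factorial


def lower_bound_alg1(ell):
    """Lower bound 1 / 5^ell for the ratio of Algorithm 1."""
    return Fraction(1, 5 ** ell)


def lower_bound_alg2(n, ell):
    """Lower bound [(n + 1)! / (3n + 3 + (n + 1)!)]^ell for the ratio of Algorithm 2."""
    f = factorial(n + 1)
    return Fraction(f, 3 * n + 3 + f) ** ell
```

For n = 2 and ℓ = 2, the bounds are 1/25 = 0.04 and 4/25 = 0.16, well below the computed ratios 0.39556 and 0.71111 reported in Table 1 and Table 2. The bound for Algorithm 2 tends to 1 for fixed ℓ as n grows, because the (n + 1)! shrinking matrices dominate the count.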

It is clear that Algorithm 2 has a wider convergence set than Algorithm 1.

If we discard the shrinking operations (the case of strictly convex functions), then Theorem 5 also holds for the pair of sets

{\tilde{T}}_{w}^{ℓ} = \{T_{i_{1}} \dots T_{i_{ℓ}} : T_{i_{j}} \in \tilde{T} ∖ M_{1}, j = 1, 2, \dots, ℓ\},

{\tilde{T}}_{w}^{ℓ} (q) = \{T_{i_{1}} \dots T_{i_{ℓ}} \in {\tilde{T}}_{w}^{ℓ} : ∥C (T_{i_{1}} \dots T_{i_{ℓ}})∥ \leq q\}

and

T_{w}^{ℓ} = \{T_{i_{1}} \dots T_{i_{ℓ}} : T_{i_{j}} \in T ∖ M_{2}, j = 1, 2, \dots, ℓ\},

T_{w}^{ℓ} (q) = \{T_{i_{1}} \dots T_{i_{ℓ}} \in T_{w}^{ℓ} : ∥C (T_{i_{1}} \dots T_{i_{ℓ}})∥ \leq q\},

instead of

T^{ℓ}

and

T_{q}^{ℓ}

, respectively. Define again the ratios

{\tilde{r}}_{w} (n, ℓ) = |{\tilde{T}}_{w}^{ℓ} (q)| / |{\tilde{T}}_{w}^{ℓ}|

and

r_{w} (n, ℓ) = |T_{w}^{ℓ} (q)| / |T_{w}^{ℓ}|

. Note that for

n \leq 10

and some

0 < q < 1

,

r_{w} (n, ℓ) \geq 4^{ℓ} / {(3 n + 3)}^{ℓ}

. Table 3 and Table 4 contain the computed values of

{\tilde{r}}_{w} (n, ℓ)

and

r_{w} (n, ℓ)

, respectively.

The comparison of the last two tables shows again that Algorithm 1 has a smaller convergence set and a slower convergence rate than Algorithm 2.

We can observe that the larger the ℓ, the larger the ratio in all four tables. Although the ratios cannot achieve 1, it is conjectured that they might be quite close to 1 for a large ℓ.

For both algorithms, the convergence sets grow as ℓ increases. For larger n values, this growth is slower, especially if no shrinking is allowed. Checking this, however, requires an enormous computational time, even for modest pairs of n and ℓ. The reason for this behavior is unknown and yet to be investigated. Nevertheless, we can establish the following conclusions. Regarding the convergence of the simplex sequence, Algorithm 2 is better than Algorithm 1. Hence, the ordering of Lagarias et al. [12] significantly improved the classical Nelder–Mead simplex method.

Theorem 5 indicates that the generated simplex sequence has a strong tendency to converge to one common limit point. This, together with Lemma 2 (or Lemma 3), may guarantee that the Nelder–Mead simplex algorithm generates good practical results.

In the case of the convergence of function values at vertices, the novel results are essentially the same for both methods. The extension of Theorems 2, 3 or 4 for

n > 2

is an open question. For

n = 2

, all proofs exploit (implicitly or explicitly) an involutory property, which does not hold for

n \geq 3

. How the convexity assumptions can be relaxed further is also an open question. If f is quasiconvex on

R^{2}

and bounded from below, then

f_{2}^{*} = f_{3}^{*}

. However, the other parts of the proof do not work.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The author is indebted to J. Abaffy and L. Szeidl for their help and comments. He is also indebted to the unknown referees whose observations and suggestions helped to improve this paper.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A

The matrices for the norm

{∥A∥}_{ϑ} = {∥S^{- 1} A S∥}_{2}

and

n = 9, 10

.

The case where

n = 9

:

S = [\begin{matrix} - 1.2571 & - 0.0128 & 0.5028 & 0.5047 & 2.3173 & - 1.2682 & - 1.4404 & - 2.2437 & 1.1064 \\ - 1.1260 & - 1.8225 & - 1.0287 & 0.0972 & - 1.3367 & - 1.2773 & - 0.3227 & 0.7005 & 2.0421 \\ - 1.7197 & - 0.0434 & - 0.1394 & 0.5115 & 0.1233 & - 0.2385 & - 0.9384 & 0.4162 & - 2.5873 \\ 0.0424 & - 0.0296 & 0.2238 & 1.3031 & 1.0884 & - 0.0319 & 1.7164 & 2.1411 & 0.7368 \\ 1.1717 & 1.7668 & 0.2037 & 0.6359 & - 1.3285 & - 1.7570 & 0.1136 & - 0.3241 & - 1.0724 \\ - 0.0138 & - 0.9442 & 0.0342 & - 0.8301 & 0.0499 & 0.2340 & 2.8213 & - 1.0245 & - 0.5072 \\ - 0.0718 & 0.5593 & 2.5306 & - 0.5053 & - 1.5393 & 0.9299 & - 0.4054 & 0.1010 & 0.0832 \\ 0.5396 & 0.2878 & - 0.1743 & - 2.5974 & 0.9844 & - 0.1686 & 0.0242 & 1.0429 & - 0.1559 \\ 2.1307 & - 1.4421 & 0.4763 & 0.5795 & - 0.0088 & 0.4682 & - 1.0355 & 0.0462 & - 0.5015 \end{matrix}]

The case where

n = 10

:

S = [\begin{matrix} 2.0568 & 4.2600 & - 0.8876 & - 0.1566 & - 1.9386 & - 0.0651 & 0.2271 & - 1.8056 & 0.1294 & 0.0664 \\ 0.4312 & - 0.0886 & 1.8330 & 4.6222 & - 0.0374 & - 0.0071 & - 0.0483 & 0.8386 & 0.2741 & - 0.0184 \\ 3.4415 & - 1.8884 & 0.8095 & - 1.5183 & 1.2164 & - 1.3902 & - 0.0710 & - 0.0837 & - 1.6783 & - 0.0210 \\ - 2.1716 & - 0.0643 & - 1.0052 & 0.0034 & - 1.0715 & - 0.8995 & 2.6609 & 0.0733 & - 2.7365 & 0.5002 \\ - 1.3924 & - 0.2185 & 0.0169 & 0.3663 & - 0.0497 & - 0.4920 & - 4.0342 & - 0.1023 & - 1.1653 & 1.0472 \\ - 0.1115 & - 0.9027 & 1.9524 & - 2.1948 & - 2.7482 & 0.0134 & 0.0722 & 1.3448 & 1.2371 & - 0.3717 \\ - 0.5131 & - 0.0069 & 0.8486 & - 0.0600 & 0.7613 & 3.9197 & 0.5919 & - 0.5482 & - 1.0459 & - 0.4000 \\ 0.9491 & - 1.8435 & - 3.2427 & 0.4871 & - 0.5311 & 0.8950 & - 0.6546 & 0.7932 & 0.6149 & - 0.3206 \\ - 1.0854 & 0.3223 & - 0.1177 & - 0.5542 & 0.7593 & - 1.2426 & 0.0307 & 0.0804 & 0.2414 & - 3.5372 \\ - 0.0960 & 1.6263 & - 0.1063 & - 0.4209 & 1.6908 & - 0.0096 & 0.1650 & 2.4456 & 0.5468 & 1.1978 \end{matrix}]

References

1. Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313.
2. Spendley, W.; Hext, G.; Himsworth, F. Sequential application of simplex designs in optimisation and evolutionary operation. Technometrics 1962, 4, 441–461.
3. Audet, C.; Hare, W. Derivative-free and Blackbox Optimization; Springer: Berlin/Heidelberg, Germany, 2017.
4. Conn, A.; Scheinberg, K.; Vicente, L. Introduction to Derivative-Free Optimization; SIAM: Philadelphia, PA, USA, 2009.
5. Kelley, C. Iterative Methods for Optimization; SIAM: Philadelphia, PA, USA, 1999.
6. Kochenderfer, M.; Wheeler, T. Algorithms for Optimization; The MIT Press: Cambridge, MA, USA, 2019.
7. Walters, F.; Morgan, S.; Parker, L.; Deming, S. Sequential Simplex Optimization; CRC Press LLC: Boca Raton, FL, USA, 1991.
8. Larson, J.; Menickelly, M.; Wild, S. Derivative-free optimization methods. Acta Numer. 2019, 28, 287–404.
9. Rykov, A. System Analysis. Models and Methods of Decision Making and Search Engine Optimization (in Russian); MISiS: Moscow, Russia, 2009.
10. Heeman, P. On Derivative-Free Optimisation Methods. Master’s Thesis, University of Bergen, Bergen, Norway, 2023.
11. Rios, L.; Sahinidis, N. Derivative-free optimization: A review of algorithms and comparison of software implementations. J. Glob. Optim. 2013, 56, 1247–1293.
12. Lagarias, J.; Reeds, J.; Wright, M.; Wright, P. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM J. Optimiz. 1998, 9, 112–147.
13. Kelley, C. Detection and remediation of stagnation in the Nelder-Mead algorithm using a sufficient decrease condition. SIAM J. Optimiz. 1999, 10, 43–55.
14. Galántai, A. Convergence theorems for the Nelder-Mead method. J. Comput. Appl. Mech. 2020, 15, 115–133.
15. Galántai, A. Convergence of the Nelder-Mead method. Numer. Algorithms 2022, 90, 1043–1072.
16. Galántai, A. A stochastic convergence result for the Nelder-Mead simplex method. Mathematics 2023, 11, 1998.
17. Wright, M. Direct search methods: Once scorned, now respectable. In Numerical Analysis 1995 (Proceedings of the 1995 Dundee Biennial Conference in Numerical Analysis); Griffiths, D., Watson, G., Eds.; Addison-Wesley Longman: Harlow, UK, 1996; pp. 191–208.
18. Wright, M. Nelder, Mead, and the other simplex method. In Documenta Mathematica, Extra Volume: Optimization Stories; FIZ Karlsruhe GmbH: Eggenstein-Leopoldshafen, Germany, 2012; pp. 271–276.
19. McKinnon, K. Convergence of the Nelder-Mead simplex method to a nonstationary point. SIAM J. Optimiz. 1998, 9, 148–158.
20. Galántai, A. Convergence of the Nelder-Mead method for convex functions. Acta Polytech. Hung. 2024, 21, 185–202.
21. Galántai, A. A convergence analysis of the Nelder-Mead simplex method. Acta Polytech. Hung. 2021, 18, 93–105.
22. Hendrix, E.; G.-Tóth, B. Introduction to Nonlinear and Global Optimization; Springer: Berlin/Heidelberg, Germany, 2010.
23. Oldenburger, R. Infinite powers of matrices and characteristic roots. Duke Math. J. 1940, 6, 357–361.
24. Meyer, C. Matrix Analysis and Applied Linear Algebra; SIAM: Philadelphia, PA, USA, 2000.
25. Horn, R.; Johnson, C. Matrix Analysis; Cambridge University Press: Cambridge, UK, 2013.
26. Barnett, S. Matrices: Methods and Applications; Clarendon Press: Oxford, UK, 1990.
27. Kittaneh, F. Singular values of companion matrices and bounds of zeros of polynomials. SIAM J. Matrix. Anal. Appl. 1995, 16, 333–340.
28. Montano, E.; Salas, M.; Soto, R. Positive matrices with prescribed singular values. Proyecciones 2008, 27, 289–305.
29. Trench, W. Characterization and properties of matrices with k-involutory symmetries. Linear Algebra Its Appl. 2008, 429, 2278–2290.
30. Mirsky, L. An Introduction to Linear Algebra; Dover Publications, Inc.: New York, NY, USA, 1990.
31. Hartfiel, D. Nonhomogeneous Matrix Products; World Scientific: Singapore, 2002.
32. Beyn, W.J.; Elsner, L. Infinite products and paracontracting matrices. Electron. J. Linear Al. 1997, 2, 1–8.
33. Blondel, V.; Tsitsiklis, J. The boundedness of all products of a pair of matrices is undecidable. Syst. Control Lett. 2000, 41, 135–140.
34. Daubechies, I.; Lagarias, J. Sets of Matrices All Infinite Products of Which Converge. Linear Algebra Its Appl. 1992, 161, 227–263.

Table 1. Ratios for Algorithm 1.

			$\tilde{r} (n, ℓ)$
$n ∖ ℓ$	2	3	4	5	6	7
2	0.39556	0.52563	0.59953	0.65873	0.70331	0.7403
3	0.28	0.3755	0.45766	0.52766	0.58397	0.63187
4	0.28	0.34304	0.40802	0.46877	0.52489	0.57442

Table 2. Ratios for Algorithm 2.

			$r (n, ℓ)$
$n ∖ ℓ$	2	3	4	5	6	7
2	0.71111	0.83615	0.90204	0.94099	0.9641	0.97795
3	0.85185	0.93746	0.97389	0.98912	0.9956	0.99824

Table 3. Ratios for Algorithm 1 without shrinking.

			${\tilde{r}}_{w} (n, ℓ)$
$n ∖ ℓ$	2	3	4	5	6	7
2	0.18056	0.24537	0.2963	0.32823	0.35566	0.37868
3	0	0.026367	0.062897	0.093401	0.11847	0.13871
4	0	0	0.00555	0.016575	0.029086	0.041108

Table 4. Ratios $\tilde{r}_{w}(n, ℓ)$ for Algorithm 2 without shrinking.

$n ∖ ℓ$	2	3	4	5	6	7
2	0.34568	0.46914	0.54687	0.61437	0.67157	0.71873
3	0	0.1169	0.24257	0.3278	0.39074	0.44305


© 2024 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
