Meta-Optimization of Dimension Adaptive Parameter Schema for Nelder–Mead Algorithm in High-Dimensional Problems

Rojec, Žiga; Tuma, Tadej; Olenšek, Jernej; Bűrmen, Árpád; Puhan, Janez

doi:10.3390/math10132288

Open AccessArticle

Meta-Optimization of Dimension Adaptive Parameter Schema for Nelder–Mead Algorithm in High-Dimensional Problems

by

Žiga Rojec

,

Tadej Tuma

,

Jernej Olenšek

,

Árpád Bűrmen

and

Janez Puhan

^*

Department of Electronics, Faculty of Electrical Engineering, University of Ljubljana, SI-1000 Ljubljana, Slovenia

^*

Author to whom correspondence should be addressed.

Mathematics 2022, 10(13), 2288; https://doi.org/10.3390/math10132288

Submission received: 20 May 2022 / Revised: 21 June 2022 / Accepted: 27 June 2022 / Published: 30 June 2022

(This article belongs to the Special Issue Optimization Theory and Applications)

Download

Browse Figures

Versions Notes

Abstract

:

Although proposed more than half a century ago, the Nelder–Mead simplex search algorithm is still widely used. Four numeric constants define the operations and behavior of the algorithm. The algorithm with the original constant values performs fine on most low-dimensional, but poorly on high-dimensional, problems. Therefore, to improve its behavior in high dimensions, several adaptive schemas setting the constants according to the problem dimension were proposed in the past. In this work, we present a novel adaptive schema obtained by a meta-optimization procedure. We describe a schema candidate with eight parameters subject to meta-optimization and define an objective function evaluating the candidate’s performance. The schema is optimized on up to 100-dimensional problems using the Parallel Simulated Annealing with Differential Evolution global method. The obtained global minimum represents the proposed schema. We compare the performance of the optimized schema with the existing adaptive schemas. The data profiles on the Gao–Han modified quadratic, Moré–Garbow–Hilstrom, and CUTEr (Constrained and Unconstrained Testing Environment, revisited) benchmark problem sets show that the obtained schema outperforms the existing adaptive schemas in terms of accuracy and convergence speed.

Keywords:

meta-optimization; Nelder–Mead Algorithm; adaptive parameter schema; high-dimensional optimization problems

MSC:

65K05; 90C56

1. Introduction

Today, we can find optimization algorithms in almost every field of science and technology. A number of optimization algorithms exist, invented to fulfill various requirements regarding convergence rate, precision, robustness, and more. One of them (the Nelder–Mead simplex algorithm [1]), although more than half a century old, is still extensively used for solving a wide range of continuous optimization problems. The algorithm’s popularity is due to its simplicity and reasonably good performance observed in practical optimization cases.

Despite its popularity, the algorithm has proven convergence issues. McKinnon [2] presented a family of two-dimensional functions that cause the Nelder–Mead Algorithm (NMA) to converge to a non-stationary point. Galántai [3] provided a sufficient condition for repeated inside contractions in two dimensions, causing non-convergence. A relatively poor theoretical background on algorithm convergence is available for up to two-dimensional problems. Lagarias et al. [4] proved various limited convergence results for two-dimensional strictly convex objective functions. Further, Lagarias et al. [5] proved convergence for a restricted NMA, a version of the NMA without expansion steps, on two-dimensional strictly convex

C^{2}

functions with bounded level sets.

Many modifications of the original NMA were proposed to avoid the algorithm’s known deficiencies and provide convergence. Kelley [6] proposed an oriented restart in case of detected stagnation. Tseng’s [7] and Nazareth and Tseng’s [8] versions of the algorithm guarantee convergence with a sufficient descent approach. Price et al. [9] used simplex reshaping to achieve convergence on

C^{1}

functions, again satisfying sufficient descent conditions. Bűrmen et al. [10], on the other hand, introduced a grid-restrained version of the NMA, thus making the algorithm a pattern search method. The approach was generalized to a successive approximation of the objective function by Bűrmen and Tuma [11].

However, convergence analysis of the unmodified, original NMA stays a hard mathematical problem. No theoretical background is available above two dimensions. Torczon [12] discovered that the NMA fails because search direction and downhill gradient become orthogonal when the problem dimension is large enough. Wright [13] reported that several scholars observed how the NMA deteriorates with dimensionality but without any explanation. Further, Han and Neumann [14] showed that the NMA makes less and less progress per iteration with an increasing problem dimension. Gao and Han [15] suggested that poor performance in high dimensions is due to an increasing fraction of reflection steps.

Researchers addressed the poor performance of the original NMA in high dimensions in two ways. First, they proposed various algorithm modifications to improve the convergence rate in high-dimensional parameter spaces. Fajfar et al. [16] used genetic programming to evolve a direct search procedure using reflection, expansion, and contraction steps. Musafer’s [17] modification adjusts simplex size and direction by performing different NMA steps on various axial combinations. Fajfar et al. [18] proposed random centroid perturbation for improving the search direction, to name a few.

The second approach deals with NMA parameters and does not modify the algorithm itself in any way. Gao and Han [15] proposed the first schema of dimension-dependent NMA parameters to maintain the algorithm’s descent property in high dimensions. Kumar and Suri [19] suggested another schema obtained from parameter sensitivity analysis on five test functions. Mehta [20], on the other hand, observed that two schemas based on Chebyshev spaced points outperform Gao–Han’s and Kumar–Suri’s schemas.

This paper presents the global minimum of the meta-optimization of the NMA adaptive parameter schema. We used the Parallel Simulated Annealing with Differential Evolution (PSADE) robust global optimization method [21] running on a cluster of personal computers as the meta-optimization method. The subjects of meta-optimization are eight coefficients whose values define an individual adaptive parameter Schema Candidate (SC). The schema’s mathematical formulation is set in advance and does not evolve during the procedure. We run the NMA using the SC on several test functions and evaluate the SC in each iteration of the meta-optimization. The global minimum represents the best adaptive parameter schema corresponding to the predefined mathematical formulation of the schema and used objective function. We compare the performance of the NMA using the optimized schema with the NMA using the existing schemas on modified quadratic, i.e., Gao–Han (GH) [15], Moré–Garbow–Hilstrom (MGH) [22], and Constrained and Unconstrained Testing Environment, revisited (CUTEr) [23], sets of benchmark problems. Since the proposed adaptive parameter schema results from a meta-optimization procedure, we do not provide a mathematical background explaining the schema’s performance. However, the proposed schema outperforms all the other schemas and thus, to the best of our knowledge, currently represents the best dimension adaptive parameter schema for the NMA.

2. Adaptive Parameter Schemas for NMA

Each mathematical symbol used in Section 2 and Section 3 is explained at first use. However, for clarification, all the symbols are also listed in the Abbreviations section at the end of the paper.

The original NMA [1,4] is well known, therefore we provide only a brief introduction. The NMA is an unconstrained minimization algorithm operating on an objective function f of

n_{p}

variables handling

(n_{p} + 1)

points

P_{i}

, i.e., simplex vertices, with objective function values

f (P_{i})

,

i = 0, 1 \dots n_{p}

. In one iteration of the NMA, the following steps are performed:

Order and relabel simplex vertices to satisfy $f (P_{0}) \leq f (P_{1}) \leq \dots \leq f (P_{n_{p}})$ . Calculate the centroid point $\bar{P} = \frac{1}{n_{p}} \sum_{i = 0}^{n_{p} - 1} P_{i}$ excluding the highest objective function value point.
Reflect $P_{n_{p}}$ over $\bar{P}$ to obtain the reflected point $P_{r} = \bar{P} + α (\bar{P} - P_{n_{p}})$ , $α > 0$ .
If $f (P_{r}) < f (P_{0})$ , expand $P_{r}$ to obtain the expanded point
$P_{e} = \bar{P} + \frac{β}{α} (P_{r} - \bar{P}) = \bar{P} + β (\bar{P} - P_{n_{p}})$ , $β > α$ .
If $f (P_{e}) < f (P_{r})$ , replace $P_{n_{p}}$ with $P_{e}$ and end the iteration.
If $f (P_{r}) < f (P_{n_{p} - 1})$ , replace $P_{n_{p}}$ with $P_{r}$ and end the iteration.
If $f (P_{r}) < f (P_{n_{p}})$ , contract $P_{r}$ towards $\bar{P}$ to obtain the contracted point
$P_{r c} = \bar{P} + \frac{γ}{α} (P_{r} - \bar{P}) = \bar{P} + γ (\bar{P} - P_{n_{p}})$ , $γ < α$ .
If $f (P_{r c}) < f (P_{n_{p}})$ , replace $P_{n_{p}}$ with $P_{r c}$ and end the iteration.
If $f (P_{r}) \geq f (P_{n_{p}})$ , contract $P_{n_{p}}$ towards $\bar{P}$ to obtain the contracted point
$P_{n c} = \bar{P} + γ (P_{n_{p}} - \bar{P}) = \bar{P} - γ (\bar{P} - P_{n_{p}})$ .
If $f (P_{n c}) < f (P_{n_{p}})$ , replace $P_{n_{p}}$ with $P_{n c}$ and end the iteration.
Shrink the entire simplex towards $P_{0}$ , $P_{i} : = P_{0} + δ (P_{i} - P_{0})$ , $δ < 1$ , $i = 1, 2, \dots n_{p}$ .

NMA iterations are repeated until convergence is achieved. Algorithm behavior depends on

α

(reflection),

β

(expansion),

γ

(contraction), and

δ

(shrink) parameter values. The NMA default values are

α = 1, β = 2, γ = \frac{1}{2}, δ = \frac{1}{2} .

(1)

Adaptive parameter schemas define NMA parameter values as functions of the number of variables

n_{p}

. The existing schemas (Gao–Han Schema (GHS) [15], Kumar–Suri Schema (KSS) [19], Chebyshev Crude Schema (CCS) [20], and Chebyshev Refined Schema (CRS) [20]) considered in this paper are

\begin{matrix} GHS : & α = 1, β = 1 + \frac{2}{n_{p}}, γ = \frac{3}{4} - \frac{1}{2 n_{p}}, δ = 1 - \frac{1}{n_{p}} \\ KSS : & α = 1 + \frac{3}{5 n_{p}}, β = \frac{6}{5}, γ = \frac{19}{20} - \frac{3}{n_{p}} - \frac{3}{n_{p}^{2}}, δ = 1 - \frac{1}{n_{p}} \\ CCS : & α = 1 + cos \frac{(n_{p} - 1 - n_{p} % 2) π}{2 n_{p}}, β = 1 + cos \frac{(n_{p} - 3 - n_{p} % 2) π}{2 n_{p}}, \\ γ = 1 + cos \frac{(n_{p} + 3 + n_{p} % 2) π}{2 n_{p}}, δ = 1 + cos \frac{(n_{p} + 1 + n_{p} % 2) π}{2 n_{p}} \\ CRS : & α = 1 + cos \frac{(n_{c} - 1) π}{2 n_{c}}, β = 1 + cos \frac{(n_{c} - 3) π}{2 n_{c}}, \\ γ = 1 + cos \frac{(n_{c} + 5) π}{2 n_{c}}, δ = 1 + cos \frac{(n_{c} + 3) π}{2 n_{c}}, n_{c} = 2 (9 + ⌊ \frac{n_{p} - 1}{5} ⌋) \end{matrix}

(2)

where % denotes modulo operation, and

⌊ ⌋

the floor function.

In general, the initial simplex vertices

P_{i}

are random. In this paper, however, to assure repeatability of the results, the initial simplex is generated from the starting point

x_{0}

using Pfeffer’s method [15]. The first vertex is starting point

P_{0} = x_{0}

. The remaining vertices are generated by varying the

i th

component

P_{i} = x_{0} + ϵ_{i} e_{i}

.

e_{i}

is the

i th

component unit vector, and

ϵ_{i}

is given by

ϵ_{i} = \{\begin{matrix} 0.05 x_{0} e_{i} & x_{0} e_{i} \neq 0 \\ 0.00025 & x_{0} e_{i} = 0 \end{matrix}, i = 1, 2 \dots n_{p} .

(3)

The starting point

x_{0}

is

{[1, 1 \dots 1]}^{T}

for GH benchmarks [15], and as given in [22] for MGH benchmarks.

A Nelder–Mead run terminates when the simplex becomes too flat or shrinks below a certain size. In this paper, we use tolerances

T o l_{f}

for simplex flatness and

T o l_{X}

for simplex size. A Nelder–Mead run stops when both criteria (4) are met:

{max}_{i = 1}^{n_{p}} | f (P_{i}) - f (P_{0}) | < T o l_{f} {max}_{j = 0}^{n_{p} - 1} {max}_{i = 1}^{n_{p}} | P_{i j} - P_{0 j} | < T o l_{X},

(4)

where

P_{i} = {[P_{i 0}, P_{i 1}, \dots P_{i (n_{p} - 1)}]}^{T}

is the

i th

vertex of the simplex.

3. Optimization of the Adaptive Parameter Schema

The default (1) and adaptive parameter schema functions (2) shown in Figure 1 have in general similar behavior. By choosing appropriate values

c_{0 p}

and

c_{1 p}

, an individual parameter c from a particular schema could be closely fitted with function

c = c_{0 p} + \frac{c_{1 p}}{n}

.

At this point, a question arises: can a better adaptive parameter schema of the formulation (5) be obtained by choosing the right values for

c_{0 α}

,

c_{1 α}

,

c_{0 β}

,

c_{1 β}

, etc.? Do such values exist, and what are they? A meta-optimization procedure could provide some answers.

α = c_{0 α} + \frac{c_{1 α}}{n_{p}}, β = c_{0 β} + \frac{c_{1 β}}{n_{p}}, γ = c_{0 γ} + \frac{c_{1 γ}}{n_{p}}, δ = c_{0 δ} + \frac{c_{1 δ}}{n_{p}}

(5)

Let us first define the meta-optimization procedure. As in any optimization, we need optimization parameters, an objective function, and an optimization method. Optimization parameters (

c_{0 α} \dots c_{1 δ}

) follow from the mathematical formulation describing an SC (5). Therefore, we have an eight-dimensional meta-optimization parameter space.

The meta-optimization objective function measures the weighted difference in data profiles [24] between the best reference schema and an SC. For a mathematical formulation of the objective function, some definitions are needed. A data profile function of a schema s over a set of benchmark problems

P

d_{s P} (κ) = \frac{| p \in P : \frac{t_{p s}}{n_{p} + 1} \leq κ |}{| P |}

(6)

defines the fraction of problems in a set

P

that schema s solves in

κ

simplex gradient estimates. s is a schema from the set of schemas

S

(

s \in S = {

NMA, GHS, KSS, CCS, CRS, SC}).

| \cdot |

denotes cardinality of a set.

t_{p s}

is the number of objective function evaluations needed by schema s to achieve convergence on problem p, and

n_{p}

is the problem dimension. Since one simplex gradient estimate corresponds to

n_{p} + 1

objective function evaluations, fraction

\frac{t_{p s}}{n_{p} + 1}

is the number of simplex gradient estimates required for convergence. Convergence is achieved when

f (x) \leq f_{L} + τ (f (x_{0}) - f_{L}),

(7)

where

x

is an evaluated point in

n_{p}

-dimensional parameter space, and

f_{L}

is the lowest objective function value reached by any of the schemas

s \in S

in

κ_{\max}

simplex gradient estimates. Tolerance

τ

specifies accuracy level. Moré and Wild [24] use tolerance values

10^{- 1}

,

10^{- 3}

,

10^{- 5}

, and

10^{- 7}

. We set convergence condition tolerance

τ

to

10^{- 7}

as Mehta did in his work [20]. If a particular schema s fails to satisfy condition (7) for problem p, then

t_{p s}

is set to infinity.

The final meta-optimization objective function

h (

SC) evaluates a particular SC defined by eight meta-optimization parameters

c_{0 α} \dots c_{1 δ}

with

h (SC) = \sum_{P \in X} \sum_{κ = 1}^{κ_{\max}} ((max_{r \in R} d_{r P} (κ) - d_{SC P} (κ)) \times \{\begin{matrix} w_{P_{+}} & max_{r \in R} d_{r P} (κ) > d_{SC P} (κ) \\ w_{P_{-}} & max_{r \in R} d_{r P} (κ) \leq d_{SC P} (κ) \end{matrix}) .

(8)

The GH and MGH benchmark problems are treated separately,

X \in {GH, MGH}

. For every number of simplex gradient estimates

κ

, the data profile of the SC is compared with the best of the reference profiles,

r \in R = S - {

SC}. The difference is weighted with

w_{P_{+}}

when the best reference is better, and

w_{P_{-}}

otherwise. Weights

w_{P_{+}}

and

w_{P_{-}}

define a trade-off between under- and over-achieving the optimization goal (which in turn is the best schema’s performance). Usually, under-achieving is penalized more than over-achieving is rewarded. In our case, the weight values were set to

w_{{GH}_{+}} = w_{{MGH}_{+}} = 10

, and

w_{{GH}_{-}} = w_{{MGH}_{-}} = 1

.

Finally, we have to choose an optimization method to perform our meta-optimization procedure. We are searching for a global minimum in eight-dimensional parameter space. Therefore, a global optimization method with proven convergence can do the job. We chose PSADE [21], a parallel version of [25] since it is available in the PyOPUS Python package [26]. Among others, the PyOPUS package includes optimization algorithms (original NMA included), parallel processing support, and benchmark problems (GH, MGH, and CUTEr problems included), all the ingredients needed in our meta-optimization procedure. It can be found in the Python Package Index (PyPI) software repository. The PSADE method exhibited good performance on global benchmark functions as well as on real optimization problems [27,28,29]. Further, PSADE is an asynchronous global method achieving speedups up to the number of slave computational cores when run in parallel. We ran PSADE on a cluster of 25 personal computers. Instead of PSADE, one of a plethora of newer global methods, e.g., [30,31,32,33], could be used. However, besides faster convergence, we do not expect significantly better results.

A meta-optimization search for a better NMA parameter schema requires significant computational power. An individual SC, represented by eight meta-optimization parameter values

c_{0 α} \dots c_{1 δ}

, has to be evaluated against reference parameter schemas in every meta-optimization iteration. One SC evaluation requires as many Nelder–Mead runs as there are problems included in the objective function evaluation. In general, a single Nelder–Mead run stops when the termination criteria are achieved, e.g., when the simplex becomes flat or shrinks below the tolerance. However, additional improvement is possible if the procedure runs further. When driven beyond tolerances, a non-convergent SC may become convergent, although rather slow. Gao and Han [15], Kumar and Suri [19], and Mehta [20] all set an absolute limit to the number of objective function evaluations to

10^{6}

, i.e., 9900–90,909 simplex gradient estimates. However, in their results, 1000–2000 estimates are needed on average for the NMA using an adaptive parameter schema to converge. They set termination tolerances

T o l_{f}

,

T o l_{X}

in range

10^{- 10}

–

10^{- 4}

. Therefore, after some preliminary tests, we set

κ_{\max}

to 5000. More would be better. And—after some preliminary experimental optimization runs—we established that around

10^{6}

meta-optimization iterations are required to achieve convergence in an unconstrained eight-dimensional parameter space using the robust PSADE global optimization method [21]. Again, the more, the better. Thus, the total number of required objective function evaluations can be estimated as the number of simplex gradient estimates per benchmark problem times the number of simplex vertices summed over all benchmark problems times the number of meta-optimization iterations. For GH and MGH benchmarks altogether, this is over

10^{13}

evaluations.

Therefore, brute force is not very promising. Instead, we tuned the meta-optimization parameters by conducting a series of shorter meta-optimization runs. We varied parameter space constraints, NMA termination tolerances

T o l_{f}

and

T o l_{X}

, the gradient estimates limit

κ_{\max}

per Nelder–Mead run, meta-optimization iteration limit, and we also experimented with the objective function definition at the beginning. By observing the results, we gradually eliminated parts that were not essential, e.g., setting NMA termination tolerances

T o l_{f}

and

T o l_{X}

too low can significantly extend the meta-optimization procedure without producing a significantly better result. The final meta-optimization parameter values are as follows: the parameter space is a discrete eight-dimensional box with

0.01

grid and constraints set to

{[c_{0 α}, c_{1 α}, c_{0 β}, c_{1 β}, c_{0 γ}, c_{1 γ}, c_{0 δ}, c_{1 δ}]}^{T} \in [[0.80, 1.20],

[0.20, 0.60],

[0.85, 1.25],

[0.35, 0.75],

[0.65, 1.05],

[- 0.50, - 0.10],

[0.05, 0.45],

{[- 0.40, 0.00]]}^{T}

. NMA termination tolerances are set to

T o l_{f} = T o l_{X} = 10^{- 4}

. The simplex gradient estimates limit was set to

κ_{\max} = 5000

, enough to catch degenerated SCs. The final objective function formulation is given in (8). Meta-optimization iteration limit was set to

10^{6}

.

The final meta-optimized parameter values, i.e., the global minimum of (8), are

\begin{matrix} c_{0 α} = 1.02, & c_{1 α} = 0.31, & c_{0 β} = 1.06, & c_{1 β} = 0.53, \\ c_{0 γ} = 0.82, & c_{1 γ} = - 0.27, & c_{0 δ} = 0.28, & c_{1 δ} = - 0.19 . \end{matrix}

(9)

This is the global minimum when (5) is used as the adaptive parameter schema. We chose (5) as the adaptive parameter schema because of its similarity to the existing schemas.

4. Evaluation of the Optimized Schema with Discussion

In this section, we present and discuss the properties of the optimized schema (9). We analyze its performance and compare it with the other schemas, (1) and (2). Since the schema is a result of the meta-optimization procedure, we do not deal with the issue of why the schema performs well. A mathematical evaluation of an arbitrary schema is given with the objective function definition (8). However, an analytical solution to the defined meta-optimization problem exceeds the scope of this paper.

The optimized parameter schema (9) is compared with the original NMA (1) and the existing adaptive parameter schemas (2), GHS, KSS, CCS, and CRS, in Figure 2. We can observe that the optimized schema follows the same pattern as the other schemas. However, the optimized schema approaches its high-dimensional value faster which is the consequence of smaller

| c_{1 p} / c_{0 p} |

ratios. Otherwise, the optimized schema curves do not significantly deviate from the others. Shrink parameter

δ

is an exception. While

δ

in all the other adaptive schemas approaches 1, the optimized schema’s high-dimensional

δ

value is

c_{0 δ} = 0.28

which is far lower, even than

0.5

of the original NMA. On the other hand, the shrink parameter turns out to be insignificant for GH and MGH benchmarks. The fraction of shrink steps is at most 0.5% for Penalty I and Penalty II problems. However, the fraction of shrink steps in the optimized schema is 0.0% for all GH and MGH benchmarks.

To evaluate the obtained schema further, we compare it in terms of accuracy and speed. Note that the meta-optimization objective function (8) in combination with NMA termination tolerances rewards speed. Therefore, accuracy was not a subject of meta-optimization.

Table 1 shows accuracy results for a GH modified quadratic function (10) up to problem dimension

n_{p} = 100

. Parameter

ϵ \geq 0

defines condition number of matrix

D

, and

σ \geq 0

specifies deviation from quadratic form. The minimum value of (10) is zero (

{min}_{x \in R^{n_{p}}} f (x) = 0

) for any

ϵ

,

σ

, or

n_{p}

. An individual Nelder–Mead run was stopped after

κ_{\max}

= 25,000 simplex gradient estimates, that is, after

N_{eval} = 25, 000 (n_{p} + 1)

objective function evaluations. We applied no tolerance-based termination criteria, i.e.,

T o l_{f} = T o l_{X} = 0

. The table shows the lowest achieved objective function values. A schema is considered to be accurate if the achieved minimum value is correct to at least six decimal places, i.e., when

f (x) <

5 × 10

^{- 7}

. The accurate schema values are shown in bold. If a schema does not converge according to condition (7) for

τ = 10^{- 7}

, the value is shown in italics.

\begin{matrix} f (x) = x^{T} D x + σ {(x^{T} B x)}^{2} \\ D = diag [(1 + ϵ), {(1 + ϵ)}^{2}, \dots {(1 + ϵ)}^{n_{p}}], B = U^{T} U, U = [\begin{matrix} 1 & \dots & 1 \\ ⋱ & ⋮ \\ 1 \end{matrix}] \end{matrix}

(10)

Given 25,000 simplex gradient estimates, the optimized schema is accurate and convergent for all 40 GH benchmark problems, as all the existing schemas also are. As expected, the original NMA encounters accuracy and convergence problems in high dimensions.

Accuracy results for MGH benchmark problems [22] are shown in Table 2. The same rules (25,000 simplex gradient estimates,

T o l_{f} = T o l_{X} = 0

, etc.) apply. Minima of the used MGH functions are all zero with the exception of Penalty I and Penalty II functions. For

n_{p} = 10

, the corresponding minima are 7.0876515…×10

^{- 5}

and 0.00029366054…, respectively. Thus, the six digit accuracy criteria are given by

f_{Penalty I} (x) <

7.087655 ×10

^{- 5}

, and

f_{Penalty II} (x) <

0.0002936615.

Again, with 25,000 simplex gradient estimates available, the optimized schema is accurate and convergent for all MGH benchmarks, except for four trigonometric functions, where the result is approaching the minimum, although not reaching it. We can observe similar behavior for the other schemas as well. Furthermore, for trigonometric function benchmarks, the schemas achieved relatively small objective function value reduction, which is reflected in relation

τ \frac{f (x_{0})}{f_{L}} ≪ 1

. Consequently,

f_{L} ≫ τ (f (x_{0}) - f_{L})

, and convergence condition (7) degenerates into

f (x) ≲ f_{L}

. Strictly following the convergence condition, only schemas reaching the lowest value

f_{L}

are considered convergent. Nevertheless, the optimized schema managed to produce the lowest objective function value in all trigonometric function benchmarks, except one.

Convergence speed of the obtained schema (9) is compared to the other schemas (1) and (2) using data profiles (6) and (7) [24]. Figure 3 shows data profiles for GH and MGH benchmark sets separately and combined. The profiles are calculated at

τ = 10^{- 7}

with

κ_{\max}

= 25,000 simplex gradient estimates and without tolerance-based algorithm termination (

T o l_{f} = T o l_{X} = 0

). The graphs are shown for up to 15,000 simplex gradient estimates.

Graphs in Figure 3 reveal a slight advantage in terms of speed for the optimized schema over the existing adaptive parameter schemas. The schemas perform similarly when only GH benchmarks are considered. Although the optimized schema can be declared as the fastest (solves the highest percentage of problems at almost any given

κ

), the remaining schemas quickly follow. All the adaptive schemas solve 100% of GH problems after approximately 2500 simplex gradient estimates. The CCS is the first to achieve this goal at ~2100 simplex gradient estimates. The original NMA, on the other hand, manages to be competitive with the adaptive schemas for the first 10% of problems. With the lower-dimensional problems solved, it starts to lag, finally solving less than 30% of GH problems.

The advantage of the optimized schema becomes notable when observing the MGH benchmark set. At ~1600 simplex gradient estimates, the optimized schema solves more than 90% of the problems while the remaining adaptive schemas solve up to 83%, and the original NMA merely 50% of the problems. It is clearly the fastest schema and has the highest final percentage of solved problems (98%). Other adaptive schemas solve up to 93%, and the original NMA solves 59% of the problems.

The optimized schema remains the fastest when all GH and MGH benchmark problems are considered. Its advantage over the existing adaptive schemas is 6% at 1000, 5% at 2000, and 3% at 3000 simplex gradient estimates. The original NMA manages to keep up for less than 20% of the problems at ~150 simplex gradient estimates. The final percentage of solved problems is 99% for the optimized schema, up to 97% for the remaining adaptive schemas, and 44% for the original NMA.

In Figure 4, the same measurement of convergence speed is repeated with tolerance based algorithm termination set to

T o l_{f} = T o l_{X} = 10^{- 4}

. The profiles are once more calculated for

τ = 10^{- 7}

. The maximum number of simplex gradient estimates

κ_{\max}

= 25,000 does not play any role in this experiment because none of the individual Nelder–Mead runs ever uses the entire budget of available simplex gradient estimates. The algorithm is always terminated earlier by the tolerance-based criterion. The graphs are shown for up to 8000 simplex gradient estimates. No further progress is made beyond that point.

With tolerance-based termination enabled, the optimized schema performs even better than the other schemas. When we consider only GH benchmarks, the schemas again perform similarly, although, in general, the optimized schema is still slightly faster. The CCS is the first that solves 100% of GH problems in ~2100 estimates. Other adaptive schemas quickly follow. The only notable alteration can be observed for the original NMA which now performs worse and solves less than 20% of GH problems. The original NMA manages to converge in an additional ~10% of GH problems when it is allowed to run beyond the tolerance-based stopping criterion.

The advantage of the optimized schema becomes apparent in MGH data profiles. The KSS, CCS, and CRS keep pace for up to ~370 simplex gradient estimates where ~70% of the MGH problems are solved. The GHS starts to lag at ~240 estimates with ~40% of the problems solved. It catches up at 3000 estimates, finally solving 70% of the MGH problems. KSS, the best of the existing adaptive schemas, ends at 71%. The optimized schema is clearly better with 80% of the problems solved in 730 estimates, ending with 82% at 1660 estimates. The original NMA again performs worse compared to the run without tolerance-based termination, ending with 23% of solved problems.

For all-inclusive GH and MGH benchmark data profiles, the optimized schema starts to stand out from the rest of the existing adaptive schemas at ~60% of solved problems, achieved within ~300 simplex gradient estimates. The optimized schema reaches its final result, i.e., 90% of solved problems, in 2400 estimates. Its advantage over the best existing adaptive schema, i.e., KSS, is 6%. The KSS comes to 84% in 7020 estimates. As expected, the original NMA ends at a modest 21% of solved problems within 1200 estimates.

Figure 5 shows our last convergence speed measurement on GH, MGH, and CUTEr benchmark problems, 169 problems in total. The data profiles are shown for the CUTEr benchmark set, and GH, MGH, and CUTEr benchmark sets combined. They are calculated at convergence condition tolerance

τ = 10^{- 7}

with

κ_{\max}

= 25,000 simplex gradient estimates limit. Cases without the tolerance-based algorithm termination (

T o l_{f} = T o l_{X} = 0

), and with it (

T o l_{f} = T o l_{X} = 10^{- 4}

), are shown. When the tolerance-based criterion is applied, the Nelder–Mead runs are always terminated before the limit of

κ_{\max}

simplex gradient estimates is reached.

Data profiles in Figure 5 confirm our previous observations. Although not meta-optimized on CUTEr benchmark problems, the optimized schema solves the highest percentage of the problems in all shown cases at any given

κ

. It makes a difference of 4 to 6% compared to the first follower at ~400 simplex gradient estimates when tolerance-based algorithm termination is applied, and at ~1100 estimates when it is not. The optimized schema reaches or comes close to its final result in ~4000 estimates. It solves 91 to 97% of problems, GHS 84 to 93%, CRS 84 to 93%, KSS 86 to 91%, CCS 81 to 88%, and the original NMA manages 13 to 36%.

Besides the 40 GH and 46 MGH benchmarks, the following problems are included in data profiles in Figure 5: BrownAlmostLinear with dimensions

n_{p} = {20, 30, 40, 50, 70, 100}

from MGH set, HilbertQuadratic with dimensions

n_{p} = {10, 30, 60, 90}

, OrenPower [34] with dimensions

n_{p} = {30, 50, 60, 70, 80, 90, 100}

, and ARWHEAD_100, DQDRTIC_50, DQDRTIC_100, SPARSINE_50, SPARSINE_100, CHNROSNB_25, CHNROSNB_50, SCOSINE_10, LIARWHD_100, FLETCHBV_100, DIXON3DQ_100, OSCIGRAD_25, OSCIGRAD_100, NONCVXUN_10, NONCVXUN_100, PENALTY1_50, PENALTY1_100, SINQUAD_50, SINQUAD_100, FLETCBV3_100, PENALTY2_100, TOINTGSS_50, TOINTGSS_100, ARGLINC_50, EXTROSNB_100, COSINE_100, TRIDIA_50, TRIDIA_100, NONDQUAR_100, QUARTC_25, QUARTC_100, FREUROTH_100, WATSON_31, ERRINROS_25, ERRINROS_50, NONDIA_20, NONDIA_30, NONDIA_50, NONDIA_90, NONDIA_100, MANCINO_20, MANCINO_30, DQRTIC_100, ENGVAL1_50, ENGVAL1_100, HILBERTA_10, FLETCBV2_100, TQUARTIC_10, EDENSCH_36, ARGLINA_50, ARGLINA_100, BOX_100, POWELLSG_36, POWELLSG_40, POWELLSG_60, POWELLSG_80, POWELLSG_100, POWER_75, POWER_100, HILBERTB_50, ARGLINB_50, MOREBV_50, BDQRTIC_100, SCURLY10_100, VAREIGVL_49, VAREIGVL_99 from CUTEr benchmark problem set.

Speed of convergence of a particular adaptive parameter schema is mirrored in the simplex’s best value descent during an individual Nelder–Mead run. The descent rate can be expressed by

cos θ

, where

θ

is an angle between the search direction

d

and the gradient of the objective function

\nabla f (x)

:

cos θ = \frac{d^{T} \nabla f (x)}{| d | | \nabla f (x) |} .

(11)

Search direction

d

is locally descending when

cos θ < 0

. The fastest descent is achieved at

cos θ = - 1

. According to the NMA definition, the search direction is

d = c (P_{n_{p}}

−

\bar{P})

, where c is the reflection (

α

), expansion (

β

), or contraction (

γ

) NMA parameter. The descent rate in a non-shrinking NMA iteration is therefore calculated as

cos θ = \frac{{(P_{n_{p}} - \bar{P})}^{T} \nabla f (P_{n_{p}})}{| P_{n_{p}} - \bar{P} | | \nabla f (P_{n_{p}}) |} .

(12)

The simplex’s best value descents and corresponding descent rates for

n_{p} =

100-dimensional GH benchmark problems are shown in Figure 6. The figure shows that all the existing schemas as well as the optimized adaptive schema manage to maintain some level of descent during the entire Nelder–Mead run. A higher descent rate, i.e., more negative

cos θ

, ensures faster objective function descent and fulfillment of the termination criteria. The optimized schema is the fastest or near fastest in all shown cases except for the

ϵ = 0.0

,

σ = 0.0001

case. This is partly reflected in the poorer descent rate of the optimized schema for that particular case.

The tolerance boundary intersections shown in Figure 6 are

t_{p s}

values from (6). Data profiles in Figure 3, Figure 4 and Figure 5 summarize tolerance boundary intersections over the entire benchmark problem set.

The absence of a sufficient descent rate is fatal for the original NMA. The original NMA manages some slow descent only in the

ϵ = 0.0

,

σ = 0.0

case. In all remaining cases,

cos θ

approaches

0^{\circ}

. Search direction

d

becomes orthogonal to the negative gradient which was first observed by Torczon [12]. As a consequence, the original NMA stops descending and does not achieve the convergence boundary.

In [4], the authors prove that NMA does not perform shrink iteration when the objective function is strictly convex. Furthermore, for a uniformly convex objective function, the descent rate is provided by expansion and contraction iterations [15], although the effect diminishes with problem dimension

n_{p}

. In other words, to maintain a sufficient descent rate, an adequate share of expansion and contraction iterations is needed. Note that a uniformly convex function is also strictly convex. Since the modified quadratic function (10) is uniformly convex, the above applies to the GH benchmark set. The share of non-reflection iterations, i.e., expansion and contraction iterations combined, is shown in Figure 7. In general, it declines with the problem dimension for all schemas and (

ϵ

,

σ

) pairs. Nevertheless, all the adaptive parameter schemas manage to keep the share above 5%, which provides a sufficient descent rate. The CCS stands out with its lowest non-reflection share above 35%, yet, such a high share is not reflected in better performance. The non-reflection share alone, therefore, does not guarantee high convergence speed. The lowest non-reflection share of the optimized schema is 12% at

n_{p} = 100

in the

ϵ = 0.0

,

σ = 0.0001

case. The original NMA’s share is, on the other hand, never greater than 26% (which in turn is achieved for lower-dimensional problems). With problem dimension increase, it quickly drops as low as to 0.56% in the worst case, which confirms the poor performance and convergence problems of the original NMA schema (1).

5. Conclusions

Adaptive parameter schemas address poor NMA performance on high-dimensional problems. We used a meta-optimization procedure to find a novel adaptive parameter schema presented in this paper. Although the meta-optimization problem seems simple, brute force optimization is not feasible due to the immense computing power required. To set up the problem, we defined a mathematical formulation of the adaptive parameter schema and an objective function evaluating a schema’s performance. We tuned the meta-optimization parameters in a series of shorter meta-optimization runs. The final settings constrain the meta-optimization parameter space, define a single NMA run termination criteria to evaluate an SC’s performance, limit the number of NMA iterations to catch non-convergent SCs, and limit the number of meta-optimization iterations. We used PSADE, a robust global parallel asynchronous method.

The performance of the proposed adaptive parameter schema is discussed and compared with the existing schemas. The share of non-reflection iterations and the descent rate do not show any significant deviation of the proposed schema from the existing ones. However, data profiles on GH modified quadratic, MGH, and CUTEr benchmark problems show that the proposed schema outperforms the existing ones in both accuracy and convergence speed. We performed the evaluation with and without tolerance-based termination of the NMA.

The proposed schema is a result of a meta-optimization procedure. We evaluate its performance but, on the other hand, provide no mathematical explanation for why the schema performs so well. The proposed schema is the global minimum determined by the schema’s mathematical formulation and meta-optimization objective function definition.

Author Contributions

Conceptualization, Á.B. and J.P.; methodology, Ž.R. and J.P.; software, J.O. and J.P.; validation, Á.B.; formal analysis, J.P.; investigation, Ž.R., J.O. and J.P.; data curation, J.P.; writing—original draft preparation, J.P.; writing—review and editing, Á.B. and J.P.; supervision, Á.B. and T.T.; project administration, T.T.; funding acquisition, T.T. All authors have read and agreed to the published version of the manuscript.

Funding

The authors acknowledge the financial support from the Slovenian Research Agency (research core funding No. P2-0246 ICT4QoL—Information and Communications Technologies for Quality of Life).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

NMA	Nelder–Mead Algorithm
PSADE	Parallel Simulated Annealing with Differential Evolution
SC	Schema Candidate
GH	Gao–Han
MGH	Moré–Garbow–Hilstrom
CUTEr	Constrained and Unconstrained Testing Environment, revisited
GHS	Gao–Han Schema
KSS	Kumar–Suri Schema
CCS	Chebyshev Crude Schema
CRS	Chebyshev Refined Schema
PyPI	Python Package Index
f	objective function
$n_{p}$	number of optimized variables or problem dimension
$P_{i}$ , $\bar{P}$	ith simplex vertex and centroid of simplex vertices
$P_{r}$ , $P_{e}$	reflected point and expanded point
$P_{r c}$ , $P_{n c}$	reflected point and worst point contracted towards centroid
$α$ , $β$ , $γ$ , $δ$	NMA reflection, expansion, contraction, and shrink parameters
$n_{c}$	CRS constant calculated from $n_{p}$
$x_{0}$ , $x$	starting point, an arbitrary point in $n_{p}$ -dimensional parameter space
$ϵ_{i}$ , $e_{i}$	ith component Pfeffer’s constant and unit vector
$T o l_{f}$ , $T o l_{X}$	simplex flatness and size tolerances
$P_{i j}$	jth component of ith simplex vertex
$c_{0 α}$ , $c_{1 α}$ , $c_{0 β}$ , $c_{1 β}$ , etc.	meta-optimization variables defining an SC
p, $P$	single benchmark problem and set of benchmark problems
s, $S$	single parameter schema and set of all parameter schemas
r, $R$	reference parameter schema and set of all reference parameter schemas
$X$	set of sets of GH and MGH benchmark problems
$κ$	number of simplex gradient estimates
$κ_{\max}$	$κ$ available for schema evaluation per single p
$t_{p s}$	number of objective function evaluations needed on problem p by schema s to satisfy (7)
$d_{s P} (κ)$	share of problems from set $P$ solved by schema s in $κ$ simplex gradient estimates
$f_{L}$	lowest objective function value reached in $κ_{\max}$ simplex gradient estimates by any of the schemas $s \in S$ on a particular problem
$τ$	convergence condition tolerance
$w_{P_{+}}$	weight used in meta-optimization objective function when at least one of the reference schemas outperforms the evaluated SC on set of benchmark problems $P$
$w_{P_{-}}$	weight used in meta-optimization objective function when the evaluated SC outperforms all the reference schemas on set of benchmark problems $P$

References

Nelder, J.A.; Mead, R. A simplex method for function minimization. Comput. J. 1965, 7, 308–313. [Google Scholar] [CrossRef]
McKinnon, K.I.M. Convergence of the Nelder-Mead simplex method to a non-stationary point. J. Optim. 1998, 9, 148–158. [Google Scholar]
Galántai, A. A convergence analysis of the Nelder-Mead simplex method. Acta Polytech. Hungarica 2021, 18, 93–105. [Google Scholar] [CrossRef]
Lagarias, J.C.; Reeds, J.A.; Wright, M.H.; Wright, P.E. Convergence properties of the Nelder-Mead simplex method in low dimensions. J. Optim. 1998, 9, 112–147. [Google Scholar] [CrossRef] [Green Version]
Lagarias, J.C.; Poonen, B.; Wright, M.H. Convergence of the restricted Nelder-Mead algorithm in two dimensions. J. Optim. 2012, 22, 501–532. [Google Scholar] [CrossRef] [Green Version]
Kelley, C.T. Detection and remediation of stagnation in the Nelder-Mead algorithm using a sufficient decrease condition. J. Optim. 1999, 10, 43–55. [Google Scholar] [CrossRef]
Tseng, P. Fortified-descent simplicial search method: A general approach. J. Optim. 1999, 10, 269–288. [Google Scholar] [CrossRef]
Nazareth, L.; Tseng, P. Gilding the lily: A variant of the Nelder-Mead algorithm based on golden-section search. Comput. Optim. Appl. 2002, 22, 133–144. [Google Scholar] [CrossRef]
Price, C.J.; Coope, I.D.; Byatt, D. A convergent variant of the Nelder-Mead algorithm. J. Optim. Theory. Appl. 2002, 113, 5–19. [Google Scholar] [CrossRef] [Green Version]
Bűrmen, Á.; Puhan, J.; Tuma, T. Grid restrained Nelder-Mead algorithm. Comput. Optim. Appl. 2006, 34, 359–375. [Google Scholar] [CrossRef] [Green Version]
Bűrmen, Á.; Tuma, T. Unconstrained derivative-free optimization by successive approximation. Comput. Appl. Math. 2009, 223, 62–74. [Google Scholar] [CrossRef] [Green Version]
Torczon, V.J. Multi-Directional Search: A Direct Search Algorithm for Parallel Machines. Ph.D. Thesis, Rice University, Houston, TX, USA, 1989. [Google Scholar]
Wright, M. Direct search methods: Once scorned, now respectable. In Proceedings of the 16th Dundee Biennial Conference in Numerical Analysis, Dundee, Scotland, 27–30 June 1996; pp. 191–208. [Google Scholar]
Han, L.; Neumann, M. Effect of dimensionality on the Nelder-Mead simplex method. Optim. Methods Softw. 2006, 21, 1–16. [Google Scholar] [CrossRef]
Gao, F.; Han, L. Implementing the Nelder-Mead simplex algorithm with adaptive parameters. Comput. Optim. Appl. 2012, 51, 259–277. [Google Scholar] [CrossRef]
Fajfar, I.; Puhan, J.; Bűrmen, Á. Evolving a Nelder-Mead algorithm for optimization with genetic programming. Evol. Comput. 2017, 25, 351–373. [Google Scholar] [CrossRef] [PubMed]
Musafer, H.A.; Mahmood, A. Dynamic Hassan–Nelder-Mead with simplex free selectivity for unconstrained optimization. IEEE Access 2018, 6, 39015–39026. [Google Scholar] [CrossRef]
Fajfar, I.; Bűrmen, Á.; Puhan, J. The Nelder-Mead simplex algorithm with perturbed centroid for high-dimensional function optimization. Optim. Lett. 2019, 13, 1011–1025. [Google Scholar] [CrossRef]
Kumar, G.N.S.; Suri, V.K. Multilevel Nelder-Mead’s simplex method. In Proceedings of the 2014 9th International Conference on Industrial and Information Systems (ICIIS), Gwalior, India, 15–17 December 2014; Volume 9, pp. 1–6. [Google Scholar]
Mehta, V.K. Improved Nelder-Mead algorithm in high dimensions with adaptive parameters based on Chebyshev spacing points. Eng. Optim. 2020, 52, 1814–1828. [Google Scholar] [CrossRef]
Olenšek, J.; Tuma, T.; Puhan, J.; Bűrmen, Á. A new asynchronous parallel global optimization method based on simulated annealing and differential evolution. Appl. Soft Comput. 2011, 11, 1481–1489. [Google Scholar] [CrossRef]
Moré, J.J.; Garbow, B.S.; Hilstrom, K.E. Testing unconstrained optimization software. ACM Trans. Math. Softw. 1981, 7, 17–41. [Google Scholar] [CrossRef]
Gould, N.I.M.; Orban, D.; Toint, P.L. CUTEr and SifDec: A constrained and unconstrained testing environment, revisited. ACM Trans. Math. Softw. 2003, 29, 373–394. [Google Scholar] [CrossRef] [Green Version]
Moré, J.J.; Wild, S.M. Benchmarking derivative-free optimization algorithms. SIAM J. Optim. 2009, 20, 172–191. [Google Scholar] [CrossRef] [Green Version]
Olenšek, J.; Bűrmen, Á.; Puhan, J.; Tuma, T. DESA: A new hybrid global optimization method and its application to analog integrated circuit sizing. J. Glob. Optim. 2009, 44, 53–77. [Google Scholar] [CrossRef]
Bűrmen, Á. PyOPUS-Simulation, Optimization, and Design. Available online: http://fides.fe.uni-lj.si/pyopus (accessed on 15 May 2022).
Bűrmen, B.; Locatelli, I.; Bűrmen, Á.; Bogataj, M.; Mrhar, A. Mathematical modeling of individual gastric emptying of pellets in the fed state. J. Drug. Deliv. Sci. Technol 2014, 24, 418–424. [Google Scholar] [CrossRef]
Rojec, Ž.; Olenšek, J.; Fajfar, I. Analog circuit topology representation for automated synthesis and optimization. Inf. MIDEM 2018, 48, 29–40. [Google Scholar]
Rojec, Ž.; Bűrmen, Á.; Fajfar, I. Analog circuit topology synthesis by means of evolutionary computation. Eng. Appl. Artif. Intell. 2019, 80, 48–65. [Google Scholar] [CrossRef]
Zamani, H.; Nadimi-Shahraki, M.H.; Gandomi, A.H. CCSA: Conscious Neighborhood-based Crow Search Algorithm for Solving Global Optimization Problems. Appl. Soft Comput. 2019, 85, 105583. [Google Scholar] [CrossRef]
Zamani, H.; Nadimi-Shahraki, M.H.; Gandomi, A.H. QANA: Quantum-based avian navigation optimizer algorithm. Eng. Appl. Artif. Intell. 2021, 104, 104314. [Google Scholar] [CrossRef]
Nadimi-Shahraki, M.H.; Zamani, H. DMDE: Diversity-maintained multi-trial vector differential evolution algorithm for non-decomposition large-scale global optimization. Expert Syst. Appl. 2022, 98, 116895. [Google Scholar] [CrossRef]
Zamani, H.; Nadimi-Shahraki, M.H.; Gandomi, A.H. Starling murmuration optimizer: A novel bio-inspired algorithm for global and engineering optimization. Comput. Methods Appl. Mech. Engrg. 2022, 392, 114616. [Google Scholar] [CrossRef]
Shanno, D.F.; Phua, K. Matrix conditioning and nonlinear optimization. Math. Program. 1978, 14, 149–160. [Google Scholar] [CrossRef]

Figure 1. Parameter schema functions for the original Nelder–Mead Algorithm (NMA), Gao–Han Schema (GHS), Kumar–Suri Schema (KSS), Chebyshev Crude Schema (CCS), and Chebyshev Refined Schema (CRS).

Figure 2. Comparison of NMA adaptive schema parameters.

Figure 3. Data profiles for GH and MGH benchmark sets, separated and combined. No tolerance-based algorithm termination was applied (

T o l_{f} = T o l_{X} = 0

).

Figure 3. Data profiles for GH and MGH benchmark sets, separated and combined. No tolerance-based algorithm termination was applied (

T o l_{f} = T o l_{X} = 0

).

Figure 4. Data profiles for GH and MGH benchmark sets, separated and combined. Tolerance-based algorithm termination was applied (

T o l_{f} = T o l_{X} = 10^{- 4}

).

Figure 4. Data profiles for GH and MGH benchmark sets, separated and combined. Tolerance-based algorithm termination was applied (

T o l_{f} = T o l_{X} = 10^{- 4}

).

Figure 5. Data profiles for Constrained and Unconstrained Testing Environment, revisited (CUTEr) benchmark set and GH, MGH, and CUTEr benchmark sets combined, without (

T o l_{f} = T o l_{X} = 0

) and with tolerance-based algorithm termination applied (

T o l_{f} = T o l_{X} = 10^{- 4}

).

Figure 5. Data profiles for Constrained and Unconstrained Testing Environment, revisited (CUTEr) benchmark set and GH, MGH, and CUTEr benchmark sets combined, without (

T o l_{f} = T o l_{X} = 0

) and with tolerance-based algorithm termination applied (

T o l_{f} = T o l_{X} = 10^{- 4}

).

Figure 6. Best objective function values (

f (P_{0})

) and corresponding descent rates (

cos θ

) during Nelder–Mead run for

n_{p} = 100

-dimensional GH benchmarks. Tolerance-based algorithm termination applied (

T o l_{f} = T o l_{X} = 10^{- 4}

). Black line represents convergence boundary (7) for

τ = 10^{- 7}

.

Figure 6. Best objective function values (

f (P_{0})

) and corresponding descent rates (

cos θ

) during Nelder–Mead run for

n_{p} = 100

-dimensional GH benchmarks. Tolerance-based algorithm termination applied (

T o l_{f} = T o l_{X} = 10^{- 4}

). Black line represents convergence boundary (7) for

τ = 10^{- 7}

.

Figure 7. Share of expansion and contraction iterations for GH benchmark problems.

Table 1. Accuracy of the original NMA, the existing adaptive schemas (GHS, KSS, CCS, CRS), and the optimized schema on Gao–Han (GH) modified quadratic benchmark problems.

	$n_{p}$	$f {(x)}_{NMA}$	$f {(x)}_{GHS}$	$f {(x)}_{KSS}$	$f {(x)}_{CCS}$	$f {(x)}_{CRS}$	$f {(x)}_{opt .}$
$ϵ =$ 0.0	10	3.5 × 10⁻³²³	0.0	0.0	3.5 × 10⁻³²³	0.0	0.0
$σ =$ 0.0	20	2 × 10⁻³²²	0.0	0.0	0.0	0.0	10⁻³²³
	30	1.14 × 10⁻¹¹	0.0	0.0	0.0	0.0	0.0
	40	2.03 × 10⁻⁴	0.0	0.0	0.0	0.0	0.0
	50	5.54 × 10⁻⁴	0.0	0.0	0.0	0.0	0.0
	60	1.38 × 10⁻⁵	0.0	0.0	5 × 10⁻³²⁴	0.0	5 × 10⁻³²⁴
	70	5.76 × 10⁻⁵	0.0	0.0	0.0	0.0	0.0
	80	4.87 × 10⁻⁶	5 × 10⁻³²³	0.0	0.0	0.0	0.0
	90	2.75 × 10⁻⁶	1.4 × 10⁻³²²	0.0	5 × 10⁻³²⁴	0.0	0.0
	100	3.19 × 10⁻⁶	6 × 10⁻³²³	0.0	2 × 10⁻³²³	0.0	0.0
$ϵ =$ 0.05	10	0.0	0.0	0.0	0.0	0.0	0.0
$σ =$ 0.0	20	6.23 × 10⁻³²²	0.0	0.0	0.0	5 × 10⁻³²⁴	0.0
	30	5.31 × 10⁻³	0.0	0.0	0.0	0.0	5 × 10⁻³²⁴
	40	1.32 × 10⁻²	0.0	0.0	0.0	0.0	0.0
	50	1.62 × 10⁻¹	0.0	0.0	0.0	0.0	0.0
	60	12.7	0.0	0.0	0.0	0.0	0.0
	70	8.24	2 × 10⁻³²³	0.0	0.0	0.0	0.0
	80	32.2	1.24 × 10⁻³²²	0.0	5 × 10⁻³²⁴	0.0	0.0
	90	3.77	5.4 × 10⁻³²³	5 × 10⁻³²⁴	5 × 10⁻³²⁴	0.0	0.0
	100	278	6 × 10⁻³²³	0.0	10⁻³²³	0.0	0.0
$ϵ =$ 0.0	10	0.0	0.0	0.0	0.0	0.0	0.0
$σ =$ 0.0001	20	2.05 × 10⁻³	0.0	0.0	0.0	0.0	0.0
	30	1.91 × 10⁻⁵	0.0	0.0	0.0	0.0	0.0
	40	16.7	0.0	0.0	0.0	0.0	0.0
	50	2.63	0.0	0.0	0.0	0.0	0.0
	60	10.9	0.0	0.0	10⁻³²³	0.0	0.0
	70	276	5 × 10⁻³²⁴	0.0	0.0	0.0	0.0
	80	292	8 × 10⁻³²³	0.0	0.0	0.0	0.0
	90	11.7	6 × 10⁻³²³	5 × 10⁻³²⁴	5 × 10⁻³²⁴	0.0	0.0
	100	48.7	7.4 × 10⁻³²³	0.0	10⁻³²³	0.0	0.0
$ϵ =$ 0.05	10	1.5 × 10⁻³²³	0.0	0.0	0.0	0.0	0.0
$σ =$ 0.0001	20	1.93 × 10⁻⁴	0.0	0.0	0.0	5 × 10⁻³²⁴	5 × 10⁻³²⁴
	30	1.12 × 10⁻²	0.0	0.0	0.0	0.0	5 × 10⁻³²⁴
	40	7.31 × 10⁻¹	0.0	0.0	0.0	0.0	0.0
	50	37.2	0.0	0.0	0.0	0.0	5 × 10⁻³²⁴
	60	179	0.0	0.0	0.0	0.0	0.0
	70	18.7	3 × 10⁻³²³	0.0	0.0	0.0	0.0
	80	16.4	7.4 × 10⁻³²³	0.0	5 × 10⁻³²⁴	0.0	0.0
	90	1480	1.3 × 10⁻³²²	1.5 × 10⁻³²³	0.0	0.0	0.0
	100	3802	5 × 10⁻³²³	0.0	10⁻³²³	0.0	0.0
accurate		7/40	40/40	40/40	40/40	40/40	40/40

Table 2. Accuracy of the original NMA, the existing adaptive schemas (GHS, KSS, CCS, CRS), and the optimized schema on Moré–Garbow–Hilstrom (MGH) benchmark problems.

Function	$n_{p}$	$f {(x)}_{NMA}$	$f {(x)}_{GHS}$	$f {(x)}_{KSS}$	$f {(x)}_{CCS}$	$f {(x)}_{CRS}$	$f {(x)}_{opt .}$
Extended	12	2.91 × 10⁻²⁸	2.35 × 10⁻²⁹	1.73 × 10⁻²⁸	4.64 × 10⁻²⁹	5.18 × 10⁻²⁹	6.88 × 10⁻²⁹
Rosenbrock	18	20.0	6.97 × 10⁻²⁹	6.51 × 10⁻²⁹	1.44 × 10⁻²⁸	5.66 × 10⁻²⁸	1.35 × 10⁻²⁸
	24	12.5	1.72 × 10⁻²⁸	2.58 × 10⁻²⁸	1.86 × 10⁻²⁸	3.72 × 10⁻²⁸	6.21 × 10⁻²⁸
	30	34.5	4.09 × 10⁻²⁸	6.70 × 10⁻²⁸	7.28 × 10⁻²⁸	3.70 × 10⁻²⁷	2.76 × 10⁻²⁸
	36	49.1	8.72 × 10⁻²⁸	6.64 × 10⁻²⁸	6.81 × 10⁻²⁸	4.77 × 10⁻²⁸	1.11 × 10⁻²⁷
Extended	12	8.34 × 10⁻⁵⁵	3.33 × 10⁻⁵⁷	7.09 × 10⁻⁵⁹	3.09 × 10⁻⁵⁷	1.07 × 10⁻⁵⁹	1.18 × 10⁻⁵⁸
Powell	24	1.33 × 10⁻⁹	1.83 × 10⁻⁵⁴	3.45 × 10⁻⁵⁶	5.37 × 10⁻⁵⁶	1.67 × 10⁻⁵³	3.39 × 10⁻⁵³
singular	40	1.69 × 10 $^{- 6}$	1.06 × 10⁻⁵⁰	2.34 × 10⁻⁵²	1.46 × 10⁻⁵²	5.22 × 10⁻⁵²	2.33 × 10⁻⁵³
	60	4.16 × 10⁻⁴	9.71 × 10 $^{- 6}$	3.43 × 10⁻⁵⁰	2.88 × 10⁻⁵²	1.28 × 10⁻³⁷	1.16 × 10⁻⁴⁶
Penalty I	10	7.57 × 10⁻⁵	7.09 × 10⁻⁵	7.09 × 10⁻⁵	7.60 × 10 $^{- 5}$	7.09 × 10⁻⁵	7.09 × 10⁻⁵
Penalty II	10	2.98 × 10⁻⁴	2.94 × 10⁻⁴	2.94 × 10⁻⁴	2.98 × 10⁻⁴	2.95 × 10⁻⁴	2.94 × 10⁻⁴
Variably	12	4.77	3.72 × 10⁻³⁰	1.47 × 10⁻²⁹	3.64 × 10⁻²⁹	2.30 × 10⁻²⁹	1.78 × 10⁻²⁹
dimensioned	18	4.22	8.96 × 10⁻³⁰	2.06 × 10⁻²⁹	1.52 × 10⁻²⁹	4.74 × 10⁻²⁹	4.25 × 10⁻²⁹
	24	11.5	8.22 × 10⁻²⁹	8.37 × 10⁻²⁹	7.52 × 10⁻²⁹	9.23 × 10⁻²⁹	2.27 × 10⁻²⁸
	30	40.5	8.08 × 10⁻²⁹	1.08 × 10⁻²⁸	1.06 × 10⁻²⁸	3.38 × 10⁻²⁸	4.49 × 10⁻²⁸
	36	60.1	4.21 × 10⁻²⁸	1.46 × 10⁻²⁸	8.82 × 10⁻²⁹	8.35 × 10⁻²⁸	7.60 × 10⁻²⁸
Trigonometric	10	2.80 × 10 $^{- 5}$	2.80 × 10⁻⁵	2.80 × 10⁻⁵	2.80 × 10⁻⁵	2.80 × 10⁻⁵	2.80 × 10⁻⁵
	20	1.35 × 10⁻⁶	1.35 × 10⁻⁶	6.03 × 10⁻⁶	6.86 × 10⁻⁶	1.35 × 10⁻⁶	1.35 × 10⁻⁶
	30	2.20 × 10⁻⁵	9.90 × 10⁻⁷	9.90 × 10⁻⁷	5.65 × 10⁻⁶	9.90 × 10⁻⁷	5.98 × 10⁻⁷
	40	1.41 × 10⁻⁵	1.55 × 10⁻⁶	3.95 × 10⁻⁶	1.68 × 10⁻⁷	5.58 × 10⁻⁷	1.55 × 10⁻⁶
	50	2.52 × 10⁻⁵	2.24 × 10⁻⁷	3.41 × 10⁻⁶	9.23 × 10⁻⁷	1.11 × 10⁻⁶	2.24 × 10⁻⁷
	60	3.87 × 10⁻⁵	8.68 × 10⁻⁷	7.57 × 10⁻⁷	7.57 × 10⁻⁷	1.27 × 10⁻⁶	4.62 × 10⁻⁷
Discrete	10	6.85 × 10⁻³²	3.03 × 10⁻³³	8.36 × 10⁻³²	2.20 × 10⁻³¹	3.07 × 10⁻³²	1.59 × 10⁻³²
boundary	20	4.69 × 10⁻³⁰	7.24 × 10⁻³²	2.51 × 10⁻³²	2.39 × 10⁻³²	1.05 × 10⁻³¹	3.92 × 10⁻³²
value	30	9.87 × 10⁻⁶	1.10 × 10⁻³¹	1.43 × 10⁻³¹	1.19 × 10⁻³¹	2.02 × 10⁻³¹	9.29 × 10⁻³²
	40	6.46 × 10⁻⁶	4.58 × 10⁻³¹	3.55 × 10⁻³¹	1.37 × 10⁻³¹	4.76 × 10⁻³¹	3.45 × 10⁻³¹
	50	5.72 × 10⁻⁶	6.02 × 10⁻³¹	5.70 × 10⁻³¹	2.84 × 10⁻³¹	5.35 × 10⁻³¹	4.35 × 10⁻³¹
	60	3.19 × 10⁻⁶	2.46 × 10⁻³⁰	8.09 × 10⁻³¹	6.64 × 10⁻³¹	1.11 × 10⁻³⁰	7.39 × 10⁻³¹
Discrete	10	1.91 × 10⁻³¹	4.24 × 10⁻³³	1.44 × 10⁻³²	2.27 × 10⁻³¹	2.56 × 10⁻³²	3.08 × 10⁻³³
integral	20	7.69 × 10⁻³⁰	4.62 × 10⁻³²	2.90 × 10⁻³²	6.27 × 10⁻³²	3.40 × 10⁻³²	2.37 × 10⁻³²
equation	30	7.11 × 10⁻⁴	2.22 × 10⁻³¹	2.50 × 10⁻³¹	3.45 × 10⁻³¹	8.55 × 10⁻³²	2.50 × 10⁻³¹
	40	3.63 × 10⁻⁴	3.82 × 10⁻³¹	3.07 × 10⁻³¹	3.21 × 10⁻³¹	4.25 × 10⁻³¹	3.04 × 10⁻³¹
	50	3.05 × 10⁻³	8.51 × 10⁻³¹	1.34 × 10⁻³⁰	5.95 × 10⁻³¹	1.47 × 10⁻³⁰	7.34 × 10⁻³¹
	60	4.46 × 10⁻⁴	2.24 × 10⁻³⁰	1.30 × 10⁻³⁰	1.59 × 10⁻³⁰	9.74 × 10⁻³¹	6.12 × 10⁻³¹
Broyden	10	3.99 × 10⁻³⁰	3.12 × 10⁻³⁰	7.31 × 10⁻³⁰	3.28 × 10⁻²⁹	2.92 × 10⁻³⁰	2.92 × 10⁻³⁰
tridiagonal	20	3.20 × 10⁻²⁶	1.63 × 10⁻²⁹	3.34 × 10⁻²⁹	6.15 × 10⁻²⁹	2.45 × 10⁻²⁹	3.15 × 10⁻²⁹
	30	4.70 × 10⁻²⁶	1.68 × 10⁻²⁸	1.19 × 10⁻²⁸	9.66 × 10⁻²⁹	6.50 × 10⁻²⁹	8.18 × 10⁻²⁹
	40	9.11 × 10⁻¹⁴	2.24 × 10⁻²⁸	6.73 × 10⁻²⁸	3.70 × 10⁻²⁸	2.45 × 10⁻²⁸	2.24 × 10⁻²⁸
	50	2.67 × 10⁻¹³	5.82 × 10⁻²⁸	6.98 × 10⁻²⁸	4.61 × 10⁻²⁸	6.86 × 10⁻²⁸	4.54 × 10⁻²⁸
	60	3.78 × 10⁻¹¹	9.12 × 10⁻²⁸	1.69 × 10⁻²⁷	1.01 × 10⁻²⁷	1.34 × 10⁻²⁷	1.10 × 10⁻²⁷
Broyden	10	4.18 × 10⁻²⁸	4.61 × 10⁻³⁰	4.81 × 10⁻³⁰	6.82 × 10⁻²⁹	7.43 × 10⁻³⁰	2.36 × 10⁻³⁰
banded	20	1.85 × 10⁻²⁶	2.63 × 10⁻²⁹	6.04 × 10⁻²⁹	7.47 × 10⁻²⁹	1.60 × 10⁻²⁸	4.77 × 10⁻²⁹
	30	12.2	2.25 × 10⁻²⁸	1.34 × 10⁻²⁸	2.08 × 10⁻²⁸	3.15 × 10⁻²⁸	2.89 × 10⁻²⁸
	40	2.02 × 10 $^{- 6}$	3.48 × 10⁻²⁸	7.41 × 10⁻²⁸	9.32 × 10⁻²⁸	1.34 × 10⁻²⁸	3.25 × 10⁻²⁸
	50	9.33 × 10 $^{- 5}$	6.08 × 10⁻²⁸	1.38 × 10⁻²⁷	6.78 × 10⁻²⁸	5.98 × 10⁻²⁸	1.04 × 10⁻²⁷
	60	5.13 × 10 $^{- 6}$	2.93 × 10⁻²⁷	3.44 × 10⁻²⁷	4.09 × 10⁻²⁷	7.98 × 10⁻²⁸	7.59 × 10⁻²⁸
accurate		15/46	40/46	40/46	39/46	39/46	42/46

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Rojec, Ž.; Tuma, T.; Olenšek, J.; Bűrmen, Á.; Puhan, J. Meta-Optimization of Dimension Adaptive Parameter Schema for Nelder–Mead Algorithm in High-Dimensional Problems. Mathematics 2022, 10, 2288. https://doi.org/10.3390/math10132288

AMA Style

Rojec Ž, Tuma T, Olenšek J, Bűrmen Á, Puhan J. Meta-Optimization of Dimension Adaptive Parameter Schema for Nelder–Mead Algorithm in High-Dimensional Problems. Mathematics. 2022; 10(13):2288. https://doi.org/10.3390/math10132288

Chicago/Turabian Style

Rojec, Žiga, Tadej Tuma, Jernej Olenšek, Árpád Bűrmen, and Janez Puhan. 2022. "Meta-Optimization of Dimension Adaptive Parameter Schema for Nelder–Mead Algorithm in High-Dimensional Problems" Mathematics 10, no. 13: 2288. https://doi.org/10.3390/math10132288

APA Style

Rojec, Ž., Tuma, T., Olenšek, J., Bűrmen, Á., & Puhan, J. (2022). Meta-Optimization of Dimension Adaptive Parameter Schema for Nelder–Mead Algorithm in High-Dimensional Problems. Mathematics, 10(13), 2288. https://doi.org/10.3390/math10132288

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Meta-Optimization of Dimension Adaptive Parameter Schema for Nelder–Mead Algorithm in High-Dimensional Problems

Abstract

1. Introduction

2. Adaptive Parameter Schemas for NMA

3. Optimization of the Adaptive Parameter Schema

4. Evaluation of the Optimized Schema with Discussion

5. Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI