A Direct Search Algorithm for Global Optimization

A direct search algorithm is proposed for minimizing an arbitrary real valued function. The algorithm uses a new function transformation and three simplex-based operations. The function transformation provides global exploration features, while the simplex-based operations guarantees the termination of the algorithm and provides global convergence to a stationary point if the cost function is differentiable and its gradient is Lipschitz continuous. The algorithm’s performance has been extensively tested using benchmark functions and compared to some well-known global optimization algorithms. The results of the computational study show that the algorithm combines both simplicity and efficiency and is competitive with the heuristics-based strategies presently used for global optimization.


Introduction
An optimization problem consists of finding an element of a given set that minimizes (or maximizes) a certain value associated with each element of the set.In this paper, we are interested in minimizing the value of a real valued function defined over the n-dimensional Euclidean space.Without loss of generality, our problem is equivalent to minimizing a real valued function over the open unit n-box (0, 1) n ⊂ R n .
Let f denote a real function defined over the open unit n-box (0, 1) n ⊂ R n , and consider the following optimization problem: The function to be minimized f is called the cost function and the unit n-box (0, 1) n is the search space of the problem.Our aim is to obtain a point x * ∈ (0, 1) n , such that the cost function f attains a minimum at x * , i.e., f (x * ) ≤ f (x), ∀x ∈ (0, 1) n .We shall make the following assumption: A1 The cost function f attains its minimum at a point of the search space.
If the function f is continuously differentiable in (0, 1) n , then: Many optimization methods make use of the stationarity Equation (2) to compute candidate optimizer points.These methods require the explicit knowledge of the gradient, but they do not guarantee that the point obtained is an optimizer, except for a pseudoconvex function.
We shall also make an additional assumption that will restrict our alternatives to design a procedure to compute the optimum.

A2
The gradient of the cost function is not available for the optimization mechanism.
Consequently, only function evaluations can be used to compute the minimum.
Remark 1.The problem Equation ( 1) is equivalent to unconstrained optimization.Consider the following unconstrained optimization problem: and let ξ = ξ 1 ξ 2 • • • ξ n T be the vector of variables of the cost function.The invertible transformation: transforms the real line into the unit interval.Consequently, we can convert an unconstrained minimization problem into an open unit n-box constrained minimization problem by transforming each of its variables.Similarly, the linear transformation: The rest of this paper is organized as follows.Section 2 is a review of the literature on direct search and global optimization algorithms.The direct search algorithm for global optimization is designed in Section 3. The algorithm makes use of a set of simplex-based operations for a transformed cost function.We begin by reviewing some basic results about n-dimensional simplices that will be used later to define the algorithm and study its properties.Then, we will introduce a transformation of the cost function that is crucial to improving the exploratory properties of the algorithm.Finally, we conclude this section by explaining the algorithm in detail and analyzing its convergence properties.In Section 4, an experimental study of the algorithm's performance is accomplished by using three well-known test functions.Its performance is also compared to other global optimization strategies.The conclusions are presented in Section 5.

Direct Search Methods and Global Optimization
A direct search method solves optimization problems without requiring any information about the gradient of the cost function [1,2].Unlike many traditional optimization algorithms that use information about the gradient or higher derivatives to search for an optimal point, a direct search algorithm searches a set of points around the current point, looking for one where the value of the cost function is lower than the value at the current point.Direct search methods solve problems for which the cost function is not differentiable or not even continuous.Direct search optimization has been receiving increasing attention over recent years within the optimization community, including the establishment of solid mathematical foundations for many of the methods considered in practice [1,[3][4][5][6][7].These techniques do not attempt to compute approximations to the gradient, but rather, the improvement is derived from a model of the cost function or using geometric strategies.There exists a great number of problems in different fields, such as engineering, mathematics, physics, chemistry, economics, medicine, computer science or operational research, where derivatives are not available, but there is a need for optimization.Some relevant examples are: tuning of algorithmic parameters, automatic error analysis, structural design, circuit design, molecular geometry or dynamic pricing.See Chapter 1 of [6] for a detailed description of the applications.
Initially, direct search methods have been dismissed for several reasons [8].Some of them are the following: they are based on heuristics, but not on rigorous mathematical theory; they are slow to converge and only appropriate for small-sized problems; and finally, there were no mathematical analysis tools to accompany them.However, these drawbacks can be refuted today.Regarding slow convergence, in practice, one only needs improvement rather than optimality.In such a case, direct methods can be a good option because they can obtain an improved solution even faster than local methods that require gradient information.In addition, direct search methods are straightforward to implement.If we not only consider the elapsed time for a computer to run, but the total time needed to formulate the problem, program the algorithm and obtain an answer, then a direct search is also a compelling alternative.Regarding the problem size, it is usually considered that direct search methods are best suited for problems with a small number of variables.However, they have been successfully applied to problems with a few hundred variables.Finally, regarding analytical tools, several convergence analysis studies for a direct search algorithm were published starting in the early 1970s [9][10][11][12].These results prove that it is possible to provide rigorous guarantees of convergence for a large number of direct search methods [4], even in the presence of heuristics.Consequently, direct search methods are regarded today as a respectable choice, and sometimes the only option, for solving certain classes of difficult and practical optimization problems, and they should not be dismissed as a compelling alternative in these cases.Direct search methods are broadly classified into local and global optimization approaches.

Local Optimization Approaches
Since there does not exist a suitable algorithmic characterization of global optima, most optimizers are based on local optimization.The most important derivative-free local based methods are pattern search methods and model-based search methods.Pattern search methods perform local exploration of the cost function on a pattern (a set of points conveniently selected).The Hooke-Jeeves [13] method and the Nelder-Mead simplex method [14] are two well-known examples of pattern search local methods.
The method of Hooke and Jeeves consists of a sequence of exploratory moves from a base point in the coordinate directions, which, if successful, are followed by pattern moves.The purpose of an exploratory move is to acquire information about the cost function in the neighborhood of the current base point.A pattern move attempts to speed up the search by using the information already acquired about the function to identify the best search direction.
The Nelder-Mead simplex algorithm belongs to the class of simplex-based direct search methods introduced by Spendley et al. in [15].This class of methods evolves a pattern set of n + 1 vectors on an n-dimensional search space that is interpreted as the vertex set of a simplex, i.e., a convex n-dimensional polytope.Different simplex-based methods use different operators to evolve the vertex set.The Nelder-Mead algorithm starts by ordering the vertices of the simplex and uses five operators: reflection, expansion, outer contraction, inner contraction and shrinkage.At each iteration, only one of these operations is performed.As a result, a new simplex is obtained such that either contains a better vertex (a vertex that is better than the worst one and substitutes it) or has a smaller volume.From the point of view of the function value, every operator, except for the shrinkage, is productive and obtains a vertex that improves the worst one.The shrinkage operator reduces the volume of the simplex to guarantee algorithm termination.Unfortunately, the convergence of the Nelder-Mead simplex algorithm to a local minimizer is not guaranteed [16], and a number of modifications have been introduced to get convergence.Nevertheless, the Nelder-Mead simplex algorithm remains as one of the most popular direct search algorithms.
Model-based search methods are based on the idea of using function evaluations to compute a model of the cost function and to obtain derivative approximations from this model.Many model-based methods are trust-region algorithms [17] that interpolate the cost function in some region of appropriate shape and size to obtain a good approximation of the cost function that can later be easily optimized on that region.Typically, the model to be adjusted is a fixed-order polynomial, and the trust region is a sphere in some norm of the search space.The model function is sequentially optimized in the trust region, and the trust region is updated for the next iteration until the approximated model satisfies a first-order optimality condition.Model-based methods also include [18][19][20].During the last decade, there has been a considerable amount of work to handle constraints in the field of direct search methods.New algorithms have been developed to deal with bounds and linear inequalities [21,22] smooth nonlinear constraints [23,24] or non-smooth constraints [25][26][27].In this work, we do not deal with constraints.Our aim here is to develop a simple and efficient algorithm that improves the heuristics-based optimization methods that are presently used to obtain the global minimum in a multidimensional and multimodal continuous function.

Global Optimization Approaches
Local optimization methods are the best option for convex, or at least unimodal, optimization problems.However, most real-world problems are not convex, and the solution provided by a local method is not guaranteed to be global.Unfortunately, even if derivatives are available, it is not possible in general to prove that a local optimum is global.A rigorous global optimization algorithm for a multimodal cost function requires in-depth exploration of the search space, and this is only possible for functions with a small number of variables, because the computational time for space exploration increases exponentially with the dimension of the search space.
In recent years, there has been a great interest in derivative-free methods for global optimization; see the monographs [42][43][44][45][46] and the references therein.In particular, there are two classes of algorithms that are of special interest.The first class is Lipschitz optimization methods.Lipschitz methods substitute the cost function by an auxiliary function that underestimates it.The auxiliary function is linear piecewise and is obtained by partitioning the search space and repeatedly using the Lipschitz property [47][48][49][50].An interesting Lipschitz method is DIRECT (DIviding RECTangles), that was introduced in [51].In this method, the cost function is evaluated at several sample points by using all possible weights on local versus global search expressed by the Lipschitz constant.The DIRECT algorithm has gained popularity due to its simplicity.However, it has two weaknesses.First, it is slow in reaching the solution with high accuracy, and second, it spends an excessive number of evaluations exploring local minima.Some variants have been developed to improve these weaknesses [52][53][54][55].
Another class of global optimization methods uses a filled function to avoid local minima.The basic idea is to design an auxiliary function, called the filled function, such that when the algorithm reaches a local minimum, the minimization of the filled function provides a point that moves away from the basin of the local minimum.The filled function method was introduced by Ge [56,57].It has had increasing interest in filled methods in the last few years [58][59][60].

A Direct Search Algorithm for Global Optimization
The direct search algorithm for global optimization proposed here has two main ingredients: a transformed cost function and a simplex-based mechanism to evolve the initial point.The transformed cost function improves global exploration, while the evolution mechanism guarantees the termination and global convergence to a stationary point.To the best of our knowledge, the transformed cost function has not been previously used in the literature, and it is a key ingredient of our approach, because it provides global exploration of the search space.The new algorithm is an efficient alternative to heuristics-based optimization methods for global optimization and preserves the convergence properties of a well-designed direct search method.
The algorithm makes use of an initial point that is allocated as a qualified vertex of an n-dimensional initial simplex.This simplex is evolved by a sequence of operations that depend on the value of the transformed cost function at each vertex.Each operation produces a similar simplex of a different size.The sequence of operations is designed in such a way that the simplex tends to asymptotically collapse in a single point.The algorithm terminates when the simplex vertices are close enough to each other.
Before explaining the algorithm and its operators in detail, we shall review some basic geometric results about n-simplices.Later, we shall introduce the cost function transformation and its main properties.

Notation
A brief summary of the notation used in this paper is included next for quick reference.
Set of integer numbers in interval I: Z I := {z ∈ Z : z ∈ I}.Closed convex hull of a set A: Co(A).Difference set A\B = {x ∈ A : x ∈ B}.Larger integer less than or equal to x: x .Inner product of Euclidean space: x, y = x T y. p-norm of Euclidean space:

Review of the Basic Geometric Results about n-Simplices
Let V = {x i : i ∈ Z [0,n] } be a set of n + 1 affinely-independent points in R n , then the closed convex hull of V is an n-simplex Σ with vertex set V, and it is denoted as follows: Next, we introduce certain specific types of n-simplices that are non-standard in the literature.
Definition 1 (Right n-simplex).A right n-simplex has a vertex, such that the n edges intersecting at it are pairwise orthogonal.This vertex is called the right vertex.

Definition 2 (Isosceles right n-simplex).
A right n-simplex, such that every edge intersecting at the right vertex has the same length ∆, is an isosceles right n-simplex of length ∆.
Definition 3 (Standard n-simplex).The standard n-simplex is an isosceles right n-simplex of unit length.By locating the zero of R n in the right vertex, the standard n-simplex } where b 0 is the zero vector, and b i is the i-th element of the standard basis of R n .Alternatively, the standard n-simplex can be defined as the intersection of the closed unit ball in the one-norm of R n and the non-negative orthant, i.e., The content of an n-dimensional object is the measure of the amount of space inside its boundary.For instance, the content of a two-simplex is its area, and the content of a three-simplex is its volume.It can be proven by induction that the content of a standard n-simplex is 1 n! .A regular n-box is an n-dimensional box with edges of equal length.If the content of an n-simplex is not zero, then a geometric object of nonzero content can fit in its interior.The following lemma provides the edge length of the maximum regular n-box that can fit in the interior of a standard n-simplex.
Lemma 1.The regular n-box of maximum size that can be contained inside a standard n-simplex has edge length 1  n and content 1 n n .
Proof.The standard n-simplex is given as The maximum n-box that can be allocated inside the standard n-simplex has a vertex in the origin, and it is oriented along the coordinate directions.The opposite vertex to the origin of this regular n-box has coordinates x i = 1 n for i ∈ Z [1,n] ; therefore, its edge length is 1  n , and its content is The edges that intersect at the right vertex of a standard n-simplex are oriented along the coordinate directions, so these edge directions define a nonnegative orthant.Any direction inside this orthant can be expressed by a nonnegative n-dimensional vector of unit norm, i.e., d, d = 1.A direction d forms angles θ i with the coordinate directions whose cosines are called the directional cosines and are given by cos θ i = d, b i .The following lemma proves that any direction d contained in the nonnegative orthant has at least one directional cosine that never is less than 1 √ n .Lemma 2. The smallest directional cosine of any direction contained in the orthant formed by the edges intersecting at the right vertex of a standard n-simplex is never less than 1 √ n .
Proof.Without loss of generality, suppose that the right vertex is the zero point and that the edges are in the coordinate directions, characterized by the elements of the standard basis of R n , i.e., b i for i ∈ Z [0,n] .

Consider the direction d
. Suppose that unlike the lemma statement, there exists a direction d with unit norm d, d = 1, such that cos 1 n = 1.This contradicts the fact that d has unit norm, and the lemma is proven.
A consequence of this lemma is that any direction contained in the nonnegative n-orthant always forms an angle less than π 2 radians with at least one of the coordinate directions that form the orthant.In addition, if we consider an arbitrary direction d ∈ R n , then there always exists an n-simplex similar to the standard n-simplex, such that the edge directions of its right vertex define an orthant, such that the smallest directional cosine is never less than 1 √ n .This n-simplex is obtained by rotating π radians the edges of the standard n simplex corresponding to the negative components of the direction vector d.

Function Transformation
A key ingredient of our direct search global optimization algorithm is a transformation of the cost function that improves its exploratory features, providing the capability of jumping out of local minimizers, while preserving termination and convergence properties.
Let x ∈ R n be an n-dimensional vector; x denotes the n-dimensional vector whose components are the greatest integers less than or equal to the components of x.The fractional part of x is denoted as frac(x) and is defined as follows: Let ξ be an arbitrary n-dimensional vector of real numbers and P a positive integer.Let us introduce the following map: This map is onto and transforms the n-dimensional Euclidean space R n in the n-dimensional unit interval (0, 1) n .Let f be the cost function of the minimization problem given in Equation ( 1) and introduce the following function transformation: where P is a positive integer.Since the cost function f is not necessarily defined outside of the open unit n-box, the domain of the transformed cost function φ is: The reason is that if Pξ ∈ Z n , then frac(Pξ) = 0, and f is not necessarily defined for ξ = 0.The transformed function φ has certain interesting properties.The most important are the periodicity of period P −1 and piecewise continuity, both on each variable.Lemma 3. The transformed function φ(ξ) = f (frac(Pξ)) satisfies the following properties: The function φ is continuous for any ξ ∈ R n , such that Pξ ∈ (0, 1) n .(ii) Let ξ, η ∈ R n be such that Pξ ∈ (0, 1) n and Pη ∈ Z n , then φ(ξ + η) = φ(ξ).
Proof.(i) If Pξ ∈ (0, 1) n , then frac(Pξ) = Pξ and φ(ξ) = f (Pξ).Since f is continuous for any x ∈ (0, 1) n , then φ is continuous for any ξ ∈ R n , such that Pξ ∈ (0, 1) n ; (ii) If ξ and η are n-dimensional vectors, such that Pξ ∈ (0, 1) n and Pη ∈ Z n , respectively, then frac(Pη) = 0 and frac(P(ξ Let x ∈ R n and B P (x) denote the following n-box: The transformed function φ is continuous on B P (x), and there always exists ξ * ∈ B P (x), such that φ(ξ * ) = min{φ(ξ) : ξ ∈ dom φ}.Note that the n-box B(x) is given by B P (x) = {ξ = P −1 (z + Px ) : z ∈ (0, 1) n }.If x = 0, then B P (0) is the fundamental n-box of the function φ, i.e., B P (0) = (0, P −1 ) n .For another x ∈ R n , B P (x) is an n-box of length P −1 , where the function φ takes the same values as in the fundamental n-box B P (0).Note that there are P n different n-boxes for x ∈ (0, 1) n ; then, the function φ repeats P n times in the unit n-box (0, 1) n .∈ Z} for x ∈ R, but discontinuous at the boundary points of these intervals, which are given by ξ = 0.1k for any integer k.This discontinuity always occurs if f (0 + ) = f (1 − ) for any > 0, as occurs in this example and which is clearly visible in Figure 1.Moreover, the function φ is not necessarily defined for these points of discontinuity.
Consider the unconstrained optimization problem: The function φ has a global optimizer at ξ * if φ(ξ * ) ≤ φ(ξ) for all ξ ∈ dom φ.The following theorem proves that the transformed unconstrained minimization problem Equation ( 12) also provides the minimum value of the original optimization problem Equation (1).
Theorem 1 allows us to obtain a global minimizer of the original function f on the open unit n-box by solving the unconstrained optimization problem (12).If we have a method that computes a global minimizer ξ * of the unconstrained optimization problem (12), then x * = frac(Pξ * ) ∈ (0, 1) n , and it satisfies φ(ξ * ) = f (x * ).
The following lemma proves that any regular n-box of a size greater than or equal to P −1 always contains a point ξ * that attains the global minimum of the unconstrained optimization problem (12).Lemma 4. Let B be a regular n-box of edge length ∆ B , such that P∆ B ≥ 1, and let λ * = min{ f (x) : x ∈ (0, 1)}; then, the n-box B always contains a point ξ * ∈ R n , such that φ(ξ * ) = λ * .
Proof.Let x * ∈ (0, 1) n be such that f (x * ) = λ * ≤ f (x) for any x ∈ (0, 1) n , and let ξ ∈ B, such that P ξ = P ξ + x * .Consider the open n-box B P ( ξ) = {ξ ∈ R n : Pξ = P ξ , Pξ ∈ Z n }, then B P ( ξ) has edge length P −1 , and the point ξ We are also interested in determining the size of an n-simplex similar to the standard n-simplex that always contains a global minimizer of {φ(ξ) : ξ ∈ dom φ}.Since Lemma 4 establishes that such a point is always contained in a regular n-box of length not less than P −1 , then the solution is an isosceles right n-simplex that contains a regular n-box of length P −1 .From Lemma 1, this n-simplex has edge length nP −1 and content P −n /(n − 1)!.The following lemma states the result.Theorem 2. Let Σ be an isosceles right n-simplex of length ∆, such that P∆ ≥ n, and let λ * = min{ f (x) : x ∈ (0, 1) n }; then, the n-simplex Σ always contains a point ξ * ∈ dom φ, such that f (frac(ξ * )) = λ * .
Proof.A regular n-box of edge length ∆ B = ∆/n can be inscribed inside any isosceles right n-simplex of length ∆.Since P∆ ≥ n, then P∆ B ≥ 1, and the theorem statement is a straightforward consequence of Lemma 4.
In the rest of this section, we shall design a direct search algorithm to obtain a solution for the optimization problem (1).We distinguish two algorithms, the basic algorithm and the complete algorithm.The basic algorithm is a simplex-based algorithm for the transformed cost function.An initial point is embedded at the right vertex of an isosceles right n-simplex of edge length ∆ 0 .This n-simplex evolves by a sequence of operations.As a result of each iteration, another n-simplex is obtained that is similar to the standard n-simplex and has the point with minimum value of the transformed cost function located in the right vertex.The edge length of the n-simplex decreases by a constant factor if no operations performed during the iteration produce a vertex with a smaller value of the transformed cost function.Whenever the edge length is no less than nP −1 , we say that the algorithm is in exploratory phase, because the n-simplex always contains a global minimizer of {φ(ξ) : ξ ∈ dom φ}.When the edge length of the n-simplex is less than nP −1 , then we say that the algorithm is in convergence phase, and it aims to approach a stationary point.Thus, the basic algorithm is designed to preserve the good properties of direct search algorithms, such as convergence to a stationary point, but increases the possibility to reach the global optimum at the end of the exploratory phase.Another feature of the basic algorithm is that there is no interruption or change from one phase to the other, it is a straightforward consequence of using the transformed cost function.

The Basic Algorithm
The basic algorithm is a simplex-based algorithm for the transformed optimization problem Equation (12).It performs a sequence of operations over an initial simplex until a termination condition is satisfied.The main operators of this algorithm are expansive translation, rotation and shrinkage.The expansive translation is performed after a sufficient decrease of the transformed cost function to ensure that the best point is located at the right vertex.The rotation operation provides exploration of the search space, and the shrinkage operator guarantees termination and global convergence to a stationary point of the transformed cost function.The global improvement is a consequence of using the transformed cost function Equation (8).A sufficient decrease condition of the cost function to accept a successful iteration is a key aspect to prove the termination and global convergence to a stationary point.The main components of the basic algorithm are explained below.

Function Transformation
The cost function transformation is given by: This transformation aims to perform global improvement at each iteration k ∈ Z [0,∞) whenever the n-simplex has an edge length, such that P∆ k ≥ n.

Initial Simplex
Let vset(ξ, ∆) where ξ ∈ R n and ∆ ∈ R be defined as follows: vset(ξ, ∆) := {ξ i : where b 0 is an n-dimensional zero vector and b i for i ∈ Z [1,n] is the i-th element of the standard basis of R n .The initial n-simplex is obtained by varying an initial point at a distance ∆ 0 in each coordinate direction, such that P∆ 0 > n.Let ξ 0 ∈ (0, 1) n be an initial point; then, the vertex set of the initial simplex is given by: If the initial point ξ 0 is not provided as an input of the algorithm, then it can be randomly chosen.Consequently, the initial simplex is an isosceles right n-simplex with edge length ∆ 0 and content 1  n! ∆ n 0 whose right vertex is indexed by zero.

Expansive Translation
This operation aims to position the point with lowest value of the transformed cost function found so far at the right vertex.Let V be the vertex set of an isosceles right n-simplex with edge length ∆, and let ξ i ∈ V be such that φ(ξ i ) ≤ φ(ξ j ) for ξ j ∈ V; then, the expansive translation of the right n-simplex V with respect to the vertex i is given by the following expression: where 1 ≤ ρ < ∞.If the vertex set of the current simplex is given by V = vset(ξ 0 , ∆), then the transformed n-simplex has the vertex set V = vset(ξ j , ∆ ), where ∆ = ρ∆, and its edge length is |∆ |.
The new n-simplex, after the expansive translation, preserves its shape, but increases its content by a factor of ρ n .From a practical point of view, it is convenient to choose the expansive factor ρ close to one, but slightly greater, because this avoids obtaining candidate points that were already visited in previous iterations, in the next rotation operation.Let c be an arbitrary positive constant; if φ(ξ i ) < φ(ξ 0 ) − c∆ 2 for some ξ i ∈ V }, then the algorithm proceeds to a new iteration and performs an expansive translation operation.Otherwise, the algorithm executes a rotation operation.
The introduction of the positive term c∆ 2 guarantees that the iteration is considered successful only if it provides a point with a sufficient decrease of the cost function.

Rotation
This operation produces a similar n-simplex, but where each edge is rotated π radians with respect to the right vertex.A graphical interpretation is depicted in Figure 2.
Graphical representation of the simplex translation, rotation and shrinking simplex operations for three points in R 2 .The initial vertex set is V = {ξ 0 , ξ i , ξ j }, and the final vertex set is V = {ξ 0 , ξ i , ξ j }.
The three operations produce similar simplices.
Let V denote the vertex set of an n-simplex.The vertices of the rotated n simplex are given by the following expression: If the vertex set of the current simplex is given by V = vset(ξ 0 , ∆), then the transformed n-simplex has the vertex set V = vset(ξ 0 , ∆ ), where ∆ = −∆, and its edge length is |∆ |.If φ(ξ i ) < φ(ξ 0 ) − c∆ 2 for some ξ i ∈ V , then the rotation operation succeeds, and the algorithm proceeds to start a new iteration and performs an expansive translation operation.Otherwise, the rotation operation fails; the rotated vertex set is discarded; and the algorithm executes a shrinkage operation with the original vertex set V.

Shrinkage
This operation performs a contraction of the simplex by fixing the right vertex and placing the remaining vertices along the same directions, but at a distance that is reduced by a constant factor σ ∈ (0, 1).A graphical interpretation of the shrinkage operation is depicted in Figure 2 for R 2 .
Let V denote the vertex set of an n-simplex.The vertices of the transformed n-simplex are given by: If the vertex set of the current simplex is given by V = vset(ξ 0 , ∆), then the transformed n-simplex has the vertex set V = vset(ξ 0 , ∆ ), where ∆ = σ∆, and its edge length is |∆ |.After the shrinkage operation, the algorithm proceeds to start a new iteration by performing an expansive translation operation.

Sufficient Decrease Condition
The expansive translation and rotation operations are considered successful if some vertex of the transformed n-simplex satisfies the following sufficient decrease condition: where c is a positive constant.This condition is key to proving the termination of the basic algorithm and global convergence to a stationary point.The constant c is arbitrary, but usually, a small value is chosen.

Stopping Criterion
The sufficient decrease condition and the shrinkage operation guarantee that the size of the n-simplex converges to zero.Consequently, a stopping criterion can be defined by taking a sufficiently small number > 0. Let |∆| be the edge length of the n-simplex; then, the basic algorithm stops when the following condition is satisfied:

Space Transformation
The search space transformation is given by the map: Therefore, when the n-simplex with vertex set V reaches edge length ∆ ≤ , the basic algorithm stops, and the point obtained is x = frac(P ξ), where ξ ∈ arg min{ξ : ξ ∈ V }; and it attains the function value f ( x) = φ( ξ).

Implementation
An implementation of the basic algorithm in pseudocode is given in Algorithm 1.

Convergence of the Basic Algorithm
The basic algorithm has been designed to provide termination and improvement by performing a sequence of simplex transformations.If in addition, the cost function is differentiable and its gradient is Lipschitz continuous, then the basic algorithm converges to a stationary point.Theorem 3. Let f be the cost function of a minimization problem on the open unit n-box; then the basic algorithm always terminates.If, in addition, f is differentiable and its gradient ∆ f is Lipschitz continuous on the unit n-box, then the sequence of points generated at each iteration by the basic algorithm converges to a stationary point.
Proof.Let S and U denote the index sets of successful and unsuccessful iterations of the basic algorithm, respectively.Let k and ∆ k+1 = σ∆ k .Let us begin by proving that lim k→∞ |∆ k | = 0. Suppose not; then, for any k ∈ Z [0,∞] , there exists ≥ k, such that ∈ S because, if such an does not exist, then the set of successful iterations would be finite, and when k approaches infinity, the update rule ∆ k+1 = σ∆ k yields lim k→∞ |∆ k | = 0.In addition, there exists ∆ > 0, such that for each successful iteration ∈ S, |∆ | ≥ ∆ and: However, since approaches infinity, this implies that f (x +1 ) → −∞, which contradicts the fact that the function f is bounded below on (0, 1) n .Consequently, the boundedness of f implies that ∆ = 0 and lim k→∞ |∆ k | = 0.This proves the termination of the basic algorithm.
For the second part of the theorem, if f is continuous and differentiable on (0, 1) n , then so is φ(ξ) = f (frac(Pξ)) on the n-box B P (x) = {ξ ∈ R n : Pξ = Px , Pξ ∈ Z n } for any x ∈ R n .Furthermore, if ∇ f is Lipschitz continuous with constant M on (0, 1) n , then so is ∇φ in B P (x) with constant PM.Since we proved that lim k→∞ |∆ k | = 0, then there always exists a positive integer L, such that Pξ k = Pξ L for any positive integer k > L. The basic algorithm searches points along the coordinate directions; then, from Lemma 2 and for any positive integer k > L, no matter the value of ∇φ(ξ k ), there is always at least one (positive or negative) coordinate direction because k , the mean value theorem establishes that: Applying Equation ( 21) yields: Taking into account that φ is continuously differentiable in B P (ξ k ) and its gradient ∇φ is Lipschitz with constant PM in B P (ξ k ), and taking the limit when k approaches infinity, it is possible to conclude that: and finally, by making x k = frac(ξ k ): Consequently, the sequence of points generated by the basic algorithm converges to a stationary point.
Remark 2. The basic algorithm can be applied to any cost function f , not necessarily continuous or differentiable in the search space (0, 1) n .For a continuous and differentiable function with a Lipschitz continuous gradient, the basic algorithm converges to a stationary point.If in addition the cost function is strictly convex, then the unique optimizer is a stationary point, and Theorem 3 guarantees that the basic algorithm converges to the unique optimizer.Note that we do not need to know the value of the Lipschitz constant M.
Example 2. The evolution of the basic algorithm for the function f (x) = |x| − | √ x sin(3πx)| is depicted in Figure 3.The parameters of the algorithm are P = 10 4 , σ = 0.5, ρ = 1.05, c = 0.01, = 10 −4 .Since this problem is unidimensional, the simplex is an interval.The left plots represent the vertex points of the simplex in the original space (above) and their corresponding function values (below) after each iteration.The right plots represent the values of the worst (above) and best (below) points of the simplex in the original space that were obtained during the execution of the algorithm.Note that the algorithm explores the whole range of the unit interval.This exploration is performed whenever the length of the simplex edge ∆ k satisfies P∆ k ≥ 1, which corresponds to the exploratory phase of the algorithm.Once ∆ k is P∆ k < 1, then the algorithm enters the phase of convergence to a stationary point.In this example, the algorithm stops after 24 iterations.From Iterations 1 to 12, the algorithm is in the exploratory phase, and from Iterations 13 to 24, the algorithm is in the convergence phase to a stationary point.The point obtained corresponds to f (0.1721) = −0.2422,which is the global minimal value within the space tolerance .

The Complete Algorithm
The complete algorithm, that we call GDS (an acronym for Global Direct Search), is obtained by repeating the basic algorithm a number of times.A single application of the basic algorithm may obtain the global optimum within a space tolerance only in easy problems.For more difficult optimization problems, a new execution of the basic algorithm, starting with the best point obtained in the previous execution, may improve the result.Each execution produces a local minimizer; however, if there are better points in some coordinate direction, the algorithm may jump to one of these points during the exploratory phase of a new execution.Unfortunately, not every optimization problem satisfies the property that, for any local minimizer, there are points in some coordinate direction where the cost function is improved.In such a case, a strategy that turns out to be more effective is to start from a new random initial point.The class of cost functions that we are facing is not usually known in advance, and we do not know which strategy is more appropriate for a new execution of the basic algorithm.A trade-off solution is to repeat the basic algorithm N times, but starting from a new random initial point after a number R ≤ N of repetitions.

Initial Point
Let N denote the total number of executions of the basic algorithm and R the number of repetitions without choosing a new initial point.Let x denote the point obtained in the previous execution of the basic algorithm and ξ r a random vector in (0, 1) n with a uniform probability distribution.The initial point for the n-th execution of the basic algorithm is obtained as follows: where mod is the modulo operator that represents the remainder of the integer division of their arguments.Note that if R = 0, the initial point is always a random point; however, if R = N, the initial point is always the point obtained in the previous execution of the basic algorithm.

Best Point Storage
The minimum point found during the execution of the complete algorithm is preserved.The point obtained after each basic algorithm execution is compared to the best solution stored and, in the case of improvement, the new point replaces it.Let x denote the point obtained after the last execution of the basic algorithm and x min the best point stored so far; the storage rule is:

.3. Implementation of the Complete Algorithm
An implementation of the complete algorithm GDS in pseudocode is given in Algorithm 2.

13:
end for 14: return x min 15: end function

Test Functions
Global optimization is not a trivial problem.Theoretical results, such as the No Free Lunch (NFL) theorem [61], state that an efficient general-purpose universal optimization algorithm is not possible.In other words, this result says that one algorithm can outperform another only for a certain class of problems.It is a usual practice to analyze the performance of an optimization algorithm by using test functions.A large number of standard test functions have been used to study the performance of our direct search global optimization algorithm.In this section, we present the results for the 24 functions given in Table 1.These function are described in detail in [62].The functions are classified into five groups: separable functions (func # 1.xx), functions with low or moderate conditioning (func # 2.xx), unimodal functions with high conditioning (func # 3.xx), multimodal functions with an adequate global structure (func # 4.xx) and multimodal functions with a weak global structure (func # 5.xx).All of the functions are scalable with the dimension.All of the functions are defined and can be evaluated over R n , and the search space is [−5, 5] n , where n is the dimension of the function domain.The global optimal value is attained in this search space.

Selection of the GDS Parameters
The performance of the GDS algorithm for different parameter values has been empirically analyzed by means of extensive computer studies.Our algorithm has been implemented in the scientific software MATLAB [63], and its performance has been successfully compared to the global optimization strategies that are implemented in the Global Optimization Toolbox of MATLAB.The GDS algorithm has a number of configurable parameters, and this comparative study allowed us to select a set of parameters that are appropriate for many problems.The expansive and contractive factors are set to ρ = 1.05 and σ = 0.5.It is convenient to take the expansive value slightly greater than one to avoid evaluating points already checked in previous iterations.However, large values of ρ usually produce slow convergence.Consequently, values in the interval [1.01, 1.10] are usually a good option for most problems.Regarding the contractive factor σ, the closer to one, the larger the number of iterations of the basic GDS algorithm will be.From our experiments, σ = 0.5 is a convenient value that produces a satisfactory trade-off between the efficiency of the algorithm and the convergence time.The constant c of the sufficient decrease condition is chosen as c = 0.01.Any positive value of c provides the termination of the algorithm, but small values are recommended to improve the algorithm's performance during the initial iterations when ∆ is large.The parameters P and R were fixed to P = 1000 and R = 5.These values were selected after a massive experimentation.In our experiments, we analyzed the mean value of the CPU time for one execution of the basic GDS algorithm for different values of the parameter R. The empirical results showed that the larger the value of R, the lower the CPU time for execution.This means that when R increases, the number of executions N can also be increased to a certain extent without compromising the total CPU time.The reduction is larger for smaller values of R, e.g., the reduction is about 40% from R = 0 to R = 1 and about 30% from R = 1 to R = 2. Since increasing R is also beneficial for functions that can be globally optimized along the coordinate directions, a good option that trades off between moderate CPU time and efficiency in computing the global optimum is to choose R ∈ [2,10] and increase the number of repetitions of the basic algorithm N as much as possible, to maintain a moderate total CPU time.In fact, we have checked that a good selection of P and R could improve the results depending on the problem.However, since a practitioner usually does not know in advance what class its problem belongs to, it is important to fix a set of parameters that trade off for a large class of problems.In conclusion, the parameter values of the GDS algorithm for the experimental study were selected as P = 1000, R = 5, ρ = 1.05, σ = 0.5, c = 0.01 and N = 1e9, and the stopping criterion is the number of function evaluations.

Performance Evaluation
The evaluation of the GDS algorithm has been accomplished by solving the 24 functions in Table 1 using the fixed set of parameters values explained in the previous subsection.The performance of GDS is compared to DIRECT [51].DIRECT is a deterministic global optimization algorithm based on the domain partition strategy.Each iteration includes two phases.The first phase identifies hyper-rectangles that potentially contain a global optima.In the second phase, the hyper-rectangles identified in the first phase are partitioned into smaller hyper-rectangles, and the cost function is evaluated at the centers of the hyper-rectangles.The convergence properties guarantee that if no termination criteria are included, DIRECT will exhaustively sample the domain.The implementation of DIRECT in MATLAB by Finkel [64] has been used in this study.The experiments have been performed in a laptop PC equipped with a dual core processor Intel (R) Core (TM) i5-3317U CPU @1.70 GHz running MATLAB R2014a under the Windows operating system.
The comparative study has been performed as follows: For both algorithms, GDS and DIRECT, a fixed number of function evaluations is used as a stopping condition, and the error in computing the global minimum value is obtained.Each problem has been solved 15 times, and the median of the error is tabulated for each function.The study has been accomplished for the 24 functions of Table 1 and for different dimensions of the function domain and different numbers of function evaluations.The default parameters of DIRECT [64] have been used, but using the number of function evaluations as stopping conditions.
In Table 2, the results for functions of dimensions five and 10 and a number of function evaluations equal to 10 3 and 10 4 are shown, while in Table 3, the same results are shown for dimensions 10 and 20 and 10 4 and 10 5 function evaluations.GDS and DIRECT obtained different results for different functions.In general, for a small number of function evaluations, DIRECT gets a smaller median of the error.However, as the number of function evaluations increases, the improvement in approaching the minimum value is higher in GDS.One of the outstanding features of GDS is its simplicity.This can be checked by comparing the CPU time with DIRECT.The CPU time for all of the problems, dimensions and function evaluations is much smaller for GDS; see Figures 4 and 5.Note that GDS is more than 100-times faster than DIRECT for several functions of dimension 40 and 10 5 function evaluations.The results of a study of the GDS algorithm for functions of high dimension are presented in Figure 6.The GDS algorithm is executed on the functions 1.02, 1.02, 1.04 and 1.05 of dimensions 100 and 200 to search the global minimum.The error to the global minimum in the logarithmic scale and the CPU time are shown for 10 4 , 10 5 , 10 6 and 10 7 .It is interesting to remark that the CPU time grows linearly with the dimension of the problem.
converts the open interval (a, b) into the open unit interval.Consequently, both unconstrained and arbitrary open n-box optimization problems are included in our formulation.Furthermore, under Assumption A2 and by choosing appropriate values of a and b, the global minimum of function f is attained in an interior point of the open n-box (a, b) n .

Example 1 .
The function f (x) = |x| − | √ x sin(3πx)|, and the transformed function φ(ξ) = f (frac(Pξ)) for P = 10 are depicted for the open unit interval in Figure 1.The function f is continuous on the open unit interval (0, 1) and has three local minima.The transformed function φ is periodic, of period P −1 = 0.1, and piecewise continuous on the open unit interval.The fundamental period of φ is B(0) = (0, 0.1), and the function value repeats ten times on the unit interval.The function φ is continuous on each interval B(x) = {ξ : 10ξ = 10x , 10ξ

Figure 4 .
Figure 4. Comparative study of the CPU time of GDS (blue) and DIRECT (red) for solving 24 problems of dimension five (a) and dimension 10 (b) using 10 3 function evaluations (c) and 10 4 function evaluations (d).

Figure 5 .
Figure 5. Comparative study of CPU time of GDS (blue) and DIRECT (red) for solving 24 problems of dimension 20 (a) and dimension 40 (b) using 10 4 function evaluations (c) and 10 5 function evaluations (d).

Figure 6 .
Figure 6.Study of the error to attain the global optimum (a) and CPU time (b) for functions of dimension 100 (c) and dimension 200 (d) using the GDS algorithm.

Table 1 .
Description of the functions used in the experimental study.

Table 2 .
Comparative study of the median of the error for problems of dimensions 5 and 10 and the number of evaluations 1e + 03 and 1e + 04.

Table 3 .
Comparative study of the median of the error for problems of dimensions 20 and 40 and the number of evaluations 1e + 04 and 1e + 05.