Article

Randomized Simplicial Hessian Update

Faculty of Electrical Engineering, University of Ljubljana, Tržaška Cesta 25, SI-1000 Ljubljana, Slovenia
* Author to whom correspondence should be addressed.
Mathematics 2021, 9(15), 1775; https://doi.org/10.3390/math9151775
Submission received: 16 June 2021 / Revised: 14 July 2021 / Accepted: 20 July 2021 / Published: 27 July 2021
(This article belongs to the Special Issue Optimization Theory and Applications)

Abstract
Recently, a derivative-free optimization algorithm was proposed that utilizes a minimum Frobenius norm (MFN) Hessian update for estimating the second derivative information, which in turn is used for accelerating the search. The proposed update formula relies only on computed function values and is a closed-form expression for a special case of a more general approach first published by Powell. This paper analyzes the convergence of the proposed update formula under the assumption that the points from R^n where the function value is known are random. The analysis assumes that the N + 2 points used by the update formula are obtained by adding N + 1 vectors to a central point. The vectors are obtained by transforming a prototype set of N + 1 vectors with a random orthogonal matrix from the Haar measure. The prototype set must positively span an N-dimensional subspace, where N ≤ n. Because the update is random by nature, we can estimate a lower bound on the expected improvement of the approximate Hessian. This lower bound was derived for a special case of the proposed update by Leventhal and Lewis. We generalize their result and show that the amount of improvement greatly depends on N as well as the choice of the vectors in the prototype set. The obtained result is then used for analyzing the performance of the update based on various commonly used prototype sets. One of the results obtained by this analysis states that a regular n-simplex is a bad choice for a prototype set because it does not guarantee any improvement of the approximate Hessian.

1. Introduction

Derivative-free optimization algorithms have attracted much attention due to the fact that in many optimization problems, the evaluation of the gradients of the function subject to optimization and of the constraints is expensive. Such optimization problems can often be formulated as constrained black-box optimization (BBO) [1] problems of the form
$$\min f(\mathbf{x}) \quad \textrm{subject to} \quad c_i(\mathbf{x}) \le 0, \quad i = 1, 2, \ldots, n_C$$
Functions f and c_i are maps from R^n to R. The objective is to minimize f subject to n_C nonlinear constraints defined by the functions c_i. The method for computing f and c_i is treated as a black box, and the gradients are usually not available. Such problems often arise in engineering optimization when simulation is used for obtaining the function values. BBO often relies on models of the function and of the constraints. Various approaches to building black-box models were developed in the past, such as linear [2] and quadratic models [3], radial-basis functions [4], support vector machines [5], neural networks [6], etc.
In this paper, we focus on the quadratic models of f and c i . The most challenging task in building these models is the computation of the Hessian matrix. Instead of using the exact Hessian, the model can utilize an approximate Hessian. The approximation can be improved gradually by applying an update formula based on the function and the gradient values at points visited in the algorithm’s past. As the algorithm converges towards a solution, the approximate Hessian converges to the true Hessian.
For derivative-based optimization, several approaches for updating the approximate Hessian are well studied and tested in practice (e.g., BFGS update, SR1 update [7]). Unfortunately, these approaches rely on the gradient of the function (constraints), which, by assumption, is not available in derivative-free optimization.
Let n denote the dimension of the search space. For derivative-free optimization, a Hessian update formula based on the function values computed at m ≥ n + 2 points visited in the algorithm’s past was proposed by Powell in [8]. The update formula was obtained by minimizing the Frobenius norm of the update applied to the approximate Hessian subject to linear constraints imposed by the function values at the m points in the search space. The paper proposed an efficient way for computing the update and explored some of its properties. The convergence rate of the update formula was not studied.
In a later paper, a simple update formula that uses three collinear points for computing the updated approximate Hessian [9] was examined. The normalized direction along which the three points lie was assumed to be uniformly distributed on the unit sphere. With this assumption, the convergence rate of the update was analyzed and shown to be linear. This update formula was successfully used in a derivative-free algorithm from the family of mesh adaptive direct search algorithms (MADS) [10]. A similar Hessian updating approach was used for speeding up global optimization in [11].
The assumption that the points taking part in an update must be collinear is a significant limitation for the underlying derivative-free algorithm. With this in mind, a new simplicial update formula was proposed in [12]. The formula relies on m ≤ n + 2 points. The reason for choosing the term simplicial Hessian update is the fact that the m − 1 points form a simplex centered around the first point. For m = n + 2, the formula is a special case of the update formula proposed in [8]. By imposing some restrictions on the positions of the m points, the update formula can be used for any m that satisfies 3 ≤ m ≤ n + 2. The case m = 3 corresponds to the update formula proposed in [9].
To illustrate the approach for obtaining the update formula, let us assume that the current quadratic model of function f is given by
$$m(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{B} \mathbf{x} + \hat{\mathbf{g}}^T \mathbf{x} + \hat{c}$$
B is the current approximate Hessian. Let the points where the function value is known be denoted by x_i. For the sake of simplicity, let f_i denote f(x_i). Based on these points, we are looking for an updated model:
$$m_+(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{B}_+ \mathbf{x} + \hat{\mathbf{g}}_+^T \mathbf{x} + \hat{c}_+ .$$
The model must satisfy m constraints
$$m_+(\mathbf{x}_i) = f_i$$
that are linear in ĉ_+ and in the components of B_+ and ĝ_+. Based on these constraints, we are looking for an updated approximate Hessian B_+. Because we have fewer constraints than there are unknowns, we also require that ‖B_+ − B‖_F is minimal (‖·‖_F denotes the Frobenius norm). The update formula we obtain in this way is a minimum Frobenius norm update formula.
For computing the expected improvement of the approximate Hessian, we first assume f itself is quadratic. We also assume the aforementioned m points are obtained by applying a random orthogonal transformation to the m − 1 vectors that form a prototype set and adding the resulting vectors to a central point. As in [9], the convergence rate of the update is linear. The speed with which the approximate Hessian converges to the true Hessian depends on the choice of the prototype set. Our result is a generalization of the result published in [9].
This paper is divided as follows. In Section 2, some basic properties of minimum Frobenius norm updates are explored. The Frobenius product is revisited with the purpose of simplifying the notation, and the update formula is derived. In the next section, uniformly distributed orthogonal matrices are introduced. Some auxiliary results are derived that are later used for computing the expected improvement of the approximate Hessian. Section 4 analyzes the convergence of the proposed update and derives the expected value of the improvement in the sense of the Frobenius norm of the difference between the approximate Hessian and the true Hessian. The expected improvement is computed for several prototype sets. The section is followed by an example demonstrating the convergence of the proposed update and concluding remarks.
Notation. Components of vectors (a) and matrices (A) are denoted by subscripts (i.e., a_i and a_{ij}, respectively). The i-th column of matrix A is denoted by a_i. The unit vectors forming an orthogonal basis for R^n are denoted by e_i. Vectors are assumed to be column vectors, and the inner product of two vectors is written in matrix notation as a^T b. The Frobenius norm and the trace of a matrix are denoted by ‖·‖_F and tr(·), respectively. The expected value of a random variable is denoted by E[·].

2. Obtaining the Update Formula

Let H denote the Hessian of a function. Minimum Frobenius norm (MFN) update formulas replace the current Hessian approximation B with a new (better) approximation B_+ in such a manner that the Frobenius norm of the change (i.e., ‖B_+ − B‖_F) is minimal, subject to constraints imposed on B_+.
The Frobenius norm is a norm induced by the Frobenius (inner) product on the space of n-by-n matrices. The Frobenius product of two matrices is given by
$$\mathbf{A} : \mathbf{B} = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij} b_{ij} = \mathrm{tr}(\mathbf{A}^T \mathbf{B}) = \mathrm{tr}(\mathbf{B}^T \mathbf{A}).$$
Using the Frobenius product, one can express the Frobenius norm of matrix A as
$$\|\mathbf{A}\|_F^2 = \mathbf{A} : \mathbf{A}.$$
Quadratic terms can be expressed with the Frobenius product as
$$\mathbf{x}^T \mathbf{A} \mathbf{x} = \mathbf{A} : (\mathbf{x}\mathbf{x}^T).$$
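These identities are easy to verify numerically. The following short check is ours and is not part of the original paper; it only illustrates the identities above with NumPy for random matrices.

```python
# Illustrative numerical check of the Frobenius product identities (not from the paper).
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
x = rng.standard_normal(n)

frob = lambda X, Y: np.sum(X * Y)                      # A : B, elementwise product summed

assert np.isclose(frob(A, B), np.trace(A.T @ B))               # A : B = tr(A^T B)
assert np.isclose(frob(A, A), np.linalg.norm(A, "fro") ** 2)   # ||A||_F^2 = A : A
assert np.isclose(x @ A @ x, frob(A, np.outer(x, x)))          # x^T A x = A : (x x^T)
```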
The Frobenius product introduces the notion of perpendicularity into the set of matrices (not to be confused with the orthogonality of matrices, which is equivalent to Q^T Q = I).
Definition 1.
Two nonzero matrices A and B are perpendicular (denoted by A ⊥ B) if A : B = 0.
The Frobenius product can also be used for expressing linear constraints. A linear equality constraint on matrix X can be formulated as
$$\mathbf{A} : \mathbf{X} = a.$$
The following Lemma provides motivation for the use of minimum Frobenius norm updating.
Lemma 1.
Let H , B , and B + denote the exact, the current approximate, and the updated approximate Hessian, respectively. Suppose we have m linear equality constraints of the form
$$\mathbf{A}_i : \mathbf{B}_+ = a_i, \quad i = 1, \ldots, m.$$
imposed on B + . Let P denote the subspace spanned by matrices A i . Then, the corresponding MFN update satisfies
  • (B_+ − B) ∈ P, and
  • ‖B_+ − H‖_F ≤ ‖B − H‖_F.
Proof. 
Finding the MFN update is equivalent to minimizing the Frobenius norm of B_+ − B subject to the linear equality constraints (10). These constraints define an affine subspace in the n(n + 1)/2 dimensional space of Hessian matrices, and B_+ is a member of this affine subspace. Because the true Hessian also satisfies constraints (10), it is also a member of the aforementioned affine subspace.
To simplify the problem, we can translate it in such a manner that H becomes 0. When we do this, the linear constraints become homogeneous and, instead of an affine subspace, they now define an ordinary subspace P^⊥. Its orthogonal complement P is spanned by the matrices A_i. Due to the translation, B and B_+ are replaced by B − H and B_+ − H, of which the latter is a member of P^⊥. Points with constant ‖B_+ − B‖_F = ‖(B_+ − H) − (B − H)‖_F lie on a sphere centered at B − H. The matrix B_+ − H that corresponds to the smallest ‖B_+ − B‖_F lies on a sphere centered at B − H that is tangential to the subspace P^⊥. Therefore, B_+ − B must be perpendicular to P^⊥, i.e., B_+ − B ∈ P. This proves the first claim.
Due to B_+ − H ∈ P^⊥, we can see that B_+ − H and B_+ − B are perpendicular. From B − H = (B_+ − H) − (B_+ − B), we have
$$\|\mathbf{B}-\mathbf{H}\|_F^2 = \|\mathbf{B}_+-\mathbf{H}\|_F^2 + \|\mathbf{B}_+-\mathbf{B}\|_F^2$$
The second claim immediately follows from this result. □
Consider a quadratic function
$$q(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{H} \mathbf{x} + \mathbf{g}^T \mathbf{x} + c$$
where H is its Hessian and g its gradient at x = 0 . Let the current and the updated approximation to q ( x ) be given by
$$m(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{B} \mathbf{x} + \hat{\mathbf{g}}^T \mathbf{x} + \hat{c}$$
and
$$m_+(\mathbf{x}) = \frac{1}{2}\mathbf{x}^T \mathbf{B}_+ \mathbf{x} + \hat{\mathbf{g}}_+^T \mathbf{x} + \hat{c}_+ ,$$
respectively. In MFN updating, B_+ is obtained by minimizing ‖B_+ − B‖_F. The following lemma introduces one such update based on the case when the value of q is known at N + 2 points.
Lemma 2.
Let q_0, …, q_{N+1}, where N ≤ n, denote the values of q(x) corresponding to the distinct points x_0, …, x_{N+1}, respectively. Let v_i = x_i − x_0 and assume $\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i = \mathbf{0}$ with at least one α_i ≠ 0. Then the simplicial MFN update satisfying the interpolation conditions m_+(x_i) = q_i for i = 0, 1, …, N + 1 can be computed as
$$\mathbf{B}_+ = \mathbf{B} + \beta \mathbf{A}$$
where
$$\mathbf{A} = \frac{1}{2}\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i \mathbf{v}_i^T ,$$
$$\beta = \frac{\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i^T (\mathbf{H}-\mathbf{B}) \mathbf{v}_i}{2\|\mathbf{A}\|_F^2} = \frac{\sum_{i=1}^{N+1} \alpha_i \left( 2(q_i - q_0) - \mathbf{v}_i^T \mathbf{B} \mathbf{v}_i \right)}{2\|\mathbf{A}\|_F^2} .$$
Proof. 
By assumption we have
$$q_i = q(\mathbf{x}_i) = \frac{1}{2}\mathbf{x}_i^T \mathbf{H} \mathbf{x}_i + \mathbf{g}^T \mathbf{x}_i + c = \frac{1}{2}(\mathbf{x}_0 + \mathbf{v}_i)^T \mathbf{H} (\mathbf{x}_0 + \mathbf{v}_i) + \mathbf{g}^T (\mathbf{x}_0 + \mathbf{v}_i) + c$$
Due to the interpolation conditions, we have N + 2 constraints
$$q_i = m_+(\mathbf{x}_i) = \frac{1}{2}(\mathbf{x}_0 + \mathbf{v}_i)^T \mathbf{B}_+ (\mathbf{x}_0 + \mathbf{v}_i) + \hat{\mathbf{g}}_+^T (\mathbf{x}_0 + \mathbf{v}_i) + \hat{c}_+$$
By subtraction, we eliminate c ^ + and obtain N + 1 constraints
$$q_i - q_0 = \frac{1}{2}\mathbf{v}_i^T \mathbf{B}_+ \mathbf{v}_i + (\hat{\mathbf{g}}_+ + \mathbf{B}_+ \mathbf{x}_0)^T \mathbf{v}_i , \quad i = 1, \ldots, N+1 .$$
Multiplying (20) with α i and adding the resulting equations yields
$$\frac{1}{2}\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i^T \mathbf{B}_+ \mathbf{v}_i + (\hat{\mathbf{g}}_+ + \mathbf{B}_+ \mathbf{x}_0)^T \sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i = \sum_{i=1}^{N+1} \alpha_i (q_i - q_0) .$$
By assumption, the second term on the left-hand side of (21) vanishes (thus, ĝ_+ is eliminated). We are left with a single linear constraint on B_+:
$$\frac{1}{2}\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i^T \mathbf{B}_+ \mathbf{v}_i = \sum_{i=1}^{N+1} \alpha_i (q_i - q_0)$$
which can be rewritten by recalling (8) as
$$\mathbf{A} : \mathbf{B}_+ = \sum_{i=1}^{N+1} \alpha_i (q_i - q_0) ,$$
where
$$\mathbf{A} = \frac{1}{2}\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i \mathbf{v}_i^T .$$
Equation (23) is a linear constraint on the updated Hessian approximation B_+. This is the only constraint on B_+. From Lemma 1, we can see that P is spanned by A. Therefore, we can write
$$\mathbf{B}_+ - \mathbf{B} = \beta \mathbf{A} .$$
By computing the Frobenius product of (25) with A and taking into account (23), we arrive at
$$\sum_{i=1}^{N+1} \alpha_i (q_i - q_0) - \mathbf{A} : \mathbf{B} = \beta\, \mathbf{A} : \mathbf{A} .$$
Now we can compute β :
$$\beta = \frac{\sum_{i=1}^{N+1} \alpha_i (q_i - q_0) - \mathbf{A} : \mathbf{B}}{\mathbf{A} : \mathbf{A}} = \frac{\sum_{i=1}^{N+1} \alpha_i \left( 2(q_i - q_0) - \mathbf{v}_i^T \mathbf{B} \mathbf{v}_i \right)}{2\|\mathbf{A}\|_F^2} .$$
 □
The simplicial update formula introduced by Lemma 2 is the closed-form solution of the equations arising from the MFN update in [8] for N = n. One can see this by comparing the interpolation conditions to those in [8]. Due to the assumption $\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i = \mathbf{0}$, we can also apply it when N < n. The assumption implies that the points x_1, …, x_{N+1} are positioned in a specific manner with respect to x_0 (i.e., there exists a nontrivial linear combination $\sum_{i=1}^{N+1} \alpha_i (\mathbf{x}_i - \mathbf{x}_0) = \mathbf{0}$).
By choosing N = 1, we obtain a special case of the simplicial MFN update, where all three distinct points must be collinear to satisfy $\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i = \mathbf{0}$. Suppose v_1 = −v_2 = v and α_1 = α_2 = 1. Then,
$$\mathbf{A} = \mathbf{v}\mathbf{v}^T ,$$
$$\beta = \frac{q_1 + q_2 - 2q_0 - \mathbf{v}^T \mathbf{B} \mathbf{v}}{\|\mathbf{v}\|^4} = \frac{q_v^{(2)}(\mathbf{x}_0) - \mathbf{v}^T \mathbf{B} \mathbf{v}}{\|\mathbf{v}\|^4}$$
where $q_v^{(2)}(\mathbf{x}_0) = \mathbf{v}^T \mathbf{H} \mathbf{v}$ is the second directional derivative of q along direction v. The convergence properties of this MFN update formula were analyzed in [9]. The formula was used in the derivative-free optimization algorithm proposed in [10].
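To make the update concrete, the sketch below applies Lemma 2 to a quadratic test function. It is an illustrative transcription written for this text; the helper name simplicial_mfn_update, the random quadratic, and the chosen sizes are ours and are not part of the original paper.

```python
# Illustrative sketch of the simplicial MFN update from Lemma 2 (not from the paper).
import numpy as np

def simplicial_mfn_update(B, x0, V, alpha, f):
    """One update of the approximate Hessian B.
    V     : n-by-(N+1) matrix whose columns are v_i = x_i - x_0,
            assumed to satisfy sum_i alpha_i v_i = 0
    alpha : coefficients alpha_i
    f     : callable returning the function value at a point
    """
    q0 = f(x0)
    q = np.array([f(x0 + V[:, i]) for i in range(V.shape[1])])
    A = 0.5 * sum(a * np.outer(v, v) for a, v in zip(alpha, V.T))
    beta = sum(a * (2.0 * (qi - q0) - v @ B @ v)
               for a, qi, v in zip(alpha, q, V.T)) / (2.0 * np.sum(A * A))
    return B + beta * A

# Example: three collinear points (N = 1, v_1 = -v_2 = v) on a random quadratic.
rng = np.random.default_rng(1)
n = 4
H = rng.standard_normal((n, n)); H = 0.5 * (H + H.T)
f = lambda x: 0.5 * x @ H @ x
v = rng.standard_normal(n)
B1 = simplicial_mfn_update(np.zeros((n, n)), np.zeros(n),
                           np.column_stack([v, -v]), np.array([1.0, 1.0]), f)
```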

3. Uniformly Distributed Orthogonal Matrices

The notion of a uniform distribution over the group of orthogonal matrices (O_n) can be introduced via the Haar measure [13]. Let A denote a matrix with independent normally distributed elements with zero mean and variance 1. A random orthogonal matrix from the Haar measure (O) can then be obtained with Algorithm 1.
Algorithm 1 Constructing a random orthogonal matrix from the Haar measure.
  •  Perform QR decomposition A = Q R .
  •  Construct a diagonal matrix D with d_ii = 1 if r_ii ≥ 0 and d_ii = −1 otherwise.
  • O = Q D .
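A direct NumPy transcription of Algorithm 1 might look as follows; this is a sketch for illustration, and the helper name haar_orthogonal is ours.

```python
# Sketch of Algorithm 1 (illustrative): Haar-distributed random orthogonal matrix.
import numpy as np

def haar_orthogonal(n, rng=np.random.default_rng()):
    A = rng.standard_normal((n, n))                       # i.i.d. N(0, 1) entries
    Q, R = np.linalg.qr(A)                                # QR decomposition A = QR
    D = np.diag(np.where(np.diag(R) >= 0.0, 1.0, -1.0))   # d_ii = +1 if r_ii >= 0, else -1
    return Q @ D                                          # O = QD
```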
Multiplying O with any unit vector results in a random unit vector that is uniformly distributed on the unit sphere (S_n) [14]. It can be shown that O^T is also a uniformly distributed orthogonal matrix. Consequently, every column and every row of O is a random unit vector with a uniform distribution on S_n.
The results of this section are obtained with the help of the following lemma.
Lemma 3.
Let x ∈ R^n and let dσ denote the surface element of S_n. Then,
$$V_n(r_1, r_2, \ldots, r_n) = \oint_{\|\mathbf{x}\|=1} x_1^{2r_1} x_2^{2r_2} \cdots x_n^{2r_n} \, d\sigma = \frac{2\prod_{i=1}^{n}\Gamma(r_i + 1/2)}{\Gamma\!\left(n/2 + \sum_{i=1}^{n} r_i\right)}$$
Proof. 
See [15], Appendix B. □
From Lemma 3, we can obtain the surface area of S_n by choosing r_1 = ⋯ = r_n = 0:
$$|S_n| = \frac{2\,\Gamma(1/2)^n}{\Gamma(n/2)}$$
Let o_i and o_j denote two random vectors that correspond to the i-th and the j-th column of O. If i ≠ j, then o_i^T o_j = 0. We denote the k-th component of o_i as o_{ki}.
Lemma 4.
Let O be a uniformly distributed random orthogonal matrix. Then,
$$\mathrm{E}\!\left[(\mathbf{e}_k^T \mathbf{o}_i)^2\right] = \mathrm{E}\!\left[o_{ki}^2\right] = \frac{1}{n},$$
$$\mathrm{E}\!\left[(\mathbf{e}_k^T \mathbf{o}_i)^2 (\mathbf{e}_l^T \mathbf{o}_i)^2\right] = \mathrm{E}\!\left[o_{ki}^2 o_{li}^2\right] = \begin{cases} \dfrac{3}{n(n+2)} & k = l \\[4pt] \dfrac{1}{n(n+2)} & k \ne l , \end{cases}$$
$$\mathrm{E}\!\left[(\mathbf{e}_k^T \mathbf{o}_i)^2 (\mathbf{e}_l^T \mathbf{o}_j)^2\right] = \mathrm{E}\!\left[o_{ki}^2 o_{lj}^2\right] = \begin{cases} \dfrac{1}{n(n+2)} & k = l,\ i \ne j \\[4pt] \dfrac{n+1}{(n-1)n(n+2)} & k \ne l,\ i \ne j . \end{cases}$$
Proof. 
For proving (31), we can assume without loss of generality k = i = 1 . Because o 1 = u is uniformly distributed on S n , the expected value of f ( u ) can be obtained by computing the mean value of f ( u ) over S n . We use Lemma 3 for expressing the integral over the surface of S n .
$$\mathrm{E}\!\left[o_{ki}^2\right] = \mathrm{E}\!\left[o_{11}^2\right] = \mathrm{E}\!\left[u_1^2\right] = |S_n|^{-1} \oint_{\|\mathbf{u}\|=1} u_1^2 \, d\sigma = |S_n|^{-1}\, \frac{2\,\Gamma(3/2)\,\Gamma(1/2)^{n-1}}{\Gamma(1 + n/2)} = \frac{1}{n}.$$
Regarding (32) for k = l , we have
$$\mathrm{E}\!\left[o_{ki}^2 o_{li}^2\right] = \mathrm{E}\!\left[o_{ki}^4\right] = |S_n|^{-1} \oint_{\|\mathbf{u}\|=1} u_1^4 \, d\sigma = |S_n|^{-1}\, \frac{2\,\Gamma(5/2)\,\Gamma(1/2)^{n-1}}{\Gamma(2 + n/2)} = \frac{3}{n(n+2)}.$$
For k ≠ l, we assume without loss of generality k = 1, l = 2. From Lemma 3, we have
$$\mathrm{E}\!\left[o_{ki}^2 o_{li}^2\right] = |S_n|^{-1} \oint_{\|\mathbf{u}\|=1} u_1^2 u_2^2 \, d\sigma = |S_n|^{-1}\, \frac{2\,\Gamma(3/2)^2\,\Gamma(1/2)^{n-2}}{\Gamma(2 + n/2)} = \frac{1}{n(n+2)}$$
For (33) with k = l, we can show that it is identical to (32) with k ≠ l. We have
$$\mathbf{e}_k^T \mathbf{o}_i = \mathbf{e}_k^T \mathbf{O} \mathbf{e}_i = \mathbf{e}_i^T \mathbf{O}^T \mathbf{e}_k = \mathbf{e}_i^T \left(\mathbf{O}^T\right)_k$$
where (O^T)_k is the k-th column of O^T. This implies
$$(\mathbf{e}_k^T \mathbf{o}_i)^2 (\mathbf{e}_k^T \mathbf{o}_j)^2 = \left(\mathbf{e}_i^T \left(\mathbf{O}^T\right)_k\right)^2 \left(\mathbf{e}_j^T \left(\mathbf{O}^T\right)_k\right)^2$$
To confirm (33) for k = l, we take into account that O^T is also a random orthogonal matrix from the Haar measure, replace O with O^T in (32), and rename i, k, and l to k, i, and j, respectively.
Finally, to prove (33) for k ≠ l, we can assume without loss of generality i = k = 1 and j = l = 2. The cosine of the angle between o_1 and e_2 can be expressed as o_21 = e_2^T o_1 = cos φ. The random vector o_2 is orthogonal to o_1. Its realizations cover a unit sphere in an (n − 1)-dimensional subspace B orthogonal to o_1. Unit vectors b_1, …, b_{n−1} form an orthogonal basis for this subspace. Note that b_i^T o_1 = 0. The conditional probability density of o_2 is uniform on the aforementioned unit sphere in B. Vector o_2 can be expressed as
$$\mathbf{o}_2 = \sum_{i=1}^{n-1} \eta_i \mathbf{b}_i , \qquad \sum_{i=1}^{n-1} \eta_i^2 = 1 ,$$
where the vector (η_1, …, η_{n−1}) is uniformly distributed on S_{n−1}. Without loss of generality, we can choose the vectors b_i in such a manner that e_2 = o_1 cos φ + b_1 sin φ, where φ is the angle between e_2 and o_1. Now we have
$$o_{22} = \mathbf{e}_2^T \mathbf{o}_2 = \sum_{i=1}^{n-1} \left(\mathbf{o}_1 \cos\varphi + \mathbf{b}_1 \sin\varphi\right)^T \eta_i \mathbf{b}_i = \eta_1 \sin\varphi$$
and
$$o_{11}^2 o_{22}^2 = (\mathbf{e}_1^T \mathbf{o}_1)^2 (\mathbf{e}_2^T \mathbf{o}_2)^2 = o_{11}^2 \eta_1^2 \sin^2\varphi = \eta_1^2 o_{11}^2 \left(1 - o_{21}^2\right).$$
Next, we can express
$$\mathrm{E}\!\left[o_{ki}^2 o_{lj}^2\right] = \mathrm{E}\!\left[\eta_1^2\right] \mathrm{E}\!\left[o_{11}^2 - o_{11}^2 o_{21}^2\right],$$
where the first expected value refers to (η_1, …, η_{n−1}) ∈ S_{n−1} and the second one to o_1 ∈ S_n. Using Lemma 3 and the previously proven (32), we arrive at
$$\mathrm{E}\!\left[\eta_1^2\right] \mathrm{E}\!\left[o_{11}^2 - o_{11}^2 o_{21}^2\right] = \frac{1}{n-1}\left(\frac{1}{n} - \frac{1}{n(n+2)}\right) = \frac{n+1}{(n-1)n(n+2)}$$
 □

4. Convergence of the Proposed Update

Multiplying the vectors in a prototype set D = {d_1, …, d_{N+1}} with a uniformly distributed random orthogonal matrix O results in a set of random vectors V = {v_1, …, v_{N+1}} such that every v_i/‖v_i‖ is uniformly distributed on S_n. The angles between the vectors in a realization of such a set are identical to the angles between the corresponding vectors from the prototype set.
Suppose one is interested in the expected amount of improvement resulting from one application of the update formula from Lemma 2. We assume that the N + 2 points where the function value is computed comprise x_0 and N + 1 additional points generated using a random orthogonal matrix O and a prototype set of vectors {d_1, …, d_{N+1}} in the following manner:
$$\mathbf{x}_i = \mathbf{x}_0 + \mathbf{O}\mathbf{d}_i = \mathbf{x}_0 + \mathbf{v}_i , \quad i = 1, \ldots, N+1 .$$
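In code, one realization of such a point set could be generated as follows. This sketch is ours; it inlines Algorithm 1 rather than relying on a library routine, and the helper name rotated_points is an assumption made for illustration.

```python
# Sketch (illustrative): points x_i = x_0 + O d_i from a prototype set D.
import numpy as np

def rotated_points(x0, D, rng=np.random.default_rng()):
    """Columns of D are the prototype vectors d_i; returns the points and V = O D."""
    n = x0.size
    A = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(A)
    O = Q @ np.diag(np.where(np.diag(R) >= 0.0, 1.0, -1.0))   # Algorithm 1
    V = O @ D                                                  # v_i = O d_i
    return x0[:, None] + V, V
```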
First, we prove an auxiliary lemma.
Lemma 5.
Let a, b, and O denote two unit vectors with cos φ = a^T b and a uniformly distributed orthogonal matrix, respectively. Let u = Oa and v = Ob. Then,
$$\mathrm{E}\!\left[u_k^2 v_l^2\right] = \begin{cases} \dfrac{1 + 2\cos^2\varphi}{n(n+2)} & k = l \\[4pt] \dfrac{n + 1 - 2\cos^2\varphi}{(n-1)n(n+2)} & k \ne l \end{cases}$$
Proof. 
Without loss of generality, the coordinate system can be rotated in such a manner that a = e_1 and b = e_1 cos φ + e_2 sin φ. Then, we have
$$\mathbf{u} = \mathbf{o}_1 ,$$
$$\mathbf{v} = \mathbf{o}_1 \cos\varphi + \mathbf{o}_2 \sin\varphi .$$
For k = l , we have
$$\mathrm{E}\!\left[u_k^2 v_k^2\right] = \mathrm{E}\!\left[o_{k1}^2 \left(o_{k1}\cos\varphi + o_{k2}\sin\varphi\right)^2\right] = \mathrm{E}\!\left[o_{k1}^4\right]\cos^2\varphi + \mathrm{E}\!\left[o_{k1}^2 o_{k2}^2\right]\sin^2\varphi + 2\,\mathrm{E}\!\left[o_{k1}^3 o_{k2}\right]\cos\varphi\sin\varphi$$
The last term vanishes because the integral of odd powers of o_{ij} over S_n is zero. By invoking Lemma 4, we arrive at
$$\mathrm{E}\!\left[u_k^2 v_k^2\right] = \frac{3\cos^2\varphi}{n(n+2)} + \frac{\sin^2\varphi}{n(n+2)} = \frac{1 + 2\cos^2\varphi}{n(n+2)}$$
For k ≠ l,
$$\mathrm{E}\!\left[u_k^2 v_l^2\right] = \mathrm{E}\!\left[o_{k1}^2 \left(o_{l1}\cos\varphi + o_{l2}\sin\varphi\right)^2\right] = \mathrm{E}\!\left[o_{k1}^2 o_{l1}^2\right]\cos^2\varphi + \mathrm{E}\!\left[o_{k1}^2 o_{l2}^2\right]\sin^2\varphi + 2\,\mathrm{E}\!\left[o_{k1}^2 o_{l1} o_{l2}\right]\cos\varphi\sin\varphi$$
The last term vanishes due to the odd powers of o_{ij}. Together with Lemma 4, we have
$$\mathrm{E}\!\left[u_k^2 v_l^2\right] = \frac{\cos^2\varphi}{n(n+2)} + \frac{(n+1)\sin^2\varphi}{(n-1)n(n+2)} = \frac{n + 1 - 2\cos^2\varphi}{(n-1)n(n+2)} .$$
 □
Lemma 6.
Let {d_1, …, d_{N+1}} be a prototype set of vectors satisfying $\sum_{i=1}^{N+1} \alpha_i \mathbf{d}_i = \mathbf{0}$, where all α_i ≥ 0 and at least one α_i ≠ 0. Let O be a uniformly distributed random orthogonal matrix, and let v_i = O d_i with ‖d_i‖‖d_j‖ cos φ_ij = d_i^T d_j = v_i^T v_j. Then, the MFN update formula from Lemma 2 involving N + 2 points (x_0 and the additional N + 1 points constructed according to (47)) satisfies
$$\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{B}\|_F^2\right] = \mathrm{E}\!\left[\beta^2 \|\mathbf{A}\|_F^2\right] = (\gamma_1 - \gamma_2)\|\mathbf{B} - \mathbf{H}\|_F^2 + \gamma_2 \left(\mathrm{tr}(\mathbf{B} - \mathbf{H})\right)^2$$
where
$$\gamma_1 = \frac{\mu + 2}{n(n+2)}$$
$$\gamma_2 = \frac{(n+1)\mu - 2}{(n-1)n(n+2)}$$
$$\mu = \frac{\left(\sum_{i=1}^{N+1} \alpha_i \|\mathbf{d}_i\|^2\right)^2}{4\|\mathbf{A}\|_F^2} = \frac{\left(\sum_{i=1}^{N+1} \alpha_i \|\mathbf{d}_i\|^2\right)^2}{\sum_{i=1}^{N+1}\sum_{j=1}^{N+1} \alpha_i \alpha_j \|\mathbf{d}_i\|^2 \|\mathbf{d}_j\|^2 \cos^2\varphi_{ij}}$$
Proof. 
By repeating the reasoning in the proof of Lemma 2 on (18), we obtain
$$\mathbf{A} : \mathbf{H} = \sum_{i=1}^{N+1} \alpha_i (q_i - q_0)$$
which, together with the expression for β from Lemma 2, yields
$$\beta = \frac{(\mathbf{H} - \mathbf{B}) : \mathbf{A}}{\mathbf{A} : \mathbf{A}} = \frac{\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i^T (\mathbf{H} - \mathbf{B}) \mathbf{v}_i}{2\|\mathbf{A}\|_F^2} .$$
Now we can express
$$\beta^2 \|\mathbf{A}\|_F^2 = \frac{\left(\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i^T (\mathbf{B} - \mathbf{H}) \mathbf{v}_i\right)^2}{4\|\mathbf{A}\|_F^2} .$$
Because the directions of the vectors v_i are uniformly distributed on the unit sphere, we can rotate the coordinate system without affecting E[β²‖A‖_F²] so that B − H is diagonalized.
$$\mathrm{E}\!\left[\beta^2 \|\mathbf{A}\|_F^2\right] = \mathrm{E}\!\left[\frac{\left(\sum_{i=1}^{N+1} \alpha_i \mathbf{v}_i^T \mathbf{D} \mathbf{v}_i\right)^2}{4\|\mathbf{A}\|_F^2}\right] .$$
Let v_{ik} denote the k-th component of vector v_i and λ_k the k-th eigenvalue of B − H (the k-th diagonal element of D).
$$\mathrm{E}\!\left[\beta^2 \|\mathbf{A}\|_F^2\right] = \mathrm{E}\!\left[\frac{\sum_{i=1}^{N+1}\sum_{j=1}^{N+1} \alpha_i \alpha_j \sum_{k=1}^{n}\sum_{l=1}^{n} v_{ik}^2 v_{jl}^2 \lambda_k \lambda_l}{4\|\mathbf{A}\|_F^2}\right] .$$
We can rewrite (65) as
$$\mathrm{E}\!\left[\beta^2 \|\mathbf{A}\|_F^2\right] = \frac{\sum_{k=1}^{n}\sum_{i=1}^{N+1}\sum_{j=1}^{N+1} \alpha_i \alpha_j E_{ikjk} \lambda_k^2 + \sum_{k \ne l}\sum_{i=1}^{N+1}\sum_{j=1}^{N+1} \alpha_i \alpha_j E_{ikjl} \lambda_k \lambda_l}{4\|\mathbf{A}\|_F^2}$$
The expected value of v_{ik}² v_{jl}² depends on the angle φ_ij between v_i and v_j. From Lemma 5, we have
$$E_{ikjl} = \mathrm{E}\!\left[v_{ik}^2 v_{jl}^2\right] = \begin{cases} \dfrac{1 + 2\cos^2\varphi_{ij}}{n(n+2)}\, \|\mathbf{v}_i\|^2 \|\mathbf{v}_j\|^2 & k = l \\[4pt] \dfrac{n + 1 - 2\cos^2\varphi_{ij}}{(n-1)n(n+2)}\, \|\mathbf{v}_i\|^2 \|\mathbf{v}_j\|^2 & k \ne l \end{cases}$$
Because the eigenvalues of D are the same as the eigenvalues of B − H, we have
$$\|\mathbf{B} - \mathbf{H}\|_F^2 = \|\mathbf{D}\|_F^2 = \sum_{k=1}^{n} \lambda_k^2 ,$$
$$\left(\mathrm{tr}(\mathbf{B} - \mathbf{H})\right)^2 = \left(\mathrm{tr}\,\mathbf{D}\right)^2 = \left(\sum_{k=1}^{n} \lambda_k\right)^2 = \sum_{k=1}^{n}\sum_{l=1}^{n} \lambda_k \lambda_l$$
and
$$(\gamma_1 - \gamma_2)\|\mathbf{B} - \mathbf{H}\|_F^2 + \gamma_2 \left(\mathrm{tr}(\mathbf{B} - \mathbf{H})\right)^2 = (\gamma_1 - \gamma_2)\sum_{k=1}^{n} \lambda_k^2 + \gamma_2 \sum_{k=1}^{n}\sum_{l=1}^{n} \lambda_k \lambda_l = \gamma_1 \sum_{k=1}^{n} \lambda_k^2 + \gamma_2 \sum_{k \ne l} \lambda_k \lambda_l$$
Note that ‖v_i‖ = ‖d_i‖. Taking into account (66), (67), and (70) yields
$$\gamma_1 = \frac{\sum_{i=1}^{N+1}\sum_{j=1}^{N+1} \alpha_i \alpha_j \|\mathbf{d}_i\|^2 \|\mathbf{d}_j\|^2 \left(1 + 2\cos^2\varphi_{ij}\right)}{4n(n+2)\|\mathbf{A}\|_F^2} ,$$
$$\gamma_2 = \frac{\sum_{i=1}^{N+1}\sum_{j=1}^{N+1} \alpha_i \alpha_j \|\mathbf{d}_i\|^2 \|\mathbf{d}_j\|^2 \left(n + 1 - 2\cos^2\varphi_{ij}\right)}{4(n-1)n(n+2)\|\mathbf{A}\|_F^2} .$$
The Frobenius norm of A can be expressed as
$$\|\mathbf{A}\|_F^2 = \mathrm{tr}(\mathbf{A}^T \mathbf{A}) = \mathrm{tr}\!\left[\left(\frac{1}{2}\sum_{i=1}^{N+1}\alpha_i \mathbf{v}_i \mathbf{v}_i^T\right)^{\!T}\left(\frac{1}{2}\sum_{j=1}^{N+1}\alpha_j \mathbf{v}_j \mathbf{v}_j^T\right)\right] = \frac{1}{4}\sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\alpha_i \alpha_j\, \mathrm{tr}\!\left(\mathbf{v}_i \mathbf{v}_i^T \mathbf{v}_j \mathbf{v}_j^T\right) = \frac{1}{4}\sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\alpha_i \alpha_j \left(\mathbf{v}_i^T \mathbf{v}_j\right)^2 = \frac{1}{4}\sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\alpha_i \alpha_j \|\mathbf{d}_i\|^2 \|\mathbf{d}_j\|^2 \cos^2\varphi_{ij}$$
By substituting (73) in (71) and (72), we arrive at
$$\gamma_1 = \frac{8\|\mathbf{A}\|_F^2 + \sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\alpha_i \alpha_j \|\mathbf{d}_i\|^2 \|\mathbf{d}_j\|^2}{4n(n+2)\|\mathbf{A}\|_F^2} ,$$
$$\gamma_2 = \frac{(n+1)\sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\alpha_i \alpha_j \|\mathbf{d}_i\|^2 \|\mathbf{d}_j\|^2 - 8\|\mathbf{A}\|_F^2}{4(n-1)n(n+2)\|\mathbf{A}\|_F^2} .$$
We also have
$$\sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\alpha_i \alpha_j \|\mathbf{d}_i\|^2 \|\mathbf{d}_j\|^2 = \left(\sum_{i=1}^{N+1}\alpha_i \|\mathbf{d}_i\|^2\right)^2$$
Substituting (76) into (74) and (75) concludes the proof. □
Theorem 1.
Let γ 1 , γ 2 , and μ be defined as in Lemma 6. Then,
$$\frac{\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{H}\|_F^2\right]}{\|\mathbf{B} - \mathbf{H}\|_F^2} \le \begin{cases} 1 - \dfrac{2(n - \mu)}{(n-1)n(n+2)} & \mu \ge \dfrac{2}{n+1} \\[6pt] 1 - \dfrac{\mu}{n} & \mu < \dfrac{2}{n+1} . \end{cases}$$
Proof. 
We start with the following identity.
$$\mathbf{B}_+ - \mathbf{H} + \mathbf{B} - \mathbf{B}_+ = \mathbf{B} - \mathbf{H} .$$
Computing the Frobenius norm on both sides and considering (B_+ − B) ⊥ (B_+ − H) results in
$$\|\mathbf{B}_+ - \mathbf{H}\|_F^2 + \|\mathbf{B}_+ - \mathbf{B}\|_F^2 = \|\mathbf{B} - \mathbf{H}\|_F^2 .$$
Taking into account (15) results in
$$\|\mathbf{B}_+ - \mathbf{H}\|_F^2 = \|\mathbf{B} - \mathbf{H}\|_F^2 - \beta^2 \|\mathbf{A}\|_F^2 .$$
After Lemma 6 is applied, we have
$$\frac{\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{H}\|_F^2\right]}{\|\mathbf{B} - \mathbf{H}\|_F^2} = 1 - \left(\gamma_1 - \gamma_2 + \gamma_2\,\frac{\left(\mathrm{tr}(\mathbf{B} - \mathbf{H})\right)^2}{\|\mathbf{B} - \mathbf{H}\|_F^2}\right)$$
By definition, μ ≥ 0 and γ_1 ≥ 0. For γ_2 ≥ 0, we must have μ ≥ 2/(n + 1). By considering (tr(B − H))²/‖B − H‖_F² ≥ 0, we arrive at
$$\frac{\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{H}\|_F^2\right]}{\|\mathbf{B} - \mathbf{H}\|_F^2} \le 1 - (\gamma_1 - \gamma_2) = 1 - \frac{2(n - \mu)}{(n-1)n(n+2)} .$$
For γ_2 < 0, we must have μ < 2/(n + 1). Invoking Lemma A1 yields
$$\frac{\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{H}\|_F^2\right]}{\|\mathbf{B} - \mathbf{H}\|_F^2} \le 1 - (\gamma_1 - \gamma_2 + n\gamma_2) = 1 - \left(\gamma_1 + (n-1)\gamma_2\right) = 1 - \frac{\mu}{n} .$$
 □
From Theorem 1, several results can be derived. First, we will assume the prototype set is a regular N-simplex (i.e., comprises N + 1 vectors positively spanning an N-dimensional subspace). This case is interesting because the update formula in [9] is obtained for N = 1 . We are going to show that our estimate of the expected Hessian improvement is identical to the one published in [9]. This update formula (with N = 1 ) was used in an optimization algorithm published in [10].
Next, we are going to show that using a regular n-simplex as the prototype set is a bad choice. According to Theorem 1, no improvement of the Hessian is guaranteed. Even worse, we show that improvement occurs only at the first application of the update formula.
Finally, we will analyze the case where the prototype set is what we refer to as an augmented set of N orthonormal vectors. Such a prototype set with N = n was used in the optimization algorithm published in [14].
Corollary 1.
Let D be a regular N-simplex (N ≤ n). Then,
$$\frac{\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{H}\|_F^2\right]}{\|\mathbf{B} - \mathbf{H}\|_F^2} \le 1 - \frac{2(n - N)}{(n-1)n(n+2)}$$
Proof. 
For all i ≠ j, we have cos φ_ij = −N⁻¹ and ‖d_i‖ = 1. Because the sum of all vectors in a regular N-simplex is 0, we conclude α_i = 1 and
$$\mu = \frac{\left(\sum_{i=1}^{N+1}\alpha_i\right)^2}{\sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\alpha_i \alpha_j \cos^2\varphi_{ij}} = \frac{(N+1)^2}{(N+1)\cdot 1 + N(N+1)\cdot N^{-2}} = N$$
Because 2/(n + 1) ≤ 1 ≤ N for all n ≥ 1, we have μ ≥ 2/(n + 1), and the result follows from Theorem 1. □
Corollary 1 implies that the most efficient approach to MFN updating with a regular simplex in the role of the prototype set of unit vectors is to use a regular 1-simplex (three collinear points).
Corollary 2.
For N = 1 and d_1 = −d_2, the set D is a regular 1-simplex and
$$\frac{\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{H}\|_F^2\right]}{\|\mathbf{B} - \mathbf{H}\|_F^2} \le 1 - \frac{2}{n(n+2)} .$$
This result was proven in [9] with a less general approach. Here, we obtain it as a special case of Corollary 1 for N = 1 .
According to Corollary 1, there is no guaranteed improvement of ‖B − H‖_F if a regular n-simplex (N = n) is used in the update process. In fact, the situation is even worse, as we show in the following lemma.
Lemma 7.
If D is a regular n-simplex ( N = n ), then the MFN update from Lemma 2 improves the Hessian approximation only in its first application.
Proof. 
From Lemma 2, we can see that
$$\mathbf{B}_+ = \mathbf{B} + \beta \mathbf{A}$$
where (see (62))
$$\beta = \frac{\sum_{i=1}^{N+1}\alpha_i \mathbf{v}_i^T (\mathbf{H} - \mathbf{B}) \mathbf{v}_i}{2\|\mathbf{A}\|_F^2} .$$
For a regular simplex, α_i = 1 and ‖d_i‖ = 1. The Frobenius norm of A is
$$\|\mathbf{A}\|_F^2 = \frac{1}{4}\sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\cos^2\varphi_{ij} = \frac{1}{4}\left(1\cdot(n+1) + \frac{1}{n^2}\cdot n(n+1)\right) = \frac{(n+1)^2}{4n}$$
Due to Lemma A3 (see Appendix A for proof), we have
$$\sum_{i=1}^{N+1}\alpha_i \mathbf{v}_i^T (\mathbf{H} - \mathbf{B}) \mathbf{v}_i = \frac{n+1}{n}\,\mathrm{tr}(\mathbf{H} - \mathbf{B})$$
and
$$\beta = \frac{2}{n+1}\,\mathrm{tr}(\mathbf{H} - \mathbf{B})$$
From the definition of A, we obtain
$$\mathrm{tr}(\mathbf{A}) = \frac{1}{2}\sum_{i=1}^{n+1}\alpha_i\,\mathrm{tr}\!\left(\mathbf{v}_i \mathbf{v}_i^T\right) = \frac{1}{2}\sum_{i=1}^{n+1}\alpha_i \|\mathbf{v}_i\|^2 = \frac{n+1}{2}$$
From B_+ − H = B − H + βA, we can express
$$\mathrm{tr}(\mathbf{B}_+ - \mathbf{H}) = \mathrm{tr}(\mathbf{B} - \mathbf{H}) + \beta\,\mathrm{tr}(\mathbf{A}) = 0 .$$
Let B ++ denote the approximate Hessian after the second application of the update formula.
$$\mathbf{B}_{++} = \mathbf{B}_+ + \beta_+ \mathbf{A}_+$$
Because β_+ = 2 tr(H − B_+)/(n + 1) = 0, we have B_{++} = B_+, and the proof is complete. □
Intuition can mislead one into considering the regular n-simplex as the best choice for positioning n + 1 points around an origin x_0 when computing an MFN update based on Lemma 2. Lemma 7 shows the exact opposite: a regular n-simplex is the worst choice because the update formula does not improve the Hessian approximation in its second and all subsequent applications.
Definition 2.
An augmented set of 1 ≤ N ≤ n orthonormal vectors is a set comprising N mutually orthogonal unit vectors e_1, …, e_N and their normalized negative sum −N^{−1/2}(e_1 + ⋯ + e_N).
Note that an augmented set of N = 1 orthonormal vectors is equivalent to a regular 1-simplex. Now, we have ‖d_i‖ = 1 and cos φ_ii = 1. For i ≠ j, we have cos φ_ij = 0, except for i = N + 1 or j = N + 1, when cos φ_ij = −N^{−1/2}. Because d_{N+1} is the normalized negative sum of the first N vectors, α_1 = ⋯ = α_N = 1 and α_{N+1} = N^{1/2}.
Corollary 3.
If the prototype set of unit vectors is an augmented set of N orthonormal vectors, then
$$\frac{\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{H}\|_F^2\right]}{\|\mathbf{B} - \mathbf{H}\|_F^2} \le 1 - \frac{2n - N - N^{1/2}}{(n-1)n(n+2)} .$$
Proof. 
$$\mu = \frac{\left(\sum_{i=1}^{N+1}\alpha_i\right)^2}{\sum_{i=1}^{N+1}\sum_{j=1}^{N+1}\alpha_i \alpha_j \cos^2\varphi_{ij}} = \frac{\left(N\cdot 1 + N^{1/2}\right)^2}{N\cdot 1\cdot 1\cdot 1^2 + 1\cdot N^{1/2}\cdot N^{1/2}\cdot 1^2 + 2N\cdot 1\cdot N^{1/2}\cdot N^{-1}} = \frac{N\left(N^{1/2}+1\right)^2}{2N^{1/2}\left(N^{1/2}+1\right)} = \frac{N + N^{1/2}}{2}$$
Because 2/(n + 1) ≤ 1 for all n ≥ 1, we conclude μ ≥ 2/(n + 1), and the result follows from Theorem 1. □
A special case of Corollary 3 is the following result.
Corollary 4.
If the prototype set of unit vectors is an augmented set of N = n orthonormal vectors, then
$$\frac{\mathrm{E}\!\left[\|\mathbf{B}_+ - \mathbf{H}\|_F^2\right]}{\|\mathbf{B} - \mathbf{H}\|_F^2} \le 1 - \frac{1}{\left(n + n^{1/2}\right)(n+2)} .$$
Corollaries 2 and 4 indicate that for an augmented set of N = n orthonormal vectors (used in [12]), the expected improvement of the approximate Hessian approaches half of the improvement obtained using a regular 1-simplex (introduced in [9]) when n approaches infinity.
Corollaries 1–4 indicate that the update formula yields a greater improvement of the approximate Hessian when the prototype set of vectors exhibits more directionality, in the sense that the vectors are confined to an N-dimensional subspace of the search space with N < n. Lower values of N result in faster convergence.
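The quantities appearing in Theorem 1 can also be evaluated directly from a prototype set, which gives a quick numerical cross-check of Corollaries 1–4. The sketch below is ours and only illustrates the formulas; the helper name theorem1_bound and the example values are assumptions made for illustration.

```python
# Sketch (illustrative): mu and the Theorem 1 bound for a given prototype set.
import numpy as np

def theorem1_bound(D, alpha, n):
    """Columns of D are prototype vectors d_i; alpha satisfies sum_i alpha_i d_i = 0."""
    G = D.T @ D                                              # Gram matrix, g_ij = d_i^T d_j
    mu = np.sum(alpha * np.sum(D * D, axis=0)) ** 2 \
         / np.einsum("i,j,ij->", alpha, alpha, G ** 2)       # mu from Lemma 6
    if mu >= 2.0 / (n + 1):
        return 1.0 - 2.0 * (n - mu) / ((n - 1) * n * (n + 2))
    return 1.0 - mu / n

n = 10
e1 = np.eye(n)[:, 0]
# Regular 1-simplex {v, -v}: bound 1 - 2/(n(n+2)), as in Corollary 2.
print(theorem1_bound(np.column_stack([e1, -e1]), np.array([1.0, 1.0]), n))
```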

5. Example

We illustrate the proposed update with a simple example. The sequence of uniformly distributed orthogonal matrices is generated as in [14]. Three prototype sets are examined: a regular 1-dimensional simplex, a regular (n − 1)-dimensional simplex, and the augmented set of n orthonormal vectors. The true Hessian H is chosen randomly, and the initial Hessian approximation is set to B = 0. The progress of the update is measured by the normalized Frobenius distance between H and B.
Figure 1 depicts the progress of the proposed update with various prototype sets for n = 5 and n = 10 . It is clearly visible that the convergence of the update is linear and depends on the choice of the prototype set. The convergence rate of the update using an augmented set of n orthonormal vectors is approximately half of the convergence rate exhibited by the update using a regular 1-simplex. It can also be seen that the bound on the amount of progress obtained from one update (Theorem 1) is fairly conservative. The actual progress of the update is much better in practice.
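A compact version of this experiment for the regular 1-simplex prototype set (the only case shown here, for brevity) could look as follows. This is our illustrative sketch; the seed, dimension, and iteration count are arbitrary choices and not the exact setup used for Figure 1.

```python
# Sketch (illustrative) of the Section 5 experiment with a regular 1-simplex.
import numpy as np

rng = np.random.default_rng(42)
n, n_updates = 10, 300

H = rng.standard_normal((n, n)); H = 0.5 * (H + H.T)   # random symmetric "true" Hessian
q = lambda x: 0.5 * x @ H @ x                          # quadratic test function
B = np.zeros((n, n))                                   # initial approximation
x0 = np.zeros(n)

for _ in range(n_updates):
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)                             # direction uniform on the unit sphere
    q0, q1, q2 = q(x0), q(x0 + v), q(x0 - v)
    beta = q1 + q2 - 2.0 * q0 - v @ B @ v              # beta for the 1-simplex (||v|| = 1)
    B = B + beta * np.outer(v, v)                      # B_+ = B + beta * v v^T

print(np.linalg.norm(B - H, "fro") / np.linalg.norm(H, "fro"))   # normalized distance to H
```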

6. Discussion

The convergence of a Hessian update formula that requires only function values for computing the update was analyzed. The update formula is based on the formula published in [8], which generally requires the function values at m ≥ n + 2 points. The proposed update is based on the case where m = n + 2. An additional requirement is introduced, namely that the m − 1 vectors from the central point to the remaining m − 1 points must positively span an (m − 2)-dimensional subspace of R^n. This requirement extends the usability of the proposed update to sets of points with 3 ≤ m ≤ n + 2 members. The set of m points used by the update is generated by adding m − 1 vectors to a central point in the set. The vectors are obtained by applying a random orthogonal transformation to a prototype set of vectors that spans an (m − 2)-dimensional subspace of R^n.
A lower bound on the expected improvement of the Hessian approximation was derived (Theorem 1). Up to now, no such result has been published for the update from [8] with m = n + 2. The obtained result was applied to several different prototype sets. The general result obtained for the case when the prototype set is a regular (m − 2)-dimensional simplex (Corollary 1) shows that the expected improvement of the Hessian approximation is greatest for m = 3 (i.e., a 1-dimensional regular simplex) and decreases as the dimensionality of the simplex increases. The special case m = 3 (1-dimensional regular simplex) corresponds to the update from [9]. The lower bound on the expected improvement obtained with our general result (Corollary 2) matches the one published in [9]. For the n-dimensional regular simplex, our result indicates that the lower bound on the expected improvement of the Hessian approximation is 0. Furthermore, it was shown that the Hessian approximation is possibly improved only by the first application of the proposed update formula (Lemma 7). Therefore, the use of the n-dimensional regular simplex in the role of the prototype set is a bad choice.
Next, the expected improvement of the approximate Hessian was derived for a prototype set comprising N ≤ n orthogonal vectors and their normalized negative sum. Such a prototype set with N = n was used in the optimization algorithm published in [12]. It was shown that using this kind of prototype set does guarantee a positive lower bound on the expected improvement of the Hessian approximation (Corollary 4). The general result (Corollary 3), however, again indicates that using a prototype set of lower dimensionality results in faster convergence. The result for N = 1 (two collinear vectors in the role of the prototype set) is the same as the one obtained for the update from [9].
Finally, the results were illustrated by running the proposed update on a quadratic function with a randomly chosen Hessian for several choices of the prototype set. The observed progress was compared to the lower bound predicted by Theorem 1. The results indicate that the lower bound is quite pessimistic, and that the actual progress is faster. The observed performance was closest to the predicted lower bound for the update formula from [9].

Author Contributions

Conceptualization, Á.B. and T.T.; methodology, Á.B. and J.O.; software, J.O.; validation, J.O.; formal analysis, Á.B.; investigation, Á.B.; resources, T.T.; data curation, Á.B.; writing—original draft preparation, Á.B.; writing—review and editing, Á.B. and J.O.; visualization, J.O.; supervision, T.T.; project administration, T.T.; funding acquisition, T.T. All authors have read and agreed to the published version of the manuscript.

Funding

The research was co-funded by the Ministry of Education, Science, and Sport (Ministrstvo za Šolstvo, Znanost in Šport) of the Republic of Slovenia through the program P2-0246 ICT4QoL—Information and Communications Technologies for Quality of Life.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank the anonymous referees for their useful comments that helped to improve the paper. Most notably, the authors would like to thank the second referee, whose suggestion led to the simplification of the proof of Lemma 4.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MFN: Minimum Frobenius norm
BFGS: Broyden-Fletcher-Goldfarb-Shanno
SR1: Symmetric rank-one

Appendix A

The following lemma is used in the proof of the main result.
Lemma A1.
Let B be a symmetric n × n matrix. Then,
$$\left(\mathrm{tr}\,\mathbf{B}\right)^2 \le n\|\mathbf{B}\|_F^2$$
Proof. 
Let λ_i ∈ R denote the n eigenvalues of B. We have
$$\left(\mathrm{tr}(\mathbf{B})\right)^2 = \left(\sum_{i=1}^{n}\lambda_i\right)^2 ,$$
$$\|\mathbf{B}\|_F^2 = \sum_{i=1}^{n}\lambda_i^2 = a .$$
The maximum of (tr(B))² can be obtained by finding the maximum of $\left(\sum_{i=1}^{n}\lambda_i\right)^2$ subject to $\sum_{i=1}^{n}\lambda_i^2 = a$. The solution of this problem is
$$|\lambda_i| = (a/n)^{1/2} , \quad i = 1, 2, \ldots, n .$$
Considering $\left|\sum_{i=1}^{n}\lambda_i\right| \le n(a/n)^{1/2}$ along with (A2) and (A3) concludes the proof. □
Let S be an n × (n + 1) matrix whose columns are the vectors comprising a regular simplex in n dimensions. By definition, the following must hold:
$$\mathbf{S}^T\mathbf{S} = \begin{bmatrix} 1 & -\frac{1}{n} & \cdots & -\frac{1}{n} \\ -\frac{1}{n} & 1 & \cdots & -\frac{1}{n} \\ \vdots & \vdots & \ddots & \vdots \\ -\frac{1}{n} & -\frac{1}{n} & \cdots & 1 \end{bmatrix}_{(n+1)\times(n+1)} = \mathbf{C}$$
Clearly, there are infinitely many possible solutions to (A5). We will assume that S is upper triangular. A solution to (A5) with this property is unique and can be obtained via Cholesky decomposition of the submatrix of C comprising the first n rows and columns which yields the first n columns of S . The last column is then obtained as the negative sum of the first n columns. Matrix S is in row echelon form and represents what we will refer to as the standard regular simplex. Its components can be expressed as
$$s_{ii}^2 = \frac{(n+1)(n-i+1)}{n(n-i+2)}$$
$$s_{ij} = \begin{cases} -\dfrac{s_{ii}}{n-i+1} & j > i \\[4pt] 0 & \text{otherwise} \end{cases}$$
Lemma A2.
Let columns of V represent a regular simplex. Then,
$$\mathbf{V}\mathbf{V}^T = \mathbf{S}\mathbf{S}^T = \frac{n+1}{n}\,\mathbf{I}_{n\times n}$$
Proof. 
Let the columns of S comprise a standard regular simplex. The diagonal elements of S S^T can be obtained as
$$\sum_{i=1}^{n+1} s_{ki}^2 = \sum_{i=k}^{n+1} s_{ki}^2 = s_{kk}^2 + (n-k+1)\,s_{k(k+1)}^2 = s_{kk}^2\cdot\frac{n-k+2}{n-k+1} = \frac{n+1}{n} .$$
Because S S^T is symmetric, we assume k > l for computing the extradiagonal elements:
$$\sum_{i=1}^{n+1} s_{ki} s_{li} = \sum_{i=k}^{n+1} s_{ki} s_{li} = s_{l(l+1)}\sum_{i=k}^{n+1} s_{ki} = s_{l(l+1)}\left(s_{kk} + (n-k+1)\,s_{k(k+1)}\right) = s_{l(l+1)}\left(s_{kk} - (n-k+1)\,\frac{s_{kk}}{n-k+1}\right) = 0$$
This proves S S^T = ((n + 1)/n) I_{n×n}. Any regular simplex V can be expressed with the standard regular simplex as V = Q S, where Q is an orthogonal matrix. Therefore, we have
$$\mathbf{V}\mathbf{V}^T = \mathbf{Q}\mathbf{S}\mathbf{S}^T\mathbf{Q}^T = \frac{n+1}{n}\,\mathbf{Q}\mathbf{I}_{n\times n}\mathbf{Q}^T = \frac{n+1}{n}\,\mathbf{I}_{n\times n}$$
 □
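The construction of the standard regular simplex can be cross-checked numerically. The sketch below is ours and only illustrates the component formulas and the identities S^T S = C and S S^T = ((n + 1)/n) I from Lemma A2; the helper name standard_regular_simplex is an assumption made for illustration.

```python
# Sketch (illustrative): the standard regular simplex S and the checks of Lemma A2.
import numpy as np

def standard_regular_simplex(n):
    S = np.zeros((n, n + 1))
    for i in range(1, n + 1):                         # 1-based row index, as in the text
        sii = np.sqrt((n + 1) * (n - i + 1) / (n * (n - i + 2)))
        S[i - 1, i - 1] = sii
        S[i - 1, i:] = -sii / (n - i + 1)             # s_ij = -s_ii/(n - i + 1) for j > i
    return S

n = 6
S = standard_regular_simplex(n)
C = (1.0 + 1.0 / n) * np.eye(n + 1) - 1.0 / n         # 1 on the diagonal, -1/n elsewhere
assert np.allclose(S.T @ S, C)
assert np.allclose(S @ S.T, (n + 1) / n * np.eye(n))
```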
Lemma A3.
Let columns of V represent a regular simplex, and let H be a symmetric matrix. Then,
$$\sum_{i=1}^{n+1}\mathbf{v}_i^T\mathbf{H}\mathbf{v}_i = \frac{n+1}{n}\,\mathrm{tr}(\mathbf{H})$$
Proof. 
$$\sum_{i=1}^{n+1}\mathbf{v}_i^T\mathbf{H}\mathbf{v}_i = \mathrm{tr}(\mathbf{V}^T\mathbf{H}\mathbf{V}) = \mathbf{H} : \left(\mathbf{V}\mathbf{V}^T\right) = \frac{n+1}{n}\,\mathbf{H} : \mathbf{I} = \frac{n+1}{n}\sum_{i=1}^{n} h_{ii} = \frac{n+1}{n}\,\mathrm{tr}(\mathbf{H})$$
 □

References

  1. Audet, C.; Kokkolaras, M. Blackbox and derivative-free optimization: Theory, algorithms and applications. Optim. Eng. 2016, 17, 1–2. [Google Scholar] [CrossRef] [Green Version]
  2. Powell, M.J.D. A Direct Search Optimization Method That Models the Objective and Constraint Functions by Linear Interpolation. In Advances in Optimization and Numerical Analysis. Mathematics and Its Applications; Gomez, S., Hennart, J.P., Eds.; Springer: Dordrecht, The Netherlands, 1994; Volume 275, pp. 51–67. [Google Scholar]
  3. Powell, M.J.D. UOBYQA: Unconstrained optimization by quadratic approximation. Math. Program. 2002, 92, 555–582. [Google Scholar] [CrossRef]
  4. Buhmann, M.D. Radial Basis Functions: Theory and Implementations, Volume 12 of Cambridge Monographs on Applied and Computational Mathematics; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar]
  5. Suykens, J.A.K. Nonlinear modelling and support vector machines. In Proceedings of the IMTC 2001 18th IEEE Instrumentation and Measurement Technology Conference. Rediscovering Measurement in the Age of Informatics, Budapest, Hungary, 21–23 May 2001; IEEE: Piscataway, NJ, USA, 2001; pp. 287–294. [Google Scholar]
  6. Fang, Z.Y.; Roy, K.; Chen, B.; Sham, C.-W.; Hajirasouliha, I.; Lim, J.B.P. Deep learning-based procedure for structural design of cold-formed steel channel sections with edge-stiffened and un-stiffened holes under axial compression. Thin-Walled Struct. 2021, 166, 108076. [Google Scholar] [CrossRef]
  7. Nocedal, J.; Wright, S.J. Numerical Optimization, 2nd ed.; Springer: New York, NY, USA, 2006. [Google Scholar]
  8. Powell, M.J.D. Least Frobenius norm updating of quadratic models that satisfy interpolation conditions. Math. Program. 2004, 100, 183–215. [Google Scholar] [CrossRef]
  9. Leventhal, D.; Lewis, A.S. Randomized Hessian estimation and directional search. Optimization 2011, 60, 329–345. [Google Scholar] [CrossRef]
  10. Bűrmen, Á.; Olenšek, J.; Tuma, T. Mesh adaptive direct search with second directional derivative-based Hessian update. Comput. Optim. Appl. 2015, 62, 693–715. [Google Scholar] [CrossRef]
  11. Stich, S.U.; Müller, C.L. On Spectral Invariance of Randomized Hessian and Covariance Matrix Adaptation Schemes. In Proceedings of the Parallel Problem Solving from Nature—PPSN XII: 12th International Conference, Taormina, Italy, 1–5 September 2012; Coello Coello, C.A., Cutello, V., Deb, K., Forrest, S., Nicosia, G., Pavone, M., Eds.; Springer: New York, NY, USA, 2012; pp. 448–457. [Google Scholar]
  12. Bűrmen, Á.; Fajfar, I. Mesh adaptive direct search with simplicial Hessian update. Comput. Optim. Appl. 2019, 74, 645–667. [Google Scholar] [CrossRef]
  13. Stewart, G.W. The efficient generation of random orthogonal matrices with an application to condition estimators. SIAM J. Numer. Anal. 1980, 17, 403–409. [Google Scholar] [CrossRef]
  14. Bűrmen, Á.; Tuma, T. Generating Poll Directions for Mesh Adaptive Direct Search with Realizations of a Uniformly Distributed Random Orthogonal Matrix. Pac. J. Optim. 2016, 12, 813–832. [Google Scholar]
  15. Sykora, S. Quantum Theory and the Bayesian Inference Problems. J. Stat. Phys. 1974, 11, 17–27. [Google Scholar] [CrossRef]
Figure 1. Progress of three simplicial updates for n = 5 (left) and n = 10 (right). Dashed lines represent the progress of the update assuming every update application improves the approximate Hessian by the amount predicted in Theorem 1.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
