Article

An Efficient Limited Memory Multi-Step Quasi-Newton Method

by Issam A. R. Moghrabi 1,2,* and Basim A. Hassan 3
1 Department of Information Systems and Technology, Kuwait Technical College, Abu-Halifa 54753, Kuwait
2 Department of Computer Science, School of Arts and Science, University of Central Asia, Naryn 722918, Kyrgyzstan
3 Department of Mathematics, College of Computer Sciences and Mathematics, University of Mosul, Mosul 41002, Iraq
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(5), 768; https://doi.org/10.3390/math12050768
Submission received: 22 January 2024 / Revised: 15 February 2024 / Accepted: 17 February 2024 / Published: 4 March 2024

Abstract: This paper is dedicated to the development of a novel class of quasi-Newton techniques tailored to address the computational challenges posed by memory constraints. Such methods are commonly referred to as "limited" memory methods. The method proposed herein is adaptable, introducing a customizable memory parameter that governs how much historical data is retained in constructing the Hessian estimate at each iteration. The search directions generated by this approach are derived from a modified version of the full memory multi-step BFGS update, incorporating a limited memory computation for a single term to approximate the required matrix–vector multiplication. Numerical experiments, exploring various parameter configurations, substantiate the enhanced efficiency of the proposed algorithm within the category of limited memory quasi-Newton methods.

1. Introduction

Unconstrained optimization is concerned with the minimization of an objective function as follows:
minimize $f(x)$, where $f:\mathbb{R}^n \rightarrow \mathbb{R}$.
The aforementioned problem can be solved by employing a class of methods referred to as quasi-Newton techniques for unconstrained optimization. Only the function and its first derivatives are required for quasi-Newton (QN) methods [1]; the Hessian does not need to be available or even coded. To incorporate new information about the function and its gradient, an approximation to the actual Hessian is retained and updated across the iterations.
Given $B_i$, the current Hessian approximation, a new Hessian estimate $B_{i+1}$ needs to be constructed for the new solution estimate $x_{i+1}$. To find $B_{i+1}$, we can apply a first-order Taylor expansion of the gradient vector about the point $x_{i+1}$ to obtain the following relation (known as the Secant equation) [2,3,4,5,6]:
$B_{i+1} s_i = y_i,$  (1)
where
$s_i = x_{i+1} - x_i$
and
$y_i = g_{i+1} - g_i,$
where $g_i \equiv g(x_i)$ is the gradient at the iterate $x_i$. The computed Hessian approximations must satisfy (1). To estimate the next Hessian approximation $B_{i+1}$ from the current matrix $B_i$ and the vectors $s_i$ and $y_i$, an update of the form $B_{i+1} = B_i + C_i$ is applied, where $C_i$ is a correction matrix. It may be preferable to work with $H_{i+1} = H_i + D_i$, where $D_i$ is an update term and $H_{i+1} = B_{i+1}^{-1}$. The update can be of rank one or two. A widely used rank-two correction formula is the BFGS formula, given by
$B_{i+1}^{BFGS} = B_i + \dfrac{y_i y_i^T}{y_i^T s_i} - \dfrac{B_i s_i s_i^T B_i}{s_i^T B_i s_i},$  (2)
with the corresponding update of the inverse matrix given by
$H_{i+1}^{BFGS} = H_i + \left(1 + \dfrac{y_i^T H_i y_i}{y_i^T s_i}\right)\dfrac{s_i s_i^T}{y_i^T s_i} - \dfrac{s_i y_i^T H_i + H_i y_i s_i^T}{y_i^T s_i}.$
Reported numerical outcomes show that this formula outperforms other updating formulas, particularly when the line search is inexact. BFGS is regarded as a standard update formula [3,5,7,8].
The search direction for the standard quasi-Newton methods is computed from
$B_i p_i = -g_i.$
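For concreteness, the following minimal NumPy sketch shows the inverse BFGS update (2) and the resulting quasi-Newton direction in its inverse form $p_i = -H_i g_i$; the function names and the assumption that the curvature $s_i^T y_i$ is positive are illustrative choices made for this sketch, not code from the paper.
```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Inverse BFGS update (2): H_{i+1} from H_i, step s_i and gradient change y_i."""
    sy = s @ y                      # curvature s_i^T y_i, assumed positive
    Hy = H @ y
    term1 = (1.0 + (y @ Hy) / sy) * np.outer(s, s) / sy
    term2 = (np.outer(s, Hy) + np.outer(Hy, s)) / sy
    return H + term1 - term2

def quasi_newton_direction(H, g):
    """Search direction p_i = -H_i g_i (inverse form of B_i p_i = -g_i)."""
    return -H @ g
```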

2. Literature Review

Limited memory QN methods (L-BFGS) have gained considerable prominence in optimization due to their ability to handle large-scale problems efficiently, especially with memory constraints. These iterative methods are mostly classified as quasi-Newton techniques [9,10,11,12]. The versatility of L-BFGS methods is evident in their application across various domains. In machine learning, they are integral to training support vector machines (SVMs), optimizing logistic regression models, and fine-tuning neural networks. Their memory-efficient design makes them invaluable when working with large datasets. Additionally, L-BFGS plays a pivotal role in structural optimization, a field where finite element analysis involves high-dimensional design spaces. In computational chemistry, L-BFGS aids in optimizing molecular structures, facilitating breakthroughs in drug discovery and materials science.
In contrast to the classical quasi-Newton methods, L-BFGS stores only a limited number of recent iterations' data, often referred to as memory pairs $(s_i, y_i)$, where $s_i$ represents the change in the optimization variable and $y_i$ denotes the change in the gradient (see (1)). These memory pairs are instrumental in updating the approximate Hessian matrix, achieving a balance between computational efficiency and optimization accuracy.
In the past decade, L-BFGS has seen significant methodological advancements and the emergence of new variants tailored to address specific optimization challenges. For example, L-BFGS-B extends the method to handle bound-constrained optimization problems effectively [10]. Innovations like L-BFGS-TF and L-BFGS++ have been introduced to enhance scalability and convergence properties. These methodological innovations have substantially widened the scope of L-BFGS applications, encompassing areas such as machine learning, deep learning, and numerical simulations.
Limited memory BFGS algorithms are often derived from the update identity for the inverse Hessian given as [10]
$H_{i+1} = (I - \rho_i s_i y_i^T) H_i (I - \rho_i y_i s_i^T) + \rho_i s_i s_i^T,$
where $s_i$ and $y_i$ are as in (1) and $\rho_i = 1/(s_i^T y_i)$.
Starting from a given initial matrix (typically the identity $I$), a sequence of vectors $q_{i-m}, \ldots, q_i$ is defined such that $q_j \equiv (I - \rho_j y_j s_j^T)\, q_{j+1}$, so the quantities $q_j$ and $q_{j+1}$ are computed recursively without storing any matrices. For more details, see [10].
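The recursion above is what the standard L-BFGS two-loop procedure implements in practice. A minimal sketch is given below, assuming the $m$ most recent pairs are stored in lists S and Y (oldest first) and that $H_0$ is a scalar multiple of the identity; this is the textbook form of the recursion from [10], not code taken from the present paper.
```python
import numpy as np

def lbfgs_direction(g, S, Y, gamma=1.0):
    """Two-loop recursion: returns -H g for the implicit L-BFGS matrix H.

    S, Y: lists of the m most recent pairs (s_j, y_j), oldest first.
    gamma: scaling of the initial matrix H_0 = gamma * I.
    """
    q = g.copy()
    rho = [1.0 / (s @ y) for s, y in zip(S, Y)]
    alpha = [0.0] * len(S)
    for j in reversed(range(len(S))):      # backward pass: q <- (I - rho_j y_j s_j^T) q
        alpha[j] = rho[j] * (S[j] @ q)
        q -= alpha[j] * Y[j]
    z = gamma * q                          # apply the initial matrix H_0
    for j in range(len(S)):                # forward pass
        beta = rho[j] * (Y[j] @ z)
        z += S[j] * (alpha[j] - beta)
    return -z
```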
Next, we present some successful limited-memory BFGS formulas. These formulas are all variations of the standard L-BFGS formula, and they are designed to enhance the numerical behavior of the L-BFGS algorithm for certain types of problems. One of the well-known limited memory methods is the Liu–Nocedal formula [9]. The updated Hessian matrix is the previous Hessian matrix updated with a correction term. The correction term is a weighted sum of the two recent steps and gradient differences, where the weights are chosen to make certain that the updated Hessian matrix is positive-definite. The formula is given as
$H_i = H_{i-1} + \dfrac{s_{i-1} s_{i-1}^T}{s_{i-1}^T y_{i-1}} - \dfrac{H_{i-1} y_{i-1} y_{i-1}^T H_{i-1}}{y_{i-1}^T H_{i-1} y_{i-1}} + \rho_i \dfrac{s_{i-2} s_{i-2}^T}{s_{i-2}^T y_{i-2}} - \rho_i \dfrac{H_{i-1} y_{i-2} y_{i-2}^T H_{i-1}}{y_{i-2}^T H_{i-1} y_{i-2}}.$
Another successful formula is the Byrd–Nocedal–Schnabel formula [13]. The correction term is a weighted sum of the latest m steps and gradient differences, where the weights are chosen to ascertain that the newly computed Hessian matrix is positive-definite. The formula is defined as
$H_i = H_{i-1} + \dfrac{s_{i-1} s_{i-1}^T}{s_{i-1}^T y_{i-1}} - \dfrac{H_{i-1} y_{i-1} y_{i-1}^T H_{i-1}}{y_{i-1}^T H_{i-1} y_{i-1}} + \sum_{j=i-m}^{i-1} \rho_i \dfrac{s_j s_j^T}{s_j^T y_j} - \sum_{j=i-m}^{i-1} \rho_i \dfrac{H_{i-1} y_j y_j^T H_{i-1}}{y_j^T H_{i-1} y_j}.$  (3)
A variation of (3) is the Zhu–Byrd–Nocedal formula [14]. The correction component is a weighted sum of the latest m steps and gradient differences, where the weights are also chosen to make certain that the newly computed Hessian matrix is positive-definite. Additionally, the correction term includes a term that is designed to improve the behavior of the L-BFGS algorithm for problems with non-smooth objective functions [15]. The formula is defined by
$H_i = H_{i-1} + \dfrac{s_{i-1} s_{i-1}^T}{s_{i-1}^T y_{i-1}} - \dfrac{H_{i-1} y_{i-1} y_{i-1}^T H_{i-1}}{y_{i-1}^T H_{i-1} y_{i-1}} + \sum_{j=i-m}^{i-1} \rho_i \dfrac{s_j s_j^T}{s_j^T y_j} - \sum_{j=i-m}^{i-1} \rho_i \dfrac{H_{i-1} y_j y_j^T H_{i-1}}{y_j^T H_{i-1} y_j} + \rho_i \dfrac{s_{i-m} s_{i-m}^T}{s_{i-m}^T y_{i-m}} - \rho_i \dfrac{H_{i-1} y_{i-m} y_{i-m}^T H_{i-1}}{y_{i-m}^T H_{i-1} y_{i-m}}.$  (4)
In (3) and (4), the parameter m controls the number of past update terms that are used to update the Hessian matrix. A larger value of m will result in a more precise approximation of the Hessian matrix, closer to that of the standard QN methods.
Another idea, presented in [16], focuses on applying regularized Newton methods in a versatile category of unconstrained optimization algorithms. The method they propose has the potential to combine favorable characteristics from both approaches. The primary emphasis is on the integration of regularization with limited memory quasi-Newton methods, leveraging the unique structure inherent in limited memory algorithms. The paper explores an alternative globalization technique, akin to the less familiar siblings of line search and trust-region methods, known as regularized Newton methods. The derived methods are referred to as regularized quasi-Newton methods. The methods utilize the relationship:
$(B_i + \mu_i I)\, p_i = -g_i,$
which is used in the computation of the search direction p i , instead of the one used by the standard quasi-Newton methods. The parameter μ i serves as the regularization parameter. The derived methods integrate features from both line search and trust region techniques and update a limited memory version of the Hessian matrix estimate. The method displays good behavior overall though it involves the solution of a system of linear equations at each iteration, which increases the time complexity of the algorithm.
A recent paper also introduces a limited memory quasi-Newton type method for unconstrained multi-objective optimization problems [17], which is suitable for large-scale scenarios. The proposed algorithm approximates the convex combination of the Hessian matrices of the objective function using a positive definite matrix. The update formula for the approximation matrix is an extension of the one used in the L-BFGS method for scalar optimization [18]. The method is well defined even in the nonconvex case and exhibits global and R-linear convergence to Pareto optimality for twice continuously differentiable strongly convex problems. In the method derived in [17], the matrix used to update the inverse Hessian approximation is given by
$H_{i+1} = (I - \rho_i s_i u_i^T) H_i (I - \rho_i u_i s_i^T) + \rho_i s_i s_i^T,$
where $\rho_i = 1/(s_i^T u_i)$ and $u_i$ is a summation of the previous $y$ vectors, dictated by the number of vectors retained in memory. Those $y$ values are weighted by the optimal Lagrange multipliers needed to achieve global convergence.
The performance of the algorithm is evaluated through computational comparisons with state-of-the-art Newton and quasi-Newton approaches in multi-objective optimization, and the results show that the proposed approach is generally efficient. In our own tests of that method, it failed to converge on some of our test functions, but it performed well on others and managed to find global minimizers.
The paper in [19] presents a new numerical method for solving large-scale unconstrained optimization problems, derived from a modified BFGS-type update. The update formula is extended to the framework of a limited memory scheme. The update utilized in that paper takes the form
$H_{i+1} = (I - \rho_i s_i u_i^T) H_i (I - \rho_i u_i s_i^T) + \rho_i s_i s_i^T,$
where $\rho_i = 1/(s_i^T u_i)$ and $u_i = y_i + \gamma_i s_i$, for some scalar $\gamma_i$ chosen to ensure convergence. The paper discusses the global convergence and convergence rate of the algorithm under the weak Wolfe–Powell line search.
A modified q-BFGS algorithm is proposed for nonlinear unconstrained optimization problems [20]. It uses a simple symmetric positive definite matrix and a new q-quasi-Newton equation to build an approximate q-Hessian. The method preserves global convergence properties without assuming the convexity of the objective function. Numerical results show improvement over the original q-BFGS method. A limited memory quasi-Newton method is introduced for large-scale unconstrained multi-objective optimization problems. It approximates the convex combination of the Hessian matrices of the objectives and exhibits global and R-linear convergence to Pareto optimality. Empirical comparisons demonstrate the efficiency and effectiveness of the proposed approach [21]. A new L-BFGS method with regularization techniques is proposed for large-scale unconstrained optimization. It guarantees global convergence and is robust in terms of solving more problems [4]. The momentum-accelerated quasi-Newton (MoQ) method approximates Nesterov’s accelerated gradient as a linear combination of past gradients, and its performance is evaluated on a function approximation problem.

3. A New Multi-Step Limited Memory Method

In this section, we devise a new limited memory QN method inspired by the methods in (3) and (4). The method is characterized by its use of more of the already computed past data to improve the quality of the generated (inverse) Hessian approximations.
In the basic Secant equation, a straight line is used to model the step from the iterate $x_i$ to the new iterate $x_{i+1}$, whereas higher-order interpolants are used in the multi-step approaches [22,23]. The main advantage of using polynomials is that they exploit more of the already computed data in the matrix update rather than simply discarding them. This is expected to result in better Hessian approximations.
Multi-step methods generally update the Hessian estimates to satisfy the form
$B_{i+1} u_i = w_i,$  (5)
for which the classical Secant relation corresponds to the choices $u_i = s_i$ and $w_i = y_i$, respectively. One of the alternative choices of $u_i$ and $w_i$ proposed in [24] is
$u_i = s_i - \mu_{i-1} s_{i-1} \quad \text{and} \quad w_i = y_i - \mu_{i-1} y_{i-1},$
where $\mu_{i-1} = \delta^2/(1 + 2\delta)$ and $\delta = \|s_i\|/\|s_{i-1}\|$.
Thus, Equation (5) becomes:
$H_{i+1}\left(y_i - \mu_{i-1} y_{i-1}\right) = s_i - \mu_{i-1} s_{i-1}.$
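As a small illustration, the multi-step data above can be formed with a few lines of NumPy; the Euclidean norm used for $\delta$ and the variable names are assumptions made here for the sketch.
```python
import numpy as np

def multistep_pair(s, s_prev, y, y_prev):
    """Two-step quantities u_i and w_i, with mu_{i-1} = delta^2 / (1 + 2*delta)."""
    delta = np.linalg.norm(s) / np.linalg.norm(s_prev)
    mu = delta**2 / (1.0 + 2.0 * delta)
    u = s - mu * s_prev
    w = y - mu * y_prev
    return u, w
```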
Concentrating on the BFGS updating method (2), the inverse Hessian approximation at iteration i may be expressed after manipulation as
$H_{i+1} = H_i + (u_i^T w_i)^{-1}\left[\left(1 + \dfrac{w_i^T r_i}{u_i^T w_i}\right) u_i u_i^T - u_i r_i^T - r_i u_i^T\right],$  (7)
where $r_i = H_i w_i$.
Limited memory techniques are developed using the identity
$p_{i+1} = -H_{i+1} g_{i+1},$
which is equivalent to (using (7))
$p_{i+1} = p_i - H_i y_i - (u_i^T w_i)^{-1}\left[\left(\left(1 + \dfrac{w_i^T r_i}{u_i^T w_i}\right) u_i^T g_{i+1} - r_i^T g_{i+1}\right) u_i - \left(u_i^T g_{i+1}\right) r_i\right].$  (8)
However, the problem in using (7) is that of computing $r_i$ and $H_i y_i$ without actually storing $H_i$. The simplest option would be to set $H_i = I$ at each iteration, so that $r_i = w_i$ and $H_i y_i$ reduces to $y_i$. However, this is numerically a poor choice, since it abandons the information collected in the previously updated matrices. Another proposal is considered here, namely to retain an approximation of the vector $r_i$ and correct it at every cycle. This can be performed as follows:
Start with $r_0 = H_0 w_0 = w_0$ (since $H_0 = I$),
$r_1 = H_1 w_1 = w_1 + (u_0^T w_0)^{-1}\left[\left(\left(1 + \dfrac{w_0^T w_0}{u_0^T w_0}\right) u_0^T w_1 - w_0^T w_1\right) u_0 - \left(u_0^T w_1\right) w_0\right],$
and, recurrently, for any $i$, we use
$r_i = H_{i-1} w_i + (u_{i-1}^T w_{i-1})^{-1}\left[\left(\left(1 + \dfrac{w_{i-1}^T r_{i-1}}{u_{i-1}^T w_{i-1}}\right) u_{i-1}^T w_i - r_{i-1}^T w_i\right) u_{i-1} - \left(u_{i-1}^T w_i\right) r_{i-1}\right].$  (10)
Relation (10) can then be utilized in (8) to compute the next direction vector. Still, the term $H_{i-1} w_i$ in (10) is not directly accessible. The obvious option is to set $r_{i-1}$ to $w_{i-1}$ in (8), but such a choice is expected to affect the updated matrix negatively. Thus, our goal next is to build expressions for both $H_{i-1} w_i$ and $H_i y_i$ to complete the derivation. By doing so, we obtain a method that is expected to be computationally superior to simply setting $H_i$ to the identity matrix in (8), because more of the previously accumulated information is utilized in updating the matrix. The preference, computationally, is therefore to approximate the aforementioned matrix–vector products, as shown next.
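To make the algebra of (8) concrete, the sketch below evaluates the direction recurrence, assuming the vectors $r_i \approx H_i w_i$ and the product $H_i y_i$ have already been formed (for instance, by the limited memory procedure derived next). It is an illustration of the formula only, not the authors' implementation.
```python
import numpy as np

def multistep_direction(p, Hy, r, u, w, g_next):
    """Next direction from (8): p_{i+1} = p_i - H_i y_i - correction(u_i, w_i, r_i, g_{i+1})."""
    uw = u @ w                                   # u_i^T w_i, assumed positive
    coeff = (1.0 + (w @ r) / uw) * (u @ g_next) - (r @ g_next)
    correction = (coeff * u - (u @ g_next) * r) / uw
    return p - Hy - correction
```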
To proceed with the derivation, it should be noted that, for a diagonal matrix $H_0$, the subsequent matrices in the recurrence can be expressed as
$H_1 = H_0 + U(H_0, u_0, w_0)$
and
$H_2 = H_0 + U(H_0, u_0, w_0) + U(H_1, u_1, w_1),$
$H_3 = H_0 + U(H_0, u_0, w_0) + U(H_1, u_1, w_1) + U(H_2, u_2, w_2),$ and so on,
for vectors u and w, as in (5).
Let $\eta$ denote the upper limit on the number of correction terms $U$ that can be saved. Because $H_0$ is diagonal, the maximum number of $n$-vectors that can be used to build the matrix approximation is $2\eta + 1$. The storage limit is reached once $H_\eta$ is produced; from then on, newer vectors start replacing older ones:
$H_\eta = H_0 + U(H_0, u_0, w_0) + \cdots + U(H_{\eta-1}, u_{\eta-1}, w_{\eta-1}).$
This idea was influenced by Nocedal's algorithm [10]. Nocedal's technique is founded on the idea that the most recent data should be favored in the update, and replacing the oldest information is one compelling option. However, there is no assurance that the computed matrices will remain positive-definite. Loss of positive-definiteness leads to search directions that are not necessarily descent directions, thus threatening the method's convergence. To avoid this problem, the standard BFGS updating formula is expressed in the form [2]:
$H_{i+1} = V_i^T H_i V_i + \rho_i u_i u_i^T,$  (11)
where $\rho_i = (u_i^T w_i)^{-1}$ and $V_i = I - \rho_i w_i u_i^T$.
Thus, given the above-proposed strategy, Nocedal suggests a form of the update as follows:
For $i + 1 \le \eta$:
$H_{i+1} = (V_i^T V_{i-1}^T \cdots V_0^T) H_0 (V_0 \cdots V_{i-1} V_i) + (V_i^T \cdots V_1^T)\, \rho_0 u_0 u_0^T (V_1 \cdots V_i) + \cdots + V_i^T V_{i-1}^T\, \rho_{i-2} u_{i-2} u_{i-2}^T V_{i-1} V_i + V_i^T\, \rho_{i-1} u_{i-1} u_{i-1}^T V_i + \rho_i u_i u_i^T.$
For $i + 1 > \eta$:
$H_{i+1} = (V_i^T V_{i-1}^T \cdots V_{i-\eta+1}^T) H_0 (V_{i-\eta+1} \cdots V_{i-1} V_i) + (V_i^T \cdots V_{i-\eta+2}^T)\, \rho_{i-\eta+1} u_{i-\eta+1} u_{i-\eta+1}^T (V_{i-\eta+2} \cdots V_i) + \cdots + V_i^T V_{i-1}^T\, \rho_{i-2} u_{i-2} u_{i-2}^T V_{i-1} V_i + V_i^T\, \rho_{i-1} u_{i-1} u_{i-1}^T V_i + \rho_i u_i u_i^T.$
Returning to our problem, we adopt Nocedal's approach with a suitably chosen small $\eta$ to build an approximation to the product $H_{i-1} w_i$ in (10). For example, for $\eta = 1$, we have
$H_{i-1} w_i = V_{i-2}^T H_0 V_{i-2} w_i + \rho_{i-2}\,(u_{i-2}^T w_i)\, u_{i-2} = H_0 w_i - \rho_{i-2}(u_{i-2}^T w_i) H_0 w_{i-2} + \left[\rho_{i-2}^2 (w_{i-2}^T H_0 w_{i-2})(u_{i-2}^T w_i) - \rho_{i-2}(w_{i-2}^T H_0 w_i) + \rho_{i-2}(u_{i-2}^T w_i)\right] u_{i-2}.$
Similarly, to compute the product $H_i y_i$, the following may be used:
$H_i y_i = V_{i-1}^T H_0 V_{i-1} y_i + \rho_{i-1}\,(u_{i-1}^T y_i)\, u_{i-1} = H_0 y_i - \rho_{i-1}(u_{i-1}^T y_i) H_0 w_{i-1} + \left[\rho_{i-1}^2 (w_{i-1}^T H_0 w_{i-1})(u_{i-1}^T y_i) - \rho_{i-1}(w_{i-1}^T H_0 y_i) + \rho_{i-1}(u_{i-1}^T y_i)\right] u_{i-1}.$
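For $\eta = 1$, the compact form $V^T H_0 V + \rho\, u u^T$ can be applied to a vector directly, as the short sketch below illustrates for either product (pass $v = w_i$ with the pair $(u_{i-2}, w_{i-2})$, or $v = y_i$ with $(u_{i-1}, w_{i-1})$). The routine name and the diagonal representation of $H_0$ are assumptions made for illustration.
```python
import numpy as np

def limited_memory_product_eta1(H0_diag, u, w, v):
    """Approximate H v for eta = 1, using one stored pair (u, w) and a diagonal H_0."""
    rho = 1.0 / (u @ w)
    Vv = v - rho * (u @ v) * w            # V v, with V = I - rho * w u^T
    H0Vv = H0_diag * Vv                   # H_0 V v
    VtH0Vv = H0Vv - rho * (w @ H0Vv) * u  # V^T H_0 V v
    return VtH0Vv + rho * (u @ v) * u     # + rho (u^T v) u
```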
In general, the following is employed to build the above products for a given η and an appropriate replacement of the matrix/vectors involved (Algorithm 1):
Algorithm 1 MSCBFGS (limited memory computation of the matrix–vector products)
0. LET $k = i - 1$;
1. IF $k \le \eta$ THEN $incr = 0$; $bound = k$;
   ELSE $incr = k - \eta$; $bound = \eta$;
2. $q_{bound} = w_i$;
3. FOR $l = bound - 1, \ldots, 0$
   { $j = l + incr$;
    $\alpha_l = \rho_j\, u_j^T q_{l+1}$;
    $q_l = q_{l+1} - \alpha_l w_j$; }
4. $z_0 = H_0 q_0$;
5. FOR $l = 0, 1, \ldots, bound - 1$
   { $j = l + incr$;
    $\beta_j = \rho_j\, w_j^T z_l$;
    $z_{l+1} = z_l + u_j\,(\alpha_l - \beta_j)$; }
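For readers who prefer code, a direct NumPy transcription of Algorithm 1 follows; it mirrors the two-loop structure above with the pairs $(u_j, w_j)$ playing the roles of $(s_j, y_j)$. Keeping the full history in lists and representing $H_0$ by its diagonal are simplifications made for this sketch.
```python
import numpy as np

def mscbfgs_product(w_i, U, W, k, eta, H0_diag):
    """Algorithm 1: limited memory approximation of H_{i-1} w_i (with k = i - 1).

    U, W: lists holding the pairs u_0, ..., u_{k-1} and w_0, ..., w_{k-1}
          (only the last eta of them are actually touched).
    H0_diag: diagonal of the initial matrix H_0.
    """
    if k <= eta:
        incr, bound = 0, k
    else:
        incr, bound = k - eta, eta
    rho = {j: 1.0 / (U[j] @ W[j]) for j in range(incr, incr + bound)}

    q = w_i.copy()
    alpha = [0.0] * bound
    for l in reversed(range(bound)):       # step 3: backward pass
        j = l + incr
        alpha[l] = rho[j] * (U[j] @ q)
        q -= alpha[l] * W[j]
    z = H0_diag * q                        # step 4: z_0 = H_0 q_0
    for l in range(bound):                 # step 5: forward pass
        j = l + incr
        beta = rho[j] * (W[j] @ z)
        z += U[j] * (alpha[l] - beta)
    return z
```
Under the same reading, calling the routine with $k = i$ and $y_i$ in place of $w_i$ gives the companion product $H_i y_i$.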
Our multi-step limited memory BFGS (MSCBFGS) algorithm can be stated as follows:
Start with an approximate point $x_0$ to the solution.
Start with $H_0 = I$ (or a scalar multiple of it).
$i \leftarrow 0$.
Compute $g_0 = g(x_0)$.
Repeat
Step 1. Set $p_i = -H_i g_i$.
Step 2. Minimize $f(x_i + \alpha p_i)$ over $\alpha \in \mathbb{R}$ ($\alpha > 0$) to find a step length $\alpha_i$ along $p_i$. To make certain that the step yields adequate decrease, $\alpha_i$ is required to satisfy the strong Wolfe conditions [25]
$f(x_i + \alpha_i p_i) \le f(x_i) + \delta\, \alpha_i\, g_i^T p_i$
and
$\left| p_i^T g(x_i + \alpha_i p_i) \right| \le \sigma\, \left| p_i^T g_i \right|,$
where $0 < \delta < \sigma < 1$.
Step 3. $x_{i+1} = x_i + \alpha_i p_i$.
Step 4. Compute $s_i = x_{i+1} - x_i$, $y_i = g_{i+1} - g_i$, $w_i = y_i - \mu_{i-1} y_{i-1}$, and $u_i = s_i - \mu_{i-1} s_{i-1}$.
Step 5. If $w_i^T u_i > 0$, then compute the next search direction using the recurrence in (8) (with the limited memory approximations of Algorithm 1); otherwise, revert to the classical Secant choices $u_i = s_i$ and $w_i = y_i$.
Step 6. $i = i + 1$
until $\|g_i\|_2 < \varepsilon$ (where $\varepsilon \in \mathbb{R}$ is a convergence tolerance).
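A compact driver for the whole iteration is sketched below. It uses scipy.optimize.line_search, a Wolfe-type line search, as a stand-in for the strong Wolfe procedure of Step 2, applies the curvature safeguard of Step 5, and, for simplicity, computes the direction with a two-loop recursion on the stored $(u_j, w_j)$ pairs rather than with the recurrence (8) itself. It is an illustration of the overall flow under these assumptions, not the authors' implementation.
```python
import numpy as np
from scipy.optimize import line_search

def uw_two_loop(g, U, W):
    """Two-loop recursion on (u, w) pairs; returns H g for the implicit matrix H (H_0 = I)."""
    q = g.copy()
    rho = [1.0 / (u @ w) for u, w in zip(U, W)]
    alpha = [0.0] * len(U)
    for j in reversed(range(len(U))):
        alpha[j] = rho[j] * (U[j] @ q)
        q -= alpha[j] * W[j]
    z = q.copy()
    for j in range(len(U)):
        beta = rho[j] * (W[j] @ z)
        z += U[j] * (alpha[j] - beta)
    return z

def mscbfgs(f, grad, x0, eta=3, eps=1e-6, max_iter=500):
    """Simplified MSCBFGS driver: multi-step pairs plus a limited memory direction."""
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    U, W = [], []                          # stored (u_j, w_j) pairs
    s_prev = y_prev = None
    p = -g                                 # H_0 = I
    for _ in range(max_iter):
        if np.linalg.norm(g) < eps:        # convergence test
            break
        alpha = line_search(f, grad, x, p, gfk=g)[0] or 1e-4    # Step 2 (Wolfe-type)
        x_new = x + alpha * p                                    # Step 3
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g                              # Step 4
        if s_prev is not None:
            delta = np.linalg.norm(s) / np.linalg.norm(s_prev)
            mu = delta**2 / (1.0 + 2.0 * delta)
            u, w = s - mu * s_prev, y - mu * y_prev
        else:
            u, w = s, y
        if u @ w <= 1e-12:                 # Step 5 safeguard: revert to Secant data
            u, w = s, y
        if u @ w > 1e-12:                  # store only pairs with positive curvature
            U.append(u); W.append(w)
            U, W = U[-eta:], W[-eta:]      # keep only the last eta pairs
        p = -uw_two_loop(g_new, U, W)      # next search direction
        x, g, s_prev, y_prev = x_new, g_new, s, y
    return x
```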
We now need to establish that (8) produces a descent direction vector. To do so, it suffices to show that the matrix used in the computation of (8) is positive-definite.
Theorem 1.
The matrix $H_{i+1}$ produced by the update (7) is positive-definite if $H_i$ is positive-definite and $u_i^T w_i > 0$.
Proof of Theorem 1.
We first show that if $H_{i+1}$ is positive-definite, then $u_i^T w_i > 0$. From (5), it follows that
$w_i^T u_i = w_i^T H_{i+1} w_i > 0.$
Now, given that $w_i^T u_i > 0$ and $r_i = H_i w_i$ (from (7)), we prove that $H_{i+1}$ is positive-definite by showing that $v^T H_{i+1} v > 0$ for any vector $v \ne 0$. Using (7) and (11), we have
$v^T H_{i+1} v = v^T V_i^T H_i V_i v + \rho_i (u_i^T v)^2,$  (14)
where $\rho_i = (u_i^T w_i)^{-1} > 0$, $u_i = s_i - \mu_{i-1} s_{i-1}$, and $V_i = I - \rho_i w_i u_i^T$. Since $H_i$ is positive-definite, the first term satisfies $v^T V_i^T H_i V_i v \ge 0$, and the second term is non-negative. The two terms cannot vanish simultaneously: if $u_i^T v = 0$, then $V_i v = v \ne 0$ and the first term is strictly positive, while if $u_i^T v \ne 0$, the second term is strictly positive. Thus, the whole expression in (14) is positive. □
It should be noted here that the strong Wolfe conditions in Step 2 ensure only that $s_i^T y_i > 0$, while there is no guarantee that $w_i^T u_i > 0$. In our implementation, the condition $w_i^T u_i > 0$ is therefore tested and, if it is not satisfied, we revert to the classical Secant update at that specific iteration, with $u_i = s_i$ and $w_i = y_i$.
As the new algorithm preserves the main structure of the BFGS update, the derived method possesses the same convergence properties as other methods of this class, such as those in [10,26]. The R-linear convergence rate of the MSCBFGS method is proved under the assumption that the objective function is strongly convex, although the method can also converge R-linearly for non-convex objective functions. The convergence rate of the MSCBFGS method can be affected by the choice of the memory parameter η: a larger value of η can lead to a faster convergence rate, but it can also make the method more sensitive to noise. The proof of the convergence rate is very similar to that conducted in [26].
Estimating the time complexity of the described optimization algorithm involves several factors. The number of iterations (k), along with the computational cost of the algorithmic components (O(f)) and of the convergence tests (O(g)), collectively shapes the overall time complexity; a rough estimate is O(k × (f + g)), reflecting the iterative nature of the optimization, while a more precise analysis would require a detailed breakdown of the specific operations involved. In terms of the dominant operations, the per-iteration cost and storage requirement are O(ηn), where ηn is the size of the stored frame and n is the problem dimension, thus reducing the O(n²) computational and memory requirements of standard quasi-Newton methods.
The main feature of this algorithm is that it computes the search vector at each iteration using a formula that is almost the full memory multi-step BFGS version, with only an approximation applied to the term $H_{i-1} w_i$ in (10). This is expected to produce results close to those of the full memory multi-step BFGS version, as the results below reveal.

4. Numerical Results

The new method (7) is benchmarked against the methods in (3) and (4). The methods are tested on thirty distinct test problems with dimensions varying from 2 to 10,000, giving a total of 900 tested problem instances. The functions tested are classified into four categories and can be found in [7,13,27,28,29]. The categories are as follows:
a. Low dimension: 2 ≤ n ≤ 15
b. Medium dimension: 16 ≤ n ≤ 45
c. Moderate-high dimension: 46 ≤ n ≤ 80
d. High dimension: 81 ≤ n ≤ 10,000
Each of the test problems possesses one or more of the following properties:
(a) Non-convexity;
(b) Global minimum;
(c) Multi-modality;
(d) Ill-conditioned;
(e) Highly non-linear;
(f) Symmetry around the global minimum;
(g) Periodic;
(h) Discrete optimization domain.
The functions and their properties are listed in Table 1.
Table 2, Table 3, Table 4 and Table 5 report the results for each category, and Table 6 reports the totals. The figures presented in each table indicate the total number of iterations, function/gradient evaluations, timings, and points for every algorithm. A point is granted to a method if it obtains the minimal evaluation count in solving a problem, with ties resolved using the iteration count. The percentages in each table indicate the improvement (or otherwise) of each method on each evaluation criterion (iterations, evaluations, and time) relative to the benchmark method of Byrd et al., which is assigned the full 100%; a lower percentage indicates a saving relative to the benchmark. Our results favor the new method with η = 3; larger values of η did not introduce improvements in performance over the other methods substantial enough to justify the extra memory expense. Therefore, the reported test results for the new method are obtained with η = 3.
The results reveal clearly that the MSCBFGS method (run with η = 3) is superior to the other two, especially on large problems. Figure 1, Figure 2 and Figure 3 compare the results on each criterion, while Figure 4 presents an overall overview.
The analysis of the numerical scores for the newly developed limited memory MSCBFGS method in comparison to the Byrd et al. [13] and Zhu and Byrd [14] methods reveals notable improvements in efficiency and effectiveness. In terms of evaluations, the MSCBFGS method requires only 91.21% of the evaluations compared to the Byrd et al. [13] method, indicating enhanced computational efficiency. Additionally, it achieves a further reduction in iterations, requiring only 88.01% of the iterations compared to the original method, demonstrating improved convergence properties. Moreover, the MSCBFGS method significantly reduces computation time, utilizing only 87.65% of the time compared to the Byrd et al. [13] method, while outperforming both the original method and the Zhu and Byrd method in terms of overall effectiveness, as evidenced by its higher score of 369 compared to 232 and 299, respectively. Overall, these results suggest that the newly developed MSCBFGS method offers substantial improvements in efficiency and effectiveness, making it a promising approach for optimization problems.

5. Conclusions

In conclusion, this paper has presented a contribution to the field of optimization through the development of a novel quasi-Newton method known as memory-sensitive or limited BFGS (MSCBFGS). The method has been tailored specifically to address the challenges posed by memory constraints in solving complex large-scale problems. By allowing for flexibility in the choice of a storage variable that governs the retention of past data in the formation of the new Hessian approximation, MSCBFGS exhibits adaptability and versatility.
The outcomes of extensive numerical experiments have underscored the efficacy of the MSCBFGS algorithm. These findings not only affirm the algorithm’s potential but also pave the way for its practical application in a range of memory-limited optimization tasks.
One notable advantage of the proposed method lies in its adaptability to varying memory constraints, offering a customizable parameter for controlling the retention of past data. This flexibility enables the algorithm to effectively navigate memory-limited environments, enhancing its applicability to a wide range of optimization problems.

6. Discussion

By utilizing a modified version of the full memory multi-step BFGS update, the method derived in this paper maintains a balance between computational efficiency and accuracy in approximating the Hessian matrix. However, it is crucial to acknowledge certain limitations. One such drawback is the potential sensitivity of the algorithm’s performance to the selection of parameters, which may require careful tuning for optimal results. Additionally, while the method demonstrates improved effectiveness within constrained memory contexts, its performance in comparison to alternative approaches warrants further investigation, particularly in scenarios with highly complex or nonlinear objective functions. Overall, the presented method offers promising advancements in addressing memory limitations in quasi-Newton optimization methods, yet ongoing research is essential to fully understand its capabilities and limitations in diverse optimization scenarios.
Future research can investigate more deeply the optimization of the memory parameter selection process within MSCBFGS. Exploring adaptive or learning-based techniques to dynamically adjust this parameter during optimization could enhance the algorithm's performance further, especially when taking the nature of the problem being solved into account [30].

Author Contributions

Conceptualization, I.A.R.M.; methodology, I.A.R.M.; software, B.A.H.; validation, B.A.H. and I.A.R.M.; formal analysis, B.A.H. and I.A.R.M.; investigation, I.A.R.M.; writing—original draft preparation, B.A.H. and I.A.R.M.; writing—review and editing, I.A.R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Kuwait Technical College, Kuwait.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Davidon, W.C. Variable metric methods for minimization. Argonne Natl. Labs. Rep. 1971, 2, 1–11. [Google Scholar]
  2. Powell, M.J.D. Some Global Convergence Properties of a variable Metric Algorithm for Minimization without Exact Line Searches. In Nonlinear Programming, SIAM-AMS Proceedings; Cottle, R.W., Lemke, C.E., Eds.; American Mathematical Society: Providence, RI, USA, 1976; Volume 4, pp. 53–72. [Google Scholar]
  3. Broyden, C.G. The convergence of a class of double-rank minimization algorithms—Part 2: The new algorithm. J. Inst. Math. Appl. 1970, 6, 222–231. [Google Scholar] [CrossRef]
  4. Dennis, J.E.; Schnabel, R.B. Least change Secant updates for quasi-Newton methods. SIAM Rev. 1979, 21, 443–459. [Google Scholar] [CrossRef]
  5. Fletcher, R. A new approach to variable metric algorithms. Comput. J. 1970, 13, 317–322. [Google Scholar] [CrossRef]
  6. Fletcher, R. An Overview of Unconstrained Optimization. In Mathematical and Physical Sciences; Springer: Dordrecht, The Netherlands, 1994; Volume 434. [Google Scholar]
  7. Fletcher, R. Practical Methods of Optimization; Wiley: Hoboken, NJ, USA, 1987. [Google Scholar]
  8. Oren, S.S.; Luenberger, D. Self-scaling variable metric (SSVM) algorithms V1. Manag. Sci. 1974, 20, 845–862. [Google Scholar] [CrossRef]
  9. Liu, D.C.; Nocedal, J. On the limited memory BFGS method for large scale optimization. Math. Program. 1989, 45, 503–528. [Google Scholar] [CrossRef]
  10. Nocedal, J. Updating Quasi-Newton Matrices with Limited Storage. Math. Comput. 1980, 35, 773–782. [Google Scholar] [CrossRef]
  11. Yuan, G.; Wei, Z.; Wu, Y. Modified limited memory BFGS method with non-monotone line search for unconstrained optimization. J. Korean Math. Soc. 2010, 47, 767–788. [Google Scholar] [CrossRef]
  12. Zhang, J.; Deng, N.; Chen, L. Quasi-Newton equation and related methods for unconstrained optimization. J. Optim. Theory Appl. 1999, 102, 147–167. [Google Scholar] [CrossRef]
  13. Byrd, R.H.; Nocedal, J.; Schnabel, R.B. Representations of quasi-Newton matrices and their use in large scale optimization. SIAM J. Optim. 1994, 4, 677–696. [Google Scholar]
  14. Zhu, C.; Byrd, R.H.; Nocedal, J. Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization. ACM Trans. Math. Softw. 1997, 23, 550–560. [Google Scholar] [CrossRef]
  15. Biggs, M.C. Minimization algorithms making use of non-quadratic properties of the objective function. J. Inst. Math. Its Appl. 1971, 8, 315–327. [Google Scholar] [CrossRef]
  16. Kanzow, C.; Steck, D. Regularization of limited memory quasi-Newton methods for large-scale nonconvex minimization. Math. Program. Comput. 2023, 15, 417–444. [Google Scholar] [CrossRef]
  17. Lapucci, M.; Mansueto, P. A limited memory Quasi-Newton approach for multi-objective optimization. Comput. Optim. Appl. 2023, 85, 33–73. [Google Scholar] [CrossRef]
  18. Wei, Z.; Li, G.; Qi, L. The superlinear convergence of a modified BFGS- type method for unconstrained optimization. Comput. Optim. Appl. 2004, 29, 315–332. [Google Scholar] [CrossRef]
  19. Xiao, Y.H.; Wei, Z.X.; Zhang, L. A modified BFGS method without line searches for nonconvex unconstrained optimization. Adv. Theor. Appl. Math. 2006, 1, 149–162. [Google Scholar]
  20. Lai, K.K.; Mishra, S.K.; Sharma, R.; Sharma, M.; Ram, B. A Modified q-BFGS Algorithm for Unconstrained Optimization. Mathematics 2023, 11, 1420–1431. [Google Scholar]
  21. Lai, K.K.; Mishra, S.K.; Panda, G.; Chakraborty, S.K.; Samei, M.E.; Ram, B. A limited memory q-BFGS algorithm for unconstrained optimization problems. J. Appl. Math. Comput. 2021, 66, 183–202. [Google Scholar] [CrossRef]
  22. Ford, J.A.; Moghrabi, I.A.R. Multi-step quasi-Newton methods for optimization. J. Comput. Appl. Math. 1994, 50, 305–323. [Google Scholar] [CrossRef]
  23. Moghrabi, I.A.R. A non-Secant quasi-Newton Method for Unconstrained Nonlinear Optimization. Cogent Eng. 2022, 9, 11–25. [Google Scholar] [CrossRef]
  24. Ford, J.A.; Moghrabi, I.A. Alternative Parameter Choices for Multi-Step Quasi-Newton Methods. Optim. Methods Softw. 1993, 2, 357–370. [Google Scholar] [CrossRef]
  25. Wolfe, P. Convergence conditions for ascent methods. II: Some corrections. SIAM Rev. 1971, 3, 185–188. [Google Scholar] [CrossRef]
  26. Xiao, Y.; Wei, Z.; Wang, Z. A limited memory BFGS-type method for large-scale unconstrained optimization. Comput. Math. Appl. 2008, 56, 1001–1009. [Google Scholar] [CrossRef]
  27. Andrei, N. An Unconstrained Optimization Test Functions Collection. Adv. Model. Optim. 2008, 10, 147–161. [Google Scholar]
  28. Moré, J.J.; Garbow, B.S.; Hillstrom, K.E. Testing unconstrained optimization software. ACM Trans. Math. Softw. 1981, 7, 17–41. [Google Scholar] [CrossRef]
  29. Yuan, Y.; Sun, W. Optimization Theory and Methods; Science Press: Beijing, China, 1999. [Google Scholar]
  30. Jin, Q.; Mokhtari, A. Non-asymptotic superlinear convergence of standard quasi-Newton methods. J. Strateg. Decis. 2021, 121, 11–28. [Google Scholar]
Figure 1. Overall evaluations for 900 problems.
Figure 2. Overall iterations for 900 problems.
Figure 3. Overall timings for 900 problems.
Figure 4. Performance overview of all criteria.
Table 1. Set of test problems.
Function Name (Dimension) | Properties
Rosenbrock (n = 2) | a, b, c, d, f
Quadratic function (n = 2) | b
Watson function (3 ≤ n ≤ 31) | c, e
Extended Rosenbrock (2 ≤ n ≤ 10,000, n even) | a, c, d
Extended Powell (2 ≤ n ≤ 10,000, n divisible by 4) | a, c
Penalty function I (2 ≤ n ≤ 10,000) | a, c, e
Variably dimensioned function (2 ≤ n ≤ 10,000) | a, c
Trigonometric function (2 ≤ n ≤ 10,000) | a, c, g
Modified Trigonometric function (2 ≤ n ≤ 10,000) | a, c, g
Broyden Tridiagonal function (2 ≤ n ≤ 10,000) | a, c
Discrete Boundary Value function (2 ≤ n ≤ 10,000) | a, c, h
Oren and Spedicato Power function (2 ≤ n ≤ 10,000) | a, c, e
Full Set of Distinct Eigenvalues Problem (2 ≤ n ≤ 10,000) | a (if all eigenvalues are negative)
Tridiagonal function (2 ≤ n ≤ 10,000) | a, c, f
Wolfe function (2 ≤ n ≤ 10,000) | a, c, e
Diagonal Rosenbrock's function (2 ≤ n ≤ 10,000, n even) | c, e
Generalized Shallow function (2 ≤ n ≤ 10,000, n even) | c, e
Freudenstein and Roth (n = 2) | a, c, e
Powell Badly Scaled (n = 2) | a, d, e
Brown Badly Scaled (n = 2) | a, d, e
Beale (n = 2) | a, b, c
Bard (n = 3) | a, b, c
Freudenstein and Roth (n = 2) | a, b, c
Powell Badly Scaled (n = 2) | a, b, c
Brown Badly Scaled (n = 2) | a, b, c
Beale (n = 2) | a, b, c
Bard (n = 3) | a, b, c
Gaussian (n = 3) | b
Table 2. Large problems test results.
Method | Evaluations | Iterations | Time (s) | Scores
Byrd et al. [13] | 25,039 (100%) | 22,106 (100%) | 426.42 (100%) | 32
Zhu and Byrd [14] | 22,699 (90.66%) | 19,838 (89.7%) | 392.08 (91.95%) | 51
MSCBFGS | 21,609 (86.30%) | 18,461 (83.5%) | 319.95 (75.03%) | 71
Table 3. Moderate high dimension problems test results.
Method | Evaluations | Iterations | Time (s) | Scores
Byrd et al. [13] | 12,841 (100%) | 9673 (100%) | 3348.5 (100%) | 90
Zhu and Byrd [14] | 12,343 (96.12%) | 8864 (91.64%) | 3075.9 (91.8%) | 129
MSCBFGS | 11,654 (90.76%) | 8717 (90.12%) | 2980.4 (89.1%) | 162
Table 4. Medium-size dimension problems test results.
Method | Evaluations | Iterations | Time (s) | Scores
Byrd et al. [13] | 11,745 (100%) | 11,061 (100%) | 18,645.71 (100%) | 66
Zhu and Byrd [14] | 11,603 (98.79%) | 11,132 (100.64%) | 17,422.51 (93.44%) | 78
MSCBFGS | 10,869 (92.54%) | 9850 (89.06%) | 13,916.57 (74.64%) | 110
Table 5. Small dimension problems test results.
Method | Evaluations | Iterations | Time (s) | Scores
Byrd et al. [13] | 9597 (100%) | 5049 (100%) | 10,744.0 (100%) | 44
Zhu and Byrd [14] | 9443 (98.39%) | 4953 (98.11%) | 10,503.0 (97.76%) | 41
MSCBFGS | 9754 (101.64%) | 5180 (102.61%) | 10,779.4 (100.3%) | 26
Table 6. Overall scores for the test results for 900 problems.
Method | Evaluations | Iterations | Time (s) | Scores
Byrd et al. [13] | 59,079 (100%) | 47,960 (100%) | 31,942 (100%) | 232
Zhu and Byrd [14] | 56,087 (94.94%) | 44,787 (93.38%) | 31,394 (98.28%) | 299
MSCBFGS | 53,887 (91.21%) | 42,209 (88.01%) | 27,996 (87.65%) | 369
