Abstract
The adaptive cubic regularization method solves an unconstrained optimization problem by using a cubic (third-order) regularization term to approximate the objective function at each iteration. As in the trust-region method, the solution of the sub-problem strongly affects the overall computational efficiency. The Lanczos method is a useful tool for simplifying the objective function in the sub-problem. In this paper, we implement the adaptive cubic regularization method with the aid of the Lanczos method and analyze the error of the Lanczos approximation. We show that both the error between the Lanczos objective function and the original cubic term, and the error between the solution of the Lanczos approximation and the solution of the original cubic sub-problem, are bounded by the condition number of the optimal Hessian matrix. Furthermore, we compare the numerical performance of the adaptive cubic regularization algorithm with and without the Lanczos approximation on unconstrained optimization problems. Numerical experiments show that the Lanczos method remarkably improves the computational efficiency of the adaptive cubic regularization method.
1. Introduction
For the unconstrained optimization problem
$$\min_{\mathbf{x}\in\mathbb{R}^n} f(\mathbf{x}),$$
Cartis et al. [1] proposed an adaptive cubic regularization (ACR) algorithm. It is an alternative to classical globalization techniques: it uses a cubic over-estimator of the objective function as a regularization technique, and an adaptive parameter in place of the Lipschitz constant in the cubic Taylor-series model. At each iteration, the objective function is approximated by a cubic function. Numerical experiments in [1] show that the ACR is comparable with the trust-region method for small-scale problems. Although the method has been shown to have strong local and global convergence properties, its practicality and efficiency depend critically on how efficiently the sub-problem is solved at each iteration.
For solving the trust-region sub-problem, many efficient algorithms have been proposed. These algorithms fall into three broad categories: accurate methods for dense problems, accurate methods for large sparse problems, and approximation methods for large-scale problems. The first category consists of accurate methods for dense problems, such as the classical algorithm of Moré and Sorensen [2], which uses Newton's method to iteratively solve symmetric positive definite linear systems via the Cholesky factorization. The second category consists of accurate methods for large sparse problems. For instance, the Lanczos method was employed to solve the large-scale trust-region sub-problem through a parameterized eigenvalue problem [3,4]. Another accurate approach [5] is based on a parametric eigenvalue problem within a semi-definite framework and employs the Lanczos method for the smallest eigenvalue as a black box. Hager [6] and Erway et al. [7] developed accurate methods based on subspace projection. The third category consists of approximation methods for large-scale problems. The generalized Lanczos trust-region method (GLTR) [8,9] was proposed as an improvement of the Steihaug [10]-Toint [11] conjugate-gradient method. For the GLTR method, Zhang et al. established prior upper bounds [12] and posterior error bounds [13] on the differences between the optimal objective value and the optimal solution of the original trust-region sub-problem and those of its projected counterpart.
For solving cubic model sub-problems, many algorithms are extensions of trust-region algorithms. Cartis et al. [1] applied Newton's method to the sub-problem of ACR, employing a Cholesky factorization at each iteration; this approach usually applies to small-scale problems. Moreover, Cartis et al. briefly described the use of the Lanczos method for the ACR sub-problem in [1]. Carmon and Duchi [14] used gradient descent to approximate the cubic-regularized Newton step and gave its convergence rate; however, the convergence rate of gradient descent is worse than that of the Krylov subspace method. Birgin et al. [15] proposed a Newton-like method for unconstrained optimization, whose sub-problem is similar to, but different from, that of ACR. They introduced a mixed factorization that is cheaper than the Cholesky factorization. Brás et al. [16] used the Lanczos method to efficiently solve the sub-problems associated with a special type of cubic model, and also embedded the Lanczos method in a large-scale trust-region strategy. Furthermore, an accelerated first-order method for the ACR sub-problem was developed by Jiang et al. [17].
In this paper, we employ the Lanczos method to solve the sub-problem of the adaptive cubic regularization method (ACRL) for large-scale problems. The ACRL algorithm mainly consists of the following three steps. First, the ACRL generates the jth Krylov subspace using the Lanczos method. Next, we project the original sub-problem onto the jth Krylov subspace to obtain a smaller sub-problem. Finally, we solve the resulting smaller sub-problem to obtain an approximate solution. This procedure is based on the minimization of the local model of the objective function over a sequence of small subspaces; as a result, the ACRL is applicable to large-scale problems. Moreover, we analyze the error of the Lanczos approximation. For unconstrained optimization problems, we perform numerical experiments and compare our method with the variant that does not use the Lanczos approximation (ACRN).
The outline of this paper is as follows. In Section 2, we introduce the adaptive cubic regularization method and its optimality condition. The method using the Lanczos algorithm to solve the ACR sub-problem is introduced in Section 3. In Section 4, we show the error bounds of the approximate solution and approximate objective value obtained using the ACRL method. Numerical experiments demonstrating the efficiency of the algorithm are given in Section 5. Finally, we give some concluding remarks in Section 6.
2. Preliminaries
Throughout the paper, a matrix is represented by a capital letter, while a lower case bold letter is used for a vector and a lower case letter for a scalar.
The adaptive cubic regularization method [1,18] was proposed by Cartis et al. for unconstrained optimization problems. It mainly uses a cubic over-estimator of the objective function as a regularization technique to calculate the step at each iteration. Assume that $\mathbf{x}_k$ is the current iteration point, that the objective function $f$ is twice continuously differentiable, and that its Hessian matrix is globally Lipschitz continuous. For any $\mathbf{s}\in\mathbb{R}^n$, the Taylor expansion of $f$ at the point $\mathbf{x}_k$ gives
$$f(\mathbf{x}_k + \mathbf{s}) \le f(\mathbf{x}_k) + \mathbf{s}^{\top}\mathbf{g}_k + \frac{1}{2}\mathbf{s}^{\top} H_k \mathbf{s} + \frac{L}{6}\|\mathbf{s}\|^3, \quad (1)$$
where $\mathbf{g}_k = \nabla f(\mathbf{x}_k)$, $H_k = \nabla^2 f(\mathbf{x}_k)$, and $L$ is the Lipschitz constant. Here, and for the remainder of this paper, $\|\cdot\|$ denotes the $\ell_2$ norm. The inequality is obtained by using the Lipschitz property of the Hessian. In [1], Cartis et al. proposed replacing the constant $L$ in Equation (1) with a dynamic positive parameter $\sigma_k$. In the cubic regularization model, the Hessian need not be globally or locally Lipschitz continuous in general. Furthermore, an approximation of $H_k$ by a symmetric matrix $B_k$ is employed at each iteration. Therefore, the model
$$m_k(\mathbf{s}) = f(\mathbf{x}_k) + \mathbf{s}^{\top}\mathbf{g}_k + \frac{1}{2}\mathbf{s}^{\top} B_k \mathbf{s} + \frac{\sigma_k}{3}\|\mathbf{s}\|^3 \quad (2)$$
is used to estimate $f(\mathbf{x}_k + \mathbf{s})$ at each iteration. The adaptive cubic regularization sub-problem then aims to compute a descent direction $\mathbf{s}_k$, and is given in the form
$$\mathbf{s}_k = \arg\min_{\mathbf{s}\in\mathbb{R}^n} m_k(\mathbf{s}), \quad (3)$$
in which $m_k(\mathbf{s})$ is short for (2).
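For concreteness, the model (2) can be evaluated with a single matrix-vector product; below is a minimal NumPy sketch (the argument names f_k, g_k, B_k, and sigma_k are our own, not from the original paper):

```python
import numpy as np

def cubic_model(s, f_k, g_k, B_k, sigma_k):
    """Evaluate m_k(s) in (2): f_k + s'g_k + 0.5*s'B_k*s + (sigma_k/3)*||s||^3."""
    return (f_k + s @ g_k + 0.5 * s @ (B_k @ s)
            + sigma_k / 3.0 * np.linalg.norm(s) ** 3)
```

This evaluation is all the outer ACR loop needs in order to accept or reject a trial step and to update $\sigma_k$.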
Cartis et al. introduced the following global optimality result of ACR, which is similar to the optimality conditions of the trust-region method.
Theorem 1
([1], Theorem 3.1). The vector $\mathbf{s}_k^*$ is a global minimizer of the sub-problem (3) if and only if there is a scalar $\lambda^* \ge 0$ satisfying the following system of equations:
$$(B_k + \lambda^* I)\mathbf{s}_k^* = -\mathbf{g}_k, \qquad \lambda^* = \sigma_k\|\mathbf{s}_k^*\|, \quad (4)$$
where $I$ is the identity matrix, and $B_k + \lambda^* I$ is a positive semi-definite matrix. If $B_k + \lambda^* I$ is positive definite, then $\mathbf{s}_k^*$ is unique.
The optimality condition of the trust-region sub-problem [19] concerns minimizing the quadratic model $f(\mathbf{x}_k) + \mathbf{s}^{\top}\mathbf{g}_k + \frac{1}{2}\mathbf{s}^{\top}B_k\mathbf{s}$ within an $\ell_2$-norm trust region $\|\mathbf{s}\| \le \Delta$, where $\Delta$ is the trust-region radius. For a trust-region sub-problem, the optimal vector satisfies $\lambda^*(\Delta - \|\mathbf{s}^*\|) = 0$, which means that either $\lambda^* = 0$ or $\|\mathbf{s}^*\| = \Delta$. When both the trust-region sub-problem and the cubic regularization sub-problem approximate the original objective function precisely enough, we get $\lambda^* = \sigma_k\|\mathbf{s}^*\|$ from Theorem 1. Therefore, the parameter $\sigma_k$ in the ACR algorithm is inversely proportional to the trust-region radius, and it plays the same role as the trust-region radius when we adjust the estimation accuracy of the sub-problem.
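To make the analogy explicit, the two first-order systems can be written side by side (the trust-region conditions are the standard ones from [19]; the ACR conditions are those of Theorem 1):
$$(B_k + \lambda^* I)\mathbf{s}^* = -\mathbf{g}_k \ \text{ with } \ \lambda^*(\Delta - \|\mathbf{s}^*\|) = 0 \ \text{ (trust region)} \quad \text{or} \quad \lambda^* = \sigma_k\|\mathbf{s}^*\| \ \text{ (ACR)}.$$
When the trust-region constraint is active, $\|\mathbf{s}^*\| = \Delta$, and matching the multipliers gives $\sigma_k = \lambda^*/\Delta$, which is exactly the inverse proportionality stated above.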
3. Computation of the ACR Sub-Problem with the Lanczos Method
The Lanczos algorithm [20] was proposed to solve sparse linear systems and to find the eigenvalues of sparse matrices. It builds up an orthogonal basis $\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_j\}$ for the Krylov space
$$\mathcal{K}_j(B, \mathbf{g}) = \mathrm{span}\{\mathbf{g}, B\mathbf{g}, B^2\mathbf{g}, \ldots, B^{j-1}\mathbf{g}\}.$$
By utilizing the orthogonal basis, the original symmetric matrix $B$ is transformed into a tridiagonal matrix.
Normally, the dimension of $\mathcal{K}_j$ increases by 1 as $j$ increases by 1. However, the Lanczos process may break down, and the dimension of $\mathcal{K}_j$ stops increasing at a certain $j$; we define $j_{\max}$ as the smallest nonnegative integer such that the Lanczos process breaks down. If the dimension of the Krylov space is much smaller than the size of the matrix, projecting $B$ onto the subspace greatly saves storage space and markedly improves the calculation speed. Specifically, we find a proper orthogonal matrix $Q_j$ using the Lanczos method, such that $Q_j^{\top} B Q_j$ is tridiagonal. We state the procedure in the following algorithm.
Algorithm 1 computes an orthogonal matrix $Q_j = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_j]$, where
$$T_j = Q_j^{\top} B Q_j = \begin{pmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_j \\ & & \beta_j & \alpha_j \end{pmatrix}$$
is tridiagonal. Moreover, it follows directly from Algorithm 1 that
$$Q_j^{\top}\mathbf{g} = \|\mathbf{g}\|\,\mathbf{e}_1, \quad (5)$$
where $\mathbf{e}_1$ is the first unit vector of length $j$.
Algorithm 1 Lanczos algorithm
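Algorithm 1 is the standard Lanczos three-term recurrence; the following is a minimal NumPy sketch consistent with the description above (the function name lanczos and the breakdown tolerance tol are our own illustrative choices):

```python
import numpy as np

def lanczos(B, g, j_max, tol=1e-10):
    """Lanczos tridiagonalization of the symmetric matrix B, started from g,
    so that Q.T @ B @ Q is tridiagonal with diagonal alpha and off-diagonal
    beta, and Q.T @ g = ||g|| * e_1 as in (5)."""
    n = g.shape[0]
    Q = np.zeros((n, j_max))
    alpha = np.zeros(j_max)
    beta = np.zeros(j_max)              # beta[i] couples columns i and i+1
    q, q_prev, b = g / np.linalg.norm(g), np.zeros(n), 0.0
    for i in range(j_max):
        Q[:, i] = q
        w = B @ q - b * q_prev          # three-term recurrence
        alpha[i] = q @ w
        w = w - alpha[i] * q
        b = np.linalg.norm(w)
        if b < tol:                     # breakdown: the Krylov space is invariant
            return Q[:, :i + 1], alpha[:i + 1], beta[:i]
        beta[i] = b
        q_prev, q = q, w / b
    return Q, alpha, beta[:j_max - 1]
```

Note that $B$ is only accessed through matrix-vector products, which is what makes the method attractive for large sparse problems.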
For a large-scale trust-region sub-problem, an effective approach is to solve it approximately using Krylov subspace methods. The Lanczos algorithm, as one of the Krylov subspace methods, was first introduced in [8] for the trust-region method. Similar to the trust-region method, the Lanczos algorithm is also suitable for solving the cubic regularization sub-problem. By employing Algorithm 1 with $B = B_k$ and $\mathbf{g} = \mathbf{g}_k$, we find
$$\mathbf{s}_j = \arg\min_{\mathbf{s}\in\mathcal{K}_j(B_k,\,\mathbf{g}_k)} m_k(\mathbf{s}) = Q_j \mathbf{u}_j, \quad (6)$$
where $m_k$ is the model minimized in (3). Since $\|Q_j\mathbf{u}\| = \|\mathbf{u}\|$ and, by (5), $Q_j^{\top}\mathbf{g}_k = \|\mathbf{g}_k\|\mathbf{e}_1$, the original sub-problem (3) is transformed into the following sub-problem
$$\mathbf{u}_j = \arg\min_{\mathbf{u}\in\mathbb{R}^j}\; f(\mathbf{x}_k) + \|\mathbf{g}_k\|\,\mathbf{u}^{\top}\mathbf{e}_1 + \frac{1}{2}\mathbf{u}^{\top} T_j \mathbf{u} + \frac{\sigma_k}{3}\|\mathbf{u}\|^3. \quad (7)$$
Theorem 1 illustrates that $\mathbf{u}_j$ is a global minimizer of the above sub-problem if and only if the pair $(\mathbf{u}_j, \lambda_j)$ satisfies
$$(T_j + \lambda_j I)\mathbf{u}_j = -\|\mathbf{g}_k\|\,\mathbf{e}_1, \qquad \lambda_j = \sigma_k\|\mathbf{u}_j\|, \quad (8)$$
where $T_j + \lambda_j I$ is positive semi-definite. Equation (8) can finally be solved by Newton's method ([1], Algorithm 6.1). Newton's method for solving the full sub-problem requires a factorization (such as an eigenvalue decomposition) of $B_k + \lambda I$ for various values of $\lambda$. When the scale of the original problem is large, it is very expensive to use this iterative method directly; by contrast, (8) involves only a small tridiagonal matrix.
In summary, an approximation to the solution of the ACR sub-problem (3) can be obtained in the following steps. First, we apply $j$ steps of the Lanczos method to the cubic function appearing in (3) to obtain a tridiagonal matrix $T_j$. Then, we use Newton's method on the small-size sub-problem (7) with matrix $T_j$ to compute the Lagrange multiplier $\lambda_j$ and the minimizer $\mathbf{u}_j$. Finally, the matrix $Q_j$ is used to recover $\mathbf{s}_j = Q_j\mathbf{u}_j$; thus, it should be noted that the Lanczos vectors need to be saved. We sketch the algorithm as follows, with a code sketch of the small solve given after this paragraph.
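To illustrate the small solve, here is a minimal NumPy sketch of the optimality system (8) for the tridiagonal $T_j$. The helper name solve_projected_subproblem, the starting guess, and the eigendecomposition-based Newton iteration on $\varphi(\lambda) = \sigma\|\mathbf{u}(\lambda)\| - \lambda$ are our own choices; [1], Algorithm 6.1, works with Cholesky factorizations instead.

```python
import numpy as np

def solve_projected_subproblem(alpha, beta, gnorm, sigma, tol=1e-12, max_iter=100):
    """Solve (8): (T + lam*I) u = -gnorm * e1 with lam = sigma * ||u||,
    where T is the small tridiagonal matrix with diagonal alpha and
    off-diagonal beta produced by the Lanczos process.
    Nondegenerate case assumed (cf. Section 4)."""
    j = alpha.shape[0]
    T = np.diag(alpha)
    if j > 1:
        T += np.diag(beta, 1) + np.diag(beta, -1)
    evals, V = np.linalg.eigh(T)              # T = V diag(evals) V^T
    c = -gnorm * V[0, :]                      # -gnorm*e1 in the eigenbasis
    lam_lb = max(0.0, -evals[0])              # lam must keep T + lam*I PSD
    lam = lam_lb + 1e-8
    for _ in range(max_iter):
        w = c / (evals + lam)                 # u(lam) in the eigenbasis
        norm_u = np.linalg.norm(w)
        phi = sigma * norm_u - lam            # residual of lam = sigma*||u||
        if abs(phi) <= tol * max(1.0, lam):
            break
        dnorm = -np.sum(w * w / (evals + lam)) / norm_u   # d||u||/dlam
        step = phi / (sigma * dnorm - 1.0)    # Newton step on phi
        lam = max(lam - step, 0.5 * (lam_lb + lam))       # safeguarded update
    u = V @ (c / (evals + lam))
    return u, lam
```

Since $\varphi$ is strictly decreasing to the right of $-\lambda_{\min}(T_j)$, the safeguarded Newton iteration converges to the unique root in the nondegenerate case.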
For the GLTR algorithm, a restarting strategy was discussed in ([8], Theorem 5.8) for the degenerate case, in which multiple global solutions exist. Similar to the GLTR, a restarting strategy also applies to the ACRL, although we discuss it only from a theoretical perspective. Therefore, we mainly consider the nondegenerate case in the following analysis.
4. Convergence Analysis
Theorem 1 shows that we aim to seek a pair $(\mathbf{s}^*, \lambda^*)$ satisfying
$$(B_k + \lambda^* I)\mathbf{s}^* = -\mathbf{g}_k, \qquad \lambda^* = \sigma_k\|\mathbf{s}^*\|. \quad (9)$$
Then, we have $\mathbf{s}^* = -(B_k + \lambda^* I)^{-1}\mathbf{g}_k$ in the nondegenerate case. In this section, we analyze the error between the optimal objective function value of the original sub-problem and the optimal objective function value of the sub-problem in the subspace generated by Algorithm 2, as well as the distance between $\mathbf{s}_j$ and $\mathbf{s}^*$, under the assumption that Equation (9) is satisfied.
We set
$$H_* = B_k + \lambda^* I, \quad (10)$$
which is positive definite in the nondegenerate case. The spectral condition number of $H_*$ is
$$\kappa = \frac{\lambda_1}{\lambda_n}, \quad (11)$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$ are the eigenvalues of $H_*$. We denote by $m_k(\mathbf{s}^*)$ the optimal value of the model $m_k$ defined in (3).
Next, for the vector $\mathbf{s}_j$ defined in (6), we analyze the errors $m_k(\mathbf{s}_j) - m_k(\mathbf{s}^*)$ and $\|\mathbf{s}_j - \mathbf{s}^*\|$.
Algorithm 2 The ACRL method
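The Algorithm 2 box above summarizes one ACRL sub-problem solve; a compact sketch reusing the two helpers from Section 3 is given below. The names and the fixed subspace size j_max are illustrative; the actual algorithm grows the subspace until the sub-problem stopping criterion of Section 5 is met.

```python
import numpy as np

def acrl_step(B, g, sigma, j_max=50):
    """One ACRL sub-problem solve: project (3) onto the Krylov subspace
    built by lanczos(), solve the tridiagonal system (8), recover s = Q u."""
    Q, alpha, beta = lanczos(B, g, j_max)
    u, lam = solve_projected_subproblem(alpha, beta, np.linalg.norm(g), sigma)
    return Q @ u, lam        # approximate step s_j and multiplier lambda_j
```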
Theorem 2.
Suppose (3) is nondegenerate, and let $\mathbf{s}_j$ be the jth approximation of $\mathbf{s}^*$ generated by the ACRL, satisfying (8). Then, for any nonzero $\mathbf{g}_k$, the objective error $m_k(\mathbf{s}_j) - m_k(\mathbf{s}^*)$ obeys the bound (13),
and
the distance $\|\mathbf{s}_j - \mathbf{s}^*\|$ obeys the bound (14); both bounds are controlled by the condition number $\kappa$ defined in (11).
Proof.
It can be seen that . Then, we obtain
Let , where
Based on (16), we obtain
We immediately have
where the last equality follows from (15).
Furthermore, for any ,
Therefore, we have
From (17), we then get that the equality (19) holds. The conclusion in (13) follows from the above analysis.
The inequality (14) holds. □
5. Numerical Experiments
In order to show the efficiency of the Lanczos method in improving the adaptive cubic regularization algorithm, we perform the following two numerical experiments. In this section, we compare the numerical performance of the adaptive cubic regularization algorithm using the Lanczos approximation (ACRL) with that of the adaptive cubic regularization algorithm using only Newton's method (ACRN) on unconstrained optimization problems.
The ACRL and ACRN algorithms are implemented with the following parameters
Convergence in both algorithms for the sub-problem occurs as soon as
or once more than the maximum number of iterations, which we set to 2000, has been performed. All numerical experiments in this paper were performed on a laptop with an i5-10210U CPU at 1.60 GHz and 16.0 GB of RAM.
Example 1
(Generalized Rosenbrock function [21]). The Generalized Rosenbrock function is a non-convex function, introduced by Howard H. Rosenbrock in 1960 in its two-dimensional form, whose generalized form is defined as follows:
$$f(\mathbf{x}) = \sum_{i=1}^{n-1}\left[100\left(x_{i+1} - x_i^2\right)^2 + \left(1 - x_i\right)^2\right]. \quad (23)$$
From (23), the solution is obviously $\mathbf{x}^* = (1, 1, \ldots, 1)^{\top}$, and the minimum is $f(\mathbf{x}^*) = 0$.
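For reference, a NumPy implementation of (23) and its gradient (the helper names are our own); note that the Hessian of (23) is tridiagonal, so each product B @ q inside the Lanczos iteration costs only O(n):

```python
import numpy as np

def rosenbrock(x):
    """Generalized Rosenbrock function (23)."""
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rosenbrock_grad(x):
    """Gradient of (23), assembled from the two coupled terms."""
    g = np.zeros_like(x)
    g[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return g
```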
In Table 1, we show the results of the ACRL and the ACRN for computing the minima of the Generalized Rosenbrock function, with the number of variables ranging from 10 to 2000. In addition to the dimension of the Generalized Rosenbrock function, we report the number of iterations ("Iter."), the total CPU time required in seconds, and the relative error between the computed result and the exact minimum ("Err."). It can be seen that using the Lanczos method to solve the adaptive cubic regularization sub-problem of the Generalized Rosenbrock function is much more efficient than not using it. Moreover, the computation is not only faster but also more accurate, especially when the scale is relatively large.
Table 1.
Results for computing the minima of the Generalized Rosenbrock function.
Example 2
(Eigenvalues of tensors arising from hypergraphs). Next, we consider the problem of computing extreme eigenvalues of sparse tensors arising from a hypergraph. An adaptive cubic regularization method on a Stiefel manifold, named ACRCET, was proposed to solve for the eigenvalues of tensors [22]. We compare the numerical performance of the ACRL and the ACRN when applied to the sub-problem of ACRCET. Before proceeding to the experiments, we first introduce the concepts of tensor eigenvalues and hypergraphs.
A real mth order n-dimensional tensor $\mathcal{A} = (a_{i_1 i_2 \cdots i_m})$ has entries
$$a_{i_1 i_2 \cdots i_m} \in \mathbb{R} \quad \text{for } i_1, i_2, \ldots, i_m \in \{1, 2, \ldots, n\}.$$
If the value of $a_{i_1 i_2 \cdots i_m}$ is invariant under any permutation of its indices, $\mathcal{A}$ is a symmetric tensor.
Qi [23] defined a scalar $\lambda \in \mathbb{R}$ as a Z-eigenvalue of $\mathcal{A}$, and a nonzero vector $\mathbf{x} \in \mathbb{R}^n$ as its associated Z-eigenvector, if they satisfy
$$\mathcal{A}\mathbf{x}^{m-1} = \lambda\mathbf{x}, \qquad \mathbf{x}^{\top}\mathbf{x} = 1,$$
where $\mathcal{A}\mathbf{x}^{m-1} \in \mathbb{R}^n$ denotes the vector whose ith entry is $\sum_{i_2, \ldots, i_m = 1}^{n} a_{i i_2 \cdots i_m} x_{i_2} \cdots x_{i_m}$.
Definition 1
(Hypergraph). A hypergraph is defined as $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$ is the vertex set and $E = \{e_1, e_2, \ldots, e_k\}$ is the edge set, with $e_p \subseteq V$ for $p = 1, \ldots, k$. If $|e_p| = r$ for $p = 1, \ldots, k$ and $r \ge 2$, we call G an r-uniform hypergraph.
For each vertex $i \in V$, the degree is defined as $d_i = |\{ e_p \in E : i \in e_p \}|$.
Definition 2
(adjacency tensor and Laplacian tensor). The adjacency tensor $\mathcal{A}(G)$ of an m-uniform hypergraph G is a symmetric tensor with entries
$$a_{i_1 i_2 \cdots i_m} = \begin{cases} \dfrac{1}{(m-1)!}, & \text{if } \{i_1, i_2, \ldots, i_m\} \in E, \\ 0, & \text{otherwise.} \end{cases}$$
For an m-uniform hypergraph G, the degree tensor $\mathcal{D}(G)$ is a diagonal tensor whose ith diagonal element is $d_i$. Then, the Laplacian tensor is defined as
$$\mathcal{L}(G) = \mathcal{D}(G) - \mathcal{A}(G).$$
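Since the Laplacian tensor is symmetric and extremely sparse, the product $\mathcal{L}(G)\mathbf{x}^{m-1}$ can be formed edge by edge without ever storing the $n^m$ entries. Below is a sketch under the assumption that edges are given as m-tuples of 0-based vertex indices without repeats (the function name is our own):

```python
import numpy as np

def laplacian_apply(edges, degrees, x, m):
    """Matrix-free product y = L x^{m-1} for the Laplacian tensor
    L = D - A of an m-uniform hypergraph, accumulated edge by edge.

    edges:   iterable of m-tuples of 0-based vertex indices (no repeats)
    degrees: array with the vertex degrees d_i
    """
    y = degrees * x ** (m - 1)              # diagonal part, D x^{m-1}
    for e in edges:
        for i in e:
            others = [v for v in e if v != i]
            # the (m-1)! entries of value 1/(m-1)! associated with edge e
            # and vertex i collapse to one product over the other vertices
            y[i] -= np.prod(x[others])
    return y
```

Matrix-free products of this kind are exactly what the Lanczos-based sub-problem solver consumes, which explains why the approach scales to the larger subdivisions below.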
A triangle has three vertices and three edges. In this example, we subdivide the triangles by connecting the midpoints of each edge of the triangles. Then, the s-order subdivision of a triangle has $4^s$ faces, and each face is a triangle. As shown in Figure 1, the three vertices of each small triangle, together with its center, are regarded as an edge of a 4-uniform hypergraph $G_s$.
Figure 1.
Four-uniform hypergraphs: subdivision of a triangle.
We compute the largest Z-eigenvalue of the Laplacian tensor $\mathcal{L}(G_s)$ via the ACRCET method, using the ACRL and the ACRN, respectively. In each run, 10 points on the unit sphere are randomly chosen, and 10 estimated eigenvalues are calculated; we then take the best one as the estimated largest eigenvalue. For different subdivision orders s, the computation results, including the estimated largest Z-eigenvalue, the total number of iterations, and the total CPU time (in seconds) of the 10 runs, are reported in Table 2.
Table 2.
Results for finding the largest Z-eigenvalues of $\mathcal{L}(G_s)$.
It can be seen that both the ACRL and the ACRN find all the largest eigenvalues. However, the ACRL takes almost no time compared to the ACRN. At the largest subdivision order tested, the ACRL method costs only 236 s, while the ACRN needs 103,900 s. The numerical comparison between the ACRL and the ACRN verifies that the Lanczos method dramatically accelerates the solution of the ACR sub-problem (3) and is powerful for large-scale problems.
6. Conclusions
In this paper, we have used the Lanczos method to solve the adaptive cubic regularization sub-problem (the ACRL method). The ACRL method first projects the large-scale ACR sub-problem (3) onto a much smaller sub-problem (7) using the Lanczos method, and then solves the smaller sub-problem (7) using Newton's method. For the convergence analysis, we established prior error bounds on the differences between the approximate objective value and the approximate solution and their corresponding optimal counterparts. Numerical experiments illustrate that the ACRL method greatly improves computational efficiency and performs well, even for large-scale problems.
Author Contributions
Methodology, Z.Z. and J.C.; writing—original draft preparation, Z.Z.; writing—review and editing, J.C.; supervision, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China, grant No. 11901118 and No. 62073087.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Cartis, C.; Gould, N.I.; Toint, P.L. Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Math. Program. 2011, 127, 245–295. [Google Scholar] [CrossRef]
- Moré, J.J.; Sorensen, D.C. Computing a trust region step. SIAM J. Sci. Stat. Comput. 1983, 4, 553–572. [Google Scholar] [CrossRef]
- Sorensen, D.C. Minimization of a large-scale quadratic function subject to a spherical constraint. SIAM J. Optim. 1997, 7, 141–161. [Google Scholar] [CrossRef]
- Rojas, M.; Santos, S.A.; Sorensen, D.C. A new matrix-free algorithm for the large-scale trust-region subproblem. SIAM J. Optim. 2000, 11, 611–646. [Google Scholar] [CrossRef]
- Rendl, F.; Wolkowicz, H. A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Program. 1997, 77, 273–299. [Google Scholar] [CrossRef]
- Hager, W.W. Minimizing a quadratic over a sphere. SIAM J. Optim. 2001, 12, 188–208. [Google Scholar] [CrossRef]
- Erway, J.B.; Gill, P.E.; Griffin, J.D. Iterative methods for finding a trust-region step. SIAM J. Optim. 2009, 20, 1110–1131. [Google Scholar] [CrossRef]
- Gould, N.I.; Lucidi, S.; Roma, M.; Toint, P.L. Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 1999, 9, 504–525. [Google Scholar] [CrossRef]
- Conn, A.R.; Gould, N.I.; Toint, P.L. Trust Region Methods; SIAM: Philadelphia, PA, USA, 2000; pp. 91–105. [Google Scholar]
- Steihaug, T. The conjugate gradient method and trust regions in large scale optimization. SIAM J. Numer. Anal. 1983, 20, 626–637. [Google Scholar] [CrossRef]
- Toint, P. Towards an efficient sparsity exploiting Newton method for minimization. In Sparse Matrices and Their Uses; Academic Press: Cambridge, MA, USA, 1981; pp. 57–88. [Google Scholar]
- Zhang, L.H.; Shen, C.; Li, R.C. On the generalized Lanczos trust-region method. SIAM J. Optim. 2017, 27, 2110–2142. [Google Scholar] [CrossRef]
- Zhang, L.; Yang, W.; Shen, C.; Feng, J. Error bounds of Lanczos approach for trust-region subproblem. Front. Math. China 2018, 13, 459–481. [Google Scholar] [CrossRef]
- Carmon, Y.; Duchi, J. Gradient descent finds the cubic-regularized nonconvex Newton step. SIAM J. Optim. 2019, 29, 2146–2178. [Google Scholar] [CrossRef]
- Birgin, E.G.; Martínez, J.M. A Newton-like method with mixed factorizations and cubic regularization for unconstrained minimization. Comput. Optim. Appl. 2019, 73, 707–753. [Google Scholar] [CrossRef]
- Brás, C.P.; Martínez, J.M.; Raydan, M. Large-scale unconstrained optimization using separable cubic modeling and matrix-free subspace minimization. Comput. Optim. Appl. 2020, 75, 169–205. [Google Scholar] [CrossRef]
- Jiang, R.; Yue, M.C.; Zhou, Z. An accelerated first-order method with complexity analysis for solving cubic regularization subproblems. Comput. Optim. Appl. 2021, 79, 471–506. [Google Scholar] [CrossRef]
- Cartis, C.; Gould, N.I.; Toint, P.L. Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function-and derivative-evaluation complexity. Math. Program. 2011, 130, 295–319. [Google Scholar] [CrossRef]
- Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 1999; pp. 69–71. [Google Scholar]
- Parlett, B.N.; Reid, J.K. Tracking the Progress of the Lanczos Algorithm for Large Symmetric Eigenproblems. IMA J. Numer. Anal. 1981, 1, 135–155. [Google Scholar] [CrossRef]
- Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161. [Google Scholar]
- Chang, J.; Zhu, Z. An adaptive cubic regularization method for computing extreme eigenvalues of tensors. arXiv 2022, arXiv:2209.04971. [Google Scholar]
- Qi, L. Eigenvalues of a real supersymmetric tensor. J. Symb. Comput. 2005, 40, 1302–1324. [Google Scholar] [CrossRef]