Abstract
The adaptive cubic regularization method solves an unconstrained optimization problem by using a cubic (third-order) regularization term to approximate the objective function at each iteration. As in the trust-region method, the solution of the sub-problem strongly affects the overall computational efficiency. The Lanczos method is a useful tool for simplifying the objective function in the sub-problem. In this paper, we implement the adaptive cubic regularization method with the aid of the Lanczos method and analyze the error of the Lanczos approximation. We show that both the error between the Lanczos objective function and the original cubic term, and the error between the solution of the Lanczos approximation and the solution of the original cubic sub-problem, are bounded by the condition number of the optimal Hessian matrix. Furthermore, we compare the numerical performance of the adaptive cubic regularization algorithm with and without the Lanczos approximation on unconstrained optimization problems. Numerical experiments show that the Lanczos method remarkably improves the computational efficiency of the adaptive cubic regularization method.
1. Introduction
For the unconstrained optimization problem
$$\min_{\mathbf{x}\in\mathbb{R}^n} f(\mathbf{x}),$$
Cartis et al. [1] proposed an adaptive cubic regularization (ACR) algorithm. It is an alternative to classical globalization techniques: it uses a cubic over-estimator of the objective function as a regularization technique, and an adaptive parameter in place of the Lipschitz constant in the cubic Taylor-series model. At each iteration, the objective function is approximated by a cubic function. Numerical experiments in [1] show that the ACR is comparable with the trust-region method for small-scale problems. Although the method has been shown to have strong local and global convergence properties, its practicality and efficiency depend critically on how efficiently the sub-problem is solved at each iteration.
For solving the trust-region sub-problem, many efficient algorithms have been proposed. These algorithms fall into three broad categories: accurate methods for dense problems, accurate methods for large sparse problems, and approximation methods for large-scale problems. The first category consists of accurate methods for dense problems, such as the classical algorithm of Moré and Sorensen [2], which uses Newton's method to iteratively solve symmetric positive definite linear systems via the Cholesky factorization. The second category consists of accurate methods for large sparse problems. For instance, the Lanczos method was employed to solve the large-scale trust-region sub-problem through a parameterized eigenvalue problem [3,4]. Another accurate approach [5] is based on a parametric eigenvalue problem within a semi-definite framework and employs the Lanczos method for the smallest eigenvalue as a black box. Hager [6] and Erway et al. [7] developed accurate methods based on subspace projection. The third category consists of approximation methods for large-scale problems. The generalized Lanczos trust-region method (GLTR) [8,9] was proposed as an improvement of the Steihaug [10]-Toint [11] conjugate-gradient method. For the GLTR method, Zhang et al. established prior upper bounds [12] and posterior error bounds [13] on the differences between the optimal objective value and the optimal solution of the original trust-region sub-problem and those of its projected counterpart.
For solving cubic model sub-problems, many algorithms are extensions of trust-region algorithms. Cartis et al. [1] applied Newton's method to the sub-problem of ACR, employing a Cholesky factorization at each iteration; this approach usually applies to small-scale problems. Moreover, Cartis et al. briefly described the use of the Lanczos method for the ACR sub-problem in [1]. Carmon and Duchi [14] used gradient descent to approximate the cubic-regularized Newton step and gave its convergence rate; however, the convergence rate of gradient descent is worse than that of the Krylov subspace method. Birgin et al. [15] proposed a Newton-like method for unconstrained optimization, whose sub-problem is similar to, but different from, that of ACR. They introduced a mixed factorization that is cheaper than the Cholesky factorization. Brás et al. [16] used the Lanczos method to efficiently solve the sub-problems associated with a special type of cubic model, and also embedded the Lanczos method in a large-scale trust-region strategy. Furthermore, an accelerated first-order method for the ACR sub-problem was developed by Jiang et al. [17].
In this paper, we employ the Lanczos method to solve the sub-problem of the adaptive cubic regularization method (ACRL) for large-scale problems. The ACRL algorithm mainly consists of the following three steps. First, the ACRL generates the jth Krylov subspace using the Lanczos method. Next, we project the original sub-problem onto the jth Krylov subspace to obtain a smaller sub-problem. Finally, we solve the resulting smaller sub-problem to obtain an approximate solution. This procedure is based on the minimization of the local model of the objective function over a sequence of small subspaces; as a result, the ACRL is applicable to large-scale problems. Moreover, we analyze the error of the Lanczos approximation. For unconstrained optimization problems, we perform numerical experiments and compare our method with the variant that does not use the Lanczos approximation (ACRN).
The outline of this paper is as follows. In Section 2, we introduce the adaptive cubic regularization method and its optimality condition. The method using the Lanczos algorithm to solve the ACR sub-problem is introduced in Section 3. In Section 4, we show the error bounds of the approximate solution and approximate objective value obtained using the ACRL method. Numerical experiments demonstrating the efficiency of the algorithm are given in Section 5. Finally, we give some concluding remarks in Section 6.
2. Preliminaries
Throughout the paper, a matrix is represented by a capital letter, while a lower case bold letter is used for a vector and a lower case letter for a scalar.
The adaptive cubic regularization method [1,18] was proposed by Cartis et al. for unconstrained optimization problems. It mainly uses a cubic over-estimator of the objective function as a regularization technique to calculate the step at each iteration. Assume that $\mathbf{x}_k$ is the current iteration point, that the objective function $f$ is twice continuously differentiable, and that its Hessian matrix is globally Lipschitz continuous. For any $\mathbf{s}\in\mathbb{R}^n$, the Taylor expansion of $f$ at the point $\mathbf{x}_k$ gives
$$f(\mathbf{x}_k + \mathbf{s}) \le f(\mathbf{x}_k) + \mathbf{s}^{\top}\mathbf{g}_k + \frac{1}{2}\mathbf{s}^{\top} H_k \mathbf{s} + \frac{L}{6}\|\mathbf{s}\|^3, \quad (1)$$
where $\mathbf{g}_k = \nabla f(\mathbf{x}_k)$, $H_k = \nabla^2 f(\mathbf{x}_k)$, and $L$ is the Lipschitz constant. Here, and for the remainder of this paper, $\|\cdot\|$ denotes the $\ell_2$ norm. The inequality is obtained by using the Lipschitz property of the Hessian. In [1], Cartis et al. proposed replacing the constant $L$ in Equation (1) with a dynamic positive parameter $\sigma_k$. In the cubic regularization model, the Hessian need not be globally or locally Lipschitz continuous in general. Furthermore, an approximation of $H_k$ by a symmetric matrix $B_k$ is employed at each iteration. Therefore, the model
$$m_k(\mathbf{s}) = f(\mathbf{x}_k) + \mathbf{s}^{\top}\mathbf{g}_k + \frac{1}{2}\mathbf{s}^{\top} B_k \mathbf{s} + \frac{\sigma_k}{3}\|\mathbf{s}\|^3 \quad (2)$$
is used to estimate $f(\mathbf{x}_k + \mathbf{s})$ at each iteration. The adaptive cubic regularization sub-problem then aims to compute a descent direction $\mathbf{s}_k$, and is given in the form
$$\mathbf{s}_k = \arg\min_{\mathbf{s}\in\mathbb{R}^n} m_k(\mathbf{s}), \quad (3)$$
in which $m_k(\mathbf{s})$ is short for (2).
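For concreteness, the model (2) can be evaluated with a single matrix-vector product; below is a minimal NumPy sketch (the argument names f_k, g_k, B_k, and sigma_k are our own, not from the original paper):

```python
import numpy as np

def cubic_model(s, f_k, g_k, B_k, sigma_k):
    """Evaluate m_k(s) in (2): f_k + s'g_k + 0.5*s'B_k*s + (sigma_k/3)*||s||^3."""
    return (f_k + s @ g_k + 0.5 * s @ (B_k @ s)
            + sigma_k / 3.0 * np.linalg.norm(s) ** 3)
```

This evaluation is all the outer ACR loop needs in order to accept or reject a trial step and to update $\sigma_k$.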
Cartis et al. introduced the following global optimality result of ACR, which is similar to the optimality conditions of the trust-region method.
Theorem 1
([1], Theorem 3.1). The vector $\mathbf{s}_k^*$ is a global minimizer of the sub-problem (3) if and only if there is a scalar $\lambda^* \ge 0$ satisfying the following system of equations:
$$(B_k + \lambda^* I)\mathbf{s}_k^* = -\mathbf{g}_k, \qquad \lambda^* = \sigma_k\|\mathbf{s}_k^*\|, \quad (4)$$
where $I$ is the identity matrix, and $B_k + \lambda^* I$ is a positive semi-definite matrix. If $B_k + \lambda^* I$ is positive definite, then $\mathbf{s}_k^*$ is unique.
The optimality condition of the trust-region sub-problem [19] concerns minimizing the quadratic model $f(\mathbf{x}_k) + \mathbf{s}^{\top}\mathbf{g}_k + \frac{1}{2}\mathbf{s}^{\top}B_k\mathbf{s}$ within an $\ell_2$-norm trust region $\|\mathbf{s}\| \le \Delta$, where $\Delta$ is the trust-region radius. For a trust-region sub-problem, the optimal vector satisfies $\lambda^*(\Delta - \|\mathbf{s}^*\|) = 0$, which means that either $\lambda^* = 0$ or $\|\mathbf{s}^*\| = \Delta$. When both the trust-region sub-problem and the cubic regularization sub-problem approximate the original objective function precisely enough, we get $\lambda^* = \sigma_k\|\mathbf{s}^*\|$ from Theorem 1. Therefore, the parameter $\sigma_k$ in the ACR algorithm is inversely proportional to the trust-region radius, and it plays the same role as the trust-region radius when we adjust the estimation accuracy of the sub-problem.
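To make the analogy explicit, the two first-order systems can be written side by side (the trust-region conditions are the standard ones from [19]; the ACR conditions are those of Theorem 1):
$$(B_k + \lambda^* I)\mathbf{s}^* = -\mathbf{g}_k \ \text{ with } \ \lambda^*(\Delta - \|\mathbf{s}^*\|) = 0 \ \text{ (trust region)} \quad \text{or} \quad \lambda^* = \sigma_k\|\mathbf{s}^*\| \ \text{ (ACR)}.$$
When the trust-region constraint is active, $\|\mathbf{s}^*\| = \Delta$, and matching the multipliers gives $\sigma_k = \lambda^*/\Delta$, which is exactly the inverse proportionality stated above.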
3. Computation of the ACR Sub-Problem with the Lanczos Method
The Lanczos algorithm [20] was proposed to solve sparse linear systems and to find the eigenvalues of sparse matrices. It builds up an orthogonal basis $\{\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_j\}$ for the Krylov space
$$\mathcal{K}_j(B, \mathbf{g}) = \mathrm{span}\{\mathbf{g}, B\mathbf{g}, B^2\mathbf{g}, \ldots, B^{j-1}\mathbf{g}\}.$$
By utilizing the orthogonal basis, the original symmetric matrix $B$ is transformed into a tridiagonal matrix.
Normally, the dimension of $\mathcal{K}_j$ increases by 1 as $j$ increases by 1. However, the Lanczos process may break down, and the dimension of $\mathcal{K}_j$ stops increasing at a certain $j$; we define $j_{\max}$ as the smallest nonnegative integer such that the Lanczos process breaks down. If the dimension of the Krylov space is much smaller than the size of the matrix, projecting $B$ onto the subspace greatly saves storage space and markedly improves the calculation speed. Specifically, we find a proper orthogonal matrix $Q_j$ using the Lanczos method, such that $Q_j^{\top} B Q_j$ is tridiagonal. We state the procedure in the following algorithm.
Algorithm 1 computes an orthogonal matrix $Q_j = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_j]$, where
$$T_j = Q_j^{\top} B Q_j = \begin{pmatrix} \alpha_1 & \beta_2 & & \\ \beta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_j \\ & & \beta_j & \alpha_j \end{pmatrix}$$
is tridiagonal. Moreover, it follows directly from Algorithm 1 that
$$Q_j^{\top}\mathbf{g} = \|\mathbf{g}\|\,\mathbf{e}_1, \quad (5)$$
where $\mathbf{e}_1$ is the first unit vector of length $j$.
Algorithm 1 Lanczos algorithm
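Algorithm 1 is the standard Lanczos three-term recurrence; the following is a minimal NumPy sketch consistent with the description above (the function name lanczos and the breakdown tolerance tol are our own illustrative choices):

```python
import numpy as np

def lanczos(B, g, j_max, tol=1e-10):
    """Lanczos tridiagonalization of the symmetric matrix B, started from g,
    so that Q.T @ B @ Q is tridiagonal with diagonal alpha and off-diagonal
    beta, and Q.T @ g = ||g|| * e_1 as in (5)."""
    n = g.shape[0]
    Q = np.zeros((n, j_max))
    alpha = np.zeros(j_max)
    beta = np.zeros(j_max)              # beta[i] couples columns i and i+1
    q, q_prev, b = g / np.linalg.norm(g), np.zeros(n), 0.0
    for i in range(j_max):
        Q[:, i] = q
        w = B @ q - b * q_prev          # three-term recurrence
        alpha[i] = q @ w
        w = w - alpha[i] * q
        b = np.linalg.norm(w)
        if b < tol:                     # breakdown: the Krylov space is invariant
            return Q[:, :i + 1], alpha[:i + 1], beta[:i]
        beta[i] = b
        q_prev, q = q, w / b
    return Q, alpha, beta[:j_max - 1]
```

Note that $B$ is only accessed through matrix-vector products, which is what makes the method attractive for large sparse problems.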
For a large-scale trust-region sub-problem, an effective approach is to solve it approximately using Krylov subspace methods. The Lanczos algorithm, as one of the Krylov subspace methods, was first introduced in [8] for the trust-region method. Similar to the trust-region method, the Lanczos algorithm is also suitable for solving the cubic regularization sub-problem. By employing Algorithm 1 with $B = B_k$ and $\mathbf{g} = \mathbf{g}_k$, we find
$$\mathbf{s}_j = \arg\min_{\mathbf{s}\in\mathcal{K}_j(B_k,\,\mathbf{g}_k)} m_k(\mathbf{s}) = Q_j \mathbf{u}_j, \quad (6)$$
where $m_k$ is the model minimized in (3). Since $\|Q_j\mathbf{u}\| = \|\mathbf{u}\|$ and, by (5), $Q_j^{\top}\mathbf{g}_k = \|\mathbf{g}_k\|\mathbf{e}_1$, the original sub-problem (3) is transformed into the following sub-problem
$$\mathbf{u}_j = \arg\min_{\mathbf{u}\in\mathbb{R}^j}\; f(\mathbf{x}_k) + \|\mathbf{g}_k\|\,\mathbf{u}^{\top}\mathbf{e}_1 + \frac{1}{2}\mathbf{u}^{\top} T_j \mathbf{u} + \frac{\sigma_k}{3}\|\mathbf{u}\|^3. \quad (7)$$
Theorem 1 illustrates that $\mathbf{u}_j$ is a global minimizer of the above sub-problem if and only if the pair $(\mathbf{u}_j, \lambda_j)$ satisfies
$$(T_j + \lambda_j I)\mathbf{u}_j = -\|\mathbf{g}_k\|\,\mathbf{e}_1, \qquad \lambda_j = \sigma_k\|\mathbf{u}_j\|, \quad (8)$$
where $T_j + \lambda_j I$ is positive semi-definite. Equation (8) can finally be solved by Newton's method ([1], Algorithm 6.1). Newton's method for solving the full sub-problem requires a factorization (such as an eigenvalue decomposition) of $B_k + \lambda I$ for various values of $\lambda$. When the scale of the original problem is large, it is very expensive to use this iterative method directly; by contrast, (8) involves only a small tridiagonal matrix.
In summary, an approximation to the solution of the ACR sub-problem (3) can be obtained in the following steps. First, we apply $j$ steps of the Lanczos method to the cubic function appearing in (3) to obtain a tridiagonal matrix $T_j$. Then, we use Newton's method on the small-size sub-problem (7) with matrix $T_j$ to compute the Lagrange multiplier $\lambda_j$ and the minimizer $\mathbf{u}_j$. Finally, the matrix $Q_j$ is used to recover $\mathbf{s}_j = Q_j\mathbf{u}_j$; thus, it should be noted that the Lanczos vectors need to be saved. We sketch the algorithm as follows, with a code sketch of the small solve given after this paragraph.
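To illustrate the small solve, here is a minimal NumPy sketch of the optimality system (8) for the tridiagonal $T_j$. The helper name solve_projected_subproblem, the starting guess, and the eigendecomposition-based Newton iteration on $\varphi(\lambda) = \sigma\|\mathbf{u}(\lambda)\| - \lambda$ are our own choices; [1], Algorithm 6.1, works with Cholesky factorizations instead.

```python
import numpy as np

def solve_projected_subproblem(alpha, beta, gnorm, sigma, tol=1e-12, max_iter=100):
    """Solve (8): (T + lam*I) u = -gnorm * e1 with lam = sigma * ||u||,
    where T is the small tridiagonal matrix with diagonal alpha and
    off-diagonal beta produced by the Lanczos process.
    Nondegenerate case assumed (cf. Section 4)."""
    j = alpha.shape[0]
    T = np.diag(alpha)
    if j > 1:
        T += np.diag(beta, 1) + np.diag(beta, -1)
    evals, V = np.linalg.eigh(T)              # T = V diag(evals) V^T
    c = -gnorm * V[0, :]                      # -gnorm*e1 in the eigenbasis
    lam_lb = max(0.0, -evals[0])              # lam must keep T + lam*I PSD
    lam = lam_lb + 1e-8
    for _ in range(max_iter):
        w = c / (evals + lam)                 # u(lam) in the eigenbasis
        norm_u = np.linalg.norm(w)
        phi = sigma * norm_u - lam            # residual of lam = sigma*||u||
        if abs(phi) <= tol * max(1.0, lam):
            break
        dnorm = -np.sum(w * w / (evals + lam)) / norm_u   # d||u||/dlam
        step = phi / (sigma * dnorm - 1.0)    # Newton step on phi
        lam = max(lam - step, 0.5 * (lam_lb + lam))       # safeguarded update
    u = V @ (c / (evals + lam))
    return u, lam
```

Since $\varphi$ is strictly decreasing to the right of $-\lambda_{\min}(T_j)$, the safeguarded Newton iteration converges to the unique root in the nondegenerate case.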
For the GLTR algorithm, a restarting strategy was discussed in ([8], Theorem 5.8) for the degenerate case, in which multiple global solutions exist. Similar to the GLTR, a restarting strategy also applies to the ACRL, although we discuss it only from a theoretical perspective. Therefore, we mainly consider the nondegenerate case in the following analysis.
4. Convergence Analysis
Theorem 1 shows that we aim to seek a pair $(\mathbf{s}^*, \lambda^*)$ satisfying
$$(B_k + \lambda^* I)\mathbf{s}^* = -\mathbf{g}_k, \qquad \lambda^* = \sigma_k\|\mathbf{s}^*\|. \quad (9)$$
Then, we have $\mathbf{s}^* = -(B_k + \lambda^* I)^{-1}\mathbf{g}_k$ in the nondegenerate case. In this section, we analyze the error between the optimal objective function value of the original sub-problem and the optimal objective function value of the sub-problem in the subspace generated by Algorithm 2, as well as the distance between $\mathbf{s}_j$ and $\mathbf{s}^*$, under the assumption that Equation (9) is satisfied.
We set
$$H_* = B_k + \lambda^* I, \quad (10)$$
which is positive definite in the nondegenerate case. The spectral condition number of $H_*$ is
$$\kappa = \frac{\lambda_1}{\lambda_n}, \quad (11)$$
where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n > 0$ are the eigenvalues of $H_*$. We denote by $m_k(\mathbf{s}^*)$ the optimal value of the model $m_k$ defined in (3).
Next, for the vector $\mathbf{s}_j$ defined in (6), we analyze the errors $m_k(\mathbf{s}_j) - m_k(\mathbf{s}^*)$ and $\|\mathbf{s}_j - \mathbf{s}^*\|$.
Algorithm 2 The ACRL method
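The Algorithm 2 box above summarizes one ACRL sub-problem solve; a compact sketch reusing the two helpers from Section 3 is given below. The names and the fixed subspace size j_max are illustrative; the actual algorithm grows the subspace until the sub-problem stopping criterion of Section 5 is met.

```python
import numpy as np

def acrl_step(B, g, sigma, j_max=50):
    """One ACRL sub-problem solve: project (3) onto the Krylov subspace
    built by lanczos(), solve the tridiagonal system (8), recover s = Q u."""
    Q, alpha, beta = lanczos(B, g, j_max)
    u, lam = solve_projected_subproblem(alpha, beta, np.linalg.norm(g), sigma)
    return Q @ u, lam        # approximate step s_j and multiplier lambda_j
```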
Theorem 2.
Suppose (3) is nondegenerate, and let $\mathbf{s}_j$ be the jth approximation of $\mathbf{s}^*$ generated by the ACRL, satisfying (8). Then, for any nonzero $\mathbf{g}_k$, the objective error $m_k(\mathbf{s}_j) - m_k(\mathbf{s}^*)$ obeys the bound (13),
and
the distance $\|\mathbf{s}_j - \mathbf{s}^*\|$ obeys the bound (14); both bounds are controlled by the condition number $\kappa$ defined in (11).
Proof.
It can be seen that . Then, we obtain
Let , where
Based on (16), we obtain
We immediately have
where the last equality follows from (15).
Furthermore, for any ,
Therefore, we have
From (17), we then get that the equality (19) holds. The conclusion in (13) follows from the above analysis.
The inequality (14) holds. □
5. Numerical Experiments
In order to show the efficiency of the Lanczos method in improving the adaptive cubic regularization algorithm, we perform the following two numerical experiments. In this section, we compare the numerical performance of the adaptive cubic regularization algorithm using the Lanczos approximation (ACRL) with that of the adaptive cubic regularization algorithm using only Newton's method (ACRN) on unconstrained optimization problems.
The ACRL and ACRN algorithms are implemented with the following parameters
Convergence in both algorithms for the sub-problem occurs as soon as
or once more than the maximum number of iterations, which we set to 2000, has been performed. All numerical experiments in this paper were performed on a laptop with an i5-10210U CPU at 1.60 GHz and 16.0 GB of RAM.
Example 1
(Generalized Rosenbrock function [21]). The Generalized Rosenbrock function is a non-convex function, introduced by Howard H. Rosenbrock in 1960 in its two-dimensional form, whose generalized form is defined as follows:
$$f(\mathbf{x}) = \sum_{i=1}^{n-1}\left[100\left(x_{i+1} - x_i^2\right)^2 + \left(1 - x_i\right)^2\right]. \quad (23)$$
From (23), the solution is obviously $\mathbf{x}^* = (1, 1, \ldots, 1)^{\top}$, and the minimum is $f(\mathbf{x}^*) = 0$.
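For reference, a NumPy implementation of (23) and its gradient (the helper names are our own); note that the Hessian of (23) is tridiagonal, so each product B @ q inside the Lanczos iteration costs only O(n):

```python
import numpy as np

def rosenbrock(x):
    """Generalized Rosenbrock function (23)."""
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (1.0 - x[:-1]) ** 2)

def rosenbrock_grad(x):
    """Gradient of (23), assembled from the two coupled terms."""
    g = np.zeros_like(x)
    g[:-1] = -400.0 * x[:-1] * (x[1:] - x[:-1] ** 2) - 2.0 * (1.0 - x[:-1])
    g[1:] += 200.0 * (x[1:] - x[:-1] ** 2)
    return g
```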
In Table 1, we show the results of the ACRL and the ACRN for computing the minima of the Generalized Rosenbrock function, with the number of variables ranging from 10 to 2000. In addition to the dimension of the Generalized Rosenbrock function, we report the number of iterations ("Iter."), the total CPU time required in seconds, and the relative error between the computed result and the exact minimum ("Err."). It can be seen that using the Lanczos method to solve the adaptive cubic regularization sub-problem of the Generalized Rosenbrock function is much more efficient than not using it. Moreover, the computation is not only faster but also more accurate, especially when the scale is relatively large.
Table 1.
Results for computing the minima of the Generalized Rosenbrock function.
Example 2
(Eigenvalues of tensors arising from hypergraphs). Next, we consider the problem of computing extreme eigenvalues of sparse tensors arising from a hypergraph. An adaptive cubic regularization method on a Stiefel manifold, named ACRCET, was proposed to solve for the eigenvalues of tensors [22]. We compare the numerical performance of the ACRL and the ACRN when applied to the sub-problem of ACRCET. Before proceeding to the experiments, we first introduce the concepts of tensor eigenvalues and hypergraphs.
A real mth order n-dimensional tensor $\mathcal{A} = (a_{i_1 i_2 \cdots i_m})$ has entries
$$a_{i_1 i_2 \cdots i_m} \in \mathbb{R} \quad \text{for } i_1, i_2, \ldots, i_m \in \{1, 2, \ldots, n\}.$$
If the value of $a_{i_1 i_2 \cdots i_m}$ is invariant under any permutation of its indices, $\mathcal{A}$ is a symmetric tensor.
Qi [23] defined a scalar $\lambda \in \mathbb{R}$ as a Z-eigenvalue of $\mathcal{A}$, and a nonzero vector $\mathbf{x} \in \mathbb{R}^n$ as its associated Z-eigenvector, if they satisfy
$$\mathcal{A}\mathbf{x}^{m-1} = \lambda\mathbf{x}, \qquad \mathbf{x}^{\top}\mathbf{x} = 1,$$
where $\mathcal{A}\mathbf{x}^{m-1} \in \mathbb{R}^n$ denotes the vector whose ith entry is $\sum_{i_2, \ldots, i_m = 1}^{n} a_{i i_2 \cdots i_m} x_{i_2} \cdots x_{i_m}$.
Definition 1
(Hypergraph). A hypergraph is defined as $G = (V, E)$, where $V = \{1, 2, \ldots, n\}$ is the vertex set and $E = \{e_1, e_2, \ldots, e_k\}$ is the edge set, with $e_p \subseteq V$ for $p = 1, \ldots, k$. If $|e_p| = r$ for $p = 1, \ldots, k$ and $r \ge 2$, we call G an r-uniform hypergraph.
For each vertex $i \in V$, the degree is defined as $d_i = |\{ e_p \in E : i \in e_p \}|$.
Definition 2
(adjacency tensor and Laplacian tensor). The adjacency tensor $\mathcal{A}(G)$ of an m-uniform hypergraph G is a symmetric tensor with entries
$$a_{i_1 i_2 \cdots i_m} = \begin{cases} \dfrac{1}{(m-1)!}, & \text{if } \{i_1, i_2, \ldots, i_m\} \in E, \\ 0, & \text{otherwise.} \end{cases}$$
For an m-uniform hypergraph G, the degree tensor $\mathcal{D}(G)$ is a diagonal tensor whose ith diagonal element is $d_i$. Then, the Laplacian tensor is defined as
$$\mathcal{L}(G) = \mathcal{D}(G) - \mathcal{A}(G).$$
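Since the Laplacian tensor is symmetric and extremely sparse, the product $\mathcal{L}(G)\mathbf{x}^{m-1}$ can be formed edge by edge without ever storing the $n^m$ entries. Below is a sketch under the assumption that edges are given as m-tuples of 0-based vertex indices without repeats (the function name is our own):

```python
import numpy as np

def laplacian_apply(edges, degrees, x, m):
    """Matrix-free product y = L x^{m-1} for the Laplacian tensor
    L = D - A of an m-uniform hypergraph, accumulated edge by edge.

    edges:   iterable of m-tuples of 0-based vertex indices (no repeats)
    degrees: array with the vertex degrees d_i
    """
    y = degrees * x ** (m - 1)              # diagonal part, D x^{m-1}
    for e in edges:
        for i in e:
            others = [v for v in e if v != i]
            # the (m-1)! entries of value 1/(m-1)! associated with edge e
            # and vertex i collapse to one product over the other vertices
            y[i] -= np.prod(x[others])
    return y
```

Matrix-free products of this kind are exactly what the Lanczos-based sub-problem solver consumes, which explains why the approach scales to the larger subdivisions below.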
A triangle has three vertices and three edges. In this example, we subdivide the triangles by connecting the midpoints of each edge of the triangles. Then, the s-order subdivision of a triangle has $4^s$ faces, and each face is a triangle. As shown in Figure 1, the three vertices of each small triangle, together with its center, are regarded as an edge of a 4-uniform hypergraph $G_s$.
Figure 1.
Four-uniform hypergraphs: subdivision of a triangle.
We compute the largest Z-eigenvalue of the Laplacian tensor $\mathcal{L}(G_s)$ via the ACRCET method, using the ACRL and the ACRN, respectively. In each run, 10 points on the unit sphere are randomly chosen, and 10 estimated eigenvalues are calculated; we then take the best one as the estimated largest eigenvalue. For different subdivision orders s, the computation results, including the estimated largest Z-eigenvalue, the total number of iterations, and the total CPU time (in seconds) of the 10 runs, are reported in Table 2.
Table 2.
Results for finding the largest Z-eigenvalues of $\mathcal{L}(G_s)$.
It can be seen that both the ACRL and the ACRN find all the largest eigenvalues. However, the ACRL takes almost no time compared to the ACRN. At the largest subdivision order tested, the ACRL method costs only 236 s, while the ACRN needs 103,900 s. The numerical comparison between the ACRL and the ACRN verifies that the Lanczos method dramatically accelerates the solution of the ACR sub-problem (3) and is powerful for large-scale problems.
6. Conclusions
In this paper, we have used the Lanczos method to solve the adaptive cubic regularization sub-problem (the ACRL method). The ACRL method first projects the large-scale ACR sub-problem (3) onto a much smaller sub-problem (7) using the Lanczos method, and then solves the smaller sub-problem (7) using Newton's method. For the convergence analysis, we established prior error bounds on the differences between the approximate objective value and the approximate solution and their corresponding optimal counterparts. Numerical experiments illustrate that the ACRL method greatly improves computational efficiency and performs well, even for large-scale problems.
Author Contributions
Methodology, Z.Z. and J.C.; writing—original draft preparation, Z.Z.; writing—review and editing, J.C.; supervision, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China, grant No. 11901118 and No. 62073087.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Not applicable.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Cartis, C.; Gould, N.I.; Toint, P.L. Adaptive cubic regularisation methods for unconstrained optimization. Part I: Motivation, convergence and numerical results. Math. Program. 2011, 127, 245–295. [Google Scholar] [CrossRef]
- Moré, J.J.; Sorensen, D.C. Computing a trust region step. SIAM J. Sci. Stat. Comput. 1983, 4, 553–572. [Google Scholar] [CrossRef]
- Sorensen, D.C. Minimization of a large-scale quadratic function subject to a spherical constraint. SIAM J. Optim. 1997, 7, 141–161. [Google Scholar] [CrossRef]
- Rojas, M.; Santos, S.A.; Sorensen, D.C. A new matrix-free algorithm for the large-scale trust-region subproblem. SIAM J. Optim. 2000, 11, 611–646. [Google Scholar] [CrossRef]
- Rendl, F.; Wolkowicz, H. A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Program. 1997, 77, 273–299. [Google Scholar] [CrossRef]
- Hager, W.W. Minimizing a quadratic over a sphere. SIAM J. Optim. 2001, 12, 188–208. [Google Scholar] [CrossRef]
- Erway, J.B.; Gill, P.E.; Griffin, J.D. Iterative methods for finding a trust-region step. SIAM J. Optim. 2009, 20, 1110–1131. [Google Scholar] [CrossRef]
- Gould, N.I.; Lucidi, S.; Roma, M.; Toint, P.L. Solving the trust-region subproblem using the Lanczos method. SIAM J. Optim. 1999, 9, 504–525. [Google Scholar] [CrossRef]
- Conn, A.R.; Gould, N.I.; Toint, P.L. Trust Region Methods; SIAM: Philadelphia, PA, USA, 2000; pp. 91–105. [Google Scholar]
- Steihaug, T. The conjugate gradient method and trust regions in large scale optimization. SIAM J. Numer. Anal. 1983, 20, 626–637. [Google Scholar] [CrossRef]
- Toint, P. Towards an efficient sparsity exploiting Newton method for minimization. In Sparse Matrices and Their Uses; Academic Press: Cambridge, MA, USA, 1981; pp. 57–88. [Google Scholar]
- Zhang, L.H.; Shen, C.; Li, R.C. On the generalized Lanczos trust-region method. SIAM J. Optim. 2017, 27, 2110–2142. [Google Scholar] [CrossRef]
- Zhang, L.; Yang, W.; Shen, C.; Feng, J. Error bounds of Lanczos approach for trust-region subproblem. Front. Math. China 2018, 13, 459–481. [Google Scholar] [CrossRef]
- Carmon, Y.; Duchi, J. Gradient descent finds the cubic-regularized nonconvex Newton step. SIAM J. Optim. 2019, 29, 2146–2178. [Google Scholar] [CrossRef]
- Birgin, E.G.; Martínez, J.M. A Newton-like method with mixed factorizations and cubic regularization for unconstrained minimization. Comput. Optim. Appl. 2019, 73, 707–753. [Google Scholar] [CrossRef]
- Brás, C.P.; Martínez, J.M.; Raydan, M. Large-scale unconstrained optimization using separable cubic modeling and matrix-free subspace minimization. Comput. Optim. Appl. 2020, 75, 169–205. [Google Scholar] [CrossRef]
- Jiang, R.; Yue, M.C.; Zhou, Z. An accelerated first-order method with complexity analysis for solving cubic regularization subproblems. Comput. Optim. Appl. 2021, 79, 471–506. [Google Scholar] [CrossRef]
- Cartis, C.; Gould, N.I.; Toint, P.L. Adaptive cubic regularisation methods for unconstrained optimization. Part II: Worst-case function-and derivative-evaluation complexity. Math. Program. 2011, 130, 295–319. [Google Scholar] [CrossRef]
- Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: Berlin/Heidelberg, Germany, 1999; pp. 69–71. [Google Scholar]
- Parlett, B.N.; Reid, J.K. Tracking the Progress of the Lanczos Algorithm for Large Symmetric Eigenproblems. IMA J. Numer. Anal. 1981, 1, 135–155. [Google Scholar] [CrossRef]
- Andrei, N. An unconstrained optimization test functions collection. Adv. Model. Optim. 2008, 10, 147–161. [Google Scholar]
- Chang, J.; Zhu, Z. An adaptive cubic regularization method for computing extreme eigenvalues of tensors. arXiv 2022, arXiv:2209.04971. [Google Scholar]
- Qi, L. Eigenvalues of a real supersymmetric tensor. J. Symb. Comput. 2005, 40, 1302–1324. [Google Scholar] [CrossRef]