Article

A New Filter Nonmonotone Adaptive Trust Region Method for Unconstrained Optimization

1 School of Science, Southwest Petroleum University, Chengdu 610500, China
2 School of Artificial Intelligence, Southwest Petroleum University, Chengdu 610500, China
* Author to whom correspondence should be addressed.
Symmetry 2020, 12(2), 208; https://doi.org/10.3390/sym12020208
Submission received: 17 December 2019 / Revised: 20 January 2020 / Accepted: 21 January 2020 / Published: 2 February 2020
(This article belongs to the Special Issue Advance in Nonlinear Analysis and Optimization)

Abstract

In this paper, a new filter nonmonotone adaptive trust region method with a fixed step length for unconstrained optimization is proposed. The trust region radius adopts a new adaptive strategy to avoid additional computational cost at each iteration. A new nonmonotone trust region ratio is introduced. When a trial step is not successful, a multidimensional filter is employed to increase the possibility of the trial step being accepted. If the trial step is still not accepted by the filter set, a new iteration point is found along the trial step, with the step length computed by a fixed formula. The symmetric positive definite approximation of the Hessian matrix is updated by a modified BFGS (MBFGS) formula. The global convergence and superlinear convergence of the proposed algorithm are proven under some classical assumptions. The efficiency of the algorithm is demonstrated by numerical experiments.

1. Introduction

Consider the following unconstrained optimization problem:
$$ \min_{x \in \mathbb{R}^n} f(x), \qquad (1) $$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a twice continuously differentiable function. The trust region method is one of the prominent classes of iterative methods for this problem. At the current iterate $x_k$, the trial step $d_k$ is obtained by solving the following quadratic subproblem:
$$ \min_{d \in \mathbb{R}^n} \; m_k(d) = g_k^T d + \tfrac{1}{2} d^T B_k d, \quad \text{s.t.}\ \|d\| \le \Delta_k, \qquad (2) $$
where $\|\cdot\|$ is the Euclidean norm, $f_k = f(x_k)$, $g_k = \nabla f(x_k)$, $G_k = \nabla^2 f(x_k)$, $B_k$ is a symmetric approximation of $G_k$, and $\Delta_k$ is the trust region radius. The standard trust region ratio is defined as follows:
$$ \rho_k = \frac{f_k - f(x_k + d_k)}{m_k(0) - m_k(d_k)}. \qquad (3) $$
Generally, the numerator is referred to as the actual reduction and the denominator is known as the predicted reduction.
The disadvantage of the traditional trust region method is that the subproblem may need to be solved several times before an acceptable trial step is obtained in one iteration. To overcome this drawback, Mo et al. [1] first proposed a nonmonotone trust region algorithm with a fixed step length: when the trial step is not acceptable, a fixed step length is used to find a new iteration point instead of re-solving the subproblem. Exploiting this idea, Ou et al. [2], Wang and Tong [3], and Hang and Liu [4] proposed trust region algorithms with a fixed step length. The fixed step length is computed by
$$ \alpha_k = -\delta\, \frac{g_k^T d_k}{d_k^T B_k d_k}, \qquad (4) $$
where $\delta \in (0,1)$ is a given constant.
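As a small illustration (not code from the paper; the function name, the default value of delta, and the use of NumPy are our own choices), the fixed step length (4) can be evaluated as follows in Python, assuming the reconstruction above with the negative sign and a curvature condition $d^T B d > 0$ as in Assumption H3:

import numpy as np

def fixed_step_length(g, d, B, delta=0.5):
    # Fixed step length used when the trial step is rejected; sketch of Eq. (4).
    # delta is a user-chosen constant in (0, 1); 0.5 is only an example value.
    curvature = d @ B @ d          # requires d^T B d > 0 (Assumption H3)
    return -delta * (g @ d) / curvature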
It is well known that the strategy for selecting the trust region radius has a significant impact on the performance of trust region methods; an inappropriate radius increases the number of subproblems that must be solved, thereby reducing efficiency. In 1997, Sartenaer [5] presented a strategy which can automatically determine an initial trust region radius. In 2002, Zhang et al. [6] provided another scheme to reduce the number of subproblems that need to be solved, in which the trust region radius is given by $\Delta_k = c^p \|g_k\| \, \|\hat{B}_k^{-1}\|$ with $\hat{B}_k = B_k + i I$ for some $i \in \mathbb{N}$. Zhang's strategy requires an estimate of $\hat{B}_k^{-1}$ at each iteration; the authors of [4] therefore suggested another practically efficient adaptive radius, $\Delta_k = \frac{\|d_{k-1}\|}{\|y_{k-1}\|}\,\|g_k\|$, which uses only gradient and step information from the previous iteration. Inspired by these facts, several modified adaptive trust region methods have been proposed in [7,8,9,10].
As is well known, monotone techniques may slow down the rate of convergence, especially in the presence of narrow curved valleys, because they require the function value to decrease at every iteration. In order to overcome these disadvantages, Deng et al. [11] proposed a nonmonotone trust region algorithm in 1993. The general nonmonotone term $f_{l(k)}$ is defined by
$$ f_{l(k)} = f(x_{l(k)}) = \max_{0 \le j \le m(k)} \{ f_{k-j} \}, \quad k = 0, 1, 2, \ldots, \qquad (5) $$
where $m(0) = 0$, $0 \le m(k) \le \min\{N, m(k-1)+1\}$, and $N \ge 0$ is an integer. Deng et al. [11] modified the ratio (3), which evaluates the consistency between the quadratic model and the objective function in trust region methods. The most common nonmonotone ratio is defined as follows:
$$ \tilde{\rho}_k = \frac{f_{l(k)} - f(x_k + d_k)}{m_k(0) - m_k(d_k)}. \qquad (6) $$
The general nonmonotone term $f_{l(k)}$ suffers from various drawbacks; for example, the numerical performance is highly dependent on the choice of $N$. In order to introduce a more suitable nonmonotone strategy, Ahookhosh et al. [12] proposed a new nonmonotone ratio as follows:
$$ \hat{\rho}_k = \frac{R_k - f(x_k + d_k)}{m_k(0) - m_k(d_k)}, \qquad (7) $$
where
$$ R_k = \eta_k f_{l(k)} + (1 - \eta_k) f_k, \qquad (8) $$
in which $\eta_k \in [\eta_{\min}, \eta_{\max}]$, with $\eta_{\min} \in [0,1)$ and $\eta_{\max} \in [\eta_{\min}, 1]$. We recommend that interested readers refer to [13,14] for more details and progress on nonmonotone trust region algorithms.
In order to overcome the difficulties associated with penalty functions, especially the adjustment of the penalty parameter, filter methods were presented by Fletcher and Leyffer [15] for constrained nonlinear optimization. More recently, Gould et al. [16] explored a new nonmonotone trust region algorithm for unconstrained optimization problems based on the multidimensional filter technique of [17]. Compared with the standard nonmonotone algorithm, the new algorithm dynamically determines iterations based on the filter elements and increases the possibility of the trial step being accepted. Therefore, this topic has received great attention in recent years (see [18,19,20,21]).
The remainder of this paper is organized as follows. In Section 2, we describe a new trust region method. The global convergence is investigated in Section 3. In Section 4, we prove the superlinear convergence of the algorithm. Numerical results are shown in Section 5. Finally, the paper ends with some conclusions in Section 6.

2. The New Algorithm

In this section, we propose a trust region method that combines a new trust region radius with a modified trust region ratio to solve unconstrained optimization problems effectively. In each iteration, a trial step $d_k$ is generated by solving the adaptive trust region subproblem
$$ \min_{d \in \mathbb{R}^n} \; m_k(d) = g_k^T d + \tfrac{1}{2} d^T B_k d, \qquad (9) $$
$$ \text{s.t.}\ \|d\| \le \Delta_k := c_k \|g_k\|^{\gamma}, \qquad (10) $$
where $0 < \gamma < 1$ and $c_k$ is an adjustment parameter. Owing to this adaptive technique, the proposed method has the following attractive property: it is not necessary to compute a matrix inverse at each iteration, which reduces the related workload and computation time.
In fact, the matrix $B_k$ is usually obtained by approximation, and the subproblem is only solved approximately. In this case, it may be more reasonable to adjust the next trust region radius $\Delta_{k+1}$ not only according to $\hat{\rho}_k$ but also by taking $\{\hat{\rho}_{k-m+1}, \ldots, \hat{\rho}_k\}$ into account. To improve the efficiency of nonmonotone trust region methods, we define the following modified ratio based on (7):
$$ \bar{\rho}_k = \sum_{i=1}^{\min\{k,m\}} w_k^i \, \hat{\rho}_{k-i+1}, \qquad (11) $$
where $m$ is a positive integer and $w_k^i$ is the weight of $\hat{\rho}_{k-i+1}$, such that $\sum_{i=1}^{\min\{k,m\}} w_k^i = 1$.
More precisely, $\bar{\rho}_k$ is used to determine whether the trial step is acceptable. The adjustment of the next radius $\Delta_{k+1}$ also depends on (11); thus $c_k$ is updated by
$$ c_{k+1} = \begin{cases} \min\{\beta_2 c_k, \, c_{\max}\}, & \text{if } \bar{\rho}_k \ge \mu_2, \\ c_k, & \text{if } \mu_1 \le \bar{\rho}_k < \mu_2, \\ \beta_1 c_k, & \text{if } \bar{\rho}_k < \mu_1, \end{cases} \qquad (12) $$
where $0 < \beta_1 < 1 < \beta_2$ and $c_{\max} > 0$ are constants.
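For concreteness, the following Python sketch (our own illustration, not code from the paper) computes the nonmonotone reference value $R_k$, the ratio $\hat{\rho}_k$, the weighted ratio $\bar{\rho}_k$ with uniform weights, and the updated adjustment parameter $c_{k+1}$. The values $\mu_1 = 0.25$, $\mu_2 = 0.75$, $\beta_1 = 0.25$, and $\beta_2 = 1.5$ are taken from the experiments in Section 5; the remaining defaults (eta, N, m, c_max) and the uniform weights are example choices.

import numpy as np

def nonmonotone_ratio(f_hist, f_new, pred_red, eta=0.85, N=5):
    # R_k = eta*f_{l(k)} + (1 - eta)*f_k, with f_{l(k)} the maximum of the last N+1 values; Eq. (8)
    f_lk = max(f_hist[-(N + 1):])
    R_k = eta * f_lk + (1.0 - eta) * f_hist[-1]
    return (R_k - f_new) / pred_red                 # rho_hat_k, Eq. (7)

def weighted_ratio(rho_hist, m=3):
    # bar_rho_k: weighted average of the last min(k, m) ratios, Eq. (11); uniform weights here
    recent = rho_hist[-m:]
    w = np.full(len(recent), 1.0 / len(recent))
    return float(w @ np.array(recent))

def update_c(c, bar_rho, mu1=0.25, mu2=0.75, beta1=0.25, beta2=1.5, c_max=1e3):
    # adjustment parameter update of Eq. (12); the radius is then Delta_k = c_k * ||g_k||**gamma
    if bar_rho >= mu2:
        return min(beta2 * c, c_max)
    elif bar_rho >= mu1:
        return c
    return beta1 * c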
In what follows, we write $g_k = g(x_k) = \nabla f(x_k)$, and we denote the $i$th component of $g_k$ by $g_i(x_k)$. In the filter setting, we say that an iterate $x_1$ dominates $x_2$ if, and only if,
$$ |g_i(x_1)| \le |g_i(x_2)|, \quad \forall\, i = 1, 2, \ldots, n. \qquad (13) $$
A multidimensional filter $F$ is a list of $n$-tuples of the form $(g_1(x_k), g_2(x_k), \ldots, g_n(x_k))$ such that no entry dominates another, i.e.,
$$ |g_j(x_k)| < |g_j(x_l)| \quad \text{for at least one } j \in \{1, 2, \ldots, n\}, \qquad (14) $$
whenever $g_k$ and $g_l$ are distinct members of $F$.
A new trial point $x_k^+$ is acceptable for the filter if, for all $g(x_l) \in F$, there exists $j \in \{1, 2, \ldots, n\}$ such that
$$ |g_j(x_k^+)| \le |g_j(x_l)| - \gamma_g \|g(x_l)\|, \qquad \gamma_g \in \left(0, \tfrac{1}{\sqrt{n}}\right). \qquad (15) $$
If the iterate $x_k^+$ is acceptable, we add $g(x_k^+)$ to the filter and, at the same time, remove from the filter all entries dominated by $x_k^+$. In the general filter trust region algorithm, a trial point $x_k^+$ satisfying $\hat{\rho}_k < \mu_1$ is tested for acceptance by the filter $F$; in our algorithm, it is the trial point $x_k^+$ satisfying $0 < \bar{\rho}_k < \mu_1$ that is tested for acceptance by the filter $F$.
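The multidimensional filter can be maintained as a simple list of stored gradient vectors, as in the following Python sketch (an illustrative implementation under our own naming; the class name and the default value of gamma_g are not from the paper):

import numpy as np

class GradientFilter:
    # Multidimensional filter on the components |g_i(x)|; gamma_g should lie in (0, 1/sqrt(n)).
    def __init__(self, gamma_g=0.01):
        self.entries = []              # stored gradient vectors g(x_l)
        self.gamma_g = gamma_g

    def acceptable(self, g_new):
        # acceptable w.r.t. every entry: some component beats |g_j(x_l)| - gamma_g*||g(x_l)||, Eq. (15)
        for g_l in self.entries:
            margin = self.gamma_g * np.linalg.norm(g_l)
            if not np.any(np.abs(g_new) <= np.abs(g_l) - margin):
                return False
        return True

    def add(self, g_new):
        # drop entries dominated by g_new in the sense of Eq. (13), then store g_new
        self.entries = [g_l for g_l in self.entries
                        if not np.all(np.abs(g_new) <= np.abs(g_l))]
        self.entries.append(np.asarray(g_new, dtype=float))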
Our discussion can be summarized as the following Algorithm 1.
Algorithm 1. A new filter nonmonotone adaptive trust region method.
Step 0. (Initialization) An initial point $x_0 \in \mathbb{R}^n$ and a symmetric matrix $B_0 \in \mathbb{R}^{n \times n}$ are given. The constants $0 < \mu_1 < \mu_2 < 1$, $\tau > 0$, $N > 0$, $\varepsilon > 0$, $\eta_{\min} \in [0,1)$, and $\eta_{\max} \in [\eta_{\min}, 1)$ are also given. Set $k = 0$.
Step 1. If $\|g_k\| \le \varepsilon$, then stop.
Step 2. Solve the subproblem (9)-(10) to find the trial step $d_k$, and set $x_k^+ = x_k + d_k$.
Step 3. Choose $w_k^i \in [0,1]$ satisfying $\sum_{i=1}^{\min\{k,m\}} w_k^i = 1$. Compute $R_k$, $\hat{\rho}_k$, and $\bar{\rho}_k$ by (8), (7), and (11), respectively.
Step 4. Test the trial step.
If $\bar{\rho}_k \ge \mu_1$, then set $x_{k+1} = x_k^+$.
Otherwise, compute $g_k^+ = \nabla f(x_k^+)$;
  if $x_k^+$ is acceptable to the filter $F$, then set $x_{k+1} = x_k^+$ and add $g_k^+ = \nabla f(x_k^+)$ to the filter $F$;
  otherwise, compute the step length $\alpha_k$ by (4) and set $x_{k+1} = x_k + \alpha_k d_k$.
  end (if)
end (if)
Step 5. Update the trust region radius by $\Delta_{k+1} = c_{k+1} \|g_{k+1}\|^{\gamma}$, where $c_{k+1}$ is updated by (12).
Step 6. Compute the new Hessian approximation $B_{k+1}$ by a modified BFGS formula. Set $k = k + 1$ and return to Step 1.
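To show how the pieces fit together, the following condensed Python driver (an illustrative sketch only, reusing the helper functions fixed_step_length, nonmonotone_ratio, weighted_ratio, update_c, and GradientFilter sketched earlier) runs the main loop of Algorithm 1. Two simplifications are assumptions of this sketch and not part of the paper: the subproblem is solved only approximately by a Cauchy-point step, and $B_k$ is kept equal to the identity instead of being updated by the MBFGS formula of Section 5. The parameter values are examples.

import numpy as np

def cauchy_step(g, B, delta):
    # Approximate subproblem solution: minimize the model along -g inside the ball.
    gBg = g @ B @ g
    t_star = delta / np.linalg.norm(g)
    if gBg > 0:
        t_star = min(t_star, (g @ g) / gBg)
    return -t_star * g

def filter_trust_region(f, grad, x0, gamma=0.5, c0=1.0, tol=1e-6, max_iter=1000,
                        mu1=0.25, mu2=0.75, delta_fixed=0.5, m=3):
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                                  # simplification: B_k = I throughout
    c = c0
    f_hist, rho_hist = [f(x)], []
    filt = GradientFilter(gamma_g=0.5 / np.sqrt(x.size))
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:                    # Step 1
            break
        Delta = c * np.linalg.norm(g) ** gamma          # radius (10)
        d = cauchy_step(g, B, Delta)                    # Step 2
        x_plus = x + d
        pred = -(g @ d + 0.5 * d @ B @ d)               # predicted reduction m_k(0) - m_k(d_k)
        rho_hat = nonmonotone_ratio(f_hist, f(x_plus), pred)   # (7)
        rho_hist.append(rho_hat)
        rho_bar = weighted_ratio(rho_hist, m)                  # (11)
        if rho_bar >= mu1:                              # Step 4: successful step
            x = x_plus
        elif filt.acceptable(grad(x_plus)):             # accepted by the filter
            filt.add(grad(x_plus))
            x = x_plus
        else:                                           # fixed step length (4)
            x = x + fixed_step_length(g, d, B, delta_fixed) * d
        f_hist.append(f(x))
        c = update_c(c, rho_bar, mu1, mu2)              # Step 5 via (12)
    return x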
In order to obtain the convergence results, we use the following notation:
$D = \{ k \mid \bar{\rho}_k \ge \mu_1 \}$, $A = \{ k \mid 0 < \bar{\rho}_k < \mu_1 \text{ and } x_k^+ \text{ is accepted by the filter } F \}$, and $S = \{ k \mid x_{k+1} = x_k + d_k \}$. Then we have $S = \{ k \mid \bar{\rho}_k \ge \mu_1 \text{ or } x_k^+ \text{ is accepted by the filter } F \}$. When $k \notin S$, we have $x_{k+1} = x_k + \alpha_k d_k$.

3. Convergence Analysis

To establish the convergence of Algorithm 1, we make the following common assumption.
Assumption 1.
H1. The level set $L(x_0) = \{ x \in \mathbb{R}^n \mid f(x) \le f(x_0) \} \subset \Omega$, where $\Omega \subset \mathbb{R}^n$ is a bounded set, and $f(x)$ is continuously differentiable on the level set $L(x_0)$.
H2. The matrix $B_k$ is uniformly bounded, i.e., there exists a constant $M_1 > 0$ such that $\|B_k\| \le M_1$ for all $k$.
H3. There exists a constant $v > 0$ such that $v \|d\|^2 \le d^T B_k d$ for all $d \in \mathbb{R}^n$ and all $k \in \mathbb{N} \cup \{0\}$.
Remark 1.
In order to analyze the convergence of the new algorithm, the trial step $d_k$ is required to satisfy the following conditions:
$$ m_k(0) - m_k(d_k) \ge \tau \|g_k\| \min\left\{ \Delta_k, \frac{\|g_k\|}{\|B_k\|} \right\}, \qquad (16) $$
$$ -g_k^T d_k \ge \tau \|g_k\| \min\left\{ \Delta_k, \frac{\|g_k\|}{\|B_k\|} \right\}, \qquad (17) $$
where the constant $\tau \in (0,1)$.
Remark 2.
If $f$ is a twice continuously differentiable function, then H1 implies that there is a positive constant $L$ such that
$$ \| \nabla f(x) - \nabla f(y) \| \le L \|x - y\|, \quad \forall\, x, y \in \Omega. \qquad (18) $$
Lemma 1.
For all $k$, we have
$$ \left| f_k - f(x_k + d_k) - \left( m_k(0) - m_k(d_k) \right) \right| = O(\|d_k\|^2). \qquad (19) $$
Proof. 
The proof follows by applying Taylor's expansion together with the assumptions above; a short version of the computation is given below. □
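For completeness, one way to carry out the omitted computation (using the Lipschitz property (18) and the bound on $B_k$ from H2) is:
$$ \left| f_k - f(x_k + d_k) - \left( m_k(0) - m_k(d_k) \right) \right| = \left| f(x_k + d_k) - f_k - g_k^T d_k - \tfrac12 d_k^T B_k d_k \right| \le \left| \int_0^1 \left( g(x_k + t d_k) - g_k \right)^T d_k \, dt \right| + \tfrac12 \left| d_k^T B_k d_k \right| \le \tfrac{L}{2} \|d_k\|^2 + \tfrac{M_1}{2} \|d_k\|^2 = O(\|d_k\|^2). $$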
Lemma 2.
Suppose that H1–H3 hold and that the sequence $\{x_k\}$ is generated by Algorithm 1. Moreover, assume that there exists a constant $0 < \varepsilon < 1$ such that $\|g_k\| > \varepsilon$ for all $k$. Then, for every $k$, there exists a nonnegative integer $p$ such that $x_{k+p+1}$ is a successful iteration point, i.e., $\hat{\rho}_{k+p} \ge \mu_1$.
Proof. 
We prove this by contradiction. Assume that there exists an iteration index $k$ such that $x_{k+p+1}$ is an unsuccessful iteration point for every nonnegative integer $p$, i.e.,
$$ \hat{\rho}_{k+p} < \mu_1, \quad p = 0, 1, 2, \ldots. \qquad (20) $$
It follows from (11) that $\bar{\rho}_{k+p} < \mu_1$ for $p = 0, 1, 2, \ldots$. Thus, using (10) and (12), we obtain
$$ \Delta_{k+p+1} \le \beta_1^{\,p+1} c_k \|g_{k+p}\|^{\gamma} \le \beta_1^{\,p+1} c_{\max} \|g_k\|^{\gamma}. \qquad (21) $$
As a matter of fact, in the unsuccessful iterations we have $x_{k+p} = x_k$. Thus, since $0 < \beta_1 < 1$, (21) yields
$$ \lim_{p \to \infty} \Delta_{k+p+1} = 0. \qquad (22) $$
Now, using Lemma 1 and (16), we get
$$ \left| \frac{f(x_{k+p}) - f(x_{k+p} + d_{k+p})}{m_{k+p}(0) - m_{k+p}(d_{k+p})} - 1 \right| = \frac{\left| f(x_{k+p}) - f(x_{k+p} + d_{k+p}) - \left( m_{k+p}(0) - m_{k+p}(d_{k+p}) \right) \right|}{m_{k+p}(0) - m_{k+p}(d_{k+p})} \le \frac{O(\|d_{k+p}\|^2)}{\tau \|g_{k+p}\| \min\left\{ \Delta_{k+p}, \frac{\|g_{k+p}\|}{\|B_{k+p}\|} \right\}} \le \frac{O(\|d_{k+p}\|^2)}{\tau \varepsilon \min\left\{ \Delta_{k+p}, \frac{\varepsilon}{M_1} \right\}} \le \frac{O(\Delta_{k+p}^2)}{O(\Delta_{k+p})} \to 0 \quad (p \to \infty). $$
According to the definition of $R_k$, we get $R_k \ge \eta_k f_k + (1 - \eta_k) f_k = f_k$. Thus, for sufficiently large $p$, we have
$$ \hat{\rho}_{k+p} = \frac{R_{k+p} - f(x_{k+p} + d_{k+p})}{m_{k+p}(0) - m_{k+p}(d_{k+p})} \ge \frac{f_{k+p} - f(x_{k+p} + d_{k+p})}{m_{k+p}(0) - m_{k+p}(d_{k+p})} \to 1, \qquad (23) $$
which contradicts (20). This completes the proof of Lemma 2. □
Lemma 3.
Suppose that H1–H3 hold and that the sequence $\{x_k\}$ is generated by Algorithm 1. Let $\delta \in \left(0, \min\{1, \tfrac{v}{L}\}\right)$. For $k \notin S$, we have
$$ f_{k+1} - R_k \le \frac{\delta}{2}\left(1 - \frac{L\delta}{v}\right) g_k^T d_k \le 0. \qquad (24) $$
Proof. 
According to the definition of $R_k$, for all $k$ we have $f_k \le R_k$. Using the mean value theorem, we get
$$ f_{k+1} - R_k \le f_{k+1} - f_k = g(\xi)^T (x_{k+1} - x_k), \qquad (25) $$
where $\xi \in [x_k, x_{k+1}]$. For $k \notin S$, using (18) we obtain
$$ \begin{aligned} g(\xi)^T (x_{k+1} - x_k) &= g_k^T (x_{k+1} - x_k) + \left( g(\xi) - g_k \right)^T (x_{k+1} - x_k) \\ &\le g_k^T (x_{k+1} - x_k) + \| g(\xi) - g_k \| \, \| x_{k+1} - x_k \| \\ &\le g_k^T (x_{k+1} - x_k) + L \| x_{k+1} - x_k \|^2 \\ &= \alpha_k g_k^T d_k + L \alpha_k^2 \| d_k \|^2 \\ &= \left( 1 - \frac{L \delta \| d_k \|^2}{d_k^T B_k d_k} \right) \alpha_k g_k^T d_k \\ &\le \left( 1 - \frac{L \delta}{v} \right) \alpha_k g_k^T d_k. \end{aligned} \qquad (26) $$
Note that (4) and (16) imply that $\alpha_k \ge \delta/2$ for all $k$ (see the verification below), and the choice $\delta \in \left(0, \min\{1, \tfrac{v}{L}\}\right)$ guarantees $1 - L\delta/v > 0$. Thus, according to (17), (25), and (26), we conclude that (24) holds. □
Lemma 4.
Suppose that the sequence $\{x_k\}$ is generated by Algorithm 1. Then we have $\{x_k\} \subset L(x_0)$.
Proof. 
We proceed by induction. When $k = 0$, clearly $x_0 \in L(x_0)$.
Assume that $x_k \in L(x_0)$ holds for some $k \ge 0$, so that $f_k \le f_0$. We now show that $x_{k+1} \in L(x_0)$.
(a) When $k \in S$, consider two cases:
Case 1: $k \in D$. According to (7) and (16), we obtain $R_k - f_{k+1} \ge \mu_1 \left( m_k(0) - m_k(d_k) \right) \ge 0$. Thus, we have $R_k \ge f_{k+1}$. Following the definitions of $R_k$ and $f_{l(k)}$, we obtain
$$ R_k = \eta_k f_{l(k)} + (1 - \eta_k) f_k \le \eta_k f_{l(k)} + (1 - \eta_k) f_{l(k)} = f_{l(k)}. \qquad (27) $$
The above two inequalities show that
$$ f_{k+1} \le R_k \le f_{l(k)} \le f_0. \qquad (28) $$
Case 2: $k \in A$. According to $0 < \bar{\rho}_k < \mu_1$, we have $R_k - f(x_k + d_k) > 0$. Thus, we get $f_{k+1} \le R_k \le f_{l(k)} \le f_0$.
(b) When $k \notin S$: using Lemma 3 and (27), we obtain $f_{k+1} \le R_k \le f_{l(k)} \le f_0$.
We can now conclude that $\{x_k\} \subset L(x_0)$. □
Lemma 5.
Suppose that H1–H3 hold and that the sequence $\{x_k\}$ is generated by Algorithm 1. Then $\{f_{l(k)}\}$ is a monotonically non-increasing sequence and it is convergent.
Proof. 
First, we prove that the sequence $\{f_{l(k)}\}$ is non-increasing. We consider two cases:
Case 1: For $k < N$, it is clear that $m(k) = k$. Since $f_k \le f_0$ for any $k$, we get $f_{l(k)} = f_0$.
Case 2: For $k \ge N$, we have $m(k+1) \le m(k) + 1$. Thus, using the definition of $f_{l(k+1)}$ and (5), we observe that
$$ f_{l(k+1)} = \max_{0 \le j \le m(k+1)} \{ f_{k+1-j} \} \le \max_{0 \le j \le m(k)+1} \{ f_{k+1-j} \} = \max\{ f_{l(k)}, f_{k+1} \} \le f_{l(k)}, \qquad (29) $$
where the last inequality uses $f_{k+1} \le f_{l(k)}$ from Lemma 4. Hence, $\{f_{l(k)}\}$ is a monotonically non-increasing sequence. This fact, together with H1, implies that there exists $\lambda$ such that, for all $n \in \mathbb{N} \cup \{0\}$,
$$ \lambda \le f_{k+n} \le f_{l(k+n)} \le \cdots \le f_{l(k+1)} \le f_{l(k)}. \qquad (30) $$
This shows that the sequence $\{f_{l(k)}\}$ is convergent. □
Lemma 6.
Suppose that H1–H3 hold and that there exists $\varepsilon > 0$ such that $\|g_k\| \ge \varepsilon$ for all $k$. Then there is a constant $\beta > 0$ such that
$$ \Delta_k \ge \frac{\beta}{M_k}, \qquad (31) $$
where $M_k = \max_{0 \le i \le k} \|B_i\| + 1$.
Proof. 
Set $\sigma = \dfrac{\tau \varepsilon (1 - \mu_1)}{2L}$. We proceed by induction. Set
$$ \beta = \min\left\{ \Delta_0 M_0, \; \beta_1 \sigma M_0, \; (1 - \mu_1) \beta_1 \tau \varepsilon, \; \beta_1 \varepsilon \right\}. \qquad (32) $$
When $k = 0$, we clearly have $\Delta_0 \ge \beta / M_0$. Now assume that (31) holds for $k$. Note that $\{M_k\}$ is a non-decreasing sequence. We prove that
$$ \Delta_{k+1} \ge \frac{\beta}{M_k}, \qquad k = 0, 1, \ldots. \qquad (33) $$
(a) When $k \in D$, i.e., $\bar{\rho}_k \ge \mu_1$. Using (11) and (7), we deduce that $\hat{\rho}_k \ge \mu_1$. From (12), we get $\Delta_{k+1} \ge \lambda \Delta_k$, where $\lambda \ge 1$ is a constant. Thus, the inequality $\Delta_k \ge \beta / M_k$ implies that $\Delta_{k+1} \ge \beta / M_k$.
(b) When $k \in A$, i.e., $0 < \bar{\rho}_k < \mu_1$. First suppose that $\|d_k\| \ge \sigma$; then
$$ \Delta_{k+1} = \beta_1 \Delta_k \ge \beta_1 \|d_k\| \ge \beta_1 \sigma \ge \frac{\beta_1 \sigma M_0}{M_k} \ge \frac{\beta}{M_k}. \qquad (34) $$
Now suppose that $\|d_k\| < \sigma$. In this case,
$$ \frac{f_k - f(x_k + d_k)}{m_k(0) - m_k(d_k)} \le \frac{R_k - f(x_k + d_k)}{m_k(0) - m_k(d_k)} < \mu_1. $$
Thus,
$$ f(x_k + d_k) - f_k > -\mu_1 \left( m_k(0) - m_k(d_k) \right) = \mu_1 \left( g_k^T d_k + \tfrac12 d_k^T B_k d_k \right). \qquad (35) $$
Using Taylor's formula and H1–H3, it is easy to show that
$$ f(x_k + d_k) - f_k = g(\eta)^T d_k = g_k^T d_k + \left( g(\eta) - g_k \right)^T d_k \le g_k^T d_k + L \|d_k\|^2 \le g_k^T d_k + \frac{\tau (1 - \mu_1) \varepsilon}{2} \|d_k\|, \qquad (36) $$
where $\eta \in [x_k, x_k + d_k]$. Combining (35) with (36), we find that
$$ (1 - \mu_1) \left( g_k^T d_k + \frac{\tau \varepsilon}{2} \|d_k\| \right) > \frac{\mu_1}{2} d_k^T B_k d_k. \qquad (37) $$
Moreover, inequality (16), together with $\|g_k\| \ge \varepsilon$, implies that
$$ -g_k^T d_k - \frac12 d_k^T B_k d_k \ge \tau \varepsilon \min\left\{ \Delta_k, \frac{\varepsilon}{\|B_k\|} \right\}. \qquad (38) $$
Multiplying both sides of inequality (38) by $(1 - \mu_1)$ gives
$$ -(1 - \mu_1) \left( g_k^T d_k + \frac12 d_k^T B_k d_k \right) \ge (1 - \mu_1) \tau \varepsilon \min\left\{ \Delta_k, \frac{\varepsilon}{\|B_k\|} \right\}. \qquad (39) $$
On the other hand, from H3, (37), and (39), we have
$$ \Delta_k \|B_k\| \ge \tau (1 - \mu_1) \varepsilon \min\left\{ 1, \; \frac{2\varepsilon}{\|B_k\| \Delta_k} \right\}. \qquad (40) $$
If $\Delta_k \|B_k\| \le 2\varepsilon$, the minimum in (40) equals one and we have $\Delta_k \|B_k\| \ge (1 - \mu_1) \tau \varepsilon$; otherwise, we obtain $\Delta_k \|B_k\| > 2\varepsilon \ge \varepsilon$. Hence, following (40), we obtain
$$ \Delta_k \|B_k\| \ge \min\left\{ (1 - \mu_1) \tau \varepsilon, \; \varepsilon \right\}. \qquad (41) $$
Thus,
$$ \Delta_{k+1} = \beta_1 \Delta_k \ge \frac{\min\left\{ (1 - \mu_1) \beta_1 \tau \varepsilon, \; \beta_1 \varepsilon \right\}}{\|B_k\|} \ge \frac{\beta}{M_k}. $$
The proof is completed. □
Based on the analyses and lemmas above, we obtain the global convergence of Algorithm 1 as follows:
Theorem 1.
(Global Convergence). Suppose that H1–H3 hold and that the sequence $\{x_k\}$ is generated by Algorithm 1. Then
$$ \liminf_{k \to \infty} \|g_k\| = 0. \qquad (42) $$
Proof. 
Consider the following two cases:
Case 1: The number of successful iterations and the number of filter iterations are both infinite, i.e., $|S| = +\infty$ and $|A| = +\infty$.
We argue by contradiction. Suppose that (42) is not true; then there exists a positive constant $\varepsilon$ such that $\|g_k\| > \varepsilon$ for all $k$. From H1, we see that $\{g_k\}$ is bounded. Denote by $\{k_i\}$ the sequence of indices in the set $A$. Then there exists a subsequence $\{k_t\} \subseteq \{k_i\}$ satisfying $\lim_{t \to \infty} g_{k_t} = g_{\infty}$ with $\|g_{\infty}\| \ge \varepsilon$, and an index $j \in \{1, 2, \ldots, n\}$ such that
$$ |g^{\,j}_{k_t}| \le |g^{\,j}_{k_{t-1}}| - \gamma_g \|g_{k_{t-1}}\|. \qquad (43) $$
Using (43), the convergence of $\{g_{k_t}\}$, and $\|g_{k_{t-1}}\| > \varepsilon$, and letting $t \to \infty$, we obtain
$$ 0 = \lim_{t \to \infty} \left( |g^{\,j}_{k_t}| - |g^{\,j}_{k_{t-1}}| \right) \le -\gamma_g \varepsilon < 0. \qquad (44) $$
This is a contradiction, which completes the proof in this case.
Case 2: The number of successful iterations is infinite and the number of filter iterations is finite, i.e., $|S| = +\infty$ and $|A| < +\infty$.
In this case, there exists an integer $k_1$ such that, for all $k > k_1$, we have $k \in D$ and therefore $\hat{\rho}_k \ge \mu_1$. Hence, from (16) and (27), we find that
$$ f_{l(k)} - f_{k+1} \ge R_k - f_{k+1} \ge \mu_1 \tau \|g_k\| \min\left\{ \Delta_k, \frac{\|g_k\|}{\|B_k\|} \right\} \ge 0. \qquad (45) $$
We argue by contradiction. Suppose that there exist constants $\varepsilon > 0$ and $k_2 > k_1$ such that $\|g_k\| \ge \varepsilon$ for all $k \ge k_2$. Based on Lemma 6 and (45), we can write
$$ f_{l(k)} - f_{k+1} \ge R_k - f_{k+1} \ge \mu_1 \tau \varepsilon \min\left\{ \frac{\beta}{M_k}, \frac{\varepsilon}{M_k} \right\} = \frac{\mu_1 \tau \varepsilon \min\{\beta, \varepsilon\}}{M_k}. \qquad (46) $$
Set $a = \mu_1 \tau \varepsilon \min\{\beta, \varepsilon\}$; thus,
$$ f_{l(k)} - f_{k+1} \ge \frac{a}{M_k}. \qquad (47) $$
According to (47) and the fact that $\{M_k\}$ is a non-decreasing sequence, we have
$$ \begin{aligned} f_{l(k)} &\ge f_{k+1} + a / M_k \ge f_{k+1} + a / M_{k+M+1}, \\ f_{l(k+1)} &\ge f_{k+2} + a / M_{k+1} \ge f_{k+2} + a / M_{k+M+1}, \\ &\;\;\vdots \\ f_{l(k+M)} &\ge f_{k+M+1} + a / M_{k+M} \ge f_{k+M+1} + a / M_{k+M+1}. \end{aligned} \qquad (48) $$
Taking the maximum over the right-hand sides of (48) and using the monotonicity of $\{f_{l(k)}\}$ from Lemma 5, we obtain
$$ f_{l(k)} \ge \max\{ f_{k+1}, f_{k+2}, \ldots, f_{k+M+1} \} + \frac{a}{M_{k+M+1}}, \qquad k \ge k_2. \qquad (49) $$
According to (5), we have
$$ f_{l(k+M+1)} \le \max\{ f_{k+1}, f_{k+2}, \ldots, f_{k+M+1} \}. \qquad (50) $$
Thus, we get
$$ f_{l(k)} - f_{l(k+M+1)} \ge \frac{a}{M_{k+M+1}}. \qquad (51) $$
Now, using (51) together with the monotonicity and convergence of $\{f_{l(k)}\}$ (Lemma 5), we deduce that
$$ \sum_{k \ge k_2} \frac{1}{M_{k+M+1}} \le \frac{1}{a} \sum_{k \ge k_2} \left( f_{l(k)} - f_{l(k+M+1)} \right) = \frac{1}{a} \sum_{k \ge k_2} \sum_{s=0}^{M} \left( f_{l(k+s)} - f_{l(k+s+1)} \right) \le \frac{M+1}{a} \sum_{k \ge k_2} \left( f_{l(k)} - f_{l(k+1)} \right) < +\infty, $$
which contradicts $\sum_{k=1}^{\infty} \frac{1}{M_k} = +\infty$ (note that H2 implies $M_k \le M_1 + 1$, so this series indeed diverges). This completes the proof of Theorem 1. □

4. Local Convergence

In this section, we will demonstrate the superlinear convergence of Algorithm 1 under appropriate conditions.
Theorem 2.
(Superlinear Convergence). Suppose that H1–H3 hold and that the sequence $\{x_k\}$ generated by Algorithm 1 converges to $x^*$. Moreover, assume that $\nabla^2 f(x^*)$ is positive definite and that $\nabla^2 f(x)$ is Lipschitz continuous in a neighborhood of $x^*$. If $\|d_k\| \le \Delta_k$, where $d_k = -B_k^{-1} g_k$, and
$$ \lim_{k \to \infty} \frac{\left\| \left( B_k - \nabla^2 f(x^*) \right) d_k \right\|}{\|d_k\|} = 0, \qquad (52) $$
then the sequence $\{x_k\}$ converges to $x^*$ superlinearly; that is,
$$ \| x_{k+1} - x^* \| = o\left( \| x_k - x^* \| \right). \qquad (53) $$
Proof. 
By Lemmas 1 and 2, we have $\hat{\rho}_k \ge \mu_1$ for all sufficiently large $k$, so that Algorithm 1 eventually reduces to a standard quasi-Newton method with superlinear convergence [22]. Thus, the superlinear convergence of the algorithm can be proven in the same way as Theorem 5.5.1 in [22]; we omit the details for brevity. □

5. Preliminary Numerical Experiments

In this section, we report numerical experiments with Algorithm 1 and compare it with the algorithms of Mo et al. [1] and Hang and Liu [4]. A set of unconstrained test problems of variable dimension is selected from [23]. The experiments were carried out in MATLAB 9.4 on a PC with an Intel(R) Core(TM) processor (2.00 GHz) and 6 GB of RAM. The common parameters of the algorithms take exactly the same values: $\mu_1 = 0.25$, $\mu_2 = 0.75$, $\beta_1 = 0.25$, $\beta_2 = 1.5$, $M = 5$. In our experiments, an algorithm is stopped when $\|g_k\| \le 10^{-6} \|g_0\|$ or when the number of iterations exceeds 10,000. CPU denotes the running time, and $n_f$ and $n_i$ denote the total number of function evaluations and the total number of gradient evaluations, respectively. The matrix $B_k$ is updated by the MBFGS formula [24]:
$$ B_{k+1} = \begin{cases} B_k + \dfrac{z_k z_k^T}{z_k^T d_k} - \dfrac{B_k d_k d_k^T B_k}{d_k^T B_k d_k}, & y_k^T d_k > 0, \\[1ex] B_k, & y_k^T d_k \le 0, \end{cases} \qquad (54) $$
where $d_k = x_{k+1} - x_k$, $y_k = g_{k+1} - g_k$, $z_k = y_k + t_k \|g_k\| d_k$, and $t_k = 1 + \max\left\{ -\dfrac{y_k^T d_k}{\|g_k\| \, \|d_k\|^2}, \; 0 \right\}$.
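A direct transcription of this update in Python reads as follows (a sketch only; the scaling inside $t_k$ follows the reconstruction above and should be checked against [24]):

import numpy as np

def mbfgs_update(B, x, x_new, g, g_new):
    # MBFGS update of the Hessian approximation, Eq. (54).
    d = x_new - x                       # d_k = x_{k+1} - x_k
    y = g_new - g                       # y_k = g_{k+1} - g_k
    if y @ d <= 0:                      # skip the update when the curvature condition fails
        return B
    t = 1.0 + max(-(y @ d) / (np.linalg.norm(g) * (d @ d)), 0.0)
    z = y + t * np.linalg.norm(g) * d   # z_k = y_k + t_k * ||g_k|| * d_k
    Bd = B @ d
    return B + np.outer(z, z) / (z @ d) - np.outer(Bd, Bd) / (d @ Bd)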
To facilitate the comparison, we use the following abbreviations for the algorithms:
ANTRFS: the nonmonotone trust region method with fixed step length of [1];
FSNATR: the fixed step length nonmonotone adaptive trust region method of [4].
Table 1 reports results for variable-dimension problems, with dimensions selected in the range [4, 1000]. The results show that the new algorithm is generally better than ANTRFS and FSNATR in terms of the total number of gradient evaluations and function evaluations, and it solves all of the test functions in Table 1. The performance profiles of Dolan and Moré [25] are used to compare the efficiency of the three algorithms. Figure 1, Figure 2, and Figure 3 give the performance profiles for running time, the number of gradient evaluations, and the number of function evaluations, respectively. The figures show that Algorithm 1 performs well compared with the other algorithms, at least on the test problems considered, which are mostly of small dimension. It can also be observed that the profile of Algorithm 1 rises faster than those of the other algorithms, especially in contrast to ANTRFS. Therefore, we conclude that the new algorithm is more efficient and robust than the other trust region algorithms considered for solving small- and medium-scale unconstrained optimization problems.
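For reference, a performance profile in the sense of Dolan and Moré [25] can be computed as in the following sketch (illustrative Python code, not the script used to produce the figures); data[s, p] holds the cost (e.g., CPU time or $n_f$) of solver s on problem p, with infinity marking failures:

import numpy as np

def performance_profile(data, taus):
    # Fraction of problems on which each solver is within a factor tau of the best solver.
    data = np.asarray(data, dtype=float)
    best = data.min(axis=0)                 # best cost per problem
    ratios = data / best                    # performance ratios r_{s,p}
    return np.array([[np.mean(ratios[s] <= tau) for tau in taus]
                     for s in range(data.shape[0])])

# Example: three solvers on four problems (CPU seconds; np.inf = failure)
costs = [[2.7, 0.01, 0.09, np.inf],
         [0.3, 0.07, 0.06, 1.2],
         [0.1, 0.03, 0.04, 0.9]]
profile = performance_profile(costs, taus=np.linspace(1, 5, 9))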

6. Conclusions

In this paper, we proposed a new filter nonmonotone adaptive trust region method with the following innovations:
(1) a new adaptive radius strategy that reduces the computational cost of each iteration;
(2) a modified nonmonotone trust region ratio, combined with a multidimensional filter technique, to solve unconstrained optimization problems effectively. Theorems 1 and 2 show that the proposed algorithm preserves global convergence and superlinear convergence, respectively. The preliminary numerical experiments indicate that the new algorithm is effective for unconstrained optimization and that the nonmonotone technique is helpful for many optimization problems. In future work, we plan to combine a modified conjugate gradient method with a modified trust region method, and to extend the new algorithm to constrained optimization problems.

Author Contributions

Conceptualization, X.W. and Q.Q.; methodology, X.W.; software, X.W.; validation, X.W., Q.Q. and X.D.; formal analysis, X.D.; investigation, Q.Q.; resources, X.D.; data curation, Q.Q.; writing—original draft preparation, X.W.; writing—review and editing, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

At the point of finishing this paper, I’d like to express my sincere thanks to all those who have lent me a hand over the course of my writing this paper. First of all, I would like to take this opportunity to show my sincere gratitude to my supervisor, Xianfeng Ding, who has given me so much useful advice on my writing and has tried his best to improve my paper. Secondly, I would like to express my gratitude to my classmates, who offered me references and information on time. Without their help, it would have been much harder for me to finish this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Mo, J.T.; Zhang, K.C.; Wei, Z.X. A nonmonotone trust region method for unconstrained optimization. Appl. Math. Comput. 2005, 171, 371–384. [Google Scholar] [CrossRef]
  2. Ou, Y.G.; Zhou, Q.; Lin, H.C. An ODE-based trust region method for unconstrained optimization problems. J. Comput. Appl. Math. 2009, 232, 318–326. [Google Scholar] [CrossRef] [Green Version]
  3. Wang, X.Y.; Tong, J. A Nonmonotone Adaptive Trust Region Algorithm with Fixed Stepsize for Unconstrained Optimization Problems. Math. Appl. 2009, 3, 496–500. [Google Scholar]
  4. Hang, D.; Liu, M. On a Fixed Stepsize Nonmonotonic Self-Adaptive Trust Region Algorithm. J. Southwest China Norm. Univ. 2013, 38. [Google Scholar] [CrossRef]
  5. Sartenaer, A. Automatic determination of an initial trust region in nonlinear programming. SIAM J. Sci. Comput. 1997, 18, 1788–1803. [Google Scholar] [CrossRef] [Green Version]
  6. Zhang, X.S.; Zhang, J.L.; Liao, L.Z. An adaptive trust region method and its convergence. Sci. China 2002, 45, 620–631. [Google Scholar] [CrossRef] [Green Version]
  7. Shi, Z.J.; Guo, J.H. A new trust region methods for unconstrained optimization. Comput. Appl. Math. 2008, 213, 509–520. [Google Scholar] [CrossRef] [Green Version]
  8. Kimiaei, M. A new class of nonmonotone adaptive trust-region methods for nonlinear equations with box constraints. Calcolo 2017, 54, 769–812. [Google Scholar] [CrossRef]
  9. Amini, K.; Shiker Mushtak, A.K.; Kimiaei, M. A line search trust-region algorithm with nonmonotone adaptive radius for a system of nonlinear equations. Q. J. Oper. Res. 2016, 4, 132–152. [Google Scholar] [CrossRef]
  10. Peyghami, M.R.; Tarzanagh, D.A. A relaxed nonmonotone adaptive trust region method for solving unconstrained optimization problems. Comput. Optim. Appl. 2015, 61, 321–341. [Google Scholar] [CrossRef]
  11. Deng, N.Y.; Xiao, Y.; Zhou, F.J. Nonmonotone Trust Region Algorithm. J. Optim. Theory Appl. 1993, 76, 259–285. [Google Scholar] [CrossRef]
  12. Ahookhoosh, M.; Amini, K.; Peyghami, M. A nonmonotone trust region line search method for large scale unconstrained optimization. Appl. Math. Model. 2012, 36, 478–487. [Google Scholar] [CrossRef]
  13. Zhang, H.C.; Hager, W.W. A nonmonotone line search technique and its application to unconstrained optimization. SIAM J. Optim. 2004, 14, 1043–1056. [Google Scholar] [CrossRef] [Green Version]
  14. Gu, N.Z.; Mo, J.T. Incorporating nonmonotone strategies into the trust region for unconstrained optimization. Comput. Math. Appl. 2008, 55, 2158–2172. [Google Scholar] [CrossRef] [Green Version]
  15. Fletcher, R.; Leyffer, S. Nonlinear programming without a penalty function. Math. Program. 2002, 91, 239–269. [Google Scholar] [CrossRef]
  16. Gould, N.I.; Sainvitu, C.; Toint, P.L. A filter-trust-region method for unconstrained optimization. SIAM J. Optim. 2005, 16, 341–357. [Google Scholar] [CrossRef] [Green Version]
  17. Gould, N.I.; Leyffer, S.; Toint, P.L. A multidimensional filter algorithm for nonlinear equations and nonlinear least-squares. SIAM J. Optim. 2004, 15, 17–38. [Google Scholar] [CrossRef] [Green Version]
  18. Wächter, A.; Biegler, L.T. Line search filter methods for nonlinear programming and global convergence. SIAM J. Optim. 2005, 16, 1–31. [Google Scholar] [CrossRef]
  19. Miao, W.H.; Sun, W. A filter trust-region method for unconstrained optimization. Numer. Math. J. Chin. Univ. 2007, 19, 88–96. [Google Scholar]
  20. Zhang, Y.; Sun, W.; Qi, L. A nonmonotone filter Barzilai-Borwein method for optimization. Asia Pac. J. Oper. Res. 2010, 27, 55–69. [Google Scholar] [CrossRef]
  21. Fatemi, M.; Mahdavi-Amiri, N. A filter trust-region algorithm for unconstrained optimization with strong global convergence properties. Comput. Optim. Appl. 2012, 52, 239–266. [Google Scholar] [CrossRef]
  22. Nocedal, J.; Wright, S.J. Numerical Optimization; Springer: New York, NY, USA, 2006. [Google Scholar]
  23. Andrei, N. An unconstrained optimization test functions collection. Environ. Sci. Technol. 2008, 10, 6552–6558. [Google Scholar]
  24. Pang, S.; Chen, L. A new family of nonmonotone trust region algorithm. Math. Pract. Theory. 2011, 10, 211–218. [Google Scholar]
  25. Dolan, E.D.; Moré, J.J. Benchmarking optimization software with performance profiles. Math. Program. 2002, 91, 201–213. [Google Scholar] [CrossRef]
Figure 1. CPU performance profile for the three algorithms.
Figure 2. Performance profile for the number of gradient evaluations ($n_i$).
Figure 3. Performance profile for the number of function evaluations ($n_f$).
Table 1. Numerical comparisons on a subset of test problems. Each row lists the problem name and its dimension n, followed by $n_f$/$n_i$ and the CPU time (in seconds) for ANTRFS [1], FSNATR [4], and Algorithm 1, in that order.
Ext.Rose4755/3822.755795168/880.32203687/580.104316
Ext. Beale425/130.00865141/210.06918518/160.028946
Penalty i218/100.08753218/100.06702017/140.032533
Pert.Quad628/250.05892125/130.05870018/170.035631
Raydan 1818/100.01510938/200.10592839/200.070292
Raydan 2421/110.01535613/80.01272911/60.017449
Diagonal 11013/80.00949335/180.07019927/260.064282
Diagonal 21056/290.01784158/300.11938557/290.083905
Diagonal 350200/1011.926143182/921.232287127/1261.849887
Hager1027/140.04903727/140.04824733/170.071906
Gen. Trid 120967/4843.53605550/260.43257747/240.217367
Ext.Trid 11027/140.01389029/150.12869618/120.071580
Ext. TET5013/70.20309316/90.03141617/90.119907
Diadonal 41007/40.0359339/50.3438495/40.146901
Ext.Him10029/150.14710225/130.20897629/280.409463
Gen. White50785/57610.47342771/4299.940880443/2285.741535
Ext. Powell161567/7877.266044794/4042.148929496/3371.208253
Full Hessian FH310011/60.05359811/60.0847268/70.088831
Ext.BD110051/270.21079050/280.73962121/150.261978
Pert. Quad20091/662.54768987/442.42159657/562.405979
Extended Hiebert161821/10009.819290175/1432.456780135/680.527388
Quadratic QF1415/80.00790317/90.01702511/100.010983
FLETCHCR3436210/1231.847519150/910.950314165/831.786160
ARWHEAD200297/15037.92805029/150.31797615/120.317976
NONDIA5075/390.36828092/470.54407951/350.307129
DQDRTIC5067/380.5124353/280.34143532/300.318596
EG220032/170.31995428/160.37376449/352.633184
Bro.Tridiagonal2002797/1504441.453385744/398119.57083869/351.539657
A.Per.Quad1673/470.14489063/320.13264445/260.128349
Pert.Trid.Quad100330/16610.985321325/1639.663929289/1568.521700
Ext.DENSCH10037/190.19054943/220.398777128/825.638770
SINCOS1004303/2152198.7175441303/952142.54318565/361.122092
BIGGSB1101949/10428.466655329/1950.676394275/1850.376394
ENGVAL1200788/487139.949938643/40699.088596474/47288.401960
EDENSCH100474/23825.63966445/260.40757437/230.930150
CUBE100430/22021.53234357/19820.93564280/14713.946540
BDEXP100476/36934.54797452/35624.56919622/210.550708
GENHUMPS100532/3213.27453412/2130.4754531014/5371.235720
QUARTC10057/321.03573443/220.44332518/170.326680
Gen. PSC1500198/21210.45762451/549.56235451/548.539801
Ext. PSC150015/151.25432715/151.098345213/131.562763
Variably dim.50041/272.957824321/161.453698217/151.093456
DIXMAANA100021/211.45789321/211.23764220/201.025372
SINQUAD10001582/1063187.5637231995/1215135.872354912/579100.458723
DIXMAANJ10002415/2398431.2534852320/2311410.2534852246/2132397.256732
