Abstract
The accelerated prox-level (APL) and uniform smoothing level (USL) methods recently proposed by Lan (Math Program, 149: 1–45, 2015) achieve uniformly optimal complexity when solving black-box convex programming (CP) and structured non-smooth CP problems. In this paper, we propose two modified accelerated bundle-level type methods, namely, the modified APL (MAPL) and modified USL (MUSL) methods. Compared with the original APL and USL methods, the MAPL and MUSL methods reduce the number of subproblems by one in each iteration, thereby improving the efficiency of the algorithms. Optimal iteration complexity results for the proposed algorithms are established. Furthermore, the modified methods are applied to two-stage stochastic programming, and numerical experiments are presented to illustrate the advantages of our methods in terms of efficiency and accuracy.
1. Introduction
In the fields of production planning, financial risk, telecommunications, and electricity, decision makers need to take into account uncertainty in both the available information and the model itself. Uncertainty in information arises from lack of data, measurement errors, and unpredictability, among other causes. Model uncertainty derives from the structure of the problem, the nature of the constraints, and the risks and profiles of decisions. Stochastic programming (SP) is an effective tool for dealing with optimization problems under uncertainty. The expectation model in stochastic programming, which maximizes the expected benefit or minimizes the expected loss under expectation constraints, is widely used. In this paper, we are concerned with two-stage stochastic programming with recourse. Next, through a practical problem, network planning with random demand, we introduce the mathematical model of two-stage stochastic programming.
Due to demand for higher bandwidth and dedicated lines, network capacity is becoming a scarce resource. Consider a situation where a network provider plans bandwidth allocation between network links, under a total network capacity b that is available for allocation. In addition, there are n different links whose capacity needs to be expanded. The extra capacity allocated to link j is $x_j$, where $x_j \ge 0$ and the vector $x$ consists of the elements $x_1, \dots, x_n$. In network planning, demands refer to the number of connections requested between point-to-point pairs provided by the network at a certain time. Here, the demand related to the m point-to-point pairs is modeled as a random variable.
Suppose that the extra capacities $x_j$, $j = 1, \dots, n$, are given and the demand D is observed. Then the capacity planning model introduced in [1] is to minimize the total number of unoffered requests. Let $i = 1, \dots, m$ index the point-to-point pairs, and let the paths that can be offered to connections related to point-to-point pair i be collected in a given path set. The current capacity of each link j is given, and the numbers of connections and of unoffered requests related to pair i are the second-stage decision variables. For an observation of the random variable D, one can obtain the optimal decision by solving the linear programming problem as follows
Here, a 0-1 incidence vector is introduced whose j-th component is 1 if link j lies in path p and 0 otherwise. Let $Q(x, d)$ denote the optimal value of the above linear program (1), which depends on the observation d of D and the capacity x. It is obvious that $Q(x, D)$ is also a random variable due to the randomness of D. Thus, the capacities $x_j$, $j = 1, \dots, n$, can be obtained from the following programming problem
Here the expectation is taken with respect to the probability distribution of the random variable D. In a word, the purpose of programming problem (2) is to minimize the expected number of unoffered requests, subject to the constraint that the allocation stays within the total network capacity b. The above optimization model (1) and (2) is an instance of two-stage stochastic programming with recourse, in which (1) is the second-stage problem and (2) is the first-stage problem.
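To make the two-stage structure concrete, the following sketch solves a tiny instance of a second-stage problem in the spirit of (1) as a linear program. All data (two links, two point-to-point pairs, the path incidence, capacities and demands) are hypothetical illustration values, and the variable layout [served flows, unserved requests] is our own choice, not the exact formulation used in [1].

```python
# Toy second-stage problem in the spirit of (1): given extra capacities x and
# an observed demand d, minimize the total number of unserved requests.
# Variables are ordered [f_1..f_m, u_1..u_m]: f_i = flow served on pair i's
# (single) path, u_i = unserved requests of pair i. All data are hypothetical.
import numpy as np
from scipy.optimize import linprog

def second_stage(x, d, cap, paths):
    """Return Q(x, d): optimal number of unserved requests.

    x: extra capacity per link; d: demand per pair; cap: current link capacity;
    paths: 0-1 incidence, paths[i][j] = 1 if pair i's path uses link j."""
    m, n = len(d), len(cap)
    c = np.concatenate([np.zeros(m), np.ones(m)])      # minimize sum_i u_i
    A_eq = np.hstack([np.eye(m), np.eye(m)])           # f_i + u_i = d_i
    # link capacity: sum of f_i over pairs whose path uses link j <= cap_j + x_j
    A_ub = np.hstack([np.asarray(paths, dtype=float).T, np.zeros((n, m))])
    res = linprog(c, A_ub=A_ub, b_ub=cap + x, A_eq=A_eq, b_eq=d,
                  bounds=[(0, None)] * (2 * m))
    return res.fun

paths = [[1, 0], [1, 1]]        # pair 0 uses link 0; pair 1 uses links 0 and 1
cap = np.array([3.0, 2.0])      # current link capacities
d = np.array([4.0, 3.0])        # observed demands
print(second_stage(np.array([0.0, 0.0]), d, cap, paths))  # 4.0 unserved
print(second_stage(np.array([4.0, 1.0]), d, cap, paths))  # 0.0 unserved
```

With no extra capacity, the shared link 0 can carry only 3 of the 7 requested units, so 4 requests go unserved; after expansion, all demand is met.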
From the above practical instance we can see that stochastic programming plays a key role in network planning with random demand. Beyond stochastic programming, the unit commitment problems in the field of energy systems usually involve mixed-integer quadratic programming, and there have been several recent studies on solving such problems [2,3,4]. For solving the programming problems (1) and (2), Sen et al. [1] applied the stochastic decomposition method, which is one of the most efficient methods for stochastic programming. There are many further applications of stochastic programming in other fields, such as insurance and finance, valuation of electricity, telecommunications, hydrothermal power production planning, and pollution control. Stochastic models of these application problems can be found in [5]. In what follows, as an extension of the network planning problem, the standard mathematical model of two-stage stochastic programming with recourse is given, which is more convenient for generalization and theoretical analysis.
The two-stage stochastic programming with fixed recourse is in the following form
where x is the decision variable, c is the cost of production and $Q(x, D)$ denotes the optimal objective value of the second-stage problem
Here, D is the demand vector and T denotes the technology matrix. Let W be the fixed recourse matrix and let D be a random variable with given support of its probability distribution. The expectation in the first-stage problem (3) is taken with respect to D.
The mathematical model of the two-stage stochastic programming has been given, and next we summarize the methods mainly used to solve it. Since two-stage and multi-stage stochastic programming problems have very large dimension and special structure, they can be solved by means of decomposition. In the two-stage model (3) and (4), suppose that the probability space is finite, with the elementary events and their probabilities given. Then the first-stage problem can be rewritten in the form
Here, is the optimal value of the second-stage problem
where the problem data are given by the s-th realization of the random quantities.
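Under the finite-scenario assumption, the expectation in (5) is simply a probability-weighted sum of second-stage optimal values. The sketch below illustrates this with a hypothetical closed-form recourse (a per-unit shortage penalty) standing in for the LP value of (6); all names and data are invented for illustration.

```python
# Finite-scenario version of the first-stage objective (5): the expectation
# reduces to a probability-weighted sum of second-stage optimal values.
# Here Q is a hypothetical closed-form recourse (unit shortage penalty q),
# standing in for the LP value of the subproblem (6).
def Q(x, d, q=2.0):
    """Toy second-stage value: penalty q per unit of unmet demand."""
    return q * max(d - x, 0.0)

def first_stage_objective(x, c, scenarios):
    """f(x) = c*x + sum_s p_s * Q(x, d_s) over a finite scenario set."""
    return c * x + sum(p * Q(x, d) for d, p in scenarios)

scenarios = [(1.0, 0.3), (2.0, 0.5), (4.0, 0.2)]   # (demand d_s, probability p_s)
print(first_stage_objective(1.5, 1.0, scenarios))  # 1.5 + 0.5*1.0 + 0.2*5.0 = 3.0
```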
The main idea of the basic dual decomposition methods is to establish certain approximations to the first-stage problem by means of solving subproblems of the structure (6). As an original decomposition-type method, the cutting-plane method builds a linear model of the objective; its main scheme can be found in [6,7]. Simplicity and low computational cost are advantages of the cutting-plane model. As the number of cuts increases, however, the storage demand grows, which is a typical difficulty of such methods. On the other hand, even starting from a good initial iterate, such methods may take overly long steps. In order to tackle these difficulties, Ruszczynski [8] proposed the regularized decomposition method, which can be considered an improvement of the basic one. Another way to avoid overly long steps is the trust region method, which was extended to two-stage stochastic programming by Linderoth and Wright [9]. In addition, bundle methods, which can be viewed as stabilized variants of the original cutting-plane methods, were developed in [10,11,12]. Algorithmic modifications based on bundle methods were introduced in [13,14]. The bundle level (BL) method was first proposed by Lemaréchal et al. [15] as another kind of bundle method. On this basis, the “restricted-memory” version of the BL method was developed in [16,17,18], which performs well in numerical experiments. In recent years, there has been substantial development of “asynchronous” and “partial” versions of BL methods; see references [19,20,21,22,23]. Considering that research on BL methods had focused only on general non-smooth convex programming (CP) problems, Lan [24] proposed an accelerated BL-type method, namely the accelerated bundle level (ABL) method, and its restricted-memory version, the accelerated prox-level (APL) method.
Benefiting from the multi-step strategy introduced by Nesterov [25] and later applied in [26,27,28,29,30], both the ABL and APL methods are uniformly optimal for solving non-smooth, weakly smooth and smooth CP problems. In addition, by incorporating Nesterov’s smoothing technique [16,17] into the APL method, Lan [24] presented the uniform smoothing level (USL) method for solving structured non-smooth CP problems with optimal iteration complexity. In particular, the USL method does not require any input of problem parameters. Moreover, Lan [24] illustrated that the APL and USL methods, when applied to semidefinite programming and two-stage stochastic programming, have obvious advantages in computation time and accuracy over related gradient-type algorithms and some existing methods.
Our main work in this paper includes several aspects. First of all, on the basis of Lan’s work, we make further improvements and present the modified accelerated prox-level (MAPL) method. By selecting a proper proximal function, the MAPL method only needs to solve one subproblem to update the prox-center and the lower bound simultaneously, which improves the computational efficiency. In addition, the MAPL method achieves uniformly optimal complexity for solving smooth, weakly smooth, and non-smooth CP problems. Furthermore, we extend the MAPL method to structured non-smooth CP problems and present the modified uniform smoothing level (MUSL) method. Finally, we apply the proposed methods to two-stage stochastic programming problems with recourse. The numerical results show that the MAPL and MUSL methods have certain advantages in the number of iterations and computation time.
The present paper is organized as follows. In Section 2, some related work on BL-type methods is reviewed. The MAPL method and its complexity analysis are presented in Section 3. The MUSL method and its complexity analysis are presented in Section 4. The application of the MAPL and MUSL methods to two-stage stochastic programming, together with numerical experiments, is shown in Section 5. Conclusions are presented in the final section.
2. Related Work
This section reviews some related work on BL methods. Specifically, the main ideas of BL methods and related notation are reviewed in Section 2.1. The APL method and its gap reduction procedure introduced in Section 2.2 are the basis of the main work in this paper.
2.1. The Bundle Level (BL) Method
Consider the general CP problem
where the constraint set $X$ is convex and compact, and the objective function $f$ is closed and convex over X. The function f is known only through a first-order oracle which, for a given point $x \in X$, returns the function value $f(x)$ and a subgradient $f'(x) \in \partial f(x)$, where $\partial f(x)$ is the subdifferential of f at x.
Given a sequence of iteration points $x_1, \dots, x_k$, the first-order information $f(x_i)$ and $f'(x_i)$ is provided by the oracle. The cutting-plane approximation of f is generated by
where $h(z, x) = f(z) + \langle f'(z), x - z \rangle$. In addition, $\langle \cdot, \cdot \rangle$ refers to the inner product. The level set of the model $m_k$ with level parameter $\ell_k$ is defined by $X_k = \{ x \in X : m_k(x) \le \ell_k \}$. The BL method [15] generates the next iteration point by
The main steps of the BL method are listed below.
Step 1. Set $\bar f_k = \min_{0 \le i \le k} f(x_i)$ and compute a lower bound $\underline f_k = \min_{x \in X} m_k(x)$;
Step 2. Set the level parameter $\ell_k = \lambda \underline f_k + (1 - \lambda) \bar f_k$ for some $\lambda \in (0, 1)$;
Step 3. Set $X_k = \{ x \in X : m_k(x) \le \ell_k \}$ and generate the new iterate by solving (8).
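The three steps above can be sketched in code. The following is a minimal one-dimensional illustration on the toy objective f(x) = |x − 1| over X = [−3, 3]; in one dimension the level set is an interval, so the projection subproblem (8) reduces to clipping, and the model minimization is approximated on a grid. This is a didactic sketch under invented data, not the implementation studied in the paper.

```python
# Minimal 1-D sketch of the BL steps on f(x) = |x - 1| over X = [-3, 3].
# Each cut f(x_i) + g_i*(x - x_i) <= level is a half-line in 1-D, so the
# level set X_k is an interval and projection (8) reduces to clipping; the
# model minimum (lower bound) is approximated on a fine grid.
def f(x): return abs(x - 1.0)
def g(x): return 1.0 if x >= 1.0 else -1.0          # a subgradient of f

def bundle_level(x0, lam=0.5, tol=1e-6, max_iter=200, X=(-3.0, 3.0)):
    cuts, x, ub = [], x0, f(x0)
    for _ in range(max_iter):
        cuts.append((f(x), g(x), x))                # add the new cut at x
        ub = min(ub, f(x))                          # Step 1: upper bound
        model = lambda y: max(fv + gv * (y - xv) for fv, gv, xv in cuts)
        grid = [X[0] + i * (X[1] - X[0]) / 2000 for i in range(2001)]
        lb = min(model(y) for y in grid)            # Step 1: lower bound
        if ub - lb <= tol:
            break
        level = lam * lb + (1 - lam) * ub           # Step 2: level parameter
        lo, hi = X                                  # Step 3: level set interval
        for fv, gv, xv in cuts:
            if gv > 0:   hi = min(hi, xv + (level - fv) / gv)
            elif gv < 0: lo = max(lo, xv + (level - fv) / gv)
        x = min(max(x, lo), hi)                     # project x onto the level set
    return x, ub

x_star, val = bundle_level(x0=-2.0)
print(x_star, val)   # both close to the minimizer 1 and minimum value 0
```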
2.2. The Accelerated Prox-Level (APL) Method and Its Gap Reduction Procedure
In this subsection we consider the CP problem (7), where f satisfies the following inequality
for some $M > 0$ and $\rho \in [0, 1]$. It is easy to show that non-smooth ($\rho = 0$), smooth ($\rho = 1$) and weakly smooth ($\rho \in (0, 1)$) CP problems are contained in this family of problems. Lan [24] generalized the BL method to the accelerated bundle level (ABL) and accelerated prox-level (APL) methods, such that they achieve uniformly optimal complexity bounds for functions satisfying (9). Here, we mainly introduce the APL method and its principle.
Firstly, Lan [24] introduced three related iteration sequences, used to establish the cutting-plane approximation, to generate the upper bounds, and to update the prox-center, respectively; the search points are updated as convex combinations of previous iterates with stepsizes $\alpha_k$. Secondly, Lan introduced an internal procedure to reduce the gap between the upper and lower bounds on the optimal value. The algorithmic framework of the APL method is as follows (Algorithm 1).
| Algorithm 1: The APL method. |
Input: Choose the initial point, stopping tolerance and parameters.
Step 0: (Initialization) Set the initial lower bound, upper bound and gap. Let k = 1.
Step 1: (Stopping test) If the current gap does not exceed the tolerance, terminate.
Step 2: (Call procedure) Call the gap reduction procedure to obtain a new search point and new bounds.
Step 3: (Loop) Set k = k + 1, and return to Step 1.
Next, we focus on describing the gap reduction procedure . Denote the level set . For a given iterate z, denote
Then, it can be verified that
This implies that a lower bound of . However, the problem (10) is difficult to solve in general. To overcome the difficulty and obtain the lower bound in a convenient way, Lan used a compact and convex set to replace in problem (10). Thus, one can solve the relaxation of (10)
Here, the set is called the localizer of the level set satisfying . Then we obtain a lower bound on as follow
Indeed, as shown in [24], if , then for all . If , we have that and , and therefore for all . Thus, (11) holds.
Moreover, in order to make better use of the structure of the feasible set X, similar to the NERML algorithm in [16,17], Lan introduced a prox-function to replace the Euclidean distance function. Here, a function serves as a prox-function of a convex compact set with coefficient $\sigma > 0$ if it is differentiable and strongly convex with coefficient $\sigma$, i.e.
Furthermore, one can redefine the diameter of X with respect to the prox-function by
This leads to the following relation
Furthermore, let
be the prox-function on X and let its minimizer over X be the prox-center. It follows that
and the prox-function is strongly convex with coefficient $\sigma$.
The internal gap reduction procedure of APL method is as follows.
The APL gap reduction procedure:
- Step 0: (Initialization) Set and . In addition, choose and the initial localizer . The prox-function is defined in (15). Also let .
- Step 1: (Update lower bound) Let , and . If , then stop the procedure and output and .
- Step 2: (Update prox-center) Set
- Step 3: (Update upper bound) Set and such that . If , then stop the procedure and output and .
- Step 4: (Update localizer) Choose such that , where
- Step 5: (Loop) Set and return to Step 1.
3. The Modified Accelerated Prox-Level (MAPL) Method
In this section, we propose the modified accelerated prox-level (MAPL) method, which requires only one subproblem to be solved per iteration while achieving the uniformly optimal iteration complexity for solving black-box CP problems. We first present the modified gap reduction procedure, which, from the given input p and lb, generates a new search point and a new lower bound such that the gap between the upper and lower bounds is reduced by a factor $q \in (0, 1)$. Here, the value of q depends on the parameters of the procedure.
The MAPL gap reduction procedure:
- Step 0: (Initialization) Set and . In addition, choose and the initial localizer . The prox-function is defined in (15). Let .
- Step 1: (Update level set) Set
- Step 2: (Update prox-center and lower bound) Set
If , then stop the procedure and output .
- Step 3: (Update upper bound) Set
and . If , then stop the procedure (there is a significant improvement on the upper bound) and output .
- Step 4: (Update localizer) Choose an arbitrary such that , where
- Step 5: (Loop) Set and return to Step 1.
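The key point of Step 2 — a single subproblem that either returns the new prox-center or certifies that ℓ is a valid lower bound — can be illustrated in one dimension, where the localizer cuts define an interval and projection with the Euclidean prox-function reduces to clipping. The function below and its data are hypothetical sketches of this idea, not the paper's subproblem (20).

```python
# 1-D sketch of the merged Step 2: one projection subproblem either returns
# the new prox-center, or detects that the level set is empty, in which case
# the level value itself is a valid lower bound -- no separate LP is solved.
# Cuts, points and levels below are invented illustration data.
def prox_step(prox_center, cuts, level, X):
    """cuts: list of (f_val, subgrad, point). Returns ('center', x_new) if the
    level set is nonempty, else ('lb', level)."""
    lo, hi = X
    for fv, gv, xv in cuts:                    # intersect the cut half-lines
        if gv > 0:
            hi = min(hi, xv + (level - fv) / gv)
        elif gv < 0:
            lo = max(lo, xv + (level - fv) / gv)
        elif fv > level:                       # flat cut lying above the level
            return ('lb', level)
    if lo > hi:                                # empty level set:
        return ('lb', level)                   # level is a new lower bound
    # Euclidean prox-function: projection onto an interval is clipping
    return ('center', min(max(prox_center, lo), hi))

cuts = [(3.0, -1.0, -2.0), (1.0, 1.0, 2.0)]    # two cuts of f(x) = |x - 1|
print(prox_step(0.0, cuts, 0.2, (-3.0, 3.0)))  # nonempty: new prox-center
print(prox_step(0.0, cuts, -0.5, (-3.0, 3.0))) # empty: lb updated to the level
```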
The following are a few remarks on the MAPL gap reduction procedure. Firstly, the upper and lower bounds in Step 0 are obtained from the outer iteration of the MAPL method (described below), and they are fixed throughout the entire progress of the procedure. Furthermore, the level parameter ℓ is also fixed, as a convex combination of the lower and upper bounds; in the original BL methods, by contrast, the level parameter changes at each iteration. Secondly, Step 2 and Step 3 provide two exits. When the procedure stops at Step 2, ℓ is a lower bound on the optimal value; when the procedure stops at Step 3, significant progress has been made on the upper bound, the amount of which depends on the corresponding parameter. Compared with the gap reduction procedure of the APL method, the modified procedure updates the lower bound in an easier way. Indeed, in Step 1 of the APL procedure (see [24]), a linear programming subproblem is first solved in order to determine whether the lower bound lb needs to be updated. In the modified procedure, however, the prox-center and the lower bound lb are merged into one subproblem by selecting an appropriate prox-function. While solving the subproblem (20), it can be automatically checked whether the level set is empty. If it is, the lower bound lb is directly updated to ℓ, which avoids solving the linear programming subproblem and reduces the amount of computation. Thirdly, in Step 4, the localizer can be selected arbitrarily subject to the stated inclusion. For convenience, one can directly choose either of the two extreme localizers in (19). However, the number of constraints in the larger localizer increases with k, while the smaller one has only one more constraint than the feasible set X. In practice, we can choose between these two sets, which controls the number of constraints in (19) and reduces the computational cost. Moreover, a proper selection of the stepsize sequence is critical for the procedure to terminate after finitely many iterations and achieve the optimal iteration complexity. Lan proposed a general selection rule for the sequence in [24]. Chen et al. [31] proposed a more concise selection rule, i.e., the sequence satisfies the following conditions:
for any and some . Two examples for are as follows [31]:
(1) with ;
(2) , with .
The following lemma shows important properties of the procedure. These properties are similar to those in [24,31], and their proofs can be found in Appendix A.
Lemma 1.
The following properties hold for the procedure .
a. is a collection of localizers for the level set ;
b. holds for any ;
c. , and hence for any ;
d. When holds, the problem (20) has a unique solution. In addition, if the procedure stops at Step 2, we have .
e. When the procedure stops, the relation holds, where
Following the theoretical analysis of [24], the following proposition shows that the gap between the upper bound and the level parameter ℓ decreases with k, and proves that, when the procedure stops, the total number of internal iterations does not exceed a certain bound.
Proposition 1.
Proof.
By the definition of in (23) we have for any . This together with shows that
It follows from (19) and (20) that , for all . Moreover, due to the strong convexity of and the optimality condition of subproblem (20), we have
It turns out that
Summing up the above inequalities over k, we obtain that
It then follows that
Subtracting ℓ from both sides of the above inequality and dividing both sides by , from (24), we obtain that
As a result,
Summing up the above inequality over m and from the fact that we have
Applying the Hölder inequality to the above inequality and using (27), we have
From the fact that does not stop at Step 3, we have
This together with (29) shows that
We finally conclude that
that is
□
After giving the relevant properties and complexity analysis of the gap reduction procedure, we are now in a position to introduce the modified accelerated prox-level (MAPL) method, which repeatedly calls the procedure until it finds an approximate solution of the given accuracy. The algorithmic framework of the MAPL method is as follows (Algorithm 2).
| Algorithm 2: The MAPL method. |
Parameters: Choose the stopping tolerance and parameters.
Step 0: (Initialization) Choose an initial point; set the initial lower bound, upper bound and gap. Let s = 1;
Step 1: (Stopping test) If the current gap does not exceed the tolerance, terminate;
Step 2: (Call procedure) Call the gap reduction procedure to obtain a new search point and new bounds;
Step 3: (Loop) Set s = s + 1, and return to Step 1.
Since the gap reduction procedure is called during the progress of the MAPL method, we regard an iteration of the procedure as an iteration of the MAPL method as well. Taking this into consideration, the following theorem establishes the convergence and iteration complexity of the MAPL method. The principle of the proof comes from reference [24].
Theorem 1.
For given , if , in are chosen to satisfy condition (24) with some , then
(1) The number of times the gap reduction procedure is called by the MAPL method can be bounded by
(2) The total number of iterations performed by the MAPL method does not exceed
Proof.
(1) Without loss of generality, we suppose that . According to Step 1 of the MAPL method, (9) and (14), we have
From Lemma 1 and the fact that , we have
Furthermore, suppose that the MAPL method finds an -solution after calling the procedure a certain number of times, i.e., the MAPL method stops at that stage. Then we have
and
It is easy to obtain
that is
(2) Suppose that the procedure has been called times in the MAPL method. Then by (30) and (31) we have
Due to , we know that
This together with Lemma 1 and Proposition 1 shows that the total number of iterations performed by the MAPL method does not exceed
Here, we denote by the number of internal iterations performed by the s-th call of the procedure. □
We present a few remarks on the iteration complexity of the MAPL method.
Remark 1.
According to the classic complexity theory [32] for CP problem (7), the number of iterations to find an ε-solution, i.e., an approximate solution $\bar x \in X$ such that $f(\bar x) - f^* \le \varepsilon$, does not exceed $\mathcal{O}(1/\varepsilon^{2})$ if f is a general non-smooth Lipschitz continuous convex function. For smooth convex optimization, the optimal iteration complexity bound is $\mathcal{O}(1/\sqrt{\varepsilon})$. Furthermore, in case f is weakly smooth and its gradient is Hölder continuous, the optimal iteration complexity is bounded by $\mathcal{O}\big((1/\varepsilon)^{2/(1+3\rho)}\big)$ for some $\rho \in (0,1)$. It follows from Theorem 1 that the iteration complexity bound of the MAPL method matches these bounds.
In other words, the MAPL method achieves the uniformly optimal iteration complexity bounds for solving non-smooth ($\rho = 0$), smooth ($\rho = 1$) and weakly smooth ($\rho \in (0,1)$) CP problems.
4. The Modified Uniform Smoothing Level (MUSL) Method
In this section we consider the objective function f in (7) with the form of
where is Lipschitz continuous and simple. In addition, has a special structure that
Here, the compact convex set is nonempty, is a linear operator and is convex and continuous on Y.
Generally, the function F is convex and non-smooth. In this case, F can be approximated by constructing a series of smooth convex functions [16,17]. Let denote the prox-function of compact convex set Y with coefficient . refers to the prox-center of . Let
The functions and are approximated by and (with some smoothing parameter ), respectively, i.e.,
As described in [17], from the first-order optimality conditions, the convexity of the inner function and the strong convexity of the prox-function, we know that the gradient of the smoothed function is Lipschitz continuous with Lipschitz constant
Furthermore, we have
and
for any .
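As a concrete illustration of these smoothing relations, consider the simplest structured non-smooth function |x| = max over |y| ≤ 1 of xy. With the prox-function ω(y) = y²/2 (strong-convexity coefficient 1), the smoothed function has a closed form (the Huber function), its gradient is Lipschitz with constant 1/η, and the uniform approximation error is at most η/2. This is a minimal sketch with all names our own, not notation from [16,17].

```python
# Smoothing the structured non-smooth function |x| = max_{|y|<=1} x*y.
# With prox-function w(y) = y^2 / 2 (coefficient 1), the smoothed function
# f_eta(x) = max_{|y|<=1} (x*y - eta * y^2 / 2) has a closed form (the Huber
# function); its gradient is Lipschitz with constant 1/eta, and
# 0 <= |x| - f_eta(x) <= eta/2 uniformly, mirroring the relations above.
def f_smooth(x, eta):
    if abs(x) <= eta:
        return x * x / (2 * eta)   # interior maximizer y = x / eta
    return abs(x) - eta / 2        # boundary maximizer y = sign(x)

eta = 0.1
for x in (-1.0, 0.05, 2.0):
    gap = abs(x) - f_smooth(x, eta)
    # 0 <= f - f_eta <= eta/2 (up to floating-point rounding)
    assert -1e-12 <= gap <= eta / 2 + 1e-12
print(f_smooth(2.0, eta))          # the Huber value 2 - eta/2
```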
Inspired by the smoothing technique of [16,17], Lan [24] proposed the uniform smoothing level (USL) method for solving the structured non-smooth CP problem (7) with f defined by (32). The advantage of the USL method is that the smoothing parameter and the related problem estimate can be automatically adjusted and obtained during the gap reduction procedure, which makes the USL method free of problem parameters. However, similar to the APL method, each iteration of the USL method also involves two subproblems. Based on the USL method, and combining the analysis of the MAPL method in the previous section, we propose the modified uniform smoothing level (MUSL) method. As with the USL method, the MUSL method achieves the optimal iteration complexity when solving problem (7) with (32), but only one subproblem needs to be solved in each iteration.
We next describe the gap reduction procedure of the MUSL method. As the internal procedure of the MUSL method, it is called to compress the gap between the upper and lower bounds on the optimal value.
The MUSL gap reduction procedure:
- Step 0: (Initialization) Set and . Let
Choose and the initial localizer . The prox-function is defined in (15). Let .
- Step 1: (Update level set) Let
- Step 2: (Update prox-center and lower bound) Let
If , then stop the procedure and output .
- Step 3: (Update upper bound) Let
and . Check the following two possible stopping rules:
- (3a) if , stop the procedure with .
- (3b) Otherwise, if , stop the procedure with
- Step 4: (Update localizer) Choose an arbitrary such that , where
- Step 5: (Loop) Set and return to Step 1.
The following are a few remarks on the MUSL gap reduction procedure, including its differences from the MAPL procedure and some of its properties. Firstly, compared with the MAPL procedure, this procedure needs one more input parameter, which is used to calculate the smoothing parameter so that the smooth approximation of f can be defined. Secondly, the two procedures approximate the objective function in different ways: the MAPL procedure approximates the objective function f by the linearization of f in (18), while the MUSL procedure only linearizes the smooth component and approximates the function f by (40). According to (40), the convexity of the smoothed function, (36) and (32), we have
which means that the function in (40) is a lower estimate of f. Thirdly, the MUSL procedure has one more exit than the MAPL procedure, with three possible exits: Step 2, Step 3a, and Step 3b. If the procedure stops at Step 2, the lower bound lb is updated to ℓ; if it stops at Step 3a, a significant improvement has been made on the upper bound, which is updated accordingly. In addition, if the procedure stops at Step 3b, it is considered that there is no significant improvement on the upper bound, so the smoothing parameter needs to be adjusted and the corresponding estimate is updated.
Referring to the work of [24,31], the following lemma gives some simple observations on the procedure. Its proof can be found in Appendix A.
Lemma 2.
The following results hold for internal procedure :
a. If the procedure terminates at Step 2 or Step 3a, we have , where ;
b. If the procedure terminates at Step 3b, we have and .
Similar to the previous section, we now establish the convergence results for the procedure. Following the theoretical analysis of [24], the detailed derivation of the iteration complexity is given as follows.
Proposition 2.
Proof.
Suppose the gap reduction procedure does not stop at the K-th iteration. From (34) and inequality (45), we have . Because of (34), (40) and the Lipschitz continuity of the gradient of the smoothed function, we have
Hence,
It then follows that
Subtracting ℓ from both sides of the above inequality and then dividing both sides by , we have
From the fact that the procedure does not stop at Step 3b at the K-th iteration, we obtain . Noticing that , we thus have
In conclusion, we obtain
□
Based on the above convergence results for the gap reduction procedure, we next give the algorithm of the MUSL method. Similar to the MAPL method, the MUSL method is also implemented with an outer algorithmic framework and an internal gap reduction procedure. The outer algorithm of the MUSL method mainly determines whether the gap between the upper and lower bounds has reached the given tolerance at the current iteration. If the given tolerance is reached, the algorithm terminates and outputs an approximate optimal solution of f; otherwise, the outer algorithm continues to call the internal procedure to compress the gap between the upper and lower bounds. The algorithmic framework of the MUSL method is as follows (Algorithm 3).
| Algorithm 3: The MUSL method. |
Step 0: (Input) Choose the initial point, stopping tolerance, initial estimate and parameters.
Step 1: (Initialization) Set the initial lower bound, upper bound and gap. Let s = 1;
Step 2: (Stopping test) If the current gap does not exceed the tolerance, terminate;
Step 3: (Call procedure) Call the gap reduction procedure to obtain a new search point, new bounds and a new estimate;
Step 4: (Loop) Set s = s + 1, and return to Step 2.
We now turn to analyze the optimal complexity bound for the MUSL method. Please note that the following results are modifications of those in [24].
Lemma 3.
Theorem 2.
Suppose that the stepsize sequence in the procedure is chosen to satisfy condition (24), and that . Then for given , the following statements hold for the MUSL method.
(1) the number of non-significant phases can be bounded by
and the number of significant phases can be bounded by
(2) the total number of iterations performed by the MUSL method does not exceed
where , and and are defined in (14).
Proof.
(1) Let . Without loss of generality, we suppose that . From conclusion (2) of Lemma 2, we know that , if a non-significant phase occurs. From and , we have
From (e) in Lemma 1 and the definition of and , we know that and . In conclusion, we obtain that
(2) Let and denote the index sets of the non-significant and significant phases, respectively. For any in the non-significant phases, we have . Then we know that and Due to Proposition 2, we know that K is monotonically decreasing in its first variable and monotonically increasing in its second variable. In addition, we conclude that the total number of iterations performed in the significant phases can be bounded by
From the fact that , we know that
It follows from the monotonicity of K in its variables and Proposition 2 that . In addition, we obtain that the total number of iterations performed in the non-significant phases can be bounded by
In what follows, we apply the MUSL method to a special case in which the component in (32) is a smooth convex function, and the complexity results of the procedure and the MUSL method are established in this special case. Saying that this component is a smooth convex function means that there exists a constant such that it satisfies
From the fact that has a Lipschitz continuous gradient with Lipschitz-constant , we have
Thus, we know that is a smooth function on X and satisfies
where . From the fact that F is non-smooth, we know that f is non-smooth as well.
We make the following changes to the outer algorithm of the MUSL method and the internal procedure. Replace (47) in the outer algorithm with
and (40) in procedure with
The function is fixed as a result of the fact that the smoothing parameter is fixed in the internal procedure. Similar to Proposition 2, we have the following results.
Theorem 3.
Proof.
Suppose that the procedure does not stop in iteration . Because of the features of and , we know that satisfies (52). Let . From (14), we have
Then according to the stopping test in Step 3b in procedure , we have
Thus, we have
□
Here we will also establish the complexity of the MUSL method under this special case where is smooth and convex.
Theorem 4.
Assume that satisfy condition (24) and take . If is a smooth convex function, then the following statements hold for the MUSL method.
(1) the number of non-significant phases can be bounded by
and the number of significant phases can be bounded by
(2) the total number of iterations performed by the MUSL method does not exceed
Proof.
From (e) in Lemma 1 and the definition of and , we know that and . In conclusion, we obtain that
(2) Let and denote the index sets of the non-significant and significant phases, respectively. For any in the non-significant phases, we have . Then we know that and By Proposition 2, the bound is monotonically decreasing in its first variable and monotonically increasing in its second variable. It follows that the total number of iterations performed in the significant phases can be bounded by
From the fact that , we know that
By the monotonicity of the bound in its variables and the fact that , we know that the total number of iterations performed in the non-significant phases can be bounded by
5. Two-Stage Stochastic Programming and Numerical Experiments
In this section, the MAPL and MUSL methods are applied to solve two-stage stochastic programming problems, and the two modified methods are compared with the APL and USL methods. All algorithms are implemented in MATLAB (R2014a), and Mosek (8.0) is called for solving subproblems. The programs are run under Windows 7 (64-bit) on an Intel(R) Core(TM) i7-6700 CPU at 3.40 GHz with 16 GB of memory.
Consider the following two-stage stochastic programming with recourse
where
Here, and are the decision variables of the first-stage problem and the second-stage problem respectively. is a compact convex set, and is a random vector with a known probability distribution and support . In [33], it was pointed out that by strong convexity one has
Let be the feasible set of (58) and assume that . It is easy to see that the recourse function is generally non-smooth. Therefore, for general distributions of , (56) is a non-smooth convex programming problem. In [33], Ahmed applied Nesterov’s smoothing technique to the two-stage stochastic programming problem (56) and established a proper smooth approximation as follows. For a given smoothing parameter , consider the function of the form
where
For a random vector with discrete distribution, it is shown in [34,35] that the smoothed function is differentiable and its gradient is Lipschitz continuous. Furthermore, when the smoothing parameter is sufficiently small, the smoothed function approximates f uniformly. Based on the above analysis, we next apply the MAPL and MUSL methods to problems (59) and (60) to illustrate the effectiveness of the methods and compare them with the APL and USL methods.
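The effect of the smoothing parameter on the expected recourse can be sketched with a toy one-dimensional recourse Q(x, d) = max(d − x, 0) = max over y in [0, 1] of (d − x)y, smoothed by subtracting (η/2)y². This is only an illustration of the construction in (59) and (60) under invented data, not the actual SSN or 20Term recourse.

```python
# Toy smoothed recourse in the spirit of (59)-(60): the non-smooth recourse
# Q(x, d) = max(d - x, 0) = max_{0<=y<=1} (d - x)*y is replaced by its
# smoothed counterpart with prox-term (eta/2)*y^2, and the first-stage
# objective averages the smoothed values over scenarios. Invented data only.
def q_smooth(t, eta):
    """Smoothed max(t, 0): maximize t*y - (eta/2)*y^2 over y in [0, 1]."""
    if t <= 0.0:
        return 0.0                 # maximizer y = 0
    if t >= eta:
        return t - eta / 2         # maximizer y = 1
    return t * t / (2 * eta)       # interior maximizer y = t / eta

def expected_recourse_smooth(x, scenarios, eta):
    """sum_s p_s * Q_eta(x, d_s) over a finite scenario set of (d_s, p_s)."""
    return sum(p * q_smooth(d - x, eta) for d, p in scenarios)

scenarios = [(1.0, 0.5), (3.0, 0.5)]
print(expected_recourse_smooth(2.0, scenarios, 0.5))  # 0.5 * (1 - 0.25) = 0.375
```

The smoothed average is differentiable in x, which is what allows the smooth-oriented steps of the MUSL method to be applied to each scenario subproblem.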
We perform numerical experiments on some existing SP instances from [1,36,37], including a telecommunication network design problem (SSN) and a motor freight carrier routing problem (20Term). The SSN problem, studied by Sen, Doverspike, and Cosares [1], comes from the telecommunications industry: the first-stage problem allocates capacity among network links, and the second-stage problem serves the demands for connections requested between point-to-point pairs. The 20Term problem, studied by Mak, Morton, and Wood [37], comes from a motor freight carrier’s model: the first-stage problem determines a program of carriers, and the second-stage problem adjusts the program according to a multi-commodity network. The instances of SSN, 20Term and Storm are downloaded from http://pwp.gatech.edu/guanghui-lan/computer-codes/. The dimensions of these instances are shown in Table 1. denotes the number of constraints in the i-th stage problem, and denotes the number of variables in the i-th stage problem. In particular, we assume that the number of possible realizations of is fixed, i.e., or 100. In this case, a total of five instances are tested. The integers in brackets are the numbers of possible realizations. For given parameters , tolerance , and stepsize satisfying (24), we compare the number of iterations and the CPU time of the different methods. The results are shown in Table 2, Table 3 and Table 4.
Table 1.
Data about instances.
Table 2.
MAPL and APL methods on SP instances.
Table 3.
MUSL and USL methods on SP instances.
Table 4.
MAPL and MUSL methods on SP instances.
Some observations on the results in Table 2, Table 3 and Table 4 follow. When the initial gap is large (up to or ), the MAPL and APL algorithms are run for 400 iterations. The results in Table 2 show that the MAPL method has certain advantages over the APL method in terms of both CPU time and the number of iterations. The results in Table 3 show that, in addition to its advantage in CPU time, the MUSL algorithm can achieve higher accuracy. From the results in Table 4 we see that, compared with the MUSL algorithm, the MAPL algorithm requires less CPU time and fewer iterations. This is because the MAPL method solves linear programming subproblems, whereas the MUSL method needs to solve N smooth quadratic programming subproblems in the course of the algorithm.
6. Conclusions
In this paper, we presented two modified BL-type methods, the modified accelerated prox-level (MAPL) and modified uniform smoothing level (MUSL) methods, for uniformly solving black-box CP problems and a class of structured non-smooth problems. Both the MAPL and MUSL methods achieve the corresponding optimal iteration complexities. To illustrate the effectiveness of the modified methods, we applied them to the two-stage stochastic programming problem with recourse and carried out numerical experiments. The numerical results show that the MAPL and MUSL methods have certain advantages in terms of algorithm efficiency and solution time.
Author Contributions
C.T. mainly contributed to the algorithm design and convergence analysis; B.H. mainly contributed to the convergence analysis and numerical results; and Z.W. mainly contributed to the algorithm design. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Natural Science Foundation (11761013, 71861002) and Guangxi Natural Science Foundation (2018GXNSFFA281007) of China.
Acknowledgments
The authors gratefully acknowledge the support of the funds listed above.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Proof of Lemma 1.
(a) We first show part (a) by mathematical induction. It is obvious that . Assume that is a localizer of , i.e., , for all . Then we have , for all . By the definition of h and the convexity of f, we know that , for any . From the facts that and , we have .
(b) It follows from in Step 3 of the procedure that , for all .
(c) From (a) we have . By the optimality condition of (20), we have . Then we obtain , for any . By the definition of , we have , and further , i.e., there exists a set such that . In conclusion, we have , for all .
(d) It follows from the fact that X is a compact convex set and the definition of that is a compact convex set. Together with the strong convexity of , this implies that (20) has a unique optimal solution if . Then we have , for any . Furthermore, we know that .
(e) From (b) we know that . If the procedure stops at Step 2, we have . From the fact that , we have
If the procedure stops at Step 3, we have and . From the fact that , we have
□
Proof of Lemma 2.
(a) The proof of part (a) is the same as that of part (e) of Lemma 1.
(b) If the internal procedure stops at Step 3, we have and . Then we have
Furthermore we know that
Then we have . Together with the definition in Step 3b, we know that . □
References
- Sen, S.; Doverspike, R.D.; Cosares, S. Network planning with random demand. Telecommun. Syst. 1994, 3, 11–30. [Google Scholar] [CrossRef]
- Yang, L.F.; Jian, J.B.; Wang, Y.Y.; Dong, Z.Y. Projected mixed integer programming formulations for unit commitment problem. Int. J. Electr. Power Energy Syst. 2015, 68, 195–202. [Google Scholar] [CrossRef]
- Yang, L.F.; Jian, J.B.; Zhu, Y.N.; Dong, Z.Y. Tight Relaxation Method for Unit Commitment Problem Using Reformulation and Lift-and-Project. IEEE Trans. Power Syst. 2015, 30, 13–23. [Google Scholar] [CrossRef]
- Yang, L.F.; Zhang, C.; Jian, J.B.; Meng, K.; Xu, Y.; Dong, Z.Y. A novel projected two-binary-variable formulation for unit commitment in power systems. Appl. Energy 2017, 187, 732–745. [Google Scholar] [CrossRef]
- Wallace, S.W.; Ziemba, W.T. Applications of Stochastic Programming; Society for Industrial and Applied Mathematics and the Mathematical Programming Society: Philadelphia, PA, USA, 2005. [Google Scholar]
- Kelley, J.E., Jr. The cutting-plane method for solving convex programs. J. Soc. Ind. Appl. Math. 1960, 8, 703–712. [Google Scholar] [CrossRef]
- Veinott, A.F. The Supporting Hyperplane Method for Unimodal Programming. Oper. Res. 1967, 15, 147–152. [Google Scholar] [CrossRef]
- Ruszczynski, A. A regularized decomposition method for minimizing a sum of polyhedral functions. Math. Program. 1986, 35, 309–333. [Google Scholar] [CrossRef]
- Linderoth, J.; Wright, S. Decomposition Algorithms for Stochastic Programming on a Computational Grid. Comput. Optim. Appl. 2003, 24, 207–250. [Google Scholar] [CrossRef]
- Lemaréchal, C. Nonsmooth optimization and descent methods. In Research Report 78-4; IIASA: Laxenburg, Austria, 1978. [Google Scholar]
- Mifflin, R. A modification and an extension of Lemaréchal’s algorithm for nonsmooth minimization. In Nondifferential and Variational Techniques in Optimization; Springer: Berlin, Germany, 1982; pp. 77–90. [Google Scholar]
- Kiwiel, K.C. An aggregate subgradient method for nonsmooth convex minimization. Math. Program. 1983, 27, 320–341. [Google Scholar] [CrossRef]
- Kiwiel, K.C. Proximity control in bundle methods for convex nondifferentiable minimization. Math. Program. 1990, 46, 105–122. [Google Scholar] [CrossRef]
- Ruszczyński, A.; Świȩtanowski, A. Accelerating the regularized decomposition method for two stage stochastic linear problems. Eur. J. Oper. Res. 1997, 101, 328–342. [Google Scholar] [CrossRef]
- Lemaréchal, C.; Nesterov, Y.; Nemirovskii, A. New variants of bundle methods. Math. Program. 1995, 69, 111–147. [Google Scholar] [CrossRef]
- Ben-Tal, A.; Nemirovski, A. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications; SIAM: Philadelphia, PA, USA, 2001. [Google Scholar]
- Ben-Tal, A.; Nemirovski, A. Non-euclidean restricted memory level method for large-scale convex optimization. Math. Program. 2005, 102, 407–456. [Google Scholar] [CrossRef]
- Richtárik, P. Approximate Level Method for Nonsmooth Convex Minimization. J. Optim. Theory Appl. 2012, 152, 334–350. [Google Scholar] [CrossRef]
- Fischer, F.; Helmberg, C. A parallel bundle framework for asynchronous subspace optimization of nonsmooth convex functions. SIAM J. Optim. 2014, 24, 795–822. [Google Scholar] [CrossRef]
- Kim, K.; Petra, C.G.; Zavala, V.M. An asynchronous bundle-trust-region method for dual decomposition of stochastic mixed-integer programming. SIAM J. Optim. 2019, 29, 318–342. [Google Scholar] [CrossRef]
- Van Ackooij, W.; Frangioni, A. Incremental bundle methods using upper models. SIAM J. Optim. 2018, 28, 379–410. [Google Scholar] [CrossRef]
- Iutzeler, F.; Malick, J.; de Oliveira, W. Asynchronous level bundle methods. Math. Program. 2019, 49, 1–30. [Google Scholar] [CrossRef]
- Tang, C.; Jian, J.; Li, G. A proximal-projection partial bundle method for convex constrained minimax problems. J. Ind. Manag. Optim. 2019, 15, 757–774. [Google Scholar] [CrossRef]
- Lan, G. Bundle-level type methods uniformly optimal for smooth and nonsmooth convex optimization. Math. Program. 2015, 149, 1–45. [Google Scholar] [CrossRef]
- Nesterov, Y. A method for unconstrained convex minimization problem with the rate of convergence O(1/k2). Doklady AN USSR 1983, 269, 543–547. [Google Scholar]
- Auslender, A.; Teboulle, M. Interior gradient and proximal methods for convex and conic optimization. SIAM J. Optim. 2006, 16, 697–725. [Google Scholar] [CrossRef]
- Lan, G.; Lu, Z.; Monteiro, R.D. Primal-dual first-order methods with O(1/ϵ) iteration-complexity for cone programming. Math. Program. 2011, 126, 1–29. [Google Scholar] [CrossRef]
- Lan, G. An optimal method for stochastic composite optimization. Math. Program. 2012, 133, 365–397. [Google Scholar] [CrossRef]
- Nesterov, Y. Introductory Lectures on Convex Optimization a Basic Course; Springer Science & Business Media: New York, NY, USA, 2004; Volume 87. [Google Scholar]
- Nesterov, Y. Smooth minimization of non-smooth functions. Math. Program. 2005, 103, 127–152. [Google Scholar] [CrossRef]
- Chen, Y.; Lan, G.; Ouyang, Y.; Zhang, W. Fast Bundle-Level Type Methods for Unconstrained and Ball-Constrained Convex Optimization. Comput. Optim. Appl. 2019, 73, 159–199. [Google Scholar] [CrossRef]
- Nemirovsky, A.S.; Yudin, D. Problem Complexity and Method Efficiency in Optimization. In Wiley-Interscience Series in Discrete Mathematics; Wiley-Interscience: New York, NY, USA, 1983. [Google Scholar]
- Ahmed, S. Smooth Minimization of Two-Stage Stochastic Linear Programs; Georgia Institute of Technology: Atlanta, GA, USA, 2006. [Google Scholar]
- Chen, X.; Qi, L.; Womersley, R.S. Newton’s method for quadratic stochastic programs with recourse. J. Comput. Appl. Math. 1995, 60, 29–46. [Google Scholar] [CrossRef]
- Chen, X. A parallel BFGS-SQP method for stochastic linear programs. In Computational Techniques and Applications; World Scientific: Princeton, NJ, USA, 1995; pp. 67–74. [Google Scholar]
- Linderoth, J.; Shapiro, A.; Wright, S. The empirical behavior of sampling methods for stochastic programming. Ann. Oper. Res. 2006, 142, 215–241. [Google Scholar] [CrossRef]
- Mak, W.; Morton, D.P.; Wood, R.K. Monte Carlo bounding techniques for determining solution quality in stochastic programs. Oper. Res. Lett. 1999, 24, 47–56. [Google Scholar] [CrossRef]
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).