Abstract
In this paper, a multi-parameter proximal scaled gradient algorithm with outer perturbations is presented in a real Hilbert space. The strong convergence of the generated sequence is proved. The bounded perturbation resilience and the superiorized version of the original algorithm are also discussed. The validity of the proposed algorithms, and the effect of using or not using superiorization, are illustrated by solving an $\ell_1$-norm problem.
Keywords:
strong convergence; proximal scaled gradient algorithm; multi-parameter; superiorization; convex minimization problem
MSC:
47J25; 49J53; 49M37; 65J15; 90C25
1. Introduction
The superiorization method, introduced by Censor in 2010 [1], can solve a broad class of nonlinear constrained optimization problems arising from practical applications such as computed tomography [2], medical image recovery [3,4], convex feasibility problems [5,6] and inverse problems of radiation therapy [7]. It generates an automatic procedure that exploits the bounded perturbation resilience of a basic algorithm, with the aim of obtaining lower values of the objective function. In recent years, some researchers have focused on finding further applications of the superiorization methodology, while others have investigated the bounded perturbation resilience of algorithms; see, for example, [8,9,10,11,12,13,14,15,16,17].
In this paper, we study the bounded perturbation resilience property and the corresponding superiorization of a proximal scaled gradient algorithm with multi-parameters for solving the following non-smooth composite optimization problem:
$$\min_{x\in H}\ f(x)+g(x), \tag{1}$$
where $H$ is a real Hilbert space endowed with an inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$, $f:H\to\mathbb{R}$ is a convex differentiable function, and $g\in\Gamma_0(H)$, the class of proper, convex and lower semicontinuous functions from $H$ to $(-\infty,+\infty]$. In addition, f has an L-Lipschitz continuous gradient on H with $L>0$.
The proximal gradient method is one of the most popular iterative methods for solving problem (1); it has received a lot of attention in the recent past due to its fast theoretical convergence rates and strong practical performance. Given an initial value $x_0\in H$, the proximal gradient method generates a sequence $\{x_n\}$ via
$$x_{n+1}=\operatorname{prox}_{\lambda g}\bigl(x_n-\lambda\nabla f(x_n)\bigr),\qquad n\ge 0,$$
where $\lambda>0$ is the step size and $\operatorname{prox}_{\lambda g}$ is the proximal operator of g of order $\lambda$ (please refer to Definition 2, Section 2). The generated sequence converges weakly to a solution of problem (1) if the solution set S of (1) is nonempty and $0<\lambda<2/L$ (see, for instance, [18], Theorem 25.8).
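For illustration, the update above can be written as a short routine; this is a minimal sketch in Python, assuming generic callables grad_f and prox_g (hypothetical names) for the gradient of f and the proximal operator of g.

```python
import numpy as np

def proximal_gradient(x0, grad_f, prox_g, step, n_iter=100):
    """Minimal proximal gradient loop: x_{n+1} = prox_{step*g}(x_n - step*grad_f(x_n)).

    grad_f : callable x -> gradient of the smooth part f at x
    prox_g : callable (x, step) -> proximal point of g of order `step`
    step   : constant step size, ideally chosen in (0, 2/L) for an L-Lipschitz gradient
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = prox_g(x - step * grad_f(x), step)
    return x
```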
Xu [19] proposed a more general proximal gradient algorithm with variable step sizes. The weak convergence of the generated sequence was obtained; in infinite-dimensional spaces, strong convergence cannot in general be guaranteed.
The scaled method was proposed by Strand [20] to increase the rate of convergence of some algorithms. In a finite-dimensional space, the selection of the scaling matrices depends on the particular problem [21,22]. Jin, Censor and Jiang [13] introduced the following projected scaled gradient (PSG) algorithm:
$$x_{n+1}=P_C\bigl(x_n-\lambda_n D_n\nabla f(x_n)\bigr),\qquad n\ge 0,$$
where $D_n$ is a diagonal scaling matrix for each n and $P_C$ denotes the metric projection onto C, for solving the following convex minimization problem:
$$\min_{x\in C} f(x),$$
where C is a nonempty, closed and convex set and the objective function f is convex. Under the assumption that the scaling matrices approach the identity sufficiently fast, together with other conditions, the convergence of the PSG method in the presence of bounded perturbations was proved.
Motivated by [13], Guo, Cui and Guo [23] discussed a proximal gradient algorithm with perturbations and proved that the generated sequence converges weakly to a solution of problem (1). After that, Guo and Cui [15] applied a convex combination of a contraction operator and the proximal gradient operator to obtain the strong convergence of the generated sequence and discussed the bounded perturbation resilience of the exact algorithm.
In this paper, we will study a proximal scaled gradient algorithm with multi-parameters, referred to below as algorithm (8), which is a further generalization of the above algorithms. We will discuss the strong convergence of (8) and the bounded perturbation resilience of its exact version, just as for the algorithms named above. In addition, we also study the superiorized version of the exact algorithm of (8).
The rest of this paper is organized as follows. In the next section, we introduce some basic concepts and lemmas. In Section 3, we discuss the strong convergence results of the exact and inexact algorithms. In Section 4, we provide two numerical examples illustrating the performance of the iterations. Finally, we summarize the main points of this paper in Section 5.
2. Preliminaries
Let H be a real Hilbert space endowed with an inner product $\langle\cdot,\cdot\rangle$ and the induced norm $\|\cdot\|$. Let $\{x_n\}$ be a sequence in H. A point $x\in H$ is said to be a weak cluster point of $\{x_n\}$ if there exists a subsequence of $\{x_n\}$ that converges weakly to it. The set of all weak cluster points of $\{x_n\}$ is denoted by $\omega_w(x_n)$. Let $T:H\to H$ be a nonlinear operator. Set $\operatorname{Fix}(T):=\{x\in H:\ Tx=x\}$.
The following definitions are needed in proving our main results.
Definition 1.
([24], Proposition 2.1) Let $T:H\to H$ be an operator.
- (i)
- T is non-expansive if $\|Tx-Ty\|\le\|x-y\|$ for all $x,y\in H$.
- (ii)
- T is L-Lipschitz continuous with $L>0$ if $\|Tx-Ty\|\le L\|x-y\|$ for all $x,y\in H$. We call T a contractive mapping if $L\in[0,1)$.
- (iii)
- T is firmly non-expansive if $\|Tx-Ty\|^2\le\langle x-y,\,Tx-Ty\rangle$ for all $x,y\in H$.
- (iv)
- T is α-averaged if there exist a non-expansive operator $S:H\to H$ and $\alpha\in(0,1)$ such that $T=(1-\alpha)I+\alpha S$. In particular, a firmly non-expansive mapping is $\tfrac12$-averaged.
- (v)
- T is ν-inverse strongly monotone (ν-ism) with $\nu>0$ if $\langle x-y,\,Tx-Ty\rangle\ge\nu\|Tx-Ty\|^2$ for all $x,y\in H$.
Definition 2.
([25], Proximal Operator) Let $g\in\Gamma_0(H)$. The proximal operator of g is defined by
$$\operatorname{prox}_{g}(x):=\arg\min_{u\in H}\Bigl\{g(u)+\tfrac12\|u-x\|^2\Bigr\},\qquad x\in H.$$
The above definition is well posed since $u\mapsto g(u)+\tfrac{1}{2\lambda}\|u-x\|^2$ has exactly one minimizer on H for each $x\in H$ and each given $\lambda>0$ (see [18], Proposition 12.15).
The proximal operator of g of order $\lambda>0$ is defined as
$$\operatorname{prox}_{\lambda g}(x):=\arg\min_{u\in H}\Bigl\{g(u)+\tfrac{1}{2\lambda}\|u-x\|^2\Bigr\},\qquad x\in H.$$
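As a concrete illustration (a standard fact, not specific to this paper), if $g=\|\cdot\|_1$ on $\mathbb{R}^N$, then the proximal operator of order $\lambda$ acts componentwise as the soft-thresholding operator:
$$\operatorname{prox}_{\lambda\|\cdot\|_1}(x)_k=\operatorname{sign}(x_k)\,\max\{|x_k|-\lambda,\,0\},\qquad k=1,\dots,N.$$
A weighted version of this closed form applies to the $\ell_1$-type objective used in the numerical experiments of Section 4.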
The following Lemmas 1–3 describe the properties of proximal operators.
Lemma 1.
([19,26], Lemma 2.4, Lemma 3.3) Let , and . Then,
Moreover, if , we also have
Lemma 2.
[18] (Non-expansiveness) Let $g\in\Gamma_0(H)$ and $\lambda>0$. Then, the proximal operator $\operatorname{prox}_{\lambda g}$ is $\tfrac12$-averaged. We obtain the non-expansiveness of the proximal operator:
$$\|\operatorname{prox}_{\lambda g}(x)-\operatorname{prox}_{\lambda g}(y)\|\le\|x-y\|\qquad\text{for all }x,y\in H.$$
Lemma 3.
([19], Proposition 3.2) Let $\lambda>0$, f be differentiable and $g\in\Gamma_0(H)$. Then, z is a solution to (1) if and only if z is a fixed point of the proximal gradient operator, that is,
$$z=\operatorname{prox}_{\lambda g}\bigl(z-\lambda\nabla f(z)\bigr).$$
The following lemmas play important roles in proving the strong convergence result.
Lemma 4.
([18], Corollary 4.18) Let $T:H\to H$ be a non-expansive mapping with $\operatorname{Fix}(T)\neq\emptyset$. If $\{x_n\}$ is a sequence in H converging weakly to x and if $\{(I-T)x_n\}$ converges strongly to 0, then $x\in\operatorname{Fix}(T)$.
Lemma 5.
([27], Lemma 2.5) Assume that $\{a_n\}$ is a sequence of nonnegative real numbers satisfying
$$a_{n+1}\le(1-\gamma_n)a_n+\gamma_n\delta_n+\varepsilon_n,\qquad n\ge 0,$$
where $\{\gamma_n\}\subset(0,1)$, $\{\delta_n\}\subset\mathbb{R}$ and $\{\varepsilon_n\}\subset[0,\infty)$, such that
- (i)
- $\sum_{n=0}^{\infty}\gamma_n=\infty$;
- (ii)
- $\limsup_{n\to\infty}\delta_n\le 0$;
- (iii)
- $\sum_{n=0}^{\infty}\varepsilon_n<\infty$.
Then, $\lim_{n\to\infty}a_n=0$.
Lemma 6.
([28], Lemma 2.4) Let and . Then,
- (i)
- ;
- (ii)
- ;
- (iii)
- .
Lemma 7.
([23], Proposition 3.3) Let $g\in\Gamma_0(H)$. For any $0<\lambda<2/L$, where L is the Lipschitz constant of $\nabla f$, the operator $\operatorname{prox}_{\lambda g}(I-\lambda\nabla f)$ is $\tfrac{2+\lambda L}{4}$-averaged. Hence, it is non-expansive.
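For completeness, a standard way to see this (sketched here under the usual assumption that f is convex, so that the Baillon–Haddad theorem applies) is the following:
$$\nabla f\ \text{is}\ \tfrac1L\text{-ism}\;\Longrightarrow\; I-\lambda\nabla f\ \text{is}\ \tfrac{\lambda L}{2}\text{-averaged for }0<\lambda<\tfrac2L,$$
and, since $\operatorname{prox}_{\lambda g}$ is $\tfrac12$-averaged by Lemma 2, the composition of the two averaged operators is again averaged, hence non-expansive.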
3. The Convergence Analysis and the Superiorized Version
In this section, we first prove that the sequence generated by the exact form of (8) converges strongly to a solution of problem (1). Then, we discuss the strong convergence of algorithm (8) itself. Finally, we investigate the bounded perturbation resilience of the exact iteration by viewing the perturbed iteration as a special case of algorithm (8). The superiorized version is presented at the end of this section.
3.1. The Exact Form of Algorithm (8)
Setting the error terms in (8) to zero, we obtain the exact version of (8):
where the parameter sequences lie in $(0,1)$ and satisfy suitable summation conditions for all n, h is a ρ-contraction for some $\rho\in[0,1)$, $g\in\Gamma_0(H)$, f has a Lipschitz continuous gradient with Lipschitz constant $L>0$, and each $D_n$ is a bounded linear operator with a common upper bound that satisfies a closeness-to-identity condition.
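For orientation, the following Python sketch shows one plausible realization of an update built from the ingredients just listed (a convex combination of a contraction step and a scaled proximal gradient step). The names alpha, beta, gamma, h, D and lam are illustrative assumptions; this is not necessarily the authors' exact update (21).

```python
import numpy as np

def mpga_step(x, n, h, grad_f, prox_g, lam, alpha, beta, gamma, D):
    """One hypothetical multi-parameter proximal scaled gradient step of the form

        x_{n+1} = alpha_n*h(x_n) + beta_n*x_n
                  + gamma_n*prox_{lam_n g}(x_n - lam_n*D_n(grad_f(x_n))),

    with alpha_n + beta_n + gamma_n = 1 (illustrative form only)."""
    forward = x - lam[n] * D[n](grad_f(x))   # scaled gradient (forward) step
    backward = prox_g(forward, lam[n])       # proximal (backward) step
    return alpha[n] * h(x) + beta[n] * x + gamma[n] * backward
```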
Provided that the parameter sequences satisfy some additional conditions, we obtain the following strong convergence result for algorithm (21).
Theorem 1.
- (i)
- for all n;
- (ii)
- and ;
- (iii)
- .
The sequence generated by algorithm (21) converges strongly to a point $z\in S$, where z is the unique solution of the following variational inequality problem:
$$\langle(I-h)z,\;x-z\rangle\ge 0\qquad\text{for all }x\in S.$$
Proof of Theorem 1.
We will complete the proof in three steps.
Step 1. The sequence generated by (21) is bounded in H.
Let , then we have by Lemma 3. In view of Lemmas 2 and 7, we also get that and are non-expansive for . Now, let us calculate
An induction argument shows that
where as is bounded and . Hence, is bounded. Consequently, is bounded since h is a -contraction.
Step 2. There exists a subsequence such that .
We denote by
for brevity.
Using the notation , one has ,
Given , we consider by utilizing Lemma 6 (iii)
Meanwhile, we derive
Notice that
and that
since is non-expansive and (see Lemma 3). We then obtain by substituting (30) and (31) into (29)
Combining (28) and (32), we get
where ,
Obviously, we have
which implies that is a finite number. Thus, there exists a subsequence such that . In addition, without loss of generality, we may assume that converges weakly to some as since is bounded. Notice that
We conclude that also converges weakly to . As a result, exists. Hence, we have
In light of the fact that is bounded and , the sequence is bounded. Then, condition (iii) implies
Set with and apply Lemma 1. We get
Then, Lemma 4 guarantees that .
Step 3. converges strongly to .
3.2. The Strong Convergence of Algorithm (8)
Theorem 2.
- (i)
- for all n;
- (ii)
- and ;
- (iii)
- ;
- (iv)
- .
Then, the sequence generated by algorithm (8) converges strongly to a solution of problem (1).
Proof of Theorem 2.
Let the two sequences be generated by (8) and (21), respectively. The sequence generated by (21) converges strongly to a solution of problem (1) according to Theorem 1. Thus, we only need to prove that the difference between the two sequences tends to zero as $n\to\infty$.
We denote by and , respectively. is non-expansive according to Lemma 7. Then, we have
Applying Lemma 5 to inequality (41), we conclude that this difference tends to zero as $n\to\infty$. The proof is then complete owing to Theorem 1. □
3.3. Bounded Perturbation Resilience
This subsection is devoted to verifying the bounded perturbation resilience property of algorithm (21) and showing the superiorized version of it.
Given a problem Ψ, let $A:H\to H$ be a basic algorithmic operator.
Definition 3.
[9] An algorithmic operator A is said to be bounded perturbation resilient if the following holds: whenever the sequence $\{x_k\}$, generated by $x_{k+1}=A(x_k)$ with $x_0\in H$, converges to a solution of Ψ, then any sequence $\{y_k\}$ generated by $y_{k+1}=A(y_k+\beta_k v_k)$ with any $y_0\in H$ also converges to a solution of Ψ, where the vector sequence $\{v_k\}$ is bounded and the scalars $\{\beta_k\}$ are such that $\beta_k\ge 0$ for all $k\ge 0$ and $\sum_{k=0}^{\infty}\beta_k<\infty$.
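To make Definition 3 concrete, the following minimal Python sketch (with hypothetical names basic_step, v and beta) shows how bounded perturbations are injected before each application of the basic algorithmic operator A.

```python
import numpy as np

def perturbed_iteration(x0, basic_step, v, beta, n_iter=100):
    """y_{k+1} = A(y_k + beta_k * v_k): the basic operator applied to a perturbed point.

    v    : bounded sequence of direction vectors (v[k] is an array)
    beta : nonnegative reals with finite sum (summable perturbation sizes)
    """
    y = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        y = basic_step(y + beta[k] * v[k])   # bounded perturbation, then basic step A
    return y
```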
If we take algorithm (21) as the basic algorithm A, the following iteration is a bounded perturbation of it:
We have the following result.
Theorem 3.
Let H be a real Hilbert space, h a ρ-contractive operator with $\rho\in[0,1)$, f a differentiable function and $g\in\Gamma_0(H)$. Assume that the solution set S of (1) is nonempty and that f has a Lipschitz continuous gradient on H with Lipschitz constant $L>0$. Suppose that the perturbation sequences satisfy the conditions in Definition 3, and that the parameter sequences, the step sizes and the scaling operators satisfy the conditions in Theorem 1, respectively. Then, any sequence generated by (42) converges strongly to a point in S. Thus, algorithm (21) is bounded perturbation resilient.
Proof of Theorem 3.
We can rewrite algorithm (42) as
with
which is obviously of the same form as (8), provided we verify that the induced error terms satisfy the conditions of Theorem 2. In fact, we have
where and are defined by (26), respectively. Then, it is easy to conclude that
since the relevant sequences are all summable. Hence, Theorem 2 guarantees that any sequence generated by (42) converges strongly to a solution of (1). That is to say, algorithm (21) is bounded perturbation resilient. □
The superiorized version is equipped with an optimization criterion, which is usually a function ϕ, with the convention that, for two points, the one with the smaller value of ϕ is considered superior. To formulate this, we need the notion of a nonascending direction for ϕ at x: a vector v is called nonascending for ϕ at x if $\|v\|\le 1$ and there exists a constant $\delta>0$ such that, for all $t\in(0,\delta]$, $\phi(x+tv)\le\phi(x)$. At least one such v always exists, namely the zero vector. The superiorization method then provides an automatic way of turning the original iterative algorithm for solving problem (1) into an algorithm for which the value of the optimization criterion at each iteration is not larger than it would be under the original iterative algorithm. Superiorization does this by assuming that there are a summable sequence of positive real numbers and a bounded vector sequence (each member of the latter being a nonascending direction for ϕ at the current point, which, together with the original iterative point, generates a new iterative point), and by performing a fixed number of steering steps aimed at reducing the values of ϕ at these iterative points. In addition, it makes use of a logical variable called loop. In this paper, we choose the optimization criterion ϕ as the objective function in problem (1). Then, the superiorized version of (21) is as specified in Algorithm 1 below.
4. Numerical Experiments
In this section, we solve the $\ell_1$-norm problem through two numerical examples to illustrate the performance of the proposed iterations. The algorithms concerned are Algorithm 1 (MPGAS), the bounded perturbation algorithm (42) (MPGAB) and the basic algorithm (21) (MPGA). All of the experiments were run on a quad-core Intel i7-8550U CPU @1.8 GHz with 16 GB DDR4 memory.
| Algorithm 1: Superiorized Version of (21) |
| 1: Given |
| 2: set |
| 3: set |
| 4: set Error = Constant |
| 5: while Error |
| 6: set |
| 7: set |
| 8: while |
| 9: set to be a nonascending vector for at |
| 10: set loop = true |
| 11: while loop |
| 12: |
| 13: set |
| 14: set |
| 15: if and |
| 16: set |
| 17: set |
| 18: set loop = false |
| 19: end if |
| 20: end while |
| 21: end while |
| 22: set |
| 23: set Error = |
| 24: set . |
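The following Python sketch illustrates the generic structure of such a superiorized run: summable steering steps along nonascending directions of ϕ, interleaved with the basic algorithmic step. The names basic_step, phi, nonascending_direction and the geometric step sizes gamma*a**l are illustrative assumptions, not the authors' exact parameter choices in Algorithm 1.

```python
import numpy as np

def superiorize(x0, basic_step, phi, nonascending_direction,
                n_outer=50, n_steer=1, a=0.5, gamma=1.0, tol=1e-6):
    """Generic superiorization loop: steer with summable steps that do not increase phi,
    then apply one step of the basic (perturbation-resilient) algorithm."""
    x = np.asarray(x0, dtype=float)
    l = 0                                    # global counter so that the steps gamma*a**l are summable
    for _ in range(n_outer):
        for _ in range(n_steer):             # steering steps aimed at reducing phi
            v = nonascending_direction(x)    # e.g. a normalized negative (sub)gradient of phi
            loop = True
            while loop:
                step = gamma * a ** l
                l += 1
                z = x + step * v
                if phi(z) <= phi(x):         # accept only if phi does not increase
                    x, loop = z, False
        x_new = basic_step(x)                # one step of the basic algorithm, e.g. (21)
        if np.linalg.norm(x_new - x) < tol:  # simple stopping criterion
            return x_new
        x = x_new
    return x
```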
4.1. The $\ell_1$-Norm Problem
Let $\{e_k\}$ be an orthogonal basis of the underlying space, let $\{\gamma_k\}$ be strictly positive real numbers, let A be a bounded linear operator and let d be a given data vector. The problem has the following form:
$$\min_{x}\ \sum_{k}\gamma_k|\langle x,e_k\rangle|+\frac12\|Ax-d\|^2.\tag{47}$$
In signal recovery problems, d is the observed signal and the original signal x is known to have a sparse representation.
We take $f(x)=\frac12\|Ax-d\|^2$ and $g(x)=\sum_{k}\gamma_k|\langle x,e_k\rangle|$. Then, $\nabla f(x)=A^{T}(Ax-d)$ is Lipschitz continuous with Lipschitz constant $L=\|A^{T}A\|$, where $A^{T}$ refers to the transpose of A. The above problem is a special case of problem (1).
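As an illustration of this special case, the gradient $A^{T}(Ax-d)$ and the soft-thresholding proximal operator can be combined into a plain (unscaled, single-parameter) proximal gradient run, sketched below in Python. This is not the multi-parameter algorithm of this paper; the data A, d and the weight are simulated here, loosely following the setup of Example 2, for demonstration only.

```python
import numpy as np

def soft_threshold(x, tau):
    """Proximal operator of tau*||.||_1 (componentwise soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def l1_least_squares(A, d, weight=1.0, n_iter=500):
    """Solve min_x weight*||x||_1 + 0.5*||Ax - d||^2 by proximal gradient iteration."""
    L = np.linalg.norm(A.T @ A, 2)            # Lipschitz constant of the gradient A^T(Ax - d)
    lam = 1.0 / L                             # step size in (0, 2/L)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - d)
        x = soft_threshold(x - lam * grad, lam * weight)
    return x

# Example usage with simulated data (Gaussian A, uniform d on [-2, 2]).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
d = rng.uniform(-2.0, 2.0, size=50)
x_hat = l1_least_squares(A, d, weight=0.5)
```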
4.2. Numerical Examples
Example 1.
Let ,
A straightforward calculation gives the solution set of (47) and the minimum value of the objective function for (47). We solve this problem with the algorithms proposed in this paper. The numerical results can be found in Table 1.
Table 1.
Results for Example 1.
Suppose that the contraction , and the diagonal scaling matrix . We choose , , , and the step size sequence . For algorithm (21) with bounded perturbations, we choose the bounded sequence as
the summable nonnegative real sequence as for some . For the superiorized version of (21), we take the function ϕ as the objective function in problem (47), that is
The iteration numbers (“Iter”), the corresponding iterate values and the values of the objective function (“Obj”) are reported in Table 1 when the stopping criterion
is reached.
Figure 1 below corresponds to the case of the zero initial value.
From Table 1 and Figure 1 above, we see that the superiorized version and the bounded perturbation algorithm of (21) reached the minimum value and the unique minimum point within nine iterations, while the original algorithm (21) took 47 iterations to attain the same minimum from the zero initial value. Similar results were also obtained with an initial value drawn from a uniform distribution.
We now discuss a general case of problem (47) by the above-mentioned algorithms.
Example 2.
Let the system matrix A be simulated from a standard Gaussian distribution. Let the vector d be generated from a uniform distribution on the interval [−2, 2]. Solve the optimization problem (47) with the above-mentioned algorithms.
We take the parameters in the algorithms as follows:
- 1.
- Algorithm parameters:The contraction . The diagonal scaling matrix, , then . The step size sequence .
- 2.
- Algorithm parameters for the superiorized version:The summable nonnegative real sequence : for some and . We set ϕ as the objective function in problem (47).
The iteration numbers, the computing time in seconds and the error values are reported in Table 2 with a random initial guess when the stopping criterion
is reached, where ε is a given small positive constant.
Table 2.
Results for Example 2 with .
We find from Table 2 that running the superiorized version MPGAS of the original algorithm (21) does not increase the execution time. On the contrary, compared with the algorithms MPGAB and MPGA, MPGAS even reduces the computing time while attaining a smaller objective function value under the same stopping criterion and initial value.
5. Conclusions
In this paper, we have proposed a proximal scaled gradient algorithm with multi-parameters and studied its strong convergence in a real Hilbert space for solving a composite optimization problem. We have also investigated its bounded perturbation resilience and its superiorized version. The validity of the proposed algorithm, together with a comparison among the original iteration, its bounded perturbation form and its superiorized version, was illustrated by numerical examples. The results and numerical examples in this paper constitute a new attempt at applying the recently developed superiorization methodology, and they show that this method works reasonably well for the proposed algorithm.
Author Contributions
All authors contributed equally and significantly to this paper. Conceptualization, Y.G.; Data curation, Y.G. and X.Z.; Formal analysis, Y.G. and X.Z.
Funding
This research was funded by the Fundamental Research Funds for the Central Universities (Grant No. 3122018L004) and China Scholarship Council (Grant No. 201807315013).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Censor, Y.; Davidi, R.; Herman, G.T. Perturbation resilience and superiorization of iterative algorithms. Inverse Probl. 2010, 26, 65008. [Google Scholar] [CrossRef] [PubMed]
- Davidi, R.; Schulte, R.W.; Censor, Y.; Xing, L. Fast superiorization using a dual perturbation scheme for proton computed tomography. Trans. Am. Nucl. Soc. 2012, 106, 73–76. [Google Scholar]
- Davidi, R.; Herman, G.T.; Censor, Y. Perturbation-resilient block-iterative projection methods with application to image reconstruction from projections. Int. Trans. Oper. Res. 2009, 16, 505–524. [Google Scholar] [CrossRef] [PubMed]
- Nikazad, T.; Davidi, R.; Herman, G.T. Accelerated perturbation-resilient block-iterative projection methods with application to image reconstruction. Inverse Probl. 2012, 28, 035005. [Google Scholar] [CrossRef] [PubMed]
- Censor, Y.; Chen, W.; Combettes, P.L.; Davidi, R.; Herman, G.T. On the effectiveness of projection methods for convex feasibility problems with linear inequality constraints. Comput. Optim. Appl. 2012, 51, 1065–1088. [Google Scholar] [CrossRef]
- Censor, Y.; Zaslavski, A.J. Strict Fejér monotonicity by superiorization of feasibility-seeking projection methods. J. Optim. Theory Appl. 2015, 165, 172–187. [Google Scholar] [CrossRef]
- Davidi, R.; Censor, Y.; Schulte, R.W.; Geneser, S.; Xing, L. Feasibility-seeking and superiorization algorithm applied to inverse treatment planning in radiation therapy. Contemp. Math. 2015, 636, 83–92. [Google Scholar]
- Censor, Y.; Zaslavski, A.J. Convergence and perturbation resilience of dynamic string-averaging projection methods. Comput. Optim. Appl. 2013, 54, 65–76. [Google Scholar] [CrossRef]
- Censor, Y.; Davidi, R.; Herman, G.T.; Schulte, R.W.; Tetruashvili, L. Projected subgradient minimization versus superiorization. J. Optim. Theory Appl. 2014, 160, 730–747. [Google Scholar] [CrossRef]
- Dong, Q.L.; Lu, Y.Y.; Yang, J. The extragradient algorithm with inertial effects for solving the variational inequality. Optimization 2016, 65, 2217–2226. [Google Scholar] [CrossRef]
- Garduño, E.; Herman, G. Superiorization of the ML-EM algorithm. IEEE Trans. Nucl. Sci. 2014, 61, 162–172. [Google Scholar]
- He, H.; Xu, H.K. Perturbation resilience and superiorization methodology of averaged mappings. Inverse Probl. 2017, 33, 040301. [Google Scholar] [CrossRef]
- Jin, W.; Censor, Y.; Jiang, M. Bounded perturbation resilience of projected scaled gradient methods. J. Comput. Optim. Appl. 2016, 63, 365–392. [Google Scholar] [CrossRef]
- Schrapp, M.J.; Herman, G.T. Data fusion in X-ray computed tomography using a superiorization approach. Rev. Sci. Instrum. 2014, 85, 055302. [Google Scholar] [CrossRef] [PubMed]
- Guo, Y.N.; Cui, W. Strong convergence and bounded perturbation resilience of a modified proximal gradient algorithm. J. Inequal. Appl. 2018, 2018, 103. [Google Scholar] [CrossRef] [PubMed]
- Zhu, J.H.; Penfold, S. Total variation superiorization in dual-energy CT reconstruction for proton therapy treatment planning. Inverse Probl. 2017, 33, 044013. [Google Scholar] [CrossRef]
- Zibetti, M.V.W.; Lin, C.A.; Herman, G.T. Total variation superiorized conjugate gradient method for image reconstruction. Inverse Probl. 2018, 34, 034001. [Google Scholar] [CrossRef]
- Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Space; Dilcher, K., Taylor, K., Eds.; Springer: New York, NY, USA, 2011. [Google Scholar]
- Xu, H.K. Properties and Iterative Methods for the Lasso and Its Variants. Chin. Ann. Math. 2014, 35, 501–518. [Google Scholar] [CrossRef]
- Strand, O.N. Theory and methods related to the singular-function expansion and Landweber iteration for integral equations of the first kind. SIAM J. Numer. Anal. 1974, 11, 798–825. [Google Scholar] [CrossRef]
- Piana, M.; Bertero, M. Projected Landweber method and preconditioning. Inverse Probl. 1997, 13, 441–463. [Google Scholar] [CrossRef]
- Helou Neto, E.S.; De Pierro, Á.R. Convergence results for scaled gradient algorithms in positron emission tomography. Inverse Probl. 2005, 21, 1905–1914. [Google Scholar] [CrossRef]
- Guo, Y.N.; Cui, W.; Guo, Y.S. Perturbation resilience of proximal gradient algorithm for composite objectives. J. Nonlinear Sci. Appl. 2017, 10, 5566–5575. [Google Scholar] [CrossRef]
- Xu, H.K. Iterative methods for the split feasibility problem in infinite-dimensional Hilbert space. Inverse Probl. 2010, 26, 105018. [Google Scholar] [CrossRef]
- Moreau, J.J. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. Fr. 1965, 93, 273–299. [Google Scholar] [CrossRef]
- Marino, G.; Xu, H.K. Convergence of generalized proximal point algorithm. Commun. Pure Appl. Anal. 2004, 3, 791–808. [Google Scholar]
- Xu, H.K. Iterative algorithms for nonlinear operators. J. Lond. Math. Soc. 2002, 66, 240–256. [Google Scholar] [CrossRef]
- Xu, H.K. Error sensitivity for strongly convergent modifications of the proximal point algorithm. J. Optim. Theory Appl. 2015, 168, 901–916. [Google Scholar]
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).