Abstract
Nonconvex and nonsmooth optimization problems have attracted increasing attention in recent years in image processing and machine learning research. Algorithms based on an iteratively reweighted step have been widely used in many applications. In this paper, we propose a new, extended version of the iterative convex majorization–minimization method (ICMM) for solving a nonconvex and nonsmooth minimization problem; the proposed method encompasses well-known iteratively reweighted methods. To prove the convergence of the proposed algorithm, we adopt a general unified framework based on the Kurdyka–Łojasiewicz inequality. Numerical experiments validate the effectiveness of the proposed algorithm compared to existing methods.
Keywords:
iterative reweighted algorithm; linearization; nonconvex optimization; nonsmooth objective function; Kurdyka–Łojasiewicz property
MSC:
65K10; 65K05; 68U10
1. Introduction
In this paper, we consider the following nonconvex and nonsmooth optimization problem of a specific structure in an n-dimensional real vector space:
$$\min_{x \in \mathbb{R}^n} F(x) := f(x) + p(x) + h(g(x)), \qquad (1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a proper, lower semicontinuous (l.s.c.), convex and continuously differentiable function whose gradient is Lipschitz continuous with Lipschitz constant $L_f$, $p : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is a proper, l.s.c. and convex function, $g : \mathbb{R}^n \to \mathbb{R}^m$ is a proper and l.s.c. function, and $h : \mathbb{R}^m \to \mathbb{R}$ is continuously differentiable. Furthermore, we assume that the coordinate functions of $g$ are convex, and that the function $h$ has a strictly continuous gradient and is coordinatewise nondecreasing, i.e., $h(y) \le h(y + t e_i)$ for all $y \in \mathbb{R}^m$ and $t \ge 0$, where $e_i$ is the $i$-th standard basis vector for $i = 1, \ldots, m$. We also suppose that $F$ is coercive, closed and definable in an o-minimal structure. Several nonconvex optimization problems in image processing or signal processing have an objective function of the form (1). For example, a nonconvex and nonsmooth minimization problem for image denoising
has the form of the problem (1), and its objective function satisfies all of the assumptions. Here, the given datum is an observed noisy image, $u$ is the restored image, and the remaining parameters are positive. The following nonconvex minimization for the compressive sensing problem is also an example of the proposed problem (1):
where the parameters are positive, $A$ is an $m \times n$ matrix with $m < n$, and $b$ is an observed signal. As we will see in the numerical experiments, we apply the proposed method to this application.
Minimizing the sum of a finite number of given functions is an important issue in mathematical optimization research. For minimizing convex functions, many efficient algorithms have been proposed together with convergence analyses, such as gradient-based methods [1,2], the iterative shrinkage thresholding algorithm [3], the proximal point method [4] and the alternating minimization algorithm. On the other hand, it is difficult to prove the global convergence of an algorithm for a nonconvex minimization problem. Nevertheless, several algorithms for solving nonconvex minimization problems have been developed. Extensions of many first-order algorithms to the nonconvex setting have been proposed, such as the gradient method, the proximal point method [5] and the iterative shrinkage thresholding algorithm [6]. Recently, Attouch et al. [7] extended the alternating minimization algorithm by adding a proximal term to minimize a nonconvex function. The iteratively reweighted algorithm [8] was proposed for solving a nonconvex minimization problem in compressive sensing. The iteratively reweighted least squares method [9] was also developed for a nonconvex norm-based model applied to the compressive sensing problem. Very recently, Ochs et al. [10] unified these algorithms in the framework of the iterative convex majorization–minimization method (ICMM) for nonsmooth and nonconvex optimization problems and gave a global convergence analysis.
The Kurdyka–Łojasiewicz (KL) inequality is key to proving the global convergence of algorithms for nonsmooth and nonconvex optimization problems. A function which satisfies the KL inequality is called a KL function. Smooth and nonsmooth KL functions were introduced in [11,12,13,14]. Almost all objective functions arising in image processing satisfy the KL inequality, which makes it very useful when dealing with nonconvex objective functions. Many methods [7,15,16] whose global convergence is based on the KL inequality have been proposed. Recently, Ochs et al. [17] proposed an inertial proximal algorithm by combining forward–backward splitting with an inertial force. Attouch et al. [18] proposed a general framework for the global convergence of descent methods for minimizing KL functions. In this paper, we utilize this general framework to establish the convergence of the proposed method.
In general, a nonlinear optimization problem does not have a closed-form solution. Hence, iterative algorithms are frequently used to solve nonlinear minimization problems. Several algorithms instead minimize a linear approximation of the nonlinear differentiable objective function at each iteration; this technique is called “linearization”. The linearization of a continuously differentiable objective function has been applied in many algorithms [3,19,20] for solving constrained or unconstrained optimization problems.
The ICMM [10] is a popular algorithm for solving the problem (1). In this article, we propose an extension of the ICMM for solving the nonconvex and nonsmooth minimization problem (1). Unlike the ICMM, the convex differentiable function f is linearized, which enables the proposed method to handle more applications. Further details on applications are given in Section 3. Based on the general framework introduced in [18], we prove the global convergence of the proposed method. Numerical experiments, presented in Section 4, demonstrate the superiority of the proposed method over existing methods.
The rest of this paper is organized as follows. In Section 2, we present mathematical preliminaries for nonconvex optimization and introduce the ICMM. In Section 3, we propose an extended version of the ICMM, suggest iterative reweighted algorithms as special instances of the proposed method, and prove the global convergence of the proposed algorithm. In Section 4, numerical experiments are provided for our method, with comparisons to state-of-the-art methods. Finally, Section 5 summarizes our work.
2. Background
In this section, we present mathematical preliminaries for our work and introduce the iterative convex majorization–minimization method [10].
2.1. Mathematical Preliminary
In Section 2.1, we introduce basic mathematical concepts and properties. More details are given in [21,22].
The concept of Lipschitz continuity is important in mathematical optimization theory. In the problem (1), we consider a continuously differentiable function belonging to the class of functions with a Lipschitz-continuous gradient, commonly denoted by $C^{1,1}_L$. A continuously differentiable function $f$ has a Lipschitz gradient with constant $L$ if there exists $L \ge 0$ with
$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\| \quad \text{for all } x, y \in \mathbb{R}^n.$$
There is a useful property of functions with a Lipschitz gradient, often called the descent lemma: if $f$ is a continuously differentiable function whose gradient is Lipschitz continuous with Lipschitz constant $L_f$, i.e., $f \in C^{1,1}_{L_f}$, then, for any $x, y \in \mathbb{R}^n$, the following inequality holds:
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L_f}{2}\|y - x\|^2. \qquad (2)$$
Now we introduce the generalized subdifferentials for a nonsmooth and nonconvex function.
Definition 1.
For a function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ and a point $\bar{x} \in \operatorname{dom} f$,
- $v$ is a regular subgradient of $f$ at $\bar{x}$, denoted by $v \in \hat{\partial} f(\bar{x})$, if
$$\liminf_{x \to \bar{x},\, x \ne \bar{x}} \frac{f(x) - f(\bar{x}) - \langle v, x - \bar{x} \rangle}{\|x - \bar{x}\|} \ge 0;$$
- $v$ is a (limiting) subgradient of $f$ at $\bar{x}$, denoted by $v \in \partial f(\bar{x})$, if there are sequences $x^k \to \bar{x}$ with $f(x^k) \to f(\bar{x})$ and $v^k \in \hat{\partial} f(x^k)$ with $v^k \to v$.
A trivial property of the subdifferential is that $\hat{\partial} f(\bar{x}) \subseteq \partial f(\bar{x})$. Moreover, the subdifferential has many important properties, which will be used to prove the convergence of the proposed method.
Proposition 1 (subgradient properties).
Let $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a function.
- 1.
- If f is convex, the regular and limiting subdifferentials coincide with the subdifferential of convex analysis:
$$\partial f(\bar{x}) = \hat{\partial} f(\bar{x}) = \{ v : f(x) \ge f(\bar{x}) + \langle v, x - \bar{x} \rangle \ \text{for all } x \in \mathbb{R}^n \}.$$
- 2.
- If f is continuously differentiable on a neighborhood of $\bar{x}$, then $\partial f(\bar{x}) = \hat{\partial} f(\bar{x}) = \{\nabla f(\bar{x})\}$.
- 3.
- If g is an l.s.c. function and f is continuously differentiable on a neighborhood of $\bar{x}$, then we can obtain the subdifferential of $f + g$ as follows:
$$\partial (f + g)(\bar{x}) = \nabla f(\bar{x}) + \partial g(\bar{x}).$$
- 4.
- If a proper and l.s.c. function f has a local minimum at $\bar{x}$, then $0 \in \partial f(\bar{x})$. Furthermore, if f is convex, then this condition is also sufficient for a global minimum.
To obtain the global convergence of the proposed algorithm, the objective function of the problem (1) must be a Kurdyka–Łojasiewicz function.
Definition 2.
A function $F : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ satisfies the Kurdyka–Łojasiewicz (KL) property at a point $\bar{x} \in \operatorname{dom} \partial F$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $\bar{x}$, and a continuous and concave function $\phi : [0, \eta) \to [0, +\infty)$ which satisfy
- (i)
- $\phi(0) = 0$;
- (ii)
- $\phi$ is differentiable on $(0, \eta)$;
- (iii)
- $\phi'(s) > 0$ for all $s \in (0, \eta)$;
- (iv)
- For any $x \in U \cap \{ x : F(\bar{x}) < F(x) < F(\bar{x}) + \eta \}$, the KL inequality holds:
$$\phi'\big(F(x) - F(\bar{x})\big)\, \operatorname{dist}\big(0, \partial F(x)\big) \ge 1.$$
A function $F$ is called a Kurdyka–Łojasiewicz (KL) function if $F$ satisfies the KL property at every point in $\operatorname{dom} \partial F$.
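A classical special case helps to interpret this definition. For many functions (e.g., real-analytic ones), the desingularizing function can be taken of the form $\phi(s) = c\,s^{1-\theta}$ with $c > 0$ and $\theta \in [0,1)$; under this assumption, the KL inequality above reduces to the Łojasiewicz gradient inequality
$$\operatorname{dist}\big(0, \partial F(x)\big) \ge C\, |F(x) - F(\bar{x})|^{\theta}, \qquad C = \frac{1}{c(1-\theta)}.$$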
Although the KL property is a strong condition, many functions are KL functions. Semialgebraic functions [23] are typical examples of KL functions. The set of semialgebraic functions includes polynomials, indicator functions of semialgebraic sets [23], and the $\ell_p$ norm with rational $p$. Compositions, finite sums, and finite products of semialgebraic functions are also semialgebraic. However, logarithmic and exponential functions are not semialgebraic. Recently, the class of functions definable in the log-exp o-minimal structure [24,25] was introduced, which contains logarithmic and exponential functions as well as all semialgebraic functions. Moreover, it has been proved that the functions in this class are KL functions. More details on the log-exp o-minimal structure are given in [24,25].
Lastly, we recall the general framework for an iterative algorithm when the objective function of an unconstrained minimization problem is a KL function. We consider the nonconvex unconstrained minimization problem
$$\min_{x \in \mathbb{R}^n} F(x), \qquad (3)$$
where $F$ is an l.s.c. and proper function. Attouch et al. [18] suggested a framework for the convergence of an iterative method for solving the nonconvex minimization problem (3). They proved that a given algorithm converges when the following three conditions hold. Let $\{x^k\}_{k \in \mathbb{N}}$ be a sequence generated by the given iterative algorithm.
Hypothesis 1 (sufficient decrease condition).
There exists a positive value $a$ such that, for each $k \in \mathbb{N}$,
$$F(x^{k+1}) + a \|x^{k+1} - x^k\|^2 \le F(x^k).$$
Hypothesis 2 (relative error condition).
For each $k \in \mathbb{N}$, there exists $w^{k+1} \in \partial F(x^{k+1})$ such that
$$\|w^{k+1}\| \le b \|x^{k+1} - x^k\|,$$
where $b$ is a fixed positive constant.
Hypothesis 3 (continuity condition).
There exist a subsequence $\{x^{k_j}\}_{j \in \mathbb{N}}$ and $\tilde{x} \in \mathbb{R}^n$ such that
$$x^{k_j} \to \tilde{x} \quad \text{and} \quad F(x^{k_j}) \to F(\tilde{x}) \quad \text{as } j \to \infty.$$
The following theorem is the convergence result, and the proof is given in ([18], Theorem 2.9).
Theorem 1.
Let $F : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ be a proper l.s.c. function. We consider a sequence $\{x^k\}_{k \in \mathbb{N}}$ that satisfies Hypotheses 1–3. If $F$ has the KL property at the cluster point $\tilde{x}$ specified in Hypothesis 3, then the sequence $\{x^k\}_{k \in \mathbb{N}}$ converges to $\tilde{x}$ as $k$ goes to infinity, and $\tilde{x}$ is a critical point of $F$. Moreover, the sequence has finite length, i.e.,
$$\sum_{k=0}^{\infty} \|x^{k+1} - x^k\| < +\infty.$$
2.2. Iterative Convex Majorization–Minimization Method
This section recalls the iterative convex majorization–minimization method (ICMM) [10] for solving the following nonconvex minimization problem:
$$\min_{x \in \mathbb{R}^n} G(x) := p(x) + h(g(x)), \qquad (4)$$
where $G$ is proper, l.s.c., and bounded below. Further assumptions are required: $p$ is proper, l.s.c., and convex; $g$ is coordinatewise convex; and $h$ is coordinatewise nondecreasing.
The ICMM is a well-known iterative algorithm for solving the nonconvex problem (4). It adopts the majorization–minimization technique: it chooses a suitable family of convex surrogate functions, called majorizers, and minimizes a convex majorizer instead of the objective function G at each iteration. The specific procedure is summarized in Algorithm 1.
| Algorithm 1 Iterative Convex Majorization–Minimization Method (ICMM). |
Initialization Choose a starting point $x^0$ with $G(x^0) < \infty$ and define a suitable family of convex surrogate functions $G^{\bar{x}}$ such that each $G^{\bar{x}}$ majorizes $G$ and is tight at $\bar{x}$, i.e., $G^{\bar{x}}(x) \ge G(x)$ for all $x$ and $G^{\bar{x}}(\bar{x}) = G(\bar{x})$.
repeat Minimize the convex majorizer $G^{x^k}$ to obtain $x^{k+1}$, until the algorithm satisfies a stopping condition. |
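To make the majorization–minimization template above concrete, the following Python sketch outlines the generic ICMM loop. The callables `objective`, `make_majorizer`, and `solve_convex` are hypothetical placeholders for the problem-specific ingredients; any construction satisfying the majorization property can be plugged in.

```python
import numpy as np

def icmm(x0, objective, make_majorizer, solve_convex, tol=1e-8, max_iter=500):
    """Generic ICMM loop (a sketch).

    objective      -- callable returning G(x)
    make_majorizer -- given the current iterate, returns a convex surrogate
                      that majorizes G and is tight at that iterate
    solve_convex   -- exactly minimizes the convex surrogate
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        surrogate = make_majorizer(x)    # convex majorizer, tight at x
        x_new = solve_convex(surrogate)  # exact minimization step
        # tightness + majorization guarantee G(x_new) <= G(x)
        if abs(objective(x) - objective(x_new)) <= tol * max(abs(objective(x)), 1.0):
            return x_new
        x = x_new
    return x
```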
The convergence of Algorithm 1 was studied in [10]. Additional conditions are required for the global convergence of the ICMM. First, h should have a locally Lipschitz continuous gradient on a compact set B containing all iterates, and the majorizers should have globally Lipschitz continuous gradients on B with a uniform Lipschitz constant. Another, stronger condition is that the majorizers should be strongly convex. To show the global convergence of the ICMM, it was proved that the objective function G satisfies Hypotheses 1–3 introduced in the previous section, for any KL function G; Theorem 1 is then applied. As examples of the ICMM, several iteratively reweighted convex algorithms were introduced in [10], such as the iteratively reweighted $\ell_1$ algorithm, the iteratively reweighted tight convex algorithm, the iteratively reweighted Huber algorithm and the iteratively reweighted least squares algorithm.
3. Proposed Algorithm
3.1. Proximal Linearized Iteratively Convex Majorization–Minimization Method
In this section, we propose a novel algorithm for solving the nonconvex and nonsmooth minimization problem (1). The ICMM in Algorithm 1, introduced in the previous section, can be applied to the problem (1), since $f + p$ is also a proper, l.s.c., convex function. This yields the following iteration:
In many cases, this subproblem does not have a closed-form solution, and we cannot compute the exact solution of (5). Since the convergence of the ICMM is only guaranteed under the assumption that the subproblem (5) is solved exactly, it is not applicable to many problems. To overcome this serious drawback, an inexact stopping criterion for the subproblem (5) was proposed in [26]. Specifically, solving the subproblem (5) requires an inner algorithm, such as the iterative shrinkage thresholding algorithm [27] or the fast iterative shrinkage thresholding algorithm [3]. However, this is often time consuming for large-scale problems. Therefore, we extend the ICMM by adopting a linearization of f. Specifically, we consider the linear approximation of f at the k-th iterate $x^k$, with an additional proximal term, instead of f itself:
Utilizing this technique, we propose the following minimization of a convex surrogate function:
where $\lambda_k$ is a proximal parameter. The proposed algorithm is summarized in Algorithm 2 and is called the proximal linearized ICMM (PL-ICMM).
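In explicit notation, and writing $G^{x^k}_{h \circ g}$ for the convex majorizer of $h \circ g$ at $x^k$ (this symbol is ours), the PL-ICMM subproblem can be sketched as
$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; \langle \nabla f(x^k), x - x^k \rangle + \frac{\lambda_k}{2}\|x - x^k\|^2 + p(x) + G^{x^k}_{h \circ g}(x),$$
where the weighting $\lambda_k / 2$ of the proximal term is an assumption consistent with the condition $\lambda_k \ge \lambda > L_f$ used in Proposition 2 below.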
| Algorithm 2 Proximal Linearized Iteratively Convex Majorization–Minimization Method (PL-ICMM). |
Conditions
Initialization Choose a starting point $x^0$ with $F(x^0) < \infty$ and define a suitable family of convex surrogate functions $G^{\bar{x}}$ such that, for all $x$, $G^{\bar{x}}(x) \ge h(g(x))$ and $G^{\bar{x}}(\bar{x}) = h(g(\bar{x}))$ hold.
repeat Solve the convex surrogate subproblem for $x^{k+1}$:
until the algorithm satisfies a stopping condition. |
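A minimal sketch of Algorithm 2 in Python follows, assuming a user-supplied convex inner solver. The names `grad_f`, `make_majorizer`, and `solve_surrogate` are hypothetical; the surrogate assembles the linearized smooth part, the proximal term, $p$, and the convex majorizer of $h \circ g$.

```python
import numpy as np

def pl_icmm(x0, grad_f, p, make_majorizer, solve_surrogate,
            lam=1.0, tol=1e-8, max_iter=500):
    """Sketch of PL-ICMM (Algorithm 2); lam plays the role of lambda_k > L_f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        majorizer = make_majorizer(x)  # convex majorizer of h(g(.)) at x
        # minimize <g, z - x> + (lam/2)||z - x||^2 + p(z) + majorizer(z)
        x_new = solve_surrogate(x, g, lam, p, majorizer)
        if np.linalg.norm(x_new - x) <= tol * max(np.linalg.norm(x), 1.0):
            return x_new
        x = x_new
    return x
```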
The proposed method can be regarded as a generalized version of the ICMM and is more widely applicable. For example, the PL-ICMM can be directly applied to the following minimizations for a regression problem, while the ICMM cannot:
and
where the involved data matrices and positive parameters are suitably given.
Many iteratively reweighted algorithms [28,29,30] are instances of the ICMM; they use an appropriately chosen weight to build a convex majorizer. A convex majorizer has a weighted form for some given weight vector, and the weight must be selected so that the resulting function satisfies the conditions of the class of convex majorizers. Similar to the ICMM, the PL-ICMM is a general algorithm which includes many iteratively reweighted algorithms. We now introduce proximal linearized versions of iteratively reweighted algorithms.
| Algorithm 3 Proximal Linearized Iteratively Reweighted $\ell_1$ Algorithm (PL-IRL1). |
Conditions
Initialization Choose a starting point $x^0$ with $F(x^0) < \infty$. repeat Compute a supergradient weight $w^k$ of $h$ at $g(x^k)$. Solve
until the algorithm satisfies a stopping condition. |
First, we propose the proximal linearized iteratively reweighted $\ell_1$ algorithm (PL-IRL1). We further assume that the function $h$ is concave on the range of $g$. For a concave function $h$, we can define the limiting supergradient of $h$ as an element of $-\partial(-h)$; the set of all limiting supergradients of $h$ is denoted here by $\partial^+ h$. Since $-h$ is convex and $h$ is differentiable, $\partial^+ h$ has only one element by property 2 of Proposition 1. The PL-IRL1 considers the linear majorizer of $h$ at $g(x^k)$ and iteratively minimizes the following convex problem:
In ([10], Proposition 2), it was proved that this majorizer is the optimal majorizer of $h \circ g$ at $x^k$. The PL-IRL1 is summarized in Algorithm 3.
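In the same notation as above, the PL-IRL1 subproblem can be sketched as follows; the supergradient-based linear majorizer is the standard IRL1 construction for concave $h$, and the exact scaling is an assumption:
$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; \langle \nabla f(x^k), x - x^k \rangle + \frac{\lambda_k}{2}\|x - x^k\|^2 + p(x) + \langle w^k, g(x) \rangle, \qquad w^k \in \partial^+ h\big(g(x^k)\big).$$
This is a weighted convex problem, since the coordinate functions of $g$ are convex and $w^k \ge 0$ by the monotonicity of $h$.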
Remark 1.
We assume that $p$ is also continuously differentiable with a Lipschitz continuous gradient, and that $h$ is additively separable, i.e.,
$$h(y) = \sum_{i=1}^{m} h_i(y_i).$$
Then, the proximal iteratively reweighted algorithm [31] can be applied to the problem (1), which takes the following iterative form:
where the weights and the step parameter are chosen appropriately. Hence, our PL-IRL1 can also be regarded as an extension of the proximal iteratively reweighted algorithm.
Due to its optimality property, the iteratively reweighted $\ell_1$ algorithm has frequently been used in many applications. However, it cannot be applied to the problem (1) when h is not concave. For a nonconcave function h, the iteratively reweighted least squares algorithm (IRLS) is well known. For a proximal linearized version of the IRLS, we need the additional assumptions that h is additively separable and that each separable component is convex near the origin and concave away from it. The IRLS makes use of a convex majorizer
where the weights are chosen appropriately and the square denotes the coordinatewise square operation. This yields the following iterative algorithm:
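Concretely, under the separability assumption, the quadratic majorizer and the resulting PL-IRLS subproblem can be sketched as
$$h_i(y_i) \le h_i(y_i^k) + w_i^k \big( y_i^2 - (y_i^k)^2 \big), \qquad
x^{k+1} \in \operatorname*{arg\,min}_{x} \; \langle \nabla f(x^k), x - x^k \rangle + \frac{\lambda_k}{2}\|x - x^k\|^2 + p(x) + \sum_i w_i^k\, g_i(x)^2,$$
where $y^k = g(x^k)$ and $w_i^k$ is a supergradient of the concave map $s \mapsto h_i(\sqrt{s})$ at $s = (y_i^k)^2$. This weight formula is our reading of the construction and follows the classical IRLS majorization.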
The specific algorithm is given in Algorithm 4.
| Algorithm 4 Proximal Linearized Iteratively Reweighted Least Squares Algorithm (PL-IRLS). |
Conditions
Initialization Choose a starting point $x^0$ with $F(x^0) < \infty$. repeat Compute the weights $w^k$ from the current iterate. Solve
until the algorithm satisfies a stopping condition. |
The majorization property of the PL-IRLS can also be obtained from ([10], Proposition 23) when each separable component of $h$ has the form described above.
3.2. Convergence Analysis of the PL-ICMM
First, we prove a partial convergence result for the PL-ICMM. The following proposition shows that the sequence generated by Algorithm 2 yields a convergent sequence of objective function values.
Proposition 2.
Let $\{x^k\}_{k \in \mathbb{N}}$ be generated by Algorithm 2, and let $\lambda_k \ge \lambda$ for all $k$. If $\lambda > L_f$, then the sequence $\{F(x^k)\}_{k \in \mathbb{N}}$ monotonically decreases and hence converges.
Proof.
Let F be bounded below by L. We can obtain
where the second inequality is obtained from the property (2) and the majorization property of the surrogate, the third inequality is obtained from the optimality of the subproblem in Algorithm 2, and the last inequality is obtained from the majorization property. The sequence $\{F(x^k)\}$ decreases and is bounded from below; hence, it converges. ☐
We can also obtain the local convergence of the PL-ICMM, given in the following proposition.
Proposition 3.
Let F be coercive. Then the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded and has at least one accumulation point.
Proof.
By Proposition 2, the sequence $\{F(x^k)\}$ is monotonically decreasing, and therefore the sequence $\{x^k\}$ is contained in the level set
$$\{ x \in \mathbb{R}^n : F(x) \le F(x^0) \}.$$
From the coercivity of $F$, we conclude the boundedness of this level set. By the Bolzano–Weierstrass theorem, $\{x^k\}$ has at least one accumulation point. ☐
Now, we prove the global convergence of the proposed algorithm. We utilize the general framework for the convergence of an iterative method introduced in [18]. Specifically, we verify Hypotheses 1–3 with the objective function F of the problem (1) and the sequence generated by the PL-ICMM. As a result, we obtain the global convergence of the PL-ICMM by using Theorem 1. We further assume the following:
- h has a locally Lipschitz gradient on a compact set containing the sequence $\{g(x^k)\}$, and the majorizers have Lipschitz gradients on a compact set containing the sequence $\{x^k\}$ with a common Lipschitz constant.
To prove the global convergence of the PL-ICMM, we need the following lemma, which provides subdifferential calculus rules for compositions and sums of functions. The proof of this lemma is provided in ([10], Lemma 1).
Lemma 1.
Under the given conditions for the PL-ICMM, the following subdifferential rules hold:
- 1.
- For all ,
- 2.
- For all ,
First, we prove the sufficient decrease condition for the proposed method in the following proposition.
Proposition 4 (sufficient decrease condition).
Let $\lambda_k \ge \lambda > L_f$ for all $k$. Then there exists $\rho > 0$ such that, for all $k \in \mathbb{N}$,
$$F(x^{k+1}) + \rho \|x^{k+1} - x^k\|^2 \le F(x^k).$$
Proof.
From the property (2) of $C^{1,1}_{L_f}$ functions, we have
By the definition of the subdifferential of a convex function, we can obtain
where the two vectors are subderivatives of $p$ and of the majorizer at the respective points. Since $x^{k+1}$ is the minimizer of the problem
there exist the subderivatives and by Lemma 1 such that
From the facts that and , we obtain that for all ,
where the third inequality is obtained from Equations (8)–(10) and the last equality is obtained from the property (11). Let . Since , . Therefore, we can obtain the following result:
☐
The relative error condition (Hypothesis 2) for the PL-ICMM is proved in Proposition 5.
Proposition 5 (relative error condition).
For all $k \in \mathbb{N}$, there exist a positive constant $b$ (independent of $k$) and $w^{k+1} \in \partial F(x^{k+1})$ such that
$$\|w^{k+1}\| \le b \|x^{k+1} - x^k\|.$$
Proof.
By the optimality of the subproblem of the PL-ICMM and Lemma 1, there exist subderivatives satisfying
Let . Then, by Lemma 1 and the property of the subdifferential,
We can decompose
for some . Similarly, for any subderivative , can also be decomposed as
Hence, it follows from Lemma 1 that the resulting vector $w^{k+1}$ is a subderivative of $F$ at $x^{k+1}$, i.e., $w^{k+1} \in \partial F(x^{k+1})$.
From Proposition 4 and the coercivity of $F$, the sequence $\{x^k\}$ is bounded. Hence, we can find a compact set containing this sequence in $\mathbb{R}^n$. The convexity of $g$ implies its Lipschitz continuity on a compact, convex subset of $\mathbb{R}^n$ containing $\{x^k\}$ for all $k$. From the further assumption, the gradients of $h$ and of the majorizers are Lipschitz continuous on a compact, convex subset $B$ of $\mathbb{R}^n$ containing the iterates for all $k$. Taking the maximum of the Lipschitz constants of $g$ and of these gradients yields a common Lipschitz constant on $B$. By the local Lipschitz continuity of $g$, we can obtain
Since , the following identities hold:
Proposition 6 (continuity condition).
There exist a convergent subsequence $\{x^{k_j}\}_{j \in \mathbb{N}}$ of $\{x^k\}_{k \in \mathbb{N}}$ and its limit $\tilde{x}$ satisfying
$$F(x^{k_j}) \to F(\tilde{x}) \quad \text{as } j \to \infty.$$
Proof.
The boundedness of $\{x^k\}$ implies the existence of a convergent subsequence. Let $\{x^{k_j}\}$ be a convergent subsequence of $\{x^k\}$ such that $x^{k_j} \to \tilde{x}$ as $j \to \infty$. We define two auxiliary sequences such that
Let . Clearly, is a convex function. Due to the strict continuity of , is bounded. Clearly, is bounded from the continuity of . Therefore, is bounded and . Using the facts of the lower semicontinuity of F, the continuity of f and the convexity of p, we obtain
Thus, $F(x^{k_j}) \to F(\tilde{x})$ as $j \to \infty$. ☐
In Propositions 4–6, we showed that the PL-ICMM satisfies Hypotheses 1–3. Finally, we obtain the global convergence of the proposed algorithm.
Theorem 2.
Let $F$ be a proper l.s.c. function, and let the sequence $\{x^k\}_{k \in \mathbb{N}}$ be generated by the PL-ICMM. If $F$ has the KL property at the cluster point $\tilde{x}$, then the sequence converges to $\tilde{x}$ as $k \to \infty$, and $\tilde{x}$ is a critical point of $F$. Moreover, the sequence has finite length, i.e.,
$$\sum_{k=0}^{\infty} \|x^{k+1} - x^k\| < +\infty.$$
Proof.
Propositions 4–6 yield all of the requirements of Theorem 1. According to Theorem 1, we obtain the stated results. ☐
4. Numerical Experiments and Discussion
In this section, we present numerical results for the proposed methods and provide applications of the proposed algorithms. We consider compressive sensing in signal processing. All numerical experiments are implemented in MATLAB R2020b on a 64-bit Windows 10 desktop with an Intel Xeon(R) 2.40 GHz CPU and 64 GB RAM.
4.1. Numerical Results for PL-IRL1
First, we show the performance of the PL-IRL1. The main concept of compressive sensing is that a sparse signal can be recovered from incomplete information, i.e., from an underdetermined system $Ax = b$ with fewer measurements than unknowns. We say that $x$ is $k$-sparse if $x$ has only $k$ nonzero elements. The compressive sensing problem is generally ill-posed, and there exist many solutions mathematically. To obtain a sparse solution, the basic model for compressive sensing, called the lasso, has the following form:
where $A \in \mathbb{R}^{m \times n}$ with $m < n$, $b \in \mathbb{R}^m$, and the regularization parameter is positive. This problem is a convex relaxation of the following nonconvex minimization problem:
where $\|x\|_0$ is defined as the number of nonzero elements of the input $x$. Recently, sparse signal recovery from an observed signal corrupted by impulsive noise has attracted interest in many works [32,33,34,35]. For sparse recovery with impulsive noise, the following $\ell_1$-fidelity based convex problem is often applied:
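For reference, the standard forms of the three models just mentioned are sketched below; the parameter placement follows common usage and may differ from the paper's exact scaling:
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \mu \|x\|_1 \ \ \text{(lasso)}, \qquad
\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \mu \|x\|_0, \qquad
\min_{x \in \mathbb{R}^n} \|Ax - b\|_1 + \mu \|x\|_1 \ \ (\ell_1\text{-fidelity}).$$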
We also consider nonconvex variations of this model for a compressive sensing problem with impulsive noise as follows:
and
where the parameter is positive. Unfortunately, the PL-IRL1 cannot be directly applied to these nonconvex problems. Hence, we add an auxiliary variable and adopt the penalty technique, leading to the following nonconvex and nonsmooth minimization problems:
and
where the penalty constant is positive. We set
In this setting, the minimization problems (14) and (15) have the form of the problem (1). Since the objective function of the problem (14) is definable in the log-exp o-minimal structure, it is a KL function. The objective function of the minimization problem (15) is a semialgebraic function, and hence it is also a KL function. Moreover, both are closed and coercive. The function f is convex, proper and continuously differentiable, p is convex, proper and l.s.c., and g is proper and l.s.c. The function h is coordinatewise nondecreasing, continuously differentiable and concave. Since the objective function involves a nondifferentiable term, the proximal iteratively reweighted algorithm proposed in [31] cannot be applied. Hence, we apply the IRL1 [10] or the PL-IRL1 to solve the given problem (14). The IRL1 and the PL-IRL1 applied to the problem (14) are given as follows:
and
respectively. For the convex subproblem of the IRL1, the optimality conditions are given as follows:
This system (18) of equations does not have a closed-form solution. Hence, the IRL1 cannot be employed to solve the problem (14). On the other hand, the optimality conditions of the convex subproblem of our method are given as
These equations in (19) are separable, and each has a closed-form solution:
where the shrink function is defined as
$$\operatorname{shrink}(x, \tau) = \operatorname{sign}(x)\, \max(|x| - \tau, 0).$$
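For completeness, here is a direct implementation of the shrink (soft-thresholding) operator; it acts coordinatewise, which is exactly what makes the separable system (19) solvable in closed form.

```python
import numpy as np

def shrink(x, tau):
    """Soft-thresholding: shrink(x, tau) = sign(x) * max(|x| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

# Example: each coordinate is shrunk toward zero by tau = 0.5.
x = np.array([-1.5, 0.3, 2.0])
print(shrink(x, 0.5))  # approximately [-1.  0.  1.5]
```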
To show the convergence of the PL-IRL1 in solving the problems (14) and (15), we perform numerical experiments in the following setting. The size of $A$ is fixed in all tests. We use a Gaussian measurement matrix $A$ whose entries are randomly drawn from the standard Gaussian distribution, after which each column of $A$ is divided by its norm. The number $l$ of nonzero elements of the original sparse signal is fixed at 50, the locations of the nonzero elements are selected randomly, and the values of the nonzero elements are drawn from a Gaussian distribution. The observed data $b$ is calculated by
where n is Gaussian mixture noise consisting of two Gaussian components, a common type of impulsive noise in signal processing [36]. The mixture model of n is given by
where the two components are Gaussian noises with mean 0 and standard deviations $\sigma_1$ and $\sigma_2$, respectively. The first component denotes the background noise, and the second represents the influence of outliers. One parameter controls the proportion of the large outliers, and $\sigma_2$ controls their strength; these parameters are fixed in all tests.
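The Gaussian mixture noise can be generated as in the sketch below, which assumes the standard two-component form (background component with probability $1 - r$, outlier component with probability $r$); the parameter names `sigma1`, `sigma2` and `r` are ours, since the paper's fixed values are elided here.

```python
import numpy as np

def gaussian_mixture_noise(m, sigma1, sigma2, r, rng=None):
    """Impulsive noise as a two-component Gaussian mixture.

    Each sample comes from N(0, sigma1^2) (background noise) with
    probability 1 - r, and from N(0, sigma2^2) (outliers, sigma2 >> sigma1)
    with probability r.
    """
    rng = np.random.default_rng() if rng is None else rng
    outlier = rng.random(m) < r          # Bernoulli(r) outlier indicator
    background = rng.normal(0.0, sigma1, m)
    outliers = rng.normal(0.0, sigma2, m)
    return np.where(outlier, outliers, background)
```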
Since the columns of $A$ are normalized, the Lipschitz constant of the gradient of the data term can be estimated, and the proximal parameter is set accordingly. The regularization parameter is fixed separately for (14) and (15); the penalty parameter is set to 2 for (14) and 28 for (15); and the parameter controlling the nonconvexity is also fixed. For the stopping condition of our algorithm, we use relative errors over energy function values, whose specific formulation is given as
$$\frac{|F(x^{k+1}) - F(x^k)|}{|F(x^k)|} < \varepsilon$$
for a small tolerance $\varepsilon > 0$.
For this setting, 100 different numerical tests are conducted.
In Table 1, we present the computing time, number of iterations, final energy value, and relative error. From the relative errors, we observe that the proposed algorithm finds an approximate sparse solution with small error in all cases. In Figure 1, we illustrate the relative errors and energy values over the iterations k. Since the penalty parameter for (14) is smaller than that for (15), solving (14) with our method is faster than solving (15); similar results hold for the number of iterations. Ultimately, Figure 1 and Table 1 show the fast convergence of the PL-IRL1, with a sufficiently small final energy value.
4.2. Numerical Results for PL-IRLS
Second, we present numerical results of the PL-IRLS compared with the iPiano method [17] for the compressive sensing problem in signal processing. Specifically, we consider the restoration of a sparse signal corrupted by additive Gaussian noise. We apply the PL-IRLS and iPiano algorithms to the following unconstrained problems:
and
where the parameters are positive, c is a positive constant, and the remaining parameter controls the nonconvexity of the regularizing term. These problems are nonconvex variations of the lasso, a well-known model for compressive sensing. The objective functions of the problems (20) and (21) are definable in the log-exp o-minimal structure, and they are also closed, coercive KL functions. With the setting
all assumptions in Algorithm 4 are satisfied. Since the norm of the Hessian matrix of h is bounded on the relevant compact set, h also has a strictly continuous gradient. The PL-IRLS applied to the problems (20) and (21) is given by
The subproblem in (22) is a quadratic problem, and its normal equation is given as follows:
It can be rewritten as the following linear equation,
where the coefficient matrix is diagonal, with diagonal entries formed from the weights. Since it is diagonal, this linear equation can be solved exactly and easily. The majorization property of the PL-IRLS is obtained from ([10], Proposition 23).
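The sketch below illustrates why the diagonal structure matters: with an implicit partial-DCT operator, one PL-IRLS step reduces to two fast transforms and a coordinatewise division. The objective split $f(x) = \frac{1}{2}\|Ax - b\|^2$, the placement of the proximal parameter, and the function names are assumptions for illustration, not the paper's exact formulas.

```python
import numpy as np
from scipy.fft import dct, idct

def make_partial_dct(n, rows):
    """Implicit partial DCT operator: A x = (DCT x)[rows]; A^T via inverse DCT."""
    def A(x):
        return dct(x, norm='ortho')[rows]
    def At(y):
        full = np.zeros(n)
        full[rows] = y
        return idct(full, norm='ortho')
    return A, At

def pl_irls_step(x, A, At, b, weights, lam, mu):
    """One PL-IRLS step for f(x) = 0.5 * ||Ax - b||^2 (a sketch).

    Minimizing <grad_f(x), z - x> + (lam/2)||z - x||^2 + mu * sum_i w_i z_i^2
    gives the diagonal normal equation (lam + 2*mu*w_i) z_i = lam*x_i - grad_i.
    """
    grad = At(A(x) - b)  # gradient of the smooth data term at x
    return (lam * x - grad) / (lam + 2.0 * mu * weights)
```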
In this experiment, we use partial discrete cosine transform (DCT) matrices A whose rows are selected from the full DCT matrix. We note that the partial DCT matrices are stored implicitly, i.e., matrix–vector multiplications with $A$ or $A^T$ are computed by the DCT or the inverse DCT. Hence, we can use partial DCT matrices of very large sizes. Here, the size is fixed at (100,000, 30,000). The original IRLS could, in principle, be applied to the nonsmooth and nonconvex problem (20); its iteration is given by
Since the size of our measurement matrix is very large, finding the exact solution of this linear equation is time consuming and seems to be impossible in many cases.
The number l of nonzero elements of the sparse signal is fixed at 2000, and the locations of the nonzero elements are randomly chosen. The values of the nonzero elements are drawn from the standard Gaussian distribution. The observed data b is generated by the formula
where n is Gaussian white noise with mean 0 and standard deviation 0.02. The regularization parameters are fixed separately for (20) and (21) in all tests, and the remaining model parameters are also fixed. The proximity parameter in our method is set using the bound on the operator norm, since the norm of a partial DCT matrix is less than or equal to 1.
We present the mean values and standard deviations over 100 trials of the computing time, number of iterations, energy value and relative error in Table 2. In Figure 2, we plot the relative errors and energy values of the PL-IRLS and iPiano over the iterations; the figure confirms the convergence of both methods. The average energy values and relative errors of the PL-IRLS are almost the same as those of iPiano, which shows that the two methods recover almost the same sparse solutions and perform similarly in terms of accuracy and minimization of the energy functional. On the other hand, Table 2 and Figure 2 show that the PL-IRLS is faster than iPiano. Therefore, the PL-IRLS is superior to iPiano for solving the nonconvex and nonsmooth problems (20) and (21).
5. Conclusions
In this paper, we proposed proximal linearized iteratively reweighted algorithms to solve the nonconvex and nonsmooth unconstrained minimization problem (1). Based on a general unified framework, we suggested an extension of the iterative convex majorization–minimization method for solving (1). Moreover, extended versions of the iteratively reweighted $\ell_1$ algorithm and the iteratively reweighted least squares algorithm were introduced. The global convergence of the proposed algorithm was proved under suitable assumptions. Lastly, numerical results for compressive sensing demonstrated that the proposed methods provide outstanding performance compared with state-of-the-art methods. Recently, several algorithms have been extended by imposing an additional inertial term, resulting in a faster convergence rate. In the future, we will study a proximal linearized reweighted algorithm with an inertial force.
Author Contributions
Conceptualization, M.K.; Formal analysis, J.Y. and M.K.; Funding acquisition, M.K.; Methodology, J.Y. and M.K.; Software, M.K.; Validation, J.Y. and M.K.; Writing–original draft, J.Y. and M.K.; Writing–review and editing, J.Y. and M.K. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by research fund of Chungnam National University.
Acknowledgments
The authors would like to thank the referees for their helpful comments. The authors would like to thank Chungnam National University for funding support.
Conflicts of Interest
The authors declare no conflict of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| l.s.c. | Lower semicontinuous |
| ICMM | Iterative convex majorization–minimization method |
| KL | Kurdyka–Łojasiewicz |
| PL-ICMM | Proximal linearized iterative convex majorization–minimization method |
| PL-IRL1 | Proximal linearized iteratively reweighted ℓ1 algorithm |
| PL-IRLS | Proximal linearized iteratively reweighted least squares algorithm |
| IRL1 | Iteratively reweighted ℓ1 algorithm |
| DCT | Discrete cosine transform |
References
- Curry, H.B. The method of steepest descent for non-linear minimization problems. Q. Appl. Math. 1944, 2, 258–261.
- Nesterov, Y.E. A method for solving the convex programming problem with convergence rate O(1/k²). Dokl. Akad. Nauk SSSR 1983, 269, 543–547.
- Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202.
- Rockafellar, R.T. Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 1976, 14, 877–898.
- Kaplan, A.; Tichatschke, R. Proximal point methods and nonconvex optimization. J. Glob. Optim. 1998, 13, 389–406.
- Gong, P.; Zhang, C.; Lu, Z.; Huang, J.; Ye, J. A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems. In Proceedings of the 30th International Conference on Machine Learning, PMLR 2013, 28, 37–45.
- Attouch, H.; Bolte, J.; Redont, P.; Soubeyran, A. Proximal alternating minimization and projection methods for nonconvex problems: An approach based on the Kurdyka–Łojasiewicz inequality. Math. Oper. Res. 2010, 35, 438–457.
- Candes, E.J.; Wakin, M.B.; Boyd, S.P. Enhancing sparsity by reweighted ℓ1 minimization. J. Fourier Anal. Appl. 2008, 14, 877–905.
- Lai, M.J.; Xu, Y.; Yin, W. Improved iteratively reweighted least squares for unconstrained smoothed ℓq minimization. SIAM J. Numer. Anal. 2013, 51, 927–957.
- Ochs, P.; Dosovitskiy, A.; Brox, T.; Pock, T. On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision. SIAM J. Imaging Sci. 2015, 8, 331–372.
- Łojasiewicz, S. Sur la géométrie semi- et sous-analytique. Ann. Inst. Fourier 1993, 43, 1575–1595.
- Łojasiewicz, S. Une propriété topologique des sous-ensembles analytiques réels. Les Équations aux Dérivées Partielles 1963, 117, 87–89.
- Bolte, J.; Daniilidis, A.; Lewis, A.; Shiota, M. Clarke subgradients of stratifiable functions. SIAM J. Optim. 2007, 18, 556–572.
- Bolte, J.; Daniilidis, A.; Lewis, A. The Łojasiewicz inequality for nonsmooth subanalytic functions with applications to subgradient dynamical systems. SIAM J. Optim. 2007, 17, 1205–1223.
- Bolte, J.; Combettes, P.L.; Pesquet, J.C. Alternating proximal algorithm for blind image recovery. In Proceedings of the 2010 IEEE International Conference on Image Processing, Hong Kong, China, 26–29 September 2010; pp. 1673–1676.
- Attouch, H.; Bolte, J. On the convergence of the proximal algorithm for nonsmooth functions involving analytic features. Math. Program. 2009, 116, 5–16.
- Ochs, P.; Chen, Y.; Brox, T.; Pock, T. iPiano: Inertial proximal algorithm for nonconvex optimization. SIAM J. Imaging Sci. 2014, 7, 1388–1419.
- Attouch, H.; Bolte, J.; Svaiter, B.F. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods. Math. Program. 2013, 137, 91–129.
- Osher, S.; Mao, Y.; Dong, B.; Yin, W. Fast linearized Bregman iteration for compressive sensing and sparse denoising. Commun. Math. Sci. 2010, 8, 93–111.
- Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program. 2014, 146, 459–494.
- Rockafellar, R.T.; Wets, R.J.B. Variational Analysis; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2009; Volume 317.
- Mordukhovich, B.S.; Nam, N.M.; Yen, N. Fréchet subdifferential calculus and optimality conditions in nondifferentiable programming. Optimization 2006, 55, 685–708.
- Bochnak, J.; Coste, M.; Roy, M.F. Real Algebraic Geometry; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 36.
- Wilkie, A.J. Model completeness results for expansions of the ordered field of real numbers by restricted Pfaffian functions and the exponential function. J. Am. Math. Soc. 1996, 9, 1051–1094.
- van den Dries, L. Tame Topology and O-Minimal Structures; Cambridge University Press: Cambridge, UK, 1998; Volume 248.
- Kang, M. Approximate versions of proximal iteratively reweighted algorithms including an extended IP-ICMM for signal and image processing problems. J. Comput. Appl. Math. 2020, 376, 112837.
- Combettes, P.L.; Wajs, V.R. Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 2005, 4, 1168–1200.
- Chartrand, R.; Yin, W. Iteratively reweighted algorithms for compressive sensing. In Proceedings of the 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, 31 March–4 April 2008; pp. 3869–3872.
- Daubechies, I.; DeVore, R.; Fornasier, M.; Güntürk, C.S. Iteratively reweighted least squares minimization for sparse recovery. Commun. Pure Appl. Math. 2010, 63, 1–38.
- Needell, D. Noisy signal recovery via iterative reweighted L1-minimization. In Proceedings of the 2009 Conference Record of the Forty-Third Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1–4 November 2009; pp. 113–117.
- Sun, T.; Jiang, H.; Cheng, L. Global convergence of proximal iteratively reweighted algorithm. J. Glob. Optim. 2017, 68, 815–826.
- Ji, Y.; Yang, Z.; Li, W. Bayesian sparse reconstruction method of compressed sensing in the presence of impulsive noise. Circuits Syst. Signal Process. 2013, 32, 2971–2998.
- Javaheri, A.; Zayyani, H.; Figueiredo, M.A.; Marvasti, F. Robust sparse recovery in impulsive noise via continuous mixed norm. IEEE Signal Process. Lett. 2018, 25, 1146–1150.
- Wen, F.; Liu, P.; Liu, Y.; Qiu, R.C.; Yu, W. Robust sparse recovery in impulsive noise via ℓp-ℓ1 optimization. IEEE Trans. Signal Process. 2016, 65, 105–118.
- Wen, J.; Weng, J.; Tong, C.; Ren, C.; Zhou, Z. Sparse signal recovery with minimization of 1-norm minus 2-norm. IEEE Trans. Veh. Technol. 2019, 68, 6847–6854.
- Wen, F.; Pei, L.; Yang, Y.; Yu, W.; Liu, P. Efficient and robust recovery of sparse signal and image using generalized nonconvex regularization. IEEE Trans. Comput. Imaging 2017, 3, 566–579.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

