Proximal Linearized Iteratively Reweighted Algorithms for Nonconvex and Nonsmooth Optimization Problem

Abstract: Nonconvex and nonsmooth optimization problems have been attracting increasing attention in recent years in image processing and machine learning research. Algorithms based on an iteratively reweighted step have been widely used in many applications. In this paper, we propose a new, extended version of the iterative convex majorization-minimization method (ICMM) for solving a nonconvex and nonsmooth minimization problem, which covers well-known iteratively reweighted methods as special cases. To prove the convergence of the proposed algorithm, we adopt the general unified framework based on the Kurdyka-Łojasiewicz inequality. Numerical experiments validate the effectiveness of the proposed algorithm compared to existing methods.


Introduction
In this paper, we consider the following nonconvex and nonsmooth optimization problem of a specific structure in an $n$-dimensional real vector space:
$$\min_{x \in \mathbb{R}^n} F(x) := f(x) + p(x) + h(g(x)), \tag{1}$$
where $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is a proper, lower semicontinuous (l.s.c.), convex and continuously differentiable function whose gradient is Lipschitz continuous with Lipschitz constant $L_f$; $p : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is a proper, l.s.c. and convex function; $g : \mathbb{R}^n \to \mathbb{R}^m$ is a proper and l.s.c. function; and $h : \operatorname{Im}(g) \to \mathbb{R}$ is continuously differentiable. Furthermore, we assume that the coordinate functions $g_i$ of $g$ are convex, and that $h$ has a strictly continuous gradient and is coordinate-wise nondecreasing, i.e., $h(x) \le h(x + \lambda e_i)$ whenever $x, x + \lambda e_i \in \operatorname{Im}(g)$ and $\lambda > 0$, where $e_i$ is the $i$-th standard basis vector, $i = 1, \cdots, m$. We also suppose that $F$ is coercive, closed and definable in an o-minimal structure. Several nonconvex optimization problems in image or signal processing have an objective function of the form (1). For example, a nonconvex and nonsmooth minimization problem for image denoising over $u \in \mathbb{R}^{n \times m}$ can be cast in the form of the problem (1) with an objective function satisfying all of the above assumptions; here, $b \in \mathbb{R}^{n \times m}$ is an observed noisy image, $u$ is the restored image, and $\alpha_1, \alpha_2$ are positive parameters. A nonconvex minimization for the compressive sensing problem is another example of the proposed problem (1), where $\beta, \mu, \rho$ are positive parameters, $A$ is an $m \times n$ matrix with $m \ll n$, and $b$ is an observed signal. As we will see in the numerical experiments, we apply the proposed method to this application.
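To fix ideas, here is a minimal sketch (our own illustration, not code from the paper) of how the three building blocks of (1) combine in the penalized compressive-sensing instance studied in Section 4; the splitting into $(x, z)$ and the default parameter values follow that section, and all names are hypothetical.

```python
import numpy as np

def F(x, z, A, b, beta=2.0, mu=0.2, rho=0.1):
    """Objective of the form (1): F = f + p + h(g(.)) (illustrative sketch)."""
    f = 0.5 * beta * np.linalg.norm(A @ x - z - b) ** 2  # smooth, Lipschitz gradient
    p = mu * np.linalg.norm(z, 1)                        # proper, convex, l.s.c.
    g = np.abs(x)                                        # coordinate-wise convex
    h = np.sum(np.log1p(rho * g))                        # C^1, nondecreasing, concave
    return f + p + h
```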
Minimizing the sum of a finite number of given functions is an important issue in mathematical optimization research. For minimizing convex functions, many efficient algorithms have been proposed with convergence analysis, such as gradient-based methods [1,2], the iterative shrinkage thresholding algorithm [3], the proximal point method [4] and the alternating minimization algorithm. On the other hand, it is difficult to prove the global convergence of an algorithm for solving a nonconvex minimization problem. Nevertheless, several algorithms for solving nonconvex minimization problems have been developed. Extensions of many first-order algorithms to the nonconvex setting have been proposed, such as the gradient method, the proximal point method [5] and the iterative shrinkage thresholding algorithm [6] for nonconvex optimization. Recently, Attouch et al. [7] extended the alternating minimization algorithm by adding a proximal term to minimize a nonconvex function. The iteratively reweighted $\ell_1$ algorithm [8] was proposed for solving a nonconvex minimization problem in compressive sensing. The iteratively reweighted least squares method [9] was also developed for the nonconvex $\ell_p$ norm-based model applied to the compressive sensing problem. Very recently, Ochs et al. [10] generalized these algorithms into a framework called the iterative convex majorization-minimization method (ICMM) for solving nonsmooth and nonconvex optimization problems and gave a global convergence analysis.
The Kurdyka-Łojasiewicz (KL) inequality is key when proving the global convergence of algorithms for nonsmooth and nonconvex optimization problems. A function which satisfies the KL inequality is called a KL function. Smooth and nonsmooth KL functions were introduced in [11-14]. Almost all objective functions of minimization problems in image processing satisfy the KL inequality, which makes it very useful when we deal with nonconvex objective functions. Many methods [7,15,16] whose global convergence is based on the KL inequality have been proposed. Recently, Ochs et al. [17] proposed an inertial proximal algorithm by combining forward-backward splitting with an inertial force. Attouch et al. [18] proposed a general framework for the global convergence of descent methods for minimizing KL functions. In this paper, we utilize this general framework for the convergence analysis of the proposed method.
In general, a nonlinear optimization problem does not have a closed-form solution. Hence, iterative algorithms are frequently used to solve nonlinear minimization problems. Several algorithms iteratively minimize a linear approximation of a nonlinear differentiable objective function; this technique is called "linearization". The linearization of a continuously differentiable objective function has been applied in many algorithms [3,19,20] to solve constrained or unconstrained optimization problems.
The ICMM [10] is a popular algorithm for solving the problem (1). In this article, we propose an extension of the ICMM to solve the nonconvex and nonsmooth minimization problem (1). Unlike the ICMM, the convex differentiable function $f$ is linearized, which enables the proposed method to handle more applications. Further details of the applications are given in Section 3. Based on the general framework introduced in [18], we prove the global convergence of the proposed method. Numerical experiments, presented in Section 4, demonstrate the superiority of the proposed method over existing methods.
The rest of this paper is organized as follows. In Section 2, we present mathematical preliminaries for nonconvex optimization and introduce the ICMM. In Section 3, we propose an extended version of the ICMM, together with iteratively reweighted algorithms that arise as special instances, and prove the global convergence of the proposed algorithm. In Section 4, numerical experiments are provided for our method, with comparisons to state-of-the-art methods. Finally, Section 5 summarizes our work.

Background
In this section, we present mathematical preliminaries for our work and introduce the iterative convex majorization-minimization method [10].

Mathematical Preliminary
In Section 2.1, we introduce basic mathematical concepts and properties. More details are given in [21,22].
The concept of the Lipschitz continuity of a function is important in mathematical optimization theory. In the problem (1), we consider a continuously differentiable function which belongs to the class of functions with Lipschitz gradient, denoted by
$$C^{1,1} := \{ f : \mathbb{R}^n \to \mathbb{R} \mid f \text{ is continuously differentiable and } \nabla f \text{ is Lipschitz continuous} \}.$$
A function with Lipschitz gradient enjoys the following property: if $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function whose gradient is Lipschitz continuous with Lipschitz constant $L_f$, i.e., $f \in C^{1,1}$, then, for any $L \ge L_f$, the following inequality holds:
$$f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2 \quad \text{for all } x, y \in \mathbb{R}^n. \tag{2}$$
Now we introduce the generalized subdifferentials for a nonsmooth and nonconvex function.
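As a quick numerical illustration of the inequality (2) (our own sketch, not from the paper), consider the least-squares function $f(x) = \frac{1}{2}\|Ax - b\|^2$, whose gradient is Lipschitz with $L_f = \|A^\top A\|_2$:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)

f = lambda x: 0.5 * np.linalg.norm(A @ x - b) ** 2
grad_f = lambda x: A.T @ (A @ x - b)
L = np.linalg.norm(A.T @ A, 2)          # Lipschitz constant of grad f

x, y = rng.standard_normal(10), rng.standard_normal(10)
bound = f(x) + grad_f(x) @ (y - x) + 0.5 * L * np.sum((y - x) ** 2)
assert f(y) <= bound + 1e-10            # the quadratic upper bound (2) holds
```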
A trivial property of the subdifferential is that $\hat{\partial} f(x) \subseteq \partial f(x)$. Moreover, the subdifferential has many important properties, collected in the following proposition; these will be used to prove the convergence of the proposed method.

Proposition 1. Let $f, g : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be proper and l.s.c. functions.
1. For all $x \in \mathbb{R}^n$, $\hat{\partial} f(x) \subseteq \partial f(x)$.
2. If $f$ is convex, the regular and limiting subdifferentials coincide with the subdifferential of convex analysis, i.e., $\hat{\partial} f(x) = \partial f(x) = \{ v \in \mathbb{R}^n : f(y) \ge f(x) + \langle v, y - x \rangle \ \text{for all } y \in \mathbb{R}^n \}$; in particular, if $f$ is in addition differentiable at $x$, then $\partial f(x) = \{\nabla f(x)\}$.
3. If $g$ is an l.s.c. function and $f$ is continuously differentiable on a neighborhood of $\bar{x}$, then the subdifferential of $f + g$ is
$$\partial (f + g)(\bar{x}) = \nabla f(\bar{x}) + \partial g(\bar{x}).$$
4. (Fermat's rule) If a proper and l.s.c. function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ has a local minimum at $\bar{x}$, then $0 \in \partial f(\bar{x})$. Furthermore, if $f$ is convex, this condition is also sufficient for a global minimum.
To obtain the global convergence of the proposed algorithm, the objective function in the problem (1) must be a Kurdyka-Łojasiewicz function.

Definition 2.
A function $f : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ satisfies the Kurdyka-Łojasiewicz (KL) property at a point $x^* \in \operatorname{dom} \partial f$ if there exist $\eta \in (0, +\infty]$, a neighborhood $U$ of $x^*$, and a continuous concave function $\varphi : [0, \eta) \to \mathbb{R}_+$ with $\varphi(0) = 0$, continuously differentiable on $(0, \eta)$ with $\varphi' > 0$, such that for all $x \in U \cap \{x : f(x^*) < f(x) < f(x^*) + \eta\}$ the KL inequality holds:
$$\varphi'\big(f(x) - f(x^*)\big) \cdot \operatorname{dist}\big(0, \partial f(x)\big) \ge 1.$$
A function satisfying the KL property at every point of $\operatorname{dom} \partial f$ is called a KL function. Although the KL property is a strong condition, many functions are KL functions. The semialgebraic functions [23] are typical examples of KL functions. The set of semialgebraic functions includes polynomials, indicator functions of semialgebraic sets [23], and the $\ell_2$ norm. Compositions, finite sums, and finite products of semialgebraic functions are also semialgebraic. However, the log and exponential functions are not semialgebraic. Recently, a class of definable functions in the log-exp o-minimal structure [24,25] was proposed, which contains the log and exponential functions as well as all semialgebraic functions. Moreover, it has been proved that the functions in this class are KL functions. More details on the log-exp o-minimal structure are given in [24,25].
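As a simple worked example (ours, not from the original text), $f(x) = x^2$ satisfies the KL property at its critical point $x^* = 0$ with the desingularizing function $\varphi(s) = \sqrt{s}$, since for all $x \neq 0$ near $0$:

```latex
\varphi'\big(f(x) - f(x^*)\big)\,\operatorname{dist}\big(0, \partial f(x)\big)
  = \frac{1}{2\sqrt{x^2}}\,\lvert 2x \rvert = 1 \;\ge\; 1 .
```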
Lastly, we recall the general framework for an iterative algorithm when the objective function of an unconstrained minimization problem is a KL function. We consider a nonconvex unconstrained minimization problem:
$$\min_{x \in \mathbb{R}^n} F(x), \tag{3}$$
where $F$ is an l.s.c. and proper function. Attouch et al. [18] suggested a framework for the convergence of an iterative method to solve the nonconvex minimization problem (3). They proved that a given algorithm converges when the following three conditions hold. Let $\{x^k\}_{k \in \mathbb{N}}$ be a sequence generated by a given iterative algorithm.
Hypothesis 1 (sufficient decrease condition). There exists a positive value $a > 0$ such that for each $k \in \mathbb{N}$,
$$F(x^{k+1}) + a\|x^{k+1} - x^k\|^2 \le F(x^k).$$

Hypothesis 2 (relative error condition). For each $k \in \mathbb{N}$, there exists $w^{k+1} \in \partial F(x^{k+1})$ such that
$$\|w^{k+1}\| \le b\,\|x^{k+1} - x^k\|,$$
where $b$ is a fixed positive constant.

Hypothesis 3 (continuity condition). There exist a subsequence $\{x^{k_j}\}_{j \in \mathbb{N}}$ and $\bar{x}$ such that
$$x^{k_j} \to \bar{x} \quad \text{and} \quad F(x^{k_j}) \to F(\bar{x}) \quad \text{as } j \to \infty.$$

The following theorem is the convergence result, and the proof is given in ([18], Theorem 2.9).

Theorem 1. Let $F : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be a proper l.s.c. function. We consider a sequence $\{x^k\}_{k \in \mathbb{N}}$ that satisfies Hypotheses 1-3. If $F$ has the KL property at the cluster point $\bar{x}$ specified in Hypothesis 3, then the sequence $\{x^k\}_{k \in \mathbb{N}}$ converges to $\bar{x}$ as $k$ goes to infinity, and $\bar{x}$ is a critical point of $F$. Moreover, the sequence $\{x^k\}_{k \in \mathbb{N}}$ has finite length, i.e.,
$$\sum_{k=0}^{\infty} \|x^{k+1} - x^k\| < \infty.$$

Iterative Convex Majorization-Minimization Method
This section recalls the iterative convex majorization-minimization method (ICMM) [10] for solving the following nonconvex minimization problem:
$$\min_{x \in \mathbb{R}^n} G(x) := p(x) + h(g(x)), \tag{4}$$
where $G : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is proper, l.s.c., and bounded below. Further assumptions are required: $p : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ is proper, l.s.c., and convex; $g : \mathbb{R}^n \to \mathbb{R}^m$ is l.s.c. with convex coordinate functions; and $h : \operatorname{Im}(g) \to \mathbb{R}$ is continuously differentiable and coordinate-wise nondecreasing. The ICMM is a well-known iterative algorithm for solving the nonconvex problem (4). It adopts the majorization-minimization technique: it chooses a suitable family of convex surrogate functions, called majorizers, and minimizes a convex majorizer instead of the objective function $G$ at each iteration. The specific procedure is summarized in Algorithm 1.

Algorithm 1 Iterative Convex Majorization-Minimization Method (ICMM).
Initialization: Choose a starting point $x^0 \in \mathbb{R}^n$ with $G(x^0) < \infty$ and define a suitable family of convex majorizers $(h_x)_{x \in \mathbb{R}^n}$.
repeat: Solve
$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; p(x) + h_{x^k}(g(x)).$$
until: The algorithm satisfies a stopping condition.

The convergence of Algorithm 1 was studied in [10]. Additional conditions are required for the global convergence of the ICMM. First, $h$ should have a locally Lipschitz continuous gradient on a compact set $B$ containing all $x^k$, and the majorizers $h_x$ should have globally Lipschitz continuous gradients on $B$ for all $x \in \mathbb{R}^n$, with a uniform Lipschitz constant. Another, stronger condition is that $p + h_x \circ g$ should be strongly convex. To show the global convergence of the ICMM, it was proved that the objective function $G$ satisfies the three Hypotheses 1-3 introduced in the previous section whenever $G$ is a KL function; then Theorem 1 applies. As examples of the ICMM, several iteratively reweighted convex algorithms were introduced in [10], such as the iteratively reweighted $\ell_1$ algorithm, the iteratively reweighted tight convex algorithm, the iteratively reweighted Huber algorithm and the iteratively reweighted least squares algorithm.
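In code, the outer loop of Algorithm 1 has the following shape; this is a sketch under the assumption that an exact solver for the convex surrogate model is available, and all names are ours:

```python
import numpy as np

def icmm(x0, solve_surrogate, max_iter=100, tol=1e-8):
    """Sketch of Algorithm 1 (ICMM). `solve_surrogate(xk)` must return an
    exact minimizer of the convex model p(x) + h_{x^k}(g(x)) built from a
    convex majorizer h_{x^k} of h touching h at g(x^k)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        x_new = solve_surrogate(x)              # exact convex subproblem solve
        if np.linalg.norm(x_new - x) < tol:     # a simple stopping condition
            return x_new
        x = x_new
    return x
```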

Proximal Linearized Iterative Convex Majorization-Minimization Method
In this section, we propose a novel algorithm for solving the nonconvex and nonsmooth minimization problem (1). The ICMM in Algorithm 1, introduced in the previous section, can be applied to the problem (1), since $f(x) + p(x)$ is also a proper, l.s.c., convex function. This yields the following iteration:
$$x^{k+1} \in \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; f(x) + p(x) + h_{x^k}(g(x)). \tag{5}$$
In many cases, this problem does not have a closed-form solution and we cannot compute the exact solution of (5). Since the convergence of the ICMM is only guaranteed under the assumption that the subproblem (5) is solved exactly, it is not applicable to many problems. To overcome this serious drawback, an inexact stopping criterion for the subproblem (5) was proposed in [26]. Specifically, solving the subproblem (5) requires an inner algorithm such as the iterative shrinkage thresholding algorithm [27] or the fast iterative shrinkage thresholding algorithm [3]. However, this is often time-consuming for large-scale problems. Therefore, we extend the ICMM by adopting a linearization technique for $f$. Here, we consider the linear approximation of $f$ at the $k$-th iterate $x^k$ with an additional proximal term instead of $f(x)$:
$$f(x^k) + \langle \nabla f(x^k), x - x^k \rangle + \frac{\alpha}{2}\|x - x^k\|^2. \tag{6}$$
Utilizing this technique, we propose the following minimization of a convex surrogate function:
$$x^{k+1} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; f(x^k) + \langle \nabla f(x^k), x - x^k \rangle + \frac{\alpha}{2}\|x - x^k\|^2 + p(x) + h_{x^k}(g(x)), \tag{7}$$
where $\alpha > \frac{L_f}{2}$ is a proximal parameter. The proposed algorithm is summarized in Algorithm 2 and is called the proximal linearized ICMM (PL-ICMM).

Algorithm 2 Proximal Linearized Iterative Convex Majorization-Minimization Method (PL-ICMM).

Conditions
• $f$ is differentiable and has Lipschitz gradient with Lipschitz constant $L_f$.
• $p$ is proper, convex, and l.s.c.
• $g : \mathbb{R}^n \to \mathbb{R}^m$ is l.s.c. and convex.
Initialization: Choose a starting point $x^0 \in \mathbb{R}^n$ with $F(x^0) < \infty$ and define a suitable family of convex surrogate functions $(h_x)_{x \in \mathbb{R}^n}$ such that, for all $x \in \mathbb{R}^n$, $h_x \ge h$ on $\operatorname{Im}(g)$ with $h_x(g(x)) = h(g(x))$.
repeat: Solve
$$x^{k+1} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; \langle \nabla f(x^k), x - x^k \rangle + \frac{\alpha}{2}\|x - x^k\|^2 + p(x) + h_{x^k}(g(x)).$$
until: The algorithm satisfies a stopping condition.
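One iteration of Algorithm 2 can be organized as a gradient step on the linearized $f$ followed by a proximal-type solve of the remaining convex model: completing the square in (7) gives $x^{k+1} = \operatorname{arg\,min}_x\, p(x) + h_{x^k}(g(x)) + \frac{\alpha}{2}\|x - (x^k - \nabla f(x^k)/\alpha)\|^2$. The sketch below assumes such a solver is available; the names are ours:

```python
def pl_icmm_step(xk, grad_f, alpha, prox_model):
    """One PL-ICMM iteration (sketch). `prox_model(v, t)` is assumed to return
    argmin_x p(x) + h_{x^k}(g(x)) + (1/(2t)) * ||x - v||^2, i.e., a proximal
    operator of the convex part of the surrogate; alpha > L_f / 2."""
    v = xk - grad_f(xk) / alpha        # forward (gradient) step on linearized f
    return prox_model(v, 1.0 / alpha)  # backward (proximal) step
```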
The proposed method can be regarded as a generalized version of the ICMM and is more widely applicable. For example, the PL-ICMM can be directly applied to minimization problems arising in regression to which the ICMM cannot be applied. Many iteratively reweighted algorithms [28-30] are examples of the ICMM; they use a suitably chosen weight to build a convex majorizer in the ICMM. Such a convex majorizer $h_{x^k}$ has the form $h_{x^k}(x) = \langle w_{x^k}, \bar{h}(x) \rangle$ for some given $\bar{h}(x)$, where the weight $w_{x^k}$ must be selected so that $h_{x^k}$ satisfies the conditions of the class of convex majorizers. Similar to the ICMM, the PL-ICMM is a general algorithm which includes many iteratively reweighted algorithms. We also introduce proximal linearized versions of the iteratively reweighted algorithms.

Algorithm 3 Proximal Linearized Iteratively Reweighted $\ell_1$ Algorithm (PL-IRL1).

Conditions
• $f$ is differentiable and has Lipschitz gradient with Lipschitz constant $L_f$.
• $p$ is proper, convex, and l.s.c.
Initialization: Choose a starting point $x^0 \in \mathbb{R}^n$ with $F(x^0) < \infty$.
repeat: Solve
$$x^{k+1} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; \langle \nabla f(x^k), x - x^k \rangle + \frac{\alpha}{2}\|x - x^k\|^2 + p(x) + \langle \nabla h(g(x^k)), g(x) \rangle.$$
until: The algorithm satisfies a stopping condition.

First, we propose the proximal linearized iteratively reweighted $\ell_1$ algorithm (PL-IRL1). We further assume that the function $h$ is concave on $\operatorname{Im}(g)$. For a concave function $h$, we can define a limiting supergradient of $h$ as an element of $-\partial(-h)$; the set of all limiting supergradients of $h$ is denoted by $\bar{\partial} h$. Since $-h$ is convex and differentiable, $\bar{\partial} h(x)$ on $\operatorname{Im}(g)$ has only one element, $\nabla h(x)$, by property 2 of Proposition 1. The PL-IRL1 considers the majorizer $h_{x^k}(y) = h(g(x^k)) + \langle \nabla h(g(x^k)), y - g(x^k) \rangle$ and iteratively minimizes the convex problem stated in Algorithm 3. In ([10], Proposition 2), it was proved that this majorizer satisfies the majorization property.
The proximal iteratively reweighted algorithm [31] can also be applied to the problem (1); it performs the same type of iteration with weights $w_{x^k} = \nabla h(g(x^k))$ and a proximal parameter $\alpha > \frac{L_f + L_p}{2}$. Hence, our PL-IRL1 can also be regarded as an extension of the proximal iteratively reweighted algorithm.
The iteratively reweighted $\ell_1$ algorithm has been used frequently in many applications. However, it cannot be applied to the problem (1) when $h$ is not concave on $\operatorname{Im}(g)$, as for $h(|y|) = \log(1 + |y|^2)$. For a nonconcave function $h$, the iteratively reweighted least squares algorithm (IRLS) is well known. For a proximal linearized version of the IRLS, we additionally assume that $h$ is additively separable on $\mathbb{R}^m_+$ and that each separable component $h_j(y_j)$ is convex on $[0, r_j]$ and concave on $[r_j, +\infty)$ for some $r_j > 0$. The IRLS makes use of a convex majorizer of the form $h_{x^k}(y) = \langle w_{x^k}, y^2 \rangle + \text{const}$, where the weights are given by
$$w_{x^k, i} = \frac{(\nabla h(y))_i}{2 y_i}, \qquad y = g(x^k),$$
and the square in $y^2$ means the coordinate-wise square operation. This yields the following iterative algorithm:
$$x^{k+1} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; \langle \nabla f(x^k), x - x^k \rangle + \frac{\alpha}{2}\|x - x^k\|^2 + p(x) + \langle w_{x^k}, g(x)^2 \rangle.$$
The specific algorithm is given in Algorithm 4.

Algorithm 4 Proximal Linearized Iteratively Reweighted Least Squares Algorithm (PL-IRLS).

Conditions
• $f$ is differentiable and has Lipschitz gradient with Lipschitz constant $L_f$.
• $p$ is proper, convex, and l.s.c.
• $g : \mathbb{R}^n \to \mathbb{R}^m$ is l.s.c. and convex.
• $h$ is additively separable on $\mathbb{R}^m_+$, i.e., $h(x_1, \cdots, x_m) = h_1(x_1) + \cdots + h_m(x_m)$, and each $h_j$ is convex on $[0, r_j]$ and concave on $[r_j, \infty)$ for some $r_j > 0$.
Initialization: Choose a starting point $x^0 \in \mathbb{R}^n$ with $F(x^0) < \infty$.
repeat: Set $y = g(x^k)$ and $w_{x^k, i} = \frac{(\nabla h(y))_i}{2 y_i}$; solve
$$x^{k+1} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; \langle \nabla f(x^k), x - x^k \rangle + \frac{\alpha}{2}\|x - x^k\|^2 + p(x) + \langle w_{x^k}, g(x)^2 \rangle.$$
until: The algorithm satisfies a stopping condition.

The majorization property of the PL-IRLS can also be obtained from ([10], Proposition 23) when $h(y)$ has the form $h(y) = \log(1 + \rho y^2)$ for any $\rho > 0$.
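For instance, with $h(y) = \log(1 + \rho y^2)$ the weight in Algorithm 4 can be computed explicitly (a routine calculation of ours, consistent with the weights used for problem (20) in Section 4):

```latex
h'(y) = \frac{2\rho y}{1 + \rho y^2},
\qquad
w = \frac{h'(y)}{2y} = \frac{\rho}{1 + \rho y^2},
```

so the quadratic majorizer $w y^2 + \text{const}$ matches the slope of $h$ at $y = g(x^k)$.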

Convergence Analysis of the PL-ICMM
First, we prove a partial convergence result for the PL-ICMM. The following proposition shows that, for the sequence $\{x^k\}$ generated by Algorithm 2, the sequence $\{F(x^k)\}$ of objective function values converges. Proposition 2. Let $\{x^k\}_{k \in \mathbb{N}}$ be generated by Algorithm 2, and let $h_x \in H(x)$ for all $x \in \mathbb{R}^n$. If $\alpha > L_f$, then the sequence $\{F(x^k)\}_{k \in \mathbb{N}}$ monotonically decreases and hence converges.
Proof. Let $F$ be bounded below by $L$. We can obtain
$$\begin{aligned}
L \le F(x^{k+1}) &= f(x^{k+1}) + p(x^{k+1}) + h(g(x^{k+1})) \\
&\le f(x^k) + \langle \nabla f(x^k), x^{k+1} - x^k \rangle + \frac{L_f}{2}\|x^{k+1} - x^k\|^2 + p(x^{k+1}) + h_{x^k}(g(x^{k+1})) \\
&\le f(x^k) + \frac{L_f - \alpha}{2}\|x^{k+1} - x^k\|^2 + p(x^k) + h_{x^k}(g(x^k)) \\
&\le f(x^k) + p(x^k) + h(g(x^k)) = F(x^k),
\end{aligned}$$
where the second inequality is obtained from the property (2) and the majorization property of $h_{x^k}$, the third inequality is obtained from the optimality of the subproblem in Algorithm 2, and the last inequality is obtained from $\alpha > L_f$ and the majorization property of $h_{x^k}$ (equality at $g(x^k)$). The sequence $\{F(x^k)\}_{k \in \mathbb{N}}$ decreases and is bounded from below; hence, it converges.
We can also obtain the subsequential convergence of the PL-ICMM, given in the following proposition. Proposition 3. Let $F$ be coercive. Then the sequence $\{x^k\}_{k \in \mathbb{N}}$ is bounded and has at least one accumulation point.
Proof. By Proposition 2, the sequence $\{F(x^k)\}$ is monotonically decreasing, and therefore the sequence $\{x^k\}$ is contained in the level set
$$\mathcal{L}(x^0) := \{x \in \mathbb{R}^n : F(x) \le F(x^0)\}.$$
From the coercivity of $F$, we conclude the boundedness of the set $\mathcal{L}(x^0)$. By the Bolzano-Weierstrass theorem, $\{x^k\}_{k \in \mathbb{N}}$ has at least one accumulation point.

Now, we prove the global convergence of the proposed algorithm. We utilize the general framework for the convergence of an iterative method introduced in [18]. Specifically, we verify the three Hypotheses 1-3 for $F(x) = f(x) + p(x) + h(g(x))$ in the problem (1) and the sequence $\{x^k\}$ generated by the PL-ICMM. As a result, we obtain the global convergence of the PL-ICMM by using Theorem 1. We further assume the following:
• $h$ has a locally Lipschitz gradient on a compact set containing the sequence $\{g(x^k)\}$, and the majorizers $h_x$ have Lipschitz gradients on a compact set containing the sequence $\{g(x^k)\}$, with a common Lipschitz constant.
To prove the global convergence of the PL-ICMM, we need the following lemma, which gives subdifferential calculus rules for the composition and sum of two functions. The proof of this lemma is provided in ([10], Lemma 1).

Lemma 1.
Under the given conditions for the PL-ICMM, the following hold for all $x^* \in \mathbb{R}^n$:
1. For all $x \in \mathbb{R}^n$, $\partial (h_x \circ g)(x^*) = \Big\{ \sum_{i=1}^m \big(\nabla h_x(g(x^*))\big)_i\, \eta_i : \eta_i \in \partial g_i(x^*) \Big\}$.
2. For all $x \in \mathbb{R}^n$, $\partial \big(p + h_x \circ g\big)(x^*) = \partial p(x^*) + \partial (h_x \circ g)(x^*)$.

First, we prove the sufficient decrease condition for the proposed method in the following proposition.

Proposition 4 (sufficient decrease condition). Let $\{x^k\}_{k \in \mathbb{N}}$ be generated by Algorithm 2 with $\alpha > \frac{L_f}{2}$, and let $a = \frac{2\alpha - L_f}{2} > 0$. Then, for all $k \in \mathbb{N}$,
$$F(x^{k+1}) + a\|x^{k+1} - x^k\|^2 \le F(x^k).$$

Proof. From the property (2) of $C^{1,1}$ functions, we have
$$f(x^{k+1}) \le f(x^k) + \langle \nabla f(x^k), x^{k+1} - x^k \rangle + \frac{L_f}{2}\|x^{k+1} - x^k\|^2. \tag{8}$$
By the definition of the subdifferential of a convex function, we can obtain
$$p(x^k) \ge p(x^{k+1}) + \langle \xi_p, x^k - x^{k+1} \rangle, \tag{9}$$
$$h_{x^k}(g(x^k)) \ge h_{x^k}(g(x^{k+1})) + \langle \xi_h^k, x^k - x^{k+1} \rangle, \tag{10}$$
where $\xi_p \in \partial p(x^{k+1})$ and $\xi_h^k \in \partial(h_{x^k} \circ g)(x^{k+1})$ are subgradients of $p$ and $h_{x^k} \circ g$ at $x^{k+1}$, respectively. Since $x^{k+1}$ is the minimizer of the subproblem in Algorithm 2, the optimality condition yields
$$\nabla f(x^k) + \alpha(x^{k+1} - x^k) + \xi_p + \xi_h^k = 0. \tag{11}$$
From the facts that $h_{x^k}(g(x^{k+1})) \ge h(g(x^{k+1}))$ and $h_{x^k}(g(x^k)) = h(g(x^k))$, we can obtain
$$\begin{aligned}
F(x^{k+1}) &= f(x^{k+1}) + p(x^{k+1}) + h(g(x^{k+1})) \\
&\le f(x^{k+1}) + p(x^{k+1}) + h_{x^k}(g(x^{k+1})) \\
&\le F(x^k) + \langle \nabla f(x^k) + \xi_p + \xi_h^k,\, x^{k+1} - x^k \rangle + \frac{L_f}{2}\|x^{k+1} - x^k\|^2 \\
&= F(x^k) - \frac{2\alpha - L_f}{2}\|x^{k+1} - x^k\|^2,
\end{aligned}$$
where the second inequality is obtained from Equations (8)-(10) together with the equality $h_{x^k}(g(x^k)) = h(g(x^k))$, and the last equality is obtained from the property (11). Let $a = \frac{2\alpha - L_f}{2}$. Since $\alpha > \frac{L_f}{2}$, $a > 0$. Therefore, we obtain the following result:
$$F(x^{k+1}) + a\|x^{k+1} - x^k\|^2 \le F(x^k).$$
The relative error condition (Hypothesis 2) for the PL-ICMM is proved in Proposition 5.
Proposition 5 (relative error condition). For all $k \in \mathbb{N}$, there exist a positive constant $C > 0$ (independent of $k$) and $\xi^{k+1} \in \partial F(x^{k+1})$ such that $\|\xi^{k+1}\| \le C\|x^{k+1} - x^k\|$.

Proof. By the optimality of the subproblem of the PL-ICMM and Lemma 1, there exist $\xi_p \in \partial p(x^{k+1})$ and $\xi_h^k \in \partial(h_{x^k} \circ g)(x^{k+1})$ satisfying
$$\nabla f(x^k) + \alpha(x^{k+1} - x^k) + \xi_p + \xi_h^k = 0. \tag{12}$$
Let $y_k = \nabla h_{x^k}(g(x^{k+1}))$. Then, by Lemma 1 and the properties of the subdifferential, we can decompose $\xi_h^k = \sum_{i=1}^m (y_k)_i \eta_i$ with $\eta_i \in \partial g_i(x^{k+1})$. Similarly, any subgradient $\xi_h \in \partial(h \circ g)(x^{k+1})$ can be decomposed as $\xi_h = \sum_{i=1}^m y_i \eta_i$, where $y = \nabla h(g(x^{k+1}))$. Hence, it can be obtained from Lemma 1 that
$$\xi^{k+1} := \nabla f(x^{k+1}) + \xi_p + \xi_h \in \partial F(x^{k+1}).$$
From Proposition 4 and the coercivity of $F$, the sequence $\{x^k\}$ is bounded; hence, we can find a compact set in $\mathbb{R}^n$ containing this sequence. The convexity of $g$ implies its Lipschitz continuity on a compact, convex subset of $\mathbb{R}^n$ containing $\{x^k\}$ for all $k$. From the further assumption, $\nabla h$ and $\nabla h_x$ are Lipschitz continuous on a compact, convex subset $B$ of $\mathbb{R}^m$ containing $g(x^k)$ for all $k$. Let $L_1$ and $L_2$ be the Lipschitz constants of $g$ and $\nabla h$, respectively, and let $L_h$ be the common Lipschitz constant of the majorizer gradients $\nabla h_x$ on $B$. By the local Lipschitz continuity of $g$, we can obtain
$$\|g(x^{k+1}) - g(x^k)\| \le L_1 \|x^{k+1} - x^k\|. \tag{13}$$
Since $\nabla f(x^k) + \xi_p + \xi_h^k + \alpha(x^{k+1} - x^k) = 0$, the following identity holds:
$$\xi^{k+1} = \big(\nabla f(x^{k+1}) - \nabla f(x^k)\big) - \alpha(x^{k+1} - x^k) + (\xi_h - \xi_h^k).$$
From the fact that $\nabla h(g(x^k)) = \nabla h_{x^k}(g(x^k))$ and the Lipschitz continuity of $\nabla h$ and $\nabla h_{x^k}$, we have
$$\|\xi^{k+1}\| \le (L_f + \alpha)\|x^{k+1} - x^k\| + (L_h + L_2) L_1^2 \|x^{k+1} - x^k\|,$$
where the bound on $\xi_h - \xi_h^k$ is obtained from Equation (13) and the remaining terms are bounded using Equation (12). Letting $C := (L_h + L_2)L_1^2 + L_f + \alpha$, the final result is obtained.

Proposition 6 (continuity condition). There exist a convergent subsequence $\{x^{k_j}\}$ of $\{x^k\}$ and its limit $\bar{x}$ satisfying $\lim_{j \to \infty} F(x^{k_j}) = F(\bar{x})$.

Proof. The boundedness of $\{x^k\}$ implies the existence of a convergent subsequence. Let $\{x^{k_j}\}$ be a convergent subsequence of $\{x^k\}$ such that $x^{k_j} \to \bar{x}$ as $j \to \infty$. We define the surrogate function minimized at the $k_j$-th iteration,
$$q^{k_j}(x) := \langle \nabla f(x^{k_j - 1}), x - x^{k_j - 1} \rangle + \frac{\alpha}{2}\|x - x^{k_j - 1}\|^2 + p(x) + h_{x^{k_j - 1}}(g(x)).$$
Clearly, $q^{k_j}$ is a convex function. Due to the strict continuity of $\nabla h$ and the optimality of $x^{k_j}$ for $q^{k_j}$, comparing $q^{k_j}(x^{k_j})$ with $q^{k_j}(\bar{x})$ yields $\limsup_{j \to \infty} F(x^{k_j}) \le F(\bar{x})$. Using the lower semicontinuity of $F$, the continuity of $f$ and the convexity of $p$, we obtain
$$\lim_{j \to \infty} F(x^{k_j}) = F(\bar{x}).$$
In Propositions 4-6, we have shown that the PL-ICMM satisfies the three Hypotheses 1-3. Finally, we obtain the global convergence of the proposed algorithm.

Theorem 2. Let $F : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be a proper l.s.c. function, and let the sequence $\{x^k\}$ be generated by the PL-ICMM. If $F$ has the KL property at the cluster point $x^* := \lim_{j \to \infty} x^{k_j}$, then the sequence $\{x^k\}_{k \in \mathbb{N}}$ converges to $x^*$ as $k \to \infty$, and $x^*$ is a critical point of $F$. Moreover, the sequence $\{x^k\}_{k \in \mathbb{N}}$ has finite length, i.e.,
$$\sum_{k=0}^{\infty} \|x^{k+1} - x^k\| < \infty.$$

Proof. Propositions 4-6 yield all of the requirements of Theorem 1. According to Theorem 1, we obtain the final results.

Numerical Experiments and Discussion
In this section, we present numerical results for the proposed methods and provide applications of the proposed algorithms. We consider compressive sensing in signal processing. All numerical experiments are implemented in MATLAB R2020b on a 64-bit Windows 10 desktop with an Intel Xeon(R) 2.40 GHz CPU and 64 GB RAM.

Numerical Results for PL-IRL1
First, we show the performance of the PL-IRL1. The main concept of compressive sensing is that a sparse signal can be recovered from incomplete information, i.e., from an underdetermined system $Ax = b$ with $m \ll n$. We say that $x$ is $k$-sparse if $x$ has only $k$ nonzero elements. The compressive sensing problem is generally ill posed, and mathematically it has many solutions. To obtain a sparse solution, the basic model for compressive sensing, called the lasso, has the following form:
$$\min_{x \in \mathbb{R}^n} \frac{1}{2}\|Ax - b\|_2^2 + \mu\|x\|_1,$$
where $A \in \mathbb{R}^{m \times n}$ with $m \ll n$, $b \in \mathbb{R}^m$, and $\mu > 0$ is a positive regularization parameter. This problem is a convex relaxation of the nonconvex $\ell_0$ minimization problem in which $\|x\|_1$ is replaced by $\|x\|_0$, defined as the number of nonzero elements of $x$. Recently, sparse signal recovery from an observed signal corrupted by impulsive noise has attracted interest in many works [32-35]. For sparse recovery with impulsive noise, the following $\ell_1$-fidelity based convex problem is often applied:
$$\min_{x \in \mathbb{R}^n} \|Ax - b\|_1 + \mu\|x\|_1.$$
We also consider nonconvex variations of this model for a compressive sensing problem with impulsive noise, in which the regularizer is replaced by a nonconvex penalty with parameter $\rho > 0$. Unfortunately, the PL-IRL1 cannot be directly applied to these nonconvex problems. Hence, we add the auxiliary variable $z \in \mathbb{R}^m$ and adopt the penalty technique, leading to the following nonconvex and nonsmooth minimization problems:
$$\min_{x \in \mathbb{R}^n,\, z \in \mathbb{R}^m} \frac{\beta}{2}\|Ax - z - b\|_2^2 + \mu\|z\|_1 + \sum_{i=1}^n \log(1 + \rho|x_i|) \tag{14}$$
and a second problem (15) of the same form with a semialgebraic nonconvex regularizer in place of the logarithmic term, where $\beta$ is a positive penalty constant.

Figure 1. Relative errors and energy values of the PL-IRL1 applied to models (14) and (15). Left: result for problem (14). Right: result for problem (15).

Table 1. Numerical results for the PL-IRL1 algorithm applied to models (14) and (15).

In this setting, the minimization problems (14) and (15) have the form of problem (1). Since the objective function of the problem (14) is definable in the log-exp o-minimal structure, it is a KL function. The objective function of the minimization problem (15) is a semialgebraic function, so it is also a KL function. Moreover, both are closed and coercive. The function $f$ is a convex, proper and continuously differentiable function, $p$ is a convex, proper and l.s.c. function, and $g$ is a proper and l.s.c. function. The function $h : \mathbb{R}^n_+ \to \mathbb{R}$ is coordinate-wise nondecreasing, continuously differentiable and concave. Since the objective function involves the nondifferentiable term $\mu\|z\|_1$, the proximal iteratively reweighted algorithm proposed in [31] cannot be applied. Hence, we can apply the IRL1 [10] or the PL-IRL1 to solve the given problem (14); the two methods differ only in whether $f$ is kept or linearized in the convex subproblem.

For solving the convex subproblem of the IRL1, the optimality conditions are given as follows:
$$0 \in \beta A^\top (Ax - z - b) + w_{x^k} \odot \partial\|x\|_1, \qquad 0 \in -\beta(Ax - z - b) + \mu\,\partial\|z\|_1. \tag{18}$$
This system (18) of equations does not have a closed-form solution. Hence, the IRL1 cannot be employed to solve the problem (14). On the other hand, the optimality conditions of the convex subproblem of our method are given as
$$0 \in \beta A^\top (Ax^k - z^k - b) + \delta(x - x^k) + w_{x^k} \odot \partial\|x\|_1, \qquad 0 \in -\beta(Ax^k - z^k - b) + \delta(z - z^k) + \mu\,\partial\|z\|_1, \tag{19}$$
where $\delta > \frac{L_f}{2}$ denotes the proximal parameter (written $\alpha$ in Algorithm 3). These equations in (19) are separable, and each coordinate problem has a closed-form solution:
$$x^{k+1} = \operatorname{shrink}\!\Big(x^k - \tfrac{\beta}{\delta}A^\top(Ax^k - z^k - b),\, \tfrac{w_{x^k}}{\delta}\Big), \qquad z^{k+1} = \operatorname{shrink}\!\Big(z^k + \tfrac{\beta}{\delta}(Ax^k - z^k - b),\, \tfrac{\mu}{\delta}\Big),$$
where the shrink function, applied coordinate-wise, is defined as $\operatorname{shrink}(a, b) = \operatorname{sign}(a) \cdot \max(|a| - b, 0)$.
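The following is a minimal sketch of these separable updates for model (14) (our own code; the gradient splitting and parameter names follow our reading of the optimality system (19)):

```python
import numpy as np

def shrink(a, b):
    """Coordinate-wise soft-thresholding: sign(a) * max(|a| - b, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - b, 0.0)

def pl_irl1_step(x, z, A, b, beta, mu, rho, delta):
    """One PL-IRL1 iteration for model (14) (sketch). f is linearized at
    (x, z) and a proximal term with delta > L_f / 2 = beta is added, so the
    x- and z-updates decouple into coordinate-wise shrink steps."""
    r = A @ x - z - b
    grad_x, grad_z = beta * (A.T @ r), -beta * r     # gradient of f at (x, z)
    w = rho / (1.0 + rho * np.abs(x))                # reweighting of log term
    x_new = shrink(x - grad_x / delta, w / delta)
    z_new = shrink(z - grad_z / delta, mu / delta)
    return x_new, z_new
```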
To show the convergence of the PL-IRL1 in solving problems (14) and (15), we perform numerical experiments in the following setting. The size of $A$ is $n = 5000$ and $m = 2500$. We use an orthonormal Gaussian measurement matrix $A$ whose entries are randomly chosen from the standard Gaussian distribution, after which each column of $A$ is divided by its $\ell_2$ norm. The number $l$ of nonzero elements of the original sparse signal $x_0$ is fixed at 50, the locations of the nonzero elements are selected randomly, and the values of the nonzero elements are drawn from the Gaussian distribution $\mathcal{N}(0, 10^2)$. The observed data $b$ is calculated by
$$b = Ax_0 + n,$$
where $n$ is Gaussian mixture noise consisting of two Gaussian components, a common model of impulsive noise in signal processing [36]. Each entry of $n$ is drawn from $n_1 \sim \mathcal{N}(0, \eta^2)$ with probability $1 - \nu$ and from $n_2 \sim \mathcal{N}(0, (q\eta)^2)$ with probability $\nu$; $n_1$ denotes the background noise, and $n_2$ represents the influence of outliers. The parameter $\nu \in (0, 1)$ controls the proportion of the large outliers and $q > 1$ controls the strength of the outliers. Here, we fix these parameters at $(\nu, q, \eta) = (0.1, \sqrt{10}, 0.02)$. Since $\|A\|_2 = 1$, the Lipschitz constant of $\nabla f$ is $2\beta$, so $\delta$ is set to $\beta + 0.001$. The regularization parameter $\mu$ is fixed at 0.2 for (14) and 1.5 for (15); the penalty parameter $\beta$ is set to 2 for (14) and 28 for (15); and the parameter $\rho$ controlling the nonconvexity is set to 0.1. For the stopping condition of our algorithm, we use the relative error of energy function values,
$$\frac{|F(x^{k+1}) - F(x^k)|}{|F(x^k)|} < \varepsilon.$$
For this setting, 100 different numerical tests are conducted.
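This synthetic setup can be reproduced by a sketch of the following kind (our own code; the normalization and mixture sampling follow the description in the text):

```python
import numpy as np

rng = np.random.default_rng()
n, m, l = 5000, 2500, 50
nu, q, eta = 0.1, np.sqrt(10.0), 0.02

A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)                  # normalize each column in l2

x0 = np.zeros(n)
support = rng.choice(n, size=l, replace=False)  # random support of size l
x0[support] = rng.normal(0.0, 10.0, size=l)     # nonzeros drawn from N(0, 10^2)

outlier = rng.random(m) < nu                    # Gaussian mixture (impulsive) noise
noise = np.where(outlier, rng.normal(0.0, q * eta, m), rng.normal(0.0, eta, m))
b = A @ x0 + noise
```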
In Table 1, we present the computing time, number of iterations, final energy value, and relative error. From the relative errors, we observe that the proposed algorithm finds an approximate sparse solution with small error in all cases. In Figure 1, we illustrate the relative errors and energy values $F_i(x^k)$ over the iterations $k$. Since the penalty parameter $\beta$ for (14) is smaller than that for (15), our method solves (14) faster than (15); similar behavior is observed in the number of iterations. Ultimately, Figure 1 and Table 1 show the fast convergence of the PL-IRL1, and the final energy values are sufficiently small.

Numerical Results for PL-IRLS
Second, we present numerical results of the PL-IRLS compared with the iPiano method [17] for the compressive sensing problem in signal processing. Specifically, we consider the restoration of a sparse signal corrupted by additive Gaussian noise. We apply the algorithms PL-IRLS and iPiano to two unconstrained problems, (20) and (21), over $x \in \mathbb{R}^n$, where $\beta, \mu$ are positive parameters, $c$ is a positive constant, and $\rho > 0$ is the parameter that controls the nonconvexity of the regularizing term. These problems are nonconvex variations of the lasso, a well-known model for compressive sensing. The objective functions of problems (20) and (21) are definable in the log-exp o-minimal structure, and they are closed, coercive KL functions. With this setting, all assumptions of Algorithm 4 are satisfied. Since the norm of the Hessian matrix of $h$ is bounded on $\mathbb{R}^n_+$, $h$ also has a strictly continuous gradient. The PL-IRLS applied to the problems (20) and (21) takes the form
$$x^{k+1} = \operatorname*{arg\,min}_{x \in \mathbb{R}^n} \; \langle \nabla f(x^k), x - x^k \rangle + \frac{\delta}{2}\|x - x^k\|^2 + \langle w^k, x^2 \rangle, \tag{22}$$
with the weights
$$w^k_i = \begin{cases} \dfrac{\beta\rho}{1 + \rho (x_i^k)^2} & \text{for (20)}, \\[6pt] \beta\,\dfrac{c\rho(|x_i^k| + c) + 1}{\big(1 + \rho(|x_i^k| + c)\big)^2} & \text{for (21)}. \end{cases}$$
The subproblem in (22) is a quadratic problem, and its normal equation can be rewritten as the linear equation
$$\big(2\operatorname{diag}(w^k) + \delta I\big)\, x = \delta x^k - \nabla f(x^k),$$
where $\operatorname{diag}(w^k)$ is a diagonal matrix whose diagonal entries consist of $w^k$. Since $2\operatorname{diag}(w^k) + \delta I$ is a diagonal matrix, this linear equation can be solved exactly and easily. The majorization property of the PL-IRLS is obtained from ([10], Proposition 23) for any $\rho > 0$ and $c > 0$.
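Because the system is diagonal, one PL-IRLS iteration costs a gradient evaluation plus a coordinate-wise division, as in the following sketch (ours; `grad_f` is abstract, and the factor 2 comes from differentiating $\langle w^k, x^2 \rangle$):

```python
def pl_irls_step(x, grad_f, w, delta):
    """One PL-IRLS iteration (sketch): with f linearized at x^k, the normal
    equation (2*diag(w^k) + delta*I) x = delta*x^k - grad_f(x^k) is diagonal,
    so the update is a cheap coordinate-wise division."""
    return (delta * x - grad_f(x)) / (2.0 * w + delta)
```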
In this experiment, we use partial discrete cosine transform (DCT) matrices $A$ whose rows are selected from the $n \times n$ DCT matrix. We note that the partial DCT matrices are stored implicitly, i.e., matrix-vector multiplications $Ax$ or $A^\top x$ are computed by the DCT or the inverse DCT. Hence, we can use partial DCT matrices of very large sizes; here, the size is fixed at $(n, m) = (100{,}000, 30{,}000)$. The original IRLS can also be applied to the nonsmooth and nonconvex problems (20) and (21), using the same weights $w^k$ as in (22) but without linearizing $f$; this yields the method (23). The optimality condition of the subproblem in the method (23) is the linear system
$$\big(\mu A^\top A + 2\operatorname{diag}(w^k)\big)\, x = \mu A^\top b,$$
which couples all coordinates through $A^\top A$. Since the size of our measurement matrix is very large, finding the exact solution of this linear equation is time-consuming and seems to be practically impossible in many cases.
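An implicit partial DCT operator of this kind can be sketched as follows (our own code using SciPy's orthonormal DCT; the random row-selection index set is illustrative):

```python
import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(0)
n, m = 100_000, 30_000
rows = rng.choice(n, size=m, replace=False)  # selected rows of the n x n DCT

def A_mv(x):
    """y = A @ x without forming A: full DCT, then keep the selected rows."""
    return dct(x, norm="ortho")[rows]

def At_mv(y):
    """x = A.T @ y via zero-filling and the inverse orthonormal DCT."""
    full = np.zeros(n)
    full[rows] = y
    return idct(full, norm="ortho")
```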
The number $l$ of nonzero elements of the sparse signal $x_0$ is fixed at 2000, and the locations of the nonzero elements are randomly chosen; their values are drawn from the standard Gaussian distribution. The observed data $b$ is measured by the formula $b = Ax_0 + n$, where $n$ is Gaussian white noise with mean 0 and standard deviation 0.02. The regularization parameters $(\beta, \mu)$ are fixed at $(0.001, 1.5)$ for (20) and $(0.001, 0.9)$ for (21) in all tests. The values of $(\rho, c)$ are fixed at $(250, 0.001)$. The proximal parameter $\delta$ in our method is set to $\frac{\mu}{2} + 0.0001$ because the $\ell_2$ norm of a partial DCT matrix is less than or equal to 1. We present the mean values and standard deviations over 100 trials of the computing time, number of iterations, energy value and relative error in Table 2. In Figure 2, we plot the relative errors and energy values of the PL-IRLS and iPiano over the iterations; the figure shows the convergence of both methods. In these tests, the average energy values and relative errors for the PL-IRLS are almost the same as those for iPiano, which shows that the PL-IRLS and iPiano recover almost the same sparse solutions and hence perform similarly in terms of accuracy and minimization of the energy functional. On the other hand, it can be observed in Table 2 and Figure 2 that the PL-IRLS is faster than iPiano. In conclusion, the PL-IRLS is superior to iPiano for solving the nonconvex and nonsmooth problems (20) and (21).

Figure 2. Relative errors and energy values of the PL-IRLS and iPiano applied to problems (20) and (21). Left: result for problem (20). Right: result for problem (21).

Conclusions
In this paper, we proposed proximal linearized iteratively reweighted algorithms to solve the nonconvex and nonsmooth unconstrained minimization problem (1). Based on the general unified framework, we suggested an extension of the iterative convex majorization-minimization method for solving (1). Moreover, extended versions of the iteratively reweighted $\ell_1$ algorithm and the iteratively reweighted least squares algorithm were also introduced. The global convergence of the proposed algorithm was proved under certain assumptions. Lastly, the numerical results on compressive sensing demonstrated that the proposed methods provide outstanding performance compared with state-of-the-art methods. Recently, several algorithms have been extended by imposing an additional inertial term, resulting in faster convergence. In the future, we will study a proximal linearized reweighted algorithm with an inertial force.