Optimality Conditions for Group Sparse Constrained Optimization Problems

Abstract: In this paper, optimality conditions for group sparse constrained optimization (GSCO) problems are studied. Firstly, equivalent characterizations of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones of the group sparse set are derived. Secondly, by using these tangent cones and normal cones, four types of stationary points for GSCO problems are defined: $T_B$-stationary points, $N_B$-stationary points, $T_C$-stationary points and $N_C$-stationary points, which are used to characterize first-order optimality conditions for GSCO problems. Furthermore, both the relationships among the four types of stationary points and the relationship between stationary points and local minimizers are discussed. Finally, second-order necessary and sufficient optimality conditions for GSCO problems are provided.


Introduction
The sparsity of a vector means that few entries of the vector are non-zero, while the group sparsity of a vector means that its non-zero or zero entries may have some group structure, that is, they appear in blocks in certain areas. A vector can be grouped according to prior information on the group structure among its entries, and then each group is examined to see whether it is entirely zero. For example, genes on the same biological pathway can be regarded as a group in gene expression analysis, so when they are described by a vector, the vector has group sparsity. Since it was first proposed by Yuan and Lin [1] in 2006, group sparse optimization has attracted much attention from researchers [2][3][4][5]. The aim of group sparse optimization is to seek a group sparse solution of a system. It is now known that group sparse optimization has broad applications in bioinformatics, pattern recognition, image restoration, neuroimaging and other fields [1,[6][7][8]. For instance, we can restore a signal by use of group sparse optimization according to prior information on its group sparse structure. Moreover, the stability of the recovery can be improved in the presence of noise, while the accuracy of the recovery can be improved in the absence of noise [2]. In practical problems, it is more targeted to adopt the corresponding group sparse optimization model for problems with group sparse structure [9].
General sparse constrained optimization has been researched by many authors, and much has been achieved; here we mention a few of these works. In [10], the authors proposed the concepts of restricted strong convexity and restricted strong smoothness to ensure the existence of a unique solution of the sparse constrained optimization problem, and obtained the corresponding error bounds. In [11], the authors defined $N_B$-stationary points and $N_C$-stationary points for sparse constrained optimization. Beck and Eldar [12] put forward three types of first-order necessary optimality conditions for sparse constrained optimization. One of them is basic feasibility, which is a generalization of the zero-gradient necessary optimality condition in unconstrained optimization. Another is the L-stationary point, which is based on a fixed point condition and can be used to derive the iterative hard thresholding algorithm for solving sparse constrained optimization problems. As is well known, Calamai and Moré [13] introduced $T_B$-stationary points and $T_C$-stationary points to describe optimality conditions for general constrained optimization problems. Although N-stationary points, L-stationary points and T-stationary points are equivalent for convex optimization problems, they are not equivalent for sparse constrained optimization problems because of the non-convexity. In [14], the authors provided a description of the tangent cone and the normal cone of the sparse set, and then used them to describe the first-order and second-order optimality conditions; furthermore, they extended the results to optimization problems subject to both sparse and non-negative constraints. Chen, Pan, and Xiu [15] characterized the solutions of three kinds of sparse optimization problems and investigated the relationships among them.
Recently, Bian and Chen [16] gave an exact continuous relaxation of the sparsity-penalized optimization problem, and proposed a smoothing proximal gradient algorithm for the relaxation problem.
However, the above works mainly address general sparse optimization problems. Due to the complexity of the group sparse structure, research on group sparse constrained optimization problems is still lacking. When the group sparsity appears as a penalty in the objective function, Peng and Chen [17] studied the first-order and second-order optimality conditions for relaxation problems of group sparse optimization problems, while Pan and Chen [? ] used a capped folded concave function to approximate the group sparsity function and showed that the solution set of the continuous approximation problem and the set of group sparse solutions are the same.
This paper focuses on the following group sparse constrained optimization (GSCO) problem:
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{s.t.} \quad \|x\|_{2,0} \le k, \qquad (1)$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable (or twice continuously differentiable) function, and $x \in \mathbb{R}^n$ is divided into $m$ disjoint groups, denoted by $x = (x_1, \cdots, x_m)$ with $x_i = (x_{i(1)}, \cdots, x_{i(n_i)}) \in \mathbb{R}^{n_i}$, $i = 1, \cdots, m$, $\sum_{i=1}^m n_i = n$ and $n_i \ge 1$. Here $\|x\|_{2,0} := \sum_{i=1}^m \mathbb{1}\{\|x_i\|_2 \ne 0\}$ counts the number of non-zero groups in $x$, where $\|x_i\|_2$ is the $\ell_2$ norm of the $i$th group $x_i$. Throughout this paper, for simplicity, $\|\cdot\|$ denotes the $\ell_2$ norm. Let $k$ be a positive integer with $k \le m \le n$, and let $S := \{x : \|x\|_{2,0} \le k\}$ be the group sparse set.
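As a concrete aid to the notation above, the following sketch (the function names `group_support` and `group_l20` are our own, not from the paper) computes the group support and $\|x\|_{2,0}$ for a grouped vector:

```python
import numpy as np

def group_support(x, groups):
    """Indices of groups with nonzero l2 norm.
    `groups` lists the slice of x occupied by each group."""
    return [i for i, idx in enumerate(groups) if np.linalg.norm(x[idx]) > 0]

def group_l20(x, groups):
    """||x||_{2,0}: the number of nonzero groups of x."""
    return len(group_support(x, groups))

# x = (x_1, (x_2, x_3)) in R^3: m = 2 groups with n_1 = 1, n_2 = 2
groups = [slice(0, 1), slice(1, 3)]
x = np.array([0.0, 1.0, 1.0])
print(group_support(x, groups), group_l20(x, groups))  # [1] 1
```

Note that the whole second group counts as one nonzero group even though it contains two nonzero entries.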
Problem (1) is called GSCO due to the group structure in its entries. When $m = n$ and $n_i = 1$, $i = 1, \cdots, m$, Problem (1) reduces to standard sparse constrained optimization.
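Although the paper develops optimality theory rather than algorithms, the reduction to the standard sparse case suggests how the Euclidean projection onto $S$ generalizes hard thresholding: keep the $k$ groups of largest $\ell_2$ norm. A minimal sketch of this (our own illustration; `project_group_sparse` is a hypothetical name):

```python
import numpy as np

def project_group_sparse(x, groups, k):
    """Euclidean projection onto S = {x : ||x||_{2,0} <= k}:
    keep the k groups with largest l2 norm and zero out the rest.
    (When group norms tie, the projection is set-valued;
    argsort simply picks one element of it.)"""
    norms = np.array([np.linalg.norm(x[idx]) for idx in groups])
    keep = np.argsort(norms)[-k:]        # indices of the k largest groups
    y = np.zeros_like(x)
    for i in keep:
        y[groups[i]] = x[groups[i]]
    return y

groups = [slice(0, 1), slice(1, 3)]
print(project_group_sparse(np.array([2.0, 1.0, 1.0]), groups, 1))  # [2. 0. 0.]
```

With $n_i = 1$ for all groups this reduces to ordinary hard thresholding of the $k$ largest entries in magnitude.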
Problem (1) is non-convex, non-smooth, and non-Lipschitz, so its optimality conditions are of theoretical importance: they are the basis of analyzing and solving the problem. The optimality conditions for constrained optimization are closely related to tangent cones and normal cones of the constraint set. We will use the Bouligand tangent cone, the Clarke tangent cone and the corresponding normal cones of the group sparse set to describe optimality conditions for Problem (1). This paper is organized as follows. In Section 2, some basic notations and definitions are introduced. In Section 3, the equivalent expressions of the Bouligand tangent cone, the Clarke tangent cone, and the corresponding normal cones of the group sparse constraint set $S$ are given. In Section 4, first-order optimality conditions for Problem (1) based on the tangent cones and normal cones of $S$ are provided. The relationship between stationary points and local minimizers of Problem (1) is also discussed. In Section 5, second-order necessary and sufficient optimality conditions for Problem (1) are given. Finally, a brief concluding remark is given in Section 6.

Notations and Definitions
In this section, we introduce some notations and preliminaries, including the definitions of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones.
For any $x = (x_1, x_2, \cdots, x_m) \in \mathbb{R}^n$ with $x_i \in \mathbb{R}^{n_i}$, the group support set of $x$ is denoted by $\Gamma(x) := \{i \in \{1, \cdots, m\} : \|x_i\| \ne 0\}$, and $|\Gamma(x)|$ is the cardinality of the set $\Gamma(x)$. Then $\|x\|_{2,0} = |\Gamma(x)|$, which means $\|x\|_{2,0}$ is the number of groups in $x$ that have nonzero $\ell_2$-norm. For the $n$-dimensional real space $\mathbb{R}^n$, $\mathbb{R}_{x_i}$ denotes the $x_i$ coordinate axis, and $\mathbb{R}^2_{x_i x_j}$ denotes the $x_i O x_j$ coordinate plane. Let $e_i \in \mathbb{R}^n$ denote the $n$-dimensional vector in which the entries in the $i$th group are all ones and the other entries are all zeros. Let $e_{ij}$ ($i = 1, \cdots, m$, $j = 1, \cdots, n_i$) denote the $n$-dimensional vector in which the $j$th entry of the $i$th group is one and the other entries are all zeros.
For a smooth function $f : \mathbb{R}^n \to \mathbb{R}$, $\nabla f(x)$ and $\nabla^2 f(x)$ denote its gradient and Hessian at $x$, respectively. The following example shows that the group sparse structure is different from the sparse structure.

Example 1. We show different ways of grouping a vector $x \in \mathbb{R}^3$ and the corresponding group sparsity of $x$ as follows.
(1) When $x = (x_1, x_2, x_3)$ with $n_1 = n_2 = n_3 = 1$, every entry forms its own group and $\|x\|_{2,0} = \|x\|_0$, the usual sparsity.
(2) When $x = (x_1, (x_2, x_3))$ with $n_1 = 1$ and $n_2 = 2$, the second group is counted as nonzero as soon as either $x_2$ or $x_3$ is nonzero.

At the end of this section, we introduce the definitions of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones [19].

Definition 1 ([19]). Let $\Omega \subseteq \mathbb{R}^n$ be an arbitrary nonempty set. The Bouligand tangent cone $T_\Omega^B(\bar x)$, the Clarke tangent cone $T_\Omega^C(\bar x)$ and their corresponding normal cones $N_\Omega^B(\bar x)$ and $N_\Omega^C(\bar x)$ to the set $\Omega$ at the point $\bar x \in \Omega$ are defined as follows:
$$T_\Omega^B(\bar x) := \{d \in \mathbb{R}^n : \exists\, \lambda_t \downarrow 0,\ \exists\, d^t \to d \ \text{such that}\ \bar x + \lambda_t d^t \in \Omega,\ \forall t\},$$
$$T_\Omega^C(\bar x) := \{d \in \mathbb{R}^n : \forall\, x^t \to \bar x \ \text{with}\ x^t \in \Omega,\ \forall\, \lambda_t \downarrow 0,\ \exists\, y^t \to d \ \text{such that}\ x^t + \lambda_t y^t \in \Omega,\ \forall t\},$$
$$N_\Omega^B(\bar x) := \{u \in \mathbb{R}^n : \langle u, d \rangle \le 0,\ \forall d \in T_\Omega^B(\bar x)\},$$
$$N_\Omega^C(\bar x) := \{u \in \mathbb{R}^n : \langle u, d \rangle \le 0,\ \forall d \in T_\Omega^C(\bar x)\}.$$

Tangent Cones and Normal Cones of the Group Sparse Set S
Tangent cones and normal cones are widely used to describe optimality conditions for constrained optimization problems [19]. The following two theorems give the equivalent characterizations of Bouligand tangent cone, Clarke tangent cone and their corresponding normal cones to the group sparse constraint set S.

Theorem 1.
For any $\bar x \in S$, the Bouligand tangent cone $T_S^B(\bar x)$ and the Fréchet normal cone $N_S^B(\bar x)$ to the group sparse set $S$ at the point $\bar x$ have the following equivalent expressions:
$$T_S^B(\bar x) = \{d \in \mathbb{R}^n : \|d\|_{2,0} \le k,\ \|\bar x + \gamma d\|_{2,0} \le k,\ \forall \gamma \in \mathbb{R}\}, \qquad (2)$$
$$N_S^B(\bar x) = \begin{cases} \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(\bar x)\}, & \text{if } \|\bar x\|_{2,0} = k, \\ \{0\}, & \text{if } \|\bar x\|_{2,0} < k. \end{cases} \qquad (3)$$

Proof. (i) According to the definition of the Bouligand tangent cone, $d \in T_S^B(\bar x)$ if and only if there exist sequences $\lambda_t \downarrow 0$ and $d^t \to d$ such that $\bar x + \lambda_t d^t \in S$ for all $t$. Firstly, we prove that every such $d$ belongs to the right-hand side of (2). For all sufficiently large $t$, every group $i \in \Gamma(\bar x) \cup \Gamma(d)$ satisfies $(\bar x + \lambda_t d^t)_i \ne 0$, hence $|\Gamma(\bar x) \cup \Gamma(d)| \le \|\bar x + \lambda_t d^t\|_{2,0} \le k$. Therefore, $\|d\|_{2,0} \le k$, and since $\Gamma(\bar x + \gamma d) \subseteq \Gamma(\bar x) \cup \Gamma(d)$ for every $\gamma \in \mathbb{R}$, also $\|\bar x + \gamma d\|_{2,0} \le k$. Conversely, for any $d$ in the right-hand side of (2), taking $d^t = d$ and any $\lambda_t \downarrow 0$ gives $\bar x + \lambda_t d \in S$ for all $t$, so $d \in T_S^B(\bar x)$. Hence we obtain (2).

(ii) According to the definition of the Fréchet normal cone, $N_S^B(\bar x) = \{u \in \mathbb{R}^n : \langle u, d \rangle \le 0,\ \forall d \in T_S^B(\bar x)\}$. If $\|\bar x\|_{2,0} = k$, then by (2), $T_S^B(\bar x) = \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(\bar x)\}$ is a subspace, and its polar is $\{u : u_i = 0,\ \forall i \in \Gamma(\bar x)\}$. If $\|\bar x\|_{2,0} < k$, then for every group index $i$ and every $v \in \mathbb{R}^{n_i}$, the vector $d$ with $\Gamma(d) \subseteq \{i\}$ and $d_i = v$ belongs to $T_S^B(\bar x)$; we also have $\langle u, \pm d \rangle \le 0$, which forces $u = 0$. Hence we obtain (3). Next, we give the equivalent characterizations of the Clarke tangent cone and the Clarke normal cone of the group sparse constraint set $S$.

Theorem 2.
For any $\bar x \in S$, the Clarke tangent cone and the Clarke normal cone of the group sparse set $S$ at $\bar x$ have the following equivalent expressions:
$$T_S^C(\bar x) = \{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin \Gamma(\bar x)\},$$
$$N_S^C(\bar x) = \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(\bar x)\}.$$

Proof. (i) According to the definition of the Clarke tangent cone, we first prove by contradiction that every $d \in T_S^C(\bar x)$ satisfies $d_i = 0$ for all $i \notin \Gamma(\bar x)$. Suppose there exists $d \in T_S^C(\bar x)$ with $d_{i_0} \ne 0$ for some $i_0 \notin \Gamma(\bar x)$. Let $\lambda_t = \frac{1}{t^2} \downarrow 0$ and
$$x^t = \bar x + \frac{1}{t} \sum_{i \in J} e_i,$$
where $J \subseteq \{1, \cdots, m\} \setminus (\Gamma(\bar x) \cup \{i_0\})$ is an index set with $|J| = k - \|\bar x\|_{2,0}$ and $e_i$ carries the $n_i$-dimensional all-ones vector $1_{n_i}$ in its $i$th group. Then $\|x^t\|_{2,0} = k$, and thus $\{x^t\} \subseteq S$, $x^t_{i_0} = 0$, and $\lim_{t \to \infty} x^t = \bar x$. For any $y^t \to d$, since $y^t_{i_0} \to d_{i_0} \ne 0$, for any sufficiently large $t$ we have $(x^t + \lambda_t y^t)_{i_0} = \lambda_t y^t_{i_0} \ne 0$, while the $k$ groups in $\Gamma(x^t)$ remain nonzero in $x^t + \lambda_t y^t$, so $\|x^t + \lambda_t y^t\|_{2,0} \ge k + 1$. Therefore, $x^t + \lambda_t y^t \notin S$ for any sufficiently large $t$, which means $d \notin T_S^C(\bar x)$ according to the definition of $T_S^C(\bar x)$. This contradiction shows that $T_S^C(\bar x) \subseteq \{d : d_i = 0,\ \forall i \notin \Gamma(\bar x)\}$.

Conversely, take any $d$ with $d_i = 0$ for all $i \notin \Gamma(\bar x)$. For any $\{x^t\} \subseteq S$ with $\lim_{t \to \infty} x^t = \bar x$ and any $\{\lambda_t\} \subset \mathbb{R}_+$ with $\lim_{t \to \infty} \lambda_t = 0$, we have, for all sufficiently large $t$,
$$\Gamma(\bar x) \subseteq \Gamma(x^t). \qquad (4)$$
Let $y^t = x^t - \bar x + d$; then from (4) and $\Gamma(d) \subseteq \Gamma(\bar x)$, we get $\Gamma(y^t) \subseteq \Gamma(x^t)$ and hence $x^t + \lambda_t y^t \in S$. In addition, $\lim_{t \to \infty} y^t = d$, so $d \in T_S^C(\bar x)$. It is easy to prove that $\{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin \Gamma(\bar x)\}$ is a closed subspace, and the expression of $T_S^C(\bar x)$ follows.

(ii) According to the definition of the Clarke normal cone, we have $N_S^C(\bar x) = \{u \in \mathbb{R}^n : \langle u, d \rangle \le 0,\ \forall d \in T_S^C(\bar x)\}$. For any $d \in T_S^C(\bar x)$ and any $u$ with $u_i = 0$ for all $i \in \Gamma(\bar x)$, we have $\langle u, d \rangle = 0$, so such $u$ belongs to $N_S^C(\bar x)$. Conversely, since $T_S^C(\bar x)$ is a subspace, $\langle u, \pm d \rangle \le 0$ for all $d \in T_S^C(\bar x)$ forces $u_i = 0$ for all $i \in \Gamma(\bar x)$. Hence the expression of $N_S^C(\bar x)$ holds.

Obviously, the following relationships hold for the Bouligand tangent cone, the Clarke tangent cone and the corresponding normal cones of the group sparse set $S$ at any point $\bar x \in S$:
$$T_S^C(\bar x) \subseteq T_S^B(\bar x), \qquad N_S^B(\bar x) \subseteq N_S^C(\bar x).$$

Remark 1.
In [14], the authors gave the expressions of the tangent cones and normal cones to the sparse set $\{x \in \mathbb{R}^n : \|x\|_0 \le k\}$. Theorems 1 and 2 in this paper are extensions of their results.
At the end of this section, we give an example of the tangent cones of $S$ in $\mathbb{R}^3$.

Example 2.
Consider the group sparse set $S = \{x = (x_1, (x_2, x_3)) \in \mathbb{R}^3 : \|x\|_{2,0} \le 1\}$, where $x_1$ is the first group and $(x_2, x_3)$ is the second group. Consider its Bouligand tangent cone and Clarke tangent cone at three points: $x^1 = (0, (1, 1))$, $x^2 = (0, (1, 0))$ and $x^3 = (1, (0, 0))$. Since each of the three points has exactly $k = 1$ nonzero group, it is easy to get the following statements from Theorems 1 and 2:
$$T_S^B(x^1) = T_S^C(x^1) = T_S^B(x^2) = T_S^C(x^2) = \mathbb{R}^2_{x_2 x_3}, \qquad T_S^B(x^3) = T_S^C(x^3) = \mathbb{R}_{x_1}.$$
Figure 1 provides the figures of the above Bouligand tangent cones and Clarke tangent cones. From Example 2, we can see that the key of group sparsity is to survey whether each group as a whole is zero instead of checking whether each entry is zero: for instance, at $x^2$ the entry $x_3$ is zero, yet directions with $d_3 \ne 0$ are still tangent because the second group as a whole is nonzero.
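The memberships in this example can be checked numerically with the characterization of Theorem 2, $T_S^C(\bar x) = \{d : d_i = 0,\ i \notin \Gamma(\bar x)\}$; the helper below is our own illustration, not part of the paper:

```python
import numpy as np

def group_support(x, groups):
    """Gamma(x) as a set of group indices."""
    return {i for i, idx in enumerate(groups) if np.linalg.norm(x[idx]) > 0}

def in_clarke_tangent(d, x_bar, groups):
    """d lies in T_C_S(x_bar) iff d_i = 0 for every group i outside
    Gamma(x_bar) (the characterization of Theorem 2)."""
    return group_support(d, groups) <= group_support(x_bar, groups)

groups = [slice(0, 1), slice(1, 3)]      # x = (x_1, (x_2, x_3)), k = 1
x2 = np.array([0.0, 1.0, 0.0])           # Gamma(x2) = {second group}
print(in_clarke_tangent(np.array([0.0, 0.0, 5.0]), x2, groups))  # True
print(in_clarke_tangent(np.array([1.0, 0.0, 0.0]), x2, groups))  # False
```

The first direction is tangent even though it moves the zero entry $x_3$, because the second group as a whole is nonzero at $x^2$.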

First-Order Optimality Conditions for Problem (1)
The optimality conditions for optimization problems are usually closely related to their stationary points. In this section, we use the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones to describe the N-stationary points and T-stationary points of Problem (1); based on these descriptions, we then investigate the relationships among the stationary points and the relationship between stationary points and local minimizers.

Definition 2. $x^* \in S$ is called an $N_\diamond$-stationary point or a $T_\diamond$-stationary point of Problem (1) if it meets the following conditions, respectively:
(i) $N_\diamond$-stationary point: $0 \in \nabla f(x^*) + N_S^\diamond(x^*)$;
(ii) $T_\diamond$-stationary point: $0 = \nabla_S^\diamond f(x^*)$;
where $\diamond \in \{B, C\}$ stands for the sense of Bouligand or Clarke, and $\nabla_S^\diamond f(x^*)$ is the projected gradient on the Bouligand or Clarke tangent cone, i.e., the projection of $-\nabla f(x^*)$ onto $T_S^\diamond(x^*)$.
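Using the characterization $T_S^C(x^*) = \{d : d_i = 0,\ i \notin \Gamma(x^*)\}$ from Theorem 2, $T_C$-stationarity amounts to the gradient of $f$ vanishing on every nonzero group of $x^*$. A small numerical check (our own sketch, with an illustrative quadratic $f$; `is_tc_stationary` is a hypothetical name):

```python
import numpy as np

def is_tc_stationary(grad, x_star, groups, tol=1e-10):
    """T_C-stationarity of x*: the projected gradient on
    T_C_S(x*) = {d : d_i = 0, i not in Gamma(x*)} vanishes, i.e.
    the gradient of f is zero on every nonzero group of x*."""
    for idx in groups:
        if np.linalg.norm(x_star[idx]) > 0 and np.linalg.norm(grad[idx]) > tol:
            return False
    return True

# Illustrative choice: f(x) = 0.5 * ||x - a||^2, so grad f(x) = x - a
groups = [slice(0, 1), slice(1, 3)]
a = np.array([0.0, 1.0, 2.0])
x_star = np.array([0.0, 1.0, 2.0])
print(is_tc_stationary(x_star - a, x_star, groups))  # True
```

Note that the gradient may be nonzero on zero groups of $x^*$ without destroying $T_C$-stationarity, which is exactly why $T_C$-stationarity is weaker than the unconstrained condition $\nabla f(x^*) = 0$.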
Next, we study the link between $N_B$-stationary points and $T_B$-stationary points of Problem (1).

Theorem 3. Suppose $x^* \in S$; then the following statements hold for Problem (1):
(i) If $\|x^*\|_{2,0} = k$, then $x^*$ is an $N_B$-stationary point $\Leftrightarrow$ $x^*$ is a $T_B$-stationary point;
(ii) If $\|x^*\|_{2,0} < k$, then $x^*$ is an $N_B$-stationary point $\Leftrightarrow$ $\nabla f(x^*) = 0$ $\Leftrightarrow$ $x^*$ is a $T_B$-stationary point.
Proof. (i) Let $\|x^*\|_{2,0} = k$. On one hand, suppose $x^* \in S$ is an $N_B$-stationary point of Problem (1); then $-\nabla f(x^*) \in N_S^B(x^*) = \{u : u_i = 0,\ \forall i \in \Gamma(x^*)\}$ by Theorem 1, that is, $(\nabla f(x^*))_i = 0$ for all $i \in \Gamma(x^*)$. It is easy to check that the converse is also true. That is, when $\|x^*\|_{2,0} = k$, it holds that
$$x^* \text{ is an } N_B\text{-stationary point} \iff (\nabla f(x^*))_i = 0,\ \forall i \in \Gamma(x^*). \qquad (7)$$
On the other hand, suppose $x^* \in S$ is a $T_B$-stationary point of Problem (1), i.e., $\nabla_S^B f(x^*) = 0$. By Theorem 1, $T_S^B(x^*) = \{d \in \mathbb{R}^n : \|d\|_{2,0} \le k,\ \|x^* + \gamma d\|_{2,0} \le k,\ \forall \gamma \in \mathbb{R}\}$. Hence, in the case of $\|x^*\|_{2,0} = k$, we have $T_S^B(x^*) = \{d \in \mathbb{R}^n : \Gamma(d) \subseteq \Gamma(x^*)\}$, a subspace. Accordingly, the projection of $-\nabla f(x^*)$ onto this subspace vanishes if and only if $(\nabla f(x^*))_i = 0$ for all $i \in \Gamma(x^*)$. It is easy to check that the converse is also true. That is, in the case of $\|x^*\|_{2,0} = k$, the following equivalence holds:
$$x^* \text{ is a } T_B\text{-stationary point} \iff (\nabla f(x^*))_i = 0,\ \forall i \in \Gamma(x^*). \qquad (8)$$
Combining (7) with (8), we can conclude that, when $\|x^*\|_{2,0} = k$, $x^*$ is an $N_B$-stationary point of Problem (1) if and only if it is a $T_B$-stationary point of Problem (1).
(ii) In the case of $\|x^*\|_{2,0} < k$, we first prove the equivalence between $x^*$ being an $N_B$-stationary point of Problem (1) and $\nabla f(x^*) = 0$.
On one hand, suppose $x^* \in S$ is an $N_B$-stationary point of Problem (1); then $-\nabla f(x^*) \in N_S^B(x^*) = \{0\}$ by Theorem 1. Hence the following implication holds:
$$x^* \text{ is an } N_B\text{-stationary point} \implies \nabla f(x^*) = 0. \qquad (9)$$
On the other hand, suppose $\nabla f(x^*) = 0$. In the case of $\|x^*\|_{2,0} < k$, by Theorem 1, $0 = -\nabla f(x^*) \in N_S^B(x^*)$. Hence the following implication holds:
$$\nabla f(x^*) = 0 \implies x^* \text{ is an } N_B\text{-stationary point}. \qquad (10)$$
From (9) and (10), we get the following equivalence:
$$x^* \text{ is an } N_B\text{-stationary point} \iff \nabla f(x^*) = 0, \qquad (11)$$
that is, in the case of $\|x^*\|_{2,0} < k$, $x^*$ is an $N_B$-stationary point if and only if $\nabla f(x^*) = 0$. In the following part, we prove the equivalence between $x^*$ being a $T_B$-stationary point of Problem (1) and $\nabla f(x^*) = 0$ in the case of $\|x^*\|_{2,0} < k$. Suppose $x^* \in S$ satisfies $\nabla f(x^*) = 0$; then the projection of $-\nabla f(x^*) = 0$ onto $T_S^B(x^*)$ is zero, i.e., $\nabla_S^B f(x^*) = 0$. That is,
$$\nabla f(x^*) = 0 \implies x^* \text{ is a } T_B\text{-stationary point}. \qquad (12)$$
Conversely, suppose $x^*$ is a $T_B$-stationary point of Problem (1), i.e., $\nabla_S^B f(x^*) = 0$; then $\langle \nabla f(x^*), d \rangle \ge 0$ for all $d \in T_S^B(x^*)$. For any $i_0 \in \{1, 2, \cdots, m\}$, take $\bar d \in \mathbb{R}^n$ such that $\Gamma(\bar d) \subseteq \{i_0\}$ and $\bar d_{i_0} = -(\nabla f(x^*))_{i_0}$. Following from $|\Gamma(x^*)| = \|x^*\|_{2,0} < k$ and Theorem 1, we have $\bar d \in T_S^B(x^*)$, and hence $0 \le \langle \nabla f(x^*), \bar d \rangle = -\|(\nabla f(x^*))_{i_0}\|^2$, i.e., $(\nabla f(x^*))_{i_0} = 0$. According to the arbitrariness of $i_0$, we get $\nabla f(x^*) = 0$. That is,
$$x^* \text{ is a } T_B\text{-stationary point} \implies \nabla f(x^*) = 0. \qquad (13)$$
Combining (12) with (13), in the case of $\|x^*\|_{2,0} < k$, the following equivalence holds:
$$x^* \text{ is a } T_B\text{-stationary point} \iff \nabla f(x^*) = 0.$$
The proof is thus finished.
Furthermore, for Problem (1), $N_C$-stationary points and $T_C$-stationary points have the following equivalent relationship.

Theorem 4. For Problem (1), let $x^* \in S$; then $x^*$ is an $N_C$-stationary point if and only if it is a $T_C$-stationary point.
Proof. On one hand, by Theorem 2, $N_S^C(x^*) = \{u \in \mathbb{R}^n : u_i = 0,\ \forall i \in \Gamma(x^*)\}$. Then we have the following equivalences:
$$x^* \text{ is an } N_C\text{-stationary point} \iff -\nabla f(x^*) \in N_S^C(x^*) \iff (\nabla f(x^*))_i = 0,\ \forall i \in \Gamma(x^*). \qquad (14)$$
On the other hand, by Theorem 2, $T_S^C(x^*) = \{d \in \mathbb{R}^n : d_i = 0,\ \forall i \notin \Gamma(x^*)\}$. Since $T_S^C(x^*)$ is a subspace, by directly computing, the projected gradient $\nabla_S^C f(x^*)$ satisfies $(\nabla_S^C f(x^*))_i = -(\nabla f(x^*))_i$ for $i \in \Gamma(x^*)$ and $(\nabla_S^C f(x^*))_i = 0$ for $i \notin \Gamma(x^*)$. Therefore, the following equivalence holds:
$$x^* \text{ is a } T_C\text{-stationary point} \iff (\nabla f(x^*))_i = 0,\ \forall i \in \Gamma(x^*). \qquad (15)$$
Combining (14) and (15), we get that $x^*$ is an $N_C$-stationary point of Problem (1) if and only if it is a $T_C$-stationary point of Problem (1). The proof is thus complete.
Next, we investigate the relationship among the four types of stationary points of Problem (1).
Theorem 5. Let $x^* \in S$; then the following statements hold for Problem (1): (i) If $x^*$ is an $N_B$-stationary point, then it must be an $N_C$-stationary point; (ii) If $x^*$ is a $T_B$-stationary point, then it must be a $T_C$-stationary point.
Proof. (i) Let $x^*$ be an $N_B$-stationary point of Problem (1). There are two cases: $\|x^*\|_{2,0} = k$ and $\|x^*\|_{2,0} < k$. Case 1: $\|x^*\|_{2,0} = k$. In this case, by (7), $x^*$ is an $N_B$-stationary point if and only if $(\nabla f(x^*))_i = 0$ for all $i \in \Gamma(x^*)$, which, by (14), is equivalent to $x^*$ being an $N_C$-stationary point of Problem (1). Thus we obtain that $N_B$-stationarity and $N_C$-stationarity are equivalent in the case of $\|x^*\|_{2,0} = k$.
Case 2: clearly, in the case of $\|x^*\|_{2,0} < k$, if $x^*$ is an $N_B$-stationary point of Problem (1), then $\nabla f(x^*) = 0$ by (11), so it must be an $N_C$-stationary point (the converse is not true). That is,
$$x^* \text{ is an } N_B\text{-stationary point} \implies x^* \text{ is an } N_C\text{-stationary point}. \qquad (16)$$
(ii) According to Theorems 3 and 4, the $N_B$-stationary points of Problem (1) are exactly its $T_B$-stationary points, and the $N_C$-stationary points of Problem (1) are exactly its $T_C$-stationary points. Moreover, from (16),
$$x^* \text{ is a } T_B\text{-stationary point} \implies x^* \text{ is a } T_C\text{-stationary point}.$$
The proof is finished.
To have a clear presentation, based on the proofs of Theorems 3 and 4, we use Table 1 to display the characterizations of the four types of stationary points of Problem (1).

Table 1. Characterizations of the four types of stationary points of Problem (1).

Stationary Point | Characterization
$N_B$ / $T_B$    | $\nabla f(x^*) = 0$ if $\|x^*\|_{2,0} < k$; $(\nabla f(x^*))_i = 0,\ \forall i \in \Gamma(x^*)$ if $\|x^*\|_{2,0} = k$
$N_C$ / $T_C$    | $(\nabla f(x^*))_i = 0,\ \forall i \in \Gamma(x^*)$

At the end of this section, we discuss the relationship between the local minimizers of Problem (1) and its stationary points.

Theorem 6. Let $x^* \in S$ be a local minimizer of Problem (1); then the following two statements hold: (i) $x^*$ is an $N_B$-stationary point and hence an $N_C$-stationary point; (ii) $x^*$ is a $T_B$-stationary point and hence a $T_C$-stationary point.
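The characterizations summarized above can be turned into a simple numerical classifier (our own sketch; `classify_stationarity` and the dictionary keys are informal labels, not the paper's notation):

```python
import numpy as np

def classify_stationarity(grad, x_star, groups, k, tol=1e-10):
    """Classify x* following the characterizations of Theorems 3 and 4:
    - N_C/T_C-stationary iff (grad f(x*))_i = 0 for every i in Gamma(x*);
    - N_B/T_B-stationary iff, in addition, grad f(x*) = 0 whenever
      ||x*||_{2,0} < k."""
    support = [i for i, idx in enumerate(groups)
               if np.linalg.norm(x_star[idx]) > 0]
    nc = all(np.linalg.norm(grad[groups[i]]) <= tol for i in support)
    nb = nc if len(support) == k else np.linalg.norm(grad) <= tol
    return {"N_B/T_B": bool(nb), "N_C/T_C": bool(nc)}

groups = [slice(0, 1), slice(1, 3)]
grad = np.array([3.0, 0.0, 0.0])
# A point with full group sparsity can be N_B-stationary with nonzero gradient:
print(classify_stationarity(grad, np.array([0.0, 1.0, 2.0]), groups, k=1))
# The origin is N_C-stationary (empty support) but not N_B-stationary here:
print(classify_stationarity(grad, np.zeros(3), groups, k=1))
```

The second call illustrates the strictness of the implication in Theorem 5: $N_C$-stationarity does not imply $N_B$-stationarity when $\|x^*\|_{2,0} < k$.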
Proof. (i) Since $x^*$ is a local minimizer of Problem (1) and $f$ is continuously differentiable, we have $\langle \nabla f(x^*), d \rangle \ge 0$ for all $d \in T_S^B(x^*)$. If $\|x^*\|_{2,0} = k$, then $\pm d \in T_S^B(x^*)$ for every $d$ with $\Gamma(d) \subseteq \Gamma(x^*)$, which yields $(\nabla f(x^*))_i = 0$ for all $i \in \Gamma(x^*)$. If $\|x^*\|_{2,0} < k$, then $\pm e_{ij} \in T_S^B(x^*)$ for all $i$ and $j$, which yields $\nabla f(x^*) = 0$. Combining the above two cases with (7) and (11), we know that $x^*$ is an $N_B$-stationary point of Problem (1). From Theorem 5, $x^*$ is also an $N_C$-stationary point of Problem (1).
(ii) From (i), $x^*$ is both an $N_B$-stationary point and an $N_C$-stationary point. According to Theorems 3 and 4, $x^*$ is then both a $T_B$-stationary point and a $T_C$-stationary point. The proof is complete.
As a summary of this section, we conclude the relationships among local minimizers and the four stationary points of Problem (1) as follows:

    local minimizer ⇒ $N_B$-stationary point ⇔ $T_B$-stationary point
                              ⇓                       ⇓
                      $N_C$-stationary point ⇔ $T_C$-stationary point

Second-Order Optimality Conditions for Problem (1)
In this section, we provide second-order necessary and sufficient optimality conditions for Problem (1) by use of the Clarke tangent cone.
Theorem 7 (Second-order necessary condition). Let $x^* \in S$ be a local minimizer of Problem (1); then for any $d \in T_S^C(x^*)$, it must hold that $d^\top \nabla f(x^*) = 0$ and
$$d^\top \nabla^2 f(x^*) d \ge 0,$$
where $\nabla^2 f(x^*)$ is the Hessian matrix of $f$ at $x^*$.
Proof. Since $x^* \in S$ is a local minimizer of Problem (1), by Theorem 6, $x^*$ is also an $N_C$-stationary point. By (14), $(\nabla f(x^*))_i = 0$ for all $i \in \Gamma(x^*)$. According to the expression of $T_S^C(x^*)$ in Theorem 2, any $d \in T_S^C(x^*)$ satisfies $d_i = 0$ for all $i \notin \Gamma(x^*)$. Thus, for any $d \in T_S^C(x^*)$, it holds that
$$d^\top \nabla f(x^*) = \sum_{i \in \Gamma(x^*)} d_i^\top (\nabla f(x^*))_i = 0. \qquad (17)$$
In addition, since $x^*$ is a local minimizer of Problem (1), for sufficiently small $\alpha > 0$ and any $d \in T_S^C(x^*)$, we have $x^* + \alpha d \in S$ and
$$f(x^* + \alpha d) \ge f(x^*). \qquad (18)$$
By Taylor's Theorem,
$$f(x^* + \alpha d) = f(x^*) + \alpha\, d^\top \nabla f(x^*) + \frac{\alpha^2}{2} d^\top \nabla^2 f(x^*) d + o(\alpha^2). \qquad (19)$$
Combining (17)–(19), we obtain $\frac{\alpha^2}{2} d^\top \nabla^2 f(x^*) d + o(\alpha^2) \ge 0$, which implies $d^\top \nabla^2 f(x^*) d \ge 0$ for all $d \in T_S^C(x^*)$. The desired result is derived. Finally, we give a second-order sufficient condition for the optimality of Problem (1).
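Since $T_S^C(x^*)$ is the subspace of vectors supported on $\Gamma(x^*)$, the second-order necessary condition $d^\top \nabla^2 f(x^*) d \ge 0$ for all $d \in T_S^C(x^*)$ is exactly positive semidefiniteness of the principal submatrix of the Hessian indexed by the entries in $\Gamma(x^*)$. A sketch of this check (our own illustration; `second_order_necessary_holds` is a hypothetical name):

```python
import numpy as np

def second_order_necessary_holds(hess, x_star, groups, tol=1e-10):
    """Check d^T hess d >= 0 for all d in T_C_S(x*): since T_C is the
    subspace of vectors supported on Gamma(x*), this amounts to positive
    semidefiniteness of the principal submatrix indexed by Gamma(x*)."""
    blocks = [np.arange(s.start, s.stop) for s in groups
              if np.linalg.norm(x_star[s]) > 0]
    if not blocks:
        return True                      # T_C = {0}: condition trivially holds
    idx = np.concatenate(blocks)
    sub = hess[np.ix_(idx, idx)]
    return bool(np.min(np.linalg.eigvalsh(sub)) >= -tol)

groups = [slice(0, 1), slice(1, 3)]
x_star = np.array([0.0, 1.0, 2.0])
print(second_order_necessary_holds(np.diag([-1.0, 2.0, 3.0]), x_star, groups))  # True
print(second_order_necessary_holds(np.diag([1.0, -2.0, 3.0]), x_star, groups))  # False
```

The first call shows that negative curvature on a zero group does not violate the condition, because such directions lie outside $T_S^C(x^*)$.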
Theorem 8 (Second-order sufficient condition). Let $x^* \in S$ be an $N_C$-stationary point of Problem (1) such that $d^\top \nabla^2 f(x^*) d > 0$ for all $d \in T_S^C(x^*) \setminus \{0\}$. Then the second-order growth condition holds at $x^*$ on the subspace $\mathbb{R}^n_{\Gamma(x^*)} := \{x \in \mathbb{R}^n : x_i = 0,\ \forall i \notin \Gamma(x^*)\}$; in particular, $x^*$ is a strict local minimizer of $f$ on $\mathbb{R}^n_{\Gamma(x^*)}$.

Proof. Suppose, on the contrary, that the second-order growth condition fails at $x^*$. Then there exists a sequence $\{x^t\} \subseteq \mathbb{R}^n_{\Gamma(x^*)}$ with $x^t \ne x^*$ and $x^t \to x^*$ such that $f(x^t) < f(x^*) + \frac{1}{t}\|x^t - x^*\|^2$ for all $t$. Hence, for any $z^t \in \mathbb{R}^n_{\Gamma(x^*)} \setminus \{0\}$, we have $z^t \in T_S^C(x^*) \setminus \{0\}$, which together with (14) and the expression of $T_S^C(x^*)$ yields
$$\langle \nabla f(x^*), z^t \rangle = 0.$$
Take $z^t = x^t - x^*$. By Taylor's Theorem,
$$f(x^t) = f(x^*) + \langle \nabla f(x^*), z^t \rangle + \frac{1}{2} (z^t)^\top \nabla^2 f(x^*) z^t + o(\|z^t\|^2).$$
Under the assumption that $f(x^t) < f(x^*) + \frac{1}{t}\|x^t - x^*\|^2$, we obtain
$$\frac{1}{2} (d^t)^\top \nabla^2 f(x^*) d^t + \frac{o(\|z^t\|^2)}{\|z^t\|^2} < \frac{1}{t}, \qquad d^t := \frac{z^t}{\|z^t\|}.$$
The unit vectors $d^t$ lie in the closed subspace $T_S^C(x^*)$, so, passing to a subsequence, $d^t \to d \in T_S^C(x^*)$ with $\|d\| = 1$. Letting $t \to \infty$, we get
$$d^\top \nabla^2 f(x^*) d \le 0,$$
which contradicts the condition that $d^\top \nabla^2 f(x^*) d > 0$ holds for any $d \in T_S^C(x^*) \setminus \{0\}$. Therefore, the second-order growth condition must hold at $x^*$.

Concluding Remarks
In this paper, first-order optimality conditions are established for group sparsity constrained optimization problems by use of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones, and the relationships among local minimizers and the four types of stationary points of Problem (1) are investigated. Furthermore, second-order necessary and sufficient optimality conditions for group sparsity constrained optimization problems are provided. The results show that $N_C$-stationary points of Problem (1) may be strict local minimizers, and can even fulfill the second-order growth condition under some mild conditions. These results provide a theoretical basis for analyzing and solving group sparsity constrained optimization problems. In the future, we will use the optimality conditions to design algorithms for solving these problems.