Abstract
In this paper, optimality conditions for the group sparse constrained optimization (GSCO) problems are studied. Firstly, the equivalent characterizations of Bouligand tangent cone, Clarke tangent cone and their corresponding normal cones of the group sparse set are derived. Secondly, by using tangent cones and normal cones, four types of stationary points for GSCO problems are given: -stationary point, -stationary point, -stationary point and -stationary point, which are used to characterize first-order optimality conditions for GSCO problems. Furthermore, both the relationship among the four types of stationary points and the relationship between stationary points and local minimizers are discussed. Finally, second-order necessary and sufficient optimality conditions for GSCO problems are provided.
1. Introduction
The sparsity of a vector means that few entries of the vector are non-zero, while the group sparsity of a vector means that its non-zero or zero entries may have some group structure, that is, they appear in blocks in certain areas. A vector can be grouped according to prior information about the group structure among its entries, and then each group is examined to see whether it is entirely zero. For example, genes on the same biological pathway can be regarded as a group in gene expression analysis, so when they are described by a vector, the vector has group sparsity. Since it was first proposed by Yuan and Lin [1] in 2006, group sparse optimization has attracted much attention from researchers [2,3,4,5]. The aim of group sparse optimization is to seek a group sparse solution of a system. It is now known that group sparse optimization has broad applications in bioinformatics, pattern recognition, image restoration, neuroimaging and other fields [1,6,7,8]. For instance, we can restore a signal by means of group sparse optimization according to prior information about its group sparse structure. Moreover, the stability of the recovery can be improved in the presence of noise, while the accuracy of the recovery can be improved in the absence of noise [2]. For practical problems with group sparse structure, it is therefore more effective to adopt a corresponding group sparse optimization model [9].
General sparse constrained optimization has been studied by many authors, and much has been achieved; here we mention a few of these works. In [10], the authors proposed the concepts of restricted strong convexity and restricted strong smoothness to ensure the existence of a unique solution for sparse constrained optimization, and obtained the corresponding error bounds. In [11], the authors defined the -stationary point and the -stationary point for sparse constrained optimization. Beck and Eldar [12] put forward three types of first-order necessary optimality conditions for sparse constrained optimization. One of them is basic feasibility, which is a generalization of the zero-gradient necessary optimality condition in unconstrained optimization. Another is the L-stationary point, which is based on a fixed point condition and can be used to derive the iterative hard thresholding algorithm for solving sparse constrained optimization problems. As is well known, Calamai and Moré [13] introduced -stationary points and -stationary points to describe optimality conditions for general constrained optimization problems. Although N-stationary points, L-stationary points and T-stationary points are equivalent for convex optimization problems, they are not equivalent for sparse constrained optimization problems because of the non-convexity. In [14], the authors provided a description of the tangent cone and the normal cone of the sparse set, and then used them to describe the first-order and second-order optimality conditions; furthermore, they extended the results to optimization problems subject to both sparse and non-negative constraints. Chen, Pan, and Xiu [15] characterized the solutions of three kinds of sparse optimization problems and investigated the relationship among them.
Recently, Bian and Chen [16] gave an exact continuous relaxation of the sparsity penalized optimization problem, and proposed a smoothing proximal gradient algorithm for the relaxation problem.
However, the above works mainly concern general sparse optimization problems. Due to the complexity of the group sparse structure, research on group sparse constrained optimization problems is still scarce. For the case where group sparsity appears as a penalty in the objective function, Peng and Chen [17] studied the first-order and second-order optimality conditions for relaxations of group sparse optimization problems, while Pan and Chen [18] used a capped folded concave function to approximate the group sparsity function and showed that the solution set of the continuous approximation problem and the set of group sparse solutions coincide.
This paper focuses on the following group sparse constrained optimization (GSCO) problem, that is,
where is a continuously differentiable function or a twice continuously differentiable function, is divided into m disjoint groups, denoted by with , and , counts the number of non-zero groups in , where is the vector norm of the ith group . Throughout this paper, for simplicity, denotes the vector norm. Let k be a positive integer with , and be a group sparse set.
Problem (1) is called GSCO due to the group structure in its entries. When and , Problem (1) reduces to the standard sparse constrained optimization.
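The group counting quantity in the constraint is easy to compute directly. The following minimal sketch (function and variable names are ours, not from the paper) illustrates the group "l_{2,0}" count used in Problem (1) and its reduction to the usual l_0 count when every group is a singleton:

```python
import numpy as np

def group_l20(x, groups):
    """Number of groups of x with nonzero Euclidean norm, i.e. the
    group "l_{2,0} norm" appearing in the GSCO constraint.
    `groups` is a list of index arrays partitioning {0, ..., n-1}."""
    return sum(1 for g in groups if np.linalg.norm(x[g]) > 0)

# A 6-dimensional vector split into m = 3 disjoint groups of size 2.
x = np.array([1.0, -2.0, 0.0, 0.0, 3.0, 0.0])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
print(group_l20(x, groups))      # 2: the first and third groups are nonzero

# With singleton groups, the group count reduces to the usual l_0 count.
singletons = [np.array([i]) for i in range(6)]
print(group_l20(x, singletons))  # 3 nonzero entries
```

This also illustrates the reduction noted above: when each group has size one, the feasible set of Problem (1) is exactly the standard sparse constraint set.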
Problem (1) is non-convex, non-smooth, and non-Lipschitz, so its optimality conditions are of theoretical importance: they are the basis for analyzing and solving the problem. The optimality conditions for constrained optimization are closely related to the tangent cones and normal cones of the constraint set. We will use the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones of the group sparse set to describe optimality conditions for Problem (1).
This paper is organized as follows. In Section 2, some basic notations and definitions are introduced. In Section 3, the equivalent expressions of the Bouligand tangent cone, the Clarke tangent cone, and their corresponding normal cones of the group sparse constraint set S are given. In Section 4, first-order optimality conditions for Problem (1) based on the tangent cones and normal cones of S are provided. The relationship between stationary points and local minimizers of Problem (1) is also discussed. In Section 5, second-order necessary and sufficient optimality conditions for Problem (1) are given. Finally, brief concluding remarks are given in Section 6.
2. Notations and Definitions
In this section, we introduce some notations and preliminaries, including the definitions of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones.
For any with , the group support set of is denoted by
is the cardinality of the set , then , which means is the number of groups in that have nonzero -norm.
For the n-dimensional real number space , denotes the coordinate axis, and denotes the coordinate plane. Let denote the n-dimensional vector in which the entries in ith group are all ones and the other entries are all zeros. Let denote the n-dimensional vector in which the jth entry of the ith group is one and the other entries are all zeros.
For a smooth function , let
where denotes the jth entry in and denotes the jth entry in .
The following example shows that the group sparse structure is different from the sparse structure.
Example 1.
Let be a 3-dimensional vector. We show the different ways of grouping and the corresponding group sparsity of as follows.
- (1)
- When , if then ; if then ; if then ; if then .
- (2)
- When , if then ; if then ; if then .
- (3)
- When , , if then ; if then .
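The specific vectors and group sparsity values in Example 1 were lost in extraction. As an illustrative sketch only (the vector (1, 0, 2) and the three groupings are our own choices, not necessarily the authors'), the point of the example can be reproduced in code: the same vector has different group sparsity under different groupings.

```python
import numpy as np

def group_l20(x, groups):
    # number of groups of x with nonzero Euclidean norm
    return sum(1 for g in groups if np.linalg.norm(x[g]) > 0)

x = np.array([1.0, 0.0, 2.0])

# (1) three singleton groups: group sparsity equals entrywise sparsity
print(group_l20(x, [np.array([0]), np.array([1]), np.array([2])]))  # 2
# (2) two groups {x1, x2} and {x3}: each group is nonzero as a whole
print(group_l20(x, [np.array([0, 1]), np.array([2])]))              # 2
# (3) one group containing all three entries
print(group_l20(x, [np.array([0, 1, 2])]))                          # 1
```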
At the end of this section, we introduce the definitions of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones [19].
Definition 1
([19]). Let be an arbitrary nonempty set. The Bouligand tangent cone , the Clarke tangent cone and their corresponding normal cones and to the set Ω at the point are defined as follows.
- (1)
- Bouligand tangent cone:
- (2)
- Fréchet normal cone:
- (3)
- Clarke tangent cone:
- (4)
- Clarke normal cone:
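The displayed formulas of Definition 1 did not survive extraction. For completeness, the standard definitions from Rockafellar and Wets [19] read as follows (the notation is ours and may differ slightly from the authors' symbols):

```latex
% Standard tangent/normal cone definitions (Rockafellar--Wets); notation ours.
\[
  T_\Omega^{B}(\bar x) \;=\; \bigl\{\, d \in \mathbb{R}^n :
     \exists\, t_k \downarrow 0,\ d^k \to d \ \text{such that}\
     \bar x + t_k d^k \in \Omega \,\bigr\},
\]
\[
  \widehat N_\Omega(\bar x) \;=\; \bigl(T_\Omega^{B}(\bar x)\bigr)^{\circ}
  \;=\; \bigl\{\, v : \langle v, d\rangle \le 0,\ \ \forall\, d \in
     T_\Omega^{B}(\bar x) \,\bigr\},
\]
\[
  T_\Omega^{C}(\bar x) \;=\; \bigl\{\, d \in \mathbb{R}^n :
     \forall\, x^k \xrightarrow{\Omega} \bar x,\ t_k \downarrow 0,\
     \exists\, d^k \to d \ \text{such that}\ x^k + t_k d^k \in \Omega \,\bigr\},
\]
\[
  N_\Omega^{C}(\bar x) \;=\; \bigl(T_\Omega^{C}(\bar x)\bigr)^{\circ}.
\]
```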
3. Tangent Cones and Normal Cones of the Group Sparse Set
Tangent cones and normal cones are widely used to describe optimality conditions for constrained optimization problems [19]. The following two theorems give the equivalent characterizations of Bouligand tangent cone, Clarke tangent cone and their corresponding normal cones to the group sparse constraint set S.
Theorem 1.
For any , the Bouligand tangent cone and the Fréchet normal cone to the group sparse set S at the point have the following equivalent expressions:
where , , is the ith group of , is the ith group of .
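The displayed expressions of Theorem 1 were lost in extraction and are not reconstructed verbatim here. By analogy with the sparse-set characterizations in [14], the form one expects is the following sketch (unverified against the original; Γ(x̄) denotes the group support set and G_i the index set of the ith group, notation ours):

```latex
% Expected form (sketch), by analogy with the sparse case [14].
\[
  T_S^{B}(\bar x) \;=\; \bigl\{\, d \in \mathbb{R}^n :
      \lvert \Gamma(\bar x) \cup \Gamma(d) \rvert \le k \,\bigr\},
\]
\[
  \widehat N_S(\bar x) \;=\;
  \begin{cases}
    \{0\}, & \text{if } \lVert \bar x \rVert_{2,0} < k,\\[2pt]
    \bigl\{\, v \in \mathbb{R}^n : v_{G_i} = 0,\ \forall\, i \in \Gamma(\bar x) \,\bigr\},
      & \text{if } \lVert \bar x \rVert_{2,0} = k.
  \end{cases}
\]
```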
Proof.
(i) According to the definition of Bouligand tangent cone, we have
Firstly, we prove that
For any , there exists such that then
for any sufficiently large t. It follows from that
Since with , then . Due to , we obtain
Therefore,
According to and , then for any . Hence we get
Conversely, for any , take any sequence such that and , let , then . Since for any , we get
which means . It follows from that . Hence we obtain
From , we get
Hence we have , which means .
The above proof yields .
It is easy to prove that
(ii) According to the definition of Fréchet normal cone,
For any and any , it must hold .
If , we have
Since , for any , we have and . Thus we have , , and then
which, together with the arbitrariness of for , implies . Therefore, . It is easy to prove that .
If , for any , it holds
We also have , which also implies . Due to , and , it must hold , and then . □
Next, we give the equivalent characterizations of Clarke tangent cone and Clarke normal cone of the group sparse constraint set S.
Theorem 2.
For any , the Clarke tangent cone and the Clarke normal cone of the group sparse set S at have the following equivalent expressions:
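The displayed expressions of Theorem 2 were also lost in extraction. The form one expects, again by analogy with the sparse case [14] and consistent with the inclusions stated later in this section, is the following sketch (unverified; Γ(x̄) is the group support set and G_i the index set of the ith group, notation ours):

```latex
% Expected form (sketch): the Clarke tangent cone is the span of the
% nonzero groups, and the Clarke normal cone is its orthogonal complement.
\[
  T_S^{C}(\bar x) \;=\; \bigl\{\, d \in \mathbb{R}^n :
      \Gamma(d) \subseteq \Gamma(\bar x) \,\bigr\},
\qquad
  N_S^{C}(\bar x) \;=\; \bigl\{\, v \in \mathbb{R}^n :
      v_{G_i} = 0,\ \forall\, i \in \Gamma(\bar x) \,\bigr\}.
\]
```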
Proof.
(i) According to the definition of Clarke tangent cone, we have
We first prove .
To prove , we assume, on the contrary, that there exists , but . Then there exists but , which implies that but .
Note that . For any , take such that
Let and
where is an -dimensional vector of all ones. Then
and thus , , and . For any , we have
Since , for any sufficiently large t, we have
Therefore, for any sufficiently large t, which means according to the definition of . This contradiction shows that .
To prove , let . For any , with and any with , we have
Let , then from (4), we get and
In addition, . It is easy to know that according to the definition of . From the arbitrariness of , we have
Therefore, we have proved that .
Since and for any , it must hold . Hence we get
It is easy to prove that , then
(ii) According to the definition of Clarke normal cone, we have
For any and any , we have
From (5), , then we get , and thus
which means due to the arbitrariness of . Therefore, . □
Obviously, the following relationships hold among the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones of the group sparse set S at any point :
Remark 1.
In [14], the authors gave the expressions of tangent cone and normal cone to the sparse set . Theorems 1 and 2 in this paper are the extension of their results.
At the end of this section, we give an example of the tangent cones of S in .
Example 2.
Consider the group sparse set
where is the first group, and is the second group. Consider its Bouligand tangent cone and Clarke tangent cone at three points: and . It is easy to get the following statements: , , ; , , ;
; ;
; ;
; .
Therefore, , .
Figure 1 provides the figures of the above Bouligand tangent cones and Clarke tangent cones.
Figure 1.
Bouligand tangent cones and Clarke tangent cones of S in , where , and .
From Example 2, we can see that the key point of group sparsity is to examine whether each group is zero as a whole, rather than whether each individual entry is zero.
4. First-Order Optimality Conditions for Problem (1)
The optimality conditions for optimization problems are usually closely related to their stationary points. In this section, we use the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones to describe the N-stationary points and T-stationary points of Problem (1). Based on these descriptions, we then investigate the relationship among the stationary points and the relationship between stationary points and local minimizers.
Definition 2.
is called an -stationary point or -stationary point of Problem (1) if it meets the following conditions respectively:
- (i)
- -stationary point:
- (ii)
- -stationary point:
where stands for the sense of Bouligand or Clarke, and
is the projection gradient on Bouligand tangent cone or Clarke tangent cone.
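Although the displayed formulas of Definition 2 were lost, the computational ingredients can be sketched. The projection onto S is the classical group hard-thresholding operator (keep the k groups of largest Euclidean norm), and projecting a vector onto the Clarke tangent cone at a point amounts, under the natural characterization of that cone as the span of the nonzero groups, to zeroing out the components outside the support groups. A sketch under these assumptions (ties among group norms make the projection non-unique and are broken arbitrarily here):

```python
import numpy as np

def proj_group_sparse(x, groups, k):
    """Euclidean projection onto S = {x : ||x||_{2,0} <= k}: keep the k
    groups with largest Euclidean norm, zero out the rest (group hard
    thresholding; ties broken arbitrarily, so the projection can be
    non-unique)."""
    norms = np.array([np.linalg.norm(x[g]) for g in groups])
    keep = np.argsort(norms)[::-1][:k]          # indices of k largest groups
    y = np.zeros_like(x)
    for i in keep:
        y[groups[i]] = x[groups[i]]
    return y

def proj_clarke_tangent(v, x, groups):
    """Project v onto the span of the nonzero groups of x (our assumed
    form of the Clarke tangent cone of S at x)."""
    y = np.zeros_like(v)
    for g in groups:
        if np.linalg.norm(x[g]) > 0:
            y[g] = v[g]
    return y

groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
x = np.array([3.0, 4.0, 1.0, 0.0, 0.0, 2.0])    # group norms: 5, 1, 2
x_k = proj_group_sparse(x, groups, 2)
print(x_k)                                       # [3. 4. 0. 0. 0. 2.]
print(proj_clarke_tangent(np.ones(6), x_k, groups))  # [1. 1. 0. 0. 1. 1.]
```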
Next, we will study the link between -stationary point and -stationary point of Problem (1).
Theorem 3.
Suppose , then the following statements hold for Problem (1):
- (i)
- If , then is an -stationary point ⇔ is a -stationary point;
- (ii)
- If , then is an -stationary point ⇔ is a -stationary point.
Proof.
(i) Let .
On one hand, suppose is an -stationary point of Problem (1), then
that is, . By Theorem 2, , then we have
i.e.,
It is easy to check that the converse is also true. That is, when , it holds that
On the other hand, suppose is a -stationary point of Problem (1), then
By Theorem 1, . Hence, in the case of , we have
Accordingly, we have
For , , then ; For , obviously, . Hence we get
According to , we have
It is easy to check that the converse is also true. That is, in the case of , the following equivalence holds
Combining (7) with (8), we can conclude that, when , is an -stationary point of Problem (1) if and only if it is a -stationary point of Problem (1).
(ii) In the case of , we first prove the equivalent relationship between -stationary point of Problem (1) and .
On one hand, suppose is an -stationary point of Problem (1), then
that is, . It follows from that . Hence the following implication holds
On the other hand, suppose . In the case of , by Theorem 1, . Therefore
i.e., . Hence the following implication holds
From (9) and (10), we get the following equivalent relationship
that is, in the case of , is an -stationary point if and only if .
In the following part, we prove the equivalent relationship between -stationary point of Problem (1) and in the case of .
Suppose satisfies , then by Theorem 1,
That is,
Hence we get that for any satisfying , , .
For any , take such that and . Following from , we have
From , we obtain , and then
According to the arbitrariness of , we get . That is,
The proof is thus finished. □
Furthermore, for Problem (1), its -stationary point and -stationary point have the following equivalent relationship.
Theorem 4.
Proof.
On one hand, by Theorem 2, . Then we have the following equivalences:
On the other hand, by Theorem 2, . Then according to the definition of , we have that
Thus by directly computing, satisfies
Therefore, the following equivalent relationships hold:
The proof is thus complete. □
Next, we investigate the relationship among the four types of stationary points of Problem (1).
Theorem 5.
Let , then the following statements hold for Problem (1):
- (i)
- If is an -stationary point, then it must be an -stationary point;
- (ii)
- If is a -stationary point, then it must be a -stationary point.
Proof.
(i) Suppose is an -stationary point of Problem (1). There are two cases: and .
Case 1: . In this case, by (7), is an -stationary point if and only if
which, by (14), is equivalent to that is an -stationary point of Problem (1). Thus we obtain that -stationary point and -stationary point are equivalent in the case of .
Clearly, in the case of , if is an -stationary point of Problem (1), it must be an -stationary point (the converse is not true). That is,
(ii) According to Theorems 3 and 4, the -stationary point of Problem (1) is equivalent to its -stationary point, and the -stationary point of Problem (1) is equivalent to its -stationary point, that is,
Moreover, from (16),
Therefore,
The proof is finished. □
To have a clear presentation, based on the proofs of Theorems 3 and 4, we use Table 1 to display the characterizations of the four types of stationary points of Problem (1).
Table 1.
The characterizations of -, -, -, - stationary point for Problem (1).
At the end of this section, we discuss the relationship between the local minimizers of Problem (1) and its stationary points.
Theorem 6.
- (i)
- is an -stationary point and hence an -stationary point;
- (ii)
- is a -stationary point and hence a -stationary point.
Proof.
(i) Due to , there are two cases: and .
Case 1: . In this case, , then
By the optimality conditions for the above problems, we have
That is, .
Case 2: . In this case,
It can be derived that , . That is, ,
Combining the above two cases with (7) and (11), we know that is an -stationary point of Problem (1). From Theorem 5, is also an -stationary point of Problem (1).
(ii) From (i), is both -stationary point and -stationary point. According to Theorems 3 and 4, is both -stationary point and -stationary point. The proof is complete. □
To summarize this section, the relationship among local minimizers and the four types of stationary points of Problem (1) is as follows:
5. Second-Order Optimality Conditions for Problem (1)
In this section, we provide some second-order necessary or sufficient optimality conditions for Problem (1) by use of Clarke tangent cone.
Theorem 7
(Second-order necessary condition). Let be a local minimizer of Problem (1), then for any , it must hold that and
where is the Hessian matrix of f at .
Proof.
According to (5), for any ,
Thus, for any , it holds
In addition, since is a local minimizer of Problem (1), for sufficiently small and any , we have
By Taylor’s Theorem,
Hence,
which implies , . The desired result is derived. □
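The second-order necessary condition above can be checked numerically. The following hedged sketch (the matrix Q, the groups and the point are invented purely for illustration, and we assume the Clarke tangent cone at the point is the span of its nonzero groups) tests positive semidefiniteness of the Hessian restricted to tangent directions; note that the full Hessian may be indefinite even when the condition holds on the tangent cone:

```python
import numpy as np

# Check d^T H d >= 0 for all d in the (assumed) Clarke tangent cone at
# x_star, i.e. for all d supported on the nonzero groups of x_star.
groups = [np.array([0, 1]), np.array([2, 3])]
Q = np.diag([2.0, 1.0, -1.0, -1.0])         # indefinite Hessian on R^4
x_star = np.array([1.0, 2.0, 0.0, 0.0])     # only the first group is nonzero

support = np.concatenate(
    [g for g in groups if np.linalg.norm(x_star[g]) > 0])
H_restricted = Q[np.ix_(support, support)]  # Hessian on tangent directions
print(np.linalg.eigvalsh(H_restricted).min() >= 0)  # True on the tangent cone
print(np.linalg.eigvalsh(Q).min() >= 0)             # False on all of R^4
```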
Finally, we give a second-order sufficient condition for the optimality of Problem (1).
Theorem 8
(Second-order sufficient condition). Let be an -stationary point of Problem (1). If for any , it holds that , then the following two statements hold:
- (i)
- is a strict local minimizer of Problem (1);
- (ii)
- satisfies the second-order growth condition, that is, there are and such that for any ,
where .
Proof.
(i) Since is an -stationary point of Problem (1), from Theorem 2, we have
For any , by (5),
Then for any , it holds
By Taylor’s Theorem, for any sufficiently small ,
Since , then for any sufficiently small ,
Therefore, is a strict local minimizer of Problem (1).
(ii) Assume, on the contrary, that the second-order growth condition does not hold at , then there is a sequence such that but
Let , then . Since is bounded, without loss of generality, suppose , then .
It follows that . Due to , we have , then
for any sufficiently large t. From , we get
Hence, for any , we have , which together with (5) yields
By Taylor’s Theorem,
Since , we have
Under the assumption that , we obtain
Letting , we get
which contradicts the condition that holds for any . Therefore, the second-order growth condition must hold at . □
6. Concluding Remarks
In this paper, first-order optimality conditions are established for group sparsity constrained optimization problems by means of the Bouligand tangent cone, the Clarke tangent cone and their corresponding normal cones, and the relationship among the local minimizers and the four types of stationary points of Problem (1) is investigated. Furthermore, second-order necessary and sufficient optimality conditions for group sparsity constrained optimization problems are provided. The results show that -stationary points of Problem (1) may be strict local minimizers, and can even fulfill the second-order growth condition under some mild conditions. These results provide a theoretical basis for analyzing and solving group sparsity constrained optimization problems. In the future, we will use the optimality conditions to design algorithms for solving these problems.
Author Contributions
Methodology, D.P.; Project administration, D.P.; Supervision, D.P.; Writing original draft, W.W.; Writing review and editing, D.P. All authors have read and agreed to the published version of the manuscript.
Funding
This research was supported by NSFC (11861020), the Growth Project of Education Department of Guizhou Province for Young Talents in Science and Technology ([2018]121), the Foundation for Selected Excellent Project of Guizhou Province for High-level Talents Back from Overseas ([2018]03), and the Science and Technology Planning Project of Guizhou Province ([2018]5781).
Conflicts of Interest
The authors declare no conflict of interest.
References
- Yuan, M.; Lin, Y. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B 2006, 68, 49–67. [Google Scholar] [CrossRef]
- Huang, J.; Breheny, P.; Ma, S. A selective review of group selection in high-dimensional models. Stat. Sci. 2012, 27, 481–499. [Google Scholar] [CrossRef] [PubMed]
- Huang, J.; Ma, S.; Xue, H.; Zhang, C.H. A group bridge approach for variable selection. Biometrika 2009, 96, 339–355. [Google Scholar] [CrossRef] [PubMed]
- Meier, L.; van de Geer, S.; Bühlmann, P. The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B 2008, 70, 53–71. [Google Scholar] [CrossRef]
- Yang, Y.; Zou, H. A fast unified algorithm for solving group-lasso penalized learning problems. Stat. Comput. 2015, 25, 1129–1141. [Google Scholar] [CrossRef]
- Beck, A.; Hallak, N. Optimization involving group sparsity terms. Math. Program. 2018, 178, 39–67. [Google Scholar] [CrossRef]
- Hu, Y.; Li, C.; Meng, K.; Qin, J.; Yang, X. Group sparse optimization via ℓp,q regularization. J. Mach. Learn. Res. 2017, 18, 1–52. [Google Scholar]
- Jiao, Y.; Jin, B.; Lu, X. Group sparse recovery via the ℓ0(ℓ2) penalty: Theory and algorithm. IEEE Trans. Signal Process. 2017, 65, 998–1012. [Google Scholar] [CrossRef]
- Huang, J.; Zhang, T. The benefit of group sparsity. Ann. Stat. 2010, 38, 1978–2004. [Google Scholar] [CrossRef]
- Agarwal, A.; Negahban, S.; Wainwright, M.J. Fast global convergence rates of gradient methods for high-dimensional statistical recovery. Int. Conf. Neural Inf. Process. Syst. 2010, 23, 37–45. [Google Scholar]
- Attouch, H.; Bolte, J.; Svaiter, B.F. Convergence of descent methods for semi-algebraic and tame problems: Proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Math. Program. 2013, 137, 91–129. [Google Scholar] [CrossRef]
- Beck, A.; Eldar, Y. Sparsity constrained nonlinear optimization: Optimality conditions and algorithms. SIAM J. Optim. 2013, 23, 1480–1509. [Google Scholar] [CrossRef]
- Calamai, P.H.; Moré, J.J. Projected gradient methods for linearly constrained problems. Math. Program. 1987, 39, 93–116. [Google Scholar] [CrossRef]
- Pan, L.L.; Xiu, N.H.; Zhou, S.L. On Solutions of Sparsity Constrained Optimization. J. Oper. Res. Soc. China 2015, 3, 421–439. [Google Scholar] [CrossRef]
- Chen, X.J.; Pan, L.L.; Xiu, N.H. Solution sets of three sparse optimization problems for multivariate regression. Appl. Comput. Harmon. Anal. 2020, revised. [Google Scholar]
- Bian, W.; Chen, X.J. A smoothing proximal gradient algorithm for nonsmooth convex regression with cardinality penalty. SIAM J. Numer. Anal. 2020, 58, 858–883. [Google Scholar] [CrossRef]
- Peng, D.T.; Chen, X.J. Computation of second-order directional stationary points for group sparse optimization. Optim. Methods Softw. 2020, 35, 348–376. [Google Scholar] [CrossRef]
- Pan, L.L.; Chen, X.J. Group sparse optimization for images recovery using capped folded concave functions. SIAM J. Imaging Sci. 2021. Available online: https://www.polyu.edu.hk/ama/staff/xjchen/Re_gsparseAugust.pdf (accessed on 5 November 2020).
- Rockafellar, R.T.; Wets, R.J. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 2009. [Google Scholar]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).