1. Introduction
Data generated in various fields often exhibit clear monotonicity, as seen in meteorological climate data [1,2], economic demand/supply curves [3], and biological growth curves [4]. This paper therefore focuses on statistical models under order constraints. Specifically, suppose that we have $m$ observations $(x_i, y_i)$ for $i = 1, \ldots, m$, where $x_i \in \mathbb{R}^n$ is a vector with $n$ features and $y_i \in \mathbb{R}$ is a response value. We concentrate on the following optimization problem (1), where $X \in \mathbb{R}^{m \times n}$ is the data matrix, $y \in \mathbb{R}^m$ is the response vector, and $\lambda, \tau \ge 0$ are given regularization parameters. In high-dimensional statistical regression, it is common for the number of features to exceed the number of samples; therefore, throughout this paper we assume that $n \ge m$. The penalty term is composed of two components: the first enforces sparsity in the coefficient estimates by incorporating prior knowledge, and the second penalizes violations of monotonicity among adjacent pairs of coefficients.
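For concreteness, one representative instance consistent with the above description (an assumption about the form; the exact scaling and parameterization of (1) may differ) is

```latex
\[
  \min_{\beta \in \mathbb{R}^{n}} \;
  \tfrac{1}{2}\,\lVert X\beta - y \rVert_{2}^{2}
  \;+\; \lambda \,\lVert \beta \rVert_{1}
  \;+\; \tau \sum_{i=1}^{n-1} \bigl(\beta_{i} - \beta_{i+1}\bigr)_{+},
  \qquad \lambda,\ \tau \ge 0,
\]
```

where $(t)_{+} := \max\{t, 0\}$ denotes the positive part.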
Problem (1) generalizes a wide range of ordered convex problems, including the isotonic regression model [5], the nearly isotonic regression model [6], and the ordered lasso problem [7]. The isotonic regression problem (2) involves determining a vector $z \in \mathbb{R}^n$ that approximates a given vector $y \in \mathbb{R}^n$ while ensuring that $z$ is a non-decreasing (or non-increasing) sequence, i.e., $z_1 \le z_2 \le \cdots \le z_n$. Since this hard monotonicity constraint may lead to a model that is too rigid and difficult to adapt to complex data structures, Tibshirani et al. [6] relaxed the constraint and considered the nearly isotonic regression model (3), in which a given nonnegative parameter weights the violations of monotonicity. It is evident that problem (1) is a generalization of problem (3), as it addresses general regression settings and incorporates sparsity in the coefficients. We refer to problem (1) as the generalized convex nearly isotonic regression (GCNIR) problem.
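To make the relaxation in (3) concrete, the following small check (assuming the standard form of the nearly isotonic penalty, a sum of positive parts of adjacent decreases as in [6]) illustrates that the penalty vanishes exactly on non-decreasing sequences and grows with the size of each violation; the function name is ours, not notation from the paper.

```python
import numpy as np

def nearly_isotonic_penalty(z):
    # sum of positive parts of adjacent decreases; zero iff z is non-decreasing
    return float(np.sum(np.maximum(z[:-1] - z[1:], 0.0)))

print(nearly_isotonic_penalty(np.array([1.0, 2.0, 2.0, 3.0])))  # 0.0: monotone, no penalty
print(nearly_isotonic_penalty(np.array([1.0, 3.0, 2.0, 4.0])))  # 1.0: one violation of size 1
```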
In addition, problem (1) can also be regarded as a generalization of the ordered lasso problem (4) proposed in [7]. Clearly, problem (4) extends the lasso problem by imposing a monotonicity constraint on the absolute values of the coefficients. Like problem (2), this hard requirement can make model (4) overly rigid. The GCNIR problem (1), however, relaxes the stringent monotonicity requirement of the ordered lasso, yielding a convex problem that is more flexible and tractable.
The GCNIR problem (1) can be reformulated as a convex quadratic programming (QP) problem (5) by introducing new variables, where the reformulation involves the all-ones column vector and the identity matrix of appropriate dimensions. This implies that one can utilize the QP function “quadprog” provided by MATLAB or well-developed QP solvers, such as Gurobi and CPLEX [8], to solve reformulation (5) and thus problem (1). However, the matrices in the reformulation grow with both $m$ and $n$, so the cost of storing them and of solving the resulting QP becomes prohibitive, which makes it challenging to apply these methods to large-scale problems.
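As a small-scale sanity check, problem (1) can also be handed to a generic convex solver directly, without forming the QP reformulation explicitly. The sketch below uses CVXPY and assumes the representative objective written above (squared loss plus an $\ell_1$ term plus a nearly isotonic term); it is intended as a reference solution for toy sizes only, since generic solvers run into the same scaling issues discussed here.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 50, 200                       # more features than samples, as assumed above
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
lam, tau = 1.0, 1.0

beta = cp.Variable(n)
fit = 0.5 * cp.sum_squares(X @ beta - y)               # least-squares data fit
sparsity = lam * cp.norm1(beta)                        # sparsity-inducing term
near_iso = tau * cp.sum(cp.pos(beta[:-1] - beta[1:]))  # adjacent-pair violations
prob = cp.Problem(cp.Minimize(fit + sparsity + near_iso))
prob.solve()                                           # delegated to a generic conic/QP solver

print("objective:", prob.value)
print("nonzeros :", int(np.sum(np.abs(beta.value) > 1e-6)))
```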
Due to the challenges in solving the QP reformulation (5), it is natural to adapt the methods developed for the problems discussed above to problem (1). The pool adjacent violators algorithm (PAVA) [9] is a cornerstone method for shape-constrained statistical regression problems, as discussed in [10]. Initially developed for the isotonic regression model (2), PAVA has been extended to the nearly isotonic regression model (3) through adaptations such as the modified PAVA (MPAVA) [6] and the generalized PAVA (GPAVA) [2]. Despite its broad applicability, there is no theoretical guarantee that PAVA can be modified to handle convex nonseparable minimization problems. Other approaches, such as the generalized proximal gradient algorithm [7] and the alternating direction method of multipliers (ADMM) [11], have been proposed for solving the ordered lasso problem (4). To our knowledge, most existing techniques for ordered models rely primarily on first-order information from the associated nonsmooth optimization framework. Consequently, we aim to develop a customized algorithm that exploits second-order information to address the GCNIR problem more effectively.
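For reference, a minimal PAVA sketch for the classical isotonic regression model (2), i.e., the least-squares fit under a non-decreasing constraint, is given below. The MPAVA and GPAVA variants mentioned above modify the pooling rule for the relaxed model (3); none of this is the algorithm developed in this paper.

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: minimize ||z - y||^2 subject to z non-decreasing."""
    y = np.asarray(y, dtype=float)
    vals, wts, starts = [], [], []          # block means, block sizes, block start indices
    for i, yi in enumerate(y):
        vals.append(yi); wts.append(1.0); starts.append(i)
        # pool the last two blocks while their means violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / w
            wts[-2] = w
            del vals[-1], wts[-1], starts[-1]
    z = np.empty_like(y)
    bounds = starts + [len(y)]
    for j, v in enumerate(vals):
        z[bounds[j]:bounds[j + 1]] = v      # each block takes its pooled mean
    return z

print(pava([3.0, 1.0, 2.0, 5.0, 4.0]))      # -> [2. 2. 2. 4.5 4.5]
```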
This paper aims to develop a semismooth Newton-based augmented Lagrangian (Ssnal) algorithm to address the GCNIR problem (1) from a dual viewpoint. The Ssnal algorithm’s primary benefits include its superior convergence characteristics and reduced computational demands, which are achieved by exploiting second-order sparsity and employing efficient strategies within the semismooth Newton (Ssn) algorithm. Furthermore, the Ssnal algorithm has demonstrated its effectiveness in handling large-scale sparse convex models, as evidenced by its performance in applications such as the Lasso [12], group Lasso [13], fused Lasso [14], clustered Lasso [15,16], multi-task Lasso [17], trend filtering [18], density matrix least squares problems [19], the Dantzig selector [20], and others [21,22,23,24]. Building on these successes, we propose to apply the Ssnal algorithm to solve problem (1).
The primary contributions of this paper are as follows. First, we calculate the proximal mapping related to the GCNIR regularizer and its generalized Jacobian. Second, we utilize the Ssnal algorithm to address the GCNIR problem from a dual perspective. Furthermore, by capitalizing on the low-rank properties and second-order sparsity inherent in the GCNIR problem, we significantly reduce the computational cost associated with the Ssn algorithm when solving the subproblems. Lastly, we perform a numerical analysis comparing our algorithm with first-order methods, including ADMM and the Accelerated Proximal Gradient (APG) method, demonstrating the efficiency and robustness of our approach.
The remaining sections of this paper are organized as follows. Section 2 analyzes the proximal mapping associated with the GCNIR regularizer and its generalized Jacobian. Section 3 outlines the framework of the Ssnal algorithm and discusses its convergence properties when applied to the dual formulation of the GCNIR problem (1). In Section 4, we evaluate the performance of the Ssnal algorithm through numerical experiments. Finally, we conclude the paper in Section 5.
Notation. For any $z \in \mathbb{R}^n$, “$\mathrm{Diag}(z)$” represents the diagonal matrix with $z_i$ as its $i$-th diagonal component. “$|z|$” refers to the absolute-value vector, whose $i$-th entry is $|z_i|$. “$\mathrm{sign}(z)$” indicates the sign vector, i.e., its $i$-th entry is $1$ when $z_i > 0$, $-1$ when $z_i < 0$, and $0$ when $z_i$ is equal to zero. Additionally, “$\mathrm{supp}(z)$” refers to the support of the element $z$, specifically the collection of indices $i$ for which $z_i$ is not equal to zero. For any positive integer $n$, the unit column vectors in $\mathbb{R}^n$ are denoted in the usual way. $M^{\dagger}$ denotes the Moore–Penrose pseudoinverse of a matrix $M$. Typically, $h^{*}$ denotes the Fenchel conjugate of a given function $h$.
2. The Proximal Mapping of the GCNIR Regularizer and Its Generalized Jacobian
In this section, we present some results concerning the proximal mapping associated with the GCNIR regularizer and its generalized Jacobian, which are needed for the subsequent analysis.
Given any scalar $\sigma > 0$ and any proper closed convex function $p$, the proximal mapping and the Moreau envelope [25] of $p$ are defined in the standard way; the Moreau identity [26] holds, and, according to [27], the Moreau envelope is convex and continuously differentiable with its gradient expressed through the proximal mapping.
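For reference, these objects and identities read as follows in one common parameterization; the symbols are illustrative and may differ from the notation used in the rest of the paper.

```latex
\[
  \operatorname{Prox}_{\sigma p}(x) := \arg\min_{z}\Bigl\{ p(z) + \tfrac{1}{2\sigma}\lVert z - x\rVert^{2} \Bigr\},
  \qquad
  E_{\sigma p}(x) := \min_{z}\Bigl\{ p(z) + \tfrac{1}{2\sigma}\lVert z - x\rVert^{2} \Bigr\},
\]
\[
  x = \operatorname{Prox}_{\sigma p}(x) + \sigma \operatorname{Prox}_{p^{*}/\sigma}(x/\sigma)
  \quad\text{(Moreau identity)},
  \qquad
  \nabla E_{\sigma p}(x) = \tfrac{1}{\sigma}\bigl(x - \operatorname{Prox}_{\sigma p}(x)\bigr).
\]
```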
Let the GCNIR regularizer be as defined in (1). Before diving into the proximal mapping associated with the GCNIR regularizer, we briefly introduce an auxiliary regularizer, defined through the matrix $B$, together with relevant results discussed in [14]; the proximal mapping with respect to this auxiliary regularizer is characterized by the following lemma.
Lemma 1 (see [14], Lemma 1). For any given point, the proximal mapping of the auxiliary regularizer admits an explicit expression. On the basis of the above lemma, we can now explicitly calculate the proximal mapping of the GCNIR regularizer.
Proposition 1. For any given parameters and any point, the proximal mapping of the GCNIR regularizer admits an explicit expression. Proof. According to the definition of the proximal mapping, the claimed expression holds, and the composition step follows from ([28], Corollary 4). This completes the proof. □
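The proof above rests on a prox-decomposition argument ([28], Corollary 4): the proximal mapping of a sum of an $\ell_1$ term and a difference-based term can be obtained by composing the individual proximal mappings. A minimal sketch of that pattern is given below, assuming the GCNIR regularizer has the representative form used earlier; prox_near_iso stands for a hypothetical oracle returning the proximal point of the nearly isotonic part (for example, the mapping characterized in Lemma 1), and the composition order shown is the one established for fused-type penalties.

```python
import numpy as np

def soft_threshold(w, lam):
    # proximal mapping of lam * ||.||_1: componentwise shrinkage toward zero
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def prox_composite(w, lam, tau, prox_near_iso):
    # prox-decomposition pattern: apply the prox of the difference-based part
    # first, then soft-threshold the result (cf. [28], Corollary 4)
    return soft_threshold(prox_near_iso(w, tau), lam)
```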
Next, we analyze the generalized Jacobian of the proximal mapping, which is crucial for achieving computational efficiency. We begin by presenting some findings concerning the generalized HS-Jacobian of the auxiliary proximal mapping, following [14,29]. As noted in [14], the generalized HS-Jacobian at a given point is constructed from an optimal Lagrangian multiplier for the relevant constraint and the associated active index set, and the corresponding multifunction collects all such matrices.
The subsequent proposition demonstrates that the multifunctions introduced above may be regarded as generalized HS-Jacobians of the corresponding proximal mappings at the respective points. Proposition 2. For all $w$, there exists a neighborhood of $w$ on which the stated identities hold for the proximal mappings and the elements of the multifunctions. Proof. The results are derived from [14] (Proposition 2) and [29] (Lemma 2.1) with minor revisions. □
We next define a multifunction that essentially acts as the generalized Jacobian of the proximal mapping of the GCNIR regularizer at $w$; it can be derived using the change-of-variables technique from previous work in [14] (Theorem 2).
Theorem 4. Let $\lambda$ and $\tau$ be non-negative real numbers, and let $w$ be an element of $\mathbb{R}^n$. The set-valued mapping defined above is nonempty, compact-valued, and upper semicontinuous. Each of its elements $V$ is a symmetric and positive semidefinite matrix. Furthermore, there exists a neighborhood of $w$ such that, for any point in this neighborhood, the stated relation holds.