1. Introduction
Data generated in various fields often exhibit clear monotonicity, as seen in meteorological climate data [1,2], economic demand/supply curves [3], and biological growth curves [4]. This paper therefore focuses on statistical models under order constraints. Specifically, suppose that we have $m$ observations $(x_i, y_i)$ for $i = 1, \ldots, m$, where $x_i \in \mathbb{R}^n$ is a vector with $n$ features and $y_i \in \mathbb{R}$ is a response value. We concentrate on the following optimization problem (1), where $X \in \mathbb{R}^{m \times n}$ is the data matrix, $y \in \mathbb{R}^m$ is the response vector, and $\lambda, \tau \ge 0$ are given regularization parameters. In high-dimensional statistical regression, it is common for the number of features to exceed the number of samples; therefore, throughout this paper we assume that $n \ge m$. The penalty term is composed of two components: the first enforces sparsity in the coefficient estimates by incorporating prior knowledge, and the second penalizes violations of monotonicity among adjacent pairs of coefficients.
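For concreteness, one representative instance consistent with the above description (an assumption about the form; the exact scaling and parameterization of (1) may differ) is

```latex
\[
  \min_{\beta \in \mathbb{R}^{n}} \;
  \tfrac{1}{2}\,\lVert X\beta - y \rVert_{2}^{2}
  \;+\; \lambda \,\lVert \beta \rVert_{1}
  \;+\; \tau \sum_{i=1}^{n-1} \bigl(\beta_{i} - \beta_{i+1}\bigr)_{+},
  \qquad \lambda,\ \tau \ge 0,
\]
```

where $(t)_{+} := \max\{t, 0\}$ denotes the positive part.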
Problem (1) generalizes a wide range of ordered convex problems, including the isotonic regression model [5], the nearly isotonic regression model [6], and the ordered lasso problem [7]. The isotonic regression problem (2) involves determining a vector $z \in \mathbb{R}^n$ that approximates a given vector $y \in \mathbb{R}^n$ while ensuring that $z$ is a non-decreasing (or non-increasing) sequence, i.e., $z_1 \le z_2 \le \cdots \le z_n$. Since this hard monotonicity constraint may lead to a model that is too rigid and difficult to adapt to complex data structures, Tibshirani et al. [6] relaxed the constraint and considered the nearly isotonic regression model (3), in which a given nonnegative parameter weights the violations of monotonicity. It is evident that problem (1) is a generalization of problem (3), as it addresses general regression settings and incorporates sparsity in the coefficients. We refer to problem (1) as the generalized convex nearly isotonic regression (GCNIR) problem.
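To make the relaxation in (3) concrete, the following small check (assuming the standard form of the nearly isotonic penalty, a sum of positive parts of adjacent decreases as in [6]) illustrates that the penalty vanishes exactly on non-decreasing sequences and grows with the size of each violation; the function name is ours, not notation from the paper.

```python
import numpy as np

def nearly_isotonic_penalty(z):
    # sum of positive parts of adjacent decreases; zero iff z is non-decreasing
    return float(np.sum(np.maximum(z[:-1] - z[1:], 0.0)))

print(nearly_isotonic_penalty(np.array([1.0, 2.0, 2.0, 3.0])))  # 0.0: monotone, no penalty
print(nearly_isotonic_penalty(np.array([1.0, 3.0, 2.0, 4.0])))  # 1.0: one violation of size 1
```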
In addition, problem (1) can also be regarded as a generalization of the ordered lasso problem (4) proposed in [7]. Clearly, problem (4) extends the lasso problem by imposing a monotonicity constraint on the absolute values of the coefficients. Like problem (2), this hard requirement can make model (4) overly rigid. The GCNIR problem (1), however, relaxes the stringent monotonicity requirement of the ordered lasso, yielding a convex problem that is more flexible and tractable.
The GCNIR problem (1) can be reformulated as a convex quadratic programming (QP) problem (5) by introducing new variables, where the reformulation involves the all-ones column vector and the identity matrix of appropriate dimensions. This implies that one can utilize the QP function “quadprog” provided by MATLAB or well-developed QP solvers, such as Gurobi and CPLEX [8], to solve reformulation (5) and thus problem (1). However, the matrices in the reformulation grow with both $m$ and $n$, so the cost of storing them and of solving the resulting QP becomes prohibitive, which makes it challenging to apply these methods to large-scale problems.
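As a small-scale sanity check, problem (1) can also be handed to a generic convex solver directly, without forming the QP reformulation explicitly. The sketch below uses CVXPY and assumes the representative objective written above (squared loss plus an $\ell_1$ term plus a nearly isotonic term); it is intended as a reference solution for toy sizes only, since generic solvers run into the same scaling issues discussed here.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
m, n = 50, 200                       # more features than samples, as assumed above
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)
lam, tau = 1.0, 1.0

beta = cp.Variable(n)
fit = 0.5 * cp.sum_squares(X @ beta - y)               # least-squares data fit
sparsity = lam * cp.norm1(beta)                        # sparsity-inducing term
near_iso = tau * cp.sum(cp.pos(beta[:-1] - beta[1:]))  # adjacent-pair violations
prob = cp.Problem(cp.Minimize(fit + sparsity + near_iso))
prob.solve()                                           # delegated to a generic conic/QP solver

print("objective:", prob.value)
print("nonzeros :", int(np.sum(np.abs(beta.value) > 1e-6)))
```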
Due to the challenges in solving the QP reformulation (5), it is natural to adapt the methods developed for the problems discussed above to problem (1). The pool adjacent violators algorithm (PAVA) [9] is a cornerstone method for shape-constrained statistical regression problems, as discussed in [10]. Initially developed for the isotonic regression model (2), PAVA has been extended to the nearly isotonic regression model (3) through adaptations such as the modified PAVA (MPAVA) [6] and the generalized PAVA (GPAVA) [2]. Despite its broad applicability, there is no theoretical guarantee that PAVA can be modified to handle convex nonseparable minimization problems. Other approaches, such as the generalized proximal gradient algorithm [7] and the alternating direction method of multipliers (ADMM) [11], have been proposed for solving the ordered lasso problem (4). To our knowledge, most existing techniques for ordered models rely primarily on first-order information from the associated nonsmooth optimization framework. Consequently, we aim to develop a customized algorithm that exploits second-order information to address the GCNIR problem more effectively.
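For reference, a minimal PAVA sketch for the classical isotonic regression model (2), i.e., the least-squares fit under a non-decreasing constraint, is given below. The MPAVA and GPAVA variants mentioned above modify the pooling rule for the relaxed model (3); none of this is the algorithm developed in this paper.

```python
import numpy as np

def pava(y):
    """Pool Adjacent Violators: minimize ||z - y||^2 subject to z non-decreasing."""
    y = np.asarray(y, dtype=float)
    vals, wts, starts = [], [], []          # block means, block sizes, block start indices
    for i, yi in enumerate(y):
        vals.append(yi); wts.append(1.0); starts.append(i)
        # pool the last two blocks while their means violate monotonicity
        while len(vals) > 1 and vals[-2] > vals[-1]:
            w = wts[-2] + wts[-1]
            vals[-2] = (wts[-2] * vals[-2] + wts[-1] * vals[-1]) / w
            wts[-2] = w
            del vals[-1], wts[-1], starts[-1]
    z = np.empty_like(y)
    bounds = starts + [len(y)]
    for j, v in enumerate(vals):
        z[bounds[j]:bounds[j + 1]] = v      # each block takes its pooled mean
    return z

print(pava([3.0, 1.0, 2.0, 5.0, 4.0]))      # -> [2. 2. 2. 4.5 4.5]
```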
This paper aims to develop a semismooth Newton-based augmented Lagrangian (Ssnal) algorithm to address the GCNIR problem (1) from a dual viewpoint. The Ssnal algorithm’s primary benefits include its superior convergence characteristics and reduced computational demands, which are achieved by exploiting second-order sparsity and employing efficient strategies within the semismooth Newton (Ssn) algorithm. Furthermore, the Ssnal algorithm has demonstrated its effectiveness in handling large-scale sparse convex models, as evidenced by its performance in applications such as the Lasso [12], group Lasso [13], fused Lasso [14], clustered Lasso [15,16], multi-task Lasso [17], trend filtering [18], density matrix least squares problems [19], the Dantzig selector [20], and others [21,22,23,24]. Building on these successes, we propose to apply the Ssnal algorithm to solve problem (1).
The primary contributions of this paper are as follows. First, we calculate the proximal mapping related to the GCNIR regularizer and its generalized Jacobian. Second, we utilize the Ssnal algorithm to address the GCNIR problem from a dual perspective. Furthermore, by capitalizing on the low-rank properties and second-order sparsity inherent in the GCNIR problem, we significantly reduce the computational cost associated with the Ssn algorithm when solving the subproblems. Lastly, we perform a numerical analysis comparing our algorithm with first-order methods, including ADMM and the Accelerated Proximal Gradient (APG) method, demonstrating the efficiency and robustness of our approach.
The remaining sections of this paper are organized as follows. Section 2 analyzes the proximal mapping associated with the GCNIR regularizer and its generalized Jacobian. Section 3 outlines the framework of the Ssnal algorithm and discusses its convergence properties when applied to the dual formulation of the GCNIR problem (1). In Section 4, we evaluate the performance of the Ssnal algorithm through numerical experiments. Finally, we conclude the paper in Section 5.
Notation. For any $z \in \mathbb{R}^n$, “$\mathrm{Diag}(z)$” represents the diagonal matrix with $z_i$ as its $i$-th diagonal component. “$|z|$” refers to the absolute-value vector, whose $i$-th entry is $|z_i|$. “$\mathrm{sign}(z)$” indicates the sign vector, i.e., its $i$-th entry is $1$ when $z_i > 0$, $-1$ when $z_i < 0$, and $0$ when $z_i$ is equal to zero. Additionally, “$\mathrm{supp}(z)$” refers to the support of the element $z$, specifically the collection of indices $i$ for which $z_i$ is not equal to zero. For any positive integer $n$, the unit column vectors in $\mathbb{R}^n$ are denoted in the usual way. $M^{\dagger}$ denotes the Moore–Penrose pseudoinverse of a matrix $M$. Typically, $h^{*}$ denotes the Fenchel conjugate of a given function $h$.
2. The Proximal Mapping of the GCNIR Regularizer and Its Generalized Jacobian
In this section, we present some results concerning the proximal mapping associated with the GCNIR regularizer and its generalized Jacobian, which are needed for the subsequent analysis.
Given any scalar $\sigma > 0$ and any proper closed convex function $p$, the proximal mapping and the Moreau envelope [25] of $p$ are defined in the standard way; the Moreau identity [26] holds, and, according to [27], the Moreau envelope is convex and continuously differentiable with its gradient expressed through the proximal mapping.
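For reference, these objects and identities read as follows in one common parameterization; the symbols are illustrative and may differ from the notation used in the rest of the paper.

```latex
\[
  \operatorname{Prox}_{\sigma p}(x) := \arg\min_{z}\Bigl\{ p(z) + \tfrac{1}{2\sigma}\lVert z - x\rVert^{2} \Bigr\},
  \qquad
  E_{\sigma p}(x) := \min_{z}\Bigl\{ p(z) + \tfrac{1}{2\sigma}\lVert z - x\rVert^{2} \Bigr\},
\]
\[
  x = \operatorname{Prox}_{\sigma p}(x) + \sigma \operatorname{Prox}_{p^{*}/\sigma}(x/\sigma)
  \quad\text{(Moreau identity)},
  \qquad
  \nabla E_{\sigma p}(x) = \tfrac{1}{\sigma}\bigl(x - \operatorname{Prox}_{\sigma p}(x)\bigr).
\]
```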
Let the GCNIR regularizer be as defined in (1). Before diving into the proximal mapping associated with the GCNIR regularizer, we briefly introduce an auxiliary regularizer, defined through the matrix $B$, together with relevant results discussed in [14]; the proximal mapping with respect to this auxiliary regularizer is characterized by the following lemma.
Lemma 1 (see [14], Lemma 1). For any given point, the proximal mapping of the auxiliary regularizer admits an explicit expression. On the basis of the above lemma, we can now explicitly calculate the proximal mapping of the GCNIR regularizer.
Proposition 1. For any given parameters and any point, the proximal mapping of the GCNIR regularizer admits an explicit expression. Proof. According to the definition of the proximal mapping, the claimed expression holds, and the composition step follows from ([28], Corollary 4). This completes the proof. □
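The proof above rests on a prox-decomposition argument ([28], Corollary 4): the proximal mapping of a sum of an $\ell_1$ term and a difference-based term can be obtained by composing the individual proximal mappings. A minimal sketch of that pattern is given below, assuming the GCNIR regularizer has the representative form used earlier; prox_near_iso stands for a hypothetical oracle returning the proximal point of the nearly isotonic part (for example, the mapping characterized in Lemma 1), and the composition order shown is the one established for fused-type penalties.

```python
import numpy as np

def soft_threshold(w, lam):
    # proximal mapping of lam * ||.||_1: componentwise shrinkage toward zero
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def prox_composite(w, lam, tau, prox_near_iso):
    # prox-decomposition pattern: apply the prox of the difference-based part
    # first, then soft-threshold the result (cf. [28], Corollary 4)
    return soft_threshold(prox_near_iso(w, tau), lam)
```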
Next, we analyze the generalized Jacobian of the proximal mapping, which is crucial for achieving computational efficiency. We begin by presenting some findings concerning the generalized HS-Jacobian of the auxiliary proximal mapping, following [14,29]. As noted in [14], the generalized HS-Jacobian at a given point is constructed from an optimal Lagrangian multiplier for the relevant constraint and the associated active index set, and the corresponding multifunction collects all such matrices.
The subsequent proposition demonstrates that the multifunctions introduced above may be regarded as generalized HS-Jacobians of the corresponding proximal mappings at the respective points. Proposition 2. For all $w$, there exists a neighborhood of $w$ on which the stated identities hold for the proximal mappings and the elements of the multifunctions. Proof. The results are derived from [14] (Proposition 2) and [29] (Lemma 2.1) with minor revisions. □
We next define a multifunction that essentially acts as the generalized Jacobian of the proximal mapping of the GCNIR regularizer at $w$; it can be derived using the change-of-variables technique from previous work in [14] (Theorem 2).
Theorem 4. Let $\lambda$ and $\tau$ be non-negative real numbers, and let $w$ be an element of $\mathbb{R}^n$. The set-valued mapping defined above is nonempty, compact-valued, and upper semicontinuous. Each of its elements $V$ is a symmetric and positive semidefinite matrix. Furthermore, there exists a neighborhood of $w$ such that, for any point in this neighborhood, the stated relation holds.