Article

The Proximal Alternating Direction Method of Multipliers for a Class of Nonlinear Constrained Optimization Problems

College of Science, University of Shanghai for Science and Technology, No. 516, Jun Gong Road, Shanghai 200093, China
*
Author to whom correspondence should be addressed.
Mathematics 2025, 13(3), 407; https://doi.org/10.3390/math13030407
Submission received: 26 December 2024 / Revised: 21 January 2025 / Accepted: 22 January 2025 / Published: 26 January 2025

Abstract

This paper presents a class of proximal alternating direction methods of multipliers for solving nonconvex, nonsmooth optimization problems with nonlinear coupled constraints. The key feature of the proposed algorithm is that the primary variables are updated with a linearized proximal technique, after which the dual variables are updated with a discounting approach. This eliminates the need for an additional proxy function and thereby simplifies the optimization process. In addition, the algorithm keeps the parameter selection fixed throughout the updates, removing the need to adjust parameters in order to guarantee that the generated sequence is decreasing. Building on this framework, we construct a Lyapunov function with sufficient decrease and a lower bound, which is essential for analyzing the convergence properties of the algorithm. We rigorously prove both the subsequence convergence and the global convergence of the algorithm, ensuring its robustness and effectiveness in solving complex optimization problems. Our paper provides a solid theoretical foundation for the practical application of this method to nonconvex optimization problems with nonlinear coupled constraints.

1. Introduction

This paper considers the following nonconvex, nonsmooth optimization problem with nonlinear coupled constraints:
min f(x) + g(x) + p(y),
s.t. h(x) + By = 0,   (1)
where x ∈ R^n, y ∈ R^m, f : R^n → R ∪ {+∞} is a proper lower semicontinuous function, g : R^n → R is a continuously differentiable function with a locally Lipschitz continuous gradient, p : R^m → R is a function whose gradient is Lipschitz continuous with constant L_p > 0, and h : R^n → R^q is a mapping with a locally Lipschitz continuous Jacobian matrix. B ∈ R^{q×m} is a full-row-rank matrix with smallest eigenvalue λ_min(BB^T) > 0.
Problem (1) covers many important problems, such as the risk parity portfolio selection problem [1], the robust phase retrieval problem [2], and generative adversarial networks (GANs) [3]. When B = −I (where I is the identity matrix), problem (1) is equivalent to the following problem:
min f(x) + g(x) + p(h(x)).   (2)
A typical example of problem (2) is the following Logistic Matrix Factorization problem [4]:
min Σ_{i=1}^{m} Σ_{j=1}^{n} [ (1 + c y_{ij} − y_{ij}) log( 1 + exp(u_i v_j^T) ) − c y_{ij} u_i v_j^T ] + (λ_d/2)‖U‖_F² + (λ_t/2)‖V‖_F²,   (3)
where the elements y_{ij} ∈ {0, 1}, λ_d and λ_t are regularization coefficients, and c > 0 is a given constant.
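For concreteness, the objective in (3) can be evaluated directly; the following Matlab sketch does so on small synthetic data, treating u_i and v_j as the rows of factor matrices U ∈ R^{m×r} and V ∈ R^{n×r} (the sizes, data, and variable names here are our own illustrative choices, not those of the paper's experiments).

% Sketch: evaluate the Logistic Matrix Factorization objective (3) on toy data.
m = 50; n = 40; r = 5;                          % illustrative sizes
Y = double(sprand(m, n, 0.1) > 0);              % sparse 0/1 data matrix
U = randn(m, r); V = randn(n, r);               % u_i, v_j are the rows of U, V
c = 1; lambda_d = 1/8; lambda_t = 1/8;
S = U*V';                                        % S(i,j) = u_i * v_j'
obj = sum(sum((1 + c*Y - Y).*log(1 + exp(S)) - c*Y.*S)) ...
      + lambda_d/2*norm(U, 'fro')^2 + lambda_t/2*norm(V, 'fro')^2;
fprintf('objective value of (3): %.4f\n', obj);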
In previous studies, when problem (1) is convex and involves linearly coupled constraints, that is, when h(x) is an affine mapping, the alternating direction method of multipliers (ADMM) has been studied extensively. In particular, when h(x) = Ax and both A and B have full rank, the global convergence of ADMM was established in [5,6]. Researchers have found, however, that convex formulations can be restrictive in many applications, whereas nonconvex formulations often describe the underlying models more accurately. Consequently, the focus of research gradually shifted toward applying ADMM to nonconvex optimization problems. For example, references [7,8,9,10] established the global convergence of ADMM for nonconvex models with linear constraints.
Although nonconvex formulations address many practical scenarios, the limitations of linearly coupled constraints become evident as real-world constraints grow more complex. As a result, researchers have shifted their focus to problems with nonlinearly coupled constraints. For such problems, the authors of [11] proposed the concept of an information region within a Lagrangian algorithm framework, where the boundedness of the multiplier sequence is a key assumption for the information region to take effect. The parameter selection in that paper depends on an upper bound constant for the multiplier sequence, and, in practice, determining such an upper bound can be very challenging. Building on the idea of an information region from [11], the authors of [12] designed a proximal linearized alternating direction method of multipliers with a backtracking procedure. Although its convergence analysis does not rely on the boundedness of the multiplier sequence, it requires dynamically generated parameters during the backtracking procedure to ensure the boundedness of the generated sequence. The authors of [13] used an upper-bound minimization method to update the primal block variables. Building on this, the authors of [4] introduced inertial techniques and proposed a convergent alternating direction method of multipliers with scaling factors to establish the update rules. However, this requires appropriate proxy functions to recover the proximal point and proximal gradient descent steps, which is also challenging in practice.
In this paper, we propose a proximal linearized alternating direction method of multipliers for problem (1). The core idea is to update the primal variables with a linearized proximal scheme and to update the dual variables using a discount strategy. The method does not require selecting an additional proxy function, and the parameter selection is fixed, eliminating the need to adjust the parameters to ensure the decreasing nature of the generated sequence; this, in turn, simplifies the convergence analysis of the algorithm.
The structure of this paper is as follows. Section 2 introduces the preliminary knowledge required for the convergence analysis of the algorithm. Section 3 describes the algorithm in detail and provides the convergence theorem. Section 4 presents the numerical results of the algorithm on the Logistic Matrix Factorization problem. Section 5 concludes the paper and provides final remarks. Appendix A contains the proofs of several propositions.

2. Materials and Methods

In this section, we present the preliminaries needed for the convergence analysis. Based on the variational analysis of the problem (see [14]), we introduce the following constraint qualification conditions for problem (1). Here, ∇h(·) ∈ R^{q×n} denotes the Jacobian matrix of h.
Definition 1
(Constraint Qualification [13]). A point x ∈ R^n satisfies the constraint qualification (CQ) for problem (1) if the following conditions hold:
(i)
The subdifferential of f at x is regular.
(ii)
∂f(x) ∩ Im(∇h(x)^T) = {0}, i.e., the limiting subdifferential of f at x meets the range of ∇h(x)^T (the transpose of the Jacobian of h at x) only at the origin.
The CQ conditions ensure the smoothness and regularity of the constraint set, providing the subdifferential calculus rules needed to establish the first-order necessary optimality conditions for problem (1).
Lemma 1
(First-Order Optimality Condition). Let (x*, y*) ∈ R^n × R^m be a local minimum of problem (1), and let x* satisfy the CQ condition. Then, we have the following:
(i)
h(x*) + By* = 0.
(ii)
There exists u* ∈ R^q such that
0 ∈ ∂_x( f(x*) + g(x*) ) + ∇_x h(x*)^T u*, ∇p(y*) + B^T u* = 0, h(x*) + By* = 0.
This paper mainly considers the nonconvex and nonsmooth problem (1). Therefore, we provide some definitions and lemmas for convergence analysis [14,15,16].
Definition 2
(Subdifferential). Let E be a Euclidean vector space, and let Ψ : E → (−∞, ∞] be a proper lower semicontinuous function. For z ∈ dom(Ψ), we have the following:
(i)
The Fréchet (regular) subdifferential of Ψ at z, denoted by ∂̂Ψ(z), is the set of vectors v ∈ E satisfying
Ψ(x) ≥ Ψ(z) + ⟨v, x − z⟩ + o(‖x − z‖).
(ii)
The limiting subdifferential of Ψ at z, denoted by ∂Ψ(z), is the set of vectors v ∈ E for which there exist sequences {z^k}_{k∈N} and {v^k}_{k∈N} such that z^k → z, Ψ(z^k) → Ψ(z), v^k ∈ ∂̂Ψ(z^k), and v^k → v.
(iii)
The horizon subdifferential of Ψ at z, denoted by ∂^∞Ψ(z), is defined as in (ii), except that instead of v^k → v we require t_k v^k → v for some real sequence t_k ↓ 0. For z ∉ dom(Ψ), we set ∂̂Ψ(z) = ∂Ψ(z) = ∂^∞Ψ(z) = ∅.
Proposition 1.
Let Ψ : R^d → (−∞, ∞] be a proper extended-real-valued function, and let φ : R^d → R be a smooth function. Then, we have the following:
∂(Ψ + φ)(x) = ∂Ψ(x) + ∇φ(x), for all x ∈ R^d.
Problem (1) involves Lipschitz continuous and locally Lipschitz continuous mappings; we next recall the relevant concepts.
Definition 3.
Let S ⊆ X be a non-empty set, and let φ : S → Y be a continuous map on S. Then, we have the following:
(i)
If ‖φ(x) − φ(z)‖ ≤ L‖x − z‖ for all x, z ∈ S and some L ≥ 0, then φ is L-Lipschitz continuous on S.
(ii)
If, for each z ∈ S, there exist ϵ(z) > 0, L_{ϵ(z)} ≥ 0, and a neighborhood N_{ϵ(z)} := {x ∈ S : ‖x − z‖ < ϵ(z)} such that φ is L_{ϵ(z)}-Lipschitz continuous on N_{ϵ(z)}, i.e.,
‖φ(x) − φ(z)‖ ≤ L_{ϵ(z)}‖x − z‖, for all x, z ∈ N_{ϵ(z)}.
When (i) or (ii) holds on S ⊆ X, φ is called L-Lipschitz continuous or locally Lipschitz continuous, respectively.
Lemma 2
(Local Lipschitz Continuity of ∇ϕ [12]). The function ϕ(x, y, u) := g(x) + ⟨u, h(x) + By⟩ + (ρ/2)‖h(x) + By‖² satisfies the following: for every non-empty compact set C ⊆ R^n × R^m × R^q, there exists L_C ≥ 0 such that
‖∇ϕ(x) − ∇ϕ(z)‖ ≤ L_C‖x − z‖, for all x, z ∈ C.
This means that ∇ϕ is locally Lipschitz continuous.
Proposition 2
(Local Lipschitz Continuity and Compact Sets [17]). Let S ⊆ X be a non-empty set, and let the mapping φ : X → Y be locally Lipschitz continuous on S. Then, for every non-empty compact set C ⊆ S, there exists L_C ≥ 0 such that φ is L_C-Lipschitz continuous on C, i.e.,
‖φ(x) − φ(z)‖ ≤ L_C‖x − z‖, for all x, z ∈ C.
Proposition 3
(Differentiable Mapping and Lipschitz Continuity [18]). Let φ : X → Y be a C¹-mapping. Then, the following statements hold:
(i)
φ is locally Lipschitz continuous.
(ii)
Let B ⊆ X be a closed ball, i.e., B = {x : ‖x − z‖ ≤ r} for some z ∈ X and r ∈ (0, ∞]. If φ is L_B-Lipschitz continuous on B with L_B ≥ 0, then ‖∇φ(x)‖ ≤ L_B for all x ∈ B.

3. Results

3.1. Algorithm

First, we define the augmented Lagrangian function associated with problem (1), L_ρ : R^n × R^m × R^q → (−∞, ∞], with penalty parameter ρ > 0:
L_ρ(x, y, u) := f(x) + g(x) + p(y) + ⟨u, h(x) + By⟩ + (ρ/2)‖h(x) + By‖²,
where ‖·‖ denotes the standard Euclidean ℓ2-norm.
To distinguish between the smooth and nonsmooth components of L_ρ, we define ϕ : R^n × R^m × R^q → R as
ϕ(x, y, u) := g(x) + ⟨u, h(x) + By⟩ + (ρ/2)‖h(x) + By‖².
Thus, L_ρ(x, y, u) = f(x) + p(y) + ϕ(x, y, u).
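As a minimal illustration of these definitions, the following Matlab sketch evaluates ϕ and L_ρ for placeholder problem data that we choose purely for demonstration (the handles f, g, p, h and the matrix B below are not the paper's test problems).

% Sketch: evaluate phi and L_rho for illustrative problem data.
n = 10; m = 8; q = 6; rho = 10;
B = eye(q, m);                                % full row rank, B*B' = I
f = @(x) norm(x, 1);                          % proper, lower semicontinuous
g = @(x) 0.5*norm(x - ones(n, 1))^2;          % smooth term in x
p = @(y) 0.5*norm(y)^2;                       % smooth term in y
h = @(x) x(1:q).^2;                           % nonlinear coupling map R^n -> R^q
phi  = @(x, y, u) g(x) + u'*(h(x) + B*y) + rho/2*norm(h(x) + B*y)^2;
Lrho = @(x, y, u) f(x) + p(y) + phi(x, y, u);
x = randn(n, 1); y = randn(m, 1); u = zeros(q, 1);
fprintf('phi = %.4f, L_rho = %.4f\n', phi(x, y, u), Lrho(x, y, u));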
Next, we present Algorithm 1 proposed in this paper.
Algorithm 1 Proximal Linearized ADMM (PADMM)
Input: Initial values (x^0, y^0, u^0) ∈ R^n × R^m × R^q, θ > 0, η > 0, τ ∈ [0, 1), and set k ← 0.
   Repeat:
   Primal update:
x^{k+1} ∈ argmin_{x ∈ R^n} φ_k(x) := f(x) + ⟨∇_x ϕ(x^k, y^k, u^k), x⟩ + (η/2)‖x − x^k‖²,   (6)
y^{k+1} ∈ argmin_{y ∈ R^m} ψ_k(y) := ϕ(x^{k+1}, y, u^k) + ⟨∇p(y^k), y⟩ + (θ/2)‖y − y^k‖².   (7)
   Dual update:
u^{k+1} = (1 − τ)u^k + ρ( h(x^{k+1}) + By^{k+1} ).   (8)
   Until the convergence criterion is satisfied.
Remark 1.
For Equation (6), we first recall the proximal mapping [14]: let ξ : R^d → (−∞, ∞] be a proper lower semicontinuous function with inf_{R^d} ξ > −∞, and let x ∈ R^d. For t > 0, the proximal mapping prox_{ξ/t}(x) is defined as
prox_{ξ/t}(x) = argmin{ ξ(z) + (t/2)‖z − x‖² : z ∈ R^d }.
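As a standard worked example of this definition (our illustration, not part of the paper), take ξ(z) = μ‖z‖₁ with μ > 0 (we write μ rather than λ to avoid a clash with λ_min(BB^T)); the proximal mapping then reduces to componentwise soft-thresholding:

\operatorname{prox}_{\xi/t}(x)_i
  = \arg\min_{z_i \in \mathbb{R}} \Big\{ \mu\,\lvert z_i \rvert + \tfrac{t}{2}\,(z_i - x_i)^2 \Big\}
  = \operatorname{sign}(x_i)\,\max\!\Big( \lvert x_i \rvert - \tfrac{\mu}{t},\, 0 \Big).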
So, Equation (6) is essentially a proximal gradient update step:
x^{k+1} ∈ argmin_{x ∈ R^n} { f(x) + ⟨∇_x ϕ(x^k, y^k, u^k), x⟩ + (η/2)‖x − x^k‖² } = prox_{f/η}( x^k − η^{-1}∇_x ϕ(x^k, y^k, u^k) ).
For Equation (7), we note that ψ_k is a strongly convex function. Since ρB^TB + θI ≻ 0 for any θ > 0, we can derive the explicit update expression for y^{k+1} from the first-order optimality condition of (7):
y^{k+1} = (ρB^TB + θI)^{-1}( θy^k − B^T( u^k + ρh(x^{k+1}) ) − ∇p(y^k) ).
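In practice, this update is a single linear system, so it can be implemented with a linear solve rather than an explicit matrix inverse; the following Matlab sketch does this on stand-in data (the variables below, including the handle dp for ∇p, are our own placeholders).

% Sketch: closed-form y-update (7) via a linear solve (toy stand-in data).
q = 4; m = 6; rho = 10; theta = 5;
B = randn(q, m);                              % stands in for a full-row-rank B
dp = @(y) y;                                  % gradient of p(y) = 0.5*||y||^2
y_k = randn(m, 1); u_k = randn(q, 1);
h_xk1 = randn(q, 1);                          % stands in for h(x^{k+1})
y_next = (rho*(B'*B) + theta*eye(m)) \ ...
         (theta*y_k - B'*(u_k + rho*h_xk1) - dp(y_k));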
For Equation (8), we adopt a discount update scheme:
u^{k+1} = (1 − τ)u^k + ρ( h(x^{k+1}) + By^{k+1} )
        = (1 − τ)²u^{k−1} + (1 − τ)ρ( h(x^k) + By^k ) + ρ( h(x^{k+1}) + By^{k+1} )
        = (1 − τ)^{k+1}u^0 + Σ_{l=0}^{k} (1 − τ)^{k−l} ρ( h(x^{l+1}) + By^{l+1} ).
In contrast, the dual update for the ADMM method with nonlinear coupling constraints is typically
u^{k+1} = u^k + ρ( h(x^{k+1}) + By^{k+1} )
        = u^{k−1} + ρ( h(x^k) + By^k ) + ρ( h(x^{k+1}) + By^{k+1} )
        = u^0 + Σ_{l=0}^{k} ρ( h(x^{l+1}) + By^{l+1} ).
In the proximal alternating direction method of multipliers (proximal ADMM), the primal and dual updates alternate until the stopping criteria are met.
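To make one full pass of Algorithm 1 concrete, the following Matlab sketch runs the three updates on a small synthetic instance that we construct purely for illustration (f = 0, quadratic g and p, h(x) = x.² elementwise, and B = −I); it is a sketch of the scheme under these assumptions, not the experimental code of Section 4.

% Sketch of Algorithm 1 (PADMM) on a toy instance:
% f = 0, g(x) = 0.5*||x - a||^2, p(y) = 0.5*||y - b||^2, h(x) = x.^2, B = -I.
rng(1);
n = 5; a = 0.3*randn(n, 1); b = 0.5 + 0.5*rand(n, 1);
rho = 2; eta = 20; theta = 2; tau = 0.5;          % illustrative parameters
x = zeros(n, 1); y = zeros(n, 1); u = zeros(n, 1);
for k = 1:2000
    % x-update (6): proximal gradient step; with f = 0 it is a plain gradient step
    gx = (x - a) + 2*x.*(u + rho*(x.^2 - y));      % = grad_x phi(x, y, u)
    x  = x - gx/eta;
    % y-update (7): closed form; B = -I gives rho*B'*B + theta*I = (rho + theta)*I
    y  = (theta*y + u + rho*x.^2 - (y - b))/(rho + theta);
    % dual update (8): discounted multiplier step
    u  = (1 - tau)*u + rho*(x.^2 - y);
end
% At a fixed point, h(x) + By = tau*u/rho (cf. Theorem 1), so the two norms should nearly agree.
fprintf('||h(x)+By|| = %.3e,  tau*||u||/rho = %.3e\n', norm(x.^2 - y), tau*norm(u)/rho);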

3.2. Convergence Analysis

Our analysis will focus on the function ϕ. For the convenience of the subsequent proofs, we first record its partial gradients:
∇_x ϕ(x, y, u) = ∇g(x) + ∇h(x)^T( u + ρ(h(x) + By) ),
∇_y ϕ(x, y, u) = B^T( u + ρ(h(x) + By) ),
∇_u ϕ(x, y, u) = h(x) + By.
We also state the assumptions required for our proofs: the functions g and p are bounded from below, that is, inf_{x ∈ R^n} g(x) > −∞ and inf_{y ∈ R^m} p(y) > −∞.
First, we provide the relationship between y and u.
Proposition 4.
Define û^k := u^k + ρ( h(x^{k+1}) + By^{k+1} ) for all k ≥ 1, and let λ := λ_min(BB^T); then, we have
‖u^{k+1} − u^k‖² ≤ (1/λ)( θ‖y^{k+1} − y^k‖ + (L_p + θ)‖y^k − y^{k−1}‖ )².
Proof of Proposition 4.
We defer the proof to Appendix A.1.    □
For all k ≥ 1, define the regularized AL function L_ρ^+(x^{k+1}, y^{k+1}, u^{k+1}) := L_ρ(x^{k+1}, y^{k+1}, u^{k+1}) − (τ/(2ρ))‖u^{k+1}‖². The following proposition quantifies the change in the regularized AL function over successive iterations.
Proposition 5.
For all k ≥ 1, we have
L_ρ^+(x^{k+1}, y^{k+1}, u^{k+1}) ≤ L_ρ^+(x^k, y^k, u^k) − ((η − L_C)/2)‖x^{k+1} − x^k‖² − [ (2θ − L_p)/2 − ((2 − τ)θ²)/(ρλ_min(BB^T)) ]‖y^{k+1} − y^k‖² + ((2 − τ)(θ + L_p)²)/(ρλ_min(BB^T)) ‖y^k − y^{k−1}‖².
Proof of Proposition 5.
We defer the proof to Appendix A.2.    □
In particular, the authors of [19] built a general framework for establishing convergence in nonconvex settings, comprising two key steps: (1) identifying a Lyapunov function with sufficient decrease and (2) establishing the lower boundedness of that Lyapunov function. The augmented Lagrangian (AL) function has often been used as the Lyapunov function in nonconvex settings. In Proposition 5, L_ρ^+(x, y, u) is the regularized AL function. We observe that the sufficient descent property of the AL function can hold only when the dual update is controlled by the primal updates, i.e., when ‖u^{k+1} − u^k‖² is bounded in terms of the differences of the primal iterates. Based on the ascent–descent relationship established in Propositions 4 and 5, we introduce the Lyapunov function Γ_β : R^n × R^m × R^q × R^m → (−∞, ∞], defined as Γ_β(x, y, u, w) := L_ρ^+(x, y, u) + (β/2)‖y − w‖², where β > 0 is a constant parameter required to ensure the sufficient descent and lower bound properties of the Lyapunov function. Next, we establish the sufficient descent property of the Lyapunov function. For convenience, we define the sequence {z^k := (x^k, y^k, u^k, y^{k−1})}_{k≥1}, which serves as the descent sequence of the Lyapunov function Γ_β. Based on Propositions 4 and 5, the following proposition is easily obtained.
Proposition 6.
For k ≥ 1,
Γ_β(z^{k+1}) ≤ Γ_β(z^k) − ((η − L_C)/2)‖x^{k+1} − x^k‖² − α_1‖y^{k+1} − y^k‖² − α_2‖y^k − y^{k−1}‖²,
where λ = λ_min(BB^T), α_1 = (2θ − L_p − β)/2 − ((2 − τ)θ²)/(ρλ), and α_2 = β/2 − ((2 − τ)(θ + L_p)²)/(ρλ).
This implies that when (η − L_C)/2 > 0, α_1 > 0, and α_2 > 0, the Lyapunov function we constructed has the sufficient descent property. Another key step for the convergence of the algorithm is to establish the lower bound property of the Lyapunov function. To do so, we first prove the boundedness of the Lagrange multipliers generated by the discounted dual update scheme.
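The parameter conditions above are easy to check numerically for a candidate configuration; the following Matlab sketch does so for constants that we pick purely for illustration (in practice L_C, L_p, and λ would have to be estimated for the problem at hand).

% Sketch: check the sufficient-descent conditions of Proposition 6.
L_C = 2; L_p = 1; lambda = 0.5;                % assumed problem constants
rho = 100; tau = 0.8; theta = 5; eta = 4;      % candidate algorithm parameters
beta = 2;                                      % Lyapunov coupling parameter
alpha1 = (2*theta - L_p - beta)/2 - (2 - tau)*theta^2/(rho*lambda);
alpha2 = beta/2 - (2 - tau)*(theta + L_p)^2/(rho*lambda);
descent_ok = (eta - L_C)/2 > 0 && alpha1 > 0 && alpha2 > 0;
fprintf('alpha1 = %.3f, alpha2 = %.3f, descent holds: %d\n', alpha1, alpha2, descent_ok);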
Proposition 7.
Let Δ_k := h(x^k) + By^k denote the constraint residual at iteration k, and let Δ_max := max_k ‖h(x^k) + By^k‖ denote the maximum constraint residual over all iterates. Then Algorithm 1, started from any given initial dual variable u^0, guarantees that u^k is bounded, i.e.,
‖u^k‖ ≤ ‖u^0‖ + τ^{-1}ρΔ_max,
or equivalently,
‖u^k‖² ≤ 2‖u^0‖² + 2τ^{-2}ρ²Δ_max².
Proof. 
From the expansion of the discounted dual update in Remark 1, we have
‖u^{k+1}‖ = ‖(1 − τ)^{k+1}u^0 + Σ_{l=0}^{k} ρ(1 − τ)^{k−l}Δ_{l+1}‖ ≤ (1 − τ)^{k+1}‖u^0‖ + Σ_{l=0}^{k} ρ(1 − τ)^{k−l}‖Δ_{l+1}‖ ≤ (1 − τ)^{k+1}‖u^0‖ + ρΔ_max (1 − (1 − τ)^{k+1})/τ ≤ ‖u^0‖ + τ^{-1}ρΔ_max.
The first inequality follows from the triangle inequality, the second from ‖Δ_k‖ ≤ Δ_max, and the last one holds because τ ∈ (0, 1). The squared bound then follows by applying the inequality ‖a + b‖² ≤ 2‖a‖² + 2‖b‖².    □
With the boundedness of the multipliers at hand, we can now establish the lower bound property of the Lyapunov function.
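The bound of Proposition 7 can also be visualized numerically; the following Matlab sketch feeds a synthetic bounded residual sequence (our construction, not residuals produced by the algorithm) into the discounted dual update and compares ‖u^k‖ against the bound.

% Sketch: the discounted dual update keeps the multipliers bounded.
rng(0);
q = 5; tau = 0.8; rho = 1; K = 200;
u = randn(q, 1);                                   % arbitrary u^0
Delta_max = 2;
bound = norm(u) + rho*Delta_max/tau;               % bound from Proposition 7
u_norms = zeros(K, 1);
for k = 1:K
    Delta = Delta_max*(2*rand(q, 1) - 1)/sqrt(q);  % synthetic residual, ||Delta|| <= Delta_max
    u = (1 - tau)*u + rho*Delta;
    u_norms(k) = norm(u);
end
fprintf('max_k ||u^k|| = %.3f  <=  %.3f\n', max(u_norms), bound);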
Proposition 8.
For the sequence {z^k}_{k≥1} generated by the algorithm, we have Γ_β(z^k) > −∞ for all k ≥ 1.
Proof. 
Observe the construction Γ_β(z^k) = L_ρ(x^k, y^k, u^k) − (τ/(2ρ))‖u^k‖² + (β/2)‖y^k − y^{k−1}‖². Clearly, (β/2)‖y^k − y^{k−1}‖² is non-negative and bounded from below, and, by Proposition 7, since ‖u^k‖² is bounded, (τ/(2ρ))‖u^k‖² is also bounded. Therefore, it remains to show that L_ρ(x^k, y^k, u^k) is bounded from below. By our assumptions, inf g > −∞, inf p > −∞, and inf f > −∞, and (ρ/2)‖h(x) + By‖² is non-negative; thus, we only need to bound ⟨u, h(x) + By⟩ from below. Based on the dual update, we have
⟨u^{k+1}, h(x^{k+1}) + By^{k+1}⟩ = ⟨u^{k+1}, (u^{k+1} − (1 − τ)u^k)/ρ⟩ = ⟨u^{k+1}, ((1 − τ)/ρ)(u^{k+1} − u^k) + (τ/ρ)u^{k+1}⟩ = (τ/ρ)‖u^{k+1}‖² + ((1 − τ)/ρ)⟨u^{k+1}, u^{k+1} − u^k⟩ = (τ/ρ)‖u^{k+1}‖² + ((1 − τ)/(2ρ))( ‖u^{k+1} − u^k‖² + ‖u^{k+1}‖² − ‖u^k‖² ).
Since ‖u^k‖² is bounded, we conclude that L_ρ(x^k, y^k, u^k) is bounded from below. The proof is complete.    □

3.3. Main Result

To demonstrate the convergence result of the algorithm, we first define the notion of an approximate stationary solution.
Definition 4.
Given ϵ ≥ 0, a point (x*, y*, û*) is an ϵ-approximate stationary solution of problem (1) if
dist( ∂f(x*) + ∇g(x*) + ∇h(x*)^T û* + N(x*), 0 ) + dist( ∇_y p(y*) + B^T û*, 0 ) + ‖h(x*) + By*‖ ≤ ϵ.
Theorem 1.
Assume that the parameters used in the algorithm satisfy the conditions under which the above lemmas and propositions hold. Then, we have the following:
(i)
The sequences {x^k}, {y^k}, and {u^k} generated by the algorithm are bounded and convergent, with
‖x^{k+1} − x^k‖ → 0, ‖y^{k+1} − y^k‖ → 0, ‖u^{k+1} − u^k‖ → 0.
(ii)
Let (x*, y*, u*) be a limit point of the sequence. Then, with û* = (1 + τ)u*, the point (x*, y*, û*) is a τρ^{-1}‖u*‖-approximate stationary solution of problem (1).
Proof. 
From Proposition 6, we have
Σ_{k=1}^{K} [ Γ_β(z^k) − Γ_β(z^{k+1}) ] ≥ ((η − L_C)/2) Σ_{k=1}^{K} ‖x^{k+1} − x^k‖² + α_1 Σ_{k=1}^{K} ‖y^{k+1} − y^k‖² + α_2 Σ_{k=1}^{K} ‖y^k − y^{k−1}‖².
Letting K → ∞, we obtain
Γ_β(z^1) − lim_{K→∞} Γ_β(z^{K+1}) ≥ ((η − L_C)/2) Σ_{k=1}^{∞} ‖x^{k+1} − x^k‖² + α_1 Σ_{k=1}^{∞} ‖y^{k+1} − y^k‖² + α_2 Σ_{k=1}^{∞} ‖y^k − y^{k−1}‖².
Since Γ_β(z^{k+1}) > −∞, we have
∞ > ((η − L_C)/2) Σ_{k=1}^{∞} ‖x^{k+1} − x^k‖² + α_1 Σ_{k=1}^{∞} ‖y^{k+1} − y^k‖² + α_2 Σ_{k=1}^{∞} ‖y^k − y^{k−1}‖².
This implies that
‖x^{k+1} − x^k‖ → 0, ‖y^{k+1} − y^k‖ → 0, and, by Proposition 4, ‖u^{k+1} − u^k‖ → 0.
Therefore, the sequences {x^k}, {y^k}, and {u^k} converge to x*, y*, and u*, respectively; i.e., as k → ∞, we have x^{k+1} → x*, y^{k+1} → y*, and u^{k+1} → u*, and also x^{k+1} − x^k → 0, y^{k+1} − y^k → 0, and u^{k+1} − u^k → 0.
From the dual update, (x*, y*, u*) satisfies
h(x*) + By* = τρ^{-1}u*.
Since we defined û^k := u^k + ρ( h(x^{k+1}) + By^{k+1} ), we know that û^k → (1 + τ)u*. Thus, by setting û* = (1 + τ)u*, we have û^k → û*.
Next, for part (ii), we consider the first-order optimality conditions of the updates for k ≥ 1 and take k → ∞:
0 ∈ ∂f(x^{k+1}) + ∇_x ϕ(x^k, y^k, u^k) + η(x^{k+1} − x^k), which in the limit gives 0 ∈ ∂f(x*) + ∇g(x*) + ∇h(x*)^T û*;
0 = ∇_y p(y*) + B^T û*, h(x*) + By* = τρ^{-1}u*.
From (19), according to [7], there exists v^{k+1} ∈ N_x(x^{k+1}) such that
0 ∈ ∂f(x^{k+1}) + ∇_x ϕ(x^k, y^k, u^k) + η(x^{k+1} − x^k) + v^{k+1}.
Thus, we obtain
0 ∈ ∂f(x*) + ∇g(x*) + ∇h(x*)^T û* + N(x*).
This leads to
dist( ∂f(x*) + ∇g(x*) + ∇h(x*)^T û* + N(x*), 0 ) = 0, dist( ∇_y p(y*) + B^T û*, 0 ) = 0.
Therefore, from (18) and (21), we obtain
dist( ∂f(x*) + ∇g(x*) + ∇h(x*)^T û* + N(x*), 0 ) + dist( ∇_y p(y*) + B^T û*, 0 ) + ‖h(x*) + By*‖ ≤ τρ^{-1}‖u*‖.
Next, we consider how ‖u*‖ depends on τ and ρ, which we can control through the choice of the initial point and parameters. First, from the descent property in Proposition 6, we know that
Γ_β(z^{k+1}) ≤ Γ_β(z^0),
and from (17), we obtain
Γ_β(z^{k+1}) = f(x^{k+1}) + g(x^{k+1}) + p(y^{k+1}) + (ρ/2)‖h(x^{k+1}) + By^{k+1}‖² + (τ/(2ρ))‖u^{k+1}‖² + (β/2)‖y^{k+1} − y^k‖² + ((1 − τ)/(2ρ))( ‖u^{k+1} − u^k‖² + ‖u^{k+1}‖² − ‖u^k‖² ).
Thus, we have
((1 − τ)/(2ρ))( ‖u^{k+1}‖² − ‖u^k‖² ) + (τ/(2ρ))‖u^{k+1}‖² ≤ Γ_β(z^0).
This is because f(x^{k+1}) ≥ 0, g(x^{k+1}) ≥ 0, p(y^{k+1}) ≥ 0, and the remaining terms are non-negative. Next, we prove by induction that (τ/(2ρ))‖u^{k+1}‖² ≤ Γ_β(z^0). For k = 0, we can choose the initial point appropriately so that the inequality is satisfied. Assume (τ/(2ρ))‖u^k‖² ≤ Γ_β(z^0) and consider two cases at iteration k + 1: if ‖u^{k+1}‖² ≤ ‖u^k‖², we directly conclude that (τ/(2ρ))‖u^{k+1}‖² ≤ (τ/(2ρ))‖u^k‖² ≤ Γ_β(z^0); if ‖u^{k+1}‖² ≥ ‖u^k‖², then (22) again yields the desired result. Therefore, we infer that ‖u*‖² ≤ 2ρτ^{-1}Γ_β(z^0). Clearly, there exists a constant c such that Γ_β(z^0) < c. Thus, there exists an ϵ such that (τ²/ρ²)‖u*‖² ≤ ϵ². Therefore, we conclude the following:
dist( ∂f(x*) + ∇g(x*) + ∇h(x*)^T û* + N(x*), 0 ) + dist( ∇_y p(y*) + B^T û*, 0 ) + ‖h(x*) + By*‖ ≤ τρ^{-1}‖u*‖ ≤ ϵ.
   □

4. Discussion

In this study, we tested the algorithm proposed in this paper on problem (3). All tests were conducted using Matlab R2023b on a MacBook Air (M2).
We primarily focused on the application of the proposed algorithm to the Logistic Matrix Factorization problem. Problem (3) can be written in the form of problem (1):
min g(U, V) + p(W), s.t. UV − W = 0,
where p(W) = Σ_{i,j} [ (1 + c y_{ij} − y_{ij}) log( 1 + exp(W_{ij}) ) − c y_{ij} W_{ij} ], g(U, V) = (λ_d/2)‖U‖_F² + (λ_t/2)‖V‖_F², f = 0, h(U, V) = UV, and B = −I. The augmented Lagrangian function for the problem is
L_ρ(U, V, W, u) = g(U, V) + p(W) + ⟨u, UV − W⟩ + (ρ/2)‖UV − W‖².
From Equation (6), U is updated as follows:
U^{k+1} = argmin_U ⟨ ∇_U g(U^k, V^k) + ∇_U h(U^k, V^k)^T( u^k + ρ(U^kV^k − W^k) ), U ⟩ + (η/2)‖U − U^k‖² = U^k − (1/η)( λ_d U^k + ( u^k + ρ(U^kV^k − W^k) )(V^k)^T ).
Similarly, for V,
V^{k+1} = argmin_V ⟨ ∇_V g(U^k, V^k) + ∇_V h(U^k, V^k)^T( u^k + ρ(U^kV^k − W^k) ), V ⟩ + (η/2)‖V − V^k‖² = V^k − (1/η)( λ_t V^k + (U^k)^T( u^k + ρ(U^kV^k − W^k) ) ).
W is updated using the explicit expression derived from Equation (7); since B = −I here,
W^{k+1} = (ρ + θ)^{-1}( θW^k + u^k + ρ U^{k+1}V^{k+1} − ∇p(W^k) ).
Finally, u^{k+1} is updated as
u^{k+1} = (1 − τ)u^k + ρ( U^{k+1}V^{k+1} − W^{k+1} ).
To generate a sparse binary data matrix Y ∈ R^{m×n} with entries y_{ij} ∈ {0, 1}, we used the Matlab command sprand(m, n, s) and set Y(Y > 0) = 1. In the subsequent experiments, for each (m, n) ∈ {(200, 200), (200, 1000), (1000, 200)}, we randomly generated a matrix Y with s = 0.1, meaning that 90% of the elements of Y were 0, and used the parameters r = 100, c = 1, λ_d = λ_t = 1/8, and ρ = 1. For each Y, we randomly generated initial points and ran every algorithm from the same initial points for the same running time. The running time for (m, n) = (200, 200) was set to 10 s, and for (m, n) ∈ {(200, 1000), (1000, 200)} it was set to 20 s.
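For reference, the following Matlab sketch reproduces the structure of this experiment — data generation with sprand followed by the updates for U, V, W, and u derived above — with a fixed iteration count and step parameters η and θ chosen by us for illustration; it is a sketch of the setup, not the exact experimental code.

% Sketch: PADMM for Logistic Matrix Factorization on synthetic binary data.
rng(0);
m = 200; n = 200; r = 100; s = 0.1;
Y = double(sprand(m, n, s) > 0);                    % sparse 0/1 data matrix
c = 1; lambda_d = 1/8; lambda_t = 1/8; rho = 1;     % parameters from the text
eta = 50; theta = 1; tau = 0.8;                     % eta, theta chosen by us
U = 0.1*randn(m, r); V = 0.1*randn(r, n);
W = U*V; u = zeros(m, n);
sigmoid = @(Z) 1./(1 + exp(-Z));
grad_p  = @(W) (1 + c*Y - Y).*sigmoid(W) - c*Y;     % gradient of p(W)
for k = 1:200
    R  = U*V - W;                                   % coupling residual at (U^k, V^k, W^k)
    Gu = lambda_d*U + (u + rho*R)*V';               % gradient w.r.t. U
    Gv = lambda_t*V + U'*(u + rho*R);               % gradient w.r.t. V
    U  = U - Gu/eta;                                % U-update
    V  = V - Gv/eta;                                % V-update
    W  = (theta*W + u + rho*(U*V) - grad_p(W))/(rho + theta);   % W-update
    u  = (1 - tau)*u + rho*(U*V - W);               % discounted dual update
end
obj = sum(sum((1 + c*Y - Y).*log(1 + exp(W)) - c*Y.*W)) ...
      + lambda_d/2*norm(U, 'fro')^2 + lambda_t/2*norm(V, 'fro')^2;   % g(U,V) + p(W)
fprintf('objective value after %d iterations: %.4e\n', k, obj);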
We conducted a comparative analysis of the proposed algorithm against two baselines: ADMM and the GD alternating gradient multiplier method, which alternately updates the matrices U and V using gradient descent steps. In our experiments, we evaluated the performance of the proposed algorithm for different values of the discount parameter τ, specifically τ = 0.1, 0.2, 0.5, 0.8.
To assess the performance of each method, we recorded the objective value of problem (3) throughout the iterations; the evolution of this objective value over time is presented in Figure 1 and Table 1. The experimental results show that the proposed algorithm consistently outperforms both the ADMM and GD baselines for all tested values of τ.
In particular, we observe that for τ = 0.8 , the proposed algorithm achieves the best performance, with the objective value converging more rapidly and reaching a lower final value compared to the other methods. This suggests that the choice of τ plays a crucial role in optimizing the algorithm’s performance. Based on the experimental findings, we conclude that τ = 0.8 provides the optimal balance between efficiency and accuracy, making it the most effective choice for solving the problem at hand. This was also validated in subsequent experiments with real-world data.
Similarly, we applied the proposed algorithm to the medulloblastoma dataset [20], a real-world biomedical dataset, and analyzed several key factors, including the relationship between the training values and the number of iterations, as well as the difference between the training value and the optimal value. This allowed us to evaluate how effectively the algorithm converges and whether it consistently approaches the optimal solution.
In Figure 2, we present the results of this analysis, which clearly highlight the superior performance of the PADMM algorithm in addressing real-world problems, particularly in the context of complex medical data. The graph shows that the proposed algorithm converges quickly and accurately, maintaining a small gap between the training value and the optimal value over time.
From these results, we can further conclude that, as in our earlier experiments, the discount parameter τ = 0.8 delivers the best performance in this real-world scenario as well. The results validate the robustness and efficiency of the proposed algorithm, confirming that it can effectively handle practical, large-scale datasets while providing high accuracy and fast convergence.

5. Conclusions

We present a proximal linearized alternating direction method of multipliers for solving nonconvex, nonsmooth optimization problems with nonlinear coupling constraints, in which the differentiable parts of the objective and of the constraints have locally Lipschitz continuous gradients. Our algorithm updates the primal variables using a linearized proximal technique and updates the dual variables through a discounting approach. With a fixed parameter selection, the algorithm requires neither an additional proxy function nor parameter adjustments to ensure the monotonicity of the generated sequence, and numerical experiments were conducted to demonstrate its effectiveness.

Author Contributions

Conceptualization, R.L.; methodology, R.L.; software, R.L.; validation, R.L.; formal analysis, R.L.; investigation, R.L.; resources, Z.Y.; data curation, R.L.; writing—original draft preparation, R.L.; writing—review and editing, Z.Y.; visualization, R.L.; supervision, Z.Y.; project administration, Z.Y.; funding acquisition, Z.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Natural Science Foundation of China (Grant No. 12371308).

Data Availability Statement

Our data are available at http://nimfa.biolab.si/nimfa.examples.medulloblastoma.html (accessed on 1 December 2024).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Appendix A.1

Proof for Proposition 4.
From (13), we know that
∇_y ϕ(x^{k+1}, y^{k+1}, u^k) = B^T( u^k + ρ( h(x^{k+1}) + By^{k+1} ) ) = B^T û^k.
By applying the first-order optimality condition to (7), we obtain
0 = B^T û^k + ∇p(y^k) + θ( y^{k+1} − y^k ),
which holds for all k ≥ 0. Since the matrix BB^T is positive definite, we define λ := λ_min(BB^T) > 0. For all u ∈ R^q, we have
λ^{-1}‖B^T u‖² = λ^{-1}⟨B^T u, B^T u⟩ = λ^{-1}⟨u, BB^T u⟩ ≥ λ^{-1}⟨u, λu⟩ = ‖u‖².
Hence, for all k ≥ 1,
‖û^{k+1} − û^k‖ ≤ λ^{-1/2}‖B^T( û^{k+1} − û^k )‖.
Thus,
‖û^k − û^{k−1}‖² ≤ (1/λ)‖ ∇p(y^k) − ∇p(y^{k−1}) + θ(y^{k+1} − y^k) − θ(y^k − y^{k−1}) ‖² ≤ (1/λ)( ‖∇p(y^k) − ∇p(y^{k−1})‖ + θ‖y^{k+1} − y^k‖ + θ‖y^k − y^{k−1}‖ )² ≤ (1/λ)( θ‖y^{k+1} − y^k‖ + (L_p + θ)‖y^k − y^{k−1}‖ )².
Additionally,
‖û^k − û^{k−1}‖² = ‖u^{k+1} − u^k + τ( u^k − u^{k−1} )‖².
Therefore,
‖u^{k+1} − u^k‖² ≤ ‖u^{k+1} − u^k‖² + τ²‖u^k − u^{k−1}‖² ≤ ‖û^k − û^{k−1}‖².

Appendix A.2

Proof for Proposition 5.
First, we compute the difference between L ρ ( x k + 1 , y k + 1 , u k + 1 ) and L ρ ( x k + 1 , y k + 1 , u k ) :
L_ρ(x^{k+1}, y^{k+1}, u^{k+1}) − L_ρ(x^{k+1}, y^{k+1}, u^k) = ⟨u^{k+1} − u^k, h(x^{k+1}) + By^{k+1}⟩ = ⟨u^{k+1} − u^k, (u^{k+1} − (1 − τ)u^k)/ρ⟩ = ⟨u^{k+1} − u^k, ((1 − τ)/ρ)(u^{k+1} − u^k) + (τ/ρ)u^{k+1}⟩ = ((1 − τ)/ρ)‖u^{k+1} − u^k‖² + (τ/(2ρ))( ‖u^{k+1} − u^k‖² + ‖u^{k+1}‖² − ‖u^k‖² ) = ((2 − τ)/(2ρ))‖u^{k+1} − u^k‖² + (τ/(2ρ))‖u^{k+1}‖² − (τ/(2ρ))‖u^k‖².
Next, we compute the difference between L ρ ( x k + 1 , y k + 1 , u k ) and L ρ ( x k + 1 , y k , u k ) :
L_ρ(x^{k+1}, y^{k+1}, u^k) − L_ρ(x^{k+1}, y^k, u^k) = p(y^{k+1}) − p(y^k) + ⟨B^T u^k, y^{k+1} − y^k⟩ + (ρ/2)( ‖h(x^{k+1}) + By^{k+1}‖² − ‖h(x^{k+1}) + By^k‖² ) = p(y^{k+1}) − p(y^k) + ⟨ B^T(u^{k+1} + τu^k) − (ρ/2)B^TB(y^{k+1} − y^k), y^{k+1} − y^k ⟩ ≤ p(y^{k+1}) − p(y^k) + ⟨ B^T(u^{k+1} + τu^k), y^{k+1} − y^k ⟩ − (ρ λ_min(B^TB)/2)‖y^{k+1} − y^k‖².
Since p has an L p -Lipschitz continuous gradient, we have
p(y^{k+1}) − p(y^k) − ⟨∇p(y^k), y^{k+1} − y^k⟩ ≤ (L_p/2)‖y^{k+1} − y^k‖².
By (16) and λ_min(B^TB) ≥ 0, we obtain
L_ρ(x^{k+1}, y^{k+1}, u^k) − L_ρ(x^{k+1}, y^k, u^k) ≤ p(y^{k+1}) − p(y^k) − ⟨∇p(y^k), y^{k+1} − y^k⟩ − (( 2θ + ρλ_min(B^TB) )/2)‖y^{k+1} − y^k‖² ≤ −((2θ − L_p)/2)‖y^{k+1} − y^k‖².
Using the update step (6) for x, we have
f(x^{k+1}) + ⟨∇_x ϕ(x^k, y^k, u^k), x^{k+1} − x^k⟩ + (η/2)‖x^{k+1} − x^k‖² ≤ f(x^k).
Combining this with Lemma 2, we have
L_ρ(x^{k+1}, y^k, u^k) − L_ρ(x^k, y^k, u^k) = f(x^{k+1}) − f(x^k) + ϕ(x^{k+1}, y^k, u^k) − ϕ(x^k, y^k, u^k) ≤ ϕ(x^{k+1}, y^k, u^k) − ϕ(x^k, y^k, u^k) − ⟨∇_x ϕ(x^k, y^k, u^k), x^{k+1} − x^k⟩ − (η/2)‖x^{k+1} − x^k‖² ≤ −((η − L_C)/2)‖x^{k+1} − x^k‖².
By adding (A3)–(A5), we obtain
L_ρ(x^{k+1}, y^{k+1}, u^{k+1}) − (τ/(2ρ))‖u^{k+1}‖² ≤ L_ρ(x^k, y^k, u^k) − (τ/(2ρ))‖u^k‖² + ((2 − τ)/(2ρ))‖u^{k+1} − u^k‖² − ((2θ − L_p)/2)‖y^{k+1} − y^k‖² − ((η − L_C)/2)‖x^{k+1} − x^k‖² ≤ L_ρ(x^k, y^k, u^k) − (τ/(2ρ))‖u^k‖² − ((η − L_C)/2)‖x^{k+1} − x^k‖² − [ (2θ − L_p)/2 − ((2 − τ)θ²)/(ρλ_min(BB^T)) ]‖y^{k+1} − y^k‖² + ((2 − τ)(θ + L_p)²)/(ρλ_min(BB^T))‖y^k − y^{k−1}‖².

References

  1. Maillard, S.; Roncalli, T.; Teïletche, J. The properties of equally weighted risk contribution portfolios. J. Portf. Manag. 2010, 36, 60. [Google Scholar] [CrossRef]
  2. Duchi, J.C.; Ruan, F. Solving (most) of a set of quadratic equalities: Composite optimization for robust phase retrieval. Inf. Inference J. IMA 2019, 8, 471–529. [Google Scholar] [CrossRef]
  3. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.; Wang, Z.; Paul Smolley, S. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2794–2802. [Google Scholar]
  4. Hien, L.T.K.; Papadimitriou, D. An inertial ADMM for a class of nonconvex composite optimization with nonlinear coupling constraints. J. Glob. Optim. 2024, 89, 927–948. [Google Scholar] [CrossRef]
  5. Chen, C.; Chan, R.H.; Ma, S.; Yang, J. Inertial proximal ADMM for linearly constrained separable convex optimization. SIAM J. Imaging Sci. 2015, 8, 2239–2267. [Google Scholar] [CrossRef]
  6. Lin, T.; Ma, S.; Zhang, S. Global convergence of unmodified 3-block ADMM for a class of convex minimization problems. J. Sci. Comput. 2018, 76, 69–88. [Google Scholar] [CrossRef]
  7. Yang, Y.; Jia, Q.S.; Xu, Z.; Guan, X.; Spanos, C.J. Proximal admm for nonconvex and nonsmooth optimization. Automatica 2022, 146, 110551. [Google Scholar] [CrossRef]
  8. Candes, E.J.; Eldar, Y.C.; Strohmer, T.; Voroninski, V. Phase retrieval via matrix completion. SIAM Rev. 2015, 57, 225–251. [Google Scholar] [CrossRef]
  9. Chen, Y.; Wang, S.; Peng, C.; Hua, Z.; Zhou, Y. Generalized nonconvex low-rank tensor approximation for multi-view subspace clustering. IEEE Trans. Image Process. 2021, 30, 4022–4035. [Google Scholar] [CrossRef] [PubMed]
  10. Wang, Y.; Yin, W.; Zeng, J. Global convergence of ADMM in nonconvex nonsmooth optimization. J. Sci. Comput. 2019, 78, 29–63. [Google Scholar] [CrossRef]
  11. Bolte, J.; Sabach, S.; Teboulle, M. Nonconvex Lagrangian-based optimization: Monitoring schemes and global convergence. Math. Oper. Res. 2018, 43, 1210–1232. [Google Scholar] [CrossRef]
  12. Cohen, E.; Hallak, N.; Teboulle, M. A dynamic alternating direction of multipliers for nonconvex minimization with nonlinear functional equality constraints. J. Optim. Theory Appl. 2022, 193, 324–353. [Google Scholar] [CrossRef]
  13. Hien, L.T.K.; Papadimitriou, D. Multiblock ADMM for nonsmooth nonconvex optimization with nonlinear coupling constraints. Optimization 2024, 1–26. [Google Scholar] [CrossRef]
  14. Rockafellar, R.T.; Wets, R.J.B. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 2009; Volume 317. [Google Scholar]
  15. Mordukhovich, B.S. Variational Analysis and Generalized Differentiation II: Applications; Springer: Berlin/Heidelberg, Germany, 2006; Volume 331, p. 610. [Google Scholar]
  16. Mordukhovich, B.S. Variational Analysis and Applications; Springer: Cham, Switzerland, 2018; Volume 30. [Google Scholar]
  17. Cobzaş, Ş.; Miculescu, R.; Nicolae, A. Lipschitz Functions; Springer: Berlin/Heidelberg, Germany, 2019. [Google Scholar]
  18. Clarke, F.H. Optimization and Nonsmooth Analysis; Society for Industrial and Applied Mathematics: Philadelphia, PA, USA, 1990. [Google Scholar]
  19. Cai, X.; Han, D.; Yuan, X. On the convergence of the direct extension of ADMM for three-block separable convex minimization models with one strongly convex function. Comput. Optim. Appl. 2017, 66, 39–73. [Google Scholar] [CrossRef]
  20. Brunet, J.P.; Tamayo, P.; Golub, T.R.; Mesirov, J.P. Metagenes and molecular pattern discovery using matrix factorization. Proc. Natl. Acad. Sci. USA 2004, 101, 4164–4169. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Evolution of the mean logarithm of the objective value over time. (a) Mean logarithm of the objective value over time for ( m , n ) = ( 200 , 200 ) . (b) Mean logarithm of the objective value over time for ( m , n ) = ( 1000 , 200 ) . (c) Mean logarithm of the objective value over time for ( m , n ) = ( 200 , 1000 ) .
Figure 2. Evolution of the mean logarithm of the objective value over time for medulloblastoma data.
Table 1. Objective function value.
Algorithm / (m, n)     (200, 200)      (1000, 200)     (200, 1000)
ADMM                   7.4626 × 10^2   6.2803 × 10^3   6.3880 × 10^3
PADMM (τ = 0.1)        3.2967 × 10^2   4.8188 × 10^3   4.7594 × 10^3
PADMM (τ = 0.2)        3.6218 × 10^2   5.1283 × 10^3   4.6801 × 10^3
PADMM (τ = 0.5)        3.4799 × 10^2   4.7661 × 10^3   4.4719 × 10^3
PADMM (τ = 0.8)        3.6689 × 10^3   4.7492 × 10^3   4.2894 × 10^3
GD                     4.5551 × 10^2   5.5551 × 10^3   5.4918 × 10^3