1. Introduction
Consider the following nonlinear semidefinite optimization problem
      where the function f, the mapping h and the mapping g are assumed to be twice continuously differentiable in a neighborhood of a given feasible point, and the relevant matrix space is the space of real symmetric matrices.
The augmented Lagrange method was initiated by Powell [1] and Hestenes [2] for solving equality constrained nonlinear programming problems and was extended by Rockafellar [3] to inequality constrained optimization problems. For convex programming, Rockafellar [3,4] adopted the augmented Lagrange function for establishing a saddle point theorem and demonstrated the global convergence of the augmented Lagrange method when the penalty parameter is chosen as an arbitrary positive number. Rockafellar [5] gave a deep study of the augmented Lagrange method for convex optimization.
The study of local convergence properties of the augmented Lagrange method is fairly comprehensive. For optimization problems with equality constraints, Powell [1] proved the local linear convergence of the augmented Lagrange method to a local minimum point when the second-order sufficient condition and the linear independence constraint qualification are satisfied. This result was extended by Bertsekas ([6], Chapter 3) to optimization problems with inequality constraints under the strict complementarity condition, in which the linear rate constant is proportional to the reciprocal of the penalty parameter. If the strict complementarity condition is not satisfied, Ito and Kunisch [7], Conn et al. [8] and Contesse-Becker [9] proved that the augmented Lagrange method still has a linear convergence rate.
The Lagrange function for Problem (1) can be written as
The augmented Lagrange function for (1) is defined by
      where the projection operator onto the positive semidefinite cone is employed. The augmented Lagrange method for Problem (1) can be expressed in the following form:
- Step 0
 Given a penalty parameter, an initial point and an initial multiplier, and set k := 0.
- Step 1
 If
          then stop and 
 is a Karush–Kuhn–Tucker (KKT) pair.
- Step 2
 Solve the following problem
          and calculate
- Step 3
 Update the penalty parameter, set k := k + 1, and go to Step 1.
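Steps 0–3 can be sketched numerically on a toy instance, min ||x − c||² subject to Diag(x) positive semidefinite, assuming the standard augmented Lagrangian with PSD-cone projection and the multiplier update Y⁺ = Π(Y − σ Diag(x)); the problem, the solver and all names below are our own illustration, with a fixed penalty σ and plain gradient descent as the inner solver:

```python
import numpy as np

def proj_psd(M):
    """Project a symmetric matrix onto the PSD cone (spectral formula)."""
    w, P = np.linalg.eigh(M)
    return (P * np.maximum(w, 0.0)) @ P.T

def alm(c, sigma=10.0, outer=30, inner=300, lr=0.05):
    """ALM sketch for the toy problem: min ||x - c||^2  s.t.  Diag(x) PSD.

    Step 2 minimizes L_sigma(x, Y) = ||x - c||^2
      + (1/(2 sigma)) * (||proj_psd(Y - sigma Diag(x))||_F^2 - ||Y||_F^2)
    by plain gradient descent; the multiplier is then updated by projection.
    """
    x = np.zeros(len(c))
    Y = np.zeros((len(c), len(c)))             # multiplier estimate
    for _ in range(outer):
        for _ in range(inner):                 # Step 2: inner minimization
            W = proj_psd(Y - sigma * np.diag(x))
            grad = 2.0 * (x - c) - np.diag(W)  # gradient of L_sigma in x
            x = x - lr * grad
        Y = proj_psd(Y - sigma * np.diag(x))   # multiplier update
    return x, Y

x, Y = alm(np.array([-1.0, 1.0]))
```

For c = (−1, 1) the iterates approach x ≈ (0, 1), with the active multiplier entry Y₁₁ ≈ 2, matching the KKT conditions of this toy problem; the multiplier error contracts roughly by the factor 2/(2 + σ) per outer iteration.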
For the nonlinear semidefinite optimization problem, Sun et al. [10] used a direct approach in their appendix to derive the linear rate of convergence when the strict complementarity condition holds. However, the result on the rate of convergence of the augmented Lagrange method obtained in [10] leaves room for improvement. For example, can we obtain a result similar to that of ([6], Chapter 3) for equality constrained optimization problems when the reciprocal of the penalty parameter is very small? How can we characterize the rate constant of the local linear convergence of the augmented Lagrangian method? In this paper, we give positive answers to these two questions.
It should be noted that there are many important applications of augmented Lagrangian methods to different types of optimization problems; for examples, see [11,12,13].
The paper is organized as follows. In the next section, we develop properties of the augmented Lagrange function under the Jacobian uniqueness conditions for the semidefinite optimization problem (1), which are required to prove results on the convergence rate of the augmented Lagrange method. In Section 3, we demonstrate the linear rate of convergence of the augmented Lagrangian method for the semidefinite optimization problem when the Jacobian uniqueness conditions are satisfied. In Section 4, we establish the asymptotic convergence rate of the Lagrange multipliers, showing that the sequence of Lagrange multiplier vectors produced by the augmented Lagrange method converges to the optimal Lagrange multiplier superlinearly when the penalty sequence increases to ∞. Finally, we draw conclusions in Section 5.
We list two technical results at the end of this section, which will be used in developing properties of the augmented Lagrange function for proving the main theorem about the convergence rate of the ALM. The first technical result is a variant of [14] and the second is an implicit function theorem from page 12 of Bertsekas [6].
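Lemma 1 below is a quadratic-growth result of Finsler/Debreu type: a quadratic form that is positive on the kernel of a linear mapping becomes positive definite once a large enough multiple of the squared constraint residual is added. A minimal numeric illustration on a toy instance of our own:

```python
import numpy as np

# Q is indefinite, yet w^T Q w > 0 on ker(A) = span{(1, 0)}; the lemma then
# yields positive definiteness of Q + c * A^T A for every large enough c.
Q = np.diag([1.0, -1.0])
A = np.array([[0.0, 1.0]])

def min_eig(c):
    """Smallest eigenvalue of the penalized form Q + c * A^T A."""
    return float(np.linalg.eigvalsh(Q + c * A.T @ A).min())
```

Here min_eig(0.5) is still negative, while min_eig(3.0) is positive; in this particular instance the form becomes positive definite once c exceeds 1.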
Lemma 1.  Let X and Y be two finite dimensional Hilbert spaces and  be continuous and positive homogeneous of degree 2, namely Suppose that there exists a real number  such that  for any w satisfying , where  is a linear mapping. Then, there exist positive real numbers  and  such that

Lemma 2.  Assume that  is an open subset of , Σ is a nonempty compact subset of ,  is a mapping and  on  for some . Assume that  exists and is continuous on . Assume that  is a vector such that  for , and the Jacobian  is nonsingular for all . Then, there exist scalars ,  and a mapping  such that  on ,  for all , and  for all . The mapping ψ is unique in the sense that if ,  and , then . Furthermore, if , then we have

2. Properties of the Augmented Lagrangian
Assume that  is a given feasible point of Problem (1) and f, h and g are twice differentiable in a neighborhood of . The following conditions, which are called Jacobian uniqueness conditions, are needed in our analysis.
Definition 1.  Jacobian uniqueness conditions at  are the following conditions:
- (i) 
 The point  satisfies the Karush–Kuhn–Tucker conditions: - (ii) 
 The constraint nondegeneracy condition is satisfied at : where  denotes the linearity space of a closed convex cone.
- (iii) 
 The strict complementarity condition at  holds, namely .
- (iv) 
 At , the second-order sufficient optimality condition holds, namely for any  satisfying , where  is the Moore–Penrose pseudoinverse of  and  is the critical cone at  defined by 
 In this section, we give some properties of the Jacobian uniqueness conditions of Problem (1) and properties of the augmented Lagrange function under this set of conditions. These properties are crucial for studying the convergence rate of the augmented Lagrange method.
Let 
 be a KKT pair. Assume that (iii) holds; then, 
 is nonsingular. Let the eigenvalues of 
 be 
 and
Then, an orthogonal matrix 
 exists such that
      where
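The role of strict complementarity can be pictured with a complementary PSD pair: if a multiplier-type matrix Y and a constraint-value-type matrix G satisfy YG = 0 with Y + G positive definite, then they share an eigenbasis and Y − G is nonsingular, its spectrum splitting into a positive and a negative block. The toy data below, including the choice of the difference Y − G as the decomposed matrix, are our own illustration of this mechanism:

```python
import numpy as np

# Toy complementary PSD pair sharing an eigenbasis P (data of our own):
# Y G = 0 (complementarity) and Y + G positive definite (strict version).
rng = np.random.default_rng(0)
P, _ = np.linalg.qr(rng.normal(size=(3, 3)))
Y = P @ np.diag([2.0, 0.0, 0.0]) @ P.T   # multiplier-like PSD matrix
G = P @ np.diag([0.0, 1.0, 3.0]) @ P.T   # constraint-value-like PSD matrix

complementarity = np.linalg.norm(Y @ G)  # ~ 0
strict = np.linalg.eigvalsh(Y + G).min() # smallest eigenvalue of Y + G is 1
split = np.linalg.eigvalsh(Y - G)        # eigenvalues {2, -1, -3}: no zeros
```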
If Jacobian uniqueness conditions (i)–(iii) hold, then the cone 
 is reduced to the following subspace
If Jacobian uniqueness condition (iv) holds, then there exists 
 such that
In fact, if this is not true, then a sequence 
 with 
 exists such that
There exists a subsequence 
 and 
 with 
 such that 
. The closedness of 
 implies 
Taking the limit of (5) along the subsequence , we obtain
      which contradicts Jacobian uniqueness condition (iv).
Then, the Jacobian of , denoted by , is expressed as
Lemma 3.  Let  be a given point and f, h and g be twice continuously differentiable in a neighborhood of . Let the Jacobian uniqueness conditions at  be satisfied. Then,  is a nonsingular linear operator.
 Proof.  Consider the equation
        where 
. This equation is equivalent to
From the third equality of (8) we have for  that
        where
This implies the following relations
From  and , we have . Multiplying  to the first equality of (8) we obtain
        which implies  from Jacobian uniqueness condition (iv). This comes from the fact that  implies
        from Jacobian uniqueness condition (iv). Then, from the first equality of (8) we obtain
        which is equivalent to
This, from , implies
From Jacobian uniqueness condition (ii) we obtain
        and this implies  and . Combining , we obtain that  is a nonsingular linear operator.    □
 Proposition 1.  Let  be a Karush–Kuhn–Tucker point of Problem (1) at which the Jacobian uniqueness conditions are satisfied. Then, there exist positive numbers  and  such that  is positive definite when  and .  Proof.  If 
 is nonsingular, then 
 is differentiable at 
 and
Then, from (4), we obtain for any vector ,
This implies for any vector 
 that
It follows from (ii) that the linear mapping
        is onto. Then, we have from Lemma 1 that there exists  such that  is positive definite if . Therefore, there exists a positive real number  such that  is positive definite if  and .    □
 Suppose that 
 is nonsingular such that 
 is differentiable at 
. In this case, we define a linear operator: 
Proposition 2.  Let  be a Karush–Kuhn–Tucker point of Problem (1) at which the Jacobian uniqueness conditions are satisfied. Then, there exists a positive real number  large enough such that  is nonsingular and
for some positive constant  if , where .  Proof.  We divide the proof into two steps.
- Step 1:
 We prove that, for  sufficiently small,  is nonsingular.
Since 
 we have from Lemma 3 that 
 is nonsingular. Now, we consider the case where 
. Consider the equation
        where 
. This equation is equivalent to
From the second equality of (11), we have
From the third equality of (11), we have for  that
        where
This implies the following relations
Then, multiplying  to the first equality of (11), we obtain
        which implies  from Proposition 1 when . Therefore, we obtain , , and  so that  is nonsingular when .
- Step 2:
 We prove that
            for some positive constant 
 if 
 small enough.
Noting that for  we have  and , we get
For any 
 we have that
For any matrix 
, we have
        where
We have from (14) and (15) that
Thus, we have, for 
, that
Therefore, there exists a sufficiently large positive number  such that, for , if , then  is nonsingular and
        for some positive constant . The proof is complete.    □
 Proposition 3.  The corresponding Löwner operator F is twice (continuously) differentiable at X if and only if f is twice (continuously) differentiable at , .
 Proposition 4.  Let  be a Karush–Kuhn–Tucker point of Problem (1) at which the Jacobian uniqueness conditions are satisfied. Then, there exist , , and  such that, for ,  is nonsingular and
if  and .  Proof.  We have from Proposition 2 that the operator 
 is nonsingular. Since the norm of 
 is less than 1 and
        we have
     Since 
 is twice continuously differentiable at 
 we obtain
        for 
. For 
 we obtain
        where
Note for 
 that
        we have
Therefore, from
        we get
        where
As 
 can be expressed as
        we have from (16) and (17) that there exist  and  and, for sufficiently small ,  is nonsingular if , and  if .    □
   3. The Convergence Rate of the Augmented Lagrange Method
In this section, we focus on the local convergence of the augmented Lagrange method for nonlinear semidefinite optimization problems under the Jacobian uniqueness conditions. We now estimate the solution error of the augmented Lagrange subproblem
      and the error of the multiplier update when  is around . The local convergence and the linear rate of the multipliers can be obtained from these estimates.
For a real number , define
Theorem 1.  Let the Jacobian uniqueness conditions be satisfied at . Then, there exist ,  and  such that, for any , the problem
has a unique solution , which is smooth on . Moreover, for ,
where  Proof.  If 
x is a local minimum point of Problem
        then, from the definition of 
, we obtain
Define 
, 
 and 
, note
        then, the system (20) is equivalent to , where
Obviously, from the definition of  in (10), we have 
. Then, from Proposition 4, we have that 
 is nonsingular when 
.
Define 
 and 
 and
From the implicit function theorem, we have that there exists 
 with 
, 
 and mapping
        which is smooth on 
 and satisfies
From Propositions 1 and 4, we may choose  and  small enough such that the constraint nondegeneracy condition holds at , ,  is positive definite and  for all .
Differentiating the three equations in (21) with respect to , we obtain
        where 
. Define 
 and 
. Then, we have from (22), for , that
Noting that  for  and , we obtain from (23) and  that
Noting that 
 is twice continuously differentiable at 
, we have
It is easy to check the equality 
. Then, when 
 is chosen small enough, there exists a positive constant 
 such that
        when 
 and 
.
Combining this estimate with (24), we obtain
Substituting  by  in (25) yields
Since the choice of  in (26) is arbitrary, we obtain
        which implies
        or
From the definitions of  and K, we have that
It follows from (21) that
        and
Noting that 
 and 
 and 
 we have from Proposition 1 that
Thus,  is the unique solution of Problem (18) and is differentiable on . Without loss of generality, suppose
        and define . Then, for any , we obtain from (27) that
        which implies the estimates (19).    □
 Using the above theorem, we can readily establish the local convergence properties of the augmented Lagrange method for the nonlinear semidefinite optimization problem.
Proposition 5.  Let  satisfy the Jacobian uniqueness conditions. Let  and  be given in Theorem 1. Suppose that ,  and  satisfy
Then, the sequence  generated by the ALM is convergent to  with
if . The sequence  converges superlinearly to  when .  Proof.  For the sequence 
 generated by the ALM, we obtain from Theorem 1 that
        which implies
        and
Suppose that  satisfies  and ; then, for , from Theorem 1 we have that
        which implies
        and
Therefore, by induction, we obtain that for any 
 and 
. Then for 
, we obtain
        which implies
Noting that  and  is increasing, we obtain from the above inequality that . The estimate in (29) comes from (30), and the rate of convergence is superlinear when .    □
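The linear rate with constant proportional to 1/σ can be seen in closed form on the scalar toy problem min (x+1)² subject to x ≥ 0, viewed as a 1×1 semidefinite constraint (our own example, assuming the standard multiplier update y⁺ = max(y − σx, 0)): the exact subproblem minimizer is x = (y − 2)/(2 + σ), and the multiplier error obeys y⁺ − 2 = 2(y − 2)/(2 + σ):

```python
def alm_scalar(y0=0.0, sigma=10.0, iters=6):
    """Exact ALM recursion for: min (x+1)^2 s.t. x >= 0 (optimal multiplier 2)."""
    y, errs = y0, []
    for _ in range(iters):
        x = (y - 2.0) / (2.0 + sigma)   # closed-form subproblem minimizer
        y = max(y - sigma * x, 0.0)     # multiplier update
        errs.append(abs(y - 2.0))
    return errs

errs = alm_scalar()
ratios = [b / a for a, b in zip(errs, errs[1:])]
# every ratio equals 2/(2 + sigma) = 1/6: linear rate proportional to 1/sigma
```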
   4. Asymptotic Superlinear Convergence of Multipliers
In Theorem 1, the convergence rate of the augmented Lagrange method is characterized by (19), which involves a constant . How to estimate this constant is an important question. In this section, we estimate it using the eigenvalues of the second-order derivative of the perturbation function of Problem (1).
Let  be a Karush–Kuhn–Tucker point of Problem (1) and consider the following system of equations in :
      then,  is a solution of (32) for any , where 
. According to the implicit function theorem, there exist a constant  and mappings  such that
      and for 
, where 
,
Moreover, there exists 
 such that
      for 
. Define the function 
 as follows
In view of the Jacobian uniqueness conditions,  and  can be taken small enough so that  is actually a local minimum point in  of the following perturbed problem
Thus, the function 
p is actually the following perturbation function:
Lemma 4.  Suppose that the Jacobian uniqueness conditions hold and ε and δ are taken sufficiently small such that  is a local minimum point of Problem (35). Then,  Proof.  We use  to denote the Lagrange function of Problem (35), namely
Then, 
 is expressed as follows
Noting  and , from the above expression of  we obtain
The proof is complete.    □
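The identity in Lemma 4, relating derivatives of the perturbation function to the optimal multipliers, can be checked on a scalar toy model of our own: for min (x+1)² subject to x ≥ u, the constraint is active at u = 0 with multiplier 2, and indeed p′(0) = 2:

```python
def p(u):
    """Optimal value of: min (x+1)^2 s.t. x >= u (closed form)."""
    x = max(u, -1.0)                  # minimizer: unconstrained at -1, else u
    return (x + 1.0) ** 2

deriv = (p(1e-6) - p(-1e-6)) / 2e-6   # central difference at u = 0
```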
 Lemma 5.  Suppose that the Jacobian uniqueness conditions hold and ε and δ are taken sufficiently small so that  is a local minimum point of the perturbed problem (35). Then,  Proof.  Differentiating (33), we obtain
        and
Denote . Then, Equations (39) and (40) can be written as
        or
        where
Thus, Equation (41) is equivalent to
Therefore, we get that
        which implies
It follows from page 20 of [15] that the inverse of  can be expressed as
It is easy to check
        which implies
        where
Therefore, we have from (42) and (43) that
namely, the equality (38) holds.    □
 Corollary 1.  Let the Jacobian uniqueness conditions be satisfied at . Then,
where .  Proof.  The equality (38) is valid for all u with  and all  large enough. For , we have
        which implies (44) from (37).    □
 By using the above properties, we are able to analyze the rate of convergence of multipliers generated by the augmented Lagrange method. For this purpose, we first give an equivalent expression of
      which is a key property for analyzing the superlinear rate of the sequence of multipliers.
Theorem 2.  Let the Jacobian uniqueness conditions be satisfied at . Let , δ and ε be given by Theorem 1. Then, for all ,
where  is defined by
and .  Proof.  Noting that  is equivalent to  = 0, we have
Differentiating the last three equations in (47) with respect to , we obtain
Denoting  and , we have from (48) that
We can easily obtain the following expression of 
:
From the equality
        it follows that
        with
     Thus, we have from (49) that
        which implies
Then, we get
     Substituting , , , ,  and , we obtain the desired result.    □
 Theorem 3.  Assume that  satisfies the Jacobian uniqueness conditions and that , δ and ε are the constants given by Theorem 1. Suppose that Then, there exists a scalar  such that if  and  satisfy
then the sequence  generated by
is well-defined, and  and . Furthermore, if
and  for all k, then
while if  and  for all k, then  Proof.  In view of 
 of (46), we have that
Using (44), we obtain
        and thus for an eigenvalue 
, one has
        where 
 denotes the corresponding eigenvalue of 
. It is obvious that for any 
, there exists 
 such that for all 
 with 
 and 
 we have
        where 
 denotes the spectral norm of the operator. Using (45) for all  chosen as above, we have
From (52) and (53), we have
Thus, by choosing a sufficiently small 
, we can determine that there exists 
 such that
        for 
 with 
 and 
. Combining this with (19) and (53), we obtain that  and . The estimates (55) and (56) for the convergence rate can be obtained directly from (57).    □
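The superlinear effect of letting the penalty sequence increase to ∞ can be observed on the scalar toy problem min (x+1)² subject to x ≥ 0, whose ALM subproblems admit the closed-form minimizer x = (y − 2)/(2 + σ) (our own example): the multiplier error contracts by the factor 2/(2 + σ_k), which tends to zero as the penalties double:

```python
def multiplier_errors(sigmas, y0=0.0):
    """Multiplier errors |y_k - 2| for: min (x+1)^2 s.t. x >= 0, with the
    subproblems solved in closed form and penalties taken from sigmas."""
    y, errs = y0, []
    for s in sigmas:
        x = (y - 2.0) / (2.0 + s)     # exact subproblem minimizer
        y = max(y - s * x, 0.0)       # multiplier update
        errs.append(abs(y - 2.0))
    return errs

errs = multiplier_errors([2.0 ** k for k in range(8)])
ratios = [b / a for a, b in zip(errs, errs[1:])]
# ratios equal 2/(2 + sigma_k) and decrease to 0: superlinear convergence
```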
   5. Conclusions
In this paper, we have studied the convergence rate of the augmented Lagrangian method for the nonlinear semidefinite optimization problem. We have proven the local linear rate of convergence of the sequence of multipliers and that the ratio constant is proportional to  when  exceeds a threshold  and the ratio  is sufficiently small. Importantly, based on the second-order derivative of the perturbation function of the nonlinear semidefinite optimization problem, we have obtained an accurate estimate of the rate constant of the linear convergence of the multiplier vectors generated by the augmented Lagrange method, which shows that the sequence of multipliers is superlinearly convergent if  increases to ∞.
There are many unsolved problems left in the augmented Lagrange method for nonlinear semidefinite optimization problems. First, in Theorem 1, the result on the convergence rate of the augmented Lagrange method is obtained when the subproblems are solved exactly. A natural question is how to analyze the convergence rate of the ALM when the subproblems are solved inexactly. Second, all results in this paper concern the local convergence of the augmented Lagrange method; globally convergent augmented Lagrangian methods are worth studying. Third, for estimating the rate constant of linear convergence, we need the strict complementarity condition, which is a critical assumption. What are the convergence properties of the augmented Lagrange method when this condition does not hold?