Abstract
In this paper, we study biconvex optimization problems, introducing an adaptive Douglas–Rachford algorithm and presenting related convergence theorems in the setting of finite-dimensional real Hilbert spaces. It is worth noting that our approach to proving the convergence theorem differs significantly from those in the literature.
Keywords:
biconvex optimization problem; regularized optimization problem; Douglas–Rachford algorithm; adaptive algorithm
MSC:
65K05; 49M37; 90C26
1. Introduction
In science and engineering, convex optimization has been extensively studied and applied. For convex optimization problems, any local minimum is also a global minimum, simplifying the search for optimal solutions. In parameter estimation, particularly in the context of system identification, convexity ensures the convergence of estimates to the true parameters. However, not every identification problem can be formulated as a convex optimization problem. For instance, the identification of block-oriented nonlinear systems often leads to a biconvex optimization problem rather than a convex one. Unlike convex optimization, biconvex optimization may have numerous local minima. Nevertheless, it exhibits convex substructures, as a biconvex optimization problem can be divided into multiple convex optimization subproblems. These substructures can be effectively utilized to solve the entire biconvex optimization problem.
From the literature [1], we observe that many optimization problems are multi-convex programs, and many published studies on practical multi-convex programming focus on special practical models, like [2,3,4,5,6,7,8,9,10,11,12,13]. In particular, Wen, Yin, and Zhang [12] pointed out that multi-convex programming is NP-hard.
Let $H_1$ and $H_2$ be two real Hilbert spaces, and let $f:H_1\times H_2\to(-\infty,\infty]$ be an extended real-valued function. Then $f$ is called a biconvex function if $f(\cdot,y)$ is convex for each $y\in H_2$, and $f(x,\cdot)$ is convex for each $x\in H_1$. Thus, every convex function is also biconvex, but a biconvex function may still be nonconvex.
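For instance (a standard example, not specific to this paper), the bilinear function $h(x,y)=xy$ on $\mathbb{R}\times\mathbb{R}$ is biconvex but not convex:

```latex
h(x,y)=xy,\qquad h(t,-t)=-t^{2}.
```

For each fixed $y$, the map $x\mapsto xy$ is linear (hence convex), and similarly for each fixed $x$; yet joint convexity fails, since the restriction of $h$ to the line $(t,-t)$ is concave.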
The following is a type of biconvex optimization problem, also referred to as a block optimization problem:
$$(\mathrm{BOP})\qquad \min_{(x,y)\in H_1\times H_2}\ J(x,y):=h(x,y)+f(x)+g(y),$$
where $H_1$ and $H_2$ are real Hilbert spaces, $h:H_1\times H_2\to\mathbb{R}$ is a block biconvex function, and $f:H_1\to(-\infty,\infty]$ and $g:H_2\to(-\infty,\infty]$ are convex functions. Here, $f$ and $g$ are called the regularization functions of (BOP). In general, $f$ and $g$ may be assumed to be Fréchet differentiable. It is important to note that the objective $J$ can be nonconvex, even if $f$ and $g$ are convex functions.
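As a concrete instance of (BOP), in the spirit of the matrix factorization models of [12,13] (the data matrix $A$ and the quadratic regularizers below are illustrative choices, not taken from this paper), one may consider

```latex
\min_{X\in\mathbb{R}^{m\times r},\,Y\in\mathbb{R}^{r\times n}}
\;\underbrace{\tfrac{1}{2}\,\|A-XY\|_F^{2}}_{h(X,Y)}
\;+\;\underbrace{\tfrac{\alpha}{2}\,\|X\|_F^{2}}_{f(X)}
\;+\;\underbrace{\tfrac{\beta}{2}\,\|Y\|_F^{2}}_{g(Y)} .
```

Here $h$ is biconvex (a convex quadratic in $X$ for fixed $Y$, and in $Y$ for fixed $X$), and $f$ and $g$ are convex regularization functions, yet the overall objective is nonconvex.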
The standard approach to solving the biconvex optimization problem is the so-called Gauss–Seidel iteration scheme, popularized in the modern era under the name of alternating minimization. This method is also known as the block coordinate descent (BCD) algorithm.
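A minimal sketch of this alternating scheme, applied to the illustrative regularized factorization instance above (the closed-form block updates are specific to that quadratic example and are not part of this paper's algorithms):

```python
import numpy as np

# Alternating minimization (BCD) sketch for the illustrative biconvex objective
#   J(X, Y) = 0.5*||A - X @ Y.T||_F^2 + 0.5*alpha*||X||_F^2 + 0.5*beta*||Y||_F^2.
# Each block subproblem is convex (a ridge-regularized least-squares problem)
# and is solved exactly in closed form.

def alternating_minimization(A, r, alpha=0.1, beta=0.1, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    X = rng.standard_normal((m, r))
    Y = rng.standard_normal((n, r))
    for _ in range(iters):
        # X-step: minimize J(., Y) with Y fixed
        X = A @ Y @ np.linalg.inv(Y.T @ Y + alpha * np.eye(r))
        # Y-step: minimize J(X, .) with X fixed
        Y = A.T @ X @ np.linalg.inv(X.T @ X + beta * np.eye(r))
    return X, Y

# Usage: the objective value decreases monotonically along the iterations.
A = np.random.default_rng(1).standard_normal((20, 15))
X, Y = alternating_minimization(A, r=3)
print(0.5 * np.linalg.norm(A - X @ Y.T) ** 2)
```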
In 1992, the proximal BCD algorithm was proposed by Auslender [14] to relax the requirements of the (BCD) convergence theorem.
In 2013, Xu and Yin [13] gave the proximal linearized BCD.
In 2014, Bolte et al. [3] gave the proximal alternating linearized minimization (PALM) algorithm.
In 2019, Nikolova and Tan [10] gave the alternating structure-adapted proximal gradient descent (ASAP) algorithm.
In fact, the ASAP algorithm is equivalent to the following.
Therefore, the ASAP algorithm can be written in the following form.
On the other hand, the following is a well-known generalized convex optimization problem, together with the related Douglas–Rachford algorithm:
$$\min_{x\in H}\ f_1(x)+f_2(x),$$
where $H$ is a real Hilbert space, and $f_1,f_2:H\to(-\infty,\infty]$ are proper, lower semicontinuous, and convex functions (see Algorithm 1).
| Algorithm 1: ([15], Corollary 27.4) |
Let be generated by the following.
|
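A small numerical sketch of the Douglas–Rachford iteration in its standard form $y_n=\operatorname{prox}_{\gamma f_1}(x_n)$, $z_n=\operatorname{prox}_{\gamma f_2}(2y_n-x_n)$, $x_{n+1}=x_n+\lambda_n(z_n-y_n)$; the concrete choices of $f_1$ (a quadratic) and $f_2$ (an $\ell_1$ term), and hence the two proximal maps below, are illustrative assumptions rather than the setting of Algorithm 1:

```python
import numpy as np

# Illustrative problem: min_x f1(x) + f2(x) with
#   f1(x) = 0.5*||x - b||^2   (prox has a simple closed form)
#   f2(x) = mu*||x||_1        (prox is componentwise soft-thresholding)

def prox_f1(x, gamma, b):
    # prox of gamma*0.5*||. - b||^2: argmin_y 0.5*||y-b||^2 + (1/(2*gamma))*||y-x||^2
    return (x + gamma * b) / (1.0 + gamma)

def prox_f2(x, gamma, mu):
    # prox of gamma*mu*||.||_1: soft-thresholding at level gamma*mu
    return np.sign(x) * np.maximum(np.abs(x) - gamma * mu, 0.0)

def douglas_rachford(b, mu=0.5, gamma=1.0, lam=1.0, iters=200):
    x = np.zeros_like(b)
    for _ in range(iters):
        y = prox_f1(x, gamma, b)
        z = prox_f2(2.0 * y - x, gamma, mu)
        x = x + lam * (z - y)        # governing sequence update
    return prox_f1(x, gamma, b)      # shadow sequence approaches a minimizer

b = np.array([2.0, -0.3, 0.1, -1.5])
print(douglas_rachford(b))  # approximately sign(b)*max(|b|-mu, 0) = [1.5, 0., 0., -1.]
```

For this convex instance, the unique minimizer of $\tfrac12\|x-b\|^2+\mu\|x\|_1$ is the soft-thresholded vector, which the shadow sequence $\operatorname{prox}_{\gamma f_1}(x_n)$ approximates.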
Indeed, it is well-known that the Douglas–Rachford algorithm is widely used for convex optimization problems and, more generally, for problems involving convexity assumptions. Additionally, despite the lack of theoretical justification, the literature shows that the algorithm has been successfully applied to various practical nonconvex problems [16,17]. For further details on the Douglas–Rachford and Peaceman–Rachford algorithms, please refer to [18,19,20,21,22] and the references therein.
Inspired by the above work, we propose the adaptive Douglas–Rachford algorithm to study the biconvex optimization problem in finite-dimensional real Hilbert spaces.
Remark 1.
In Algorithm 2, if the parameter is given by Step 3-1, then this algorithm may be called the Douglas–Rachford algorithm. However, when the parameter is given by Step 3-2, we call this algorithm the adaptive Douglas–Rachford algorithm.
In this paper, we study the biconvex optimization problem and give an adaptive Douglas–Rachford algorithm and related convergence theorems in the setting of finite dimensional real Hilbert spaces. It is worth noting that our approach to proving the convergence theorem differs significantly from those in the literature.
| Algorithm 2: Adaptive Douglas–Rachford Algorithm |
|
2. Preliminaries
Let $H$ be a real Hilbert space with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$. We denote the strong and weak convergence of $\{x_n\}_{n\in\mathbb{N}}$ to $x$ by $x_n\to x$ and $x_n\rightharpoonup x$, respectively. For each $x,y\in H$ and $\lambda\in[0,1]$, we have
$$\|\lambda x+(1-\lambda)y\|^{2}=\lambda\|x\|^{2}+(1-\lambda)\|y\|^{2}-\lambda(1-\lambda)\|x-y\|^{2}.$$
Definition 1.
Let $H$ be a real Hilbert space, $B:H\to H$ be a mapping, and $\rho>0$. Thus,
- (i)
- B is monotone if $\langle Bx-By,\,x-y\rangle\geq 0$ for all $x,y\in H$.
- (ii)
- B is ρ-strongly monotone if $\langle Bx-By,\,x-y\rangle\geq\rho\|x-y\|^{2}$ for all $x,y\in H$.
Definition 2.
Let $H$ be a real Hilbert space, and $B:H\to 2^{H}$ be a set-valued mapping with domain $D(B):=\{x\in H: Bx\neq\emptyset\}$. Thus,
- (i)
- B is monotone if $\langle u-v,\,x-y\rangle\geq 0$ for any $x,y\in D(B)$, $u\in Bx$, and $v\in By$.
- (ii)
- B is maximal monotone if its graph is not properly contained in the graph of any other monotone mapping.
- (iii)
- B is ρ-strongly monotone ($\rho>0$) if $\langle u-v,\,x-y\rangle\geq\rho\|x-y\|^{2}$ for all $x,y\in D(B)$, and all $u\in Bx$ and $v\in By$.
Definition 3.
Let $H$ be a real Hilbert space, and $f:H\to(-\infty,\infty]$ be a function. Thus,
- (i)
- f is proper if $\operatorname{dom}f:=\{x\in H: f(x)<\infty\}\neq\emptyset$.
- (ii)
- f is lower semicontinuous if $\{x\in H: f(x)\leq r\}$ is closed for each $r\in\mathbb{R}$.
- (iii)
- f is convex if $f(\lambda x+(1-\lambda)y)\leq\lambda f(x)+(1-\lambda)f(y)$ for every $x,y\in H$ and $\lambda\in(0,1)$.
- (iv)
- f is ρ-strongly convex ($\rho>0$) if $f(\lambda x+(1-\lambda)y)\leq\lambda f(x)+(1-\lambda)f(y)-\frac{\rho}{2}\lambda(1-\lambda)\|x-y\|^{2}$ for all $x,y\in H$ and $\lambda\in(0,1)$.
- (v)
- f is Gâteaux differentiable at $x\in H$ if there is $\nabla f(x)\in H$ such that $\lim_{t\to 0}\frac{f(x+tv)-f(x)}{t}=\langle\nabla f(x),\,v\rangle$ for each $v\in H$.
- (vi)
- f is Fréchet differentiable at x if there is $\nabla f(x)\in H$ such that $\lim_{\|v\|\to 0}\frac{f(x+v)-f(x)-\langle\nabla f(x),\,v\rangle}{\|v\|}=0$.
Remark 2.
Let $H$ be a real Hilbert space, and $f:H\to(-\infty,\infty]$ be a function. Then, for $\rho>0$, f is a convex function if and only if $f+\frac{\rho}{2}\|\cdot\|^{2}$ is a ρ-strongly convex function ([15], Proposition 10.6). Hence, it is easy to establish the relation between convex functions and strongly convex functions.
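For example (a standard illustration), a linear function $f(x)=\langle a,x\rangle$ with $a\in H$ is convex but not strongly convex, whereas $f+\frac{\rho}{2}\|\cdot\|^{2}$ is ρ-strongly convex; a direct computation gives

```latex
f(\lambda x+(1-\lambda)y)+\tfrac{\rho}{2}\|\lambda x+(1-\lambda)y\|^{2}
 = \lambda\Big(f(x)+\tfrac{\rho}{2}\|x\|^{2}\Big)
  +(1-\lambda)\Big(f(y)+\tfrac{\rho}{2}\|y\|^{2}\Big)
  -\tfrac{\rho}{2}\lambda(1-\lambda)\|x-y\|^{2},
```

so the inequality in Definition 3(iv) holds with equality in this case.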
Definition 4.
Let $f:H\to(-\infty,\infty]$ be a proper lower semicontinuous and convex function. Then the subdifferential of f is defined by
$$\partial f(x):=\{u\in H:\ f(y)\geq f(x)+\langle u,\,y-x\rangle\ \text{for all } y\in H\}$$
for each $x\in H$.
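A standard one-dimensional example: for $H=\mathbb{R}$ and $f(x)=|x|$,

```latex
\partial f(x)=
\begin{cases}
\{-1\}, & x<0,\\
[-1,\,1], & x=0,\\
\{+1\}, & x>0,
\end{cases}
```

so $\partial f(x)$ is a singleton exactly at the points where f is differentiable (compare Lemma 1(ii) below).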
Lemma 1
([15,23]). Let $f:H\to(-\infty,\infty]$ be a proper lower semicontinuous and convex function. Then the following is satisfied.
- (i)
- $\partial f:H\to 2^{H}$ is a set-valued maximal monotone mapping.
- (ii)
- f is Gâteaux differentiable at $x\in H$ if and only if $\partial f(x)$ consists of a single element. That is, $\partial f(x)=\{\nabla f(x)\}$.
- (iii)
- Suppose that f is Fréchet differentiable. Then f is convex if and only if $\nabla f$ is a monotone mapping.
Lemma 2
([15], Example 22.3(iv)). Let $\rho>0$, $H$ be a real Hilbert space, and $f:H\to(-\infty,\infty]$ be a proper lower-semicontinuous and convex function. If f is ρ-strongly convex, then $\partial f$ is ρ-strongly monotone.
Lemma 3
([15], Proposition 16.26). Let $H$ be a real Hilbert space, and $f:H\to(-\infty,\infty]$ be a proper lower semicontinuous and convex function. If $\{x_n\}$ and $\{u_n\}$ are sequences in H with $u_n\in\partial f(x_n)$ for all $n\in\mathbb{N}$, and $x_n\rightharpoonup x$ and $u_n\to u$, then $u\in\partial f(x)$.
Lemma 4
([24]). Let $H$ be a real Hilbert space, $B:H\to 2^{H}$ be a set-valued maximal monotone mapping, $\lambda>0$, and $J_{\lambda}^{B}$ be defined by $J_{\lambda}^{B}(x):=(I+\lambda B)^{-1}(x)$ for each $x\in H$. Then $J_{\lambda}^{B}$ is a single-valued mapping.
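In particular (a standard fact), taking $B=\partial g$ for a proper lower semicontinuous convex function g, the resolvent of Lemma 4 coincides with the proximal operator introduced in Definition 5 below:

```latex
J_{\tau}^{\partial g}(x)=(I+\tau\,\partial g)^{-1}(x)=\operatorname{prox}_{\tau g}(x),
\qquad x\in H,\ \tau>0 .
```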
Definition 5.
Let $\tau>0$, $H$ be a real Hilbert space, and $g:H\to(-\infty,\infty]$ be a proper lower-semicontinuous and convex function. Then the proximal operator of g with τ is defined by
$$\operatorname{prox}_{\tau g}(x):=\operatorname*{arg\,min}_{y\in H}\left\{g(y)+\frac{1}{2\tau}\|x-y\|^{2}\right\}$$
for each $x\in H$.
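A small numerical check of this definition (the choice $g=\|\cdot\|_1$ is an illustrative assumption, not taken from the paper): the proximal operator of the $\ell_1$-norm is componentwise soft-thresholding, which can be compared against a brute-force evaluation of the defining minimization in one dimension:

```python
import numpy as np

def prox_l1(x, tau):
    # closed-form proximal operator of g(y) = |y| (soft-thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def prox_numeric(x, tau):
    # brute-force the defining minimization  argmin_y |y| + (1/(2*tau))*(x - y)**2
    grid = np.linspace(-5.0, 5.0, 200001)
    values = np.abs(grid) + (x - grid) ** 2 / (2.0 * tau)
    return grid[np.argmin(values)]

for x in (1.3, 0.2, -0.8):
    print(prox_l1(x, 0.5), prox_numeric(x, 0.5))  # pairs agree up to grid resolution
```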
Lemma 5.
Let $g:H\to(-\infty,\infty]$ be a proper, lower semicontinuous, and convex function. Assume $\tau>0$ and $p=\operatorname{prox}_{\tau g}(x)$ for some $x\in H$. Then we have
$$\langle x-p,\,z-p\rangle\leq\tau\big(g(z)-g(p)\big)$$
for each $z\in H$.
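For completeness, the inequality in Lemma 5 is the standard first-order characterization of the proximal operator: writing $p=\operatorname{prox}_{\tau g}(x)$, the optimality condition (Fermat's rule for convex functions) for the minimization defining $\operatorname{prox}_{\tau g}$ gives

```latex
0\in\partial g(p)+\tfrac{1}{\tau}\,(p-x)
\;\Longleftrightarrow\;
\tfrac{1}{\tau}\,(x-p)\in\partial g(p)
\;\Longrightarrow\;
g(z)\ \geq\ g(p)+\tfrac{1}{\tau}\,\langle x-p,\,z-p\rangle
\quad\text{for each } z\in H,
```

which rearranges to $\langle x-p,\,z-p\rangle\leq\tau\big(g(z)-g(p)\big)$.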
3. Main Results
We are interested in solving nonconvex minimization problems of the form
$$(\mathrm{BOP})\qquad \min_{(x,y)\in H_1\times H_2}\ J(x,y):=h(x,y)+f(x)+g(y),$$
where $H_1$ and $H_2$ are two real Hilbert spaces, $f:H_1\to(-\infty,\infty]$, $g:H_2\to(-\infty,\infty]$, $h:H_1\times H_2\to\mathbb{R}$, and $J:=h+f+g$.
Here, if is a solution of the problem (BOP), then
and this implies that
and
for all . That is,
This implies that
So, it is natural to give a condition:
for all and . Indeed, if this condition holds, then (4), (5), and (6) are equivalent.
Example 1.
Let be defined as
for all . Then h satisfies the condition .
Assumption 1.
Assume that:
- (i)
- f and g are proper lower semicontinuous and convex functions;
- (ii)
- h is continuous, and $h(\cdot,y)$ and $h(x,\cdot)$ are proper and convex functions for each $x\in H_1$ and $y\in H_2$;
- (iii)
- J is lower bounded;
- (iv)
- for all and .
For a fixed y, the partial subdifferential of $J(\cdot,y)$ at x is denoted by $\partial_x J(x,y)$. For a fixed x, the partial subdifferential of $J(x,\cdot)$ at y is denoted by $\partial_y J(x,y)$. Hence, for $(x,y)\in H_1\times H_2$, we have
$$\partial_x J(x,y)=\partial f(x)+\partial_x h(x,y)\quad\text{and}\quad \partial_y J(x,y)=\partial g(y)+\partial_y h(x,y).$$
Fermat’s rule, extended to nonconvex and nonsmooth functions, is given next.
Proposition 1
([25], Theorem 10.1). Let $f:H\to(-\infty,\infty]$ be a proper function. If f has a local minimum at $\bar{x}$, then $0\in\partial f(\bar{x})$.
Definition 6.
We say that $(\bar{x},\bar{y})$ is a critical point of J if $0\in\partial J(\bar{x},\bar{y})$. For simplicity, the set of the critical points of J is denoted by $\operatorname{crit}J$.
Remark 3.
We know
Hence, if the condition is satisfied, then if and only if is a solution of the problem .
Here, we consider the first part of Algorithm 2.
Remark 4.
In Algorithm 3, for each and , we set and . Thus, and . Further, if and are bounded, then and are bounded.
| Algorithm 3: Adaptive Douglas–Rachford Algorithm (Part 1) |
|
Let , and be a sequence in , and with , and and be given, and let , , , and be generated as
|
Proof.
Since and , we have
Since is maximal monotone, we know
and this implies that
Similarly, we have
and
Further, it is easy to see and are bounded when and are bounded. □
Lemma 6.
Let , , , and be generated from Algorithm 3. Then, for each and , and , we have
and
Proof.
Take any and , and let be fixed. First, it follows from and Lemma 5 that
and this implies that
Similar to (15), we have
Next, it follows from and Lemma 5 that
and this implies that
Here, we know
By (18) and (19), we have
Similar to (18), we have
Hence, we know
By (21) and (22), we have
Next, we have
By (15), (20), and (24), we have
Similar to (24), we have
By (16), (23), and (26),
So, we obtain the conclusion of Lemma 6. □
The following is the second part of Algorithm 2.
Remark 5.
In Algorithm 4, we know the sequence is chosen from the interval with
| Algorithm 4: Adaptive Douglas–Rachford Algorithm (Part 2) |
|
Theorem 1.
Let , , , and be generated from Algorithms 3 and 4, assume that the solution set of the problem (BOP) is nonempty, and assume that $H_1$ and $H_2$ are finite dimensional. Then there exist , , and a subsequence of such that , , , , and is a solution of the problem (BOP).
Proof.
Let be any solution of problem (BOP). By (26), (27), and the assumption, we have
Next, we consider the following cases.
Case 1: If , then we choose . Hence,
Case 2: If , then we set as
Case 2 (i): If , then we choose . Hence,
Case 2 (ii): If , then we set . Hence,
Case 2 (iii): If and , then we set and have the following.
Case 2 (iv): If and , then we set as
Thus, we have the following.
So,
and this implies that
Set as
Hence, we obtain the following from (30), (31), (32), (34), and (35).
Therefore, is nondecreasing, and exists, and this implies that , , , are bounded, and
So,
and
Since , , , and are bounded, there exist , , a subsequence of , a subsequence of , a subsequence of , and a subsequence of such that , , , and .
Theorem 2.
In Theorem 1, if and we further assume that f and g are ρ-strongly convex, then there exist , such that and , where is a solution of the problem .
Proof.
In Theorem 1, there exist , , a subsequence of such that , , , , and is a solution of the problem . Further, we have
and
Hence, if is a subsequence of such that and . Clearly, we know is a bounded subsequence of . So, without loss of generality, we may assume that and . Next, it follows from the proof of Theorem 1 that is a solution of the problem , and
and
By (45) and (47), and since f is ρ-strongly convex, we have
and this implies that
Since is convex, we have
By (50) and (51), we know . Similarly, we have . Therefore, we know and , and the proof is completed. □
Author Contributions
Conceptualization, M.-S.L. and C.-S.C.; methodology, M.-S.L.; formal analysis, C.-S.C.; resources, M.-S.L.; writing—original draft preparation, C.-S.C.; writing—review and editing, M.-S.L. All authors have read and agreed to the published version of the manuscript.
Funding
Chih-Sheng Chuang was supported by the National Science and Technology Council (NSTC 112-2115-M-415-001).
Data Availability Statement
The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| BOP | Biconvex Optimization Problem (or Block Optimization Problem) |
| BCD | Block Coordinate Descent algorithm |
| PALM | Proximal Alternating Linearized Minimization |
| ASAP | Alternating Structure-Adapted Proximal gradient descent algorithm |
References
- Grant, M.; Boyd, S.; Ye, Y. Disciplined convex programming. In Global Optimization: From Theory to Implementation, Nonconvex Optimization and Its Applications; Liberti, L., Maculan, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2006; pp. 155–210. [Google Scholar]
- Al-Shatri, H.; Li, X.; Ganesan, R.S.; Klein, A.; Weber, T. Maximizing the sum rate in cellular networks using multiconvex optimization. IEEE Trans. Wirel. Commun. 2016, 15, 3199–3211. [Google Scholar] [CrossRef]
- Bolte, J.; Sabach, S.; Teboulle, M. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Math. Program., Ser. A 2014, 146, 459–494. [Google Scholar]
- Che, H.; Wang, J. A Two-Timescale Duplex Neurodynamic Approach to Biconvex Optimization. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2503–2514. [Google Scholar] [CrossRef] [PubMed]
- Chiu, W.Y. Method of reduction of variables for bilinear matrix inequality problems in system and control designs. IEEE Trans. Syst. Man Cybern. Syst. 2017, 47, 1241–1256. [Google Scholar] [CrossRef]
- Fu, X.; Huang, K.; Sidiropoulos, N.D. On identifiability of nonnegative matrix factorization. IEEE Signal Process. Lett. 2018, 25, 328–332. [Google Scholar] [CrossRef]
- Gorski, J.; Pfeuffer, F.; Klamroth, K. Biconvex sets and optimization with biconvex functions: A survey and extensions. Math. Methods Oper. Res. 2007, 66, 373–407. [Google Scholar] [CrossRef]
- Hours, J.H.; Jones, C.N. A parametric multiconvex splitting technique with application to real-time NMPC. In Proceedings of the 53rd IEEE Conference on Decision and Control, Los Angeles, CA, USA, 15–17 December 2014; pp. 5052–5057. [Google Scholar]
- Li, G.; Wen, C.; Zheng, W.X.; Zhao, G. Iterative identification of block-oriented nonlinear systems based on biconvex optimization. Syst. Control Lett. 2015, 79, 68–75. [Google Scholar] [CrossRef]
- Nikolova, M.; Tan, P. Alternating structure-adapted proximal gradient descent for nonconvex nonsmooth block-regularized problems. SIAM J. Optim. 2019, 29, 2053–2078. [Google Scholar] [CrossRef]
- Shah, S.; Yadav, A.K.; Castillo, C.D.; Jacobs, D.W.; Studer, C.; Goldstein, T. Biconvex Relaxation for Semidefinite Programming in Computer Vision. In Computer Vision—ECCV 2016; Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Leibe, B., Matas, J., Sebe, N., Welling, M., Eds.; Springer: Cham, Switzerland, 2016. [Google Scholar]
- Wen, Z.; Yin, W.; Zhang, Y. Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Math. Program. Comput. 2012, 4, 333–361. [Google Scholar] [CrossRef]
- Xu, Y.; Yin, W. A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J. Imaging Sci. 2013, 6, 1758–1789. [Google Scholar] [CrossRef]
- Auslender, A. Asymptotic properties of the fenchel dual functional and applications to decomposition problems. J. Optim. Theory Appl. 1992, 73, 427–449. [Google Scholar] [CrossRef]
- Bauschke, H.H.; Combettes, P.L. Convex Functions: Variants. In Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: Berlin, Germany, 2011; pp. 143–153. [Google Scholar]
- Elser, V.; Rankenburg, I.; Thibault, P. Searching with iterated maps. Proc. Natl. Acad. Sci. USA 2007, 104, 418–423. [Google Scholar] [CrossRef] [PubMed]
- Gravel, S.; Elser, V. Divide and concur: A general approach to constraint satisfaction. Phys. Rev. E 2008, 78, 036706. [Google Scholar] [CrossRef] [PubMed]
- Aragón Artacho, F.J.; Borwein, J.M. Global convergence of a non-convex Douglas–Rachford iteration. J. Glob. Optim. 2013, 57, 753–769. [Google Scholar] [CrossRef]
- Aragón Artacho, F.J.; Campoy, R. A new projection method for finding the closest point in the intersection of convex sets. Comput. Optim. Appl. 2018, 69, 99–132. [Google Scholar] [CrossRef]
- Bauschke, H.H.; Moursi, W.M. On the Douglas–Rachford algorithm. Math. Program. 2017, 164, 263–284. [Google Scholar] [CrossRef]
- Borwein, J.M.; Sims, B. The Douglas–Rachford Algorithm in the Absence of Convexity. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering; Springer Optimization and Its Applications; Springer: New York, NY, USA, 2011; Volume 49, pp. 93–109. [Google Scholar]
- Eckstein, J.; Bertsekas, D.P. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Math. Program. 1992, 55, 293–318. [Google Scholar] [CrossRef]
- Butnariu, D.; Iusem, A.N. Totally Convex Functions. In Totally Convex Functions for Fixed Points Computation and Infinite Dimensional Optimization; Kluwer Academic Publishers: Dordrecht, The Netherlands, 2000; pp. 2–45. [Google Scholar]
- Marino, G.; Xu, H.K. Convergence of generalized proximal point algorithm. Comm. Pure Appl. Anal. 2004, 3, 791–808. [Google Scholar] [CrossRef]
- Rockafellar, R.T.; Wets, R.J.-B. Variational Analysis; Springer: New York, NY, USA, 1998. [Google Scholar]
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).