Abstract
A new accelerated common fixed point algorithm for a countable family of nonexpansive mappings is introduced and analyzed, and it is then applied to solve some convex bilevel optimization problems. Under suitable conditions, we prove a strong convergence result for the proposed algorithm. As an application, we employ it for regression and classification problems and compare its performance with that of existing algorithms. The numerical experiments show that our algorithm outperforms the others.
Keywords:
bilevel optimization; fixed point algorithm; forward-backward algorithm; regression and classification problems
MSC:
47H10; 65K10; 90C25
1. Introduction
Let H be a real Hilbert space and let ω, f and g be real-valued functions on H. The convex bilevel minimization problem is a special kind of optimization problem in which one problem is embedded within another. The outer level is the following constrained minimization problem:

min { ω(x) : x ∈ S* },    (1)

where ω is strongly convex with parameter σ > 0 and continuously differentiable such that ∇ω is Lipschitz continuous with constant L_ω > 0, while S* is the nonempty set of minimizers of the inner level problem, given by

min_{x ∈ H} { f(x) + g(x) },    (2)

and sometimes we will use the notation argmin_{x ∈ H}(f(x) + g(x)) for S*. The following assumptions are imposed for solving Problem (2).
- (i)
- g : H → ℝ ∪ {+∞} is a proper, convex and lower semi-continuous function;
- (ii)
- f : H → ℝ is a convex and differentiable function such that ∇f is Lipschitz continuous with constant L > 0, that is, ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ for all x, y ∈ H.
The solutions of (2) can be characterized via Theorem 16.3 of Bauschke and Combettes [1] as follows:

x* solves (2) if and only if 0 ∈ ∂g(x*) + ∇f(x*),

where ∂g is the subdifferential of g and ∇f is the gradient of f. Moreover, Problem (2) is also characterized by the following fixed point problem:
x* = prox_{cg}(x* − c∇f(x*)),

where prox_{cg} := (I + c∂g)^{−1} is the proximity operator of cg and c > 0. It is also known that prox_{cg}(I − c∇f) is a nonexpansive operator when c ∈ (0, 2/L). The operator prox_{cg}(I − c∇f) is called the forward-backward operator of f and g with respect to c. Moreover, it is known that x* is a minimizer of Problem (1) if and only if ⟨∇ω(x*), x − x*⟩ ≥ 0 for all x ∈ S*.
Over the past decades, many researchers have proposed methods for finding optimal solutions of Problem (2). Lions and Mercier [2] introduced a simple algorithm, called the Forward-Backward Splitting algorithm (FBS), for solving Problem (2). Their algorithm is given by

x_{n+1} = prox_{c_n g}(x_n − c_n ∇f(x_n)),    n ≥ 1,

where c_n ∈ (0, 2/L) is the step-size and x_1 ∈ H is chosen arbitrarily.
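To make the forward-backward iteration concrete, the following Python sketch implements it for the model instance f(x) = ½‖Ax − b‖² and g(x) = λ‖x‖₁, in which the proximal step reduces to componentwise soft-thresholding; the data A, b and the values of λ and the step-size are illustrative choices of ours, not settings taken from this paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def forward_backward(A, b, lam, n_iter=500):
    """FBS for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f (squared spectral norm)
    c = 1.0 / L                        # constant step-size c in (0, 2/L)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                   # forward (gradient) step on f
        x = soft_threshold(x - c * grad, c * lam)  # backward (proximal) step on g
    return x

# Illustrative usage on random data
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
b = rng.standard_normal(50)
x_fbs = forward_backward(A, b, lam=0.1)
```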
In 1964, Polyak [3] first introduced the inertial technique for accelerating the convergence rate of iterative algorithms. Since then, this technique has been widely used for this purpose.
In [4], Beck and Teboulle employed the inertial technique to introduce the fast iterative shrinkage-thresholding algorithm (FISTA) for solving Problem (2) as follows:

y_n = prox_{(1/L) g}(x_n − (1/L)∇f(x_n)),
t_{n+1} = (1 + √(1 + 4t_n²)) / 2,
x_{n+1} = y_n + ((t_n − 1)/t_{n+1})(y_n − y_{n−1}),

where t_1 = 1 and x_1 = y_0 ∈ H. They proved that the objective values generated by FISTA converge at the rate O(1/n²), improving on the O(1/n) rate of the basic forward-backward scheme.
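A minimal Python sketch of the FISTA update, written in the same setting as the previous snippet (f(x) = ½‖Ax − b‖², g(x) = λ‖x‖₁), may help to see how the inertial extrapolation enters; as before, all data and parameters are illustrative.

```python
import numpy as np

def fista(A, b, lam, n_iter=500):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1 with the standard t_n extrapolation."""
    L = np.linalg.norm(A, 2) ** 2
    c = 1.0 / L
    x = np.zeros(A.shape[1])   # previous proximal-gradient output
    y = x.copy()               # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        step = y - c * grad
        x_new = np.sign(step) * np.maximum(np.abs(step) - c * lam, 0.0)  # proximal step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # inertial extrapolation
        x, t = x_new, t_new
    return x
```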
Recently, some authors, for instance, Bussaban et al. [5], Puangpee and Suantai [6] and Jailoka et al. [7], employed the inertial technique to introduce common fixed point algorithms for countable families of nonexpansive operators and established convergence results under the NST-condition (I) and the condition (Z). They also applied their algorithms to solve some convex minimization problems.
In 2017, Sabach and Shtern [8] introduced a new method, called the Sequential Averaging Method (SAM), for solving bilevel optimization problems. This method was developed from [9], which treats a certain class of fixed point problems. To solve the bilevel optimization Problems (1) and (2), the Bilevel Gradient Sequential Averaging Method (BiG-SAM) was proposed in [8]; it is stated as Algorithm 1.
| Algorithm 1 Bilevel Gradient sequential Averaging Method (BiG-SAM) |
|
They proved a strong convergence theorem of the sequence generated by BiG-SAM under some control conditions.
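The structure of BiG-SAM, as we recall it from [8], combines a proximal-gradient step on the inner objective with a gradient step on the outer objective and then averages the two points; the Python sketch below follows that reading and should be checked against [8]. The inner problem, the outer objective ω(x) = ½‖x‖², the step-sizes and the averaging parameters are illustrative choices of ours.

```python
import numpy as np

def big_sam(A, b, lam, n_iter=500):
    """BiG-SAM-type iteration (structure recalled from [8], illustrative parameters).
    Inner problem: min 0.5*||Ax - b||^2 + lam*||x||_1; outer objective: 0.5*||x||^2."""
    L_f = np.linalg.norm(A, 2) ** 2
    c = 1.0 / L_f              # step-size for the inner proximal-gradient step
    s = 0.5                    # step-size for the outer gradient step (illustrative)
    x = np.zeros(A.shape[1])
    for k in range(1, n_iter + 1):
        alpha = 1.0 / (k + 1)  # averaging parameter, an illustrative diminishing choice
        grad_f = A.T @ (A @ x - b)
        step = x - c * grad_f
        y = np.sign(step) * np.maximum(np.abs(step) - c * lam, 0.0)  # inner prox-grad step
        z = x - s * x          # gradient step on omega(x) = 0.5*||x||^2
        x = alpha * z + (1.0 - alpha) * y
    return x
```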
After that, Shehu et al. [10] used the inertial technique for improving the convergence behavior of BiG-SAM. Their algorithm is known as the inertial Bilevel Gradient Sequential Averaging Method (iBiG-SAM). It was defined as follows (Algorithm 2):
| Algorithm 2 Inertial Bilevel Gradient sequential Averaging Method (iBiG-SAM) |
|
In 2022, Duan and Zhang [11] introduced a new algorithm based on the proximal gradient algorithm for solving a bilevel optimization problem. This algorithm is known as the alternated inertial Bilevel Gradient Sequential Averaging Method (aiBiG-SAM). It was defined as follows (Algorithm 3):
| Algorithm 3 The alternated inertial Bilevel Gradient Sequential Averaging Method (aiBiG-SAM) |
|
They also proved a strong convergence result of the proposed method.
Motivated by these works, we are interested in proposing a new efficient algorithm for convex bilevel Problems (1) and (2). We establish and prove a convergence theorem of the proposed algorithm under some suitable conditions. We employ it for data prediction and classification. The paper is organized as follows. In Section 2, we describe some notations and useful lemmas for the later sections. In Section 3, we discuss and analyze the convergence of our proposed algorithm. In Section 4, we present applications of the obtained fixed-point results in Section 3 for solving regression and classification problems. Moreover, some numerical experiments on regression and classification problems are also given in Section 4. Finally, we also give conclusions of the paper in Section 5.
2. Preliminaries
Let H be a real Hilbert space with norm ‖·‖ and inner product ⟨·,·⟩, and let C be a nonempty closed convex subset of H. Let T : C → H. The operator T is said to be L-Lipschitz if ‖Tx − Ty‖ ≤ L‖x − y‖ for all x, y ∈ C. If T is Lipschitz continuous with a coefficient L ∈ [0, 1), then T is called a contraction. The operator T is said to be nonexpansive if ‖Tx − Ty‖ ≤ ‖x − y‖ for all x, y ∈ C. We use Fix(T) to denote the set of fixed points of T, that is, Fix(T) = {x ∈ C : Tx = x}. The set of all common fixed points of a sequence {T_n} of nonexpansive operators of C into itself is ⋂_{n=1}^∞ Fix(T_n). For finding a common fixed point of {T_n}, Nakajo, Shimoji and Takahashi [12] introduced the NST-condition as follows. Let {T_n} and τ be families of nonexpansive operators of C into itself with ∅ ≠ Fix(τ) ⊆ ⋂_{n=1}^∞ Fix(T_n), where Fix(τ) = ⋂_{T ∈ τ} Fix(T). A sequence {T_n} satisfies the NST-condition (I) with τ if, for any bounded sequence {x_n} in C,

lim_{n→∞} ‖x_n − T_n x_n‖ = 0  implies  lim_{n→∞} ‖x_n − T x_n‖ = 0

for all T ∈ τ. In particular, the sequence {T_n} satisfies the NST-condition (I) with T if τ = {T}.
The following Lemma is useful for proving our main result.
Lemma 1
([5]). Let f be a convex and differentiable function from H into ℝ such that ∇f is Lipschitz continuous with constant L > 0, and let g be a proper, convex and lower semi-continuous function from H into ℝ ∪ {+∞}. Let T_n := prox_{c_n g}(I − c_n ∇f) and T := prox_{c g}(I − c ∇f), where c_n, c ∈ (0, 2/L) with c_n → c as n → ∞. Then {T_n} satisfies the NST-condition (I) with T.
Definition 1
([13,14]). A sequence {T_n} of mappings of H into itself with a nonempty common fixed point set is said to satisfy the condition (Z) if, whenever {x_n} is a bounded sequence in H such that

lim_{n→∞} ‖x_n − T_n x_n‖ = 0,

it follows that every weak cluster point of {x_n} belongs to ⋂_{n=1}^∞ Fix(T_n).
The following remark is obtained from the demiclosedness of I − T, where T is a nonexpansive operator.
Remark 1.
If T is a nonexpansive operator and the sequence {T_n} satisfies the NST-condition (I) with respect to T, then {T_n} satisfies the condition (Z).
Note that if g is a proper, lower semi-continuous and convex function, then prox_{cg}(x) := argmin_{y ∈ H} { g(y) + (1/(2c))‖y − x‖² } exists and is unique for every x ∈ H and c > 0; see [15]. We end this part with the following useful lemmas, which will be used in the next section.
Lemma 2
([16,17]). For any x, y ∈ H and t ∈ [0, 1], the following statements hold:
- (1)
- ‖x + y‖² ≤ ‖x‖² + 2⟨y, x + y⟩;
- (2)
- ‖x + y‖² = ‖x‖² + 2⟨x, y⟩ + ‖y‖²;
- (3)
- ‖tx + (1 − t)y‖² = t‖x‖² + (1 − t)‖y‖² − t(1 − t)‖x − y‖².
The identity in Lemma 2(3) implies that the following equality holds:

‖αx + βy + γz‖² = α‖x‖² + β‖y‖² + γ‖z‖² − αβ‖x − y‖² − αγ‖x − z‖² − βγ‖y − z‖²

for all x, y, z ∈ H and α, β, γ ∈ [0, 1] with α + β + γ = 1.
Lemma 3
([18]). Let {a_n} and {c_n} be sequences of nonnegative real numbers, {b_n} a sequence of real numbers, and {α_n} a sequence in (0, 1) such that

a_{n+1} ≤ (1 − α_n)a_n + α_n b_n + c_n

for all n ∈ ℕ. If the following conditions hold:
- (i)
- Σ_{n=1}^∞ α_n = ∞;
- (ii)
- limsup_{n→∞} b_n ≤ 0;
- (iii)
- Σ_{n=1}^∞ c_n < ∞,
then lim_{n→∞} a_n = 0.
Lemma 4
([19]). Let {a_n} be a sequence of real numbers that does not decrease at infinity, in the sense that there exists a subsequence {a_{n_j}} of {a_n} satisfying a_{n_j} < a_{n_j + 1} for all j ∈ ℕ. Define the sequence {τ(n)}_{n ≥ n_0} of integers as follows:

τ(n) = max{ k ≤ n : a_k < a_{k+1} },

where n_0 ∈ ℕ is such that { k ≤ n_0 : a_k < a_{k+1} } ≠ ∅. Then the following statements hold:
- (i)
- τ(n_0) ≤ τ(n_0 + 1) ≤ ⋯ and τ(n) → ∞ as n → ∞;
- (ii)
- a_{τ(n)} ≤ a_{τ(n)+1} and a_n ≤ a_{τ(n)+1} for all n ≥ n_0.
Let C be a nonempty closed convex subset of a Hilbert space H. The metric projection onto C, denoted by P_C, assigns to each x ∈ H the unique element P_C(x) ∈ C such that

‖x − P_C(x)‖ = inf{ ‖x − y‖ : y ∈ C }.

It is known that

⟨x − P_C(x), y − P_C(x)⟩ ≤ 0

for all x ∈ H and y ∈ C; see [16].
3. Results
Throughout this section, we let {T_n} and τ be families of nonexpansive operators on a real Hilbert space H such that ∅ ≠ Fix(τ) ⊆ ⋂_{n=1}^∞ Fix(T_n), and we let F be a contraction mapping on H with a constant k ∈ [0, 1).
To find a common fixed point of a countable family of nonexpansive operators in a real Hilbert space, we first propose a new accelerated algorithm. Then, under certain conditions, we show a strong convergence theorem. Now, we are ready to introduce our accelerated algorithm as follows:
Theorem 1.
Suppose that {T_n} satisfies the condition (Z). Let {x_n} be a sequence generated by Algorithm 4 which satisfies the following conditions:
- (i)
- for some ;
- (ii)
- and ;
- (iii)
- .
Then {x_n} converges strongly to an element x* ∈ ⋂_{n=1}^∞ Fix(T_n), where x* = P_{⋂_{n=1}^∞ Fix(T_n)} F(x*).
| Algorithm 4 An Inertial Viscosity Modified Picard (IVMP) |
Initial. Take arbitrarily and . For , set
Step 1. Calculate , and using: Then, update and return to Step 1. |
Proof.
Let be such that . By the definition of and in Algorithm 4, for each , we have
and
From (7) and (8), we obtain
Since , by (6), we get that . Thus, there is a constant such that . This implies
Let . We show . For , we get
Suppose for some . It follows from (10) that
Since , we obtain which implies
By mathematical induction, we conclude that for all . Thus, for all . It follows that is bounded, and so are and .
For each , we have
By Lemma 2, we get
It follows from (7) with that
Using (12),
From the above inequality, we get
and
So, we obtain
Now, we consider two cases for the convergence of the sequence generated by Algorithm 4.
Case 1.
There exists a such that the sequence is nonincreasing. Since is bounded from below by zero, exists. Using assumption and , we get that . In order to apply Lemma 4, we need to show that
Using the fact of Lemma 2(3), we get
This implies that
It follows from the assumption and the convergence of the sequence and that . For each , we have
This implies that . Let
Since is bounded, we can choose a subsequence of such that
and for some . It follows from the condition (Z) of that .
Moreover, using , we obtain
Thus,
It implies by (15) and the fact of that . From (14), using Lemma 4, we can conclude that .
Case 2.
Suppose that the sequence is not monotonically decreasing for all large enough. Set
So, there exists a subsequence of such that for all . In this case, we define by
By Lemma 4, we obtain for all . Then,
By the same argument as in Case 1, we obtain
for all . Hence as
Similarly, we have .
Since and , we obtain
Since and , it follows that
and hence as . It implies by
that as . Using Lemma 4, we obtain as . Hence . The proof is complete. □
Now, we employ Algorithm 4 for solving Problem (1). We obtain the following result as a consequence of Theorem 1.
Theorem 2.
Let ω be strongly convex with parameter σ and continuously differentiable such that ∇ω is Lipschitz continuous with constant L_ω. Suppose that f and g satisfy the assumptions of Problem (2). Let {c_n} be a sequence of positive real numbers in (0, 2/L) such that c_n → c as n → ∞, where c ∈ (0, 2/L), and let {x_n} be a sequence generated by Algorithm 5. Then {x_n} converges strongly to the unique solution of Problem (1).
| Algorithm 5 An Inertial Bilevel Gradient Modified Picard (IBiG-MP) |
Initial. Take arbitrarily and . For . Set
Step 1. Calculate , and as follows: Then, update and return to Step 1. |
Proof.
Let and . By Lemma 1 and Remark 1, we know that satisfies the condition (Z). From Theorem 1, we get that converges to . Notice also that is a k-contraction with parameter , whenever . It remains to show that the variational inequality holds true. By using and (5), for all , we obtain
Thus, the limit point is an optimal solution of Problem (1). □
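The contraction property invoked in the proof is the standard estimate for gradient steps of a strongly convex function with Lipschitz continuous gradient; for completeness, a sketch of the bound in the form usually used in the BiG-SAM literature is the following, where σ denotes the strong convexity parameter of ω and L_ω the Lipschitz constant of ∇ω.

```latex
\[
\|(x - s\nabla\omega(x)) - (y - s\nabla\omega(y))\|^{2}
\le \Bigl(1 - \frac{2s\sigma L_{\omega}}{\sigma + L_{\omega}}\Bigr)\,\|x - y\|^{2},
\qquad 0 < s \le \frac{2}{\sigma + L_{\omega}},
\]
so that the mapping $x \mapsto x - s\nabla\omega(x)$ is a contraction with constant
$k = \sqrt{1 - 2s\sigma L_{\omega}/(\sigma + L_{\omega})} < 1$.
```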
4. Application
In this section, we employ Algorithm 5 as a machine learning algorithm for regression (fitting the graph of the sine function) and for data classification, using the model of single hidden layer feedforward neural networks (SLFNs) together with the extreme learning machine. All experiments were performed in MATLAB on an Intel Core i5 (8th generation) machine with 8 GB of RAM.
We first recall some basic facts about the extreme learning machine for regression and classification problems. We then use the proposed algorithm to solve these problems and compare its performance with that of BiG-SAM, iBiG-SAM and aiBiG-SAM.
The extreme learning machine (ELM) [20] is defined as follows. Let {(x_i, t_i) : i = 1, …, N} be a training set of N distinct samples, where x_i is an input vector and t_i is a target. A standard SLFN with M hidden nodes and activation function G is given by

o_i = Σ_{j=1}^{M} β_j G(⟨w_j, x_i⟩ + b_j),    i = 1, …, N,

where β_j is the weight vector connecting the j-th hidden node and the output nodes, w_j is the weight vector connecting the input nodes and the j-th hidden node, and b_j is the bias of the j-th hidden node. The aim of the SLFN is to fit the N targets, that is, o_i = t_i for i = 1, …, N; equivalently,

Σ_{j=1}^{M} β_j G(⟨w_j, x_i⟩ + b_j) = t_i,    i = 1, …, N.
We can rewrite the above system of linear equations in the following matrix form:

Hβ = T,    (18)

where H is the N × M hidden layer output matrix with entries H_{ij} = G(⟨w_j, x_i⟩ + b_j), β = [β_1, …, β_M]^T is the matrix of output weights and T = [t_1, …, t_N]^T is the matrix of targets.
The objective of an SLFN is to estimate w_j, b_j and β_j by solving (18), whereas ELM chooses w_j and b_j randomly and aims to find only the output weights β.
Problem (19) can be considered as the following convex minimization problem:

min_{β} { ‖Hβ − T‖²₂ + λ‖β‖₁ },

where λ > 0 is called the regularization parameter. In Algorithm 5, we set f(β) = ‖Hβ − T‖²₂ and g(β) = λ‖β‖₁. We employ Algorithm 5 to solve the convex bilevel optimization Problems (1) and (2) with this choice of inner objective, while the outer level function ω is chosen to be strongly convex as in (1).
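To make this setup concrete, the following Python sketch builds the random hidden layer of an ELM with sigmoid activation, forms the matrix H, and assembles f, g, the proximal map and the Lipschitz constant of ∇f that a forward-backward-type method such as Algorithm 5 would use; the dimensions, the number of hidden nodes and the value of λ below are placeholders, not the settings used in the experiments of this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_hidden_layer(X, n_hidden, rng):
    """Random ELM hidden layer: H[i, j] = G(<w_j, x_i> + b_j) with sigmoid activation G."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))  # random input weights w_j
    b = rng.standard_normal(n_hidden)                # random biases b_j
    return sigmoid(X @ W + b)

# Placeholder training data: 10 noisy samples of the sine function
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(10, 1))
T = np.sin(X) + 0.01 * rng.standard_normal((10, 1))

H = elm_hidden_layer(X, n_hidden=100, rng=rng)
lam = 1e-3                                           # placeholder regularization parameter

# Inner-problem ingredients: f(beta) = ||H beta - T||_2^2, g(beta) = lam * ||beta||_1
grad_f = lambda beta: 2.0 * H.T @ (H @ beta - T)
L_f = 2.0 * np.linalg.norm(H, 2) ** 2                # Lipschitz constant of grad f
prox_g = lambda v, c: np.sign(v) * np.maximum(np.abs(v) - c * lam, 0.0)
```

With these ingredients, the forward-backward operator appearing in the algorithms above is the map β ↦ prox_g(β − (1/L_f) grad_f(β), 1/L_f).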
4.1. Regression of a Sine Function
In our experiment on regression of the graph of the sine function, we construct a training set by randomly selecting 10 distinct data points. We use the sigmoid as our activation function. We also set the number of hidden nodes , and the regularization parameter . In Algorithm 5, we set . The Lipschitz constant of the gradient of f is computed by . The values indicated in Table 1 are used for all control settings. We evaluate the result by
Table 1.
Parameter settings of each algorithm.
Figure 1.
(a) Regression of the sine function at the 100th iteration; (b) regression of the sine function at the 500th iteration.
Table 2.
Numerical results for regression of a sine function with 500 iterations.
4.2. Data Classification
In order to classify datasets, we use four datasets from “https://www.kaggle.com/, accessed on 20 June 2020” and “https://archive.ics.uci.edu/, accessed on 20 June 2020” as follows:
Breast Cancer dataset [21]: This dataset contains 11 attributes; we classify two classes of data.
Heart Disease UCI dataset [22]: This dataset contains 14 attributes and two classes of data.
Diabetes dataset [23]: This dataset contains 9 attributes; we classify two classes of data.
Iris dataset [24]: This dataset contains 4 attributes and three classes of iris plant. We aim to classify each type of iris plant (Iris versicolour, Iris virginica and Iris setosa).
Table 3 shows the number of attributes of each dataset, together with the sizes of the training set (around of the data) and the testing set (the remainder of the data).
Table 3.
Training and testing sets of each dataset.
We set all control parameters as in Table 1 of Section 4.1, the number of hidden nodes , and the sigmoid as the activation function. Given the training set of each dataset as described in Table 3, the accuracy of the output data is calculated by

accuracy = (number of correctly predicted samples / total number of samples) × 100%.
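A small Python helper matching this definition of accuracy illustrates how the training and testing accuracies reported in Table 4 can be computed from the ELM output; the argmax decoding assumes that the targets are one-hot encoded class labels, which is an assumption on our part.

```python
import numpy as np

def accuracy(H, beta, labels):
    """Percentage of samples whose predicted class (argmax of the ELM output H @ beta)
    coincides with the true integer class label."""
    predicted = np.argmax(H @ beta, axis=1)          # decode one-hot style outputs
    return 100.0 * np.mean(predicted == labels)

# Illustrative usage: acc_train = accuracy(H_train, beta, y_train)
```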
We compare the number of iterations, training accuracy and testing accuracy of Algorithm 5 with those of the other algorithms on each dataset, as shown in Table 4.
Table 4.
The number of iterations of each algorithm at its best accuracy on each dataset.
From Table 4, Algorithm 5 achieves better accuracy than BiG-SAM, iBiG-SAM and aiBiG-SAM in all experiments conducted.
5. Conclusions
We propose a new common fixed point algorithm for a countable family of nonexpansive operators and apply it to solve some convex bilevel optimization problems. We prove a strong convergence theorem for the proposed algorithm under suitable conditions. Moreover, we apply our algorithm to solve classification and regression problems. We also present numerical experiments comparing the performance of our algorithm with existing algorithms; in these experiments, the proposed algorithm is more efficient than the existing algorithms in the literature.
Author Contributions
Conceptualization, S.S.; Formal analysis, P.T. and S.S.; Investigation, P.T.; Methodology, S.S.; Supervision, S.S.; Validation, S.S. and B.P.; Writing—original draft, P.T.; Writing—review and editing, S.S. and B.P. All authors have read and agreed to the published version of the manuscript.
Funding
NSRF program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183].
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Acknowledgments
The authors would like to thank the referees for valuable comments and suggestions for improving this work. This research has received funding support from the NSRF program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183] and Chiang Mai University. The first author would like to thank Science Achievement Scholarship of Thailand (SAST) for the financial support. The second author was partially supported by Chiang Mai University under Fundamental Fund 2023.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2011. [Google Scholar]
- Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
- Polyak, B. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
- Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
- Bussaban, L.; Suantai, S.; Kaewkhao, A. A parallel inertial S-iteration forward-backward algorithm for regression and classification problems. Carpathian J. Math. 2020, 36, 35–44. [Google Scholar] [CrossRef]
- Puangpee, J.; Suantai, S. A New Accelerated Viscosity Iterative Method for an Infinite Family of Nonexpansive Mappings with Applications to Image Restoration Problems. Mathematics 2020, 8, 615. [Google Scholar] [CrossRef]
- Jailoka, P.; Suantai, S.; Hanjing, A. A fast viscosity forward-backward algorithm for convex minimization problems with an application in image recovery. Carpathian J. Math. 2021, 37, 449–461. [Google Scholar] [CrossRef]
- Sabach, S.; Shtern, S. A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 2017, 27, 640–660. [Google Scholar] [CrossRef]
- Xu, H.K. Viscosity approximation methods for nonexpansive mappings. J. Math. Anal. Appl. 2004, 298, 279–291. [Google Scholar] [CrossRef]
- Shehu, Y.; Vuong, P.T.; Zemkoho, A. An inertial extrapolation method for convex simple bilevel optimization. Optim. Methods Softw. 2019, 2019, 1–20. [Google Scholar] [CrossRef]
- Duan, P.; Zhang, Y. Alternated and multi-step inertial approximation methods for solving convex bilevel optimization problems. Optimization 2022, 2022, 1–29. [Google Scholar] [CrossRef]
- Nakajo, K.; Shimoji, K.; Takahashi, W. Strong convergence to common fixed points of families of nonexpansive mappings in Banach spaces. J. Nonlinear Convex Anal. 2007, 8, 11–34. [Google Scholar]
- Aoyama, K.; Kimura, Y. Strong convergence theorems for strongly nonexpansive sequences. Appl. Math. Comput. 2011, 217, 7537–7545. [Google Scholar] [CrossRef]
- Aoyama, K.; Kohsaka, F.; Takahashi, W. Strong convergence theorems by shrinking and hybrid projection methods for relatively nonexpansive mappings in Banach spaces. In Nonlinear Analysis and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009; pp. 2–7. [Google Scholar]
- Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: New York, NY, USA, 2004. [Google Scholar]
- Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009. [Google Scholar]
- Takahashi, W. Nonlinear Functional Analysis; Yokohama Publishers: Yokohama, Japan, 2000. [Google Scholar]
- Xu, H.K. Another control condition in an iterative method for nonexpansive mappings. Bull. Aust. Math. Soc. 2002, 65, 109–113. [Google Scholar] [CrossRef]
- Mainge, P.E. Strong convergence of projected subgradient methods for nonsmooth and nonstrictly convex minimization. Set-Valued Anal. 2008, 16, 899–912. [Google Scholar] [CrossRef]
- Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
- Wolberg, W.H.; Mangasarian, O.L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Natl. Acad. Sci. USA 1990, 87, 9193–9196. [Google Scholar] [CrossRef] [PubMed]
- Detrano, R.; Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Schmid, J.J.; Sandhu, S.; Guppy, K.H.; Lee, S.; Froelicher, V. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 1989, 64, 304–310. [Google Scholar] [CrossRef] [PubMed]
- Smith, J.W.; Everhart, J.E.; Dickson, W.C.; Knowler, W.C.; Johannes, R.S. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. Proc. Symp. Comput. Appl. Med. Care 1998, 1998, 261–265. [Google Scholar]
- Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).