Article

A New Accelerated Algorithm for Convex Bilevel Optimization Problems and Applications in Data Classification

by Panadda Thongpaen 1, Warunun Inthakon 2,3, Taninnit Leerapun 4 and Suthep Suantai 2,3,*
1 Graduate Ph.D. Degree Program in Mathematics, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
2 Research Group in Mathematics and Applied Mathematics, Department of Mathematics, Faculty of Science, Chiang Mai University, Chiang Mai 50200, Thailand
3 Data Science Research Center, Department of Mathematics, Chiang Mai University, Chiang Mai 50200, Thailand
4 Sriphat Medical Center, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(12), 2617; https://doi.org/10.3390/sym14122617
Submission received: 3 November 2022 / Revised: 20 November 2022 / Accepted: 30 November 2022 / Published: 10 December 2022
(This article belongs to the Special Issue Symmetry in Nonlinear Analysis and Boundary Value Problems)

Abstract: In the development of algorithms for convex optimization problems, symmetry plays a very important role in the approximation of solutions to various real-world problems. In this paper, based on a fixed-point algorithm with the inertial technique, we propose and study a new accelerated algorithm for solving a convex bilevel optimization problem in which the inner level is the minimization of the sum of smooth and nonsmooth convex functions and the outer level is the minimization of a smooth and strongly convex function over the set of solutions of the inner level. We then prove a strong convergence theorem for the proposed algorithm under some conditions. As an application, we apply our algorithm as a machine learning algorithm for solving some data classification problems. We also present numerical experiments showing that our algorithm performs better than five other algorithms from the literature, namely BiG-SAM, iBiG-SAM, aiBiG-SAM, miBiG-SAM and amiBiG-SAM.

1. Introduction

Breast cancer is the most common type of cancer in Thai women. Worryingly, although breast cancer can be treated, patients face a very high risk of developing diseases that affect the heart or blood vessels.
The three most common methods for treating breast cancer are surgery, chemotherapy and radiotherapy. However, radiotherapy often involves some incidental exposure of the heart to ionizing radiation, and it was shown in [1] that such exposure increases the subsequent rate of ischemic heart disease, with the increase beginning within a few years of exposure and continuing for at least 20 years. Thus, women with preexisting cardiac risk factors have higher absolute increases in risk from this therapy than other women.
Therefore, if a patient is diagnosed with heart disease early, the risks arising from this type of treatment can be prevented. Similarly, when cancer is detected at an early stage, the malignant cells can be treated before they spread to other parts of the body. To support the diagnosis of breast cancer and heart disease, our objective in this work is to develop an algorithm for predicting such patients.
It is well known that symmetry serves as the foundation for fixed-point and optimization theory and methods. We first recall the background of some mathematical models. Consider the constrained minimization problem:
$$\min_{x \in \Gamma} F(x), \qquad (1)$$
where $H$ is a real Hilbert space, $F : H \to \mathbb{R}$ is a strongly convex differentiable function with convexity parameter $\rho$, and $\Gamma$ is the nonempty set of minimizers of the unconstrained minimization problem
$$\min_{x \in H} \phi(x) + \psi(x), \qquad (2)$$
where $\psi, \phi : H \to \mathbb{R} \cup \{+\infty\}$ are proper convex and lower semicontinuous functions and $\phi$ is a smooth function. Problems (1) and (2) are called the outer-level and inner-level problems, respectively. In [2,3,4,5], such a problem is labeled as a simple bilevel optimization problem.
In 2017, Sabach and Shtern [6] introduced the Bilevel Gradient Sequential Averaging Method (BiG-SAM) for solving (1) and (2) as defined by Algorithm 1.
Algorithm 1 BiG-SAM: Bilevel Gradient Sequential Averaging Method
1: Initial step. Let $x_1 \in \mathbb{R}^n$ and let $\{\alpha_k\}$ be a sequence in $(0,1]$ satisfying the conditions assumed in [7]. Select $\lambda \in \left(0, \frac{1}{L_\phi}\right]$ and $\sigma \in \left(0, \frac{2}{L_F + \rho}\right]$, where $L_\phi$ is the Lipschitz constant of $\nabla\phi$ and $L_F$ is the Lipschitz constant of $\nabla F$.
2: Step 1. For $k \geq 1$, compute
$$y_k := \mathrm{prox}_{\lambda\psi}\big(x_k - \lambda\nabla\phi(x_k)\big), \quad u_k := x_k - \sigma\nabla F(x_k), \quad x_{k+1} := \alpha_k u_k + (1-\alpha_k) y_k,$$
where $\nabla\phi$ and $\nabla F$ are the gradients of $\phi$ and $F$, respectively.
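To make the iteration concrete, a minimal Python sketch of one run of BiG-SAM is given below. The callables grad_phi, grad_F and prox_psi, as well as the helper name big_sam, are hypothetical placeholders for the user-supplied gradients of $\phi$ and $F$ and the proximity operator of $\psi$; this is a sketch under those assumptions, not a reference implementation.

```python
import numpy as np

def big_sam(x1, grad_phi, grad_F, prox_psi, lam, sigma, alphas, num_iter=100):
    """Minimal sketch of the BiG-SAM iteration (Algorithm 1).

    grad_phi, grad_F : callables returning the gradients of phi and F
    prox_psi         : callable (x, lam) -> prox_{lam*psi}(x)
    lam, sigma       : step sizes chosen as in the initial step
    alphas           : averaging parameters alpha_k in (0, 1], indexable by k
    """
    x = np.asarray(x1, dtype=float)
    for k in range(num_iter):
        y = prox_psi(x - lam * grad_phi(x), lam)   # forward-backward step on the inner problem
        u = x - sigma * grad_F(x)                  # gradient step on the outer objective F
        x = alphas[k] * u + (1 - alphas[k]) * y    # sequential averaging
    return x
```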
They showed that BiG-SAM is simpler and cheaper than the method described in [8]. Moreover, the authors of [6] used a numerical example to show that BiG-SAM outperforms the method in [8] for solving problems (1) and (2). Up to this point, the algorithm in [6] seemed to be the most efficient method for convex simple bilevel optimization problems.
In 2019, Shehu et al. [9] utilized the inertial technique, which was proposed by Polyak [10], to accelerate the convergence rate of BiG-SAM, obtaining a method called iBiG-SAM, as defined by Algorithm 2.
Algorithm 2 iBiG-SAM: Inertial with Bilevel Gradient Sequential Averaging Method
1: Initial step. Let $L_\phi$ and $L_F$ be the Lipschitz constants of $\nabla\phi$ and $\nabla F$, respectively. Let $\{\alpha_k\}$ be a sequence in $(0,1)$, $\lambda \in \left(0, \frac{2}{L_\phi}\right)$ and $\sigma \in \left(0, \frac{2}{L_F + \rho}\right]$. Select arbitrary points $x_1, x_0 \in \mathbb{R}^n$ and $\alpha \geq 3$.
2: Step 1. Choose $\mu_k \in [0, \bar{\mu}_k]$ such that for $k \geq 1$,
$$\bar{\mu}_k := \begin{cases} \min\left\{\dfrac{k}{k+\alpha-1}, \dfrac{\eta_k}{\|x_k - x_{k-1}\|}\right\} & \text{if } x_k \neq x_{k-1}, \\[4pt] \dfrac{k}{k+\alpha-1} & \text{otherwise}. \end{cases}$$
3: Step 2. Compute
$$z_k := x_k + \mu_k(x_k - x_{k-1}), \quad y_k := \mathrm{prox}_{\lambda\psi}\big(z_k - \lambda\nabla\phi(z_k)\big), \quad u_k := z_k - \sigma\nabla F(z_k), \quad x_{k+1} := \alpha_k u_k + (1-\alpha_k) y_k,$$
where $\nabla\phi$ and $\nabla F$ are the gradients of $\phi$ and $F$, respectively.
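The only structural change from BiG-SAM is the inertial extrapolation $z_k$. A minimal sketch of how the bound $\bar{\mu}_k$ of Step 1 could be evaluated in code is shown below; the function name is hypothetical and the inputs are assumed to be the current and previous iterates.

```python
import numpy as np

def inertial_bound(k, alpha, eta_k, x_k, x_km1):
    """Upper bound mu_bar_k for the inertial parameter in iBiG-SAM (Step 1)."""
    base = k / (k + alpha - 1)
    diff = np.linalg.norm(x_k - x_km1)
    return min(base, eta_k / diff) if diff > 0 else base
```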
They also proved that the sequence $\{x_k\}$ generated by iBiG-SAM converges to the optimal solution of problems (1) and (2) provided the sequence $\{\alpha_k\}$ satisfies the following conditions:
(1) $\lim_{k\to\infty} \alpha_k = 0$;
(2) $\sum_{k=1}^{\infty} \alpha_k = +\infty$.
The above assumptions are derived from [7] by dropping some of the conditions imposed there.
Recently, to accelerate the convergence of the iBiG-SAM algorithm, Duan and Zhang [11] proposed three inertial approximation methods based on the proximal gradient algorithm, defined as Algorithms 3–5.
Algorithm 3 aiBiG-SAM: The alternated inertial Bilevel Gradient Sequential Averaging Method
1: Initial step. Let $L_\phi$ and $L_F$ be the Lipschitz constants of $\nabla\phi$ and $\nabla F$, respectively. Given $\lambda \in \left(0, \frac{2}{L_\phi}\right)$, $\sigma \in \left(0, \frac{2}{L_F+\rho}\right]$ and $\epsilon > 0$. Let $\{\alpha_k\}$ be a sequence in $(0,1)$ satisfying the conditions assumed in [9]. Select arbitrary points $x_1, x_0 \in H$ and $\alpha \geq 3$. Set $k = 1$.
2: Step 1. Compute
$$z_k = \begin{cases} x_k + \mu_k(x_k - x_{k-1}) & \text{if } k \text{ is odd}, \\ x_k & \text{if } k \text{ is even}. \end{cases}$$
3: When $k$ is odd, choose $\mu_k$ such that $0 \leq |\mu_k| \leq \bar{\mu}_k$ with $\bar{\mu}_k$ defined by
$$\bar{\mu}_k := \begin{cases} \min\left\{\dfrac{k}{k+\alpha-1}, \dfrac{\eta_k}{\|x_k - x_{k-1}\|}\right\} & \text{if } x_k \neq x_{k-1}, \\[4pt] \dfrac{k}{k+\alpha-1} & \text{if } x_k = x_{k-1}. \end{cases}$$
4: When $k$ is even, $\mu_k = 0$.
5: Step 2. Compute
$$y_k = \mathrm{prox}_{\lambda\psi}\big(z_k - \lambda\nabla\phi(z_k)\big), \quad u_k = z_k - \sigma\nabla F(z_k), \quad x_{k+1} = \alpha_k u_k + (1-\alpha_k) y_k, \quad k \geq 1,$$
where $\nabla\phi$ and $\nabla F$ are the gradients of $\phi$ and $F$, respectively.
6: Step 3. If $\|x_k - x_{k-1}\| < \epsilon$, then stop. Otherwise, set $k = k+1$ and go to Step 1.
Algorithm 4 miBiG-SAM: The multi-step inertial Bilevel Gradient Sequential Averaging Method
1: Initial step. Let $L_\phi$ and $L_F$ be the Lipschitz constants of $\nabla\phi$ and $\nabla F$, respectively. Given $\lambda_k \in \left(0, \frac{2}{L_\phi}\right)$, $\sigma \in \left(0, \frac{2}{L_F+\rho}\right]$, $\epsilon > 0$ and $\alpha \geq 3$. Let $\{\alpha_k\}$ be a sequence in $(0,1)$ satisfying the conditions assumed in [9]. Select arbitrary points $x_0, x_{-1}, \ldots, x_{2-q} \in H$ and $q \in \mathbb{N}^+$. Set $k = 1$.
2: Step 1. Given $x_k, x_{k-1}, \ldots, x_{k-q+1}$, compute
$$z_k = x_k + \sum_{i \in Q} \mu_{i,k}(x_{k-i} - x_{k-1-i}),$$
where $Q = \{0, 1, \ldots, q-1\}$. Choose $\mu_{i,k}$ such that $0 \leq |\mu_{i,k}| \leq \bar{\mu}_k$ with $\bar{\mu}_k$ defined by
$$\bar{\mu}_k := \begin{cases} \min\left\{\dfrac{k}{k+\alpha-1}, \dfrac{\eta_k}{\sum_{i\in Q}\|x_{k-i} - x_{k-1-i}\|}\right\} & \text{if } \sum_{i\in Q}\|x_{k-i} - x_{k-1-i}\| \neq 0, \\[4pt] \dfrac{k}{k+\alpha-1} & \text{otherwise}. \end{cases}$$
3: Step 2. Compute
$$y_k = \mathrm{prox}_{\lambda_k\psi}\big(z_k - \lambda_k\nabla\phi(z_k)\big), \quad u_k = z_k - \sigma\nabla F(z_k), \quad x_{k+1} = \alpha_k u_k + (1-\alpha_k) y_k, \quad k \geq 1,$$
where $\nabla\phi$ and $\nabla F$ are the gradients of $\phi$ and $F$, respectively.
4: Step 3. If $\|x_k - x_{k-1}\| < \epsilon$, then stop. Otherwise, set $k = k+1$ and go to Step 1.
Algorithm 5 amiBiG-SAM: The multi-step alternated inertial Bilevel Gradient Sequential Averaging Method
1: Initial step. Let $L_\phi$ and $L_F$ be the Lipschitz constants of $\nabla\phi$ and $\nabla F$, respectively. Given $\lambda_k \in \left(0, \frac{2}{L_\phi}\right)$, $\sigma \in \left(0, \frac{2}{L_F+\rho}\right]$, $\epsilon > 0$ and $\alpha \geq 3$. Let $\{\alpha_k\}$ be a sequence in $(0,1)$ satisfying the conditions assumed in [9]. Select arbitrary points $x_0, x_{-1}, \ldots, x_{2-q} \in H$ and $q \in \mathbb{N}^+$. Set $k = 1$.
2: Step 1. Given $x_k, x_{k-1}, \ldots, x_{k-q+1}$, compute
$$z_k = \begin{cases} x_k + \sum_{i \in Q} \mu_{i,k}(x_{k-i} - x_{k-1-i}) & \text{if } k \text{ is odd}, \\ x_k & \text{if } k \text{ is even}, \end{cases}$$
where $Q = \{0, 1, \ldots, q-1\}$. Choose $\mu_{i,k}$ such that $0 \leq |\mu_{i,k}| \leq \bar{\mu}_k$ with $\bar{\mu}_k$ defined by
$$\bar{\mu}_k := \begin{cases} \min\left\{\dfrac{k}{k+\alpha-1}, \dfrac{\eta_k}{\sum_{i\in Q}\|x_{k-i} - x_{k-1-i}\|}\right\} & \text{if } \sum_{i\in Q}\|x_{k-i} - x_{k-1-i}\| \neq 0, \\[4pt] \dfrac{k}{k+\alpha-1} & \text{otherwise}. \end{cases}$$
3: Step 2. Compute
$$y_k = \mathrm{prox}_{\lambda_k\psi}\big(z_k - \lambda_k\nabla\phi(z_k)\big), \quad u_k = z_k - \sigma\nabla F(z_k), \quad x_{k+1} = \alpha_k u_k + (1-\alpha_k) y_k, \quad k \geq 1,$$
where $\nabla\phi$ and $\nabla F$ are the gradients of $\phi$ and $F$, respectively.
4: Step 3. If $\|x_k - x_{k-1}\| < \epsilon$, then stop. Otherwise, set $k = k+1$ and go to Step 1.
The convergence behavior of Algorithms 3–5 was shown, in [11], to be better than that of BiG-SAM and iBiG-SAM.
It is known that the variational inequality
$$\langle \nabla F(x^*), x - x^* \rangle \geq 0, \quad \forall x \in \Gamma,$$
implies that $x^*$ is a solution of the convex bilevel optimization problem (1); for more details, see [12]. For recent results, see [13,14] and references therein.
It is worth noting that $x \in \Gamma$ can be described by the fixed-point equation
$$\mathrm{prox}_{\lambda\psi}\big(x - \lambda\nabla\phi(x)\big) = x,$$
where $\lambda > 0$ and $\mathrm{prox}_{\lambda\psi}(x) = \arg\min_{u \in H}\left\{\psi(u) + \frac{1}{2\lambda}\|u - x\|_2^2\right\}$, which was introduced by Moreau [15]. This means that solving the bilevel problem is equivalent to finding a fixed point of the proximal operator. It is well known that fixed point theory plays a very crucial role in solving many real-world problems, such as problems in engineering, economics, machine learning and data science; see [16,17,18,19,20,21,22,23,24] for more details. Over the past three decades, several fixed point algorithms have been introduced and studied by many authors, see [25,26,27,28,29,30,31,32,33,34]. Some of these algorithms have been applied to various problems in image and signal processing, data classification and regression; see, for example, [19,20,21,22,23]. In addition, fuzzy classification is another important data classification mechanism, see [35,36].
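For instance, when $\psi = \beta\|\cdot\|_1$ the proximity operator has the well-known soft-thresholding closed form, so the fixed-point residual of the forward–backward operator can be checked directly. The sketch below assumes a user-supplied gradient of $\phi$; the helper names are hypothetical.

```python
import numpy as np

def prox_l1(x, t):
    """Proximity operator of t*||.||_1 (soft thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def forward_backward_residual(x, grad_phi, lam, beta):
    """||prox_{lam*psi}(x - lam*grad_phi(x)) - x|| with psi = beta*||.||_1.
    A zero residual indicates that x is a minimizer of phi + psi."""
    return np.linalg.norm(prox_l1(x - lam * grad_phi(x), lam * beta) - x)
```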
All of the works mentioned above motivate and inspire us to establish a new accelerated algorithm to solve a convex bilevel optimization problem and apply it for solving data classification problems.
We organize the paper as follows: In Section 2, we provide some basic definitions and useful lemmas used in the later sections. The main results of the paper are given in Section 3, where we introduce and study a new accelerated algorithm for solving a convex bilevel optimization problem and then prove strong convergence of the proposed algorithm. After that, we apply our main results to solving a data classification problem in Section 4. Finally, a brief conclusion is given in Section 5.

2. Preliminaries

Throughout this paper, $H$ denotes a real Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and induced norm $\|\cdot\|$.
A mapping $T : C \to C$ is called $L$-Lipschitz if there exists $L > 0$ such that
$$\|Tx - Ty\| \leq L\|x - y\|, \quad \forall x, y \in C \subseteq H.$$
If $L \in [0, 1)$, then $T$ is called a contraction, and it is called nonexpansive if $L = 1$. We denote by $F(T)$ the set of all fixed points of $T$, that is, $F(T) = \{x \in C : Tx = x\}$. For a sequence $\{x_k\}$ in $H$, we denote the strong convergence and the weak convergence of $\{x_k\}$ to $u \in H$ by $x_k \to u$ and $x_k \rightharpoonup u$, respectively.
Let $\{T_k\}$ and $\mathcal{T}$ be families of nonexpansive operators from $C$ into itself with $\emptyset \neq F(\mathcal{T}) \subseteq \bigcap_{k=1}^{\infty} F(T_k)$, where $F(\mathcal{T})$ is the set of all common fixed points of $\mathcal{T}$ and $F(T_k)$ is the set of all fixed points of $T_k$.
The sequence $\{T_k\}$ is said to satisfy the NST-condition (I) with $\mathcal{T}$ if for every bounded sequence $\{x_k\}$ in $C$,
$$\lim_{k\to\infty} \|x_k - T_k x_k\| = 0 \implies \lim_{k\to\infty} \|x_k - T x_k\| = 0, \quad \forall T \in \mathcal{T};$$
see [37] for more details. In particular, if $\mathcal{T} = \{T\}$, then $\{T_k\}$ is a sequence satisfying the NST-condition (I) with $T$.
Later, the NST$^*$-condition was proposed by Nakajo et al. [38], which is weaker than the NST-condition (I). A sequence $\{T_k\}$ is said to satisfy the NST$^*$-condition if for every bounded sequence $\{x_k\}$ in $C$, $\lim_{k\to\infty} \|x_k - x_{k+1}\| = 0$ and $\lim_{k\to\infty} \|x_k - T_k x_k\| = 0$ imply $\omega_w(x_k) \subseteq \bigcap_{k=1}^{\infty} F(T_k)$, where $\omega_w(x_k)$ is the set of all weak cluster points of $\{x_k\}$. It is easy to see that if $\{T_k\}$ satisfies the NST-condition (I), then it satisfies the NST$^*$-condition.
In a real Hilbert space $H$, the following properties hold: for any $u, v \in H$,
(1) $\|u + v\|^2 \leq \|u\|^2 + 2\langle v, u + v \rangle$;
(2) $\|r u + (1-r) v\|^2 = r\|u\|^2 + (1-r)\|v\|^2 - r(1-r)\|u - v\|^2$, $r \in [0,1]$.
If $C$ is a nonempty closed convex subset of $H$, then for each $x \in H$ there exists a unique element in $C$, denoted $P_C x$, such that
$$\|x - P_C x\| \leq \|x - y\|, \quad \forall y \in C.$$
The mapping $P_C$ is known as the metric projection of $H$ onto $C$, and it is nonexpansive. Moreover,
$$\langle x - P_C x, y - P_C x \rangle \leq 0$$
holds for all $x \in H$ and $y \in C$.
The following results are also essential for proving our main results.
Lemma 1
([39]). Let $\{u_k\}$ and $\{t_k\}$ be sequences of nonnegative real numbers, $\{v_k\}$ a sequence in $[0,1]$ and $\{w_k\}$ a sequence of real numbers such that
$$u_{k+1} \leq (1 - v_k) u_k + v_k w_k + t_k, \quad \forall k \in \mathbb{N}.$$
If the following conditions hold:
(1) $\sum_{k=1}^{\infty} v_k = +\infty$;
(2) $\sum_{k=1}^{\infty} t_k < +\infty$;
(3) $\limsup_{k\to\infty} w_k \leq 0$;
then $\lim_{k\to\infty} u_k = 0$.
Lemma 2
([40]). Let $H$ be a real Hilbert space and $T : H \to H$ a nonexpansive mapping with $F(T) \neq \emptyset$. Then, for any sequence $\{x_k\}$ in $H$, $x_k \rightharpoonup u \in H$ and $\lim_{k\to\infty} \|x_k - T x_k\| = 0$ imply $u \in F(T)$.
Lemma 3
([41]). Let $\{\lambda_k\}$ be a sequence of real numbers that does not decrease at infinity, in the sense that there exists a subsequence $\{\lambda_{k_i}\}$ of $\{\lambda_k\}$ which satisfies $\lambda_{k_i} < \lambda_{k_i+1}$ for all $i \in \mathbb{N}$. Define the sequence of integers $\{\varphi(k)\}_{k \geq m_0}$ by
$$\varphi(k) = \max\{j \leq k : \lambda_j < \lambda_{j+1}\},$$
where $m_0 \in \mathbb{N}$ is such that $\{j \leq m_0 : \lambda_j < \lambda_{j+1}\} \neq \emptyset$. Then, the following hold:
(1) $\varphi(m_0) \leq \varphi(m_0+1) \leq \cdots$ and $\varphi(k) \to \infty$;
(2) $\lambda_{\varphi(k)} \leq \lambda_{\varphi(k)+1}$ and $\lambda_k \leq \lambda_{\varphi(k)+1}$ for all $k \geq m_0$.
Proposition 1
([6]). Suppose $F : H \to \mathbb{R}$ is strongly convex with convexity parameter $\rho > 0$ and continuously differentiable such that $\nabla F$ is Lipschitz continuous with constant $L_F$. Then, the mapping $I - \sigma\nabla F$ is a contraction for all $\sigma \leq \frac{2}{L_F + \rho}$, where $I$ is the identity operator.
Definition 1
([15]). Let $\psi : H \to \mathbb{R} \cup \{+\infty\}$ be a proper convex and lower semicontinuous function. The proximity operator of parameter $\lambda > 0$ of $\psi$ at $u \in H$ is denoted by $\mathrm{prox}_{\lambda\psi}$ and is defined by
$$\mathrm{prox}_{\lambda\psi}(u) = \arg\min_{v \in H}\left\{\psi(v) + \frac{1}{2\lambda}\|v - u\|^2\right\}.$$
The operator $T := \mathrm{prox}_{\lambda\psi}(I - \lambda\nabla\phi)$ is known as the forward–backward operator of $\phi$ and $\psi$ with respect to $\lambda$, where $\lambda > 0$ and $\nabla\phi$ is the gradient of $\phi$. Moreover, $T$ is a nonexpansive mapping whenever $\lambda \in \left(0, \frac{2}{L_\phi}\right)$, where $L_\phi$ is the Lipschitz constant of $\nabla\phi$.
Lemma 4
([42]). For a real Hilbert space $H$, let $\psi : H \to \mathbb{R} \cup \{+\infty\}$ be a proper convex and lower semicontinuous function, and let $\phi : H \to \mathbb{R}$ be convex and differentiable with $\nabla\phi$ being $L_\phi$-Lipschitz continuous for some $L_\phi > 0$. If $\{T_k\}$ is the family of forward–backward operators of $\phi$ and $\psi$ with respect to $c_k \in \left(0, \frac{2}{L_\phi}\right)$ such that $\{c_k\}$ converges to $c$, then $\{T_k\}$ satisfies the NST-condition (I) with $T$, where $T$ is the forward–backward operator of $\phi$ and $\psi$ with respect to $c \in \left(0, \frac{2}{L_\phi}\right)$.

3. Main Results

We start this section by introducing a new common fixed point algorithm using the inertial technique together with the modified Ishikawa iteration (see [43,44,45] for more details) to obtain a strong convergence theorem for two countable families of nonexpansive mappings in a real Hilbert space as seen in Algorithm 6.
Algorithm 6 IVAM (I): Inertial Viscosity Approximation Method for Two Families of Nonexpansive Mappings
1: Input. Let $x_0, x_1 \in H$, $\{\eta_k\}$ a positive sequence and $f : H \to H$ a contraction with constant $\gamma$. Choose $\{\alpha_k\}, \{\beta_k\}, \{\xi_k\} \subset (0,1)$ and $\theta_k \geq 0$.
2: Select $\mu_k \in (0, \bar{\mu}_k]$ such that for $k \geq 1$,
$$\bar{\mu}_k := \begin{cases} \min\left\{\theta_k, \dfrac{\eta_k}{\|x_k - x_{k-1}\|}\right\} & \text{if } x_k \neq x_{k-1}, \\[4pt] \theta_k & \text{otherwise}. \end{cases}$$
3: Compute
$$\begin{aligned} z_k &= x_k + \mu_k(x_k - x_{k-1}), \\ y_k &= \beta_k z_k + (1-\beta_k) T_k z_k, \\ w_k &= \xi_k y_k + (1-\xi_k) S_k y_k, \\ x_{k+1} &= \alpha_k f(w_k) + (1-\alpha_k) w_k. \end{aligned}$$
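A minimal Python sketch of Algorithm 6 is given below, assuming the families $T_k$ and $S_k$ are supplied as callables T(k, x) and S(k, x), the contraction $f$ is a callable, and the parameter sequences are indexable starting at $k = 1$; all names are illustrative only.

```python
import numpy as np

def ivam(x0, x1, T, S, f, alphas, betas, xis, thetas, etas, num_iter=100):
    """Sketch of Algorithm 6 (IVAM). T(k, x) and S(k, x) evaluate T_k and S_k."""
    x_prev, x = np.asarray(x0, float), np.asarray(x1, float)
    for k in range(1, num_iter + 1):
        diff = np.linalg.norm(x - x_prev)
        mu_bar = min(thetas[k], etas[k] / diff) if diff > 0 else thetas[k]
        mu = mu_bar                                   # any mu_k in (0, mu_bar] is allowed
        z = x + mu * (x - x_prev)                     # inertial step
        y = betas[k] * z + (1 - betas[k]) * T(k, z)
        w = xis[k] * y + (1 - xis[k]) * S(k, y)
        x_prev, x = x, alphas[k] * f(w) + (1 - alphas[k]) * w   # viscosity step
    return x
```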
Lemma 5
Let $\{T_k\}$ and $\{S_k\}$ be two countable families of nonexpansive mappings from $H$ into itself such that $\Gamma = \bigcap_{k=1}^{\infty} F(T_k) \cap \bigcap_{k=1}^{\infty} F(S_k) \neq \emptyset$ and let $f : H \to H$ be a contraction. If $\lim_{k\to\infty} \frac{\eta_k}{\alpha_k} = 0$, then the sequence $\{x_k\}$ generated by Algorithm 6 is bounded. Furthermore, $\{f(w_k)\}$, $\{w_k\}$, $\{y_k\}$ and $\{z_k\}$ are bounded.
Proof. 
Let $x^* \in \Gamma$ be such that $x^* = P_\Gamma f(x^*)$. Then, by the definition of $z_k$ and $y_k$ in Algorithm 6, for every $k \in \mathbb{N}$, we have
$$\|z_k - x^*\| = \|x_k + \mu_k(x_k - x_{k-1}) - x^*\| \leq \|x_k - x^*\| + \mu_k\|x_k - x_{k-1}\|$$
and
$$\|y_k - x^*\| \leq \beta_k\|z_k - x^*\| + (1-\beta_k)\|T_k z_k - x^*\| \leq \beta_k\|z_k - x^*\| + (1-\beta_k)\|z_k - x^*\| = \|z_k - x^*\|.$$
This implies
$$\|w_k - x^*\| \leq \xi_k\|y_k - x^*\| + (1-\xi_k)\|S_k y_k - x^*\| \leq \xi_k\|y_k - x^*\| + (1-\xi_k)\|y_k - x^*\| = \|y_k - x^*\| \leq \|z_k - x^*\|.$$
It follows from (8) and (11) that
$$\begin{aligned}
\|x_{k+1} - x^*\| &= \|\alpha_k(f(w_k) - x^*) + (1-\alpha_k)(w_k - x^*)\| \\
&\leq \alpha_k\|f(w_k) - x^*\| + (1-\alpha_k)\|w_k - x^*\| \\
&= \alpha_k\|f(w_k) - f(x^*) + f(x^*) - x^*\| + (1-\alpha_k)\|w_k - x^*\| \\
&\leq \alpha_k\|f(w_k) - f(x^*)\| + \alpha_k\|f(x^*) - x^*\| + (1-\alpha_k)\|w_k - x^*\| \\
&\leq \alpha_k\gamma\|w_k - x^*\| + \alpha_k\|f(x^*) - x^*\| + (1-\alpha_k)\|w_k - x^*\| \\
&= [1 - \alpha_k(1-\gamma)]\|w_k - x^*\| + \alpha_k\|f(x^*) - x^*\| \\
&\leq [1 - \alpha_k(1-\gamma)]\|z_k - x^*\| + \alpha_k\|f(x^*) - x^*\| \\
&\leq [1 - \alpha_k(1-\gamma)]\big(\|x_k - x^*\| + \mu_k\|x_k - x_{k-1}\|\big) + \alpha_k\|f(x^*) - x^*\| \\
&= [1 - \alpha_k(1-\gamma)]\|x_k - x^*\| + \mu_k\|x_k - x_{k-1}\| - \alpha_k(1-\gamma)\mu_k\|x_k - x_{k-1}\| + \alpha_k\|f(x^*) - x^*\| \\
&\leq [1 - \alpha_k(1-\gamma)]\|x_k - x^*\| + \mu_k\|x_k - x_{k-1}\| + \alpha_k\|f(x^*) - x^*\| \\
&= [1 - \alpha_k(1-\gamma)]\|x_k - x^*\| + \alpha_k(1-\gamma)\left[\frac{\frac{\mu_k}{\alpha_k}\|x_k - x_{k-1}\| + \|f(x^*) - x^*\|}{1-\gamma}\right] \\
&\leq \max\left\{\|x_k - x^*\|, \frac{\frac{\mu_k}{\alpha_k}\|x_k - x_{k-1}\| + \|f(x^*) - x^*\|}{1-\gamma}\right\}.
\end{aligned}$$
Using $\lim_{k\to\infty}\frac{\eta_k}{\alpha_k} = 0$ and (7), we obtain
$$\lim_{k\to\infty}\frac{\mu_k}{\alpha_k}\|x_k - x_{k-1}\| \leq \lim_{k\to\infty}\frac{\eta_k}{\alpha_k\|x_k - x_{k-1}\|}\|x_k - x_{k-1}\| = \lim_{k\to\infty}\frac{\eta_k}{\alpha_k} = 0.$$
Thus, there exists $\bar{M} > 0$ such that $\frac{\mu_k}{\alpha_k}\|x_k - x_{k-1}\| < \bar{M}$ for all $k \in \mathbb{N}$, which implies
$$\|x_{k+1} - x^*\| \leq \max\left\{\|x_k - x^*\|, \frac{\bar{M} + \|f(x^*) - x^*\|}{1-\gamma}\right\}.$$
By mathematical induction, we conclude that $\|x_k - x^*\| \leq M$ for all $k \in \mathbb{N}$, where $M = \max\left\{\|x_1 - x^*\|, \frac{\bar{M} + \|f(x^*) - x^*\|}{1-\gamma}\right\}$. It follows that $\{x_k\}$ is bounded. This implies that the sequences $\{f(w_k)\}$, $\{w_k\}$, $\{y_k\}$ and $\{z_k\}$ are bounded.    □
We now prove a strong convergence theorem for the sequence $\{x_k\}$ generated by Algorithm 6 for solving a common fixed point problem as follows.
Theorem 1
Let $\{T_k\}$ and $\{S_k\}$ be two countable families of nonexpansive mappings from $H$ into $H$ such that $\Gamma = \bigcap_{k=1}^{\infty} F(T_k) \cap \bigcap_{k=1}^{\infty} F(S_k) \neq \emptyset$. Let $\{x_k\}$ be a sequence generated by Algorithm 6. Suppose $\{T_k\}$ and $\{S_k\}$ satisfy the NST$^*$-condition and the following conditions hold:
(1) $0 < a < \alpha_k < \hat{a} < 1$;
(2) $0 < b < \beta_k < \hat{b} < 1$;
(3) $0 < c < \xi_k < \hat{c} < 1$;
(4) $\lim_{k\to\infty} \alpha_k = 0$ and $\sum_{k=1}^{\infty} \alpha_k = +\infty$;
(5) $\lim_{k\to\infty} \frac{\eta_k}{\alpha_k} = 0$,
where $a, b, c, \hat{a}, \hat{b}$ and $\hat{c}$ are positive real numbers. Then, $\{x_k\}$ converges strongly to $x^* \in \Gamma$, where $x^* = P_\Gamma f(x^*)$.
Proof. 
Let $x^* \in \Gamma$ be such that $x^* = P_\Gamma f(x^*)$. It follows from (11) that
$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &= \|\alpha_k[f(w_k) - f(x^*)] + (1-\alpha_k)(w_k - x^*) + \alpha_k(f(x^*) - x^*)\|^2 \\
&\leq \|\alpha_k[f(w_k) - f(x^*)] + (1-\alpha_k)(w_k - x^*)\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \\
&= \alpha_k\|f(w_k) - f(x^*)\|^2 + (1-\alpha_k)\|w_k - x^*\|^2 \\
&\quad - \alpha_k(1-\alpha_k)\|(f(w_k) - f(x^*)) - (w_k - x^*)\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \\
&\leq \alpha_k\|f(w_k) - f(x^*)\|^2 + (1-\alpha_k)\|w_k - x^*\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \\
&\leq \alpha_k\gamma^2\|w_k - x^*\|^2 + (1-\alpha_k)\|w_k - x^*\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \\
&= [1 - \alpha_k(1-\gamma^2)]\|w_k - x^*\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \\
&\leq [1 - \alpha_k(1-\gamma^2)]\|z_k - x^*\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle.
\end{aligned}$$
This together with $z_k - x^* = (x_k - x^*) + \mu_k(x_k - x_{k-1})$ and $0 \leq \gamma < 1$ gives us that
$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &\leq [1 - \alpha_k(1-\gamma)]\|(x_k - x^*) + \mu_k(x_k - x_{k-1})\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \\
&\leq [1 - \alpha_k(1-\gamma)]\big[\|x_k - x^*\|^2 + 2\mu_k\|x_k - x^*\|\|x_k - x_{k-1}\| + \mu_k^2\|x_k - x_{k-1}\|^2\big] \\
&\quad + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \\
&= [1 - \alpha_k(1-\gamma)]\|x_k - x^*\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \\
&\quad + [1 - \alpha_k(1-\gamma)]\mu_k\|x_k - x_{k-1}\|\big(2\|x_k - x^*\| + \mu_k\|x_k - x_{k-1}\|\big).
\end{aligned}$$
Because $\lim_{k\to\infty} \mu_k\|x_k - x_{k-1}\| = \lim_{k\to\infty} \alpha_k\frac{\mu_k}{\alpha_k}\|x_k - x_{k-1}\| = 0$, there exists $M_1 > 0$ such that
$$\mu_k\|x_k - x_{k-1}\| \leq M_1$$
for all $k \in \mathbb{N}$.
Put $M_2 := \sup_{k\in\mathbb{N}}\{\|x_k - x^*\|, M_1\}$. This together with (13) and (14) yields
$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &\leq [1 - \alpha_k(1-\gamma)]\|x_k - x^*\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle + \mu_k\|x_k - x_{k-1}\|\big(2\|x_k - x^*\| + M_1\big) \\
&\leq [1 - \alpha_k(1-\gamma)]\|x_k - x^*\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle + \mu_k\|x_k - x_{k-1}\|(2M_2 + M_2) \\
&= [1 - \alpha_k(1-\gamma)]\|x_k - x^*\|^2 + 2\alpha_k\langle f(x^*) - x^*, x_{k+1} - x^*\rangle + 3M_2\mu_k\|x_k - x_{k-1}\| \\
&= [1 - \alpha_k(1-\gamma)]\|x_k - x^*\|^2 + \alpha_k(1-\gamma)\left[\frac{3M_2\frac{\mu_k}{\alpha_k}\|x_k - x_{k-1}\| + 2\langle f(x^*) - x^*, x_{k+1} - x^*\rangle}{1-\gamma}\right].
\end{aligned}$$
We now set $u_k$, $v_k$ and $s_k$ as follows:
$$u_k := \|x_k - x^*\|^2, \qquad v_k := \alpha_k(1-\gamma)$$
and
$$s_k := \frac{3M_2\,\mu_k}{\alpha_k(1-\gamma)}\|x_k - x_{k-1}\| + \frac{2}{1-\gamma}\langle f(x^*) - x^*, x_{k+1} - x^*\rangle.$$
So, we have from (15) that
$$u_{k+1} \leq (1 - v_k)u_k + v_k s_k, \quad \forall k \in \mathbb{N}.$$
Next, we analyze the convergence of the sequence $\{x_k\}$ by considering the following two cases.
Case 1. Suppose $\{\|x_k - x^*\|\}_{k \geq m_0}$ is nonincreasing for some $m_0 \in \mathbb{N}$. Because $\{\|x_k - x^*\|\}$ is bounded from below by zero, $\lim_{k\to\infty}\|x_k - x^*\|$ exists. It follows from $\lim_{k\to\infty}\alpha_k = 0$ and $\sum_{k=1}^{\infty}\alpha_k = +\infty$ that
$$\sum_{k=1}^{\infty} v_k = \sum_{k=1}^{\infty} \alpha_k(1-\gamma) = (1-\gamma)\sum_{k=1}^{\infty}\alpha_k = +\infty.$$
To apply Lemma 1, we need to show that $\limsup_{k\to\infty}\langle f(x^*) - x^*, x_{k+1} - x^*\rangle \leq 0$. Indeed, by the definition of $y_k$, we have
$$\begin{aligned}
\|y_k - x^*\|^2 &= \|\beta_k(z_k - x^*) + (1-\beta_k)(T_k z_k - x^*)\|^2 \\
&= \beta_k\|z_k - x^*\|^2 + (1-\beta_k)\|T_k z_k - x^*\|^2 - \beta_k(1-\beta_k)\|z_k - T_k z_k\|^2 \\
&\leq \beta_k\|z_k - x^*\|^2 + (1-\beta_k)\|z_k - x^*\|^2 - \beta_k(1-\beta_k)\|z_k - T_k z_k\|^2 \\
&= \|z_k - x^*\|^2 - \beta_k(1-\beta_k)\|z_k - T_k z_k\|^2.
\end{aligned}$$
By Algorithm 6, (10) and (17), we obtain
$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &= \|\alpha_k(f(w_k) - x^*) + (1-\alpha_k)(w_k - x^*)\|^2 \\
&= \alpha_k\|f(w_k) - x^*\|^2 + (1-\alpha_k)\|w_k - x^*\|^2 - \alpha_k(1-\alpha_k)\|f(w_k) - w_k\|^2 \\
&\leq \alpha_k\|f(w_k) - x^*\|^2 + (1-\alpha_k)\|w_k - x^*\|^2 \\
&\leq \alpha_k\|f(w_k) - x^*\|^2 + (1-\alpha_k)\|y_k - x^*\|^2 \\
&\leq \alpha_k\|f(w_k) - x^*\|^2 + (1-\alpha_k)\big[\|z_k - x^*\|^2 - \beta_k(1-\beta_k)\|z_k - T_k z_k\|^2\big] \\
&= \alpha_k\|f(w_k) - x^*\|^2 + (1-\alpha_k)\|(x_k - x^*) + \mu_k(x_k - x_{k-1})\|^2 - (1-\alpha_k)\beta_k(1-\beta_k)\|z_k - T_k z_k\|^2 \\
&\leq \alpha_k\|f(w_k) - x^*\|^2 - (1-\alpha_k)\beta_k(1-\beta_k)\|z_k - T_k z_k\|^2 \\
&\quad + (1-\alpha_k)\big[\|x_k - x^*\|^2 + 2\mu_k\|x_k - x^*\|\|x_k - x_{k-1}\| + \mu_k^2\|x_k - x_{k-1}\|^2\big],
\end{aligned}$$
which implies that for any $k \in \mathbb{N}$,
$$\begin{aligned}
(1-\alpha_k)\beta_k(1-\beta_k)\|z_k - T_k z_k\|^2 &\leq \alpha_k\|f(w_k) - x^*\|^2 + (1-\alpha_k)\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \\
&\quad + 2\mu_k(1-\alpha_k)\|x_k - x^*\|\|x_k - x_{k-1}\| + \mu_k^2(1-\alpha_k)\|x_k - x_{k-1}\|^2 \\
&= \alpha_k\|f(w_k) - f(x_k) + f(x_k) - x^*\|^2 + (1-\alpha_k)\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \\
&\quad + 2\mu_k(1-\alpha_k)\|x_k - x^*\|\|x_k - x_{k-1}\| + \mu_k^2(1-\alpha_k)\|x_k - x_{k-1}\|^2 \\
&\leq \alpha_k\big[\|f(w_k) - f(x_k)\|^2 + 2\|f(w_k) - f(x_k)\|\|f(x_k) - x^*\| + \|f(x_k) - x^*\|^2\big] \\
&\quad + (1-\alpha_k)\|x_k - x^*\|^2 - \|x_{k+1} - x^*\|^2 \\
&\quad + 2\mu_k(1-\alpha_k)\|x_k - x^*\|\|x_k - x_{k-1}\| + \mu_k^2(1-\alpha_k)\|x_k - x_{k-1}\|^2.
\end{aligned}$$
Taking $k \to \infty$, we obtain
$$\lim_{k\to\infty}\|z_k - T_k z_k\| = 0.$$
This implies
$$\lim_{k\to\infty}\|y_k - z_k\| = \lim_{k\to\infty}(1-\beta_k)\|T_k z_k - z_k\| \leq \lim_{k\to\infty}\|T_k z_k - z_k\| = 0.$$
Because $\|z_k - x_k\| = \mu_k\|x_k - x_{k-1}\|$ and $\lim_{k\to\infty}\mu_k\|x_k - x_{k-1}\| = 0$, we derive
$$\lim_{k\to\infty}\|z_k - x_k\| = 0.$$
From $\|y_k - x_k\| \leq \|y_k - z_k\| + \|z_k - x_k\|$, (20) and (21), we obtain
$$\lim_{k\to\infty}\|y_k - x_k\| = 0.$$
Moreover, we have from (9), (18) and the nonexpansiveness of $S_k$ that
$$\begin{aligned}
\|x_{k+1} - x^*\|^2 &\leq \alpha_k\|f(w_k) - x^*\|^2 + (1-\alpha_k)\|w_k - x^*\|^2 \\
&= \alpha_k\|f(w_k) - x^*\|^2 + \|w_k - x^*\|^2 - \alpha_k\|w_k - x^*\|^2 \\
&\leq \alpha_k\|f(w_k) - x^*\|^2 + \|w_k - x^*\|^2 \\
&= \alpha_k\|f(w_k) - x^*\|^2 + \|\xi_k(y_k - x^*) + (1-\xi_k)(S_k y_k - x^*)\|^2 \\
&= \alpha_k\|f(w_k) - x^*\|^2 + \xi_k\|y_k - x^*\|^2 + (1-\xi_k)\|S_k y_k - x^*\|^2 - \xi_k(1-\xi_k)\|y_k - S_k y_k\|^2 \\
&\leq \alpha_k\|f(w_k) - x^*\|^2 + \xi_k\|y_k - x^*\|^2 + (1-\xi_k)\|y_k - x^*\|^2 - \xi_k(1-\xi_k)\|y_k - S_k y_k\|^2 \\
&= \alpha_k\|f(w_k) - x^*\|^2 + \|y_k - x^*\|^2 - \xi_k(1-\xi_k)\|y_k - S_k y_k\|^2 \\
&\leq \alpha_k\|f(w_k) - x^*\|^2 + \|z_k - x^*\|^2 - \xi_k(1-\xi_k)\|y_k - S_k y_k\|^2 \\
&= \alpha_k\|f(w_k) - x^*\|^2 + \|(x_k - x^*) + \mu_k(x_k - x_{k-1})\|^2 - \xi_k(1-\xi_k)\|y_k - S_k y_k\|^2 \\
&\leq \alpha_k\|f(w_k) - x^*\|^2 + \|x_k - x^*\|^2 + 2\mu_k\|x_k - x^*\|\|x_k - x_{k-1}\| + \mu_k^2\|x_k - x_{k-1}\|^2 - \xi_k(1-\xi_k)\|y_k - S_k y_k\|^2.
\end{aligned}$$
The above inequality implies
$$\xi_k(1-\xi_k)\|y_k - S_k y_k\|^2 \leq \alpha_k\|f(w_k) - x^*\|^2 + \|x_k - x^*\|^2 + 2\mu_k\|x_k - x^*\|\|x_k - x_{k-1}\| + \mu_k^2\|x_k - x_{k-1}\|^2 - \|x_{k+1} - x^*\|^2.$$
By assumptions (3) and (4), the existence of $\lim_{k\to\infty}\|x_k - x^*\|$, and $\lim_{k\to\infty}\mu_k\|x_k - x_{k-1}\| = 0$, we obtain
$$\lim_{k\to\infty}\|y_k - S_k y_k\| = 0.$$
From the definition of $w_k$ and assumption (3), we have
$$\begin{aligned}
\|w_k - x_k\| &\leq \xi_k\|y_k - x_k\| + (1-\xi_k)\|S_k y_k - x_k\| \\
&\leq \xi_k\|y_k - x_k\| + (1-\xi_k)\big(\|S_k y_k - y_k\| + \|y_k - x_k\|\big) \\
&= \|y_k - x_k\| + (1-\xi_k)\|S_k y_k - y_k\| \\
&\leq \|y_k - x_k\| + \|S_k y_k - y_k\|.
\end{aligned}$$
It follows from (22) and (23) that
$$\lim_{k\to\infty}\|w_k - x_k\| = 0.$$
Using the definition of $x_{k+1}$, we have
$$\begin{aligned}
\|x_{k+1} - x_k\| &\leq \alpha_k\|f(w_k) - x_k\| + (1-\alpha_k)\|w_k - x_k\| \\
&\leq \alpha_k\|f(w_k) - f(x^*)\| + \alpha_k\|f(x^*) - x_k\| + (1-\alpha_k)\|w_k - x_k\| \\
&\leq \alpha_k\gamma\|w_k - x^*\| + \alpha_k\|f(x^*) - x_k\| + \|w_k - x_k\|.
\end{aligned}$$
Due to $\lim_{k\to\infty}\alpha_k = 0$, (24) and the boundedness of $\{x_k\}$ and $\{w_k\}$, we obtain
$$\lim_{k\to\infty}\|x_{k+1} - x_k\| = 0.$$
Let $\zeta = \limsup_{k\to\infty}\langle f(x^*) - x^*, x_{k+1} - x^*\rangle$. The boundedness of $\{x_k\}$ implies that there exists a subsequence $\{x_{k_j}\}$ such that
$$\lim_{j\to\infty}\langle f(x^*) - x^*, x_{k_j+1} - x^*\rangle = \limsup_{k\to\infty}\langle f(x^*) - x^*, x_{k+1} - x^*\rangle = \zeta$$
and $x_{k_j} \rightharpoonup x \in H$. It follows from the nonexpansiveness of $T_k$ that
$$\begin{aligned}
\|x_k - T_k x_k\| &\leq \|x_k - z_k\| + \|z_k - T_k z_k\| + \|T_k z_k - T_k x_k\| \\
&\leq \|x_k - z_k\| + \|z_k - T_k z_k\| + \|z_k - x_k\| \\
&= 2\|x_k - z_k\| + \|z_k - T_k z_k\|.
\end{aligned}$$
It follows from (19) and (21) that $\lim_{k\to\infty}\|x_k - T_k x_k\| = 0$.
Using Lemma 2, we obtain $x \in \bigcap_{k=1}^{\infty} F(T_k)$. Due to $S_k$ being nonexpansive, we have for any $k \in \mathbb{N}$,
$$\begin{aligned}
\|x_k - S_k x_k\| &\leq \|x_k - y_k\| + \|y_k - S_k y_k\| + \|S_k y_k - S_k x_k\| \\
&\leq \|x_k - y_k\| + \|y_k - S_k y_k\| + \|y_k - x_k\| \\
&= 2\|x_k - y_k\| + \|y_k - S_k y_k\|,
\end{aligned}$$
which implies $\lim_{k\to\infty}\|x_k - S_k x_k\| = 0$ by employing (22) and (23). By Lemma 2, we obtain $x \in \bigcap_{k=1}^{\infty} F(S_k)$. Because $\lim_{k\to\infty}\|x_{k+1} - x_k\| = 0$, it follows that $x_{k_j+1}$ converges weakly to $x$.
In addition, utilizing $x^* = P_\Gamma f(x^*)$ together with (6) gives us that
$$\zeta = \lim_{j\to\infty}\langle f(x^*) - x^*, x_{k_j+1} - x^*\rangle = \langle f(x^*) - x^*, x - x^*\rangle \leq 0.$$
Therefore,
$$\limsup_{k\to\infty}\langle f(x^*) - x^*, x_k - x^*\rangle \leq 0.$$
Invoking $\lim_{k\to\infty}\frac{\mu_k}{\alpha_k}\|x_k - x_{k-1}\| = 0$ and (28), we obtain
$$\limsup_{k\to\infty} s_k = \limsup_{k\to\infty}\left[\frac{3M_2\,\mu_k}{\alpha_k(1-\gamma)}\|x_k - x_{k-1}\| + \frac{2}{1-\gamma}\langle f(x^*) - x^*, x_{k+1} - x^*\rangle\right] \leq 0.$$
Coming back to (16), by Lemma 1, we conclude that $x_k \to x^*$.
Case 2. Suppose that $\{\|x_k - x^*\|\}$ is not a monotonically decreasing sequence. To apply Lemma 3, put $\lambda_k := \|x_k - x^*\|$. Then, there exists a subsequence $\{\lambda_{k_i}\}$ of $\{\lambda_k\}$ such that
$$\lambda_{k_i} < \lambda_{k_i+1}, \quad \forall i \in \mathbb{N}.$$
In this case, let $\varphi : \mathbb{N} \to \mathbb{N}$ be defined by
$$\varphi(k) := \max\{j \in \mathbb{N} : j \leq k, \ \lambda_j < \lambda_{j+1}\}.$$
Therefore, $\varphi(k)$ satisfies the conditions in Lemma 3. Hence, we have $\lambda_{\varphi(k)} \leq \lambda_{\varphi(k)+1}$ for all $k$. This means that
$$\|x_{\varphi(k)} - x^*\| \leq \|x_{\varphi(k)+1} - x^*\|, \quad \forall k.$$
As in the proof of Case 1, we also have that for any $k$,
$$\begin{aligned}
\beta_{\varphi(k)}(1-\beta_{\varphi(k)})(1-\alpha_{\varphi(k)})\|z_{\varphi(k)} - T_{\varphi(k)} z_{\varphi(k)}\|^2 &\leq \alpha_{\varphi(k)}\|f(w_{\varphi(k)}) - f(x_{\varphi(k)})\|^2 + \alpha_{\varphi(k)}\|f(x_{\varphi(k)}) - x^*\|^2 \\
&\quad + 2\alpha_{\varphi(k)}\|f(w_{\varphi(k)}) - f(x_{\varphi(k)})\|\|f(x_{\varphi(k)}) - x^*\| \\
&\quad - \alpha_{\varphi(k)}\|x_{\varphi(k)} - x^*\|^2 + \|x_{\varphi(k)} - x^*\|^2 - \|x_{\varphi(k)+1} - x^*\|^2 \\
&\quad + \mu_{\varphi(k)}(1-\alpha_{\varphi(k)})\|x_{\varphi(k)} - x_{\varphi(k)-1}\|\big(2\|x_{\varphi(k)} - x^*\| + \mu_{\varphi(k)}\|x_{\varphi(k)} - x_{\varphi(k)-1}\|\big).
\end{aligned}$$
Because $\|x_{\varphi(k)} - x^*\| \leq \|x_{\varphi(k)+1} - x^*\|$ for all $k$, the above inequality leads to
$$\begin{aligned}
\beta_{\varphi(k)}(1-\beta_{\varphi(k)})(1-\alpha_{\varphi(k)})\|z_{\varphi(k)} - T_{\varphi(k)} z_{\varphi(k)}\|^2 &\leq \alpha_{\varphi(k)}\|f(w_{\varphi(k)}) - f(x_{\varphi(k)})\|^2 + \alpha_{\varphi(k)}\|f(x_{\varphi(k)}) - x^*\|^2 \\
&\quad + 2\alpha_{\varphi(k)}\|f(w_{\varphi(k)}) - f(x_{\varphi(k)})\|\|f(x_{\varphi(k)}) - x^*\| - \alpha_{\varphi(k)}\|x_{\varphi(k)} - x^*\|^2 \\
&\quad + \mu_{\varphi(k)}(1-\alpha_{\varphi(k)})\|x_{\varphi(k)} - x_{\varphi(k)-1}\|\big(2\|x_{\varphi(k)} - x^*\| + \mu_{\varphi(k)}\|x_{\varphi(k)} - x_{\varphi(k)-1}\|\big).
\end{aligned}$$
Using $\lim_{k\to\infty}\alpha_{\varphi(k)} = 0$ and $\lim_{k\to\infty}\mu_{\varphi(k)}\|x_{\varphi(k)} - x_{\varphi(k)-1}\| = 0$, we obtain
$$\lim_{k\to\infty}\|z_{\varphi(k)} - T_{\varphi(k)} z_{\varphi(k)}\| = 0.$$
Similar to the proof of Case 1, we conclude
$$\lim_{k\to\infty}\|z_{\varphi(k)} - x_{\varphi(k)}\| = 0, \qquad \lim_{k\to\infty}\|y_{\varphi(k)} - x_{\varphi(k)}\| = 0, \qquad \lim_{k\to\infty}\|y_{\varphi(k)} - S_{\varphi(k)} y_{\varphi(k)}\| = 0,$$
and so
$$\lim_{k\to\infty}\|x_{\varphi(k)+1} - x_{\varphi(k)}\| = 0.$$
Put $\delta := \limsup_{k\to\infty}\langle f(x^*) - x^*, x_{\varphi(k)+1} - x^*\rangle$. Due to $\{x_{\varphi(k)}\}$ being bounded, there exists a subsequence $\{x_{\varphi(k_j)}\}$ of $\{x_{\varphi(k)}\}$ such that
$$\delta = \limsup_{k\to\infty}\langle f(x^*) - x^*, x_{\varphi(k)+1} - x^*\rangle = \lim_{j\to\infty}\langle f(x^*) - x^*, x_{\varphi(k_j)+1} - x^*\rangle$$
and $x_{\varphi(k_j)} \rightharpoonup \nu$ for some $\nu \in H$. The nonexpansiveness of $T_{\varphi(k)}$ and $S_{\varphi(k)}$ implies
$$\begin{aligned}
\|x_{\varphi(k)} - T_{\varphi(k)} x_{\varphi(k)}\| &\leq \|x_{\varphi(k)} - z_{\varphi(k)}\| + \|z_{\varphi(k)} - T_{\varphi(k)} z_{\varphi(k)}\| + \|T_{\varphi(k)} z_{\varphi(k)} - T_{\varphi(k)} x_{\varphi(k)}\| \\
&\leq \|x_{\varphi(k)} - z_{\varphi(k)}\| + \|z_{\varphi(k)} - T_{\varphi(k)} z_{\varphi(k)}\| + \|z_{\varphi(k)} - x_{\varphi(k)}\|
\end{aligned}$$
and
$$\begin{aligned}
\|x_{\varphi(k)} - S_{\varphi(k)} x_{\varphi(k)}\| &\leq \|x_{\varphi(k)} - y_{\varphi(k)}\| + \|y_{\varphi(k)} - S_{\varphi(k)} y_{\varphi(k)}\| + \|S_{\varphi(k)} y_{\varphi(k)} - S_{\varphi(k)} x_{\varphi(k)}\| \\
&\leq \|x_{\varphi(k)} - y_{\varphi(k)}\| + \|y_{\varphi(k)} - S_{\varphi(k)} y_{\varphi(k)}\| + \|y_{\varphi(k)} - x_{\varphi(k)}\|.
\end{aligned}$$
Taking $k \to \infty$ in (35) and (36), we derive from (30)–(33) that
$$\lim_{k\to\infty}\|x_{\varphi(k)} - T_{\varphi(k)} x_{\varphi(k)}\| = 0$$
and
$$\lim_{k\to\infty}\|x_{\varphi(k)} - S_{\varphi(k)} x_{\varphi(k)}\| = 0.$$
By Lemma 2, we obtain $\nu \in \Gamma$. Due to $\lim_{j\to\infty}\|x_{\varphi(k_j)+1} - x_{\varphi(k_j)}\| = 0$, we obtain $x_{\varphi(k_j)+1} \rightharpoonup \nu$. Furthermore, it follows from $x^* = P_\Gamma f(x^*)$ and (6) that
$$\delta = \lim_{j\to\infty}\langle f(x^*) - x^*, x_{\varphi(k_j)+1} - x^*\rangle = \langle f(x^*) - P_\Gamma f(x^*), \nu - P_\Gamma f(x^*)\rangle \leq 0,$$
and thus
$$\limsup_{k\to\infty}\langle f(x^*) - x^*, x_{\varphi(k)+1} - x^*\rangle = \delta \leq 0.$$
Because $\lambda_{\varphi(k)} \leq \lambda_{\varphi(k)+1}$, as in the proof of Case 1, we have that for every $k$,
$$\|x_{\varphi(k)} - x^*\|^2 \leq \|x_{\varphi(k)+1} - x^*\|^2 \leq \big[1 - \alpha_{\varphi(k)}(1-\gamma)\big]\|x_{\varphi(k)} - x^*\|^2 + \alpha_{\varphi(k)}(1-\gamma)\left[\frac{3M_2\frac{\mu_{\varphi(k)}}{\alpha_{\varphi(k)}}\|x_{\varphi(k)} - x_{\varphi(k)-1}\| + 2\langle f(x^*) - x^*, x_{\varphi(k)+1} - x^*\rangle}{1-\gamma}\right].$$
Therefore,
$$\alpha_{\varphi(k)}(1-\gamma)\|x_{\varphi(k)} - x^*\|^2 \leq \alpha_{\varphi(k)}(1-\gamma)\left[\frac{3M_2\frac{\mu_{\varphi(k)}}{\alpha_{\varphi(k)}}\|x_{\varphi(k)} - x_{\varphi(k)-1}\| + 2\langle f(x^*) - x^*, x_{\varphi(k)+1} - x^*\rangle}{1-\gamma}\right].$$
From $\alpha_{\varphi(k)} \in (0,1)$ and $\gamma \in [0,1)$, we obtain $\alpha_{\varphi(k)}(1-\gamma) > 0$, which implies
$$\|x_{\varphi(k)} - x^*\|^2 \leq \frac{3M_2\frac{\mu_{\varphi(k)}}{\alpha_{\varphi(k)}}\|x_{\varphi(k)} - x_{\varphi(k)-1}\| + 2\langle f(x^*) - x^*, x_{\varphi(k)+1} - x^*\rangle}{1-\gamma}.$$
Invoking $\lim_{k\to\infty}\frac{\mu_k}{\alpha_k}\|x_k - x_{k-1}\| = 0$ and (39), we obtain
$$\limsup_{k\to\infty}\|x_{\varphi(k)} - x^*\| = 0,$$
and hence
$$\lim_{k\to\infty}\|x_{\varphi(k)} - x^*\| = 0.$$
It follows from (34) that $\lim_{k\to\infty}\|x_{\varphi(k)+1} - x^*\| = 0$. By Lemma 3, we obtain
$$0 \leq \lim_{k\to\infty}\|x_k - x^*\| \leq \lim_{k\to\infty}\|x_{\varphi(k)+1} - x^*\| = 0.$$
Therefore, $\{x_k\}$ converges strongly to $x^*$.    □
We observe that Algorithm 6 reduces to Algorithm 7, by setting $S_k = T_k$, for finding a common fixed point of a countable family of nonexpansive mappings $\{T_k\}$.
Corollary 1.
Let $\{T_k\}$ be a countable family of nonexpansive mappings from $H$ into itself such that $\Gamma = \bigcap_{k=1}^{\infty} F(T_k) \neq \emptyset$. Suppose $\{T_k\}$ satisfies the NST$^*$-condition and the following conditions hold:
(1) $0 < a < \alpha_k < \hat{a} < 1$;
(2) $0 < b < \beta_k < \hat{b} < 1$;
(3) $0 < c < \xi_k < \hat{c} < 1$;
(4) $\lim_{k\to\infty} \alpha_k = 0$ and $\sum_{k=1}^{\infty} \alpha_k = +\infty$;
(5) $\lim_{k\to\infty} \frac{\eta_k}{\alpha_k} = 0$,
where $a, b, c, \hat{a}, \hat{b}$ and $\hat{c}$ are positive real numbers. Then, the sequence $\{x_k\}$ generated by Algorithm 7 converges strongly to $x^* \in \Gamma$, where $x^* = P_\Gamma f(x^*)$.
Algorithm 7 IVMIA (II): Inertial Viscosity Approximation Method for a Family of Nonexpansive Mappings
1: Input. Let $x_0, x_1 \in H$, $\{\eta_k\}$ a positive sequence and $f : H \to H$ a $\gamma$-contraction. Choose $\{\alpha_k\}, \{\beta_k\}, \{\xi_k\} \subset (0,1)$ and $\theta_k \geq 0$.
2: Select $\mu_k \in (0, \bar{\mu}_k]$ such that for $k \geq 1$,
$$\bar{\mu}_k := \begin{cases} \min\left\{\theta_k, \dfrac{\eta_k}{\|x_k - x_{k-1}\|}\right\} & \text{if } x_k \neq x_{k-1}, \\[4pt] \theta_k & \text{otherwise}. \end{cases}$$
3: Compute
$$\begin{aligned} z_k &= x_k + \mu_k(x_k - x_{k-1}), \\ y_k &= \beta_k z_k + (1-\beta_k) T_k z_k, \\ w_k &= \xi_k y_k + (1-\xi_k) T_k y_k, \\ x_{k+1} &= \alpha_k f(w_k) + (1-\alpha_k) w_k. \end{aligned}$$

4. Application to Convex Bilevel Optimization Problems

The aim of this section is to apply our proposed algorithm to solving the following convex bilevel optimization problem:
$$\min_{x \in \Gamma} F(x), \qquad (44)$$
where $F : H \to \mathbb{R}$ is strongly convex and differentiable with $\nabla F$ being $L_F$-Lipschitz continuous and $\Gamma$ is the set of all common minimizers of the following unconstrained minimization problems:
$$\min_{x \in H} \phi_1(x) + \psi_1(x) \quad \text{and} \quad \min_{x \in H} \phi_2(x) + \psi_2(x), \qquad (45)$$
where $\psi_i, \phi_i : H \to (-\infty, +\infty]$, $i = 1, 2$, are proper convex and lower semicontinuous functions and $\phi_1, \phi_2$ are differentiable functions. Problem (45) reduces to (2) if $\phi_1 = \phi_2$ and $\psi_1 = \psi_2$. As in the literature, we know that $x \in \Gamma$ if and only if
$$x = \mathrm{prox}_{\lambda_k\psi_1}(I - \lambda_k\nabla\phi_1)(x) \quad \text{and} \quad x = \mathrm{prox}_{\varepsilon_k\psi_2}(I - \varepsilon_k\nabla\phi_2)(x),$$
where $\lambda_k \in \left(0, \frac{2}{L_{\phi_1}}\right)$ and $\varepsilon_k \in \left(0, \frac{2}{L_{\phi_2}}\right)$, while $L_{\phi_1}$ and $L_{\phi_2}$ are the Lipschitz constants of $\nabla\phi_1$ and $\nabla\phi_2$, respectively. In addition, $x^* \in \Gamma$ is a solution of problem (44) if it satisfies
$$\langle \nabla F(x^*), x - x^* \rangle \geq 0, \quad \forall x \in \Gamma. \qquad (46)$$
Therefore, we solve the convex bilevel optimization problems (44) and (45) by finding a common fixed point $x^*$ of $\mathrm{prox}_{\lambda_k\psi_1}(I - \lambda_k\nabla\phi_1)$ and $\mathrm{prox}_{\varepsilon_k\psi_2}(I - \varepsilon_k\nabla\phi_2)$ which satisfies (46).
Next, we present the algorithm derived from our main result for solving the convex bilevel optimization problem as defined by Algorithm 8.
In order to solve (44) and (45), we suppose the following conditions hold:
(1) $f : H \to H$ is a $\gamma$-contraction with $\gamma \in [0,1)$;
(2) $\{\lambda_k\} \subset \left(0, \frac{2}{L_{\phi_1}}\right)$ and $\lambda \in \left(0, \frac{2}{L_{\phi_1}}\right)$ with $\lambda_k \to \lambda$;
(3) $\{\varepsilon_k\} \subset \left(0, \frac{2}{L_{\phi_2}}\right)$ and $\varepsilon \in \left(0, \frac{2}{L_{\phi_2}}\right)$ with $\varepsilon_k \to \varepsilon$;
(4) $\{\alpha_k\}$, $\{\beta_k\}$ and $\{\xi_k\}$ are sequences in $(0,1)$;
(5) $\psi_i$, $i = 1, 2$, are proper convex and lower semicontinuous functions from $H$ into $\mathbb{R} \cup \{+\infty\}$;
(6) $\phi_i$, $i = 1, 2$, are smooth convex loss functions, differentiable with $L_{\phi_i}$-Lipschitz continuous gradients;
(7) $F : H \to \mathbb{R}$ is strongly convex and differentiable with $\nabla F$ being $L_F$-Lipschitz continuous, and $\sigma \in \left(0, \frac{2}{L_F+\rho}\right)$, where $\rho$ is the strong convexity parameter of $F$.
Theorem 2.
Let $\{x_k\}$ be a sequence generated by Algorithm 8 such that all the conditions of Theorem 1 hold. Let $\Omega$ be the set of all solutions of (44). Then, $\{x_k\}$ converges strongly to $x^* \in \Omega$, which satisfies $x^* = P_\Gamma f(x^*)$.
Algorithm 8 iVMBi(I): Inertial Viscosity Method for Bilevel Optimization Problem (I)
1: Input. Let $x_0, x_1 \in H$ and $\{\eta_k\}$ a positive sequence. Choose $\{\alpha_k\}, \{\beta_k\}, \{\xi_k\} \subset (0,1)$ and $\theta_k \geq 0$.
2: Step 1. Select $\mu_k \in (0, \bar{\mu}_k]$ such that for $k \geq 1$,
$$\bar{\mu}_k := \begin{cases} \min\left\{\theta_k, \dfrac{\eta_k}{\|x_k - x_{k-1}\|}\right\} & \text{if } x_k \neq x_{k-1}, \\[4pt] \theta_k & \text{otherwise}. \end{cases}$$
3: Step 2. Compute
$$\begin{aligned} z_k &= x_k + \mu_k(x_k - x_{k-1}), \\ y_k &= \beta_k z_k + (1-\beta_k)\,\mathrm{prox}_{\lambda_k\psi_1}(I - \lambda_k\nabla\phi_1)(z_k), \\ w_k &= \xi_k y_k + (1-\xi_k)\,\mathrm{prox}_{\varepsilon_k\psi_2}(I - \varepsilon_k\nabla\phi_2)(y_k), \\ u_k &= (I - \sigma\nabla F)(w_k), \\ x_{k+1} &= \alpha_k u_k + (1-\alpha_k) w_k. \end{aligned}$$
Proof. 
Let $T_k := \mathrm{prox}_{\lambda_k\psi_1}(I - \lambda_k\nabla\phi_1)$ and $S_k := \mathrm{prox}_{\varepsilon_k\psi_2}(I - \varepsilon_k\nabla\phi_2)$ as in Algorithm 6, where $\lambda_k \in \left(0, \frac{2}{L_{\phi_1}}\right)$ and $\varepsilon_k \in \left(0, \frac{2}{L_{\phi_2}}\right)$, while $L_{\phi_i}$, $i = 1, 2$, are the Lipschitz constants of $\nabla\phi_i$, $i = 1, 2$, respectively. Using Proposition 1, we get that $I - \sigma\nabla F$ is a contraction mapping. By Theorem 1 and setting $f := I - \sigma\nabla F$, we obtain that $\{x_k\}$ converges strongly to $x^* \in \Gamma$, where $x^* = P_\Gamma f(x^*)$. Observe that $f(x^*) = x^* - \sigma\nabla F(x^*)$. It is derived from (6) that for any $x \in \Gamma$,
$$0 \leq \langle P_\Gamma f(x^*) - f(x^*), x - P_\Gamma f(x^*) \rangle = \langle x^* - f(x^*), x - x^* \rangle = \langle x^* - (x^* - \sigma\nabla F(x^*)), x - x^* \rangle = \sigma\langle \nabla F(x^*), x - x^* \rangle.$$
Because $\sigma > 0$, we conclude that $0 \leq \langle \nabla F(x^*), x - x^* \rangle$ for all $x \in \Gamma$, that is, $x^*$ is an optimal solution of problem (44). Hence, we obtain the desired result.    □
Furthermore, our algorithm can be applied to solving the convex bilevel optimization problems (1) and (2) by using the same proximity operator in the computations of $y_k$ and $w_k$, as seen in Algorithm 9.
Algorithm 9 iVMBi(II): Inertial Viscosity Method for Bilevel Optimization Problem (II)
1: Input. Let $x_0, x_1 \in H$ and $\{\eta_k\}$ a positive sequence. Choose $\{\alpha_k\}, \{\beta_k\}, \{\xi_k\} \subset (0,1)$ and $\theta_k \geq 0$.
2: Step 1. Select $\mu_k \in (0, \bar{\mu}_k]$ such that for $k \geq 1$,
$$\bar{\mu}_k := \begin{cases} \min\left\{\theta_k, \dfrac{\eta_k}{\|x_k - x_{k-1}\|}\right\} & \text{if } x_k \neq x_{k-1}, \\[4pt] \theta_k & \text{otherwise}. \end{cases}$$
3: Step 2. Compute
$$\begin{aligned} z_k &= x_k + \mu_k(x_k - x_{k-1}), \\ y_k &= \beta_k z_k + (1-\beta_k)\,\mathrm{prox}_{\lambda_k\psi}(I - \lambda_k\nabla\phi)(z_k), \\ w_k &= \xi_k y_k + (1-\xi_k)\,\mathrm{prox}_{\lambda_k\psi}(I - \lambda_k\nabla\phi)(y_k), \\ u_k &= (I - \sigma\nabla F)(w_k), \\ x_{k+1} &= \alpha_k u_k + (1-\alpha_k) w_k. \end{aligned}$$
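A minimal Python sketch of Algorithm 9 under these assumptions (user-supplied $\nabla\phi$, proximity operator of $\psi$ and $\nabla F$; parameter sequences chosen as in Theorem 3; all names hypothetical) could look like the following.

```python
import numpy as np

def ivmbi2(x0, x1, grad_phi, prox_psi, grad_F, lams, sigma,
           alphas, betas, xis, thetas, etas, num_iter=500):
    """Sketch of Algorithm 9 (iVMBi(II)). prox_psi(x, lam) = prox_{lam*psi}(x)."""
    x_prev, x = np.asarray(x0, float), np.asarray(x1, float)
    for k in range(1, num_iter + 1):
        diff = np.linalg.norm(x - x_prev)
        mu = min(thetas[k], etas[k] / diff) if diff > 0 else thetas[k]
        z = x + mu * (x - x_prev)                                    # inertial step
        T = lambda v: prox_psi(v - lams[k] * grad_phi(v), lams[k])   # forward-backward operator
        y = betas[k] * z + (1 - betas[k]) * T(z)
        w = xis[k] * y + (1 - xis[k]) * T(y)
        u = w - sigma * grad_F(w)                                    # outer-level step, f = I - sigma*grad F
        x_prev, x = x, alphas[k] * u + (1 - alphas[k]) * w
    return x
```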
The following result is immediately obtained from Theorem 2.
Theorem 3.
Let $\{x_k\}$ be a sequence generated by Algorithm 9 such that all the conditions of Corollary 1 hold. Then, $\{x_k\}$ converges strongly to $x^* \in \arg\min(\phi + \psi)$ which satisfies
$$x^* = P_\Gamma f(x^*) \quad \text{and} \quad \langle \nabla F(x^*), x - x^* \rangle \geq 0, \quad \forall x \in \Gamma,$$
that is, $x_k \to x^* \in \Omega$, where $\Omega$ is the set of all solutions of problems (1) and (2).
Next, we use Algorithm 9 as a machine learning algorithm for solving some data classification problems on the UCI datasets of breast cancer and heart disease. Moreover, we compare the performance of Algorithm 9 with BiG-SAM, iBiG-SAM, aiBiG-SAM, miBiG-SAM and amiBiG-SAM.
In order to employ Algorithm 9 for data classification, we need to specify the objective function of the inner level. To this end, we use a single-hidden-layer feedforward neural network (SLFN) model and the concept of the extreme learning machine (ELM) introduced by Huang et al. [46].
In supervised learning, we start with a training set of $N$ samples $S := \{(p_k, q_k) : p_k \in \mathbb{R}^n, q_k \in \mathbb{R}^m, k = 1, 2, \ldots, N\}$, where $p_k$ is the input data and $q_k$ is the target. The mathematical model of ELM for an SLFN with $M$ hidden nodes and activation function $G$ is given by
$$o_j = \sum_{i=1}^{M} m_i G(\langle w_i, p_j \rangle + r_i), \quad j = 1, 2, \ldots, N,$$
where $m_i$ is the weight vector connecting the $i$-th hidden node and the output node, $r_i$ is a bias and $w_i$ is the weight vector connecting the $i$-th hidden node and the input nodes.
Let $A$ be the matrix given by
$$A = \begin{bmatrix} G(\langle w_1, p_1 \rangle + r_1) & \cdots & G(\langle w_M, p_1 \rangle + r_M) \\ \vdots & \ddots & \vdots \\ G(\langle w_1, p_N \rangle + r_1) & \cdots & G(\langle w_M, p_N \rangle + r_M) \end{bmatrix}.$$
This matrix $A$ is known as the hidden-layer output matrix.
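For illustration, with the sigmoid activation $G(t) = 1/(1 + e^{-t})$ the hidden-layer output matrix can be formed in a few lines. The sketch below assumes the random input weights $W$ and biases $r$ have already been drawn; the function name is hypothetical.

```python
import numpy as np

def hidden_layer_matrix(P, W, r):
    """Hidden-layer output matrix A for an ELM with sigmoid activation.
    P : (N, n) input samples, W : (M, n) random input weights, r : (M,) biases.
    Returns A with A[j, i] = G(<w_i, p_j> + r_i)."""
    return 1.0 / (1.0 + np.exp(-(P @ W.T + r)))
```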
For a prediction or classification problem using the ELM model, we want the network to fit the training data with zero error, that is, $\sum_{j=1}^{N} \|o_j - q_j\| = 0$. Hence,
$$q_j = \sum_{i=1}^{M} m_i G(\langle w_i, p_j \rangle + r_i), \quad j = 1, 2, \ldots, N.$$
We can write the above system of $N$ linear equations in $M$ variables as the matrix equation
$$A m = Q, \qquad (49)$$
where $m = [m_1^T, \ldots, m_M^T]^T$ and $Q = [q_1^T, \ldots, q_N^T]^T$ is the training target. To solve the ELM, one finds a weight $m$ satisfying (49). If the Moore–Penrose generalized inverse $A^\dagger$ of $A$ is available, then $m = A^\dagger Q$. Otherwise, we can find $m$ as a minimizer of the following convex minimization problem:
$$\min_{m} \|A m - Q\|_2^2. \qquad (50)$$
Using the least squares model (50) may cause overfitting. In order to prevent this problem, regularization methods were proposed. The classical one is Tikhonov regularization [47], which solves the following minimization problem:
$$\text{Minimize:} \quad \|A m - Q\|_2^2 + \beta \|K m\|_2^2, \qquad (51)$$
where $\beta$ is the regularization parameter and $K$ is the Tikhonov matrix. In the standard form, $K$ is set to be the identity.
Another regularization method is the least absolute shrinkage and selection operator (LASSO), which was proposed by Tibshirani [48] for solving the following convex minimization problem:
$$\text{Minimize:} \quad \|A m - Q\|_2^2 + \beta \|m\|_1, \qquad (52)$$
where $\beta$ is the regularization parameter and $\|(x_1, x_2, \ldots, x_p)\|_1 = \sum_{i=1}^{p} |x_i|$.
In this work, we set $\psi(m) = \beta\|m\|_1$ and $\phi(m) = \|A m - Q\|_2^2$. Based on model (52), we can apply Algorithm 9 to solve the convex bilevel optimization problems (1) and (2), with the objective function of the outer level being $F(m) = \frac{1}{2}\|m\|_2^2$. We now conduct some numerical experiments for the classification of the following datasets.
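A small sketch of how the pieces of this model could be assembled in code is shown below: $\phi$ and its gradient, the proximity operator of $\psi$ (soft thresholding) and the Lipschitz constant $L_\phi = 2\|A\|^2$ used later; the helper name is hypothetical.

```python
import numpy as np

def build_objectives(A, Q, beta):
    """Inner-level pieces used in the experiments: phi(m) = ||Am - Q||_2^2,
    psi(m) = beta*||m||_1, and the outer objective F(m) = 0.5*||m||_2^2."""
    grad_phi = lambda m: 2.0 * A.T @ (A @ m - Q)                     # gradient of phi
    prox_psi = lambda m, lam: np.sign(m) * np.maximum(np.abs(m) - lam * beta, 0.0)
    grad_F = lambda m: m                                             # gradient of 0.5*||m||^2
    L_phi = 2.0 * np.linalg.norm(A, 2) ** 2                          # Lipschitz constant of grad phi
    return grad_phi, prox_psi, grad_F, L_phi
```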
In these experiments, we aim to classify the datasets of breast cancer and heart disease from https://archive.ics.uci.edu, accessed on 12 June 2022.
Breast cancer dataset [49]. This dataset contains 699 samples, each of which has 11 attributes. In this dataset, we classify two classes of data.
Heart disease dataset [50]. This dataset contains 303 samples, each of which has 13 attributes. In this dataset, we classify two classes of data.
Throughout these experiments, all results were obtained in MATLAB 9.6 (R2019a) running on a MacBook Air (13.3-inch, 2020) with an Apple M1 processor and 8-core GPU, configured with 8 GB of RAM.
In all the experiments, the sigmoid function is used as the activation function, and we set the number of hidden nodes to $M = 30$. The accuracy of the data classification is given by
$$\text{Accuracy (Acc)} = \frac{TP + TN}{TP + TN + FP + FN} \times 100,$$
where $TP$ (true positives) counts diseased patients correctly predicted as positive, $TN$ (true negatives) counts healthy patients correctly predicted as negative, $FN$ (false negatives) counts diseased patients incorrectly predicted as healthy, and $FP$ (false positives) counts healthy patients incorrectly predicted as diseased.
We also compute the probability of making a correct positive-class prediction as
$$\text{Precision (Pre)} = \frac{TP}{TP + FP}.$$
In addition, we measure the sensitivity of the model toward identifying the positive class as
$$\text{Recall (Rec)} = \frac{TP}{TP + FN}.$$
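These three measures can be computed directly from the confusion counts; a minimal sketch (function name hypothetical, binary labels assumed to be in {0, 1}) is:

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy (in %), precision and recall from binary labels in {0, 1}."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    acc = (tp + tn) / (tp + tn + fp + fn) * 100
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    return acc, pre, rec
```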
The Lipschitz constant $L_\phi$ of $\nabla\phi$ is computed as $2\|A\|^2$. When the dimension of $A$ is very large, computing $L_\phi$ in this way becomes expensive. All parameters for each algorithm in our experiments are given in Table 1.
From Table 1, we select the best choice of parameters for each algorithm in order to achieve the highest performance. It is worth noting that all parameters satisfy the assumptions of the corresponding convergence theorems; see [6,9,11] for more details. In addition, we set $\beta = 0.00001$, the regularization parameter of problem (52). In Algorithm 9, we choose $\xi_k = \beta_k = \frac{1}{k+2}$ for the experiments on the breast cancer dataset, while the classification of heart disease uses $\xi_k = 0.5$ together with $\beta_k = 0.1$.
We compare the performance of each method at the 100th and 500th iterations and obtain the following results, as seen in Table 2 and Table 3, respectively.
Table 2 shows that our algorithm achieves the best accuracy at the 100th iteration. Moreover, Table 3 shows the performance of each algorithm at the 500th iteration; again, Algorithm 9 has better accuracy than the others.
Next, we show the prediction performance of each algorithm in terms of the number of iterations and the training time at which each algorithm achieves its highest accuracy.
From Table 4, compared with Algorithm 1 (BiG-SAM), Algorithm 2 (iBiG-SAM), Algorithm 3 (aiBiG-SAM), Algorithm 4 (miBiG-SAM) and Algorithm 5 (amiBiG-SAM), Algorithm 9 provides a higher training accuracy. In the testing case, the accuracy of Algorithm 2 (iBiG-SAM) is better than that of our algorithm on the breast cancer experiment. However, our method requires the lowest number of iterations and the shortest training time among all the methods compared.
We also perform 10-fold cross validation to assess the performance of each algorithm, using the average accuracy as the evaluation tool. It is defined as follows:
$$\text{Average Acc} = \frac{1}{N}\sum_{i=1}^{N} \frac{u_i}{v_i} \times 100\%,$$
where $N$ is the number of folds considered during the cross validation ($N = 10$), $u_i$ is the number of correctly predicted data at fold $i$ and $v_i$ is the number of all data at fold $i$.
Let $\mathrm{Err}_M$ be the sum of errors over all 10 training sets, $\mathrm{Err}_K$ the sum of errors over all 10 testing sets, $M$ the total number of data in the 10 training sets and $K$ the total number of data in the 10 testing sets. Then,
$$\text{Error \%} = \frac{\mathrm{error}_M\% + \mathrm{error}_K\%}{2},$$
where $\mathrm{error}_M\% = \frac{\mathrm{Err}_M}{M} \times 100\%$ and $\mathrm{error}_K\% = \frac{\mathrm{Err}_K}{K} \times 100\%$.
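A short sketch of how the Average Acc and Error % of the 10-fold cross validation could be computed from per-fold counts (all names hypothetical) is:

```python
def cross_validation_summary(correct, totals, train_errors, test_errors,
                             train_sizes, test_sizes):
    """Average accuracy and Error % over N folds as defined above.
    correct[i], totals[i] : correctly predicted / total test samples in fold i."""
    N = len(correct)
    avg_acc = sum(u / v * 100 for u, v in zip(correct, totals)) / N
    err_M = sum(train_errors) / sum(train_sizes) * 100   # training error %
    err_K = sum(test_errors) / sum(test_sizes) * 100     # testing error %
    return avg_acc, (err_M + err_K) / 2
```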
We split the data into training sets and testing sets by using the 10-fold cross validation, as seen in Table 5.
In Table 6, we show the average accuracy of each algorithm at the 500th iteration.
Table 6 demonstrates that Algorithm 9 performs better than Algorithm 1 (BiG-SAM), Algorithm 2 (iBiG-SAM), Algorithm 3 (aiBiG-SAM), Algorithm 4 (miBiG-SAM) and Algorithm 5 (amiBiG-SAM) in terms of the accuracy in all the experiments conducted.

5. Conclusions

We propose a novel iterative method based on a fixed-point approach with an inertial technique for approximating a common fixed point of two countable families of nonexpansive mappings in a Hilbert space and also present strong convergence theorems. Our algorithm generates a sequence converging strongly to a solution of convex bilevel optimization problems in which the inner level consists of the minimization of the sum of smooth and nonsmooth functions. Furthermore, we apply the proposed algorithm to the data classification of breast cancer and heart disease datasets and assess its performance in comparison with the other algorithms. The experiments show that our algorithm provides higher training and testing accuracy on the datasets considered, and that it requires fewer iterations and less training time than the others. It is worth mentioning that our proposed algorithm can serve as an intelligent machine learning tool for the prediction and classification of big data, and it can be developed into software applications for prediction and classification in future work. Furthermore, we aim to employ our proposed algorithm on real datasets of patients at the Sriphat Medical Center, Faculty of Medicine, Chiang Mai University, Thailand.

Author Contributions

Conceptualization, S.S.; formal analysis, P.T. and W.I.; investigation, P.T.; methodology, S.S.; supervision, S.S.; validation, T.L. and S.S.; writing—original draft, P.T.; writing—review and editing, S.S. and W.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research has received funding support from the NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183] and Chiang Mai University. The first and fourth authors were also supported by Thailand Science Research and Innovation under the project IRN62W0007.

Data Availability Statement

All data used in this work are available at https://archive.ics.uci.edu, accessed on 12 June 2022.

Acknowledgments

The authors would like to thank the referees for the valuable suggestions. This research has received funding support from the NSRF via the Program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183] and Chiang Mai University. The first and fourth authors were also supported by Thailand Science Research and Innovation under the project IRN62W0007.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Darby, S.C.; Ewertz, M.; McGale, P.; Bennet, A.M.; Blom-Goldman, U.; Brønnum, D.; Correa, C.; Cutter, D.; Gagliardi, G.; Gigante, B.; et al. Risk of ischemic heart disease in women after radiotherapy for breast cancer. N. Engl. J. Med. 2013, 368, 987–998.
  2. Solodov, M. An explicit descent method for bilevel convex optimization. J. Convex Anal. 2007, 4, 227–237.
  3. Cabot, A. Proximal point algorithm controlled by a slowly vanishing term: Applications to hierarchical minimization. SIAM J. Optim. 2005, 15, 555–572.
  4. Helou, E.S.; Simões, L.E. ϵ-subgradient algorithms for bilevel convex optimization. Inverse Probl. 2017, 33, 5.
  5. Dempe, S.; Kalashnikov, V.; Pérez-Valdés, G.A.; Kalashnykova, N. Bilevel programming problems. In Energy Systems; Springer: Berlin, Germany, 2015.
  6. Sabach, S.; Shtern, S. A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 2017, 27, 640–660.
  7. Xu, H.K. Viscosity approximation methods for nonexpansive mappings. J. Math. Anal. Appl. 2004, 298, 279–291.
  8. Beck, A.; Sabach, S. A first order method for finding minimal norm-like solutions of convex optimization problems. Math. Program. 2014, 147, 25–46.
  9. Shehu, Y.; Vuong, P.T.; Zemkoho, A. An inertial extrapolation method for convex simple bilevel optimization. Optim. Methods Softw. 2019, 2019, 1–20.
  10. Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17.
  11. Duan, P.; Zhang, Y. Alternated and multi-step inertial approximation methods for solving convex bilevel optimization problems. Optimization 2022.
  12. Dhara, A.; Dutta, J. Optimality Conditions in Convex Optimization: A Finite-Dimensional View; CRC Press: Boca Raton, FL, USA, 2011.
  13. Yao, Y.; Iyiola, O.S.; Shehu, Y. Subgradient extragradient method with double inertial steps for variational inequalities. J. Sci. Comput. 2022, 90, 1–29.
  14. Zhao, X.; Yao, J.C.; Yao, Y. A nonmonotone gradient method for constrained multiobjective optimization problems. J. Nonlinear Var. Anal. 2022, 6, 693–706.
  15. Moreau, J.J. Proximité et dualité dans un espace hilbertien. Bull. Soc. Math. Fr. 1965, 93, 273–299.
  16. Combettes, P.L.; Wajs, V.R. Signal recovery by proximal forward-backward splitting. Multiscale Model. Simul. 2005, 4, 1168–1200.
  17. Byrne, C. Iterative oblique projection onto convex subsets and the split feasibility problem. Inverse Probl. 2002, 18, 441–453.
  18. Byrne, C. A unified treatment of some iterative algorithms in signal processing and image reconstruction. Inverse Probl. 2004, 20, 103–120.
  19. Cholamjiak, P.; Shehu, Y. Inertial forward-backward splitting method in Banach spaces with application to compressed sensing. Appl. Math. 2019, 64, 409–435.
  20. Kunrada, K.; Pholasa, N.; Cholamjiak, P. On convergence and complexity of the modified forward-backward method involving new linesearches for convex minimization. Math. Meth. Appl. Sci. 2019, 42, 1352–1362.
  21. Suantai, S.; Eiamniran, N.; Pholasa, N.; Cholamjiak, P. Three-step projective methods for solving the split feasibility problems. Mathematics 2019, 7, 712.
  22. Suantai, S.; Kesornprom, S.; Cholamjiak, P. Modified proximal algorithms for finding solutions of the split variational inclusions. Mathematics 2019, 7, 708.
  23. Thong, D.V.; Cholamjiak, P. Strong convergence of a forward-backward splitting method with a new step size for solving monotone inclusions. Comput. Appl. Math. 2019, 38, 94.
  24. Thongpaen, P.; Wattanaweekul, R. A fast fixed-point algorithm for convex minimization problems and its application in image restoration problems. Mathematics 2021, 9, 2619.
  25. Mann, W.R. Mean value methods in iteration. Proc. Am. Math. Soc. 1953, 4, 506–510.
  26. Halpern, B. Fixed points of nonexpansive maps. Bull. Am. Math. Soc. 1967, 73, 957–961.
  27. Ishikawa, S. Fixed points by a new iteration method. Proc. Am. Math. Soc. 1974, 44, 147–150.
  28. Phuengrattana, W.; Suantai, S. On the rate of convergence of Mann, Ishikawa, Noor and SP-iterations for continuous functions on an arbitrary interval. J. Comput. Appl. Math. 2011, 235, 3006–3014.
  29. Wongyai, S.; Suantai, S. Convergence theorem and rate of convergence of a new iterative method for continuous functions on closed interval. In Proceedings of the AMM and APAM Conference Proceedings, Bangkok, Thailand, 23–25 May 2016; pp. 111–118.
  30. De la Sen, M.; Agarwal, R.P. Common fixed points and best proximity points of two cyclic self-mappings. Fixed Point Theory Appl. 2012, 2012, 1–17.
  31. Shoaib, A. Common fixed point for generalized contraction in b-multiplicative metric spaces with applications. Bull. Math. Anal. Appl. 2020, 12, 46–59.
  32. Kim, K.S. A constructive scheme for a common coupled fixed point problems in Hilbert space. Mathematics 2020, 8, 1717.
  33. Haghi, R.H.; Bakhshi, N. Some coupled fixed point results without mixed monotone property. J. Adv. Math. Stud. 2022, 15, 456–463.
  34. Jailoka, P.; Suantai, S.; Hanjing, A. A fast viscosity forward-backward algorithm for convex minimization problems with an application in image recovery. Carpathian J. Math. 2021, 37, 449–461.
  35. Tang, Y.M.; Zhang, L.; Bao, G.Q.; Ren, F.J.; Pedrycz, W. Symmetric implicational algorithm derived from intuitionistic fuzzy entropy. Iran. J. Fuzzy Syst. 2022, 19, 27–44.
  36. Mokeddem, S.A. A fuzzy classification model for myocardial infarction risk assessment. Appl. Intell. 2018, 48, 1233–1250.
  37. Nakajo, K.; Shimoji, K.; Takahashi, W. Strong convergence to common fixed points of families of nonexpansive mappings in Banach spaces. J. Nonlinear Convex Anal. 2007, 8, 11–34.
  38. Nakajo, K.; Shimoji, K.; Takahashi, W. Strong convergence theorems by the hybrid method for families of nonexpansive mappings in Hilbert spaces. Taiwan J. Math. 2006, 10, 339–360.
  39. Xu, H.K. Another control condition in an iterative method for nonexpansive mappings. Bull. Aust. Math. Soc. 2002, 65, 109–113.
  40. Goebel, K.; Kirk, W.A. Topics in Metric Fixed Point Theory; Cambridge Studies in Advanced Mathematics; Cambridge University Press: Cambridge, UK, 1990; Volume 28.
  41. Maingé, P.E. Strong convergence of projected subgradient methods for nonsmooth and nonstrictly convex minimization. Set-Valued Anal. 2008, 16, 899–912.
  42. Bussaban, L.; Suantai, S.; Kaewkhao, A. A parallel inertial S-iteration forward-backward algorithm for regression and classification problems. Carpathian J. Math. 2020, 36, 21–30.
  43. Das, G.; Debata, J.P. Fixed points of quasi-non-expansive mappings. Indian J. Pure Appl. Math. 1986, 17, 1263–1269.
  44. Takahashi, W.; Tamura, T. Convergence theorems for a pair of non-expansive mappings. J. Convex Anal. 1998, 5, 45–58.
  45. Thongpaen, P.; Inthakon, W. Common attractive points of widely more generalized hybrid mappings in Hilbert spaces. Thai J. Math. 2020, 18, 861–869.
  46. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
  47. Tikhonov, A.N.; Arsenin, V.Y. Solutions of Ill-Posed Problems; John Wiley & Sons: Washington, DC, USA, 1977.
  48. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B (Methodol.) 1996, 58, 267–288.
  49. Street, W.N.; Wolberg, W.H.; Mangasarian, O.L. Nuclear feature extraction for breast tumor diagnosis. Biomed. Image Process. Biomed. Vis. 1993, 1905, 861–870.
  50. Detrano, R.; Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Schmid, J.J.; Sandhu, S.; Guppy, K.H.; Lee, S.; Froelicher, V. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 1989, 64, 304–310.
Table 1. Chosen parameters of each algorithm.

Parameters | Algorithm 9 | Algorithm 1 | Algorithm 2 | Algorithm 3 | Algorithm 4 | Algorithm 5
$\sigma$ | $\frac{2}{L_F+\rho}$ | $\frac{2}{L_F+\rho}$ | $\frac{2}{L_F+\rho}$ | $\frac{2}{L_F+\rho}$ | $\frac{2}{L_F+\rho}$ | $\frac{2}{L_F+\rho}$
$\lambda$ | - | $\frac{1}{L_\phi}$ | $\frac{1}{L_\phi}$ | $\frac{1}{L_\phi}$ | - | -
$\lambda_k$ | $\frac{1}{L_\phi}$ | - | - | - | $\frac{1}{L_\phi}$ | $\frac{1}{L_\phi}$
$\alpha$ | - | - | 3 | 3 | 3 | 3
$\alpha_k$ | $\frac{1}{50k}$ | $\frac{1}{k+2}$ | $\frac{1}{50k}$ | $\frac{1}{k+2}$ | $\frac{1}{k+2}$ | $\frac{1}{k+2}$
$\theta_k$ | $\frac{k}{k+1}$ | - | - | - | - | -
$\eta_k$ | $\frac{10^{50}}{k^2}$ | - | $\frac{10^{50}}{k^2}$ | $\frac{\alpha_k}{k^{0.01}}$ | $\frac{\alpha_k}{k^{0.01}}$ | $\frac{\alpha_k}{k^{0.01}}$
$q$ | - | - | - | - | 4 | 4
Table 2. The performance of each algorithm at the 100th iteration on each dataset.

Dataset | Algorithm | Pre Train | Rec Train | Pre Test | Rec Test | Acc Train | Acc Test
Breast Cancer | Algorithm 1 | 0.8845 | 0.9812 | 0.9718 | 1.0000 | 90.4082 | 98.0861
Breast Cancer | Algorithm 2 | 0.9686 | 0.9625 | 0.9857 | 1.0000 | 95.5102 | 99.0431
Breast Cancer | Algorithm 3 | 0.8845 | 0.9812 | 0.9718 | 1.0000 | 90.4082 | 98.0861
Breast Cancer | Algorithm 4 | 0.8966 | 0.9750 | 0.9718 | 1.0000 | 91.0204 | 98.0861
Breast Cancer | Algorithm 5 | 0.8845 | 0.9812 | 0.9718 | 1.0000 | 90.4082 | 98.0861
Breast Cancer | Algorithm 9 | 0.9747 | 0.9625 | 0.9857 | 1.0000 | 95.9184 | 99.0431
Heart Disease | Algorithm 1 | 0.7656 | 0.8522 | 0.7647 | 0.7800 | 77.6190 | 75.2688
Heart Disease | Algorithm 2 | 0.8306 | 0.8957 | 0.7719 | 0.8800 | 84.2857 | 79.5699
Heart Disease | Algorithm 3 | 0.7656 | 0.8522 | 0.7647 | 0.7800 | 77.6190 | 75.2688
Heart Disease | Algorithm 4 | 0.8049 | 0.8609 | 0.7593 | 0.8200 | 80.9524 | 76.3441
Heart Disease | Algorithm 5 | 0.7656 | 0.8522 | 0.7647 | 0.7800 | 77.6190 | 75.2688
Heart Disease | Algorithm 9 | 0.8268 | 0.9130 | 0.7667 | 0.9200 | 84.7619 | 80.6452
Table 3. The performance of each algorithm at the 500th iteration on each dataset.

Dataset | Algorithm | Pre Train | Rec Train | Pre Test | Rec Test | Acc Train | Acc Test
Breast Cancer | Algorithm 1 | 0.9506 | 0.9625 | 0.9857 | 1.0000 | 94.2857 | 99.0431
Breast Cancer | Algorithm 2 | 0.9778 | 0.9625 | 0.9928 | 1.0000 | 96.1224 | 99.5215
Breast Cancer | Algorithm 3 | 0.9506 | 0.9625 | 0.9857 | 1.0000 | 94.2857 | 99.0431
Breast Cancer | Algorithm 4 | 0.9536 | 0.9625 | 0.9857 | 1.0000 | 94.4898 | 99.0431
Breast Cancer | Algorithm 5 | 0.9506 | 0.9625 | 0.9857 | 1.0000 | 94.2857 | 99.0431
Breast Cancer | Algorithm 9 | 0.9778 | 0.9625 | 0.9928 | 1.0000 | 96.1224 | 99.5215
Heart Disease | Algorithm 1 | 0.8065 | 0.8696 | 0.7679 | 0.8600 | 81.4286 | 78.4946
Heart Disease | Algorithm 2 | 0.8455 | 0.9043 | 0.7797 | 0.9200 | 85.7143 | 81.7204
Heart Disease | Algorithm 3 | 0.8065 | 0.8696 | 0.7679 | 0.8600 | 81.4286 | 78.4946
Heart Disease | Algorithm 4 | 0.8115 | 0.8609 | 0.7544 | 0.8600 | 81.4286 | 77.4194
Heart Disease | Algorithm 5 | 0.8065 | 0.8696 | 0.7679 | 0.8600 | 81.4286 | 78.4946
Heart Disease | Algorithm 9 | 0.8455 | 0.9043 | 0.7833 | 0.9400 | 85.7143 | 82.7957
Table 4. The iteration number and training time of each algorithm with the highest accuracy on each dataset.

Dataset | Algorithm | Iteration No. | Training Time | Acc Train | Acc Test
Breast Cancer | Algorithm 1 | 819 | 0.0272 | 95.1020 | 99.0431
Breast Cancer | Algorithm 2 | 264 | 0.0095 | 96.1224 | 99.5215
Breast Cancer | Algorithm 3 | 819 | 0.0267 | 95.1020 | 99.0431
Breast Cancer | Algorithm 4 | 531 | 0.0320 | 95.1020 | 99.0431
Breast Cancer | Algorithm 5 | 819 | 0.0330 | 95.1020 | 99.0431
Breast Cancer | Algorithm 9 | 78 | 0.0054 | 96.1224 | 99.0431
Heart Disease | Algorithm 1 | 2024 | 0.0293 | 86.1905 | 79.5699
Heart Disease | Algorithm 2 | 556 | 0.0096 | 86.1905 | 81.7204
Heart Disease | Algorithm 3 | 2024 | 0.0452 | 86.1905 | 79.5699
Heart Disease | Algorithm 4 | 1226 | 0.0517 | 86.1905 | 79.5699
Heart Disease | Algorithm 5 | 1398 | 0.0317 | 85.7143 | 78.4946
Heart Disease | Algorithm 9 | 192 | 0.0064 | 86.1905 | 82.7957
Table 5. Number of samples in each fold for all datasets.

Fold | Breast Cancer Train | Breast Cancer Test | Heart Disease Train | Heart Disease Test
Fold 1 | 630 | 69 | 273 | 30
Fold 2 | 629 | 70 | 272 | 31
Fold 3 | 629 | 70 | 272 | 31
Fold 4 | 629 | 70 | 272 | 31
Fold 5 | 629 | 70 | 273 | 30
Fold 6 | 629 | 70 | 273 | 30
Fold 7 | 629 | 70 | 273 | 30
Fold 8 | 629 | 70 | 273 | 30
Fold 9 | 629 | 70 | 273 | 30
Fold 10 | 629 | 70 | 273 | 30
Table 6. Average accuracy of each algorithm at the 500th iteration with 10-fold cross validation.

Algorithm | Breast Cancer Acc Train | Breast Cancer Acc Test | Breast Cancer Error % | Heart Disease Acc Train | Heart Disease Acc Test | Heart Disease Error %
Algorithm 1 | 95.8989 | 95.9876 | 4.0534 | 79.8319 | 78.5484 | 20.8104
Algorithm 2 | 96.7094 | 96.9876 | 3.1474 | 85.0318 | 82.8387 | 16.0616
Algorithm 3 | 95.8989 | 95.9876 | 4.0534 | 79.8319 | 78.5484 | 20.8104
Algorithm 4 | 96.0420 | 96.1304 | 3.9103 | 80.9683 | 80.5269 | 19.2519
Algorithm 5 | 95.8989 | 95.9876 | 4.0534 | 79.8319 | 78.5484 | 20.8104
Algorithm 9 | 96.7889 | 97.4182 | 2.8930 | 85.8148 | 82.8387 | 15.9883
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
