Abstract
A new accelerated common fixed point algorithm for a countable family of nonexpansive mappings is introduced and analyzed, and it is then applied to solve some convex bilevel optimization problems. Under suitable conditions, we prove a strong convergence result for the proposed algorithm. As an application, we employ it for regression and classification problems and compare its performance with that of existing algorithms. The numerical experiments show that our algorithm outperforms the others.
Keywords:
bilevel optimization; fixed point algorithm; forward-backward algorithm; regression and classification problems
MSC:
47H10; 65K10; 90C25
1. Introduction
Let H be a real Hilbert space and let ω, f and g be real-valued functions on H. The convex bilevel minimization problem is a special kind of optimization problem in which one problem is embedded within another. The outer level is the following constrained minimization problem:

min { ω(x) : x ∈ S* },    (1)

where ω is strongly convex with parameter σ > 0 and continuously differentiable such that ∇ω is Lipschitz continuous with constant L_ω > 0, while S* is the nonempty set of minimizers of the inner level problem, given by

min_{x ∈ H} { f(x) + g(x) },    (2)

and sometimes we will use the notation argmin_{x ∈ H}(f(x) + g(x)) for S*. The following assumptions are imposed for solving Problem (2).
- (i)
- g : H → ℝ ∪ {+∞} is a proper, convex and lower semi-continuous function;
- (ii)
- f : H → ℝ is a convex and differentiable function such that ∇f is Lipschitz continuous with constant L > 0, that is, ‖∇f(x) − ∇f(y)‖ ≤ L‖x − y‖ for all x, y ∈ H.
The solutions of (2) can be characterized via Theorem 16.3 of Bauschke and Combettes [1] as follows:

x* solves (2) if and only if 0 ∈ ∂g(x*) + ∇f(x*),

where ∂g is the subdifferential of g and ∇f is the gradient of f. Moreover, Problem (2) is also characterized by the following fixed point problem:
x* = prox_{cg}(x* − c∇f(x*)),

where prox_{cg} := (I + c∂g)^{−1} is the proximity operator of cg and c > 0. It is also known that prox_{cg}(I − c∇f) is a nonexpansive operator when c ∈ (0, 2/L). The operator prox_{cg}(I − c∇f) is called the forward-backward operator of f and g with respect to c. Moreover, it is known that x* is a minimizer of Problem (1) if and only if ⟨∇ω(x*), x − x*⟩ ≥ 0 for all x ∈ S*.
Over the past decades, many researchers have proposed methods for finding optimal solutions of Problem (2). Lions and Mercier [2] introduced a simple algorithm, called the Forward-Backward Splitting algorithm (FBS), for solving Problem (2). Their algorithm is given by

x_{n+1} = prox_{c_n g}(x_n − c_n ∇f(x_n)),    n ≥ 1,

where c_n ∈ (0, 2/L) is the step-size and x_1 ∈ H is chosen arbitrarily.
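To make the forward-backward iteration concrete, the following Python sketch implements it for the model instance f(x) = ½‖Ax − b‖² and g(x) = λ‖x‖₁, in which the proximal step reduces to componentwise soft-thresholding; the data A, b and the values of λ and the step-size are illustrative choices of ours, not settings taken from this paper.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def forward_backward(A, b, lam, n_iter=500):
    """FBS for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f (squared spectral norm)
    c = 1.0 / L                        # constant step-size c in (0, 2/L)
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)                   # forward (gradient) step on f
        x = soft_threshold(x - c * grad, c * lam)  # backward (proximal) step on g
    return x

# Illustrative usage on random data
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
b = rng.standard_normal(50)
x_fbs = forward_backward(A, b, lam=0.1)
```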
In 1964, Polyak [3] first introduced the inertial technique for accelerating the convergence rate of iterative algorithms. Since then, this technique has been widely used for this purpose.
In [4], Beck and Teboulle employed the inertial technique to introduce the fast iterative shrinkage-thresholding algorithm (FISTA) for solving Problem (2) as follows:

y_n = prox_{(1/L) g}(x_n − (1/L)∇f(x_n)),
t_{n+1} = (1 + √(1 + 4t_n²)) / 2,
x_{n+1} = y_n + ((t_n − 1)/t_{n+1})(y_n − y_{n−1}),

where t_1 = 1 and x_1 = y_0 ∈ H. They proved that the objective values generated by FISTA converge at the rate O(1/n²), improving on the O(1/n) rate of the basic forward-backward scheme.
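A minimal Python sketch of the FISTA update, written in the same setting as the previous snippet (f(x) = ½‖Ax − b‖², g(x) = λ‖x‖₁), may help to see how the inertial extrapolation enters; as before, all data and parameters are illustrative.

```python
import numpy as np

def fista(A, b, lam, n_iter=500):
    """FISTA for min_x 0.5*||Ax - b||^2 + lam*||x||_1 with the standard t_n extrapolation."""
    L = np.linalg.norm(A, 2) ** 2
    c = 1.0 / L
    x = np.zeros(A.shape[1])   # previous proximal-gradient output
    y = x.copy()               # extrapolated point
    t = 1.0
    for _ in range(n_iter):
        grad = A.T @ (A @ y - b)
        step = y - c * grad
        x_new = np.sign(step) * np.maximum(np.abs(step) - c * lam, 0.0)  # proximal step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t ** 2)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)   # inertial extrapolation
        x, t = x_new, t_new
    return x
```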
Recently, some authors, for instance, Bussaban et al. [5], Puangpee and Suantai [6] and Jailoka et al. [7], employed the inertial technique to introduce common fixed point algorithms for countable families of nonexpansive operators and established convergence results under the NST-condition (I) and the condition (Z). They also applied their algorithms to solve some convex minimization problems.
In 2017, Sabach and Shtern [8] introduced a new method, called the Sequential Averaging Method (SAM), for solving bilevel optimization problems. This method was developed from [9], which treats a certain class of fixed point problems. To solve the bilevel optimization Problems (1) and (2), the Bilevel Gradient Sequential Averaging Method (BiG-SAM) was proposed in [8]; it is stated as Algorithm 1.
| Algorithm 1 Bilevel Gradient sequential Averaging Method (BiG-SAM) |
|
They proved a strong convergence theorem of the sequence generated by BiG-SAM under some control conditions.
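The structure of BiG-SAM, as we recall it from [8], combines a proximal-gradient step on the inner objective with a gradient step on the outer objective and then averages the two points; the Python sketch below follows that reading and should be checked against [8]. The inner problem, the outer objective ω(x) = ½‖x‖², the step-sizes and the averaging parameters are illustrative choices of ours.

```python
import numpy as np

def big_sam(A, b, lam, n_iter=500):
    """BiG-SAM-type iteration (structure recalled from [8], illustrative parameters).
    Inner problem: min 0.5*||Ax - b||^2 + lam*||x||_1; outer objective: 0.5*||x||^2."""
    L_f = np.linalg.norm(A, 2) ** 2
    c = 1.0 / L_f              # step-size for the inner proximal-gradient step
    s = 0.5                    # step-size for the outer gradient step (illustrative)
    x = np.zeros(A.shape[1])
    for k in range(1, n_iter + 1):
        alpha = 1.0 / (k + 1)  # averaging parameter, an illustrative diminishing choice
        grad_f = A.T @ (A @ x - b)
        step = x - c * grad_f
        y = np.sign(step) * np.maximum(np.abs(step) - c * lam, 0.0)  # inner prox-grad step
        z = x - s * x          # gradient step on omega(x) = 0.5*||x||^2
        x = alpha * z + (1.0 - alpha) * y
    return x
```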
After that, Shehu et al. [10] used the inertial technique for improving the convergence behavior of BiG-SAM. Their algorithm is known as the inertial Bilevel Gradient Sequential Averaging Method (iBiG-SAM). It was defined as follows (Algorithm 2):
| Algorithm 2 Inertial Bilevel Gradient sequential Averaging Method (iBiG-SAM) |
|
In 2022, Duan and Zhang [11] introduced a new algorithm based on the proximal gradient algorithm for solving a bilevel optimization problem. This algorithm is known as the alternated inertial Bilevel Gradient Sequential Averaging Method (aiBiG-SAM). It was defined as follows (Algorithm 3):
| Algorithm 3 The alternated inertial Bilevel Gradient Sequential Averaging Method (aiBiG-SAM) |
|
They also proved a strong convergence result of the proposed method.
Motivated by these works, we are interested in proposing a new efficient algorithm for convex bilevel Problems (1) and (2). We establish and prove a convergence theorem of the proposed algorithm under some suitable conditions. We employ it for data prediction and classification. The paper is organized as follows. In Section 2, we describe some notations and useful lemmas for the later sections. In Section 3, we discuss and analyze the convergence of our proposed algorithm. In Section 4, we present applications of the obtained fixed-point results in Section 3 for solving regression and classification problems. Moreover, some numerical experiments on regression and classification problems are also given in Section 4. Finally, we also give conclusions of the paper in Section 5.
2. Preliminaries
Let H be a real Hilbert space with norm ‖·‖ and inner product ⟨·,·⟩, and let C be a nonempty closed convex subset of H. Let T : C → H. The operator T is said to be L-Lipschitz if ‖Tx − Ty‖ ≤ L‖x − y‖ for all x, y ∈ C. If T is Lipschitz continuous with a coefficient L ∈ [0, 1), then T is called a contraction. The operator T is said to be nonexpansive if ‖Tx − Ty‖ ≤ ‖x − y‖ for all x, y ∈ C. We use Fix(T) to denote the set of fixed points of T, that is, Fix(T) = {x ∈ C : Tx = x}. The set of all common fixed points of a sequence {T_n} of nonexpansive operators of C into itself is ⋂_{n=1}^∞ Fix(T_n). For finding a common fixed point of {T_n}, Nakajo, Shimoji and Takahashi [12] introduced the NST-condition as follows. Let {T_n} and τ be families of nonexpansive operators of C into itself with ∅ ≠ Fix(τ) ⊆ ⋂_{n=1}^∞ Fix(T_n), where Fix(τ) = ⋂_{T ∈ τ} Fix(T). A sequence {T_n} satisfies the NST-condition (I) with τ if, for any bounded sequence {x_n} in C,

lim_{n→∞} ‖x_n − T_n x_n‖ = 0  implies  lim_{n→∞} ‖x_n − T x_n‖ = 0

for all T ∈ τ. In particular, the sequence {T_n} satisfies the NST-condition (I) with T if τ = {T}.
The following Lemma is useful for proving our main result.
Lemma 1
([5]). Let f be a convex and differentiable function from H into ℝ such that ∇f is Lipschitz continuous with constant L > 0, and let g be a proper, convex and lower semi-continuous function from H into ℝ ∪ {+∞}. Let T_n := prox_{c_n g}(I − c_n ∇f) and T := prox_{c g}(I − c ∇f), where c_n, c ∈ (0, 2/L) with c_n → c as n → ∞. Then {T_n} satisfies the NST-condition (I) with T.
Definition 1
([13,14]). A sequence {T_n} of mappings of H into itself with a nonempty common fixed point set is said to satisfy the condition (Z) if, whenever {x_n} is a bounded sequence in H such that

lim_{n→∞} ‖x_n − T_n x_n‖ = 0,

it follows that every weak cluster point of {x_n} belongs to ⋂_{n=1}^∞ Fix(T_n).
The following remark is obtained from the demiclosedness of I − T, where T is a nonexpansive operator.
Remark 1.
If T is a nonexpansive operator and the sequence {T_n} satisfies the NST-condition (I) with respect to T, then {T_n} satisfies the condition (Z).
Note that if g is a proper, lower semi-continuous and convex function, then prox_{cg}(x) := argmin_{y ∈ H} { g(y) + (1/(2c))‖y − x‖² } exists and is unique for every x ∈ H and c > 0; see [15]. We end this part with the following useful lemmas, which will be used in the next section.
Lemma 2
([16,17]). For any x, y ∈ H and t ∈ [0, 1], the following statements hold:
- (1)
- ‖x + y‖² ≤ ‖x‖² + 2⟨y, x + y⟩;
- (2)
- ‖x + y‖² = ‖x‖² + 2⟨x, y⟩ + ‖y‖²;
- (3)
- ‖tx + (1 − t)y‖² = t‖x‖² + (1 − t)‖y‖² − t(1 − t)‖x − y‖².
The identity in Lemma 2(3) implies that the following equality holds:

‖αx + βy + γz‖² = α‖x‖² + β‖y‖² + γ‖z‖² − αβ‖x − y‖² − αγ‖x − z‖² − βγ‖y − z‖²

for all x, y, z ∈ H and α, β, γ ∈ [0, 1] with α + β + γ = 1.
Lemma 3
([18]). Let {a_n} and {c_n} be sequences of nonnegative real numbers, {b_n} a sequence of real numbers, and {α_n} a sequence in (0, 1) such that

a_{n+1} ≤ (1 − α_n)a_n + α_n b_n + c_n

for all n ∈ ℕ. If the following conditions hold:
- (i)
- Σ_{n=1}^∞ α_n = ∞;
- (ii)
- limsup_{n→∞} b_n ≤ 0;
- (iii)
- Σ_{n=1}^∞ c_n < ∞,
then lim_{n→∞} a_n = 0.
Lemma 4
([19]). Let {a_n} be a sequence of real numbers that does not decrease at infinity, in the sense that there exists a subsequence {a_{n_j}} of {a_n} satisfying a_{n_j} < a_{n_j + 1} for all j ∈ ℕ. Define the sequence {τ(n)}_{n ≥ n_0} of integers as follows:

τ(n) = max{ k ≤ n : a_k < a_{k+1} },

where n_0 ∈ ℕ is such that { k ≤ n_0 : a_k < a_{k+1} } ≠ ∅. Then the following statements hold:
- (i)
- τ(n_0) ≤ τ(n_0 + 1) ≤ ⋯ and τ(n) → ∞ as n → ∞;
- (ii)
- a_{τ(n)} ≤ a_{τ(n)+1} and a_n ≤ a_{τ(n)+1} for all n ≥ n_0.
Let C be a nonempty closed convex subset of a Hilbert space H. The metric projection onto C, denoted by P_C, assigns to each x ∈ H the unique element P_C(x) ∈ C such that

‖x − P_C(x)‖ = inf{ ‖x − y‖ : y ∈ C }.

It is known that

⟨x − P_C(x), y − P_C(x)⟩ ≤ 0

for all x ∈ H and y ∈ C; see [16].
3. Results
Throughout this section, we let {T_n} and τ be families of nonexpansive operators on a real Hilbert space H such that ∅ ≠ Fix(τ) ⊆ ⋂_{n=1}^∞ Fix(T_n), and we let F be a contraction mapping on H with a constant k ∈ [0, 1).
To find a common fixed point of a countable family of nonexpansive operators in a real Hilbert space, we first propose a new accelerated algorithm. Then, under certain conditions, we show a strong convergence theorem. Now, we are ready to introduce our accelerated algorithm as follows:
Theorem 1.
Suppose that {T_n} satisfies the condition (Z). Let {x_n} be a sequence generated by Algorithm 4 which satisfies the following conditions:
- (i)
- for some ;
- (ii)
- and ;
- (iii)
- .
Then {x_n} converges strongly to an element x* ∈ ⋂_{n=1}^∞ Fix(T_n), where x* = P_{⋂_{n=1}^∞ Fix(T_n)} F(x*).
| Algorithm 4 An Inertial Viscosity Modified Picard (IVMP) |
Initial. Take arbitrarily and . For , set
Step 1. Calculate , and using: Then, update and return to Step 1. |
Proof.
Let be such that . By the definition of and in Algorithm 4, for each , we have
and
From (7) and (8), we obtain
Since , by (6), we get that . Thus, there is a constant such that . This implies
Let . We show . For , we get
Suppose for some . It follows from (10) that
Since , we obtain which implies
By mathematical induction, we conclude that for all . Thus, for all . It follows that is bounded, and so are and .
For each , we have
By Lemma 2, we get
It follows from (7) with that
Using (12),
From the above inequality, we get
and
So, we obtain
Now, we consider two cases for the convergence of the sequence generated by Algorithm 4.
Case 1.
There exists a such that the sequence is nonincreasing. Since is bounded from below by zero, exists. Using assumption and , we get that . In order to apply Lemma 4, we need to show that
Using the fact of Lemma 2(3), we get
This implies that
It follows from the assumption and the convergence of the sequence and that . For each , we have
This implies that . Let
Since is bounded, we can choose a subsequence of such that
and for some . It follows from the condition (Z) of that .
Moreover, using , we obtain
Thus,
It implies by (15) and the fact of that . From (14), using Lemma 4, we can conclude that .
Case 2.
Suppose that the sequence is not monotonically decreasing for all large enough. Set
So, there exists a subsequence of such that for all . In this case, we define by
By Lemma 4, we obtain for all . Then,
By the same argument as in Case 1, we obtain
for all . Hence as
Similarly, we have .
Since and , we obtain
Since and , it follows that
and hence as . It implies by
that as . Using Lemma 4, we obtain as . Hence . The proof is complete. □
Now, we employ Algorithm 4 for solving Problem (1). We obtain the following result as a consequence of Theorem 1.
Theorem 2.
Let ω be strongly convex with parameter σ and continuously differentiable such that ∇ω is Lipschitz continuous with constant L_ω. Suppose that f and g satisfy the assumptions of Problem (2). Let {c_n} be a sequence of positive real numbers in (0, 2/L) such that c_n → c as n → ∞, where c ∈ (0, 2/L), and let {x_n} be a sequence generated by Algorithm 5. Then {x_n} converges strongly to the unique solution of Problem (1).
| Algorithm 5 An Inertial Bilevel Gradient Modified Picard (IBiG-MP) |
Initial. Take arbitrarily and . For . Set
Step 1. Calculate , and as follows: Then, update and return to Step 1. |
Proof.
Let and . By Lemma 1 and Remark 1, we know that satisfies the condition (Z). From Theorem 1, we get that converges to . Notice also that is a k-contraction with parameter , whenever . It remains to show that the variational inequality holds true. By using and (5), for all , we obtain
Thus, the limit point is an optimal solution of Problem (1). □
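The contraction property invoked in the proof is the standard estimate for gradient steps of a strongly convex function with Lipschitz continuous gradient; for completeness, a sketch of the bound in the form usually used in the BiG-SAM literature is the following, where σ denotes the strong convexity parameter of ω and L_ω the Lipschitz constant of ∇ω.

```latex
\[
\|(x - s\nabla\omega(x)) - (y - s\nabla\omega(y))\|^{2}
\le \Bigl(1 - \frac{2s\sigma L_{\omega}}{\sigma + L_{\omega}}\Bigr)\,\|x - y\|^{2},
\qquad 0 < s \le \frac{2}{\sigma + L_{\omega}},
\]
so that the mapping $x \mapsto x - s\nabla\omega(x)$ is a contraction with constant
$k = \sqrt{1 - 2s\sigma L_{\omega}/(\sigma + L_{\omega})} < 1$.
```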
4. Application
In this section, we employ Algorithm 5 as a machine learning algorithm for regression (fitting the graph of the sine function) and for data classification, using the model of single hidden layer feedforward neural networks (SLFNs) together with the extreme learning machine. All experiments were performed in MATLAB on an Intel Core i5 (8th generation) machine with 8 GB of RAM.
We first recall some basic facts about the extreme learning machine for regression and classification problems. We then use the proposed algorithm to solve these problems and compare its performance with that of BiG-SAM, iBiG-SAM and aiBiG-SAM.
The extreme learning machine (ELM) [20] is defined as follows. Let {(x_i, t_i) : i = 1, …, N} be a training set of N distinct samples, where x_i is an input vector and t_i is a target. A standard SLFN with M hidden nodes and activation function G is given by

o_i = Σ_{j=1}^{M} β_j G(⟨w_j, x_i⟩ + b_j),    i = 1, …, N,

where β_j is the weight vector connecting the j-th hidden node and the output nodes, w_j is the weight vector connecting the input nodes and the j-th hidden node, and b_j is the bias of the j-th hidden node. The aim of the SLFN is to fit the N targets, that is, o_i = t_i for i = 1, …, N; equivalently,

Σ_{j=1}^{M} β_j G(⟨w_j, x_i⟩ + b_j) = t_i,    i = 1, …, N.
We can rewrite the above system of linear equations in the following matrix form:

Hβ = T,    (18)

where H is the N × M hidden layer output matrix with entries H_{ij} = G(⟨w_j, x_i⟩ + b_j), β = [β_1, …, β_M]^T is the matrix of output weights and T = [t_1, …, t_N]^T is the matrix of targets.
The objective of an SLFN is to estimate w_j, b_j and β_j by solving (18), whereas ELM chooses w_j and b_j randomly and aims to find only the output weights β.
Problem (19) can be considered as the following convex minimization problem:

min_{β} { ‖Hβ − T‖²₂ + λ‖β‖₁ },

where λ > 0 is called the regularization parameter. In Algorithm 5, we set f(β) = ‖Hβ − T‖²₂ and g(β) = λ‖β‖₁. We employ Algorithm 5 to solve the convex bilevel optimization Problems (1) and (2) with this choice of inner objective, while the outer level function ω is chosen to be strongly convex as in (1).
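To make this setup concrete, the following Python sketch builds the random hidden layer of an ELM with sigmoid activation, forms the matrix H, and assembles f, g, the proximal map and the Lipschitz constant of ∇f that a forward-backward-type method such as Algorithm 5 would use; the dimensions, the number of hidden nodes and the value of λ below are placeholders, not the settings used in the experiments of this paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_hidden_layer(X, n_hidden, rng):
    """Random ELM hidden layer: H[i, j] = G(<w_j, x_i> + b_j) with sigmoid activation G."""
    n_features = X.shape[1]
    W = rng.standard_normal((n_features, n_hidden))  # random input weights w_j
    b = rng.standard_normal(n_hidden)                # random biases b_j
    return sigmoid(X @ W + b)

# Placeholder training data: 10 noisy samples of the sine function
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0 * np.pi, size=(10, 1))
T = np.sin(X) + 0.01 * rng.standard_normal((10, 1))

H = elm_hidden_layer(X, n_hidden=100, rng=rng)
lam = 1e-3                                           # placeholder regularization parameter

# Inner-problem ingredients: f(beta) = ||H beta - T||_2^2, g(beta) = lam * ||beta||_1
grad_f = lambda beta: 2.0 * H.T @ (H @ beta - T)
L_f = 2.0 * np.linalg.norm(H, 2) ** 2                # Lipschitz constant of grad f
prox_g = lambda v, c: np.sign(v) * np.maximum(np.abs(v) - c * lam, 0.0)
```

With these ingredients, the forward-backward operator appearing in the algorithms above is the map β ↦ prox_g(β − (1/L_f) grad_f(β), 1/L_f).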
4.1. Regression of a Sine Function
In our experiment on regression of the graph of the sine function, we construct a training set by randomly selecting 10 distinct data points. We use the sigmoid as our activation function. We also set the number of hidden nodes , and the regularization parameter . In Algorithm 5, we set . The Lipschitz constant of the gradient of f is computed by . The values indicated in Table 1 are used for all control settings. We evaluate the result by
Table 1.
Parameter settings of each algorithm.
Figure 1.
(a) Regression of the sine function at the 100th iteration; (b) regression of the sine function at the 500th iteration.
Table 2.
Numerical results for regression of a sine function with 500 iterations.
4.2. Data Classification
In order to classify datasets, we use four datasets from “https://www.kaggle.com/, accessed on 20 June 2020” and “https://archive.ics.uci.edu/, accessed on 20 June 2020” as follows:
Breast Cancer dataset [21]: This dataset contains 11 attributes; we classify two classes of data.
Heart Disease UCI dataset [22]: This dataset contains 14 attributes and two classes of data.
Diabetes dataset [23]: This dataset contains 9 attributes; we classify two classes of data.
Iris dataset [24]: This dataset contains 4 attributes and three classes of iris plant. We aim to classify each type of iris plant (Iris versicolour, Iris virginica and Iris setosa).
Table 3 shows the number of attributes of each dataset, together with the sizes of the training set (around of the data) and the testing set (the remainder of the data).
Table 3.
Training and testing sets of each dataset.
We set all control parameters as in Table 1 of Section 4.1, the number of hidden nodes , and the sigmoid as the activation function. Given the training set of each dataset as described in Table 3, the accuracy of the output data is calculated by

accuracy = (number of correctly predicted samples / total number of samples) × 100%.
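A small Python helper matching this definition of accuracy illustrates how the training and testing accuracies reported in Table 4 can be computed from the ELM output; the argmax decoding assumes that the targets are one-hot encoded class labels, which is an assumption on our part.

```python
import numpy as np

def accuracy(H, beta, labels):
    """Percentage of samples whose predicted class (argmax of the ELM output H @ beta)
    coincides with the true integer class label."""
    predicted = np.argmax(H @ beta, axis=1)          # decode one-hot style outputs
    return 100.0 * np.mean(predicted == labels)

# Illustrative usage: acc_train = accuracy(H_train, beta, y_train)
```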
We compare the number of iterations, training accuracy and testing accuracy of Algorithm 5 with those of the other algorithms on each dataset, as shown in Table 4.
Table 4.
The number of iterations of each algorithm at its best accuracy on each dataset.
From Table 4, Algorithm 5 achieves better accuracy than BiG-SAM, iBiG-SAM and aiBiG-SAM in all experiments conducted.
5. Conclusions
We propose a new common fixed point algorithm for a countable family of nonexpansive operators and apply it to solve some convex bilevel optimization problems. We prove a strong convergence theorem for the proposed algorithm under suitable conditions. Moreover, we apply our algorithm to solve classification and regression problems. We also present numerical experiments comparing the performance of our algorithm with existing algorithms; in these experiments, the proposed algorithm is more efficient than the existing algorithms in the literature.
Author Contributions
Conceptualization, S.S.; Formal analysis, P.T. and S.S.; Investigation, P.T.; Methodology, S.S.; Supervision, S.S.; Validation, S.S. and B.P.; Writing—original draft, P.T.; Writing—review and editing, S.S. and B.P. All authors have read and agreed to the published version of the manuscript.
Funding
NSRF program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183].
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data sharing not applicable to this article as no datasets were generated or analysed during the current study.
Acknowledgments
The authors would like to thank the referees for valuable comments and suggestions for improving this work. This research has received funding support from the NSRF program Management Unit for Human Resources & Institutional Development, Research and Innovation [grant number B05F640183] and Chiang Mai University. The first author would like to thank Science Achievement Scholarship of Thailand (SAST) for the financial support. The second author was partially supported by Chiang Mai University under Fundamental Fund 2023.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Bauschke, H.H.; Combettes, P.L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces; Springer: New York, NY, USA, 2011. [Google Scholar]
- Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
- Polyak, B. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
- Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
- Bussaban, L.; Suantai, S.; Kaewkhao, A. A parallel inertial S-iteration forward-backward algorithm for regression and classification problems. Carpathian J. Math. 2020, 36, 35–44. [Google Scholar] [CrossRef]
- Puangpee, J.; Suantai, S. A New Accelerated Viscosity Iterative Method for an Infinite Family of Nonexpansive Mappings with Applications to Image Restoration Problems. Mathematics 2020, 8, 615. [Google Scholar] [CrossRef]
- Jailoka, P.; Suantai, S.; Hanjing, A. A fast viscosity forward-backward algorithm for convex minimization problems with an application in image recovery. Carpathian J. Math. 2021, 37, 449–461. [Google Scholar] [CrossRef]
- Sabach, S.; Shtern, S. A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 2017, 27, 640–660. [Google Scholar] [CrossRef]
- Xu, H.K. Viscosity approximation methods for nonexpansive mappings. J. Math. Anal. Appl. 2004, 298, 279–291. [Google Scholar] [CrossRef]
- Shehu, Y.; Vuong, P.T.; Zemkoho, A. An inertial extrapolation method for convex simple bilevel optimization. Optim. Methods Softw. 2019, 2019, 1–20. [Google Scholar] [CrossRef]
- Duan, P.; Zhang, Y. Alternated and multi-step inertial approximation methods for solving convex bilevel optimization problems. Optimization 2022, 2022, 1–29. [Google Scholar] [CrossRef]
- Nakajo, K.; Shimoji, K.; Takahashi, W. Strong convergence to common fixed points of families of nonexpansive mappings in Banach spaces. J. Nonlinear Convex Anal. 2007, 8, 11–34. [Google Scholar]
- Aoyama, K.; Kimura, Y. Strong convergence theorems for strongly nonexpansive sequences. Appl. Math. Comput. 2011, 217, 7537–7545. [Google Scholar] [CrossRef]
- Aoyama, K.; Kohsaka, F.; Takahashi, W. Strong convergence theorems by shrinking and hybrid projection methods for relatively nonexpansive mappings in Banach spaces. In Nonlinear Analysis and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009; pp. 2–7. [Google Scholar]
- Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: New York, NY, USA, 2004. [Google Scholar]
- Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009. [Google Scholar]
- Takahashi, W. Nonlinear Functional Analysis; Yokohama Publishers: Yokohama, Japan, 2000. [Google Scholar]
- Xu, H.K. Another control condition in an iterative method for nonexpansive mappings. Bull. Aust. Math. Soc. 2002, 65, 109–113. [Google Scholar] [CrossRef]
- Mainge, P.E. Strong convergence of projected subgradient methods for nonsmooth and nonstrictly convex minimization. Set-Valued Anal. 2008, 16, 899–912. [Google Scholar] [CrossRef]
- Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
- Wolberg, W.H.; Mangasarian, O.L. Multisurface method of pattern separation for medical diagnosis applied to breast cytology. Proc. Natl. Acad. Sci. USA 1990, 87, 9193–9196. [Google Scholar] [CrossRef] [PubMed]
- Detrano, R.; Janosi, A.; Steinbrunn, W.; Pfisterer, M.; Schmid, J.J.; Sandhu, S.; Guppy, K.H.; Lee, S.; Froelicher, V. International application of a new probability algorithm for the diagnosis of coronary artery disease. Am. J. Cardiol. 1989, 64, 304–310. [Google Scholar] [CrossRef] [PubMed]
- Smith, J.W.; Everhart, J.E.; Dickson, W.C.; Knowler, W.C.; Johannes, R.S. Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. Proc. Symp. Comput. Appl. Med. Care 1998, 1998, 261–265. [Google Scholar]
- Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).