Abstract
A convex minimization problem in the form of the sum of two proper lower-semicontinuous convex functions has received much attention from the optimization community because of its broad applications in many disciplines, such as machine learning, regression and classification problems, image and signal processing, compressed sensing and optimal control. Many methods have been proposed to solve such problems, but most of them rely on the assumption that the gradient of one of the two functions is Lipschitz continuous. In this work, we introduce a new accelerated algorithm for solving the mentioned convex minimization problem by combining a linesearch technique with a viscosity inertial forward–backward algorithm (VIFBA). A strong convergence result for the proposed method is obtained under some control conditions. As applications, we apply our proposed method to solve regression and classification problems by using an extreme learning machine model. Moreover, we show that our proposed algorithm is more efficient and has better convergence behavior than some algorithms mentioned in the literature.
1. Introduction
In this work, we deal with the convex minimization problem
$$\min_{x \in H} f(x) + g(x), \qquad (1)$$
where $f, g : H \to \mathbb{R} \cup \{+\infty\}$ are proper, lower-semicontinuous convex functions and H is a Hilbert space. Many real world problems, such as signal processing, image reconstruction and compressed sensing, can be described using this model [1,2,3,4]. Moreover, data classification can also be formulated as (1); for more information about the importance and development of data classification and its methods, see [5,6,7,8]. Therefore, the convex minimization problem has a wide range of applications, some of which will be studied in this research.
If f is differentiable, then it is well known that $x^* \in H$ is a solution of (1) if and only if
$$x^* = (I + \alpha \partial g)^{-1}\big(x^* - \alpha \nabla f(x^*)\big), \qquad \alpha > 0, \qquad (2)$$
where I is the identity mapping and $\partial g$ is the subdifferential of g. In addition, if $\nabla f$ is L-Lipschitz continuous, then the classical forward–backward algorithm [9] can be used to solve (1). It is defined as follows:
$$x_{n+1} = \operatorname{prox}_{\alpha_n g}\big(x_n - \alpha_n \nabla f(x_n)\big), \qquad (3)$$
where $\alpha_n \in (0, 2/L)$ is a suitable stepsize. This method has been used extensively due to its simplicity and, as a result, has been improved in many works; see [2,10,11,12]. One well-known method that has significantly improved the convergence rate of (3) is the fast iterative shrinkage-thresholding algorithm, or FISTA, proposed by Beck and Teboulle [13] and stated as Algorithm 1.
| Algorithm 1. FISTA. |
| 1: Input: $x_0 = y_1 \in H$, $t_1 = 1$, L = Lipschitz constant of $\nabla f$, number of iterations k. |
| 2: for $n = 1$ to k do |
| 3: $x_n = \operatorname{prox}_{\frac{1}{L} g}\big(y_n - \frac{1}{L}\nabla f(y_n)\big)$, |
| 4: $t_{n+1} = \frac{1 + \sqrt{1 + 4t_n^2}}{2}$, |
| 5: $\theta_n = \frac{t_n - 1}{t_{n+1}}$, |
| 6: $y_{n+1} = x_n + \theta_n(x_n - x_{n-1})$. |
| 7: end for |
| 8: return $x_k$. |
They proved that FISTA has a better convergence rate than (3); however, a convergence theorem for this method was not given. Recently, Liang and Schönlieb [14] modified FISTA by changing the updating rule for $t_n$ and proved a weak convergence theorem for the modified method.
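To make the scheme concrete, the following is a minimal sketch of FISTA applied to a least-squares problem with $\ell_1$ regularization, i.e., $f(x) = \tfrac{1}{2}\|Ax - b\|^2$ and $g = \lambda\|\cdot\|_1$; all names and parameter values are illustrative, and the Lipschitz constant of $\nabla f$ is taken to be $\|A\|_2^2$.

```python
import numpy as np

def fista(A, b, lam, iters=500):
    """Minimal FISTA sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1."""
    L = np.linalg.norm(A, 2) ** 2                 # Lipschitz constant of grad f
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(iters):
        grad = A.T @ (A @ y - b)                  # gradient of the smooth part f
        z = y - grad / L                          # forward (gradient) step
        x_new = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox of (lam/L)*||.||_1
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)              # inertial extrapolation
        x, t = x_new, t_new
    return x
```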
In the case where H is an infinite-dimensional Hilbert space, weak convergence results may not be enough; consequently, modifications of some algorithms are needed to obtain strong convergence results. There are several ways to modify the methods for this purpose; for more information, see [15,16,17,18]. One method that caught our attention is the viscosity-based inertial forward–backward algorithm (VIFBA) proposed by Verma et al. [19], as seen in Algorithm 2.
| Algorithm 2. VIFBA. |
| 1: Input -contractive mapping F, |
| 2: for to k do |
| 3: |
| 4: , |
| 5: . |
| 6: end for |
| 7: return |
They proved strong convergence of this algorithm if the following conditions are satisfied for all $n \in \mathbb{N}$:
- A1.
- and
- A2.
- and
- A3.
- and
Note that all of the methods mentioned above require $\nabla f$ to be L-Lipschitz continuous, and such a Lipschitz constant is often difficult to find in general. Therefore, some improvements are still desirable.
Very recently, Cruz and Nghia [20] proposed a linesearch technique that eliminates the L-Lipschitz continuity assumption on $\nabla f$ and replaces it with weaker assumptions. In their work, the following conditions are needed instead:
- B1. $f, g : H \to \mathbb{R} \cup \{+\infty\}$ are proper lower semicontinuous convex functions with $\operatorname{dom} g \subseteq \operatorname{dom} f$.
- B2. f is differentiable on an open set containing $\operatorname{dom} g$, and $\nabla f$ is uniformly continuous on any bounded subset of $\operatorname{dom} g$ and maps any bounded subset of $\operatorname{dom} g$ to a bounded set in H.
The linesearch step is defined in Algorithm 3 as follows.
| Algorithm 3. Linesearch 1 $(x, \sigma, \theta, \delta)$. |
| 1: Input: $x \in \operatorname{dom} g$, $\sigma > 0$, $\theta \in (0, 1)$ and $\delta \in (0, \tfrac{1}{2})$. |
| 2: set $\alpha = \sigma$. |
| 3: while $\alpha \big\| \nabla f\big(\operatorname{prox}_{\alpha g}(x - \alpha \nabla f(x))\big) - \nabla f(x) \big\| > \delta \big\| \operatorname{prox}_{\alpha g}(x - \alpha \nabla f(x)) - x \big\|$ do |
| 4: $\alpha = \theta \alpha$. |
| 5: end while |
| 6: return $\alpha$. |
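As an illustration, the following is a minimal sketch of such a backtracking rule, assuming generic callables grad_f and prox_g (hypothetical names); the acceptance test mirrors the stopping condition of Linesearch 1 stated above.

```python
import numpy as np

def linesearch(x, grad_f, prox_g, sigma=1.0, theta=0.5, delta=0.4):
    """Backtracking step-size rule in the spirit of Linesearch 1 (sketch).
    prox_g(alpha, v) should return prox_{alpha*g}(v)."""
    alpha = sigma
    gx = grad_f(x)
    while True:
        j = prox_g(alpha, x - alpha * gx)      # forward-backward point J_alpha(x)
        if alpha * np.linalg.norm(grad_f(j) - gx) <= delta * np.linalg.norm(j - x):
            return alpha                       # step size accepted
        alpha *= theta                         # shrink and try again
```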
They also proved that Linesearch 1 stops after finitely many steps, and proposed Algorithm 4 as follows.
| Algorithm 4. |
| 1: Input: $x_1 \in \operatorname{dom} g$, $\sigma > 0$, $\theta \in (0, 1)$, $\delta \in (0, \tfrac{1}{2})$ and number of iterations k. |
| 2: for $n = 1$ to k do |
| 3: $\alpha_n = $ Linesearch 1 $(x_n, \sigma, \theta, \delta)$, |
| 4: $x_{n+1} = \operatorname{prox}_{\alpha_n g}\big(x_n - \alpha_n \nabla f(x_n)\big)$. |
| 5: end for |
| 6: return $x_{k+1}$. |
They also proved a weak convergence theorem for this algorithm. Again, weak convergence may not be enough in the context of infinite-dimensional spaces.
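For concreteness, a sketch of this forward–backward loop, reusing the hypothetical linesearch helper from the previous sketch, is given below.

```python
def forward_backward_linesearch(x0, grad_f, prox_g, iters=200,
                                sigma=1.0, theta=0.5, delta=0.4):
    """Forward-backward iteration with a backtracking step size (sketch of the
    scheme in Algorithm 4): x_{n+1} = prox_{a_n g}(x_n - a_n * grad_f(x_n))."""
    x = x0
    for _ in range(iters):
        alpha = linesearch(x, grad_f, prox_g, sigma, theta, delta)
        x = prox_g(alpha, x - alpha * grad_f(x))
    return x
```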
As we know, most work related to the convex minimization problem assumes the L-Lipschitz continuity of $\nabla f$. This restriction can be relaxed by using a linesearch technique. We are therefore motivated to establish a novel accelerated algorithm for solving the convex minimization problem (1), which employs the linesearch technique introduced by Cruz and Nghia [20] together with VIFBA [19]. The novelty of our proposed method is a suitable combination of the two methods to obtain a fast and efficient method for solving (1). We improve Algorithm 4 by adding an inertial step, which enhances the performance of the algorithm. We also prove a strong convergence theorem under weaker assumptions on the control conditions than those of VIFBA. More precisely, we eliminate assumption A2 and replace A3 with a weaker assumption. As applications, we apply our main result to solve a data classification problem and the regression of a sine function. Then we compare the performance of our algorithm with FISTA, VIFBA, and Algorithm 4.
This work is organized as follows: In Section 2, we recall some useful concepts related to the topic. In Section 3, we provide a new algorithm and prove its strong convergence to a solution of (1). In Section 4, we conduct some numerical experiments with a data classification problem and a regression of a sine function and compare the performance of each algorithm (FISTA, VIFBA, Algorithms 4 and 5). Finally, the conclusion of this work is in Section 5.
2. Preliminaries
In this section, we review some important tools which will be used in later sections. Throughout this paper, we write $x_n \to x$ and $x_n \rightharpoonup x$ for the strong and weak convergence, respectively, of a sequence $\{x_n\}$ to x.
A mapping $T : H \to H$ is said to be L-Lipschitz continuous if there exists $L > 0$ such that
$$\|Tx - Ty\| \le L\|x - y\| \quad \text{for all } x, y \in H.$$
For $h : H \to \mathbb{R} \cup \{+\infty\}$, the subdifferential of h at x is defined as follows:
$$\partial h(x) = \{u \in H : h(y) \ge h(x) + \langle u, y - x \rangle \ \text{for all } y \in H\}.$$
It is known from [21] that the subdifferential $\partial h$ of a proper lower semicontinuous convex function h is maximal monotone. Moreover, the graph of $\partial h$ is demiclosed, i.e., for any sequence $\{(x_n, y_n)\}$ in the graph of $\partial h$ such that $\{x_n\}$ converges weakly to u and $\{y_n\}$ converges strongly to v, we have $v \in \partial h(u)$.
The proximal operator $\operatorname{prox}_{\alpha g} := (I + \alpha \partial g)^{-1}$, with $\alpha > 0$, is single-valued with a full domain. Moreover, the following is satisfied:
$$\frac{x - \operatorname{prox}_{\alpha g}(x)}{\alpha} \in \partial g\big(\operatorname{prox}_{\alpha g}(x)\big) \quad \text{for all } x \in H,\ \alpha > 0.$$
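As an illustrative example (not part of the original text), when $g = \lambda\|\cdot\|_1$ the proximal operator has a well-known closed form given by componentwise soft thresholding; a minimal sketch:

```python
import numpy as np

def prox_l1(alpha, z, lam=1.0):
    """prox_{alpha*lam*||.||_1}(z): componentwise soft thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - alpha * lam, 0.0)
```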
The following lemmas are crucial for the main results.
Lemma 1
([22]). Let be two proper lower semicontinuous convex functions with and Then for any and we have
Lemma 2
([23]). Let H be a real Hilbert space. Then the following hold for all $x, y \in H$ and $t \in [0, 1]$:
- (i) $\|x + y\|^2 = \|x\|^2 + 2\langle x, y \rangle + \|y\|^2$;
- (ii) $\|x + y\|^2 \le \|x\|^2 + 2\langle y, x + y \rangle$;
- (iii) $\|tx + (1 - t)y\|^2 = t\|x\|^2 + (1 - t)\|y\|^2 - t(1 - t)\|x - y\|^2$.
Lemma 3
([24]). Let $\{a_n\}$ be a sequence of real numbers such that there exists a subsequence $\{a_{n_j}\}$ of $\{a_n\}$ with $a_{n_j} < a_{n_j + 1}$ for all $j \in \mathbb{N}$. Then there exists a nondecreasing sequence $\{m_k\}$ of $\mathbb{N}$ such that $m_k \to \infty$, and for all sufficiently large $k \in \mathbb{N}$ the following holds:
$$a_{m_k} \le a_{m_k + 1} \quad \text{and} \quad a_k \le a_{m_k + 1}.$$
Lemma 4
([25]). Let $\{a_n\}$ be a sequence of nonnegative real numbers, $\{b_n\}$ a sequence in $(0, 1)$ with $\sum_{n=1}^{\infty} b_n = \infty$, $\{c_n\}$ a sequence of nonnegative real numbers with $\sum_{n=1}^{\infty} c_n < \infty$, and $\{d_n\}$ a sequence of real numbers with $\limsup_{n \to \infty} d_n \le 0$. Suppose that
$$a_{n+1} \le (1 - b_n)a_n + b_n d_n + c_n$$
for all $n \in \mathbb{N}$. Then $\lim_{n \to \infty} a_n = 0$.
3. Main Results
In this section, we assume that the set of solutions of (1) is nonempty; it is known that this set is closed and convex. We propose a new algorithm by combining a linesearch technique (Linesearch 1) with VIFBA, as seen in Algorithm 5. A diagram of this algorithm can be seen in Figure 1.
| Algorithm 5: |
| 1: Input -contractive mapping number of iterations. |
| 2: for to k do |
| 3: |
| 4: , |
| 5: Linesearch 1, |
| 6: . |
| 7: end for |
| 8: return |
Figure 1.
Diagram of Algorithm 5.
We prove a strong convergence result of Algorithm 5 in Theorem 1 as follows.
Theorem 1.
Let H be a Hilbert space, let $g : H \to \mathbb{R} \cup \{+\infty\}$ be proper lower-semicontinuous convex, and let f be proper convex differentiable with $\nabla f$ being uniformly continuous on any bounded subset of H. Suppose the following holds:
- C1.
- C2.
Then the sequence generated by Algorithm 5 converges strongly to a solution of (1).
Proof.
Since the set of solutions of (1) is closed and convex, the composition of the metric projection onto it with F is a contraction and therefore has a unique fixed point. Let ; by the definition of and , we obtain the following, for all :
for some The following also holds:
Next, we prove the following
Hence, by using the above inequalities and the definition of , we have
Consequently,
Since
Furthermore, it follows from that
Inductively, we have and hence is bounded. Furthermore, by using (5) and (6), and are also bounded. To show the convergence of we divide the proof into two cases.
Case 1 There exists such that for all . So for some From (5) and (6), and the fact that
we have Using (9), we have Since is bounded, there exists a subsequence of such that for some and the following holds:
We claim that In order to prove this, we need to consider two cases of The first case, if for finitely many Then, without loss of generality, we can assume that for all . From the definition of we have
The uniform continuity of implies that We know that
Since is demiclosed, we can have that and hence
The second case, there exists a subsequence of such that for all Let = and From the definition of , we have
Moreover, from Lemma 1 we have
which implies that . Since is uniformly continuous, we also have as Combining this with (10), we obtain as Again, we know that
The demiclosedness of implies that and hence Therefore
Hence,
We set and in Lemma 4. Since and we have Consequently, Lemma 4 is applicable and hence that is converges strongly to
Case 2 There exists a subsequence of such that for all From Lemma 3, there exists a nondecreasing sequence of such that and the following holds, for all sufficiently large
So, as Since is bounded, there exists a subsequence such that for some and
Using the same argument as in case 1, we have that and
Moreover, it follows from (11) that
Consequently, Hence,
Thus, we can conclude that converges strongly to , and the proof is complete. □
Remark 1.
We observe that we can prove our main result, Theorem 1, without the condition and use the weaker condition instead of while VIFBA requires all of these conditions.
4. Applications to Data Classification and Regression Problems
As mentioned in the literature, many real world problems can be formulated in the form of a convex minimization problem. So, in this section, we illustrate the reformulation process of some problems in machine learning, namely classification and regression problems, into a convex minimization problem, and apply our proposed algorithm to solve such problems. We also show that our proposed method is more efficient than some methods mentioned in the literature.
First, we give a brief concept of extreme learning machine for data classification and regression problems, then we apply our main result to solve these two problems by conducting some numerical experiments. We also compare the performance of FISTA, VIFBA, Algorithms 4 and 5.
Extreme learning machine (ELM). Let $S = \{(x_j, t_j) : j = 1, \dots, N\}$ be a training set of N distinct samples, where $x_j$ is an input datum and $t_j$ is a target. For any single hidden layer of ELM, the output of the i-th hidden node is
$$h_i(x) = G(a_i, b_i, x),$$
where G is an activation function and $a_i, b_i$ are the parameters of the i-th hidden node. The output function of ELM for SLFNs with M hidden nodes is
$$o_j = \sum_{i=1}^{M} \beta_i G(a_i, b_i, x_j),$$
where $\beta_i$ is the output weight of the i-th hidden node. The hidden layer output matrix $\mathbf{H}$ is defined as follows:
$$\mathbf{H} = \begin{bmatrix} G(a_1, b_1, x_1) & \cdots & G(a_M, b_M, x_1) \\ \vdots & \ddots & \vdots \\ G(a_1, b_1, x_N) & \cdots & G(a_M, b_M, x_N) \end{bmatrix}.$$
The main goal of ELM is to find $\beta = [\beta_1, \dots, \beta_M]^T$ such that $\mathbf{H}\beta = \mathbf{T}$, where $\mathbf{T} = [t_1, \dots, t_N]^T$ is the target of the training data. In some cases, finding $\beta = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$, may be a difficult task, for instance when the inverse of $\mathbf{H}$ does not exist. Thus, finding such a solution by means of convex minimization can overcome this difficulty.
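A minimal sketch of how the hidden layer output matrix could be formed with a sigmoid activation and randomly drawn hidden-node parameters; all names, shapes and the random initialization are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def hidden_layer_matrix(X, M, seed=0):
    """Build the N x M hidden layer output matrix of an ELM with sigmoid
    activation; the hidden-node parameters (a_i, b_i) are drawn at random
    and then kept fixed, as in ELM."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], M))    # input weights a_i
    b = rng.standard_normal(M)                  # biases b_i
    return 1.0 / (1.0 + np.exp(-(X @ A + b)))   # sigmoid of (a_i . x_j + b_i)
```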
In this section, we conduct some experiments on regression and classification problems, which are formulated as the following convex minimization problem:
$$\min_{\beta} \ \|\mathbf{H}\beta - \mathbf{T}\|_2^2 + \lambda \|\beta\|_1,$$
where $\lambda$ is a regularization parameter. This problem is called the least absolute shrinkage and selection operator (LASSO) [26]. In this case, $f(\beta) = \|\mathbf{H}\beta - \mathbf{T}\|_2^2$ and $g(\beta) = \lambda\|\beta\|_1$. We note that, in our experiments, FISTA and VIFBA can be used to solve the problems, since the L-Lipschitz constants of the problems exist. However, FISTA and VIFBA fail to solve problems in which L-Lipschitz constants do not exist, while Algorithms 4 and 5 succeed.
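Putting the pieces together, the ELM output weights could be trained by feeding this LASSO splitting to any of the solver sketches above; the snippet below reuses the hypothetical hidden_layer_matrix and forward_backward_linesearch helpers, and all data and parameter values are illustrative only.

```python
import numpy as np

# Illustrative data: a handful of sample points of a sine curve.
x_train = np.linspace(-np.pi, np.pi, 10).reshape(-1, 1)
T = np.sin(x_train).ravel()

H_mat = hidden_layer_matrix(x_train, M=100)     # hidden layer output matrix
lam = 1e-3                                      # regularization parameter (assumed value)

# LASSO splitting: f(beta) = ||H beta - T||^2, g(beta) = lam * ||beta||_1.
grad_f = lambda beta: 2.0 * H_mat.T @ (H_mat @ beta - T)
prox_g = lambda a, z: np.sign(z) * np.maximum(np.abs(z) - a * lam, 0.0)

beta = forward_backward_linesearch(np.zeros(H_mat.shape[1]), grad_f, prox_g, iters=500)
prediction = H_mat @ beta                       # ELM output on the training inputs
```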
4.1. Regression of a Sine Function
Throughout Section 4.1 and Section 4.2, all parameters are chosen to satisfy the hypotheses of Theorem 1. All experiments were performed on an Intel Core i5-7500 CPU with 16 GB RAM and a GeForce GTX 1060 6 GB GPU.
As seen in Table 1, we randomly create 10 distinct points with values between , then we create the training set using the graph of a sine function on as the target. The activation function is sigmoid, the number of hidden nodes is and the regularization parameter is We use FISTA, VIFBA, Algorithms 4 and 5 to predict a sine function with the 10 training points.
Table 1.
Detail about the regression of a sine function experiment.
The first experiment compares the performance of Algorithm 5 with different c-contractive mappings F, so that we can observe whether F affects the performance of Algorithm 5. We use the mean squared error (MSE) as a measure, defined as follows:
$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N} (t_i - o_i)^2,$$
where $t_i$ is the target and $o_i$ is the output of the model.
By setting and the inertial parameter and MSE as the stopping criteria, we obtain the results as seen in Table 2.
Table 2.
Numerical results of c-contractive mapping.
We observe that Algorithm 5 performs better when c is closer to 1.
In the second experiment, we compare the performance of Algorithm 5 with different inertial parameters in Theorem 1, namely
It can be shown that and satisfy C2. By setting and MSE as the stopping criteria, we obtain the results, as seen in Table 3.
Table 3.
Numerical results of each inertial parameter.
We can clearly see that significantly improves the performance of Algorithm 5. Although converges to 0 as , we observe that the behavior of is different from and at the first few steps of the iteration, i.e., is extremely close to 1, while and are far away from 1. Based on this experiment, we choose as our default inertial parameter for later experiments.
In the third experiment, we compare the performance of FISTA, VIFBA, and Algorithms 4 and 5. As shown in Table 4, we set the following parameters for each algorithm:
Table 4.
Chosen parameters of each algorithm.
By setting MSE as the stopping criteria, we obtain the results, as seen in Table 5.
Table 5.
Numerical results of a regression of a sine function with the stopping criteria.
We observe that Algorithm 5 takes only 129 iterations, while FISTA, VIFBA and Algorithm 4 take a higher number of iterations, and that Algorithm 5 requires less training time than Algorithm 4.
Next, we compare each algorithm at the 3000th iteration with different kinds of measures, namely the mean absolute error (MAE) and the root mean squared error (RMSE), defined as follows:
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} |t_i - o_i|, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (t_i - o_i)^2}.$$
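These error measures are straightforward to compute; a trivial sketch, where t and o denote the target and output vectors:

```python
import numpy as np

def mse(t, o):
    return np.mean((t - o) ** 2)            # mean squared error

def mae(t, o):
    return np.mean(np.abs(t - o))           # mean absolute error

def rmse(t, o):
    return np.sqrt(np.mean((t - o) ** 2))   # root mean squared error
```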
The results can be seen in Table 6.
Table 6.
Numerical results of a regression of a sine function at the 3000th iteration.
We observe from Table 6 that Algorithm 5 has the lowest MAE and RMSE but takes the longest training time. In Figure 2, we observe that Algorithm 5 outperforms the other algorithms in the regression of the graph of a sine function when the number of iterations is small. In Figure 3, it is shown that Algorithm 5, FISTA and VIFBA perform better in the regression of the graph than Algorithm 4 when the number of iterations is higher.
Figure 2.
A regression of a sine function at the 130th iteration.
Figure 3.
A regression of a sine function at the 3000th iteration.
4.2. Data Classification
In this experiment, we classify the type of Iris plants from the Iris dataset created by Fisher [27]. As shown in Table 7, this dataset contains 3 classes of 50 instances each, and each sample has four attributes.
Table 7.
Iris dataset.
We also would like to thank https://archive.ics.uci.edu for providing the dataset.
With this dataset, we set the sigmoid as the activation function, the number of hidden nodes and the regularization parameter We use FISTA, VIFBA, Algorithms 4 and 5 as the training algorithms to estimate the optimal output weight $\beta$. The output data O of the training and testing data are obtained by $O = \mathbf{H}\beta$; see Table 8 for more detail.
Table 8.
Details about the classification of Iris dataset experiment.
In the first experiment, we use the first 35 instances of each class as training data and the last 15 of each class as testing data, see Table 9 for detail.
Table 9.
Training and testing sets of the Iris dataset.
The accuracy of the output data is calculated by:
$$\mathrm{accuracy} = \frac{\text{number of correctly predicted data}}{\text{number of all data}} \times 100\%.$$
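A sketch of how the predicted class and this accuracy could be computed from the ELM output matrix, under the assumption that the targets are one-hot encoded (the paper's exact encoding is given in Table 8):

```python
import numpy as np

def accuracy(O, T_onehot):
    """Percentage of correctly predicted samples: the predicted class of each
    sample is the index of the largest entry in its output row."""
    pred = np.argmax(O, axis=1)
    true = np.argmax(T_onehot, axis=1)
    return 100.0 * np.mean(pred == true)
```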
To compare the performance of FISTA, VIFBA, Algorithms 4 and 5, we choose parameters for each algorithm the same as in Table 4.
We first compare the accuracy of each method at the 700th iteration, and obtain the following results, as seen in Table 10.
Table 10.
The performance of each algorithm at the 700th iteration.
As we see from Table 10, Algorithm 5 obtains the highest accuracy at the 700th iteration. Here and below, we use acc.train and acc.test for the accuracy on the training data set and the testing data set, respectively.
Next we compare each method with the stopping criteria as acc.train > 90 and acc.test > 90, the results can be seen in Table 11.
Table 11.
The performance of each algorithm with the stopping criteria.
We observe from Table 11 that Algorithm 5 performs better than Algorithm 4.
In the next experiment, we use 10-fold stratified cross-validation to set up the training and testing data, see Table 12 for detail.
Table 12.
Training and testing sets for 10-fold stratified cross-validation.
We also use the average accuracy (Average ACC) and the error percentage (ERR) to evaluate the performance of each algorithm:
$$\text{Average ACC} = \frac{1}{N}\sum_{i=1}^{N} \frac{\text{number of correctly predicted data at fold } i}{\text{number of all data at fold } i} \times 100\%,$$
where N is the number of sets considered during cross-validation ($N = 10$).
Let $\mathrm{err}_{\mathrm{train}}$ be the sum of errors in all 10 training sets, $\mathrm{err}_{\mathrm{test}}$ the sum of errors in all 10 testing sets, $\mathrm{total}_{\mathrm{train}}$ the sum of all data in the 10 training sets and $\mathrm{total}_{\mathrm{test}}$ the sum of all data in the 10 testing sets. Define
$$\mathrm{ERR}_{\mathrm{train}} = \frac{\mathrm{err}_{\mathrm{train}}}{\mathrm{total}_{\mathrm{train}}} \times 100\%, \qquad \mathrm{ERR}_{\mathrm{test}} = \frac{\mathrm{err}_{\mathrm{test}}}{\mathrm{total}_{\mathrm{test}}} \times 100\%.$$
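A sketch of how the 10-fold stratified cross-validation and the averaged accuracy could be set up, using scikit-learn's StratifiedKFold as one possible tool (an assumption about tooling; any equivalent stratified split would do):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def average_accuracy(X, y, train_and_predict, n_splits=10):
    """Average ACC over an n_splits-fold stratified cross-validation.
    train_and_predict(X_tr, y_tr, X_te) is any routine returning predicted
    labels for X_te (e.g., an ELM whose output weights are trained by one of
    the algorithms above)."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    accs = []
    for tr_idx, te_idx in skf.split(X, y):
        pred = train_and_predict(X[tr_idx], y[tr_idx], X[te_idx])
        accs.append(100.0 * np.mean(pred == y[te_idx]))
    return float(np.mean(accs))
```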
We choose the same parameters as in Table 4. We compare the accuracy at the 1000th iteration of each fold, and obtain the following results, as seen in Table 13.
Table 13.
The performance of each algorithm at the 1000th iteration with a 10-fold stratified cross-validation.
We observe from Table 13 that Algorithm 5 has higher average accuracy than Algorithm 4.
5. Conclusions
In this work, algorithms for solving the convex minimization problem (1) are studied. Many effective algorithms for solving this problem have been proposed, but most of them require a Lipschitz continuity assumption on $\nabla f$. By combining the linesearch technique introduced by Cruz and Nghia [20] with the iterative method VIFBA of Verma et al. [19], we establish a new algorithm that does not require the Lipschitz continuity of $\nabla f$. As a result, it can be applied to problems in which Lipschitz constants do not exist, while VIFBA and FISTA cannot. Moreover, thanks to the viscosity approximation together with the inertial technique, our proposed algorithm has better convergence behavior than Algorithm 4. Strong convergence of our proposed method is also proven under control conditions that are weaker than those of VIFBA.
Our algorithm can be used to solve many real world problems, such as image and signal processing and machine learning, especially regression and classification problems. To compare the performance of FISTA, VIFBA, Algorithm 4 and our proposed algorithm (Algorithm 5), we conducted numerical experiments on the latter problems. We observe from these experiments that Algorithms 4 and 5 take longer computational time than FISTA and VIFBA at the same number of iterations, because the linesearch step (Linesearch 1) takes a long time to compute. In the experiments with stopping criteria (Table 5 and Table 11), Algorithm 5 converges to a solution within a lower number of iterations than Algorithm 4 and hence performs better in terms of speed. We also observe that Algorithm 5 performs decently in terms of accuracy, especially when compared with Algorithm 4.
For our future research, since FISTA performs better than Algorithm 5 in terms of speed, in order to compete with FISTA, we aim to find a new linesearch technique that takes less computational time than Linesearch 1 and hence decreases the computational time of Algorithm 5.
Author Contributions
Writing—review and editing, W.I.; supervision, S.S.; writing—original draft preparation, P.S.; software, D.C. All authors have read and agreed to the published version of the manuscript.
Funding
This work was funded by Chiang Mai University, Chiang Mai, Thailand.
Acknowledgments
This work was supported by Chiang Mai University, Chiang Mai, Thailand.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Byrne, C. Iterative oblique projection onto convex subsets and the split feasibility problem. Inverse Probl. 2002, 18, 441–453. [Google Scholar] [CrossRef]
- Combettes, P.L.; Wajs, V. Signal recovery by proximal forward–backward splitting. Multiscale Model. Simul. 2005, 4, 1168–1200. [Google Scholar] [CrossRef]
- Combettes, P.L.; Pesquet, J.C. Proximal Splitting Methods in Signal Processing. In Fixed-Point Algorithms for Inverse Problems in Science and Engineering; Bauschke, H., Burachik, R., Combettes, P., Elser, V., Luke, D., Wolkowicz, H., Eds.; Springer: New York, NY, USA, 2011; pp. 185–212. [Google Scholar]
- Cholamjiak, P.; Shehu, Y. Inertial forward-backward splitting method in Banach spaces with application to compressed sensing. Appl. Math. 2019, 64, 409–435. [Google Scholar] [CrossRef]
- Szaleniec, M.; Tadeusiewicz, R.; Witkoa, M. How to select an optimal neural model of chemical reactivity? Neurocomputing 2008, 72, 241–256. [Google Scholar] [CrossRef]
- Szaleniec, J.; Wiatr, M.; Szaleniec, M.; Składzień, J.; Tomik, J.; Oleś, K.; Tadeusiewicz, R. Artificial neural network modelling of the results of tympanoplasty in chronic suppurative otitis media patients. Comput. Biol. Med. 2013, 43, 16–22. [Google Scholar] [CrossRef]
- Pławiak, P.; Abdar, M.; Acharya, U.R. Application of new deep genetic cascade ensemble of SVM classifiers to predict the Australian credit scoring. Appl. Soft Comput. 2019, 84, 105740. [Google Scholar] [CrossRef]
- Pławiak, P.; Abdar, M.; Pławiak, J.; Makarenkov, V.; Acharya, U.R. DGHNL: A new deep genetic hierarchical network of learners for prediction of credit scoring. Inf. Sci. 2020, 516, 401–418. [Google Scholar] [CrossRef]
- Lions, P.L.; Mercier, B. Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 1979, 16, 964–979. [Google Scholar] [CrossRef]
- Bussaban, L.; Suantai, S.; Kaewkhao, A. A parallel inertial S-iteration forward-backward algorithm for regression and classification problems. Carpathian J. Math. 2020, 36, 21–30. [Google Scholar]
- Moudafi, A.; Oliny, M. Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 2003, 155, 447–454. [Google Scholar] [CrossRef]
- Verma, M.; Shukla, K.K. A new accelerated proximal gradient technique for regularized multitask learning framework. Pattern Recogn. Lett. 2017, 95, 98–103. [Google Scholar] [CrossRef]
- Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2009, 2, 183–202. [Google Scholar] [CrossRef]
- Liang, J.; Schonlieb, C.B. Improving fista: Faster, smarter and greedier. arXiv 2018, arXiv:1811.01430. [Google Scholar]
- Moudafi, A. Viscosity approximation method for fixed-points problems. J. Math. Anal. Appl. 2000, 241, 46–55. [Google Scholar] [CrossRef]
- Nakajo, K.; Takahashi, W. Strong convergence theorems for nonexpansive mappings and nonexpansive semigroups. J. Math. Anal. Appl. 2003, 279, 372–379. [Google Scholar] [CrossRef]
- Takahashi, W.; Zembayashi, K. Strong Convergence Theorem by a New Hybrid Method for Equilibrium Problems and Relatively Nonexpansive Mappings. Fixed Point Theory Appl. 2008, 2008, 528476. [Google Scholar] [CrossRef]
- Halpern, B. Fixed points of nonexpansive maps. Bull. Am. Math. Soc. 1967, 73, 957–961. [Google Scholar] [CrossRef]
- Verma, M.; Sahu, D.R.; Shukla, K.K. VAGA: A novel viscosity-based accelerated gradient algorithm. Appl. Intell. 2018, 48, 2613–2627. [Google Scholar] [CrossRef]
- Bello Cruz, J.Y.; Nghia, T.T. On the convergence of the forward-backward splitting method with linesearches. Optim. Methods Softw. 2016, 31, 1209–1238. [Google Scholar] [CrossRef]
- Burachik, R.S.; Iusem, A.N. Set-Valued Mappings and Enlargements of Monotone Operators; Springer: Berlin, Germany, 2008. [Google Scholar]
- Huang, Y.; Dong, Y. New properties of forward-backward splitting and a practical proximal-descent algorithm. Appl. Math. Comput. 2014, 237, 60–68. [Google Scholar] [CrossRef]
- Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009. [Google Scholar]
- Mainge, P.E. Strong convergence of projected subgradient methods for nonsmooth and nonstrictly convex minimization. Set-Valued Anal. 2008, 16, 899–912. [Google Scholar] [CrossRef]
- Xu, H.K. Another control condition in an iterative method for nonexpansive mappings. Bull. Austral. Math. Soc. 2002, 65, 109–113. [Google Scholar] [CrossRef]
- Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B Methodol. 1996, 58, 267–288. [Google Scholar] [CrossRef]
- Fisher, R.A. The use of multiple measurements in taxonomic problems. Ann. Eugen. 1936, 7, 179–188. [Google Scholar] [CrossRef]
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).