Abstract
This paper proposes a new accelerated fixed-point algorithm based on a double-inertial extrapolation technique for solving structured variational inclusion and convex bilevel optimization problems. The underlying framework leverages fixed-point theory and operator splitting methods to address inclusion problems of the form $0 \in Ax + Bx$, where A is a cocoercive operator and B is a maximally monotone operator defined on a real Hilbert space. The algorithm incorporates two inertial terms and a relaxation step via a contractive mapping, resulting in improved convergence properties and numerical stability. Under mild conditions on the step sizes and inertial parameters, we establish strong convergence of the proposed algorithm to a point in the solution set that satisfies a variational inequality with respect to a contractive mapping. Beyond the theoretical development, we demonstrate the practical effectiveness of the proposed algorithm by applying it to data classification tasks using Deep Extreme Learning Machines (DELMs). In particular, the training processes of Two-Hidden-Layer ELM (TELM) models are reformulated as convex regularized optimization problems, enabling robust learning without requiring direct matrix inversions. Experimental results on benchmark and real-world medical datasets, including breast cancer and hypertension prediction, confirm the superior performance of our approach in terms of evaluation metrics and convergence. This work unifies and extends existing inertial-type forward–backward schemes, offering a versatile and theoretically grounded optimization tool for both fundamental research and practical applications in machine learning and data science.
Keywords:
variational inclusion; bilevel optimization; accelerated algorithm; data classification; Two-Hidden-Layer ELM (TELM)
MSC:
47H10; 65K10; 90C25
1. Introduction
The convex bilevel optimization problem plays an important role in real-world applications such as image and signal processing, data classification, medical imaging, machine learning, and so on. Recently, deep learning has become an important tool in many areas, such as image classification, speech recognition, and medical data analysis.
The convex bilevel optimization problem consists of the following two levels.
The outer-level problem is
$\min_{x \in \Omega} \ \omega(x), \qquad (1)$
where $\omega$ is a strongly convex and differentiable function over a real Hilbert space $\mathcal{H}$ and $\Omega$ is the set of solutions to the inner-level problem:
$\min_{x \in \mathcal{H}} \ f(x) + g(x), \qquad (2)$
where f is convex and differentiable and $g \in \Gamma_0(\mathcal{H})$, the class of proper, lower semicontinuous, convex functions. The implicit nature of the constraint set $\Omega$ makes the bilevel problem particularly challenging and well-suited for operator-theoretic approaches.
Various algorithms have been developed to solve Problems (1) and (2). Among these, the Bilevel Gradient Sequential Averaging Method (BiG-SAM, Algorithm 1) was proposed by Sabach and Shtern [] as follows:
| Algorithm 1 BiG-SAM |
They showed that the sequence generated by Algorithm 1 converges to a point in the set of all solutions to Problem (1).
The inertial technique for accelerating the convergence behavior of the algorithms was first proposed by Polyak []. Since then, this technique has been continuously employed in various algorithms.
Shehu et al. [] designed the inertial Bilevel Gradient Sequential Averaging Method (iBiG-SAM, Algorithm 2) as an extension of Algorithm 1, incorporating an inertial technique to improve its convergence rate.
| Algorithm 2 iBiG-SAM |
They further demonstrated that the sequence converges to some when the control sequence satisfies the following criteria:
To further improve the convergence performance of Algorithm 2, Duan and Zhang [] developed three related methods; namely, the alternated inertial Bilevel Gradient Sequential Averaging Method (aiBiG-SAM, Algorithm 3), multi-step inertial Bilevel Gradient Sequential Averaging Method (miBiG-SAM, Algorithm 4), and multi-step alternative inertial Bilevel Gradient Sequential Averaging Method (amiBiG-SAM, Algorithm 5), which were defined as follows:
| Algorithm 3 aiBiG-SAM |
| Algorithm 4 miBiG-SAM |
| Algorithm 5 amiBiG-SAM |
The convergence analysis revealed that Algorithms 3–5 achieve better performance than Algorithms 1 and 2 (see more details in Duan and Zhang []).
We note that Algorithms 1–5 were developed based on fixed-point techniques. Subsequently, viscosity approximation methods combined with the fixed-point method and inertial technique were employed to develop accelerated algorithms for solving convex bilevel optimization problems (see [,,]).
The convex minimization problem (2) is one of the most fundamental and crucial problems in applied mathematics, medical imaging, data science, data classification, and computer science.
It is well known that $x^*$ is a solution to Problem (2) if, and only if,
$0 \in \nabla f(x^*) + \partial g(x^*).$
This leads to the more general framework of variational inclusion problems, which unifies many classes of problems by seeking a point $x^* \in \mathcal{H}$, such that
$0 \in Ax^* + Bx^*, \qquad (5)$
where $A:\mathcal{H}\to\mathcal{H}$ is a Lipschitz continuous mapping and $B:\mathcal{H}\to 2^{\mathcal{H}}$ is a maximally monotone operator. The solution set of Problem (5) is denoted by $(A+B)^{-1}(0)$. Variational inclusion problems generalize fixed-point problems, monotone equations, and variational inequalities, and provide a flexible structure for handling nonsmooth terms and constraints.
The variational inclusion Problem (5) can be reformulated as the fixed-point equation
$x^* = J_{\lambda}^{B}(I - \lambda A)x^*,$
where $J_{\lambda}^{B} := (I + \lambda B)^{-1}$ is the resolvent of B, for some $\lambda > 0$.
Over the years, various iterative schemes have been proposed for solving the variational inclusion Problem (5). A well-known and extensively studied one is the forward–backward method (FBM), defined by
$x_{n+1} = J_{\lambda}^{B}(x_n - \lambda A x_n), \quad n \ge 1,$
where $\lambda$ is a positive step size.
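To make the scheme concrete, the following is a minimal sketch of the forward–backward iteration for a LASSO instance of Problem (5), taking $A = \nabla\big(\tfrac{1}{2}\|Mx-b\|^2\big)$ and $J_{\lambda}^{B}$ as the soft-thresholding resolvent of $\lambda\,\partial\|\cdot\|_1$; the matrix M, vector b, and step-size choice below are illustrative assumptions, not part of the paper's own algorithm.

```python
import numpy as np

def soft_threshold(z, tau):
    """Resolvent of tau * d||.||_1: componentwise soft-thresholding."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def forward_backward(M, b, reg, step, n_iter=500):
    """Forward-backward iterations for min_x 0.5*||Mx - b||^2 + reg*||x||_1.
    Forward step: gradient of the smooth part (A x = M^T (Mx - b));
    backward step: resolvent of the nonsmooth part (soft-thresholding)."""
    x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        grad = M.T @ (M @ x - b)                         # forward (explicit) step
        x = soft_threshold(x - step * grad, step * reg)  # backward (resolvent) step
    return x

# A is (1/||M||^2)-cocoercive here, so any step in (0, 2/||M||^2) is admissible.
rng = np.random.default_rng(0)
M, b = rng.standard_normal((50, 20)), rng.standard_normal(50)
x_hat = forward_backward(M, b, reg=0.1, step=1.0 / np.linalg.norm(M, 2) ** 2)
```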
To improve the performance of the classical forward–backward method in solving monotone inclusion problems, Moudafi and Oliny (2003) [] proposed an inertial variant known as the Inertial Forward–Backward Algorithm (IFBA). This method incorporates a momentum-like term inspired by inertial techniques and is designed for finding a zero of the sum of two monotone operators. The iterative scheme was given by
$x_{n+1} = J_{\lambda}^{B}\big(x_n + \theta_n(x_n - x_{n-1}) - \lambda A x_n\big), \quad n \ge 1,$
where $0 < \lambda < 2/L$ and L denotes the Lipschitz constant of the monotone operator A. The inclusion of the extrapolation parameter $\theta_n$ aims to accelerate the convergence rate of the proposed algorithm. Under suitable assumptions on $\theta_n$, they proved that the generated sequence converges weakly to a solution of the inclusion problem.
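Under the same illustrative LASSO setting as above (and reusing soft_threshold from the previous sketch), the inertial step can be sketched as follows; the constant extrapolation weight theta is an assumption for illustration only, not the parameter rule analyzed in [].

```python
import numpy as np

def inertial_forward_backward(M, b, reg, step, theta=0.3, n_iter=500):
    """IFBA-style sketch: x_{n+1} = J(x_n + theta*(x_n - x_{n-1}) - step*A x_n),
    with A x = M^T (Mx - b) and J the soft-thresholding resolvent."""
    x_prev = x = np.zeros(M.shape[1])
    for _ in range(n_iter):
        grad = M.T @ (M @ x - b)                      # forward evaluation of A at x_n
        z = x + theta * (x - x_prev) - step * grad    # inertial extrapolation + forward step
        x_prev, x = x, soft_threshold(z, step * reg)  # backward (resolvent) step
    return x
```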
Recently, Peeyada [] proposed the Inertial Mann Forward–Backward Splitting Algorithm (IMFBSA, Algorithm 6) as a refined approach that combines Mann-type iterations with inertial extrapolation, as follows:
| Algorithm 6 IMFBSA |
Moreover, several authors have developed algorithms based on multi-inertial forward–backward schemes for solving variational inclusion problems, which ensure convergence and demonstrate efficiency in practical applications such as image deblurring (see [,]). These developments illustrate the increasing interest in designing inertial-type iterative methods for monotone inclusion and convex optimization problems.
Building upon and inspired by the above-mentioned studies, in this work we propose a new accelerated fixed-point algorithm for solving variational inclusion and convex bilevel optimization problems, prove its strong convergence, and compare its effectiveness in data classification with that of existing algorithms.
The structure of this paper is as follows. In Section 2, we introduce some fundamental definitions and key lemmas used in the later sections. The main theoretical contributions of our study are presented in Section 3. In Section 4, we apply the proposed algorithm to data classification problems using breast cancer and hypertension datasets, and compare its performance with other existing methods. Finally, the conclusion of our work is given in Section 5.
2. Preliminaries
Throughout this work, let $\mathcal{H}$ be a real Hilbert space and $T:\mathcal{H}\to\mathcal{H}$ a mapping. We use $\to$ for strong convergence and $\rightharpoonup$ for weak convergence.
Definition 1.
A mapping $T:\mathcal{H}\to\mathcal{H}$ is called Lipschitzian if
$\|Tx - Ty\| \le k\|x - y\| \quad \text{for all } x, y \in \mathcal{H},$
for some $k > 0$. If $k \in [0, 1)$, then T is a k-contraction, and if $k = 1$, then T is nonexpansive.
Definition 2.
Let $T:\mathcal{H}\to\mathcal{H}$ and let $\beta > 0$. Then, T is β-cocoercive if $\beta T$ is firmly nonexpansive, i.e.,
$\langle Tx - Ty, x - y\rangle \ge \beta\|Tx - Ty\|^2 \quad \text{for all } x, y \in \mathcal{H}.$
Definition 3.
Let $x \in \mathcal{H}$ and let $C \subseteq \mathcal{H}$ be closed and convex. Then, there is a unique point $\hat{x} \in C$, such that
$\|x - \hat{x}\| \le \|x - y\| \quad \text{for all } y \in C.$
The mapping $P_C:\mathcal{H}\to C$, defined by $P_C x = \hat{x}$, is called the metric projection onto C.
We conclude this section with several auxiliary lemmas and propositions essential for supporting the main results.
Lemma 1
([]). Let and . Then, the following identities and inequality hold:
- (1)
- (2)
- (3)
Proposition 1.
Let be convex and let . Then,
Proposition 2
([]). Suppose $\omega$ is strongly convex with parameter $\sigma$ and continuously differentiable, such that $\nabla\omega$ is Lipschitz continuous with constant $L_\omega$. Then, the mapping $I - s\nabla\omega$ is a k-contraction for all $s \in \big(0, \tfrac{2}{\sigma + L_\omega}\big]$, where $k = \sqrt{1 - \tfrac{2s\sigma L_\omega}{\sigma + L_\omega}}$ and I is the identity mapping.
Lemma 2
([]). Let $T:\mathcal{H}\to\mathcal{H}$ be a nonexpansive mapping with $F(T) \neq \emptyset$. If there exists a sequence $\{x_n\}$ in $\mathcal{H}$ such that $x_n \rightharpoonup z$ and $\|x_n - Tx_n\| \to 0$, then $z \in F(T)$.
Lemma 3
([]). Let $A:\mathcal{H}\to\mathcal{H}$ be a β-cocoercive mapping and $B:\mathcal{H}\to 2^{\mathcal{H}}$ a maximal monotone mapping. Then, we have
- (1)
- $F\big(J_{\lambda}^{B}(I - \lambda A)\big) = (A + B)^{-1}(0)$ for all $\lambda > 0$;
- (2)
- $J_{\lambda}^{B}(I - \lambda A)$ is a nonexpansive mapping for all $\lambda \in (0, 2\beta]$.
Lemma 4
([]). Let $\{a_n\}$ be a sequence of nonnegative real numbers, $\{\alpha_n\} \subset (0, 1)$ with $\sum_{n=1}^{\infty}\alpha_n = \infty$, and $\{b_n\}$ a sequence of real numbers satisfying
$a_{n+1} \le (1 - \alpha_n)a_n + \alpha_n b_n, \quad n \ge 1.$
If for every subsequence $\{a_{n_j}\}$ of $\{a_n\}$, such that $\liminf_{j\to\infty}(a_{n_j+1} - a_{n_j}) \ge 0$, we have $\limsup_{j\to\infty} b_{n_j} \le 0$, then $\lim_{n\to\infty} a_n = 0$.
3. Main Results
In this section, we introduce the Double Inertial Viscosity Forward–Backward Algorithm (DIVFBA), which is a modification of Algorithm 6, introduced by Peeyada [], designed to accelerate its convergence by using a double inertial step at the first step and the viscosity approximation method at the final step. It is worth mentioning that Algorithm 6 achieves only weak convergence, whereas we aim to prove strong convergence of our proposed algorithm. Moreover, we will compare the performance of our algorithm with that of the others in data classification in the next section.
Throughout this section, let a k-contraction mapping on $\mathcal{H}$ with $k \in [0, 1)$ be given, let A be a $\beta$-cocoercive mapping on $\mathcal{H}$, and let B be a maximal monotone operator from $\mathcal{H}$ into $2^{\mathcal{H}}$, such that $(A + B)^{-1}(0) \neq \emptyset$.
We are now ready to present our accelerated fixed-point algorithm (Algorithm 7).
| Algorithm 7 DIVFBA |
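Since the precise update rules appear in the algorithm box above, the following is only a schematic sketch of the structure described in the text: a double inertial extrapolation, the forward–backward operator $T_\lambda = J_{\lambda}^{B}(I - \lambda A)$, and a viscosity step driven by a contraction. The coefficient choices (theta, delta, alpha_n) and the placement of the contraction are illustrative assumptions, not the exact rules of Algorithm 7.

```python
import numpy as np

def divfba_sketch(grad_f, prox_g, contraction, x0, lam, theta, delta, n_iter=1000):
    """Schematic double-inertial viscosity forward-backward loop (not the exact Algorithm 7):
    1) extrapolate with two inertial terms, 2) apply T_lam = J_{lam B}(I - lam A),
    3) relax toward a contraction (viscosity step)."""
    x_prev2 = x_prev = x = np.asarray(x0, dtype=float)
    for n in range(1, n_iter + 1):
        alpha = 1.0 / (n + 1)                                      # illustrative alpha_n -> 0 with sum = inf
        y = x + theta * (x - x_prev) + delta * (x_prev - x_prev2)  # double inertial extrapolation
        z = prox_g(y - lam * grad_f(y), lam)                       # forward-backward step T_lam(y)
        x_prev2, x_prev, x = x_prev, x, alpha * contraction(x) + (1.0 - alpha) * z  # viscosity step
    return x
```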
Theorem 1.
Let be a sequence generated by Algorithm 7, such that the following additional conditions hold:
- (i)
- ,
- (ii)
- and ,
- (iii)
- ,
- (iv)
- ,
- (v)
- .
Then, , such that .
Proof.
Let . Firstly, we prove that the sequence is bounded.
From (8), we have
From (10) and the nonexpansiveness of , we have
From (12), we obtain
By (6) and the condition (i), we have
Hence, there is , such that
Similarly, we also have that
and
From (17), we obtain
This implies that is bounded and so are , and .
By (8), we have
By the nonexpansiveness of and (21), we have
It follows from Lemma 1 and (23), that
Since
and
there exist positive constants and , such that for all ,
We may deduce from (24) that for all ,
where .
Hence, we obtain
Suppose there is a subsequence of , satisfying
It follows from (27) that
where
By conditions (ii), (iii), and (iv), we obtain
which implies
From (8), we have
as .
We next show that .
Let be a subsequence of , such that
Since is bounded, there exists a subsequence of , such that . Without loss of generality, we may assume that .
From , we know that the mapping is nonexpansive. Due to (30) and (33), the following result is obtained:
Using Lemmas 2 and 3, we obtain
Since , we have . From , Proposition 1 implies that
By Lemma 4, we can conclude that □
Remark 1.
Note that Algorithm 7 is a modification of Algorithm 6 that accelerates its convergence by using a double inertial step at the first step and the viscosity approximation method at the final step. Moreover, our proposed algorithm has a strong convergence result, while Algorithm 6 achieves only weak convergence. Furthermore, Algorithm 7 can be applied to solve convex bilevel optimization problems, as seen in Theorem 2, while Algorithm 6 cannot be used to solve such problems.
Next, we employ the Bilevel Double Inertial Forward–Backward Algorithm (BDIFBA, Algorithm 8) to solve the convex bilevel optimization Problem (1), by replacing A and B in Algorithm 7 with $\nabla f$ and $\partial g$, respectively.
| Algorithm 8 BDIFBA |
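A minimal sketch of this substitution, assuming the inner objective splits as a smooth part f (gradient grad_inner) plus a nonsmooth part g (proximal map prox_g), and taking the contraction to be $I - s\nabla\omega$ as in Proposition 2; it simply re-parameterizes the schematic loop given after Algorithm 7.

```python
def bdifba_sketch(grad_inner, prox_g, grad_outer, s, x0, lam, theta, delta, n_iter=1000):
    """Bilevel specialization of divfba_sketch: A <- grad f (inner smooth part),
    J_{lam B} <- prox of the inner nonsmooth part g, contraction <- I - s*grad(omega)."""
    contraction = lambda x: x - s * grad_outer(x)  # k-contraction for suitable s (Proposition 2)
    return divfba_sketch(grad_inner, prox_g, contraction, x0, lam, theta, delta, n_iter)
```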
The following result is obtained directly by Theorem 1.
Theorem 2.
Let be a sequence generated by Algorithm 8 with the same condition as in Theorem 1. Then, where .
Proof.
Set in Theorem 1. From Proposition 2, we know that is a contraction. By Theorem 1, we obtain that where . From Proposition 1, it can be obtained that for any ,
hence , that is, □
4. Application
In this section, we apply our proposed algorithm to improve the training of deep learning models by reformulating their training tasks as structured convex optimization problems. Our approach is based on fixed-point theory, which provides strong theoretical guarantees for convergence and solution reliability. This makes the training process more stable, efficient, and robust, especially in the presence of noise or ill-conditioned data.
We focus on a class of models called Extreme Learning Machines (ELM) and their deeper extensions, Two-Hidden-Layer ELM (TELM). These models are known for their fast training and competitive accuracy. Unlike traditional neural networks, ELMs randomly assign hidden layer weights and only compute output weights, typically by solving a least-squares problem.
However, when the hidden layer output matrix is ill-conditioned or the data is noisy, direct pseudoinverse computations become unstable and prone to overfitting. To address this, we reformulate the training process as a convex minimization problem with regularization. This structure naturally fits into the framework of fixed-point problems, allowing us to apply our algorithm without relying on explicit matrix inversion.
4.1. Application to ELM
ELM is a neural network model initially proposed by Huang et al. []. ELM is well-known for its rapid training capability and strong generalization performance. By integrating our algorithm into the ELM framework, we aim to boost both optimization efficiency and predictive accuracy.
Let us define the training dataset as , consisting of s input–target pairs, where denotes the input vector and denotes the associated target output.
ELM is designed for Single-Layer Feedforward Networks (SLFNs) and operates based on the following functional form:
$o_i = \sum_{j=1}^{h} \beta_j\, G(\langle w_j, x_i\rangle + b_j), \quad i = 1, \dots, s,$
where $o_i$ is the predicted output, h denotes the number of hidden neurons, G is the activation function, $w_j$ and $\beta_j$ are the weight vectors for the input and output connections of the j-th hidden node, and $b_j$ is the corresponding bias term.
Let the hidden layer output matrix be defined as
$\mathbf{H} = \big[G(\langle w_j, x_i\rangle + b_j)\big]_{\,i = 1, \dots, s;\ j = 1, \dots, h}.$
The training objective is to find a solution $\beta$ that best approximates the target outputs:
$\sum_{j=1}^{h} \beta_j\, G(\langle w_j, x_i\rangle + b_j) = t_i, \quad i = 1, \dots, s,$
which can be compactly written in matrix form as
$\mathbf{H}\beta = \mathbf{T},$
where $\beta$ is the output weight vector and $\mathbf{T}$ is the desired output matrix.
To enhance generalization and reduce overfitting, a LASSO regularization term is introduced. The resulting optimization problem becomes
$\min_{\beta}\ \|\mathbf{H}\beta - \mathbf{T}\|_2^2 + \lambda\|\beta\|_1,$
where $\|\cdot\|_1$ denotes the $\ell_1$-norm and $\lambda > 0$ is a regularization coefficient that controls sparsity.
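As an illustration of how this regularized ELM training avoids an explicit pseudoinverse, the sketch below builds a random sigmoid hidden layer and solves the LASSO problem above with plain forward–backward iterations as a stand-in for the proposed solver; the hidden-layer size, regularization value, and matrix orientations (samples in rows, targets as a matrix) are assumptions for the example only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train_lasso(X, T, hidden=100, reg=1e-5, n_iter=1000, seed=0):
    """ELM training sketch: random input weights W and biases c, sigmoid hidden matrix H,
    and output weights beta from min 0.5*||H beta - T||_F^2 + reg*||beta||_1,
    solved by forward-backward (soft-thresholding) steps instead of a pseudoinverse.
    X: (samples x features), T: (samples x outputs)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], hidden))   # random input-to-hidden weights
    c = rng.standard_normal(hidden)                 # random hidden biases
    H = sigmoid(X @ W + c)                          # hidden-layer output matrix
    step = 1.0 / np.linalg.norm(H, 2) ** 2          # admissible step for the smooth part
    beta = np.zeros((hidden, T.shape[1]))
    for _ in range(n_iter):
        Z = beta - step * H.T @ (H @ beta - T)      # forward step
        beta = np.sign(Z) * np.maximum(np.abs(Z) - step * reg, 0.0)  # backward step
    return W, c, beta

def elm_predict(X, W, c, beta):
    """Predicted outputs for new inputs X."""
    return sigmoid(X @ W + c) @ beta
```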
4.2. Application to TELM
TELM is an extension of the traditional ELM that improves learning capacity by incorporating two hidden layers. Unlike conventional backpropagation-based multi-layer networks, TELM retains the fast training characteristics of ELM by leveraging analytic solutions in both stages. It is particularly suitable for modeling complex nonlinear relationships in high-dimensional data while avoiding the computational cost of iterative optimization.
A work by Janngam et al. [] demonstrated that TELM, when trained using their proposed algorithm, not only converges significantly faster than standard ELM but also achieves higher classification accuracy on various medical and benchmark datasets. Additionally, earlier work by Qu et al. [] showed that TELM consistently outperforms traditional ELM, especially in nonlinear and high-dimensional settings, by yielding better average accuracy with fewer hidden neurons.
These cumulative findings reinforce the choice of TELM as the core learning model for our study, particularly when enhanced with the proposed algorithm.
Let the training set be defined as , where is the input vector and is the corresponding target output.
- Stage 1: Initial Feature Transformation and Output Weights.
To simplify the initialization process, TELM begins by temporarily combining the two hidden layers into a single equivalent hidden layer. The combined hidden layer matrix is defined as
$\mathbf{H} = G(\mathbf{W}\mathbf{X} + \mathbf{B}),$
where $\mathbf{X}$ is the input matrix, $\mathbf{W}$ is the randomly initialized weight matrix for the first hidden layer, $\mathbf{B}$ is the bias matrix, and G is the activation function.
The output weights connecting the hidden layer to the output layer are determined based on the linear system:
$\mathbf{H}u = \mathbf{T},$
where $\mathbf{T}$ is the target matrix.
We find the optimal weight u using Algorithms 7 and 8 for solving the convex optimization problem with LASSO regularization as follows:
$\min_{u}\ \|\mathbf{H}u - \mathbf{T}\|_2^2 + \lambda\|u\|_1,$
where $\lambda > 0$ is the regularization parameter that controls model complexity and prevents overfitting.
- Stage 2: Separation and Refinement of Hidden Layers.
After computing the initial output weights u from the first stage using (52), the two hidden layers are separated to allow independent refinement.
To estimate the expected output of the second hidden layer, denoted as $\mathbf{H}_1$, we express that it satisfies the following equation:
$\mathbf{H}_1 u = \mathbf{T}.$
However, rather than computing $\mathbf{H}_1$ directly from matrix inversion, we apply our proposed algorithm to solve the following convex optimization problem with LASSO regularization:
$\min_{\mathbf{H}_1}\ \|\mathbf{H}_1 u - \mathbf{T}\|_2^2 + \lambda\|\mathbf{H}_1\|_1,$
where $\lambda > 0$ is the regularization parameter.
Next, TELM updates the weights and bias between the first and second hidden layers, denoted as $\mathbf{W}_2$ and $\mathbf{B}_2$, respectively, using the expected output $\mathbf{H}_1$ from (57). Ideally, the following equation describes the connection between the layers:
$G(\mathbf{W}_2\mathbf{H} + \mathbf{B}_2) = \mathbf{H}_1.$
However, since both $\mathbf{W}_2$ and $\mathbf{B}_2$ are unknown, solving (55) directly is not feasible. To address this, we reformulate the equation as
$G(\mathbf{W}_{HE}\mathbf{H}_E) = \mathbf{H}_1,$
where $\mathbf{H}_E$ is the extended input matrix and $\mathbf{W}_{HE}$ combines the weights $\mathbf{W}_2$ and biases $\mathbf{B}_2$ into a single matrix.
To estimate $\mathbf{W}_{HE}$, we solve the following convex optimization problem with LASSO regularization:
$\min_{\mathbf{W}_{HE}}\ \|\mathbf{W}_{HE}\mathbf{H}_E - G^{-1}(\mathbf{H}_1)\|_2^2 + \lambda\|\mathbf{W}_{HE}\|_1,$
where $G^{-1}$ denotes the inverse of the activation function G, and $\lambda > 0$ is the regularization parameter.
Finally, using the estimated $\mathbf{W}_{HE}$ from (57), the refined output of the second hidden layer is computed as
$\mathbf{H}_2 = G(\mathbf{W}_{HE}\mathbf{H}_E),$
where $\mathbf{H}_2$ represents the updated output of the second hidden layer after adjusting the weights and biases.
- Final Stage: Output Layer Update.
Finally, TELM updates the output weight matrix $\beta_{\mathrm{new}}$, which connects the second hidden layer to the output layer, by solving
$\mathbf{H}_2\,\beta_{\mathrm{new}} = \mathbf{T}.$
To obtain $\beta_{\mathrm{new}}$, we solve the following convex optimization problem using the LASSO technique:
$\min_{\beta_{\mathrm{new}}}\ \|\mathbf{H}_2\,\beta_{\mathrm{new}} - \mathbf{T}\|_2^2 + \lambda\|\beta_{\mathrm{new}}\|_1,$
where $\lambda > 0$ is the regularization parameter. Once $\beta_{\mathrm{new}}$ is obtained, the predicted output matrix is computed as
$\mathbf{Y} = \mathbf{H}_2\,\beta_{\mathrm{new}}.$
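Putting the three stages together, the following is a compact sketch of the TELM training pipeline under assumed matrix orientations (samples in rows), with a generic forward–backward LASSO routine standing in for Algorithms 7/8 at every linear subproblem; the layer size and the clipping constant inside the inverse sigmoid are illustrative assumptions.

```python
import numpy as np

def lasso_solve(Msys, Rhs, reg, n_iter=1000):
    """Forward-backward solver for min_U 0.5*||Msys U - Rhs||_F^2 + reg*||U||_1
    (a stand-in for the proposed algorithms)."""
    step = 1.0 / np.linalg.norm(Msys, 2) ** 2
    U = np.zeros((Msys.shape[1], Rhs.shape[1]))
    for _ in range(n_iter):
        Z = U - step * Msys.T @ (Msys @ U - Rhs)
        U = np.sign(Z) * np.maximum(np.abs(Z) - step * reg, 0.0)
    return U

def telm_train(X, T, hidden=100, reg=1e-5, seed=0):
    """TELM training sketch: Stage 1 merges the hidden layers, Stage 2 refines the
    second hidden layer and its incoming weights, and the final stage recomputes
    the output weights; every system is solved with lasso_solve instead of a pseudoinverse."""
    rng = np.random.default_rng(seed)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    clip = lambda p: np.clip(p, 1e-6, 1.0 - 1e-6)
    logit = lambda p: np.log(clip(p)) - np.log(1.0 - clip(p))   # inverse of the sigmoid
    # Stage 1: combined hidden layer and initial output weights u.
    W1 = rng.standard_normal((X.shape[1], hidden))
    b1 = rng.standard_normal(hidden)
    H = sig(X @ W1 + b1)
    u = lasso_solve(H, T, reg)
    # Stage 2a: expected second-hidden-layer output H1 from H1 @ u ~ T (solved in transposed form).
    H1 = lasso_solve(u.T, T.T, reg).T
    # Stage 2b: combined weights/biases W_HE between the layers from sigmoid(H_E @ W_HE) ~ H1.
    H_E = np.hstack([H, np.ones((X.shape[0], 1))])              # extended input matrix
    W_HE = lasso_solve(H_E, logit(H1), reg)
    H2 = sig(H_E @ W_HE)                                        # refined second hidden layer
    # Final stage: new output weights beta from H2 @ beta ~ T.
    beta = lasso_solve(H2, T, reg)
    return W1, b1, W_HE, beta
```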
This approach enhances numerical stability and improves the model's ability to handle high-dimensional or noisy real-world data.
4.2.1. Experiments: Data Classification for Minimization Problems
Data classification is a fundamental task in machine learning, where the objective is to assign each input sample to one of several predefined categories. Common applications include medical diagnosis, object recognition, and fraud detection. In this work, we apply our proposed algorithm to train TELM for practical classification tasks.
To evaluate classification performance, we conducted experiments on three benchmark datasets and one real-world medical dataset. Each dataset was divided into 70% training and 30% testing sets. The details of the datasets are summarized in Table 1.
Table 1.
Summary of datasets used in the experiments.
- Breast Cancer Dataset: A widely used dataset containing features extracted from digitized images of breast masses, used to classify tumors as benign or malignant.
- Heart Disease Dataset: A standard dataset used to predict the presence of heart disease based on clinical attributes.
- Diabetes Dataset: Contains diagnostic data for predicting the onset of diabetes in patients.
- Hypertension Dataset: A real-world dataset collected by Sripat Medical Center, Faculty of Medicine, Chiang Mai University.
Table 2 summarizes the parameter settings for each algorithm compared in our experiments.
Table 2.
Parameter settings for each algorithm.
In addition, the following settings were consistently applied across all experimental setups:
- Regularization parameter: .
- Activation function: Sigmoid, $G(z) = \dfrac{1}{1 + e^{-z}}$.
- Number of hidden nodes: .
- Contraction mapping: .
- In Algorithm 6, is defined by
To assess and compare the classification performance of each algorithm, we employed four widely used evaluation metrics: accuracy, precision, recall, and F1-score.
Accuracy measures the proportion of correctly classified samples, both positive and negative, relative to the total number of samples. It is computed as
$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN},$
where TP and TN are the true positives and true negatives, respectively; FP is the number of false positives (incorrectly predicting a patient as diseased); and FN is the number of false negatives (failing to detect a diseased patient).
Precision reflects the proportion of true positives among all instances predicted as positive:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}.$
Recall, or sensitivity, represents the proportion of actual positive cases that are correctly identified:
$\mathrm{Recall} = \dfrac{TP}{TP + FN}.$
F1-score is the harmonic mean of precision and recall, providing a balanced measure of model performance, particularly on imbalanced datasets:
$F_1 = \dfrac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$
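For completeness, a small helper that evaluates these four metrics from binary predictions; the label encoding (1 = positive class) is an assumption of the example.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score from the binary confusion counts."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_pred & y_true));  tn = int(np.sum(~y_pred & ~y_true))
    fp = int(np.sum(y_pred & ~y_true)); fn = int(np.sum(~y_pred & y_true))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```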
The performance of each algorithm is analyzed at the 1000th iteration, as presented in Table 3. Four datasets, breast cancer, heart disease, diabetes, and hypertension, were utilized to evaluate and compare the effectiveness of Algorithms 6 and 7 using standard classification metrics including accuracy, precision, recall, and F1-score on both training and testing data.
Table 3.
Performance comparison between Algorithms 6 and 7 on each dataset.
The results indicate that Algorithm 7 consistently performs well across all datasets. In particular, in the hypertension dataset, which reflects real-world conditions, Algorithm 7 achieves high accuracy and balanced precision–recall performance. This demonstrates its strong generalization capability and suitability for real-world medical applications that require reliable predictions and low error sensitivity.
To evaluate model performance with respect to both goodness-of-fit and model complexity, we utilize the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria are defined as follows:
- Akaike Information Criterion (AIC): $\mathrm{AIC} = 2k - 2\ln(\hat{L})$, where k is the number of estimated parameters in the model and $\hat{L}$ is the maximum value of the likelihood function.
- Bayesian Information Criterion (BIC): $\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L})$, where n is the number of observations, k is the number of parameters, and $\hat{L}$ is the maximum likelihood of the model.
Lower AIC and BIC values indicate better models in terms of balancing accuracy and simplicity.
To assess the consistency of model performance across multiple trials or datasets, we compute the mean and standard deviation (std) of the AIC and BIC values.
- Mean AIC and BIC: $\overline{\mathrm{AIC}} = \frac{1}{m}\sum_{i=1}^{m}\mathrm{AIC}_i$ and $\overline{\mathrm{BIC}} = \frac{1}{m}\sum_{i=1}^{m}\mathrm{BIC}_i$, where m is the number of trials.
- Standard Deviation of AIC and BIC: $\mathrm{std}(\mathrm{AIC}) = \sqrt{\frac{1}{m-1}\sum_{i=1}^{m}\big(\mathrm{AIC}_i - \overline{\mathrm{AIC}}\big)^2}$, and analogously for BIC.
These statistics indicate the central tendency and dispersion of the AIC and BIC scores, where smaller standard deviations imply more stable model performance across different experiments.
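A minimal helper matching the standard definitions above; how the likelihood is obtained for each classifier (and whether the sample standard deviation divides by m or m-1) is left to the experimental setup and is an assumption here.

```python
import numpy as np

def aic_bic(log_likelihood, k, n):
    """AIC = 2k - 2 ln(L_hat) and BIC = k ln(n) - 2 ln(L_hat) for one fitted model."""
    return 2 * k - 2 * log_likelihood, k * np.log(n) - 2 * log_likelihood

def mean_std(scores):
    """Mean and (sample) standard deviation of AIC or BIC values across trials/datasets."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)
```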
To understand how well each algorithm fits the data without too much complexity, we compare their AIC and BIC values, as shown in Table 4. Both AIC and BIC are commonly used to measure how good a model is; lower values mean that the model is more efficient and avoids overfitting.
Table 4.
Statistical comparison of Algorithms 6 and 7 based on AIC and BIC values.
The results show that Algorithm 7 gives lower AIC and BIC values than Algorithm 6 for all datasets. This means that Algorithm 7 is simpler and better at handling the data. The difference is most noticeable in the hypertension dataset, which comes from real-world health data. These results confirm that Algorithm 7 is a strong choice for real-world applications, where the model needs to be both accurate and not too complicated.
4.2.2. Application to Convex Bilevel Optimization Problems
The TELM model can also be formulated within the framework of convex bilevel optimization to better capture hierarchical learning structures. In this setting, we interpret the output weight learning (final step of TELM) as the solution to a lower-level convex problem, and the optimization of the hidden transformation weights (e.g., ) as the upper-level objective.
In our TELM-based learning problem, this bilevel formulation arises naturally:
- The inner problem corresponds to learning the output weights u given the fixed transformation weights, and can be cast as a LASSO-type convex minimization: $\min_{u}\ \|\mathbf{H}_2 u - \mathbf{T}\|_2^2 + \lambda\|u\|_1$, where $\mathbf{H}_2$ is the second hidden layer output and $\mathbf{T}$ is the target.
- The outer problem focuses on optimizing the hidden transformation weights based on the optimal solution from the inner problem. The upper-level loss is given by
Solving this bilevel problem directly is challenging due to the implicit constraint . However, by leveraging our proposed algorithm and proximal operator techniques, we can solve both levels efficiently and with guaranteed convergence under mild assumptions. This makes TELM highly suitable for structured learning tasks where the learning objectives are nested and interdependent.
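To illustrate how the bilevel machinery plugs into the output-weight step, the sketch below instantiates the operators of the schematic bilevel loop (bdifba_sketch from Section 3): the inner problem is the LASSO over u, while the outer objective is taken here, purely as an assumption for the example, to be $\omega(u) = \tfrac{1}{2}\|u\|^2$ (selecting a minimum-norm inner solution); the paper's actual upper-level loss involves the hidden transformation weights and is not reproduced here.

```python
import numpy as np

def telm_bilevel_output_weights(H2, T, reg=1e-5, s=0.5, n_iter=1000):
    """Bilevel sketch for the output weights u: inner problem
    min_u 0.5*||H2 u - T||_F^2 + reg*||u||_1; outer objective *assumed* to be
    omega(u) = 0.5*||u||^2, so grad(omega)(u) = u and I - s*grad(omega) = (1-s)*I."""
    lam = 1.0 / np.linalg.norm(H2, 2) ** 2                      # admissible forward-backward step
    grad_inner = lambda u: H2.T @ (H2 @ u - T)                  # gradient of the smooth inner part
    prox_g = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t * reg, 0.0)  # prox of reg*||.||_1
    grad_outer = lambda u: u                                    # gradient of the assumed omega
    u0 = np.zeros((H2.shape[1], T.shape[1]))
    return bdifba_sketch(grad_inner, prox_g, grad_outer, s, u0, lam,
                         theta=0.3, delta=0.1, n_iter=n_iter)
```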
To assess the performance of Algorithm 8 in solving convex bilevel optimization problems, we conducted experiments on the same datasets used in the convex optimization setting (see Section 4.2.1). These include the breast cancer, heart disease, diabetes, and hypertension datasets, with a 70%/30% split for training and testing, respectively.
We evaluated classification performance using the same metrics—accuracy, precision, recall, and F1-score—to ensure consistency across experiments.
In this bilevel setting, we compared our method against Algorithm 1 (BiGSAM), Algorithm 2 (iBiGSAM), Algorithm 3 (aiBiGSAM), Algorithm 4 (miBiGSAM), and Algorithm 5 (amiBiGSAM).
All algorithms were configured according to the parameter settings summarized in Table 5, ensuring fair and reproducible evaluation across all methods.
Table 5.
Parameter settings for each algorithm.
In addition, the following settings were consistently applied across all experimental setups:
- Regularization parameter: .
- Activation function: Sigmoid, .
- Number of hidden nodes: .
To evaluate the effectiveness of the proposed algorithm (Algorithm 8), we conducted experiments on four datasets. Each algorithm was trained for 1000 iterations, and the performance was measured in terms of accuracy, precision, recall, and F1-score for both the training and testing phases. The comparative results of all algorithms are summarized in Table 6.
Table 6.
Performance comparison of algorithms on each dataset.
As shown in Table 6, the proposed algorithm (Algorithm 8) consistently outperforms other methods across all datasets in both training and testing phases. In particular, for the breast cancer and diabetes datasets, Algorithm 8 achieves the highest test accuracy and F1-scores, demonstrating its strong generalization capability and classification performance.
Notably, in the hypertension dataset, which represents real-world medical data with high variability and complexity, the proposed method maintains superior accuracy and F1-score compared to baseline algorithms. This highlights the robustness and practical applicability of Algorithm 8 in real-world clinical settings.
Overall, the results support the effectiveness and stability of the proposed algorithm, making it a promising approach for medical classification tasks across diverse domains.
To statistically evaluate the performance of each algorithm, we computed the AIC and BIC values, including their mean and standard deviation, for both the training and testing phases. The experiments were conducted on four datasets: breast cancer, heart disease, diabetes, and hypertension. The summarized results presented in Table 7 serve to compare the statistical efficiency of each algorithm.
Table 7.
Comparison of AIC and BIC scores (mean and standard deviation) for all algorithms.
According to the results in Table 7, the proposed algorithm (Algorithm 8) consistently shows lower AIC and BIC values across several datasets. This means that the model fits well and is less likely to overfit the data. In particular, in the hypertension dataset, which contains real and complex medical data, Algorithm 8 achieves the lowest and most consistent scores. This shows that the algorithm can handle real-world situations effectively and gives reliable results.
From Table 6 and Table 7, it is evident that Algorithm 8 consistently outperforms all variants of BiG-SAM, including the improved versions (Algorithms 2–5).
In all datasets considered in this work (see Table 6 and Table 7), Algorithm 8 achieves the highest classification performance and also yields the lowest AIC and BIC scores, suggesting a better model fit with lower complexity. Moreover, its standard deviations are relatively small, indicating robustness and stability across different runs. Therefore, Algorithm 8 can be considered the most effective and reliable algorithm among those evaluated.
5. Conclusions
We proposed the Double Inertial Viscosity Forward–Backward Algorithm (DIVFBA) and the Bilevel Double Inertial Forward–Backward Algorithm (BDIFBA), which are modifications of Algorithm 6, to solve variational inclusion and convex bilevel optimization problems, respectively. The proposed algorithms ensure strong convergence and achieve higher accuracy and stability compared to existing algorithms. We applied them to train TELM models, where they consistently outperformed other existing algorithms in terms of evaluation metrics and statistical values. These results confirm the effectiveness of the proposed algorithms in practical applications.
Author Contributions
Software, E.P.; writing—original draft, P.S.-j.; writing—review and editing, S.S. All authors have read and agreed to the published version of the manuscript.
Funding
The APC was funded by the Fundamental Fund 2025, Chiang Mai University.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Acknowledgments
This research was partially supported by Chiang Mai University and the Fundamental Fund 2025, Chiang Mai University. The first author would like to thank the CMU Presidential Scholarship for the financial support.
Conflicts of Interest
The authors declare no conflicts of interest and the funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
- Sabach, S.; Shtern, S. A first order method for solving convex bilevel optimization problems. SIAM J. Optim. 2017, 27, 640–660. [Google Scholar] [CrossRef]
- Polyak, B.T. Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 1964, 4, 1–17. [Google Scholar] [CrossRef]
- Shehu, Y.; Vuong, P.T.; Zemkoho, A. An inertial extrapolation method for convex simple bilevel optimization. Optim. Methods Softw. 2021, 36, 1–19. [Google Scholar] [CrossRef]
- Duan, P.; Zhang, Y. Alternated and multi-step inertial approximation methods for solving convex bilevel optimization problems. Optimization 2023, 72, 2517–2545. [Google Scholar] [CrossRef]
- Wattanataweekul, R.; Janngam, K.; Suantai, S. A novel two-step inertial viscosity algorithm for bilevel optimization problems applied to image recovery. Mathematics 2023, 11, 3518. [Google Scholar] [CrossRef]
- Sae-jia, P.; Suantai, S. A new two-step inertial algorithm for solving convex bilevel optimization problems with application in data classification problems. AIMS Math. 2024, 9, 8476–8496. [Google Scholar] [CrossRef]
- Sae-jia, P.; Suantai, S. A novel accelerated fixed point algorithm for convex bilevel optimization problems with applications to machine learning for data classification. J. Nonlinear Funct. Anal. 2025, 2025, 1–21. [Google Scholar] [CrossRef]
- Moudafi, A.; Oliny, M. Convergence of a splitting inertial proximal method for monotone operators. J. Comput. Appl. Math. 2003, 155, 447–454. [Google Scholar] [CrossRef]
- Peeyada, P.; Suparatulatorn, R.; Cholamjiak, W. An inertial Mann forward-backward splitting algorithm of variational inclusion problems and its applications. Chaos Solitons Fractals 2022, 158, 112048. [Google Scholar] [CrossRef]
- Kesornprom, S.; Peeyada, P.; Cholamjiak, W.; Ngamkhum, T.; Jun-on, N. New iterative method with inertial technique for split variational inclusion problem to classify tpack level of pre-service mathematics teachers. Thai J. Math. 2023, 21, 351–365. [Google Scholar]
- Inkrong, P.; Cholamjiak, P. Multi-Inertial Forward-Backward Methods for Solving Variational Inclusion Problems and Applications in Image Deblurring. Thai J. Math. 2025, 23, 263–277. [Google Scholar]
- Takahashi, W. Introduction to Nonlinear and Convex Analysis; Yokohama Publishers: Yokohama, Japan, 2009. [Google Scholar]
- Hanjing, A.; Thongpaen, P.; Suantai, S. A new accelerated algorithm with a linesearch technique for convex bilevel optimization problems with applications. AIMS Math. 2024, 9, 22366–22392. [Google Scholar] [CrossRef]
- Goebel, K.; Kirk, W.A. Topics in Metric Fixed Point Theory; No. 28; Cambridge University Press: Cambridge, UK, 1990. [Google Scholar]
- López, G.; Martín-Márquez, V.; Wang, F.; Xu, H.K. Forward-backward splitting methods for accretive operators in Banach spaces. Abstr. Appl. Anal. 2012, 2012, 109236. [Google Scholar] [CrossRef]
- Saejung, S.; Yotkaew, P. Approximation of zeros of inverse strongly monotone operators in Banach spaces. Nonlinear Anal. Theory Methods Appl. 2012, 75, 742–750. [Google Scholar] [CrossRef]
- Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
- Janngam, K.; Suantai, S.; Wattanataweekul, R. A novel fixed-point based two-step inertial algorithm for convex minimization in deep learning data classification. AIMS Math. 2025, 10, 6209–6232. [Google Scholar] [CrossRef]
- Qu, B.Y.; Lang, B.F.; Liang, J.J.; Qin, A.K.; Crisalle, O.D. Two-hidden-layer extreme learning machine for regression and classification. Neurocomputing 2016, 175, 826–834. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).