In this section, we apply our proposed algorithm to improve the training of deep learning models by reformulating their training tasks as structured convex optimization problems. Our approach is based on fixed-point theory, which provides strong theoretical guarantees for convergence and solution reliability. This makes the training process more stable, efficient, and robust, especially in the presence of noise or ill-conditioned data.
We focus on a class of models called Extreme Learning Machines (ELM) and their deeper extensions, Two-Hidden-Layer ELM (TELM). These models are known for their fast training and competitive accuracy. Unlike traditional neural networks, ELMs randomly assign hidden layer weights and only compute output weights, typically by solving a least-squares problem.
However, when the hidden layer output matrix is ill-conditioned or the data is noisy, direct pseudoinverse computations become unstable and prone to overfitting. To address this, we reformulate the training process as a convex minimization problem with regularization. This structure naturally fits into the framework of fixed-point problems, allowing us to apply our algorithm without relying on explicit matrix inversion.
4.1. Application to ELM
ELM is a neural network model initially proposed by Huang et al. [17]. ELM is well known for its rapid training capability and strong generalization performance. By integrating our algorithm into the ELM framework, we aim to boost both optimization efficiency and predictive accuracy.
Let us define the training dataset as $\{(x_i, t_i)\}_{i=1}^{s}$, consisting of $s$ input–target pairs, where $x_i$ denotes the input vector and $t_i$ denotes the associated target output.
ELM is designed for Single-Layer Feedforward Networks (SLFNs) and operates based on the following functional form:
\[
o_i = \sum_{j=1}^{h} \beta_j \, G(w_j \cdot x_i + b_j), \qquad i = 1, \dots, s,
\]
where $o_i$ is the predicted output, $h$ denotes the number of hidden neurons, $G$ is the activation function, $w_j$ and $\beta_j$ are the weight vectors for the input and output connections of the $j$-th hidden node, and $b_j$ is the corresponding bias term.
Let the hidden layer output matrix $H$ be defined as
\[
H = \begin{bmatrix}
G(w_1 \cdot x_1 + b_1) & \cdots & G(w_h \cdot x_1 + b_h) \\
\vdots & \ddots & \vdots \\
G(w_1 \cdot x_s + b_1) & \cdots & G(w_h \cdot x_s + b_h)
\end{bmatrix}_{s \times h}.
\]
The training objective is to find a solution that best approximates the output target:
\[
\sum_{j=1}^{h} \beta_j \, G(w_j \cdot x_i + b_j) = t_i, \qquad i = 1, \dots, s,
\]
which can be compactly written in matrix form as
\[
H\beta = T,
\]
where $\beta$ is the output weight vector and $T$ is the desired output matrix.
To enhance generalization and reduce overfitting, a LASSO regularization term is introduced. The resulting optimization problem becomes
\[
\min_{\beta} \; \|H\beta - T\|_2^2 + \lambda \|\beta\|_1,
\]
where $\|\cdot\|_1$ denotes the $\ell_1$-norm and $\lambda > 0$ is a regularization coefficient that controls sparsity.
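For concreteness, the sketch below solves the regularized problem above with plain proximal-gradient (ISTA) iterations built on the soft-thresholding operator. It is only a minimal stand-in for the proposed fixed-point algorithm (Algorithms 7 and 8 are not reproduced here); the sigmoid feature map, the synthetic data, and the step size $1/L$ with $L = 2\|H\|_2^2$ are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation G used to build the hidden layer features."""
    return 1.0 / (1.0 + np.exp(-z))

def elm_hidden_matrix(X, W, b):
    """Hidden layer output matrix H = G(XW + b) with random input weights W and biases b."""
    return sigmoid(X @ W + b)

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(H, T, lam, n_iter=1000):
    """Solve min_beta ||H beta - T||_2^2 + lam * ||beta||_1 by proximal-gradient iterations
    (a stand-in for the paper's fixed-point scheme, Algorithms 7 and 8)."""
    L = max(2.0 * np.linalg.norm(H, 2) ** 2, 1e-12)   # Lipschitz constant of the smooth part
    step = 1.0 / L
    beta = np.zeros((H.shape[1], T.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * H.T @ (H @ beta - T)             # gradient of the least-squares term
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta

# Illustrative run on synthetic data (shapes only; not the datasets used in the experiments)
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))                    # s = 200 samples, 10 features
T = rng.integers(0, 2, size=(200, 1)).astype(float)   # binary targets
W = rng.standard_normal((10, 100))                    # random input weights, h = 100 hidden nodes
b = rng.standard_normal((1, 100))                     # random biases
beta = lasso_ista(elm_hidden_matrix(X, W, b), T, lam=1e-3)
```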
4.2. Application to TELM
TELM is an extension of the traditional ELM that improves learning capacity by incorporating two hidden layers. Unlike conventional backpropagation-based multi-layer networks, TELM retains the fast training characteristics of ELM by leveraging analytic solutions in both stages. It is particularly suitable for modeling complex nonlinear relationships in high-dimensional data while avoiding the computational cost of iterative optimization.
A work by Janngam et al. [18] demonstrated that TELM, when trained using their proposed algorithm, not only converges significantly faster than standard ELM but also achieves higher classification accuracy on various medical and benchmark datasets. Additionally, earlier work by Qu et al. [19] showed that TELM consistently outperforms traditional ELM, especially in nonlinear and high-dimensional settings, by yielding better average accuracy with fewer hidden neurons.
These cumulative findings reinforce the choice of TELM as the core learning model for our study, particularly when enhanced with the proposed algorithm.
Let the training set be defined as $\{(x_i, t_i)\}_{i=1}^{s}$, where $x_i$ is the input vector and $t_i$ is the corresponding target output.
- Stage 1: Initial Feature Transformation and Output Weights.
To simplify the initialization process, TELM begins by temporarily combining the two hidden layers into a single equivalent hidden layer. The combined hidden layer matrix $H$ is defined as
\[
H = G(XW + B),
\]
where $X$ is the input matrix, $W$ is the randomly initialized weight matrix for the first hidden layer, $B$ is the bias matrix, and $G$ is the activation function.
The output weights $u$ connecting the hidden layer to the output layer are determined based on the linear system
\[
Hu = T,
\]
where $T$ is the target matrix. We find the optimal weight $u$ using Algorithms 7 and 8 for solving the convex optimization problem with LASSO regularization as follows:
\[
\min_{u} \; \|Hu - T\|_2^2 + \lambda \|u\|_1,
\]
where $\lambda > 0$ is the regularization parameter that controls model complexity and prevents overfitting. (A compact end-to-end sketch of all three stages is given after this list.)
- Stage 2: Separation and Refinement of Hidden Layers.
After computing the initial output weights $u$ from the first stage using (52), the two hidden layers are separated to allow independent refinement. To estimate the expected output of the second hidden layer, denoted as $H_2$, we express that it satisfies the following equation:
\[
H_2 u = T.
\]
However, rather than computing $H_2$ directly from matrix inversion, we apply our proposed algorithm to solve the following convex optimization problem with LASSO regularization:
\[
\min_{H_2} \; \|H_2 u - T\|_2^2 + \lambda \|H_2\|_1,
\]
where $\lambda > 0$ is the regularization parameter.
Next, TELM updates the weights and bias between the first and second hidden layers, denoted as $W_2$ and $B_2$, respectively, using the expected output $H_2$ from (57). Ideally, the following equation describes the connection between layers:
\[
G(HW_2 + B_2) = H_2.
\]
However, since both $W_2$ and $B_2$ are unknown, solving (55) directly is not feasible. To address this, we reformulate the equation as
\[
G(H_E W_{HE}) = H_2,
\]
where $H_E = [\,\mathbf{1} \;\; H\,]$ is the extended input matrix and $W_{HE} = \begin{bmatrix} B_2 \\ W_2 \end{bmatrix}$ combines the weights and biases into a single matrix.
To estimate $W_{HE}$, we solve the following convex optimization problem with LASSO regularization:
\[
\min_{W_{HE}} \; \|H_E W_{HE} - G^{-1}(H_2)\|_2^2 + \lambda \|W_{HE}\|_1,
\]
where $G^{-1}$ denotes the inverse of the activation function $G$, and $\lambda > 0$ is the regularization parameter. Finally, using the estimated $W_{HE}$ from (57), the refined output of the second hidden layer, $H_2^{*}$, is computed as
\[
H_2^{*} = G(H_E W_{HE}),
\]
where $H_2^{*}$ represents the updated output of the second hidden layer after adjusting the weights and biases.
- Final Stage: Output Layer Update.
Finally, TELM updates the output weight matrix $u_{\mathrm{new}}$, which connects the second hidden layer to the output layer, by solving
\[
H_2^{*} u_{\mathrm{new}} = T.
\]
To obtain $u_{\mathrm{new}}$, we solve the following convex optimization problem using the LASSO technique:
\[
\min_{u_{\mathrm{new}}} \; \|H_2^{*} u_{\mathrm{new}} - T\|_2^2 + \lambda \|u_{\mathrm{new}}\|_1,
\]
where $\lambda > 0$ is the regularization parameter. Once $u_{\mathrm{new}}$ is obtained, the predicted output matrix $Y$ is computed as
\[
Y = H_2^{*} u_{\mathrm{new}}.
\]
This approach enhances numerical stability and improves the model's ability to handle high-dimensional or noisy real-world data.
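The following compact sketch strings the three TELM stages together under the notation above. It reuses the sigmoid, soft_threshold, and lasso_ista helpers from the Section 4.1 sketch as a stand-in for Algorithms 7 and 8; the samples-as-rows shape convention, the clipped inverse sigmoid, and the transpose trick used to solve for $H_2$ are assumptions made for illustration only.

```python
import numpy as np
# Assumes sigmoid, soft_threshold and lasso_ista from the Section 4.1 sketch are in scope.

def logit(p, eps=1e-6):
    """Inverse of the sigmoid activation G, clipped for numerical safety."""
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p / (1.0 - p))

def telm_stage1(X, T, h, lam, rng):
    """Stage 1: random weights W and biases B, combined hidden matrix H = G(XW + B),
    and initial output weights u from min ||Hu - T||^2 + lam * ||u||_1."""
    W = rng.standard_normal((X.shape[1], h))
    B = rng.standard_normal((1, h))                    # broadcast over the s samples
    H = sigmoid(X @ W + B)
    u = lasso_ista(H, T, lam)                          # stand-in for Algorithms 7 and 8
    return H, u

def telm_stage2(H, u, T, lam):
    """Stage 2: expected second-hidden-layer output H2, stacked parameters W_HE,
    and the refined output H2_star = G(H_E W_HE)."""
    # H2 u = T solved as a LASSO problem in H2 via ||H2 u - T|| = ||u^T H2^T - T^T||
    H2 = lasso_ista(u.T, T.T, lam).T
    H_E = np.hstack([np.ones((H.shape[0], 1)), H])     # extended input matrix [1  H]
    W_HE = lasso_ista(H_E, logit(H2), lam)             # G(H_E W_HE) ~ H2  =>  H_E W_HE ~ G^{-1}(H2)
    H2_star = sigmoid(H_E @ W_HE)
    return H2_star, W_HE

def telm_final(H2_star, T, lam):
    """Final stage: new output weights u_new and predicted outputs Y = H2* u_new."""
    u_new = lasso_ista(H2_star, T, lam)
    return u_new, H2_star @ u_new

# Illustrative end-to-end run on synthetic binary-classification data
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
T = rng.integers(0, 2, size=(200, 1)).astype(float)
H, u = telm_stage1(X, T, h=100, lam=1e-3, rng=rng)
H2_star, W_HE = telm_stage2(H, u, T, lam=1e-3)
u_new, Y = telm_final(H2_star, T, lam=1e-3)
labels = (Y > 0.5).astype(int)                         # threshold for binary classification
```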
4.2.1. Experiments: Data Classification for Minimization Problems
Data classification is a fundamental task in machine learning, where the objective is to assign each input sample to one of several predefined categories. Common applications include medical diagnosis, object recognition, and fraud detection. In this work, we apply our proposed algorithm to train TELM for practical classification tasks.
To evaluate classification performance, we conducted experiments on three benchmark datasets and one real-world medical dataset. Each dataset was divided into 70% training and 30% testing sets. The details of the datasets are summarized in Table 1.
Breast Cancer Dataset: A widely used dataset containing features extracted from digitized images of breast masses, used to classify tumors as benign or malignant.
Heart Disease Dataset: A standard dataset used to predict the presence of heart disease based on clinical attributes.
Diabetes Dataset: Contains diagnostic data for predicting the onset of diabetes in patients.
Hypertension Dataset: A real-world dataset collected by Sripat Medical Center, Faculty of Medicine, Chiang Mai University.
Table 2 summarizes the parameter settings for each algorithm compared in our experiments.
In addition, the following settings were consistently applied across all experimental setups:
Regularization parameter: .
Activation function: Sigmoid, $G(x) = \dfrac{1}{1 + e^{-x}}$.
Number of hidden nodes: .
Contraction mapping: .
In Algorithm 6,
is defined by
To assess and compare the classification performance of each algorithm, we employed four widely used evaluation metrics: accuracy, precision, recall, and F1-score.
Accuracy measures the proportion of correctly classified samples, both positive and negative, relative to the total number of samples. It is computed as
\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN},
\]
where $TP$ and $TN$ are the true positives and true negatives, respectively; $FP$ is the number of false positives (incorrectly predicting a patient as diseased); and $FN$ is the number of false negatives (failing to detect a diseased patient).
Precision reflects the proportion of true positives among all instances predicted as positive:
\[
\mathrm{Precision} = \frac{TP}{TP + FP}.
\]
Recall, or sensitivity, represents the proportion of actual positive cases that are correctly identified:
\[
\mathrm{Recall} = \frac{TP}{TP + FN}.
\]
F1-score is the harmonic mean of precision and recall, providing a balanced measure of model performance, particularly in imbalanced datasets:
\[
\mathrm{F1\text{-}score} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.
\]
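A small self-contained helper that computes these four metrics from binary predictions may clarify the definitions; the dictionary return format and the zero-division conventions are implementation choices, not part of the paper.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1-score for binary labels (1 = positive class)."""
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))    # true positives
    tn = int(np.sum((y_pred == 0) & (y_true == 0)))    # true negatives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))    # false positives
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))    # false negatives
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) > 0 else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Example with hand-made predictions
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0])
print(classification_metrics(y_true, y_pred))
```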
The performance of each algorithm is analyzed at the 1000th iteration, as presented in Table 3. Four datasets (breast cancer, heart disease, diabetes, and hypertension) were utilized to evaluate and compare the effectiveness of Algorithms 6 and 7 using standard classification metrics, including accuracy, precision, recall, and F1-score, on both training and testing data.
The results indicate that Algorithm 7 consistently performs well across all datasets. In particular, in the hypertension dataset, which reflects real-world conditions, Algorithm 7 achieves high accuracy and balanced precision–recall performance. This demonstrates its strong generalization capability and suitability for real-world medical applications that require reliable predictions and low error sensitivity.
To evaluate model performance with respect to both goodness-of-fit and model complexity, we utilize the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These criteria are defined as follows:
Akaike Information Criterion (AIC):
\[
\mathrm{AIC} = 2k - 2\ln(\hat{L}),
\]
where $k$ is the number of estimated parameters in the model and $\hat{L}$ is the maximum value of the likelihood function.
Bayesian Information Criterion (BIC):
\[
\mathrm{BIC} = k\ln(n) - 2\ln(\hat{L}),
\]
where $n$ is the number of observations, $k$ is the number of parameters, and $\hat{L}$ is the maximum likelihood of the model.
Lower AIC and BIC values indicate better models in terms of balancing accuracy and simplicity.
To assess the consistency of the model's performance across multiple trials or datasets, we compute the mean and standard deviation (std) of the AIC and BIC values.
Standard Deviation of AIC and BIC:
\[
\mathrm{std}(\mathrm{AIC}) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\mathrm{AIC}_i - \overline{\mathrm{AIC}}\right)^2}, \qquad \overline{\mathrm{AIC}} = \frac{1}{N}\sum_{i=1}^{N}\mathrm{AIC}_i,
\]
and analogously for BIC, where $N$ is the number of experimental runs.
These statistics indicate the central tendency and dispersion of the AIC and BIC scores, where smaller standard deviations imply more stable model performance across different experiments.
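As a sketch of how these statistics can be computed, the snippet below recovers the maximized log-likelihood from the residual sum of squares under a Gaussian error model; this Gaussian assumption, the placeholder residuals, and the sample (ddof = 1) standard deviation are assumptions, since the paper does not spell out how $\hat{L}$ is evaluated.

```python
import numpy as np

def aic_bic(residuals, k):
    """AIC and BIC from model residuals, using a Gaussian error model so that the
    maximized log-likelihood is recovered from the residual sum of squares
    (an assumption made for this sketch)."""
    n = residuals.size
    rss = float(np.sum(residuals ** 2))
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * rss / n) + 1.0)  # maximized Gaussian log-likelihood
    aic = 2 * k - 2 * log_lik
    bic = k * np.log(n) - 2 * log_lik
    return aic, bic

# Mean and (sample) standard deviation of AIC over repeated trials with placeholder residuals
rng = np.random.default_rng(1)
residual_runs = [rng.standard_normal(300) for _ in range(5)]
aic_values = np.array([aic_bic(r, k=100)[0] for r in residual_runs])
print(aic_values.mean(), aic_values.std(ddof=1))
```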
To understand how well each algorithm fits the data without too much complexity, we compare their AIC and BIC values, as shown in Table 4. Both AIC and BIC are commonly used to measure how good a model is; lower values mean that the model is more efficient and avoids overfitting.
The results show that Algorithm 7 gives lower AIC and BIC values than Algorithm 6 for all datasets, meaning that Algorithm 7 achieves a better trade-off between goodness-of-fit and model complexity. The difference is most noticeable in the hypertension dataset, which comes from real-world health data. These results confirm that Algorithm 7 is a strong choice for real-world applications, where the model needs to be both accurate and not overly complex.
4.2.2. Application to Convex Bilevel Optimization Problems
The TELM model can also be formulated within the framework of convex bilevel optimization to better capture hierarchical learning structures. In this setting, we interpret the output weight learning (final step of TELM) as the solution to a lower-level convex problem, and the optimization of the hidden transformation weights (e.g., $W_{HE}$) as the upper-level objective.
In our TELM-based learning problem, this bilevel formulation arises naturally:
The inner problem corresponds to learning the output weights $u$ given the fixed transformation $W_{HE}$, and can be cast as a LASSO-type convex minimization:
\[
u^{*} \in \arg\min_{u} \; \|H_2^{*} u - T\|_2^2 + \lambda \|u\|_1,
\]
where $H_2^{*}$ is the second hidden layer output and $T$ is the target.
The outer problem focuses on optimizing the hidden transformation weights $W_{HE}$ based on the optimal solution $u^{*}$ from the inner problem. The upper-level loss is given by
\[
\min_{W_{HE}} \; \frac{1}{2}\,\|G(H_E W_{HE})\, u^{*} - T\|_2^2.
\]
Solving this bilevel problem directly is challenging due to the implicit constraint that $u^{*}$ must solve the inner problem. However, by leveraging our proposed algorithm and proximal operator techniques, we can solve both levels efficiently and with guaranteed convergence under mild assumptions. This makes TELM highly suitable for structured learning tasks where the learning objectives are nested and interdependent.
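The nested structure can be made explicit with two small functions: one returning the inner (lower-level) LASSO solution for a fixed transformation, and one evaluating the upper-level loss at that solution. This is only a structural sketch reusing the helpers from the Section 4.1 sketch; the quadratic upper-level loss and the function names are assumptions, and the actual bilevel solve in the experiments is carried out by Algorithm 8.

```python
import numpy as np
# Assumes sigmoid and lasso_ista from the Section 4.1 sketch are in scope.

def inner_solution(W_HE, H_E, T, lam):
    """Lower level: output weights u* for a fixed hidden transformation W_HE,
    obtained from the LASSO problem min_u ||H2* u - T||^2 + lam * ||u||_1."""
    H2_star = sigmoid(H_E @ W_HE)
    return lasso_ista(H2_star, T, lam)

def outer_loss(W_HE, H_E, T, lam):
    """Upper level: loss of the hidden transformation evaluated at the inner
    solution u*(W_HE); the quadratic form used here is an illustrative assumption."""
    u_star = inner_solution(W_HE, H_E, T, lam)
    return 0.5 * np.linalg.norm(sigmoid(H_E @ W_HE) @ u_star - T) ** 2
```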
To assess the performance of Algorithm 8 in solving convex bilevel optimization problems, we conducted experiments on the same datasets used in the convex optimization setting (see Section 4.2.1). These include the breast cancer, heart disease, diabetes, and hypertension datasets, with a 70%/30% split for training and testing, respectively.
We evaluated classification performance using the same metrics—accuracy, precision, recall, and F1-score—to ensure consistency across experiments.
In this bilevel setting, we compared our method against Algorithm 1 (BiGSAM), Algorithm 2 (iBiGSAM), Algorithm 3 (aiBiGSAM), Algorithm 4 (miBiGSAM), and Algorithm 5 (amiBiGSAM).
All algorithms were configured according to the parameter settings summarized in Table 5, ensuring fair and reproducible evaluation across all methods.
In addition, the following settings were consistently applied across all experimental setups:
Regularization parameter: .
Activation function: Sigmoid, $G(x) = \dfrac{1}{1 + e^{-x}}$.
Number of hidden nodes: .
To evaluate the effectiveness of the proposed algorithm (Algorithm 8), we conducted experiments on four datasets. Each algorithm was trained for 1000 iterations, and the performance was measured in terms of accuracy, precision, recall, and F1-score for both training and testing phases. The comparative results of all algorithms are summarized in Table 6.
As shown in Table 6, the proposed algorithm (Algorithm 8) consistently outperforms other methods across all datasets in both training and testing phases. In particular, for the breast cancer and diabetes datasets, Algorithm 8 achieves the highest test accuracy and F1-scores, demonstrating its strong generalization capability and classification performance.
Notably, in the hypertension dataset, which represents real-world medical data with high variability and complexity, the proposed method maintains superior accuracy and F1-score compared to baseline algorithms. This highlights the robustness and practical applicability of Algorithm 8 in real-world clinical settings.
Overall, the results support the effectiveness and stability of the proposed algorithm, making it a promising approach for medical classification tasks across diverse domains.
To statistically evaluate the performance of each algorithm, we computed the AIC and BIC values, including their mean and standard deviation, for both the training and testing phases. The experiments were conducted on four datasets: breast cancer, heart disease, diabetes, and hypertension. The summarized results presented in Table 7 serve to compare the statistical efficiency of each algorithm.
According to the results in Table 7, the proposed algorithm (Algorithm 8) consistently shows lower AIC and BIC values across several datasets. This means that the model fits the data well and is less likely to overfit. In particular, in the hypertension dataset, which contains real and complex medical data, Algorithm 8 achieves the lowest and most consistent scores. This shows that the algorithm can handle real-world situations effectively and gives reliable results.
From Table 6 and Table 7, it is evident that Algorithm 8 consistently outperforms all variants of BiG-SAM, including the improved versions (Algorithms 2–5). In all datasets considered in this work (see Table 6 and Table 7), Algorithm 8 achieves the highest classification performance and also yields the lowest AIC and BIC scores, suggesting a better model fit with lower complexity. Moreover, its standard deviations are relatively small, indicating robustness and stability across different runs. Therefore, Algorithm 8 can be considered the most effective and reliable algorithm among those evaluated.