1. Introduction
Let
H be a real Hilbert space with the norm
$\parallel \cdot \parallel $ and
C be a nonempty closed convex subset of
$H.$ A mapping
$T:C\to C$ is said to be
nonexpansive if it satisfies the symmetric contractive-type condition $\parallel Tx-Ty\parallel \le \parallel x-y\parallel $
for all
$x,y\in C$; see [
1].
An element $x\in C$ is a fixed point of T if $Tx=x$, and $F\left(T\right):=\{x\in C:x=Tx\}$ denotes the set of all fixed points of T.
Fixed point theory, i.e., the study of the conditions under which a map admits a fixed point, is an extensive area of research due to its numerous applications in many fields. It started with Banach’s work, which established the existence of a unique fixed point for a contraction using a classical theorem known as the Banach contraction principle; see [
2]. The contraction principle of Banach has been expanded and generalized in various directions due to its applications in mathematics and other fields. One of the more recent generalizations is due to Jachymski.
Jachymski [
3] introduced the structure of the graph on metric spaces using fixed point theory and obtained certain conditions for a self-mapping to be a Picard operator. Several authors [
4,
5,
6,
7] proved fixed point theorems for a new type of contraction on a metric space endowed with graphs. Aleomraninejad et al. [
8] used the idea of Reich et al. [
9] and proved a strong convergence theorem for
G-contractive and
G-nonexpansive mappings. On hyperbolic metric spaces, Alfuraidan and Khamsi [
10] gave a definition of
G-monotone nonexpansive multivalued mappings and proved the existence of a fixed point for multivalued contraction and monotone single-valued mappings. Later on, Alfuraidan [
11] presented and studied the existence of fixed points for G-monotone nonexpansive mappings and extended the results of Jachymski [
3]. For approximating common fixed points of a finite family of
G-nonexpansive mappings, Suantai et al. [
12] used the shrinking projection with the parallel monotone hybrid method. They also used a graph to prove a strong convergence theorem in Hilbert spaces under specific conditions, and they then applied their iterative scheme to signal recovery.
In the past decade, algorithms for approximating fixed points of
G-nonexpansive mappings without inertial techniques have been proposed by many researchers; see [
3,
8,
10,
11,
12,
13,
14,
15,
16,
17,
18]. We require more efficient algorithms for solving such problems. As a result, some accelerated fixed-point algorithms using inertial techniques have been proposed to improve convergence behavior; see [
19,
20,
21,
22,
23,
24,
25,
26,
27]. Recently, Janngam et al. [
28] proved the weak convergence theorem for a countable family of
Gnonexpansive mappings in a Hilbert space by using a coordinate affine structure with an inertial technique. They also applied their method to image recovery.
Inspired by the research described above, we introduce a new accelerated algorithm based on the inertial technique for finding a common fixed point of a family of G-nonexpansive mappings in Hilbert spaces. We employ our result to solve data classification and convex minimization problems and also compare our algorithm's efficiency to that of FISTAS, FISTA, and nAGA.
This paper is organized as follows: in
Section 2, we give certain terminology as well as some facts that will be useful in later sections. We investigate and prove our algorithm’s weak convergence in
Section 3. For application, we apply our method for solving convex minimization and data classification problems in
Section 4 and provide numerical experiments of classification problems in
Section 5. The last section of our paper,
Section 6, is a summary.
2. Preliminaries
Let C be a nonempty subset of a real Banach space X. Let $\Delta =\{(u,u):u\in C\}$, where Δ stands for the diagonal of the Cartesian product $C\times C$. Consider a directed graph G in which the set $V\left(G\right)$ of its vertices corresponds to C, and the set $E\left(G\right)$ of its edges contains all loops. A directed graph G is said to have parallel edges if it has two or more edges with both the same tail vertex and the same head vertex.
Assume that
G does not have parallel edges. Then,
$G=(V(G),E(G))$. The conversion of the graph
G, denoted by
${G}^{-1}$, is the graph obtained from G by reversing the direction of its edges. Thus, we have $E\left({G}^{-1}\right)=\{(u,v):(v,u)\in E\left(G\right)\}.$
Let us give some definitions of basic graph properties which are used in this paper (see [
29] for more details).
Definition 1. A graph G is said to be
 (i)
symmetric if for all $(x,y)\in E\left(G\right)$, we have $(y,x)\in E\left(G\right)$;
 (ii)
transitive if for any $u,v,w\in V\left(G\right)$ with $(u,v),(v,w)\in E\left(G\right)$, we have $(u,w)\in E\left(G\right)$;
 (iii)
connected if there is a path between any two vertices of the graph $G.$
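The properties in Definition 1 can be checked mechanically on a finite edge set; the following Python sketch (function names and the example edge set are our own) illustrates items (i) and (ii):

```python
def is_symmetric(E):
    """Definition 1(i): every edge (x, y) has its reverse (y, x) in E."""
    return all((y, x) in E for (x, y) in E)

def is_transitive(E):
    """Definition 1(ii): (u, v) and (v, w) in E imply (u, w) in E."""
    return all((u, w) in E
               for (u, v) in E
               for (v2, w) in E if v2 == v)

# All loops on {1, 2} plus a symmetric pair of edges between 1 and 2.
E = {(1, 1), (2, 2), (1, 2), (2, 1)}
print(is_symmetric(E), is_transitive(E))  # True True
```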
The definitions of
G-contraction [
3] and
G-nonexpansive mappings [
13] are given as follows.
Definition 2. A mapping $T:C\to C$ is said to be
 (i)
G-contraction if
 (a)
T is edge-preserving, i.e., $(Tu,Tv)\in E\left(G\right)$ for all $(u,v)\in E\left(G\right).$
 (b)
There exists $\rho \in [0,1)$ such that $\parallel Tu-Tv\parallel \le \rho \parallel u-v\parallel $ for all $(u,v)\in E\left(G\right),$ where ρ is called a contraction factor.
 (ii)
G-nonexpansive if
 (a)
T is edge-preserving.
 (b)
$\parallel Tu-Tv\parallel \le \parallel u-v\parallel $ for all $(u,v)\in E\left(G\right).$
Example 1. Let $C=[0,2]\subset \mathbb{R}$. Suppose that $(u,v)\in E\left(G\right)$ if and only if $0.4\le u,v\le 1.6$ or $u=v$, where $u,v\in \mathbb{R}$. Let $S,T:C\to C$ be defined for all $u\in C$ as in [30]. Then, both S and T are G-nonexpansive but not nonexpansive (see [30] for more details).
Definition 3. A mapping $T:C\to C$ is called G-demiclosed at 0 if, for any sequence $\left\{{u}_{n}\right\}\subseteq C$ with $({u}_{n},{u}_{n+1})\in E\left(G\right)$, ${u}_{n}\rightharpoonup u$ and $T{u}_{n}\to 0$ imply $Tu=0$.
To prove our main result, we need the definition of the coordinate affine of the graph G as follows.
Definition 4 ([
28])
. Assume that $\Lambda :={\cap}_{n=1}^{\infty}F\left({T}_{n}\right)\ne \varnothing $ and $\Lambda \times \Lambda \subseteq E\left(G\right)$. Then, $E\left(G\right)$ is said to be (i)
left coordinate affine if $\alpha (x,y)+\beta (u,y)\in E\left(G\right)$ for all $(x,y)$, $(u,y)\in E\left(G\right)$ and all α, $\beta \in \mathbb{R}$ with $\alpha +\beta =1.$
 (ii)
right coordinate affine if $\alpha (x,y)+\beta (x,z)\in E\left(G\right)$ for all $(x,y)$, $(x,z)\in E\left(G\right)$ and all α, $\beta \in \mathbb{R}$ with $\alpha +\beta =1.$
We say that $E\left(G\right)$ is coordinate affine if $E\left(G\right)$ is both left and right coordinate affine.
The results of the following lemmas can be used to prove our main theorem; see also [
19,
31,
32].
Lemma 1 ([
31])
. Let $\left\{{\eta}_{n}\right\},\left\{{\nu}_{n}\right\}$ and $\left\{{\vartheta}_{n}\right\}$ be sequences of nonnegative real numbers such that ${\sum}_{n=1}^{\infty}{\vartheta}_{n}<\infty $, ${\sum}_{n=1}^{\infty}{\nu}_{n}<\infty $, and ${\eta}_{n+1}\le (1+{\vartheta}_{n}){\eta}_{n}+{\nu}_{n}$ for all $n\in \mathbb{N}$. Then, ${lim}_{n\to \infty}{\eta}_{n}$ exists.
Lemma 2 ([
32])
. For a real Hilbert space H, the following results hold: (i) for any $u,\upsilon \in H$ and $\gamma \in [0,1],$ ${\parallel \gamma u+(1-\gamma )\upsilon \parallel}^{2}=\gamma {\parallel u\parallel}^{2}+(1-\gamma ){\parallel \upsilon \parallel}^{2}-\gamma (1-\gamma ){\parallel u-\upsilon \parallel}^{2};$ (ii) for any $u,\upsilon \in H,$ ${\parallel u+\upsilon \parallel}^{2}\le {\parallel u\parallel}^{2}+2\langle \upsilon ,u+\upsilon \rangle .$ Lemma 3 ([
19])
. Let $\left\{{\upsilon}_{n}\right\}$ and $\left\{{\mu}_{n}\right\}$ be sequences of nonnegative real numbers such that ${\upsilon}_{n+1}\le (1+{\mu}_{n}){\upsilon}_{n}+{\mu}_{n}{\upsilon}_{n-1}$ for all $n\ge 2$. Then, ${\upsilon}_{n+1}\le M\cdot {\prod}_{j=1}^{n}(1+2{\mu}_{j}),$ where $M=max\{{\upsilon}_{1},{\upsilon}_{2}\}.$ If ${\sum}_{n=1}^{\infty}{\mu}_{n}<\infty ,$ then $\left\{{\upsilon}_{n}\right\}$ is bounded. Let $\left\{{u}_{n}\right\}$ be a sequence in $H.$ We write ${u}_{n}\rightharpoonup u$ to indicate that the sequence $\left\{{u}_{n}\right\}$ converges weakly to a point $u\in H.$ Similarly, ${u}_{n}\to u$ symbolizes strong convergence. For $v\in C,$ if there is a subsequence $\left\{{u}_{{n}_{k}}\right\}$ of $\left\{{u}_{n}\right\}$ such that ${u}_{{n}_{k}}\rightharpoonup v,$ then v is called a weak cluster point of $\left\{{u}_{n}\right\}.$ The set of all weak cluster points of $\left\{{u}_{n}\right\}$ is denoted by ${\omega}_{w}\left({u}_{n}\right)$.
The following lemma was proved by Moudafi and AlShemas; see [
33].
Lemma 4 ([
33])
. Let $\left\{{u}_{n}\right\}$ be a sequence in a real Hilbert space H such that there exists $\varnothing \ne \Lambda \subset H$ satisfying: (i) For any $p\in \Lambda ,$ ${lim}_{n\to \infty}\parallel {u}_{n}-p\parallel $ exists.
(ii) Every weak cluster point of $\left\{{u}_{n}\right\}$ belongs to $\Lambda .$
Then, there exists ${x}^{*}\in \Lambda $ such that ${u}_{n}\rightharpoonup {x}^{*}.$
Let
$\left\{{T}_{n}\right\}$ and
$\psi $ be families of nonexpansive mappings of
C into itself such that
$\varnothing \ne F\left(\psi \right)\subset {\cap}_{n=1}^{\infty}F\left({T}_{n}\right),$ where
$F\left(\psi \right)$ is the set of all common fixed points of each
$T\in \psi .$ A sequence
$\left\{{T}_{n}\right\}$ satisfies the NST-condition (I) with
$\psi $ if, for any bounded sequence
$\left\{{u}_{n}\right\}$ in
$C,$ ${lim}_{n\to \infty}\parallel {u}_{n}-{T}_{n}{u}_{n}\parallel =0$ implies ${lim}_{n\to \infty}\parallel {u}_{n}-T{u}_{n}\parallel =0$ for all
$T\in \psi $; see [
34]. If
$\psi =\left\{T\right\},$ then
$\left\{{T}_{n}\right\}$ satisfies the NST-condition (I) with
$T.$
Example 2 ([
30])
. Let $T\in \psi $. Define ${T}_{n}={\beta}_{n}I+(1-{\beta}_{n})T$, where $0<s\le {\beta}_{n}\le t<1$ for all $n\in \mathbb{N}$. Then, $\left\{{T}_{n}\right\}$ is a family of G-nonexpansive mappings and satisfies the NST-condition. Let $A:H\to {2}^{H}$ be a maximal monotone operator and $c>0$. The resolvent of A is defined by ${J}_{cA}={(I+cA)}^{-1}$, where I is the identity operator. If $A=\partial f$ for some $f\in {\Gamma}_{0}\left(H\right)$, where ${\Gamma}_{0}\left(H\right)$ stands for the set of proper lower semicontinuous convex functions from $H$ to $(-\infty ,+\infty ]$, then ${J}_{cA}=pro{x}_{cf}$. The forward-backward operator of lower semicontinuous convex functions $f,g:{\mathbb{R}}^{n}\to (-\infty ,+\infty ]$ has the following definition:
A forwardbackward operator
T is defined by
$T:=pro{x}_{\lambda g}(I-\lambda \nabla f)$ for
$\lambda >0$, where
$\nabla f$ is the gradient operator of function
f and
$pro{x}_{\lambda g}x:=argmi{n}_{y\in H}\left\{g\left(y\right)+\frac{1}{2\lambda}{\parallel y-x\parallel}^{2}\right\}$ (see [
35,
36]). Moreau [
37] defined the operator
$pro{x}_{\lambda g}$ as the proximity operator with respect to
$\lambda $ and
g. If
$\lambda \in (0,2/L)$, where L is the Lipschitz constant of $\nabla f$, then T is nonexpansive.
We have the following remark for the definition of the proximity operator; see [
38].
Remark 1. Let $g:{\mathbb{R}}^{n}\to \mathbb{R}$ be given by $g\left(x\right)=\lambda {\parallel x\parallel}_{1}$. The proximity operator of g is evaluated componentwise by the formula ${(pro{x}_{g}x)}_{i}=sign({x}_{i})max\{\left|{x}_{i}\right|-\lambda ,0\},$ $i=1,\dots ,n,$ where $x=({x}_{1},{x}_{2},\cdots ,{x}_{n})$ and ${\parallel x\parallel}_{1}={\sum}_{i=1}^{n}\left|{x}_{i}\right|$. The following lemma was proved by Bussaban et al.; see [
20].
Lemma 5. Let H be a real Hilbert space and T be the forward-backward operator of f and g, where g is a proper lower semicontinuous convex function from H into $\mathbb{R}\cup \{\infty \}$, and f is a convex differentiable function from H into $\mathbb{R}$ with gradient $\nabla f$ being L-Lipschitz for some $L>0$. If ${T}_{n}$ is the forward-backward operator of f and g with parameter ${a}_{n}$ such that ${a}_{n}\to a$, where a, ${a}_{n}\in (0,2/L)$, then $\left\{{T}_{n}\right\}$ satisfies the NST-condition (I) with T.
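For $g\left(x\right)=\lambda {\parallel x\parallel}_{1}$ as in Remark 1, the proximity operator acts componentwise by soft-thresholding. A minimal Python sketch (the function name is our own):

```python
def prox_l1(x, lam):
    """Proximity operator of g = lam * ||.||_1, applied componentwise:
    (prox x)_i = sign(x_i) * max(|x_i| - lam, 0)  (soft-thresholding)."""
    out = []
    for xi in x:
        mag = max(abs(xi) - lam, 0.0)
        out.append(mag if xi >= 0 else -mag)
    return out

# Entries with |x_i| <= lam are shrunk to zero; the rest move toward 0 by lam.
print(prox_l1([3.0, -0.5, 1.5], 1.0))
```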
3. Main Results
Let C be a nonempty closed and convex subset of a real Hilbert space H with a directed graph $G=(V(G),E(G))$ such that $V\left(G\right)=C$. Let $\left\{{T}_{n}\right\}$ be a family of G-nonexpansive mappings of C into itself such that ${\cap}_{n=1}^{\infty}F\left({T}_{n}\right)\ne \varnothing $.
The following proposition is useful for our main theorem.
Proposition 1. Let ${x}^{*}\in {\cap}_{n=1}^{\infty}F\left({T}_{n}\right)$ and ${y}_{0},{x}_{1}\in C$ be such that $({x}^{*},{y}_{0})$, $({x}^{*},{x}_{1})\in E\left(G\right)$. Let $\left\{{x}_{n}\right\}$ be a sequence generated by Algorithm 1. Suppose $E\left(G\right)$ is symmetric, transitive, and right coordinate affine. Then, $({x}^{*},{x}_{n}),$ $({x}^{*},{y}_{n}),$ $({x}^{*},{z}_{n})$ and $({x}_{n},{x}_{n+1})\in E\left(G\right)$ for all $n\in \mathbb{N}.$
Algorithm 1: (ASA) An Accelerated S-algorithm
 1:
Initial. Take arbitrary ${y}_{0}$, ${x}_{1}\in C$, set $n=1$, and choose ${\beta}_{n}\in [a,b]\subset (0,1)$, ${\theta}_{n}\ge 0$ with ${\sum}_{n=1}^{\infty}{\theta}_{n}<\infty $, and ${\alpha}_{n}\to 1$.  2:
Step 1. Compute ${y}_{n},{z}_{n}$ and ${x}_{n+1}$ by
Then, $n:=n+1$ and go to Step 1.
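The displayed update rules of Algorithm 1 are not reproduced in this copy. The Python sketch below shows one plausible inertial S-type iteration consistent with the parameters above (${\beta}_{n}$, ${\alpha}_{n}$, ${\theta}_{n}$) and with the roles of ${y}_{n},{z}_{n},{x}_{n+1}$ in the subsequent proofs; it is an illustration under our own assumptions, not the authors' exact scheme:

```python
def asa(T, x1, y0, n_iters=50,
        beta=lambda n: 0.5,                   # beta_n in [a, b] subset (0, 1)
        alpha=lambda n: 1.0 - 1.0 / (n + 1),  # alpha_n -> 1
        theta=lambda n: 1.0 / (n * n)):       # theta_n >= 0, summable
    """One plausible reading of an accelerated S-type iteration (NOT the
    authors' exact displayed scheme, which is missing from this copy):
      z_n     = (1 - beta_n) x_n + beta_n T x_n
      y_n     = T((1 - alpha_n) x_n + alpha_n z_n)
      x_{n+1} = y_n + theta_n (y_n - y_{n-1})   (inertial step)
    """
    x, y_prev = x1, y0
    for n in range(1, n_iters + 1):
        z = (1.0 - beta(n)) * x + beta(n) * T(x)
        y = T((1.0 - alpha(n)) * x + alpha(n) * z)
        x = y + theta(n) * (y - y_prev)
        y_prev = y
    return x

# Demo on the contraction T(x) = 0.5*x + 1, whose unique fixed point is 2.
approx = asa(lambda t: 0.5 * t + 1.0, x1=10.0, y0=10.0)
print(abs(approx - 2.0) < 1e-6)  # True
```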
Proof. We shall prove the results by using strong mathematical induction. From Algorithm 1, we obtain
Since
$({x}^{*},{x}_{1})\in E\left(G\right)$ and
${T}_{n}$ is edge preserving, we obtain
$({x}^{*},{z}_{1})\in E\left(G\right).$ Again, by Algorithm 1, we obtain
Since
$({x}^{*},{z}_{1})\in E\left(G\right)$ and
${T}_{n}$ is edge preserving, we obtain
$({x}^{*},{y}_{1})\in E\left(G\right).$ Next, we assume that
$({x}^{*},{z}_{k}),({x}^{*},{y}_{k})$ and
$({x}^{*},{x}_{k})\in E\left(G\right)$ for all
$k<n.$ By Algorithm 1, we obtain
and
By (
1)–(
3) and since
$E\left(G\right)$ is right coordinate affine and
${T}_{n}$ is edge preserving, we obtain
$({x}^{*},{x}_{k+1}),({x}^{*},{y}_{k+1})$ and
$({x}^{*},{z}_{k+1})$ are in
$E\left(G\right).$ By strong mathematical induction, we conclude that
$({x}^{*},{x}_{n}),({x}^{*},{y}_{n}),$ $({x}^{*},{z}_{n})\in E\left(G\right)$ for all
$n\in \mathbb{N}.$ Since
$E\left(G\right)$ is symmetric, we obtain
$({x}_{n},{x}^{*})\in E\left(G\right).$ Since
$({x}_{n},{x}^{*}),({x}^{*},{x}_{n+1})\in E\left(G\right)$ and
$E\left(G\right)$ is transitive, we obtain
$({x}_{n},{x}_{n+1})\in E\left(G\right)$ as required. □
We now prove the weak convergence of Algorithm 1 for G-nonexpansive mappings in a real Hilbert space.
Theorem 1. Let C be a nonempty closed and convex subset of a real Hilbert space H with a directed graph $G=(V(G),E(G))$ such that $V\left(G\right)=C$ and $E\left(G\right)$ is symmetric, transitive, and right coordinate affine. Let ${y}_{0}$, ${x}_{1}\in C$ and $\left\{{x}_{n}\right\}$ be a sequence in H defined by Algorithm 1. Suppose $\left\{{T}_{n}\right\}$ satisfies the NST-condition (I) with T such that $\varnothing \ne F\left(T\right)\subset {\cap}_{n=1}^{\infty}F\left({T}_{n}\right)$ and $({x}^{*},{y}_{0}),({x}^{*},{x}_{1})\in E\left(G\right)$ for all ${x}^{*}\in {\cap}_{n=1}^{\infty}F\left({T}_{n}\right).$ Then, ${x}_{n}\rightharpoonup {x}^{*}\in F\left(T\right).$
Proof. Let
${x}^{*}\in {\cap}_{n=1}^{\infty}F\left({T}_{n}\right).$ By the definition of
${z}_{n}$ and
${y}_{n},$ we obtain
and
which implies that
By the definition of
${x}_{n},$ we obtain
From (
6) and (
7), we obtain
Applying Lemma 3, we obtain $\parallel {x}^{*}-{y}_{n}\parallel \le M\cdot {\prod}_{j=1}^{n}(1+2{\theta}_{j}),$ where $M=max\{\parallel {x}^{*}-{y}_{1}\parallel ,\parallel {x}^{*}-{y}_{2}\parallel \}.$ Since ${\sum}_{n=1}^{\infty}{\theta}_{n}<\infty ,$ we obtain that $\left\{{y}_{n}\right\}$ is bounded and so are $\left\{{z}_{n}\right\}$ and $\left\{{x}_{n}\right\}$. Thus,
By Lemma 1 and (
9), we obtain
${lim}_{n\to \infty}\parallel {x}^{*}-{y}_{n}\parallel $ exists. By Lemma 2(i) and the definition of
${z}_{n},$ we obtain
Let
${lim}_{n\to \infty}\parallel {x}^{*}-{y}_{n}\parallel =a.$ From the boundedness of
$\left\{{x}_{n}\right\}$ and (
6), we obtain
Since
$\parallel {x}^{*}-{x}_{n-1}\parallel \le \parallel {x}^{*}-{y}_{n}\parallel +{\theta}_{n}\parallel {y}_{n}-{y}_{n-1}\parallel $ and (
10), we obtain
It follows from (
12) and (
13) that
From (
4), one can easily see that
${lim\; sup}_{n\to \infty}\parallel {x}^{*}-{z}_{n}\parallel \le a.$ By (
5) with
${\alpha}_{n}\to 1,$ we obtain that
$a\le {lim\; inf}_{n\to \infty}\parallel {x}^{*}-{z}_{n}\parallel .$ Thus,
Combining (11), (14), and (15), we obtain
According to $\left\{{T}_{n}\right\}$ satisfying the NST-condition (I) with $T,$ we obtain that $\parallel T{x}_{n}-{x}_{n}\parallel \to 0$ as $n\to \infty .$ Let ${\omega}_{w}\left({x}_{n}\right)$ be the set of all weak cluster points of $\left\{{x}_{n}\right\}.$ Thus, ${\omega}_{w}\left({x}_{n}\right)\subset F\left(T\right)$ by the demiclosedness of $I-T$ at $0.$ From Lemma 4, we conclude that ${x}_{n}\rightharpoonup {x}^{*}$ with ${x}^{*}\in F\left(T\right)$ as required. □
Corollary 1. Let C be a nonempty closed and convex subset of a real Hilbert space H and let $\left\{{T}_{n}\right\}$ be a family of nonexpansive mappings of C into itself. Let ${y}_{0}$, ${x}_{1}\in C$, and $\left\{{x}_{n}\right\}$ be a sequence in H defined by Algorithm 1. Suppose that $\left\{{T}_{n}\right\}$ satisfies the NST-condition (I) with T such that $\varnothing \ne F\left(T\right)\subset {\cap}_{n=1}^{\infty}F\left({T}_{n}\right)$. Then, $\left\{{x}_{n}\right\}$ converges weakly to a point in $F\left(T\right)$.
4. Applications
In the past decade, extreme learning machine (ELM) [
39], a new learning algorithm for single-hidden-layer feedforward networks (SLFNs), has been extensively studied in various research topics of machine learning and artificial intelligence, such as face classification, image segmentation, regression, and data classification. ELM has been theoretically proven to have extremely fast learning speed and, in most cases, better performance than gradient-based learning methods such as backpropagation. The target of this model is to find the parameter
$\beta $ that solves the following minimization problem, called ordinary least squares (OLS): ${min}_{\beta}\phantom{\rule{0.277778em}{0ex}}{\parallel \mathbf{H}\beta -\mathbf{T}\parallel}_{2}^{2},$
where
$\parallel \cdot \parallel $ is the
${l}_{2}$-norm defined by
${\parallel x\parallel}_{2}=\sqrt{{\sum}_{i=1}^{n}{\left|{x}_{i}\right|}^{2}}$,
$\mathbf{T}\in {\mathbb{R}}^{N\times m}$ is the target matrix of the data,
$\beta \in {\mathbb{R}}^{M\times m}$ is the weight matrix connecting the hidden layer and the output layer, and
$H\in {\mathbb{R}}^{N\times M}$ is the hidden layer output matrix. In general mathematical modeling, there are several methods to estimate the solution of (
16); in this case, the solution $\beta $ is obtained by $\beta ={\mathbf{H}}^{\dagger}\mathbf{T}$, where ${\mathbf{H}}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$. However, in a real situation, the number of unknown variables M is much larger than the number of training data N, which may cause the network to overfit. On the other hand, the accuracy is low when the number of hidden nodes M is small. Thus, in order to improve (
16), several regularization methods were introduced. The classical two standard techniques for improving (
16) are subset selection and ridge regression (sometimes called Tikhonov regularization) [
40].
In this paper, we focus on the following problem, called the least absolute shrinkage and selection operator (LASSO) [41]: ${min}_{\beta}\left\{{\parallel \mathbf{H}\beta -\mathbf{T}\parallel}_{2}^{2}+\lambda {\parallel \beta \parallel}_{1}\right\},$
where
$\lambda $ is a regularization parameter. LASSO tries to retain the good features of both subset selection and ridge regression [
41]. After the regularization methods and the original ELM were introduced for improving the performance of OLS, the regularized extreme learning machine [
42] was proposed five years later and applied to solve regression problems. In general, (
17) can be rewritten as the minimization of
$f+g$, that is, ${min}_{x}\left\{f\left(x\right)+g\left(x\right)\right\},$
where
f is a smooth convex function with gradient having Lipschitz constant
L and
g is a convex smooth (or possibly nonsmooth) function. The solution of (
18) can be characterized as follows:
$\tilde{x}$ is a minimizer of
$(f+g)$ if and only if
$0\in \nabla f\left(\tilde{x}\right)+\partial g\left(\tilde{x}\right)$, where
$\nabla f\left(\tilde{x}\right)$ is the gradient of
f and
$\partial g\left(\tilde{x}\right)$ is a subdifferential of
g by using Fermat’s rule (see [
35] for more details). In fixed point theory, Parikh et al. [
43] characterized (
18) as follows:
$\tilde{x}$ is a minimizer of
$f+g$ if and only if $\tilde{x}=pro{x}_{\lambda g}(\tilde{x}-\lambda \nabla f\left(\tilde{x}\right)),$ where
$pro{x}_{\lambda g}$ is the proximity operator of
$\lambda g$,
$\lambda >0$ and
${J}_{\partial g}$ is defined by
${J}_{\partial g}={(I+\partial g)}^{-1}$,
${J}_{\partial g}$ is the resolvent of
$\partial g$ and
I is an identity operator. The problem (
18) can be rewritten into a general problem, called a zero of sum of two operators problem, by finding
$\tilde{x}$ such that $0\in A\tilde{x}+B\tilde{x}$, where
$A,B:H\to {2}^{H}$ are two setvalued operators and
$zer(A+B):=\{x:0\in Ax+Bx\}$. In this case, we assume that
$A:H\to {2}^{H}$ is a maximal monotone operator and
$B:H\to H$ is an
L-Lipschitz operator. For convenience, (
19) can also be rewritten as the fixed-point problem $\tilde{x}=T\tilde{x}$, where $T=pro{x}_{\lambda g}(I-\lambda \nabla f)$. It is also known that
T is nonexpansive if
$\lambda \in (0,2/L)$, where
L is a Lipschitz constant of
$\nabla f$.
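To illustrate the fixed-point characterization, take the one-dimensional toy instance $f\left(x\right)=\frac{1}{2}{(x-3)}^{2}$ and $g\left(x\right)=\left|x\right|$ (our own choice, with $L=1$ and $\lambda =1\in (0,2/L)$); iterating $T=pro{x}_{\lambda g}(I-\lambda \nabla f)$ reaches the minimizer ${x}^{*}=2$ of $f+g$, at which $T{x}^{*}={x}^{*}$:

```python
def grad_f(x):
    """f(x) = 0.5 * (x - 3)**2, so grad f(x) = x - 3 (Lipschitz constant L = 1)."""
    return x - 3.0

def soft(v, t):
    """Proximity operator of t * |.| (soft-thresholding)."""
    mag = max(abs(v) - t, 0.0)
    return mag if v >= 0 else -mag

def T(x, lam=1.0):
    """Forward-backward operator prox_{lam*g}(x - lam * grad_f(x)) with g = |.|."""
    return soft(x - lam * grad_f(x), lam)

x = 10.0
for _ in range(5):
    x = T(x)
print(x)  # 2.0 (optimality: x - 3 + sign(x) = 0 gives x = 2, and T(2) = 2)
```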
We are interested in applying our proposed method to solve a convex minimization problem; we compare the convergence behavior of our proposed algorithm with that of others and give some applications to classification problems. Our proposed method will be used to solve (
18). Over the past two decades, several algorithms have been introduced for solving the problem (
18). A simple and classical algorithm is the forwardbackward algorithm (FBA), which was introduced by Lions and Mercier [
21].
The forward-backward algorithm (FBA) is defined by ${x}_{n+1}={x}_{n}+{\rho}_{n}\left(pro{x}_{\gamma g}({x}_{n}-\gamma \nabla f\left({x}_{n}\right))-{x}_{n}\right),$
where
$n\ge 1$,
${x}_{0}\in H$ and
L is a Lipschitz constant of
$\nabla f$,
$\gamma \in (0,2/L)$,
$\delta =2-(\gamma L/2)$ and
$\left\{{\rho}_{n}\right\}$ is a sequence in
$[0,\delta ]$ such that
${\sum}_{n\in \mathbb{N}}{\rho}_{n}(\delta -{\rho}_{n})=+\infty $. A technique for improving the speed and convergence behavior of algorithms, called the inertial technique, was first introduced by Polyak [
44] by adding an inertial step. Since then, many authors have employed the inertial technique to accelerate their algorithms for various kinds of problems; see [
19,
20,
22,
23,
24,
25,
26]. The performance of FBA can be improved using an iterative method with the inertial steps described below.
A fast iterative shrinkage-thresholding algorithm (FISTA) [
25] is defined by ${y}_{n}=T{x}_{n},\phantom{\rule{1em}{0ex}}{s}_{n+1}=\frac{1+\sqrt{1+4{s}_{n}^{2}}}{2},\phantom{\rule{1em}{0ex}}{\mu}_{n}=\frac{{s}_{n}-1}{{s}_{n+1}},\phantom{\rule{1em}{0ex}}{x}_{n+1}={y}_{n}+{\mu}_{n}({y}_{n}-{y}_{n-1}),$
where
$n\ge 1$,
${s}_{1}=1$,
${x}_{1}={y}_{0}\in {\mathbb{R}}^{n}$,
$T:=pro{x}_{\frac{1}{L}g}(I-\frac{1}{L}\nabla f)$ and
${\mu}_{n}$ is the inertial step size. Beck and Teboulle [
25] applied FISTA to image recovery and established its convergence rate. The inertial step size
${\mu}_{n}$ of FISTA was first introduced by Nesterov [
45].
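A compact Python sketch of the FISTA scheme on a small LASSO-type instance (the quadratic f, the data, and all names are our own illustrative choices; T is the forward-backward operator with step size 1/L):

```python
import math

def grad_f(b):
    """f(b) = 0.5*((b1 - 3)**2 + (2*b2 - 2)**2); Lipschitz constant L = 4."""
    return [b[0] - 3.0, 2.0 * (2.0 * b[1] - 2.0)]

def soft(v, t):
    """Soft-thresholding: prox of t * |.|."""
    mag = max(abs(v) - t, 0.0)
    return mag if v >= 0 else -mag

def T(b, L=4.0):
    """Forward-backward operator prox_{(1/L)g}(I - (1/L) grad f), g = ||.||_1."""
    return [soft(bi - gi / L, 1.0 / L) for bi, gi in zip(b, grad_f(b))]

def fista(x1, n_iters=1000):
    s, y_prev, x = 1.0, list(x1), list(x1)
    for _ in range(n_iters):
        y = T(x)
        s_next = (1.0 + math.sqrt(1.0 + 4.0 * s * s)) / 2.0
        mu = (s - 1.0) / s_next                   # Nesterov inertial step size
        x = [yi + mu * (yi - ypi) for yi, ypi in zip(y, y_prev)]
        s, y_prev = s_next, y
    return x

b = fista([0.0, 0.0])  # the minimizer of f + ||.||_1 here is (2, 0.75)
```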
A fast iterative shrinkage-thresholding algorithm with S-iteration (FISTAS) [
27] is defined by
where
${x}_{0}$,
${x}_{1}\in H$,
${\alpha}_{n}$,
${\beta}_{n}\in [a,b]\subset (0,1)$, ${T}_{n}=pro{x}_{{c}_{n}g}(I-{c}_{n}\nabla f)$ and
$T=pro{x}_{cg}(I-c\nabla f)$. Bussaban et al. [
27] applied FISTAS to image recovery and proved a weak convergence theorem.
A new accelerated proximal gradient algorithm (nAGA) [
26] is defined by
where
$n\ge 1$,
${T}_{n}=pro{x}_{{a}_{n}g}(I-{a}_{n}\nabla f)$ with
${a}_{n}\in (0,2/L)$ and
$\left\{{\mu}_{n}\right\},\left\{{\rho}_{n}\right\}$ are sequences in
$(0,1)$ and
$\frac{{\parallel {x}_{n}-{x}_{n-1}\parallel}_{2}}{{\mu}_{n}}\to 0$. The nAGA was introduced, together with a convergence theorem, by Verma and Shukla [
26]. The nonsmooth convex minimization problem with sparsity, including regularizers, was solved using this method for the multitask learning framework.
Theorem 2. Let H be a Hilbert space, $A:H\to {2}^{H}$ be a maximal monotone operator and $B:H\to H$ be an L-Lipschitz operator. Let $a\in (0,2/L)$ and $\left\{{a}_{n}\right\}\subset (0,2/L)$ be such that ${a}_{n}\to a$. Define ${T}_{n}={J}_{{a}_{n}A}(I-{a}_{n}B)$ and $T={J}_{aA}(I-aB)$. Suppose that $zer(A+B)\ne \varnothing $. Let $\left\{{x}_{n}\right\}$ be a sequence in H defined by Algorithm 1. Then, $\left\{{x}_{n}\right\}$ converges weakly to a point in $zer(A+B)$.
Proof. Using Proposition 26.1(iv) (see [
35]), we have
$\left\{{T}_{n}\right\}$ and
T are nonexpansive mappings such that
$F\left(T\right)=F\left({T}_{n}\right)=zer(A+B).$ Then, the proof is completed by Theorem 1 and Lemma 5. □
The convergence of Algorithm 2 is obtained by using our main result.
Algorithm 2: (FBASA) A forward-backward accelerated S-algorithm
 1:
Initial. Take arbitrary ${y}_{0}$, ${x}_{1}\in C$, set $n=1$, and choose ${\beta}_{n}\in [a,b]\subset (0,1),$ ${\theta}_{n}\ge 0,$ ${\sum}_{n=1}^{\infty}{\theta}_{n}<\infty $ and ${\alpha}_{n}\to 1.$  2:
Step 1. Compute ${y}_{n}$, ${z}_{n}$ and ${x}_{n+1}$ by using
Then, $n:=n+1$ and go to Step 1.
Theorem 3. For $f,g:{\mathbb{R}}^{n}\to (-\infty ,+\infty ]$, let f be a smooth convex function with a gradient having a Lipschitz constant L and let g be a convex function. Let ${a}_{n}\in (0,2/L)$ be such that $\left\{{a}_{n}\right\}$ converges to a, let $T:=pro{x}_{ag}(I-a\nabla f)$ and ${T}_{n}:=pro{x}_{{a}_{n}g}(I-{a}_{n}\nabla f)$, and let $\left\{{x}_{n}\right\}$ be a sequence generated by Algorithm 2, where ${\beta}_{n},{\alpha}_{n}$ and ${\theta}_{n}$ are the same as in Algorithm 1. Then,
(i) $\parallel {x}^{*}-{x}_{n+1}\parallel \le M\cdot {\prod}_{j=1}^{n}(2{\theta}_{j}+1)$, where $M=max\{\parallel {x}^{*}-{x}_{1}\parallel ,\parallel {x}^{*}-{x}_{2}\parallel \}$ and ${x}^{*}\in Argmin(f+g);$
(ii) $\left\{{x}_{n}\right\}$ converges weakly to a point in Argmin $(f+g)$.
Proof. We know that
T and
$\left\{{T}_{n}\right\}$ are nonexpansive operators, and
$F\left(T\right)={\cap}_{n=1}^{\infty}F\left({T}_{n}\right)=Argmin(f+g)$ for all
n; see [
35]. Then,
$\left\{{T}_{n}\right\}$ satisfies the NSTcondition (I) with
T by using Lemma 5. We obtain the desired result immediately from Theorem 1 by taking G to be the complete graph on ${\mathbb{R}}^{n}$, i.e., $E\left(G\right)={\mathbb{R}}^{n}\times {\mathbb{R}}^{n}$. □
5. Numerical Experiments
The classification problem is one of the most important applications of convex minimization. We illustrate the process of reformulating the data classification problem in machine learning.
We first present the basic idea of extreme learning machines for the data classification problem and use our algorithm to solve this problem through numerical experiments. Moreover, the performance of Algorithm 2, FISTAS, FISTA, and nAGA is compared.
Extreme learning machine (ELM). Let
R:=
$\{({x}_{k},{t}_{k}):{x}_{k}\in {\mathbb{R}}^{n}$,
${t}_{k}\in {\mathbb{R}}^{m}$,
$k=1,2,\cdots ,N\}$ be a training set of N distinct samples, where
${x}_{k}$ is the input data and
${t}_{k}$ is a target. A standard SLFN with activation function
$\Psi \left(x\right)$ (for instance, the sigmoid) and
M hidden nodes produces the outputs ${o}_{k}={\sum}_{j=1}^{M}{\beta}_{j}\Psi ({w}_{j}\cdot {x}_{k}+{c}_{j}),\phantom{\rule{1em}{0ex}}k=1,2,\cdots ,N,$
where
${\beta}_{j}$ is the weight vector connecting the
jth hidden node and the output nodes,
${w}_{j}$ is the weight vector connecting the input nodes and the
jth hidden node, and
${c}_{j}$ is the threshold of the
jth hidden node. The objective of standard SLFNs is to estimate these
N different samples with
${\sum}_{i=1}^{N}\parallel {t}_{i}-{o}_{i}\parallel =0$, that is, there exist
${\beta}_{j}$,
${w}_{j}$,
${c}_{j}$ such that ${t}_{k}={\sum}_{j=1}^{M}{\beta}_{j}\Psi ({w}_{j}\cdot {x}_{k}+{c}_{j})$ for $k=1,2,\cdots ,N$. These N equations can be written compactly as $\mathbf{H}\beta =\mathbf{T}.$
A standard SLFN goal is to estimate
${\beta}_{j}$,
${w}_{j}$, and
${c}_{j}$ to solve (
18), whereas an ELM goal is to find only
${\beta}_{j}$ with
${w}_{j}$ and
${c}_{j}$ chosen at random.
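The construction above can be sketched in a few lines of Python: the hidden parameters ${w}_{j}$ and ${c}_{j}$ are drawn at random and kept fixed, and only the hidden layer output matrix $\mathbf{H}$ (on which $\beta $ is then learned) is computed. All data and names here are illustrative:

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def hidden_output_matrix(X, W, c):
    """N x M matrix with H[k][j] = Psi(w_j . x_k + c_j) for the sigmoid Psi."""
    return [[sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + cj)
             for w, cj in zip(W, c)]
            for x in X]

random.seed(0)
X = [[0.1, 0.2], [0.4, 0.6], [0.9, 0.3]]                           # N = 3 samples
W = [[random.uniform(-1, 1) for _ in range(2)] for _ in range(4)]  # M = 4 hidden nodes
c = [random.uniform(-1, 1) for _ in range(4)]
H = hidden_output_matrix(X, W, c)
print(len(H), len(H[0]))  # 3 4
```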
In an experiment on classification problems, we employ the model (
17) to solve the convex minimization problem. We set
$f\left(\beta \right)={\parallel \mathbf{H}\beta -\mathbf{T}\parallel}_{2}^{2}$ and
$g\left(\beta \right)=\lambda {\parallel \beta \parallel}_{1}$. Next, we use the Iris dataset to classify iris plant types, and the Heart Disease UCI dataset to identify heart patients, which are detailed as follows:
Iris dataset [
46]. This dataset has three classes of 50 examples, each of which represents a different variety of iris plant. The purpose is to identify each iris plant species based on the length of its sepals and petals.
Heart Disease UCI dataset [
47]. Although there are 76 attributes in the original dataset, all published experiments only use 14 of them. Data on patients with heart disease are provided in this dataset. We divide the data into two classes based on the predicted attributes.
All control parameters are set to the values shown in
Table 1,
$L=2\parallel {\mathbf{H}}_{\mathbf{1}}{\parallel}^{2}$, where
${\mathbf{H}}_{\mathbf{1}}$ is a hidden layer output matrix of a training matrix,
$M=100$, and
$\Psi \left(x\right)$ is sigmoid. Each dataset is given a training set, as indicated in
Table 2. We evaluated the accuracy of the output data as the percentage of correctly predicted samples, i.e., $accuracy=\frac{correct\; predictions}{all\; samples}\times 100.$
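The accuracy criterion is the percentage of correctly classified samples; for instance:

```python
def accuracy(predicted, target):
    """Percentage of samples whose predicted class matches the target class."""
    correct = sum(1 for p, t in zip(predicted, target) if p == t)
    return 100.0 * correct / len(target)

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 75.0
```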
From the results in
Table 3, we conclude that the proposed learning algorithm, with the same number of hidden nodes
M, has high performance in terms of accuracy. The weight computed by Algorithm 2 converges faster to the optimal weight and attains better accuracy than those computed by FISTAS, FISTA, and nAGA.