1. Introduction
Let H be a real Hilbert space with the norm $\|\cdot\|$, and let C be a nonempty closed convex subset of H. A mapping $T\colon C \to C$ is said to be nonexpansive if it satisfies the following symmetric contractive-type condition:
$$\|Tx - Ty\| \le \|x - y\|$$
for all $x, y \in C$; see [1].
An element $x \in C$ is a fixed point of T if $Tx = x$, and $F(T) := \{x \in C : Tx = x\}$ stands for the set of all fixed points of T.
Fixed point theory, i.e., the study of the conditions under which a map admits a fixed point, is an extensive area of research because of its numerous applications in many fields. It began with Banach's work, which established the existence of a unique fixed point for a contraction on a complete metric space, a classical result known as the Banach contraction principle; see [2]. The contraction principle has since been extended and generalized in various directions because of its applications in mathematics and other fields. One of the more recent generalizations is due to Jachymski.
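To make the contraction principle concrete, here is a small illustration of our own (not taken from [2]): for a contraction T on a complete metric space, the Picard iterates $x_{n+1} = Tx_n$ converge to the unique fixed point from any starting point. The sketch assumes $T = \cos$ on $[0, 1]$, which maps $[0, 1]$ into itself and is a contraction there with factor $\sin 1 < 1$; the helper name `picard_iteration` and the stopping tolerance are illustrative choices.

```python
import math


def picard_iteration(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = T(x_n) until two successive iterates differ by less than tol."""
    x = x0
    for n in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")


# cos maps [0, 1] into [cos 1, 1], a subset of [0, 1], and |cos'(x)| = |sin x| <= sin 1 < 1
# there, so cos is a contraction on the complete metric space [0, 1].
fixed_point, iterations = picard_iteration(math.cos, 0.5)
print(fixed_point, iterations)
```

Stopping on successive differences is a heuristic, but for a contraction with factor q the a posteriori estimate $\|x_n - x^*\| \le \frac{q}{1-q}\|x_n - x_{n-1}\|$ turns it into an error bound.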
Jachymski [3] initiated the study of fixed point theory in metric spaces endowed with a graph and obtained conditions under which a self-mapping is a Picard operator. Several authors [4,5,6,7] proved fixed point theorems for new types of contractions on metric spaces endowed with graphs. Aleomraninejad et al. [8] used the idea of Reich et al. [9] and proved a strong convergence theorem for G-contractive and G-nonexpansive mappings. On hyperbolic metric spaces, Alfuraidan and Khamsi [10] gave a definition of G-monotone nonexpansive multivalued mappings and proved the existence of fixed points for multivalued contractions and monotone single-valued mappings. Later, Alfuraidan [11] studied the existence of fixed points for G-monotone nonexpansive mappings and extended the results of Jachymski [3]. For approximating common fixed points of a finite family of G-nonexpansive mappings, Suantai et al. [12] used the shrinking projection method together with the parallel monotone hybrid method. They also used a graph to prove a strong convergence theorem in Hilbert spaces under specific conditions and then applied their iterative scheme to signal recovery.
In the past decade, many researchers have proposed algorithms for approximating fixed points of G-nonexpansive mappings without inertial techniques; see [3,8,10,11,12,13,14,15,16,17,18]. More efficient algorithms are needed for solving such problems, and some accelerated fixed-point algorithms using inertial techniques have therefore been proposed to improve convergence behavior; see [19,20,21,22,23,24,25,26,27]. Recently, Janngam et al. [28] proved a weak convergence theorem for a countable family of G-nonexpansive mappings in a Hilbert space by using a coordinate affine structure with an inertial technique. They also applied their method to image recovery.
Inspired by the research described above, we introduce a new accelerated algorithm based on the inertial technique for finding a common fixed point of a family of G-nonexpansive mappings in Hilbert spaces. We apply our result to data classification and convex minimization problems and compare the efficiency of our algorithm with that of FISTA-S, FISTA, and nAGA.
This paper is organized as follows. In Section 2, we give some terminology and facts that will be useful in later sections. In Section 3, we prove the weak convergence of our algorithm. In Section 4, we apply our method to convex minimization and data classification problems, and in Section 5, we provide numerical experiments on classification problems. Section 6 summarizes the paper.
  2. Preliminaries
Let C be a nonempty subset of a real Banach space X. Let $\Delta = \{(x, x) : x \in C\}$, where Δ stands for the diagonal of the Cartesian product $C \times C$. Consider a directed graph G in which the set $V(G)$ of its vertices corresponds to C, and the set $E(G)$ of its edges contains all loops, that is, $\Delta \subseteq E(G)$. A directed graph G is said to have parallel edges if it has two or more edges with the same tail vertex and the same head vertex.
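Assume that G does not have parallel edges. Then, $G = (V(G), E(G))$. The converse of a graph G, that is, the graph obtained from G by reversing the direction of its edges, is denoted by $G^{-1}$. Thus, we have
$$E(G^{-1}) = \{(x, y) \in C \times C : (y, x) \in E(G)\}.$$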
Let us give some definitions of basic graph properties that are used in this paper (see [29] for more details).
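Definition 1.  A graph G is said to be
- (i) symmetric if, for all $(x, y) \in E(G)$, we have $(y, x) \in E(G)$;
- (ii) transitive if, for any $x, y, z \in V(G)$ with $(x, y), (y, z) \in E(G)$, we have $(x, z) \in E(G)$;
- (iii) connected if there is a path between any two vertices of the graph G.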
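For a finite graph, the three properties in Definition 1 can be checked mechanically. The following is a minimal Python sketch under our own assumptions: edges are stored as a set of ordered pairs, and "connected" is read as the existence of a directed path between any two vertices. The function names and the toy graphs are hypothetical.

```python
from collections import deque
from itertools import product


def is_symmetric(edges):
    """(x, y) in E(G) implies (y, x) in E(G)."""
    return all((y, x) in edges for (x, y) in edges)


def is_transitive(edges):
    """(x, y), (y, z) in E(G) imply (x, z) in E(G)."""
    return all((x, z) in edges
               for (x, y), (w, z) in product(edges, repeat=2) if y == w)


def is_connected(vertices, edges):
    """Every vertex can reach every other vertex along directed edges."""
    successors = {v: set() for v in vertices}
    for x, y in edges:
        successors[x].add(y)
    for start in vertices:
        seen, queue = {start}, deque([start])
        while queue:
            u = queue.popleft()
            for w in successors[u] - seen:
                seen.add(w)
                queue.append(w)
        if seen != set(vertices):
            return False
    return True


V = {0, 1, 2}
loops = {(v, v) for v in V}                    # E(G) contains all loops
E_complete = {(x, y) for x in V for y in V}    # complete graph on V
E_path = loops | {(0, 1), (1, 2)}              # directed path 0 -> 1 -> 2 plus loops

print(is_symmetric(E_complete), is_transitive(E_complete), is_connected(V, E_complete))  # True True True
print(is_symmetric(E_path), is_transitive(E_path), is_connected(V, E_path))              # False False False
```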
 The definitions of G-contraction [3] and G-nonexpansive mappings [13] are given as follows.
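Definition 2.  A mapping $T\colon C \to C$ is said to be
- (i) a G-contraction if
  - (a) T is edge-preserving, i.e., $(Tx, Ty) \in E(G)$ for all $(x, y) \in E(G)$;
  - (b) there exists $\rho \in [0, 1)$ such that $\|Tx - Ty\| \le \rho\|x - y\|$ for all $(x, y) \in E(G)$, where ρ is called a contraction factor;
- (ii) G-nonexpansive if
  - (a) T is edge-preserving;
  - (b) $\|Tx - Ty\| \le \|x - y\|$ for all $(x, y) \in E(G)$.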
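Definition 2 only constrains T along the edges of G, which is why a G-nonexpansive mapping need not be nonexpansive. The toy check below uses hypothetical data (not the mappings of Example 1): $C = \{0, 1, 2, 10\} \subset \mathbb{R}$, with edges consisting of all loops together with $(0, 1)$ and $(1, 0)$.

```python
def is_edge_preserving(T, edges):
    """(x, y) in E(G) implies (Tx, Ty) in E(G)."""
    return all((T(x), T(y)) in edges for (x, y) in edges)


def is_g_nonexpansive(T, edges):
    """Edge-preserving and |Tx - Ty| <= |x - y| for every edge (x, y)."""
    return is_edge_preserving(T, edges) and all(
        abs(T(x) - T(y)) <= abs(x - y) for (x, y) in edges)


def is_nonexpansive(T, vertices):
    """|Tx - Ty| <= |x - y| for every pair of vertices, whether or not it is an edge."""
    return all(abs(T(x) - T(y)) <= abs(x - y) for x in vertices for y in vertices)


# Hypothetical toy data: loops plus the single undirected link between 0 and 1.
C = {0, 1, 2, 10}
E = {(v, v) for v in C} | {(0, 1), (1, 0)}
T = {0: 0, 1: 0, 2: 10, 10: 10}.get   # T maps C into C

print(is_g_nonexpansive(T, E))   # True: only the edge (0, 1) constrains T
print(is_nonexpansive(T, C))     # False: |T2 - T0| = 10 > |2 - 0| = 2
```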
 Example 1.  Let . Suppose that  if and only if  or , where . Let  be defined byfor all . Then, both S and T are G-nonexpansive but not nonexpansive (see [30] for more details).
Definition 3.  A mapping $T\colon C \to C$ is called G-demiclosed at 0 if, for every sequence $\{x_n\}$ in C with $(x_n, x_{n+1}) \in E(G)$, the conditions $x_n \rightharpoonup x$ and $Tx_n \to 0$ imply $Tx = 0$.
 To prove our main result, we need the following notion of a coordinate affine edge set of the graph G.
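Definition 4  ([28]). Assume that X is a vector space and $D \subseteq X \times X$ is nonempty. Then, D is said to be
- (i) left coordinate affine if $\alpha(x, y) + \beta(z, y) \in D$ for all $(x, y), (z, y) \in D$ and all $\alpha, \beta \in \mathbb{R}$ with $\alpha + \beta = 1$;
- (ii) right coordinate affine if $\alpha(x, y) + \beta(x, z) \in D$ for all $(x, y), (x, z) \in D$ and all $\alpha, \beta \in \mathbb{R}$ with $\alpha + \beta = 1$.
We say that D is coordinate affine if D is both left and right coordinate affine.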
 The following lemmas will be used to prove our main theorem; see also [19,31,32].
Lemma 1  ([31]). Let  and  be sequences of nonnegative real numbers such that  and . Suppose that
      
Then,  exists.
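 Lemma 2  ([32]). For a real Hilbert space H, the following results hold:
(i) For any $x, y \in H$ and $\gamma \in [0, 1]$,
$$\|\gamma x + (1 - \gamma)y\|^2 = \gamma\|x\|^2 + (1 - \gamma)\|y\|^2 - \gamma(1 - \gamma)\|x - y\|^2.$$
Lemma 3  ([19]). Let $\{a_n\}$ and $\{t_n\}$ be sequences of nonnegative real numbers such that
$$a_{n+1} \le (1 + t_n)a_n + t_n a_{n-1} \quad \text{for all } n \in \mathbb{N}.$$
Then,
$$a_{n+1} \le K \cdot \prod_{j=1}^{n}(1 + 2t_j),$$
where $K = \max\{a_1, a_2\}$. If $\sum_{n=1}^{\infty} t_n < \infty$, then $\{a_n\}$ is bounded.
Let $\{x_n\}$ be a sequence in H. We write $x_n \rightharpoonup x$ to indicate that the sequence $\{x_n\}$ converges weakly to a point $x \in H$. Similarly, $x_n \to x$ denotes strong convergence. For $v \in H$, if there is a subsequence $\{x_{n_k}\}$ of $\{x_n\}$ such that $x_{n_k} \rightharpoonup v$, then v is called a weak cluster point of $\{x_n\}$. The set of all weak cluster points of $\{x_n\}$ is denoted by $\omega_w(x_n)$.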
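The following lemma was proved by Moudafi and Al-Shemas; see [33].
Lemma 4  ([33]). Let $\{x_n\}$ be a sequence in a real Hilbert space H such that there exists a nonempty set $\Lambda \subseteq H$ satisfying:
(i) For any $p \in \Lambda$, $\lim_{n\to\infty}\|x_n - p\|$ exists.
(ii) Any weak cluster point of $\{x_n\}$ belongs to Λ.
Then, there exists $x^* \in \Lambda$ such that $x_n \rightharpoonup x^*$.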
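 Let $\{T_n\}$ and Γ be families of nonexpansive mappings of C into itself such that $\emptyset \neq F(\Gamma) \subseteq \bigcap_{n=1}^{\infty} F(T_n)$, where $F(\Gamma)$ is the set of all common fixed points of each $T \in \Gamma$. A sequence $\{T_n\}$ satisfies the NST-condition (I) with Γ if, for any bounded sequence $\{x_n\}$ in C,
$$\lim_{n\to\infty}\|x_n - T_n x_n\| = 0 \implies \lim_{n\to\infty}\|x_n - Tx_n\| = 0$$
for all $T \in \Gamma$; see [34]. If $\Gamma = \{T\}$, then we say that $\{T_n\}$ satisfies the NST-condition (I) with T.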
Example 2  ([30]). Let . Define , where  for all . Therefore,  is a family of G-nonexpansive mappings and satisfies the NST-condition.
Let $A\colon H \to 2^H$ be a maximal monotone operator and $c > 0$. The resolvent of A is defined by $J_{cA} := (I + cA)^{-1}$, where I is the identity operator. If $A = \partial g$ for some $g \in \Gamma_0(H)$, where $\Gamma_0(H)$ stands for the set of proper lower semicontinuous convex functions from H into $(-\infty, +\infty]$, then $J_{cA} = \operatorname{prox}_{cg}$. The forward-backward operator of lower semicontinuous convex functions $f, g\colon H \to (-\infty, +\infty]$ is defined as follows.
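A forward-backward operator T is defined by $T := \operatorname{prox}_{cg}(I - c\nabla f)$ for $c > 0$, where $\nabla f$ is the gradient operator of the function f and
$$\operatorname{prox}_{cg}(x) := \arg\min_{y \in H}\Big\{ g(y) + \frac{1}{2c}\|y - x\|^2 \Big\}$$
(see [35,36]). Moreau [37] defined the operator $\operatorname{prox}_{cg}$ as the proximity operator with respect to c and g. If $c \in (0, 2/L)$, then T is nonexpansive, where L is a Lipschitz constant of $\nabla f$.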
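As a minimal sketch, assume the LASSO-type pair used later in Section 4, $f(x) = \|Ax - b\|_2^2$ and $g(x) = \lambda\|x\|_1$. Then $\nabla f(x) = 2A^{\top}(Ax - b)$ is Lipschitz with $L = 2\|A\|_2^2$, and $\operatorname{prox}_{cg}$ is componentwise soft-thresholding at level $c\lambda$ (the formula recalled in Remark 1). The matrix A, the vector b, and the function names are illustrative assumptions.

```python
import numpy as np


def soft_threshold(x, tau):
    """prox of tau * ||.||_1: componentwise sign(x_i) * max(|x_i| - tau, 0)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def forward_backward_operator(A, b, lam, c):
    """T = prox_{c g}(I - c grad f) for f(x) = ||Ax - b||_2^2 and g(x) = lam * ||x||_1."""
    def T(x):
        grad = 2.0 * A.T @ (A @ x - b)                  # forward (explicit gradient) step on f
        return soft_threshold(x - c * grad, c * lam)    # backward (proximal) step on g
    return T


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
L = 2.0 * np.linalg.norm(A, 2) ** 2                    # Lipschitz constant of grad f
T = forward_backward_operator(A, b, lam=0.1, c=1.0 / L)   # c lies in (0, 2/L)

# Nonexpansiveness check on one random pair of points
x, y = rng.standard_normal(5), rng.standard_normal(5)
print(np.linalg.norm(T(x) - T(y)) <= np.linalg.norm(x - y))
```

The step size $c = 1/L$ is just one admissible choice in $(0, 2/L)$.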
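The following remark concerns the proximity operator; see [38].
Remark 1.  Let $g\colon \mathbb{R}^n \to \mathbb{R}$ be given by $g(x) = \lambda\|x\|_1$. The proximity operator of g is evaluated by the following formula:
$$\operatorname{prox}_{g}(x) = \big(\operatorname{sign}(x_i)\max\{|x_i| - \lambda, 0\}\big)_{i=1}^{n},$$
where $x = (x_1, x_2, \ldots, x_n)$ and $\lambda > 0$.
The following lemma was proved by Bussaban et al.; see [20].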
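Lemma 5.  Let H be a real Hilbert space and let $T := \operatorname{prox}_{ag}(I - a\nabla f)$ be the forward-backward operator of f and g with respect to a, where g is a proper lower semicontinuous convex function from H into $(-\infty, +\infty]$, and f is a convex differentiable function from H into $\mathbb{R}$ with gradient $\nabla f$ being L-Lipschitz for some $L > 0$. If $T_n := \operatorname{prox}_{c_n g}(I - c_n\nabla f)$ is the forward-backward operator of f and g with respect to $c_n$ such that $c_n \to a$ with $a, c_n \in (0, 2/L)$, then $\{T_n\}$ satisfies the NST-condition (I) with T.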
   3. Main Results
Let C be a nonempty closed and convex subset of a real Hilbert space H with a directed graph $G = (V(G), E(G))$ such that $V(G) = C$. Let $\{T_n\}$ be a family of G-nonexpansive mappings of C into itself such that $\bigcap_{n=1}^{\infty} F(T_n) \neq \emptyset$.
The following proposition is useful for our main theorem.
Proposition 1.  Let  and  be such that , . Let  be a sequence generated by Algorithm 1. Suppose  is symmetric, transitive, and right coordinate affine. Then,    and  for all 
 | Algorithm 1: (ASA) An Accelerated S-algorithm | 
| 1:Initial. Take ,  are arbitrary, , ,  and  and .2:Step 1. Compute   and   by
                   Then,  and go to Step 1.
 | 
Proof.  We shall prove the results by using strong mathematical induction. From Algorithm 1, we obtain
        
Since 
 and 
 is edge preserving, we obtain 
 Again, by Algorithm 1, we obtain
        
Since 
 and 
 is edge preserving, we obtain 
 Next, we assume that 
 and 
 for all 
 By Algorithm 1, we obtain
        
        and
        
By (
1)–(
3) and since 
 is right coordinate affine and 
 is edge preserving, we obtain 
 and 
 are in 
 By strong mathematical induction, we conclude that 
  for all 
 Since 
 is symmetric, we obtain 
 Since 
 and 
 is transitive, we obtain 
 as required. □
 We now prove a weak convergence theorem for Algorithm 1 for a family of G-nonexpansive mappings in a real Hilbert space.
Theorem 1.  Let C be a nonempty closed and convex subset of a real Hilbert space H with a directed graph  with  and  is symmetric, transitive, and right coordinate affine. Let ,  and  be a sequence in H defined by Algorithm 1. Suppose  satisfies NST-condition (I) with T such that  and  for all  Then,  
 Proof.  Let 
 By the definition of 
 and 
 we obtain
        
        and
        
        which implies that
        
By the definition of 
 we obtain
        
From (
6) and (
7), we obtain
        
Applying Lemma 3, we obtain 
 where 
 Since 
 we obtain that 
 is bounded and so are 
 and 
. Thus,
        
By Lemma 1 and (
9), we obtain 
 exists. By Lemma 2(i) and the definition of 
 we obtain
        
Let 
 From the boundedness of 
 and (
6), we obtain
        
Since 
 and (
10), we obtain
        
It follows from (
12) and (
13) that
        
From (
4), one can easily see that 
 By (
5) with 
 we obtain that 
 Thus,
        
Combining (11), (14), and (15), we obtain
        
Since  satisfies the NST-condition (I) with  we obtain that  as  Let  be the set of all weak cluster points of  Then,  by the demiclosedness of  at  From Lemma 4, we conclude that  with  as required. □
 Corollary 1.  Let C be a nonempty closed and convex subset of a real Hilbert space H and let  be a family of nonexpansive mappings of C into itself. Let , , and  be a sequence in H defined by Algorithm 1. Suppose that  satisfies NST-condition (I) with T such that . Then,  converges weakly to a point in .
   4. Applications
In the past decade, the extreme learning machine (ELM) [39], a learning algorithm for single-hidden layer feedforward networks (SLFNs), has been extensively studied in various areas of machine learning and artificial intelligence, such as face classification, image segmentation, regression, and data classification. ELM has been shown to have an extremely fast learning speed and, in most cases, better performance than gradient-based learning methods such as backpropagation. The target of this model is to find the parameter β that solves the following minimization problem, called ordinary least squares (OLS),
$$\min_{\beta}\ \|\mathbf{H}\beta - \mathbf{T}\|_2^2, \tag{16}$$
where $\|\cdot\|_2$ is the $\ell_2$-norm defined by $\|x\|_2 = \sqrt{\sum_i |x_i|^2}$, $\mathbf{T}$ is the target of data, β is the weight that connects the hidden layer and the output layer, and $\mathbf{H} \in \mathbb{R}^{N \times M}$ is the hidden layer output matrix. In general mathematical modeling, there are several methods to estimate the solution of (16); in this case, the solution is obtained by $\beta = \mathbf{H}^{\dagger}\mathbf{T}$, where $\mathbf{H}^{\dagger}$ is the Moore–Penrose generalized inverse of $\mathbf{H}$. However, in real situations, the number of unknown variables M is often much larger than the number of training data N, which may cause the network to overfit. On the other hand, the accuracy is low when the number of hidden nodes M is small. Thus, several regularization methods have been introduced to improve (16). The two classical standard techniques for improving (16) are subset selection and ridge regression (sometimes called Tikhonov regularization) [40].
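The closed-form OLS solution $\beta = \mathbf{H}^{\dagger}\mathbf{T}$ can be computed with NumPy's `numpy.linalg.pinv`. This sketch uses random stand-in matrices (not data from the paper) and also illustrates the overfitting concern: when M > N, the pseudo-inverse solution reproduces the training targets exactly.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, m = 50, 10, 3                  # training samples, hidden nodes, output classes
H = rng.standard_normal((N, M))      # stand-in for a hidden-layer output matrix
T = rng.standard_normal((N, m))      # stand-in targets

beta = np.linalg.pinv(H) @ T         # beta = H^dagger T

# With N > M and H of full column rank, this is the unique least-squares solution,
# so it agrees with NumPy's least-squares solver and satisfies the normal equations.
beta_lstsq, *_ = np.linalg.lstsq(H, T, rcond=None)
print(np.allclose(beta, beta_lstsq))                  # True
print(np.allclose(H.T @ (H @ beta - T), 0.0))         # True

# With M > N the same formula interpolates the training targets exactly,
# which is the overfitting risk mentioned above.
H_wide = rng.standard_normal((N, 80))
beta_wide = np.linalg.pinv(H_wide) @ T
print(np.allclose(H_wide @ beta_wide, T))             # True
```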
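In this paper, we focus on the following problem, called the least absolute shrinkage and selection operator (LASSO) [41],
$$\min_{\beta}\ \|\mathbf{H}\beta - \mathbf{T}\|_2^2 + \lambda\|\beta\|_1, \tag{17}$$
where $\lambda > 0$ is a regularization parameter and $\|\cdot\|_1$ is the $\ell_1$-norm. LASSO tries to retain the good features of both subset selection and ridge regression [41]. After the original ELM and these regularization methods for improving OLS had been introduced, the regularized extreme learning machine [42] was proposed five years later and applied to regression problems. In general, (17) can be written as the minimization of $f + g$, that is,
$$\min_{x \in H}\ \{f(x) + g(x)\}, \tag{18}$$
where f is a smooth convex function whose gradient is Lipschitz continuous with constant L, and g is a convex, possibly nonsmooth, function; for (17), $f(\beta) = \|\mathbf{H}\beta - \mathbf{T}\|_2^2$ and $g(\beta) = \lambda\|\beta\|_1$. By Fermat's rule, $x^*$ is a minimizer of $f + g$ if and only if $0 \in \nabla f(x^*) + \partial g(x^*)$, where $\nabla f$ is the gradient of f and $\partial g$ is the subdifferential of g (see [35] for more details). From the viewpoint of fixed point theory, Parikh et al. [43] characterized (18) as follows: $x^*$ is a minimizer of $f + g$ if and only if
$$x^* = \operatorname{prox}_{cg}(I - c\nabla f)(x^*),$$
where $c > 0$ and $\operatorname{prox}_{cg}$ is the proximity operator of cg, given by $\operatorname{prox}_{cg} = J_{c\partial g} := (I + c\partial g)^{-1}$, the resolvent of $c\partial g$, and I is the identity operator. Problem (18) can be cast as a more general problem, called the zero of the sum of two operators problem: find $x^* \in H$ such that
$$0 \in Ax^* + Bx^*, \tag{19}$$
where A and B are two set-valued operators on H; for (18), $A = \partial g$ and $B = \nabla f$. In this case, we assume that $A\colon H \to 2^H$ is a maximal monotone operator and $B\colon H \to H$ is an L-Lipschitz operator. For convenience, (19) can also be rewritten as
$$x^* = J_{cA}(I - cB)x^* = Tx^*,$$
where $T := J_{cA}(I - cB)$ and $c > 0$. It is also known that T is nonexpansive if $c \in (0, 2/L)$ when $B = \nabla f$ for a convex function f whose gradient is L-Lipschitz.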
We are interested in applying our proposed method to convex minimization problems, comparing its convergence behavior with that of other algorithms, and using it to solve classification problems. Our proposed method will be used to solve (18). Over the past two decades, several algorithms have been introduced for solving problem (18). A simple and classical one is the forward-backward algorithm (FBA), which was introduced by Lions and Mercier [21].
The forward-backward algorithm (FBA) is defined by
      
      where 
, 
 and 
L is a Lipschitz constant of 
, 
, 
 and 
 is a sequence in 
 such that 
. Polyak [44] was the first to introduce a technique, now called the inertial technique, that adds an inertial step to speed up algorithms and improve their convergence behavior. Since then, many authors have employed the inertial technique to accelerate their algorithms for various kinds of problems; see [19,20,22,23,24,25,26]. The performance of FBA can be improved by iterative methods with inertial steps, as described below.
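For reference, the basic forward-backward iteration that the inertial methods below accelerate can be sketched for the LASSO instance $f(x) = \|Ax - b\|_2^2$, $g(x) = \lambda\|x\|_1$. This is a simplification under stated assumptions: it uses a fixed step $c = 1/L$ and omits the relaxation sequence of the FBA scheme above, i.e., it iterates $x_{n+1} = \operatorname{prox}_{cg}(x_n - c\nabla f(x_n))$. The data and function names are hypothetical.

```python
import numpy as np


def soft_threshold(x, tau):
    """prox of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def fba_lasso(A, b, lam, n_iter=500):
    """Plain forward-backward iteration x_{n+1} = prox_{c g}(x_n - c grad f(x_n))
    for f(x) = ||Ax - b||_2^2 and g(x) = lam * ||x||_1, with fixed step c = 1/L."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2      # Lipschitz constant of grad f
    c = 1.0 / L
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - c * 2.0 * A.T @ (A @ x - b), c * lam)
    return x


def lasso_objective(A, b, lam, x):
    return np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


rng = np.random.default_rng(0)
A = rng.standard_normal((40, 5))
b = rng.standard_normal(40)
x_fba = fba_lasso(A, b, lam=0.1)
print(x_fba, lasso_objective(A, b, 0.1, x_fba))
```

At a minimizer the iterate is a fixed point of $T = \operatorname{prox}_{cg}(I - c\nabla f)$, which is exactly the characterization of (18) recalled above.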
A fast iterative shrinkage-thresholding algorithm (FISTA) [
25] is defined by
      
      where 
, 
, 
, 
 and 
 is the inertial step size. Beck and Teboulle [25] applied FISTA to image recovery and proved its convergence rate. The inertial step size of FISTA was first introduced by Nesterov [45].
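The following sketch uses the standard form of FISTA due to Beck and Teboulle, stated here as an assumption rather than copied from this paper: $x_n = Ty_n$ with $T = \operatorname{prox}_{cg}(I - c\nabla f)$, $t_1 = 1$, $t_{n+1} = (1 + \sqrt{1 + 4t_n^2})/2$, $\theta_n = (t_n - 1)/t_{n+1}$, and $y_{n+1} = x_n + \theta_n(x_n - x_{n-1})$. It is applied to a hypothetical LASSO instance with $f(x) = \|Ax - b\|_2^2$ and $g(x) = \lambda\|x\|_1$.

```python
import numpy as np


def soft_threshold(x, tau):
    """prox of tau * ||.||_1 (componentwise soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)


def fista_lasso(A, b, lam, n_iter=2000):
    """FISTA (Beck-Teboulle form) for min ||Ax - b||_2^2 + lam * ||x||_1, step c = 1/L."""
    L = 2.0 * np.linalg.norm(A, 2) ** 2
    c = 1.0 / L
    x_prev = np.zeros(A.shape[1])
    y = x_prev.copy()
    t = 1.0
    for _ in range(n_iter):
        x = soft_threshold(y - c * 2.0 * A.T @ (A @ y - b), c * lam)   # x_n = T y_n
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        theta = (t - 1.0) / t_next                                     # inertial step size
        y = x + theta * (x - x_prev)                                   # inertial extrapolation
        x_prev, t = x, t_next
    return x_prev


def lasso_objective(A, b, lam, x):
    return np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))


rng = np.random.default_rng(0)
A = rng.standard_normal((40, 5))
b = rng.standard_normal(40)
x_fista = fista_lasso(A, b, lam=0.1)
print(x_fista, lasso_objective(A, b, 0.1, x_fista))
```

The only difference from a plain forward-backward loop is the extrapolation line that forms the next $y$; that line is the inertial step.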
A fast iterative shrinkage-thresholding algorithm with S-iteration (FISTA-S) [27] is defined by
      
      where 
, 
, 
, 
  and 
. Bussaban et al. [27] applied FISTA-S to image recovery and proved a weak convergence theorem for it.
A new accelerated proximal gradient algorithm (nAGA) [
26] is defined by
      
      where 
, 
 with 
 and 
 are sequences in 
 and 
. Verma and Shukla [26] introduced nAGA and proved a convergence theorem for it. They applied this method to nonsmooth convex minimization problems with sparsity-inducing regularizers in the multitask learning framework.
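Theorem 2.  Let H be a Hilbert space, $A\colon H \to 2^H$ be a maximal monotone operator and $B\colon H \to H$ be an L-Lipschitz operator. Let $c \in (0, 2/L)$ and $\{c_n\} \subset (0, 2/L)$ be such that $c_n \to c$. Define $T_n := J_{c_n A}(I - c_n B)$ and $T := J_{cA}(I - cB)$. Suppose that $(A + B)^{-1}(0) \neq \emptyset$. Let $\{x_n\}$ be a sequence in H defined by Algorithm 1. Then, $\{x_n\}$ converges weakly to a point in $(A + B)^{-1}(0)$.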
 Proof.  Using Proposition 26.1(iv) (see [35]), we have that $T_n$ and T are nonexpansive mappings such that $F(T) = \bigcap_{n=1}^{\infty} F(T_n) = (A + B)^{-1}(0)$. Then, the proof is completed by Theorem 1 and Lemma 5. □
 The convergence of Algorithm 2 follows from our main result.
      
| Algorithm 2: (FBASA) A forward-backward accelerated S-algorithm | 
| 1:Initial. Take ,  are arbitrary, ,    and 2:Step 1. Compute   and   by using
                   Then,  and go to Step 1.
 | 
Theorem 3.  For , f is a smooth convex function with a gradient having a Lipschitz constant L and g is a convex function. Let  be such that  converges to a and let  and  and let  be a sequence generated by Algorithm 2, where  and  are the same as in Algorithm 1. Then,
(i) , where  and 
(ii)  converges weakly to a point in Argmin .
 Proof.  We know that T and $T_n$ are nonexpansive operators and $F(T) = F(T_n) = \operatorname{Argmin}(f + g)$ for all n; see [35]. Then, $\{T_n\}$ satisfies the NST-condition (I) with T by Lemma 5. We obtain the desired result immediately from Theorem 1 by taking $E(G) = H \times H$, the complete graph, on H.    □
   5. Numerical Experiments
Classification is one of the most important applications of convex minimization. We illustrate how the data classification problem in machine learning is reformulated as a convex minimization problem.
We first present the basic idea of extreme learning machines for data classification problems and then use our algorithm to solve this problem in numerical experiments. Moreover, we compare the performance of Algorithm 2 with that of FISTA-S, FISTA, and nAGA.
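Extreme learning machine (ELM). Let $R := \{(x_k, t_k) : x_k \in \mathbb{R}^n,\ t_k \in \mathbb{R}^m,\ k = 1, 2, \ldots, N\}$ be a training set of N distinct samples, where $x_k$ is the input data and $t_k$ is the target. A standard SLFN with activation function $\mathcal{G}(x)$ (for instance, the sigmoid) and M hidden nodes can be written as
$$\sum_{j=1}^{M} \eta_j\, \mathcal{G}(\langle w_j, x_k \rangle + b_j) = o_k, \quad k = 1, 2, \ldots, N,$$
where $\eta_j$ is the weight vector connecting the j-th hidden node and the output nodes, $w_j$ is the weight vector connecting the input nodes and the j-th hidden node, and $b_j$ is the threshold of the j-th hidden node. The objective of standard SLFNs is to approximate these N distinct samples with zero mean error, $\sum_{k=1}^{N}\|o_k - t_k\| = 0$, that is, there exist $\eta_j$, $w_j$, $b_j$ such that
$$\sum_{j=1}^{M} \eta_j\, \mathcal{G}(\langle w_j, x_k \rangle + b_j) = t_k, \quad k = 1, 2, \ldots, N.$$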
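We can derive a simple equation from the above N equations as follows:
$$\mathbf{H}\beta = \mathbf{T},$$
where
$$\mathbf{H} = \begin{bmatrix} \mathcal{G}(\langle w_1, x_1\rangle + b_1) & \cdots & \mathcal{G}(\langle w_M, x_1\rangle + b_M) \\ \vdots & \ddots & \vdots \\ \mathcal{G}(\langle w_1, x_N\rangle + b_1) & \cdots & \mathcal{G}(\langle w_M, x_N\rangle + b_M) \end{bmatrix}, \quad \beta = [\eta_1^{\top}, \ldots, \eta_M^{\top}]^{\top}, \quad \mathbf{T} = [t_1^{\top}, \ldots, t_N^{\top}]^{\top}.$$
The goal of a standard SLFN is to estimate $\eta_j$, $w_j$, and $b_j$ to solve (18), whereas the goal of ELM is to find only β, with $w_j$ and $b_j$ chosen at random.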
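The ELM construction can be sketched in a few lines. In this hypothetical example, the inputs, labels, and sizes are random toy values, the activation is the sigmoid, and β is obtained from the OLS formula (16) via the pseudo-inverse; in the experiments of this paper, β is instead computed by solving the LASSO problem (17) with Algorithm 2, which is not reproduced here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elm_hidden_matrix(X, W, bias):
    """Entry (k, j) is G(<w_j, x_k> + b_j) with a sigmoid activation G."""
    return sigmoid(X @ W + bias)


rng = np.random.default_rng(0)
N, n, M, m = 30, 4, 10, 3                    # samples, input dimension, hidden nodes, classes
X = rng.standard_normal((N, n))              # toy inputs x_k (one per row)
labels = rng.integers(0, m, size=N)
T = np.eye(m)[labels]                        # one-hot targets t_k, an N x m matrix

W = rng.standard_normal((n, M))              # random input weights w_j (columns), never trained
bias = rng.standard_normal(M)                # random thresholds b_j, never trained
H = elm_hidden_matrix(X, W, bias)            # N x M hidden-layer output matrix

beta = np.linalg.pinv(H) @ T                 # only beta is learned; here via OLS (16)
predicted = np.argmax(H @ beta, axis=1)
print(H.shape, beta.shape, np.mean(predicted == labels))
```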
In an experiment on classification problems, we employ the model (
17) to solve the convex minimization problem. We set 
 and 
. We use the Iris dataset to classify iris plant types and the Heart Disease UCI dataset to identify patients with heart disease; these datasets are described as follows:
Iris dataset [46]. This dataset has three classes of 50 instances each, where each class refers to a type of iris plant. The goal is to identify the species of each iris plant based on the lengths and widths of its sepals and petals.
Heart Disease UCI dataset [47]. The original dataset contains 76 attributes, but all published experiments use only a subset of 14 of them. The dataset provides data on patients with heart disease. We divide the data into two classes based on the predicted attribute.
All control parameters are set to the values shown in 
Table 1, 
, where 
 is a hidden layer output matrix of a training matrix, 
, and 
 is the sigmoid function. The training set used for each dataset is given in Table 2. We evaluated the accuracy of the output data by
      
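A minimal sketch, assuming the usual definition of accuracy as the percentage of correctly classified samples; for two classes this equals $100 \cdot (TP + TN)/(TP + TN + FP + FN)$. The labels below are hypothetical.

```python
import numpy as np


def accuracy_percent(y_true, y_pred):
    """Percentage of samples whose predicted label equals the true label."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)


# Hypothetical binary labels (1 = heart disease, 0 = no heart disease)
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy_percent(y_true, y_pred))   # 80.0
```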
From the results in Table 3, we conclude that, with the same number of hidden nodes M, the proposed learning algorithm achieves high accuracy. The weight computed by Algorithm 2 converges to the optimal weight faster and yields better accuracy than the weights computed by FISTA-S, FISTA, and nAGA.