Article

OPT-RNN-DBSVM: OPTimal Recurrent Neural Network and Density-Based Support Vector Machine

by Karim El Moutaouakil 1,*, Abdellatif El Ouissari 1, Adrian Olaru 2,*, Vasile Palade 3,* and Mihaela Ciorei 3
1 Engineering Science Laboratory, Taza Multidisciplinary Faculty, Sidi Mohamed Ben Abdellah University, Fez 30000, Morocco
2 Department of Robotics and Production System, University Politehnica of Bucharest, 060042 Bucharest, Romania
3 Centre for Computational Science and Mathematical Modelling, Coventry University, Priory Road, Coventry CV1 5FB, UK
* Authors to whom correspondence should be addressed.
Mathematics 2023, 11(16), 3555; https://doi.org/10.3390/math11163555
Submission received: 3 July 2023 / Revised: 31 July 2023 / Accepted: 14 August 2023 / Published: 17 August 2023
(This article belongs to the Section Engineering Mathematics)

Abstract
When implementing SVMs, two major problems are encountered: (a) the number of local minima of dual-SVM increases exponentially with the number of samples and (b) the computer storage memory required for a regular quadratic programming solver increases exponentially as the problem size expands. The Kernel-Adatron family of algorithms, gaining attention recently, has allowed us to handle very large classification and regression problems. However, these methods treat different types of samples (i.e., noise, border, and core) in the same manner, which makes these algorithms search in unpromising areas and increases the number of iterations as well. This paper introduces a hybrid method to overcome such shortcomings, called the Optimal Recurrent Neural Network and Density-Based Support Vector Machine (Opt-RNN-DBSVM). This method consists of four steps: (a) the characterization of different samples, (b) the elimination of samples with a low probability of being a support vector, (c) the construction of an appropriate recurrent neural network to solve the dual-DBSVM based on an original energy function, and (d) finding the solution to the system of differential equations that govern the dynamics of the RNN, using the Euler–Cauchy method involving an optimal time step. Density-based preprocessing reduces the number of local minima in the dual-SVM. The RNN’s recurring architecture avoids the need to explore recently visited areas. With the optimal time step, the search moves from the current vectors to the best neighboring support vectors. It is demonstrated that RNN-SVM converges to feasible support vectors and Opt-RNN-DBSVM has very low time complexity compared to the RNN-SVM with a constant time step and the Kernel-Adatron algorithm–SVM. Several classification performance measures are used to compare Opt-RNN-DBSVM with different classification methods and the results obtained show the good performance of the proposed method.

1. Introduction

Many classification methods have been proposed in the literature and in a vast array of applications, among them a popular approach called support vector machine based on quadratic programming (QP) [1,2,3]. The difficulty in the implementation of SVMs on massive datasets lies in the fact that the quantity of storage memory required for a regular QP solver increases by an exponential magnitude as the problem size expands.
This paper introduces a new type of SVM that implements a preprocessing filter and a recurrent neural network, called the Optimal Recurrent Neural Network and Density-Based Support Vector Machine (Opt-RNN-DBSVM).
SVM approaches are based on the existence of a linear separator, which can be obtained by transforming the data into a higher-dimensional space through appropriate kernel functions. Among all possible hyperplanes, the SVM searches for the one with the most confident separation margin for good generalization. This issue takes the form of a nonlinear constrained optimization problem that is usually handled using optimization methods. Thanks to the Kuhn–Tucker conditions [4], all these methods transform the primal mathematical model into the dual version and use optimization methods to find the support vectors on which the optimal margin is built. Unfortunately, the complexity in time and memory grows exponentially with the size of the datasets; in addition, the number of local minima grows too, which influences the location of the separation margin and the quality of the predictions.
A primary area of research in learning from empirical data through support vector machines (SVMs) for classification and regression is the development of incremental learning schemes when the training dataset is massive [5]. Out of many possible candidates that avoid regular quadratic programming (QP) solvers, the two learning methods gaining attention recently are iterative single-data algorithms (ISDAs) and sequential minimal optimization (SMO) [6,7,8,9]. ISDAs operate on a single sample at a time (pattern-based learning) and work towards the best-fit solution. The Kernel-Adatron (KA) is the primary ISDA for SVMs, using kernel functions to map data to the high-dimensional feature space of SVMs [10] and conducting Adatron [11] processing in that feature space. Platt’s SMO algorithm is an extreme case of the so-called decomposition approaches introduced in [12,13], operating on a working set of two samples at a time. Because the decision for the two-point working set can be determined analytically, SMO does not require standard QP solvers. Being analytically driven, SMO has been especially popular and is the most commonly utilized, analyzed, and further developed approach. Meanwhile, KA, while yielding somewhat comparable performance (accuracy and computational time) in resolving classification issues, has not gained as much traction. The reason for this is twofold. First, until recently [14], KA appeared to be restricted to classification tasks; second, it lacks a robust theoretical framework. KA employs a gradient ascent procedure, and this fact may also have made some researchers wary of the difficulties posed by gradient ascent techniques in the presence of a possibly ill-conditioned kernel matrix. In [15], for models without the bias parameter b, the authors derive and demonstrate the equivalence of two apparently dissimilar ISDAs, namely a KA approach and an unbiased variant of the SMO training scheme [9], when constructing SVMs with positive definite kernels. The equivalence applies to both classification and regression tasks and gives additional insights into these apparently dissimilar learning methods. Despite the richness of the toolbox set up to solve the quadratic programs arising from SVMs, and with the large amount of data generated by social networks, medical and agricultural fields, etc., the amount of computer memory required by a QP solver for the dual-SVM grows hyper-exponentially, and additional methods implementing different techniques and strategies are more than necessary.
Classical algorithms, namely ISDAs and SMO, do not distinguish between different types of samples (noise, border, and core), which causes searches in unpromising areas. In this work, we introduce a hybrid method to overcome these shortcomings, namely the Optimal Recurrent Neural Network Density-Based Support Vector Machine (Opt-RNN-DBSVM). This method proceeds in four steps: (a) the characterization of different samples based on the density of the datasets (noise, core, and border), (b) the elimination of samples with a low probability of being a support vector, namely core samples that are very far from the borders of different components of different classes, (c) the construction of an appropriate recurrent neural network based on an original energy function, ensuring a balance between the dual-SVM components (constraints and objective function) and ensuring the feasibility of the network equilibrium points [16,17], and (d) the solution of the system of differential equations, managing the dynamics of the RNN, using the Euler–Cauchy method involving an optimal time step. Density-based preprocessing reduces the number of local minima in the dual-SVM. The RNN’s recurrent architecture avoids the need to examine previously visited areas; this behavior is similar to a taboo search, which prohibits certain moves for a few iterations [18]. In addition, the optimal time step of the Euler–Cauchy algorithm speeds up the search for an optimal decision margin. On one hand, two main interesting fundamental results are demonstrated: the convergence of the RNN-SVM to feasible solutions, and the fact that Opt-RNN-DBSVM has very low time complexity compared to Const-RNN-SVM, SMO-SVM, ISDA-SVM, and L1QP-SVM. On the other hand, several experimental studies are conducted based on well-known datasets. Based on several performance measures (accuracy, F1-score, precision, recall), Opt-RNN-DBSVM outperforms recurrent neural network–SVM with a constant time step, the Kernel-Adatron algorithm–SVM family, and well-known non-kernel models. In fact, Opt-RNN-DBSVM improves the accuracy, the F1-score, the precision, and the recall. Moreover, the proposed method requires a very small number of support vectors.
The rest of this paper is organized as follows. Section 2 presents the flowchart of the proposed method. Section 3 gives the outline of our recent SVM version called Density-Based Support Vector Machine. Section 4 presents, in detail, the construction of the recurrent neural network associated with the dual-SVM and the Euler–Cauchy algorithm that implements an optimal time step. Section 5 gives some experimental results. Section 6 presents some conclusions and future extensions of Opt-RNN-DBSVM.

2. The Architecture of the Proposed Method

The Kernel-Adatron (KA) algorithms, namely ISDAs and SMO, treat different types of samples (noise, border, and core) in the same manner (all samples are considered for several iterations and supposed to be a support candidate with uniform probability), which causes searches in unpromising areas and increases the number of iterations. In this work, we introduce an efficient method to overcome these shortcomings, namely Optimal Recurrent Neural Network Density-Based Support Vector Machine (Opt-RNN-DBSVM). This method proceeds in four steps (see Figure 1).
(1) The characterization of different samples based on the density of the datasets (noise, core, and border); to this end, two parameters are introduced: the size of the neighborhood of the current sample and the threshold that permits such categorization.
(2) The elimination of samples with a low probability of being a support vector, namely core samples that are very far from the borders of different components of different classes and the noise samples that contain false information about the phenomenon under study. In our previous work [19], we demonstrated that such suppression does not influence the performance of the classifiers.
(3) The construction of an appropriate recurrent neural network based on an original energy function, allowing a balance between the dual-SVM components (constraints and objective function) and ensuring the feasibility of the network equilibrium points [16,17].
(4) Solving the system of differential equations governing the dynamics of the RNN using the Euler–Cauchy method with an optimal time step. In this regard, the equation of the future state of each neuron of the proposed RNN is introduced into the energy function, which leads to a one-dimensional quadratic optimization problem whose solution represents the optimal step of the Euler–Cauchy process, ensuring the maximum decrease in the energy function [20]. The components of the produced equilibrium point represent the membership degrees of different samples to the support vector dataset.
Figure 1. Opt_RNN_DBSVM diagram [21,22,23,24].

3. Density-Based Support Vector Machine

In the following, let us denote by BD the set of N samples x_1, …, x_N, labeled, respectively, by y_1, …, y_N and distributed over K classes C_1, …, C_K. In our case, K = 2 and y_i ∈ {−1, +1}.

3.1. Classical Support Vector Machine

The hyperplane that the SVM searches for must satisfy the equation w·x + b = 0, where w is the weight vector defining this SVM separator, subject to the constraint family y_i(x_i·w + b) ≥ 1 for i = 1, …, N. To ensure the maximum margin, we need to maximize 2/‖w‖. As the patterns are not linearly separable, a kernel function K satisfying the Mercer conditions [25] is introduced to transform the data into an appropriate space.
By introducing the Lagrange relaxation and using the Kuhn–Tucker conditions, we obtain a quadratic optimization problem with a single linear constraint that must be solved to determine the support vectors [26].
To address the problem of saturated constraints, some researchers have added the notion of a soft margin [27]. They employ N supplementary slack variables ξ_i ≥ 0, one for every constraint y_i(x_i·w + b) ≥ 1. The sum of the relaxed variables is weighted and included in the cost function:
\min_{w, b, \xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{subject to: } y_i(\phi(x_i) \cdot w + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \; i = 1, \ldots, N
Here, ϕ represents the transformation function derived from the function kernel K. The following dual problem is obtained:
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to: } \sum_{i=1}^{N} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C, \; i = 1, \ldots, N
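For readers who prefer code, the dual problem above can be expressed compactly. The following is a minimal NumPy sketch (illustrative names, not part of the original formulation) of the dual objective and its feasibility conditions, assuming a precomputed kernel matrix K:

```python
import numpy as np

def dual_objective(alpha, y, K):
    """Dual-SVM objective: sum_i alpha_i - 0.5 * sum_ij alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    Q = np.outer(y, y) * K                       # Q_ij = y_i y_j K(x_i, x_j)
    return alpha.sum() - 0.5 * alpha @ Q @ alpha

def is_feasible(alpha, y, C, tol=1e-8):
    """Dual constraints: sum_i alpha_i y_i = 0 and 0 <= alpha_i <= C."""
    return abs(alpha @ y) <= tol and np.all(alpha >= -tol) and np.all(alpha <= C + tol)
```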
Several methods can be used to solve this optimization problem: gradient methods, linearization methods, the Frank–Wolfe method, the column generation method, the Newton method applied to the Kuhn–Tucker system, sub-gradient methods, the Dantzig algorithm, the Uzawa algorithm [4], recurrent neural networks [28], hill climbing, simulated annealing, search by calf, A*, genetic algorithms [29], ant colony optimization, the particle swarm optimization method [30], etc.
Several versions of SVMs are proposed in the literature, e.g., the least squares support vector machine classifiers (LS-SVM) introduced in [21], generalized support vector machine (G-SVM) [22], fuzzy support vector machine [31,32], one-class support vector machine (OC-SVM) [26,33], total support vector machine (T-SVM) [34], weighted support vector machine (W-SVM) [35], granular support vector machine (G-SVM) [36], smooth support vector machine (S-SVM) [37], proximity support vector machine classifiers (P-SVM) [23], multisurface proximal support vector machine classification via generalized eigenvalues (GEP-SVM) [24], and twin support vector machine (T-SVM) [38], etc.

3.2. Density-Based Support Vector Machine (DBSVM)

In this section, a short description of the DBSVM method is given. Let us introduce a real number r > 0 and an integer mp > 0, called min-points; three types of samples are then defined: noise points, border points, and interior points (or core points). It is possible to show that the interior points do not change their nature even when they are projected into another space by the kernel functions. Furthermore, such points cannot be selected as support vectors [19].
Definition 1.
Let S ⊆ ℝ^n. A point a ∈ ℝ^n is said to be an interior point (or core point) of S if there exists an r > 0 such that B(a, r) ⊆ S. The set of all interior points of S is denoted by int(S) or S°.
Definition 2.
For a given dataset BD, a non-negative real r, and an integer mp, there exist three types of samples.
1. A sample x is called a C_i-noise point (NP_i) if |C_i ∩ B(x, r)| < mp.
2. A sample x is called a C_i-core point (CP_i) if |C_i ∩ B(x, r)| ≥ mp and x ∈ envol(C_i)°.
3. A sample x is called a C_i-border point (BP_i) if |C_i ∩ B(x, r)| < mp and there exists a C_i-core point y such that x ∈ B(y, r).
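As an illustration, the following is a minimal NumPy sketch (hypothetical helper name) of this categorization for the samples of one class C_i, using Euclidean balls B(x, r); the counting convention (each point counts as its own neighbor) is an assumption:

```python
import numpy as np

def categorize_samples(X, r, mp):
    """Label each sample of one class C_i as 'core', 'border', or 'noise'."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    counts = (d <= r).sum(axis=1)                               # |C_i ∩ B(x, r)|, x included
    core = counts >= mp
    labels = np.full(len(X), "noise", dtype=object)
    labels[core] = "core"
    near_core = (d[:, core] <= r).any(axis=1)                   # within r of at least one core point
    labels[~core & near_core] = "border"                        # not core, but close to a core point
    return labels
```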
Let K be a kernel function allowing us to move from the space ℝ^n to the space ℝ^N using the transformation φ (here, n < N).
Lemma 1
([19]). If a is a C i -core point for a given ϵ and min-points (mp), then ϕ ( a ) is also a C i -core point with an appropriate ϵ and the same min-points (mp).
Theorem 1
([19]). A support vector is either a noise point or a border point.
Proposition 1
([19]). Let ϵ > 0 be a fixed real number. The core point set corePoints(minPoints) is a decreasing function of minPoints with respect to the inclusion order.
Let {α_1, …, α_N} = BM ∪ CM ∪ NM be the set of the Lagrange multipliers, where BM, CM, and NM are the Lagrange multipliers of the border samples, core samples, and noise samples, respectively.
As the elements of N M and C M cannot be selected to be support vectors, the reduced dual problem is given by
(RD) \quad \max_{\alpha} \; \sum_{\alpha_i \in BM} \alpha_i - \frac{1}{2} \sum_{\alpha_i \in BM} \sum_{\alpha_j \in BM} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to: } \sum_{\alpha_i \in BM} \alpha_i y_i = 0, \;\; 0 \le \alpha_i \le C, \; \alpha_i \in BM
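Concretely, the reduced problem (RD) only needs the labels and the kernel matrix restricted to the border samples. A minimal NumPy sketch (hypothetical names, reusing the categorization sketch above; the RBF kernel is only one possible choice):

```python
import numpy as np

def reduce_dataset(X, y, labels):
    """Keep only the border samples; core and noise multipliers (CM, NM) are discarded."""
    keep = labels == "border"
    return X[keep], y[keep]

def rbf_kernel(X, gamma=0.5):
    """Kernel matrix K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2) on the reduced set."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)
```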
In this work, as the RD problem is quadratic with linear constraints, in order to solve this, we use a continuous Hopfield network by proposing an original energy function in the following section [39].

4. Recurrent Neural Network to Find Optimal Support Vectors

The continuous Hopfield network consists of interconnected neurons with a smooth sigmoid activation function (usually a hyperbolic tangent). The differential equation that governs the dynamics of the CHN is
\frac{du}{dt} = -\frac{u}{\tau} + W\alpha + I
where u, α, W, and I are, respectively, the vectors of neuron states, the outputs, the weight matrix, and the biases. For a CHN of N neurons, the state u_i and output α_i of neuron i are related by α_i = tanh(u_i) = f(u_i).
For an initial vector state u^0 ∈ ℝ^N, a vector u_e ∈ ℝ^N is called an equilibrium point of system (1) if and only if there exists t_e ∈ ℝ_+ such that u(t) = u_e for all t ≥ t_e. It should be noted that if the energy function (or Lyapunov function) exists, the equilibrium point exists as well. Hopfield proved that the symmetry of the weight matrix is a sufficient condition for the existence of the Lyapunov function [40].

4.1. Continuous Hopfield Network Based on Original Energy Function

To solve the obtained dual problem via a recurrent neural network [39,41,42], we propose the following energy function:
E(\alpha_1, \ldots, \alpha_N) = -\beta_0 \sum_{\alpha_i \in D} \alpha_i - \frac{\beta_0}{2} \sum_{\alpha_i \in BM} \sum_{\alpha_j \in BM} \alpha_i \alpha_j y_i y_j K(x_i, x_j) + \beta_1 \sum_{\alpha_i \in BD} \alpha_i y_i + \frac{\beta_2}{2} \Big( \sum_{\alpha_i \in BD} \alpha_i y_i \Big)^2
To determine the vector of the neurons’ biases, we calculate the partial derivatives of E:
\frac{\partial E}{\partial \alpha_i}(\alpha_1, \ldots, \alpha_N) = -\beta_0 - \beta_0 \sum_{\alpha_j \in BM} \alpha_j y_i y_j K(x_i, x_j) + \beta_1 y_i + \beta_2 y_i \sum_{\alpha_j \in BD} \alpha_j y_j
The components of the bias vector are given by
I_i = -\frac{\partial E}{\partial \alpha_i}(0) = \beta_0 - \beta_1 y_i, \quad i = 1, \ldots, N
To determine the connection weights W between each neuron pair, the second partial derivative of E is calculated: \frac{\partial^2 E}{\partial \alpha_j \partial \alpha_i}(\alpha_1, \ldots, \alpha_N) = -\beta_0 y_i y_j K(x_i, x_j) + \beta_2 y_i y_j.
The components of the weight matrix W are given by W_{i,j} = -\frac{\partial^2 E}{\partial \alpha_j \partial \alpha_i}(0) = \beta_0 y_i y_j K(x_i, x_j) - \beta_2 y_i y_j.
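A minimal NumPy sketch (illustrative names) of how the biases and connection weights could be assembled from a precomputed kernel matrix; the sign conventions follow the derivative expressions above and are an assumption where the source formulas are ambiguous:

```python
import numpy as np

def build_rnn_svm(K, y, beta0, beta1, beta2):
    """Bias vector I and weight matrix W of the CHN attached to the reduced dual-SVM.

    Assumed conventions: I_i = beta0 - beta1 * y_i and W_ij = (beta0 * K_ij - beta2) * y_i * y_j.
    """
    I = beta0 - beta1 * y
    W = (beta0 * K - beta2) * np.outer(y, y)
    return I, W
```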
To calculate the equilibrium point of the proposed recurrent neural network, we use the Euler–Cauchy iterative method:
(1) Initialization: α_1^0, …, α_N^0 and the step ρ^0 are randomly chosen.
(2) Given α_1^t, …, α_N^t and the step ρ^t, the step ρ^{t+1} is chosen so that the decrease in the energy E is maximal, and u_1, …, u_N are updated using u_i^{t+1} = u_i^t + ρ^{t+1} (Σ_{j=1,…,N} W_{i,j} α_j^t + I_i) for i = 1, …, N. Then, γ_1, …, γ_N are calculated using the activation function f: γ_i = f(u_i^{t+1}). Finally, α_1^{t+1}, …, α_N^{t+1} are given by α^{t+1} = P(γ), where P is the projection operator onto the set {α ∈ ℝ^N : Σ_{i=1}^{N} α_i y_i = 0}.
(3) Return to (2) until ‖α^{t+1} − α^t‖ ≤ ϵ, where ϵ > 0.
Figure 2 shows the connection weights W between each pair of neurons.
Theorem 2.
If i = j, then P_{i,j} = 1 − y_i²/N; otherwise, P_{i,j} = −y_i y_j/N, where N = Σ_{i=1,…,N} y_i².
Proof of Theorem 2.
We have P = I − A^t (A A^t)^{-1} A and A = [y_1, …, y_N].
Then, (A A^t)^{-1} = ([y_1, …, y_N] [y_1, …, y_N]^t)^{-1} = 1/N because N = Σ_{i=1,…,N} y_i².
Thus, A^t (A A^t)^{-1} A = (1/N) [y_1, …, y_N]^t [y_1, …, y_N].
Finally, for i = j, P_{i,j} = 1 − y_i²/N, and for i ≠ j, P_{i,j} = −y_i y_j/N. □
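Combining Theorem 2 with the update rule above, one projected Euler–Cauchy iteration could be sketched as follows (NumPy, illustrative names; rho is the constant-step variant, the optimal choice of the step being derived in Section 4.2):

```python
import numpy as np

def projection_matrix(y):
    """P = I - y y^t / N with N = sum_i y_i^2 (Theorem 2)."""
    N = np.sum(y ** 2)
    return np.eye(len(y)) - np.outer(y, y) / N

def euler_cauchy_step(u, alpha, W, I, y, rho, C, tau=1e3):
    """One Euler-Cauchy update: state step, bounded activation, then projection."""
    u_next = u + rho * (W @ alpha + I)           # u_i^{t+1} = u_i^t + rho * (sum_j W_ij alpha_j^t + I_i)
    gamma = C * np.tanh(u_next / tau)            # activation f(x) = C tanh(x / tau)
    alpha_next = projection_matrix(y) @ gamma    # enforce sum_i alpha_i y_i = 0
    return u_next, alpha_next
```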
Concerning the satisfaction of the constraint family 0 ≤ α_i ≤ C, i = 1, …, N, the following activation function is used:
f(x) = C \cdot \tanh\!\left(\frac{x}{\tau}\right) = C \cdot \frac{e^{x/\tau} - e^{-x/\tau}}{e^{x/\tau} + e^{-x/\tau}},
where τ is supposed to be a very large positive real number, which ensures that −C ≤ f(x) ≤ C for all x.
Let us consider a kernel function K such that K ( x , x ) = C 0 .
Theorem 3.
A continuous Hopfield network has an equilibrium point if W i , i = 0 and W i , j = W j , i .
Theorem 4.
If C_0 = β_2/β_0, then CHN-SVM has an equilibrium point.
Proof of Theorem 4.
We have W_{i,j} = (β_0 K_{i,j} − β_2) y_i y_j, and W_{i,j} = W_{j,i} for all i and j because K is symmetric.
On the other hand,
W_{i,i} = β_0 y_i² K(x_i, x_i) − β_2 y_i² = β_0 × (β_2/β_0) − β_2 = 0.
Then, CHN-SVM has an equilibrium point. □

4.2. Continuous Hopfield Network with Optimal Time Step

In this section, we choose, mathematically, the optimal time-step size at each iteration of the Euler–Cauchy method used to solve the dynamical equation of the recurrent neural network proposed in this paper. At the end of the k-th iteration, we know α^k; let s_k be the next step size, which permits us to calculate α^{k+1} using the formula
\alpha^{k+1} = \alpha^{k} + s_k \frac{d\alpha^{k}}{dt},
and s_k must be chosen such that E(α^{k+1}) ≤ E(α^k), with the maximum possible decrease.
As the activation function of the proposed neural network is the tanh, we have \frac{d\alpha_i}{dt} = \frac{2}{\tau} \alpha_i (1 - \alpha_i) \frac{\partial E}{\partial \alpha_i}(\alpha).
The matrix form of the energy function is
E(\alpha) = -\beta_0 U^t \alpha - \frac{\beta_0}{2} \alpha^t T \alpha + \beta_1 y^t \alpha + \frac{\beta_2}{2} (y^t \alpha)^2
where U = (1, \ldots, 1)^t \in \mathbb{R}^N, \alpha = (\alpha_1, \ldots, \alpha_N)^t \in \mathbb{R}^N, and T_{i,j} = y_i y_j K(x_i, x_j) for all i and j.
At the k-th iteration, the state α^k is known, and α^{k+1} is calculated by
\alpha^{k+1} = \alpha^{k} + s_k \frac{d\alpha^{k}}{dt}
where s_k is the current time step, which must be optimal. To this end, α^{k+1} is replaced by \alpha^{k} + s_k \frac{d\alpha^{k}}{dt} in E(\alpha^{k+1}):
e(s_k) = E(\alpha^{k+1}) = 0.5\, A_k s_k^2 + B_k s_k + C_k
where A_k = \beta_2 \big( y^t \tfrac{d\alpha^k}{dt} \big)^2 - \beta_0 \big( \tfrac{d\alpha^k}{dt} \big)^t T \tfrac{d\alpha^k}{dt},
B_k = -\beta_0 U^t \tfrac{d\alpha^k}{dt} - \beta_0 (\alpha^k)^t T \tfrac{d\alpha^k}{dt} + \beta_1 y^t \tfrac{d\alpha^k}{dt} + \beta_2 (y^t \alpha^k) \big( y^t \tfrac{d\alpha^k}{dt} \big),
C_k = -\beta_0 U^t \alpha^k - 0.5\, \beta_0 (\alpha^k)^t T \alpha^k + \beta_1 y^t \alpha^k + 0.5\, \beta_2 (y^t \alpha^k)^2.
Thus, the best time step is the minimizer of e(s_k). Figure 3a–c gives the different cases.
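A minimal NumPy sketch (illustrative names) of this step selection, using the coefficients A_k and B_k above; the constant term C_k plays no role in the argmin, and the minimizer of the parabola is restricted to [0, 1] as in Algorithm 1:

```python
import numpy as np

def optimal_step(alpha, D, y, U, T, beta0, beta1, beta2):
    """Minimize e(s) = 0.5*A*s^2 + B*s + C_k over s in [0, 1]."""
    A = beta2 * (y @ D) ** 2 - beta0 * D @ T @ D
    B = (-beta0 * U @ D - beta0 * alpha @ T @ D
         + beta1 * y @ D + beta2 * (y @ alpha) * (y @ D))
    if A > 0:                                    # strictly convex parabola
        s = -B / A                               # unconstrained minimizer, clipped below
    else:                                        # concave or linear: the minimum sits on an endpoint
        s = 0.0 if 0.5 * A + B > 0 else 1.0      # compare e(1) - e(0) = 0.5*A + B
    return float(np.clip(s, 0.0, 1.0))
```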

4.3. Opt-RNN-DBSVM Algorithm

In this section, the procedures described in Section 3.2, Section 4.1 and Section 4.2 are summarized into Algorithm 1.
The inputs of Algorithm 1 are the radius r (the size of the neighborhood of the current sample), the minimum number of samples mp in B(Current_sample, r) (which determines the type of this sample), the three Lagrangian parameters β_0, β_1, and β_2 (which allow a compromise between the dual components), the bound C of the SVM [19], and the number of iterations (which serves as an artificial convergence criterion).
Algorithm 1 proceeds in three macro-steps: data preprocessing, RNN-SVM construction, and RNN-SVM equilibrium point estimation. The input of the first phase is the initial dataset with labeled samples. Based on r and mp, the algorithm determines the type of each sample from the discrete size of its neighborhood. The output of this phase is a reduced sub-dataset (the initial dataset minus the core samples). The inputs of the second phase are the reduced dataset, the Lagrangian parameters β_0, β_1, β_2, and the SVM bound C. Based on the energy function built in Section 4.1 and on its first and second derivatives, the architecture of CHN-SVM is constructed; the biases and connection weights, which represent the output of this phase, are calculated. These latter represent the input of the third phase, in which the Euler–Cauchy algorithm is used to calculate the degree of membership of the different samples in the set of support vectors; to ensure an optimal decrease in the energy function, at each iteration, an optimal step is determined by solving a one-dimensional quadratic optimization problem; see Section 4.2. At convergence, the proposed algorithm produces the support vectors based on which Opt-RNN-DBSVM can predict the class of unseen samples.
Algorithm 1 Opt-RNN-DBSVM
  • Require: mp, r, β_0, β_1, β_2, C, ITER
  • Ensure: Optimal support vectors
  • % Density-based preprocessing:
  • CP ← ∅; BP ← ∅; NP ← ∅;
  • for all s in DS do
  •   if |B(s, r) ∩ DS| > mp then
  •     CP ← CP ∪ {s}
  •   else if mp > |B(s, r) ∩ DS| > mp/2 then
  •     BP ← BP ∪ {s}
  •   else
  •     NP ← NP ∪ {s}
  •   end if
  • end for
  • RDS ← DS \ {NP ∪ CP}
  • % RNN building:
  • I ← β_0 + β_1 Y
  • W ← β_0 K − β_2 Y²
  • % Optimal Euler–Cauchy iterations to RNN stability:
  • step_0 ← rand(0, 1)
  • α^0 ← rand(0, 1, |RDS|)
  • for k = 1, …, ITER do
  •   D_k ← dα^k/dt
  •   A_k ← β_2 (y^t D_k)² − β_0 D_k^t T D_k
  •   B_k ← −β_0 U^t D_k − β_0 (α^k)^t T D_k + β_1 y^t D_k + β_2 (y^t α^k)(y^t D_k)
  •   C_k ← −β_0 U^t α^k − 0.5 β_0 (α^k)^t T α^k + β_1 y^t α^k + 0.5 β_2 (y^t α^k)²
  •   s_k* ← argmin_{s ∈ [0,1]} e(s)
  •   α^{k+1} ← α^k + s_k* D_k
  • end for
Proposition 2.
If N, r, and I T E R represent, respectively, the size of a labeled dataset B D , the number of remaining samples (output of the preprocessing phase), and the number of iterations, then the complexity of Algorithm 1 is O ( r 2 I T E R ) .
Proof. 
First, in the preprocessing phase, we calculate, for each pair of samples (x_i, x_j), the distance d(x_i, x_j) and execute N comparisons to determine the type of each sample; thus, the complexity of this phase is O(N²).
Second, during the ITER iterations, the activation of each neuron is updated using the activations of all the other neurons to solve the reduced dual-SVM; thus, this phase has a complexity of O(r² × ITER).
Finally, the complexity of Algorithm 1 is O(N²) + O(r² × ITER). Let us denote by Const-RNN-SVM the SVM version that implements a recurrent neural network based on a constant time step. Following the same reasoning, the complexity of Const-RNN-SVM is O(N² × ITER).
Notes: As the Kernel-Adatron algorithm (KA) is the kernel version of SMO and ISDA, and KA implements two nested N-loops in each iteration, the complexity of SMO and ISDA is O(N² × ITER) [10]. In addition, it is considered that L1QP-SVM implements the numerical linear algebra Gauss–Seidel method [43], which also implements two nested N-loops in each iteration; thus, the complexity of L1QP-SVM is O(N² × ITER). □
For a very large and high-density labeled dataset, we have r << N; thus,
ITER_our << ITER_CHN-SVM, ITER_our << ITER_L1QP-SVM, ITER_our << ITER_SMO-SVM, and ITER_our << ITER_ISDA-SVM.
Thus, complexity(our) << complexity(CHN-SVM), complexity(our) << complexity(L1QP-SVM), complexity(our) << complexity(SMO-SVM), and complexity(our) << complexity(ISDA-SVM).
Firstly, preprocessing the database reduces the number of local minima in the dual-SVM. Secondly, this reduction enables real-time decision making in big data problems. Finally, the optimal time step of the Euler–Cauchy algorithm speeds up the search for an optimal decision margin.

5. Experimentation

In this section, Opt-RNN-DBSVM is compared to several classifiers, Const-RNN-SVM (RNN-SVM using a constant Euler–Cauchy time step), SMO-SVM, ISDA-SVM, L1QP-SVM, and some non-kernel classifiers (Naive Bayes (NB), MLP, KNN, AdaBoostM1 (ABM1), Nearest Center Classifier (NCC), Decision Tree (DT), SGD Classifier (SGDC)). The classifiers were tested on several datasets: IRIS, ABALONE, WINE, ECOLI, BALANCE, LIVER, SPECT, SEED, and PIMA (collected from the University of California at Irvine (UCI) repository [44]). The performance measures used in this study are the accuracy, F1-score, precision, and recall.

5.1. Opt-RNN-DBSVM vs. Const-CHN-SVM

In this subsection, Opt-RNN-DBSVM is compared to Const-RNN-SVM by considering different values of the Euler–Cauchy time step s ∈ STEP = {0.1, 0.2, …, 0.9}. Table 1 and Table 2 show the different values of accuracy, F1-score, precision, and recall on the considered datasets. These results show the superiority of Opt-RNN-DBSVM over Const-CHN-SVM for all s ∈ STEP. In fact, this superiority is quantified as follows:
3.43% = max_{(s,d) ∈ STEP×DATA} ( accuracy(Opt-RNN-DBSVM(d)) − accuracy(Const-RNN-SVM(s, d)) )
2.31% = max_{(s,d) ∈ STEP×DATA} ( F1-score(Opt-RNN-DBSVM(d)) − F1-score(Const-RNN-SVM(s, d)) )
7.52% = max_{(s,d) ∈ STEP×DATA} ( precision(Opt-RNN-DBSVM(d)) − precision(Const-RNN-SVM(s, d)) )
6.5% = max_{(s,d) ∈ STEP×DATA} ( recall(Opt-RNN-DBSVM(d)) − recall(Const-RNN-SVM(s, d)) )
where D A T A is the set of different considered data. These results are not unexpected, because Opt-RNN-SVM ensures an optimal decrease in the CHN energy function at each step. This superiority is normal, since a single time step of the Euler–Cauchy algorithm does not explore all the regions of the solution space of the dual problem associated with the SVM, and it also causes premature convergence to a poor local solution. On the other hand, the variable optimal time step of this algorithm allowed a higher-order decay in the energy function of the RNN associated with the dual-SVM.
Figure 4 and Figure A1, Figure A2 and Figure A3 give the series of optimal steps generated by Opt-RNN-DBSVM during iterations for different datasets. It is noted that all the optimal steps are taken from the interval [ 0.3 ; 0.4 ] , which explains why a single constant time step of the Euler–Cauchy algorithm can never produce satisfactory support vectors compared to Opt-RNN-DBSVM. However, this simulation provides an optimal domain for those using a CHN based on a constant time step instead of taking a random time step from [ 0 ; 1 ] .

5.2. Opt-RNN-DBSVM vs. Classical Optimizer–SVM

In this section, we give the performance of different Classical Optimizer–SVM models (L1QP-SVM, ISDA-SVM, and SMO-SVM) applied to several datasets and compare the number of support vectors obtained by the different Classical Optimizer–SVM models and Opt-RNN-SVM. Table 3 gives the values of accuracy, F1-score, precision, and recall for Classical Optimizer–SVM on different datasets. The results show the superiority of Opt-RNN-DBSVM. Indeed, when considering each of the performance measures, the proposed method achieves remarkable improvements of 30% for accuracy and F1-score, 50% for precision, and 40% for recall.
Figure 5, Figure 6, Figure 7 and Figure 8 illustrate, respectively, the support vectors obtained using ISDA-SVM, L1QP-SVM, SMO-SVM, and Opt-RNN-SVM applied to the IRIS data. We note that (a) ISDA considers more than 96% of the samples as support vectors, which is an exaggeration; (b) L1QP and SMO use a reasonable number of samples as support vectors, but most of them are duplicated; and (c) thanks to the preprocessing, Opt-RNN can reduce the number of support vectors by more than 32% compared to SMO and L1QP, which allows it to overcome the over-learning phenomenon encountered with SMO and L1QP. In this sense, the reasonable number of support vectors used by Opt-RNN-DBSVM will speed up the online predictions of systems that implement these support vectors, especially with regard to sentiment analysis, which manipulates very long texts.
To analyze these results further and to evaluate the performance of multiple kernel classifiers, regardless of the data type, we perform the Friedman test to verify the statistical significance of the proposed method compared to other methods with respect to the derived mean rankings [11]. The null hypothesis is given by H 0 k e r , i.e., “The kernel classifiers Opt-RNN-DB, SMO, ISDA, and L1QP perform similarly in mean rankings without a significant difference”. The considered degree of freedom of the Friedman test is 3 (number of kernel classifiers −1). The significance level is 0.05 and the considered confidence interval is 95%. Three performance measures are considered (accuracy, F1-score, precision).
Considering the accuracy measure, the average rank of the four kernel methods is given in brackets: Opt-RNN-DB(4), SMO(2.06), and L1QP(2.06), and ISDA(1.89). Opt-RNN-DB has the highest ranking, followed by SMO and L1QP. Considering the F1-score, the average rank of the four kernel methods is given in brackets: Opt-RNN-DB(4), ISDA(2.11), SMO(1.94), and L1QP(1.94). Opt-RNN-DB has the highest ranking, followed by ISDA. Considering the precision measure, the average rank of the four kernel methods is given in brackets: Opt-RNN-DB(4), ISDA(2.11), SMO(1.94), and L1QP(1.94). Opt-RNN-DB has the highest ranking, followed by ISDA.
Table 4 gives the results of the Friedman test on the kernel SVM classifiers (SMO, L1QP, ISDA) considering different performance measures. The null hypothesis H_0^ker is rejected for all these kernel classifiers at a significance level of α = 0.05, indicating that the proposed hybrid classifier outperforms all other kernel classifiers. In this regard, the performance of ISDA-SVM is the closest to that of the Opt-RNN-DBSVM classifier.
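As an illustration of this protocol, the Friedman test can be run directly on the per-dataset accuracy columns with SciPy; the numbers below are placeholders, to be replaced by the measured values from Table 3:

```python
from scipy.stats import friedmanchisquare

# One list per kernel classifier: its accuracy on each benchmark dataset (placeholder values).
acc_opt_rnn_dbsvm = [97.9, 98.4, 96.5, 91.8, 91.3, 88.1, 97.4, 96.9, 79.9]
acc_smo_svm       = [96.7, 81.0, 79.5, 88.1, 79.7, 80.4, 97.4, 85.7, 79.2]
acc_isda_svm      = [96.6, 80.9, 79.4, 88.0, 79.6, 80.3, 97.3, 85.6, 79.1]
acc_l1qp_svm      = [96.7, 81.0, 79.5, 88.0, 79.7, 80.4, 97.3, 85.7, 79.2]

stat, p_value = friedmanchisquare(acc_opt_rnn_dbsvm, acc_smo_svm, acc_isda_svm, acc_l1qp_svm)
print(f"Friedman chi-square = {stat:.3f}, p-value = {p_value:.4f}")
# H0_ker is rejected at the 0.05 level whenever p_value < 0.05.
```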

5.3. Opt-RNN-DBSVM vs. Non-Kernel Classifiers

In this section, we compare Opt-RNN-DBSVM to several non-kernel classifiers, namely Naive Bayes [45], MLP [46], KNN [47], AdaBoostM1 [48], Decision Tree [49], SGD Classifier [50], Nearest Centroid Classifier [50], and Classical SVM [51].
Table A1, Table A2, Table A3, Table A4, Table A5 give the values of the measures accuracy, F1-score, precision, and recall for the considered datasets. Considering each of these performance measures, Opt-RNN-DBSVM permits remarkable improvements and clearly outperforms the other methods. The large number of classifiers and datasets makes it difficult to demonstrate this superiority without employing statistical methods.
To analyze these results further and to evaluate the performance of non-kernel classifiers (Naive Bayes (NB), MLP, KNN, AdaBoostM1 (ABM1), Nearest Centroid Classifier (NCC), Decision Tree (DT), SGD Classifier (SGDC)) compared to Opt-RNN DBSVM, regardless of the data type, we perform the Friedman test to verify the statistical significance of the proposed method compared to other methods with respect to the derived mean rankings. Three performance measures are considered (accuracy, F1-score, precision).
The null hypothesis is given by H 0 n k e r , i.e., “The classifiers NB, MLP, KNN, ABM1, NCC, DT, SGDC, Opt-RNN-DBSVM perform similarly in mean rankings without a significant difference”.
The considered degree of freedom of the Friedman test is 7 (number of non-kernel classifiers −1). The significance level is 0.05 and the considered confidence interval is 95%.
Considering the accuracy measure, the average ranks of the eight classifiers are given in brackets: Opt-RNN-DBSVM(7.80), ABM1(5.2), KNN(5.2), NB(4.4), NCC(3.95), SGDC(3.55), DT(3.5), and MLP(2.4). Opt-RNN-DB has the highest ranking, followed by ABM1(5.2) and KNN(5.2). Considering the F1-score, the average ranks of the eight classifiers are given in brackets: Opt-RNN-DBSVM(7.1), ABM1(5.2), KNN(5.05), NB(4.8), SGDC(4.6), NCC(4.05), DT(3.1), and MLP(2.1). Opt-RNN-DB has the highest ranking, followed by ABM1 and KNN. Considering the precision measure, the average ranks of the eight classifiers are given in brackets: Opt-RNN-DBSVM(7.1), NCC(5.3), ABM1(5.15), KNN(4.9), NB(4.25), SGDC(3.95), DT(3.25), and MLP(2.1). Opt-RNN-DB has the highest ranking, followed by NCC and ABM1. Table 5 gives the results of the Friedman test on the non-kernel classifiers (NB, MLP, KNN, ABM1, NCC, DT, and SGDC) considering three performance measures (accuracy, F1-score, and precision). The null hypothesis H_0^nker is rejected for all these classifiers at a significance level of α = 0.05, indicating that the proposed hybrid classifier outperforms all other non-kernel classifiers. In this regard, the performance of ABM1 is the closest to that of the Opt-RNN-DBSVM classifier.
Additional comparison studies were performed on the PIMA and Germany Diabetes datasets, and the ROC curves were used to calculate the AUC for the best performance obtained from each non-kernel classifier. Figure A4 and Figure A6 show the comparison of the ROC curves of the classifiers DT, KNN, MLP, NB, etc., and the Opt-RNN-DBSVM method, evaluated on the PIMA and Germany Diabetes datasets, respectively. We point out that Opt-RNN-DBSVM quickly converges to the best results and obtains more true positives and a smaller number of false positives compared to several other classification methods.
More comparisons are given in Appendix B; Figure A5 and Figure A7 show the comparison of the ROC curves of the classical SVM and Opt-RNN-DBSVM methods, evaluated on the PIMA and Germany Diabetes datasets, respectively. More specifically, considering the performance measures “false positive rate” and “true positive rate”, predictions based on the support vectors produced by Opt-RNN-DBSVM dominate predictions based on the support vectors produced by the other non-kernel classifiers.

6. Conclusions

The main challenges of SVM implementation are the number of local minima and the amount of computer memory required to solve the dual-SVM, which increase exponentially with respect to the size of the dataset. The Kernel-Adatron family of algorithms, ISDA and SMO, has handled very large classification and regression problems. However, these methods treat noise, border, and core samples in the same way, resulting in a blind search in unpromising areas. In this paper, we have introduced a hybrid approach to deal with these drawbacks, namely the Optimal Recurrent Neural Network and Density-Based Support Vector Machine (Opt-RNN-DBSVM), which performs four phases: the characterization of different samples, the elimination of samples having a weak probability of being support vectors, building an appropriate recurrent neural network based on an original energy function, and solving the differential equation system governing the RNN dynamics using the Euler–Cauchy method implementing an optimal time step. Data preprocessing reduces the number of local minima in the dual-SVM; this reduction enables real-time decision making in big data problems. The RNN’s recurring architecture avoids the need to explore recently visited areas; this is an implicit tabu search. With the optimal time step, the search moves from the current vectors to the best neighboring support vectors. On one hand, two main, interesting fundamental results were demonstrated: the convergence of RNN-SVM to feasible solutions and the fact that Opt-RNN-DBSVM has very low time complexity compared to Const-RNN-SVM, SMO-SVM, ISDA-SVM, and L1QP-SVM. On the other hand, several experimental studies were conducted based on well-known datasets (IRIS, ABALONE, WINE, ECOLI, BALANCE, LIVER, SPECT, SEED, PIMA). Based on popular performance measures (accuracy, F1-score, precision, recall), Opt-RNN-DBSVM outperformed Const-RNN-SVM, KA-SVM, and some non-kernel models (cited in Table A1). In fact, Opt-RNN-DBSVM improved the accuracy by up to 3.43%, the F1-score by up to 2.31%, the precision by up to 7.52%, and the recall by up to 6.5%. In addition, compared to SMO-SVM, ISDA-SVM, and L1QP-SVM, Opt-RNN-DBSVM provides a reduction in the number of support vectors by up to 32%, which permits us to save memory for large applications that implement several machine learning models. The main problem encountered in the implementation of Opt-RNN-DBSVM is the determination of the Lagrange parameters involved in the SVM energy function. In this sense, a genetic strategy will be introduced to determine these parameters for each dataset. In future work, extensions of this method may include combining Opt-RNN-DBSVM with big data technologies to accelerate classification tasks on big data and introducing hybrid versions based on Opt-RNN, deep learning, and fuzzy-SVM.

Author Contributions

Conceptualization, K.E.M.; Validation, A.O. and V.P.; Investigation, M.C.; Data curation, M.C.; Writing—original draft, A.E.O.; Visualization, M.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of National Education, Professional Training, Higher Education and Scientific Research and the Digital Development Agency (DDA) and CNRST of Morocco (No. Alkhawarizmi/2020/23).

Data Availability Statement

Data will only be shared upon request.

Conflicts of Interest

The authors declare that they have no conflict of interest.

Appendix A. Optimal Steps

Figure A1. (a) ABALONE dataset, (b) PIMA dataset, (c) WINE dataset.
Figure A2. (a) SEED dataset, (b) Germany Diabetes dataset, (c) BALANCE dataset.
Figure A3. (a) SPECT dataset, (b) ECOLI dataset, (c) LIVER dataset.

Appendix B. ROC Curves

Figure A4. ROC curves for the different classification methods applied to the PIMA Diabetes dataset.
Figure A5. Opt-RNN-DBSVM vs. SVM ROC curves applied to the PIMA Diabetes dataset.
Figure A6. ROC curves for the different classification methods applied to the Germany Diabetes dataset.
Figure A7. Opt-RNN-DBSVM vs. SVM ROC curves applied to the Germany Diabetes dataset.

Appendix C. Opt-RNN-DBSVM and Non-Kernel Classifiers

Table A1. Comparison between Opt-RNN-DBSVM and different classification methods on the IRIS and ABALONE datasets.
IRIS
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 90.00 | 87.99 | 77.66 | 1.00
MLP | 26.66 | 0.00 | 0.00 | 0.00
KNN | 96.66 | 95.98 | 91.62 | 1.00
AdaBoostM1 | 86.66 | 83.66 | 71.77 | 1.00
Decision Tree | 69.25 | 76.12 | 70.01 | 69.55
SGD Classifier | 76.66 | 46.80 | 1.00 | 30.10
Random Forest Classifier | 90.00 | 87.99 | 77.66 | 1.00
Nearest Centroid Classifier | 96.66 | 95.98 | 91.62 | 1.00
Classical SVM | 96.66 | 95.98 | 91.62 | 1.00
Opt-RNN-DBSVM | 97.96 | 92.19 | 95.85 | 96.05
ABALONE
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 68.89 | 51.19 | 41.37 | 67.33
MLP | 62.91 | 47.63 | 36.32 | 47.63
KNN | 81.93 | 53.74 | 70.23 | 43.02
AdaBoostM1 | 82.29 | 55.99 | 70.56 | 55.06
Decision Tree | 76.79 | 51.33 | 52.06 | 49.63
SGD Classifier | 80.86 | 64.74 | 58.08 | 70.57
Nearest Centroid Classifier | 76.07 | 64.79 | 62.60 | 61.15
Random Forest Classifier | 82.28 | 57.56 | 71.11 | 48.34
Classical SVM | 80.98 | 40.38 | 82.00 | 27.65
Opt-RNN-DBSVM | 98.38 | 96.07 | 96.18 | 93.19
Table A2. Comparison between Opt-RNN-DBSVM and different classification methods on the WINE and ECOLI datasets.
WINE
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 78.48 | 75.96 | 74.17 | 74.89
MLP | 67.73 | 72.67 | 51.52 | 43.64
KNN | 81.84 | 79.85 | 78.35 | 79.00
AdaBoostM1 | 88.20 | 70.60 | 81.64 | 76.21
Decision Tree | 83.02 | 81.42 | 79.36 | 80.22
SGD Classifier | 68.40 | 83.95 | 52.28 | 44.81
Nearest Centroid Classifier | 73.44 | 70.75 | 72.33 | 71.23
Classical SVM | 79.49 | 78.26 | 73.52 | 74.97
Opt-RNN-DBSVM | 96.47 | 95.96 | 96.08 | 96.02
ECOLI
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 82.08 | 97.11 | 1.00 | 95.66
MLP | 55.22 | 71.45 | 55.00 | 1.00
KNN | 89.55 | 99.87 | 1.00 | 97.02
AdaBoostM1 | 70.14 | 87.21 | 77.73 | 1.00
Decision Tree | 70.14 | 83.71 | 82.03 | 84.19
SGD Classifier | 86.56 | 96.33 | 1.00 | 92.01
Random Forest Classifier | 85.07 | 96.11 | 97.00 | 95.66
Nearest Centroid Classifier | 82.08 | 97.28 | 1.00 | 95.39
Classical SVM | 88.05 | 97.77 | 97.83 | 97.99
Opt-RNN-DBSVM | 91.82 | 88.46 | 90.77 | 91.19
Table A3. Comparison between Opt-RNN-DBSVM and different classification methods on the BALANCE and LIVER datasets.
BALANCE
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 79.3 | 67.2 | 62.50 | 69.40
MLP | 76.6 | 61.10 | 57.40 | 68.30
KNN | 80.90 | 68.40 | 64.50 | 66.60
AdaBoostM1 | 81.10 | 69.20 | 70.60 | 66.50
Decision Tree | 79.90 | 64.80 | 72.80 | 68.70
SGD Classifier | 69.03 | 65.62 | 66.15 | 65.83
Nearest Centroid Classifier | 66.54 | 65 | 66.66 | 64.89
Classical SVM | 79.70 | 70.7 | 55.60 | 62.70
Opt-RNN-DBSVM | 91.31 | 90.33 | 89.54 | 90.89
LIVER
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 71.66 | 71.90 | 71.45 | 70.80
MLP | 61.50 | 63.15 | 72.70 | 68.88
KNN | 50.50 | 71.20 | 69.81 | 53.90
AdaBoostM1 | 88.50 | 89.37 | 89.99 | 79.39
Decision Tree | 39.26 | 45.39 | 48.38 | 48.76
SGD Classifier | 49.80 | 60.00 | 49.49 | 50.22
Nearest Centroid Classifier | 66.50 | 63.30 | 60.20 | 61.87
Classical SVM | 80.40 | 77.67 | 77.90 | 70.08
Opt-RNN-DBSVM | 88.10 | 85.95 | 86.00 | 85.50
Table A4. Comparison between Opt-RNN-DBSVM and different classification methods on the SPECT and SEED datasets.
SPECT
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 63.15 | 77.40 | 96.33 | 65.94
MLP | 97.36 | 99.60 | 97.77 | 1.00
KNN | 92.10 | 96.80 | 97.80 | 95.90
AdaBoostM1 | 94.73 | 97.99 | 97.55 | 97.50
Decision Tree | 86.84 | 93.89 | 97.55 | 89.48
SGD Classifier | 97.36 | 99.60 | 97.77 | 1.00
Random Forest Classifier | 97.36 | 99.60 | 97.77 | 1.00
Nearest Centroid Classifier | 57.89 | 72.20 | 1.00 | 72.28
Classical SVM | 97.36 | 99.60 | 97.77 | 1.00
Opt-RNN-DBSVM | 97.36 | 99.60 | 97.77 | 1.00
SEED
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 85.71 | 79.76 | 92.55 | 69.20
MLP | 30.95 | 0.00 | 0.00 | 0.00
KNN | 85.71 | 79.24 | 73.39 | 85.40
AdaBoostM1 | 95.23 | 93.40 | 87.44 | 1.00
Decision Tree | 92.85 | 90.88 | 1.00 | 81.00
Random Forest Classifier | 92.85 | 90.88 | 1.00 | 81.00
SGD Classifier | 85.71 | 86.76 | 79.55 | 94.20
Nearest Centroid Classifier | 85.71 | 79.28 | 92.70 | 75.77
Classical SVM | 85.71 | 83.43 | 92.70 | 75.04
Opt-RNN-DBSVM | 96.88 | 97.09 | 96.76 | 1.00
Table A5. Comparison between Opt-RNN-DBSVM and different classification methods on the PIMA and Germany Diabetes datasets.
PIMA
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 79.3 | 70.40 | 74.2 | 66.50
MLP | 66.23 | 13.33 | 57.1 | 8.40
KNN | 74.90 | 66.60 | 68.4 | 64.50
AdaBoostM1 | 72.72 | 60.37 | 60.8 | 60.80
Decision Tree | 70.77 | 52.63 | 60.2 | 47.60
SGD Classifier | 37.66 | 52.47 | 36.98 | 1.00
Nearest Centroid Classifier | 63.63 | 48.14 | 47.01 | 49.55
Classical SVM | 79.22 | 61.90 | 84.7 | 49.6
Opt-RNN-DBSVM | 79.87 | 68.04 | 75.90 | 62.05
Germany Diabetes Dataset
Method | Accuracy | F1-Score | Precision | Recall
Naive Bayes | 81.66 | 87.19 | 79.37 | 77.33
MLP | 63.28 | 54.24 | 50.39 | 56.40
KNN | 68.08 | 67.88 | 76.30 | 66.00
AdaBoostM1 | 91.95 | 90.66 | 92.56 | 90.06
Decision Tree | 55.66 | 53.74 | 59.93 | 50.02
SGD Classifier | 79.96 | 79.74 | 70.08 | 78.57
Nearest Centroid Classifier | 86.71 | 82.63 | 89.32 | 85.63
Classical SVM | 78.5 | 59.81 | 74.10 | 50.99
Opt-RNN-DBSVM | 99.5 | 99.7 | 1.00 | 98.00

References

  1. Steyerberg, E.W. Clinical Prediction Models; Springer International Publishing: Cham, Switzerland, 2019; pp. 309–328. [Google Scholar]
  2. Law, A.M. How to build valid and credible simulation models. In Proceedings of the 2019 Winter Simulation Conference (WSC), National Harbor, MD, USA, 8–11 December 2019; pp. 1402–1414. [Google Scholar]
  3. Glaeser, E.L.; Kominers, S.D.; Luca, M.; Naik, N. Big data and big cities: The promises and limitations of improved measures of urban life. Econ. Inq. 2018, 56, 114–137. [Google Scholar] [CrossRef]
  4. Minoux, M. Mathematical Programming: Theories and Algorithms; Wiley: Hoboken, NJ, USA, 1983. [Google Scholar]
  5. El Moutaouakil, K.; Roudani, M.; El Ouissari, A. Optimal Entropy Genetic Fuzzy-C-Means SMOTE (OEGFCM-SMOTE). Knowl.-Based Syst. 2023, 262, 110235. [Google Scholar] [CrossRef]
  6. Huang, T.-M.; Kecman, T.M. Bias Term b in SVMs Again. In Proceedings of the 12th European Symposium on Artificial Neural Networks, Bruges, Belgium, 28–30 April 2004. [Google Scholar]
  7. Kecman, V.; Vogt, T.-M.H. On the Equality of Kernel AdaTron and Sequential Minimal Optimization in Classification and Regression Tasks and Alike Algorithms for Kernel Machines. In Proceedings of the 11th European Symposium on Artificial Neural Networks, ESANN, Bruges, Belgium, 23–25 April 2003; pp. 215–222. [Google Scholar]
  8. Platt, J. Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines; Microsoft Research Technical Report MSR-TR-98-14; MIT Press: Boston, MA, USA, 1998. [Google Scholar]
  9. Vogt, M. SMO Algorithms for Support Vector Machines without Bias, Institute Report; Institute of Automatic Control, TU Darmstadt: Darmstadt, Germany, 2002; Available online: http://www.iat.tu-darmstadt.de/vogt (accessed on 1 June 2023).
  10. Frieß, T.-T.; Cristianini, N.; Campbell, I.C.G. The Kernel-Adatron: A Fast and Simple Learning Procedure for Support Vector Machines. In Proceedings of the 15th International Conference on Machine Learning, Madison, WI, USA, 24–27 July 1998; Shavlik, J., Ed.; Morgan Kaufmann: San Francisco, CA, USA, 1998; pp. 188–196. [Google Scholar]
  11. Anlauf, J.K.; Biehl, M. The AdaTron—An adaptive perceptron algorithm. Europhys. Lett. 1989, 10, 687–692. [Google Scholar] [CrossRef]
  12. Joachims, T. Making large-scale SVM learning practical. In Advances in Kernel Methods—Support Vector Learning; MIT Press: Cambridge, MA, USA, 1999. [Google Scholar]
  13. Osuna, E.; Freund, R.; Girosi, F. An Improved Training Algorithm for Support Vector Machines. In Proceedings of the Neural Networks for Signal Processing VII, Proceedings of the 1997 Signal Processing Society Workshop, Amelia Island, FL, USA, 24–26 September 1997; pp. 276–285. [Google Scholar]
  14. Veropoulos, K. Machine Learning Approaches to Medical Decision Making. Ph.D. Thesis, The University of Bristol, Bristol, UK, 2001. [Google Scholar]
  15. Kecman, V.; Huang, T.M.; Vogt, M. Iterative single data algorithm for training kernel machines from huge data sets: Theory and performance. In Support Vector Machines: Theory and Applications; Springer: Berlin, Germany, 2005; pp. 255–274. [Google Scholar]
  16. Haddouch, K.; El Moutaouakil, K. New Starting Point of the Continuous Hopfield Network. In Proceedings of the Big Data, Cloud and Applications: Third International Conference, BDCA 2018, Kenitra, Morocco, 4–5 April 2018; pp. 379–389. [Google Scholar]
  17. Hopfield, J.J.; Tank, D.W. Neural computation of decisions in optimization problems. Biol. Cybern. 1985, 52, 1–25. [Google Scholar] [CrossRef]
  18. Alotaibi, Y. A new meta-heuristics data clustering algorithm based on tabu search and adaptive search memory. Symmetry 2022, 14, 623. [Google Scholar] [CrossRef]
  19. El Ouissari, A.; El Moutaouakil, K. Density based fuzzy support vector machine: Application to diabetes dataset. Math. Model. Comput. 2021, 8, 747–760. [Google Scholar] [CrossRef]
  20. Moutaouakil, K.E.; Yahyaouy, A.; Chellak, S.; Baizri, H. An Optimized Gradient Dynamic-Neuro-Weighted-Fuzzy Clustering Method: Application in the Nutrition Field. Int. J. Fuzzy Syst. 2022, 24, 3731–3744. [Google Scholar] [CrossRef]
  21. Aghbashlo, M.; Peng, W.; Tabatabaei, M.; Kalogirou, S.A.; Soltanian, S.; Hosseinzadeh-Bandbafha, H.; Lam, S.S. Machine learning technology in biodiesel research: A review. Prog. Energy Combust. Sci. 2021, 85, 100904. [Google Scholar]
  22. Ahmadi, M.; Khashei, M. Generalized support vector machines (GSVMs) model for real-world time series forecasting. Soft Comput. 2021, 25, 14139–14154. [Google Scholar] [CrossRef]
  23. Xie, X.; Xiong, Y. Generalized multi-view learning based on generalized eigenvalues proximal support vector machines. Exp. Syst. Appl. 2022, 194, 116491. [Google Scholar] [CrossRef]
  24. Tanveer, M.; Rajani, T.; Rastogi, R.; Shao, Y.H.; Ganaie, M.A. Comprehensive review on twin support vector machines. In Annals of Operations Research; Springer: Berlin, Germany, 2022; pp. 1–46. [Google Scholar]
  25. Mercer, J. Functions of positive and negative type, and their connection with the theory of integral equations. Philos. Trans. R. Soc. Lond. Ser. A 1909, 209, 415–446. [Google Scholar]
  26. Schölkopf, B.; Smola, A.J.; Bach, F. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2002. [Google Scholar]
  27. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  28. Ettaouil, M.; Elmoutaouakil, K.; Ghanou, Y. The Placement of Electronic Circuits Problem: A Neural Network Approach. Math. Model. Nat. Phenom. 2010, 5, 109–115. [Google Scholar] [CrossRef]
  29. El Moutaouakil, K.; Ahourag, A.; Chakir, S.; Kabbaj, Z.; Chellack, S.; Cheggour, M.; Baizri, H. Hybrid firefly genetic algorithm and integral fuzzy quadratic programming to an optimal Moroccan diet. Math. Model. Comput. 2023, 10, 338–350. [Google Scholar] [CrossRef]
Figure 2. Architecture of the connection weights W_{i,j} between each neuron pair.
Figure 3. Graphical representation of the variation in the sums and integral terms with respect to s_k. (a) RNN-DBSVM 1, (b) RNN-DBSVM 2, (c) RNN-DBSVM 3.
Figure 4. IRIS dataset optimal time steps.
Figure 5. Support vectors obtained by the ISDA algorithm.
Figure 6. Support vectors obtained by the L1QP algorithm.
Figure 7. Support vectors obtained by the SMO algorithm.
Figure 8. Support vectors obtained by the Opt_RNN_SVM algorithm.
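Figures 5–8 compare the support vectors retained by the ISDA, L1QP, SMO, and Opt_RNN_SVM solvers. As a rough illustrative sketch only (not the authors' experimental code), the snippet below trains a kernel SVM with scikit-learn, whose SVC class relies internally on an SMO-type solver, on two features of the IRIS dataset and overlays the learned support vectors; the dataset slice, kernel, and hyperparameters are assumptions made purely for illustration.

```python
# Illustrative sketch only: plot the support vectors of a kernel SVM trained on
# two IRIS features. scikit-learn's SVC uses an SMO-type solver (libsvm), so this
# approximates the SMO baseline; it does not reproduce ISDA, L1QP, or Opt-RNN-DBSVM.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
mask = y < 2                       # keep two classes so the picture is binary (assumption)
X, y = X[mask, :2], y[mask]        # first two features, so the plot is 2-D (assumption)

clf = SVC(kernel="rbf", C=1.0, gamma="scale")   # hyperparameters are assumptions
clf.fit(X, y)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap="coolwarm", s=20, label="samples")
plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
            facecolors="none", edgecolors="k", s=90, label="support vectors")
plt.xlabel("sepal length")
plt.ylabel("sepal width")
plt.legend()
plt.title("Support vectors of an SMO-trained kernel SVM (illustrative)")
plt.show()
```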
Table 1. Performance of Const-CHN-SVM on different datasets for different values of the time step in [0.1, 0.6].

             SVM-CHN s = 0.1                             SVM-CHN s = 0.2
Dataset      Accuracy   F1-Score   Precision  Recall     Accuracy   F1-Score   Precision  Recall
IRIS         95.98      96.66      90.52      92.00      96.66      95.98      91.62      92.00
ABALONE      80.98      40.38      82.00      27.65      80.66      40.38      81.98      27.65
WINE         79.49      78.26      73.52      74.97      79.49      78.26      73.52      74.97
ECOLI        88.05      96.77      97.83      97.33      88.05      97.77      97.83      97.99
BALANCE      79.70      70.7       55.60      62.70      79.70      70.7       55.60      62.70
LIVER        80.40      77.67      77.90      70.08      80.40      77.67      77.90      70.08
SPECT        92.12      90.86      91.33      90.00      97.36      99.60      97.7       71.00
SEED         85.71      83.43      92.70      75.04      85.71      83.43      92.70      75.04
PIMA         79.22      61.90      84.7       49.6       79.22      61.90      83.97      49.6

             SVM-CHN s = 0.3                             SVM-CHN s = 0.4
Dataset      Accuracy   F1-Score   Precision  Recall     Accuracy   F1-Score   Precision  Recall
IRIS         94.53      95.86      89.66      98.23      95.96      93.88      89.33      95.32
ABALONE      77.99      51.68      83.85      30.88      81.98      41.66      83.56      33.33
WINE         80.23      77.66      74.89      74.97      81.33      77.65      73.11      74.43
ECOLI        88.86      95.65      96.88      97.33      86.77      97.66      97.83      97.95
BALANCE      79.75      70.89      55.96      62.32      79.66      70.45      56.1       66.23
LIVER        80.51      78.33      77.9       70.56      80.40      77.67      77.90      70.08
SPECT        97.63      98.99      97.81      98.56      96.40      98.71      96.83      97.79
SEED         85.71      83.43      92.70      75.88      85.71      83.43      92.70      75.61
PIMA         79.22      61.93      84.82      49.86      79.22      61.90      84.98      49.89

             SVM-CHN s = 0.5                             SVM-CHN s = 0.6
Dataset      Accuracy   F1-Score   Precision  Recall     Accuracy   F1-Score   Precision  Recall
IRIS         94.53      95.86      89.66      98.23      95.96      93.88      89.33      95.32
ABALONE      78.06      51.83      83.96      40.45      82.1       42.15      83.88      38.26
WINE         80.84      78.26      74.91      75.20      81.39      77.86      73.66      74.47
ECOLI        88.97      95.7       96.91      97.43      86.77      97.66      97.83      97.95
BALANCE      79.89      71.00      56.11      62.72      79.71      70.64      56.33      66.44
LIVER        80.66      78.33      77.9       70.56      80.40      77.67      77.90      70.08
SPECT        91.36      92.60      91.77      84.33      91.36      92.60      91.77      84.33
SEED         84.67      82.96      92.23      74.18      84.11      83.08      92.63      75.48
PIMA         79.12      61.75      84.62      49.86      79.12      61.33      84.68      48.55
Table 2. Performance of Const-CHN-SVM on different datasets for different values of the time step in [0.7, 0.9].

             SVM-CHN s = 0.7                             SVM-CHN s = 0.8
Dataset      Accuracy   F1-Score   Precision  Recall     Accuracy   F1-Score   Precision  Recall
IRIS         94.53      95.86      89.66      98.23      95.96      93.88      89.33      95.32
ABALONE      77.99      51.68      83.85      30.88      81.98      41.66      83.56      33.33
WINE         80.23      77.66      74.89      74.97      81.33      77.65      73.11      74.43
ECOLI        88.86      95.65      96.88      97.33      86.77      97.66      97.83      97.95
BALANCE      79.75      70.89      55.96      62.32      79.66      70.45      56.1       66.23
LIVER        80.51      78.33      77.9       70.56      80.40      77.67      77.90      70.08
SPECT        94.36      84.60      83.77      85.99      94.36      84.60      83.77      85.99
SEED         85.71      83.43      92.70      75.88      85.71      83.43      92.70      75.61
PIMA         79.22      61.93      84.82      49.86      79.22      61.90      84.98      49.89

             SVM-CHN s = 0.9                             CHN-DBSVM (optimal value of s)
Dataset      Accuracy   F1-Score   Precision  Recall     Accuracy   F1-Score   Precision  Recall
IRIS         95.96      93.88      89.33      95.32      97.96      96.19      95.85      98.5
ABALONE      81.98      41.66      83.56      33.33      98.38      96.07      96.18      93.19
WINE         81.33      77.65      73.11      74.43      96.47      95.96      96.08      96.02
ECOLI        86.77      88.46      90.77      91.19      91.82      97.66      97.83      97.95
BALANCE      79.66      70.45      56.1       66.23      91.31      90.33      89.54      90.89
LIVER        80.40      77.67      77.90      70.08      88.10      85.95      86.00      85.50
SPECT        94.36      84.60      83.77      85.99      95.55      86.20      85.28      86.31
SEED         85.71      83.43      86.18      75.61      88.90      84.31      92.70      84.40
PIMA         79.22      61.90      75.90      49.89      79.87      68.04      84.98      62.05
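Tables 1 and 2 contrast a constant time step s with the optimal step selected by CHN-DBSVM. The following sketch illustrates, on a generic quadratic energy (a placeholder, not the paper's RNN-DBSVM energy function), why a step chosen by exact line search along the descent direction decreases the energy faster than a fixed step in an explicit Euler update; all quantities below are assumptions for illustration only.

```python
# Generic illustration of the role of the time step s in an explicit Euler update.
# The quadratic energy E(u) = 0.5 * u^T A u - b^T u and its gradient flow du/dt = -(A u - b)
# are placeholders; they are NOT the paper's RNN-DBSVM dynamics.
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive-definite matrix (placeholder)
b = np.array([1.0, 1.0])

def energy(u):
    return 0.5 * u @ A @ u - b @ u

u_const, u_opt = np.zeros(2), np.zeros(2)
for _ in range(20):
    g = A @ u_const - b                  # gradient at the current state
    u_const = u_const - 0.1 * g          # constant time step s = 0.1

    g = A @ u_opt - b
    s_star = (g @ g) / (g @ A @ g)       # step minimizing the energy along -g (exact line search)
    u_opt = u_opt - s_star * g

print("constant-step energy:", energy(u_const))
print("optimal-step  energy:", energy(u_opt))
```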
Table 3. Performance of Classical Optimizer–SVM on different datasets.

             L1QP-SVM                                    ISDA-SVM
Dataset      Accuracy   F1-Score   Precision  Recall     Accuracy   F1-Score   Precision  Recall
IRIS         71.59      62.02      70.80      55.17      82.00      83.64      76.67      92.00
ABALONE      74.15      70.80      71.48      59.30      83.70      68.22      70.00      80.66
WINE         72.90      65.79      75.53      60.11      66.08      65.80      66.00      70.02
ECOLI        66.15      55.89      61.33      41.30      51.60      48.30      33.33      51.39
BALANCE      65.20      53.01      60.51      41.22      50.44      58.36      68.32      60.20
LIVER        64.66      52.06      60.77      40.44      50.00      48.00      62.31      51.22
SPECT        70.66      62.02      67.48      50.11      77.60      71.20      75.33      70.11
SEED         70.51      58.98      67.30      45.30      80.66      81.25      79.80      79.30
PIMA         65.18      53.23      60.88      39.48      49.32      44.33      48.90      50.27

             SMO-SVM                                     CHN-DBSVM (optimal value of s)
Dataset      Accuracy   F1-Score   Precision  Recall     Accuracy   F1-Score   Precision  Recall
IRIS         71.59      62.02      70.80      55.17      97.96      96.19      95.85      98.5
ABALONE      74.15      70.80      71.48      59.30      98.38      96.07      96.18      93.19
WINE         72.90      65.79      75.53      60.11      96.47      95.96      96.08      96.02
ECOLI        66.15      55.89      61.33      41.30      91.82      88.46      90.77      91.19
BALANCE      65.20      53.01      60.51      41.22      91.31      90.33      89.54      90.89
LIVER        64.66      52.06      60.77      40.44      88.10      85.95      86.00      85.50
SPECT        70.66      62.02      67.48      50.11      95.55      86.20      85.28      86.31
SEED         70.51      58.98      67.30      45.30      88.90      84.31      86.18      84.40
PIMA         65.18      53.23      60.88      39.48      79.87      68.04      75.90      62.05
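The columns of Tables 1–3 report the standard accuracy, F1-score, precision, and recall measures. A minimal sketch of how these four measures can be computed with scikit-learn is given below; the label vectors are hypothetical placeholders, not outputs of the compared classifiers.

```python
# Minimal sketch: the four measures reported in Tables 1-3, computed with scikit-learn.
# y_true and y_pred are hypothetical placeholders, not results from the paper.
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # ground-truth labels (placeholder)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # classifier output (placeholder)

print("Accuracy  (%):", 100 * accuracy_score(y_true, y_pred))
print("F1-Score  (%):", 100 * f1_score(y_true, y_pred))
print("Precision (%):", 100 * precision_score(y_true, y_pred))
print("Recall    (%):", 100 * recall_score(y_true, y_pred))
```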
Table 4. Results of Friedman test on the kernel SVM classifiers.

Opt-RNN-DB vs.   p-Value (Accuracy)   p-Value (F1-Score)   p-Value (Precision)
SMO              0.008                0.004                0.004
L1QP             0.008                0.004                0.004
ISDA             0.003                0.011                0.011
Table 5. Results of Friedman test on non-kernel classifiers.

Opt-RNN-DB vs.                p-Value (Accuracy)   p-Value (F1-Score)   p-Value (Precision)
Naive Bayes                   0.054                0.022                0.026
MLP                           0.00                 0.00                 0.00
KNN                           0.018                0.061                0.045
AdaBoostM1                    0.018                0.083                0.075
Nearest Centroid Classifier   0.012                0.15                 0.1
Decision Tree                 0.002                0.007                0.012
SGD Classifier                0.003                0.022                0.004
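Tables 4 and 5 summarize Friedman-test p-values comparing Opt-RNN-DBSVM against the other classifiers on their per-dataset scores. As a hedged sketch of the underlying idea, the snippet below runs an omnibus Friedman test over the accuracy column of Table 3 with SciPy; it illustrates the test itself and is not necessarily the exact pairwise procedure used to produce the reported p-values.

```python
# Sketch of a Friedman test over per-dataset accuracies (values taken from Table 3).
# The paper reports pairwise p-values, which may come from a different (post-hoc) procedure.
from scipy.stats import friedmanchisquare

# Accuracy per dataset (IRIS, ABALONE, WINE, ECOLI, BALANCE, LIVER, SPECT, SEED, PIMA)
l1qp = [71.59, 74.15, 72.90, 66.15, 65.20, 64.66, 70.66, 70.51, 65.18]
isda = [82.00, 83.70, 66.08, 51.60, 50.44, 50.00, 77.60, 80.66, 49.32]
smo  = [71.59, 74.15, 72.90, 66.15, 65.20, 64.66, 70.66, 70.51, 65.18]
opt  = [97.96, 98.38, 96.47, 91.82, 91.31, 88.10, 95.55, 88.90, 79.87]

stat, p_value = friedmanchisquare(l1qp, isda, smo, opt)
print(f"Friedman statistic = {stat:.3f}, p-value = {p_value:.4f}")
```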