Article

DCA for Sparse Quadratic Kernel-Free Least Squares Semi-Supervised Support Vector Machine

Jun Sun and Wentao Qu
1 School of Mathematics and Statistics, Linyi University, Linyi 276000, China
2 Department of Applied Mathematics, Beijing Jiaotong University, Beijing 100044, China
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(15), 2714; https://doi.org/10.3390/math10152714
Submission received: 12 April 2022 / Revised: 2 July 2022 / Accepted: 18 July 2022 / Published: 1 August 2022
(This article belongs to the Special Issue Computational Methods in Nonlinear Analysis and Their Applications)

Abstract

With the development of science and technology, more and more data are being produced. For many of these datasets, only some of the data have labels. In order to make full use of the information in these data, it is necessary to classify them. In this paper, we propose a strong sparse quadratic kernel-free least squares semi-supervised support vector machine (SSQLSS$^3$VM), in which we add an $\ell_0$-norm regularization term to make it sparse. An NP-hard problem arises, since the proposed model contains the $\ell_0$ norm and another nonconvex term. One important method for solving such nonconvex problems is DC (difference of convex functions) programming. Therefore, we first approximate the $\ell_0$ norm by a polyhedral DC function. Moreover, due to the existence of the nonsmooth terms, we use the sGS-ADMM to solve the subproblem. Finally, empirical numerical experiments show the efficiency of the proposed algorithm.

1. Introduction

In today’s information society, more and more data are emerging, and in order to make reasonable and full use of the information contained in them, it is common to classify or categorize the data, which usually involves a high cost in money or time. Machine learning is based on sample data. Supervised learning is a learning method that predicts a function or classifier from a labeled training dataset, and one of the most famous methods is the support vector machine (SVM), which has been widely used for data classification [1,2,3,4]. Nevertheless, in the real world, a large amount of data is unlabeled or only partially labeled, and labels are difficult to obtain. Usually, a great deal of work is needed to tag the data, which means that investing such resources at a high monetary or time cost is often unacceptable. For example, in computer-aided diagnosis (CAD) of breast cancer, a large number of collected records usually have to be marked by specially trained professionals. However, it is very difficult to collect labeled pathological records of patients: generally, it takes at least five years to mark a patient record as alive or not alive, and a major problem is that experts need to spend a lot of time labeling. Supervised learning cannot be used in these contexts, and it is difficult to obtain a learner with good prediction results and strong generalization ability by using only a small amount of labeled data. In order to overcome this difficulty, researchers have proposed semi-supervised learning methods, which have attracted more and more attention [5,6]. When only a small portion of the data has labels, exploiting the potential information contained in the unlabeled data can help improve the learning performance [7,8].
Therefore, some researchers have introduced the idea of semi-supervised learning into the SVM, leading to the semi-supervised support vector machine (S$^3$VM), which, similar to the SVM, maximizes the margin between labeled and unlabeled data points. The S$^3$VM was originally proposed by Vapnik and Sterin [9] in 1977; it requires the separating hyperplane to pass through a low-density region of the data. In 1999, Bennett and Demiriz [10] proposed the first optimization formulation of the S$^3$VM and proved that it can be re-expressed as a mixed-integer programming problem, which they then solved exactly with integer programming methods. Traditionally, since the categories of the unlabeled data need to be predicted, the S$^3$VM is formulated as a nonconvex nonsmooth optimization problem, which raises some analytical and computational difficulties. In order to solve the S$^3$VM more effectively, scholars have proposed many semi-supervised improvements, and many efficient algorithms have been developed. For more details of S$^3$VM methods, the reader is referred to [11,12,13,14,15,16,17,18] and the references therein.
In addition, for linearly inseparable datasets, most traditional methods use kernel functions to map each data point from the original space to a higher-dimensional space and then find a hyperplane in the high-dimensional space that separates all mapped data points. However, selecting an appropriate kernel function for a given dataset then becomes a new problem. To avoid this situation, Dagher [19] first proposed a kernel-free quadratic support vector machine (QSVM), which tries to find a quadratic decision function that nonlinearly separates the data with the largest margin. Based on the above model, Yan et al. [20] proposed a kernel-free quadratic S$^3$VM, called SSQSSVM. They reformulated SSQSSVM as a mixed-integer programming problem, which they proved to be equivalent to a nonconvex optimization problem with absolute-value constraints; using convex relaxation techniques, it can then be transformed into a semidefinite programming problem, which can be solved with the CVX package. In 2018, Zhan et al. [21] proposed a sparse quadratic kernel-free least squares semi-supervised support vector machine (SQLSS$^3$VM) with an $\ell_1$-norm regularization term for the binary classification problem and used the proximal alternating direction method of multipliers (P-ADMM) to solve it. Tian et al. [22] applied this model to online credit scoring and added fuzzy weights to further improve the accuracy and robustness of the classification. Gao et al. [23] proposed a new quadratic kernel-free least squares twin support vector machine (QLSTSVM) for binary classification problems; they adopted the alternating direction method of multipliers to solve it and achieved good numerical results.
We know that, to accurately characterize sparsity, it is natural to impose the $\ell_0$ norm, which is the most suitable concept for modeling sparsity. There are two main ways to deal with the $\ell_0$ norm: one is to approximate it with a convex function [24] or a nonconvex function [25], and the other is to deal with it directly [26]. The $\ell_1$ norm is the best convex approximation of the $\ell_0$ norm, but it encourages sparsity only in some cases under restrictive assumptions [27]. It has also been shown to be, in certain cases, inconsistent and biased [28]. With the in-depth study of the properties of the $\ell_0$ norm, previous applied works have shown that the $\ell_0$ norm yields better sparsity and robustness than the $\ell_1$-norm approach [29,30,31,32].
In this paper, we propose a new strong sparse quadratic kernel-free least squares semi-supervised support vector machine model with an $\ell_0$-norm regularization term, which has stronger interpretability: it finds a separating quadratic surface far away from both the labeled and unlabeled points using the least number of features. The main contributions of the paper are summarized as follows:
(I)
We propose a new model, called the strong sparse quadratic kernel-free least squares semi-supervised support vector machine (SSQLSS$^3$VM). In order to give it better sparsity and robustness, we add an $\ell_0$-norm regularization term directly to the objective rather than a convex approximation of it.
(II)
We treat the problem by nonconvex methods based on DC (difference of convex functions) programming and the DCA (DC algorithm), one of the most important approaches for nonconvex problems [33,34,35]. Accordingly, we use a nonconvex function that can be written as the difference of two convex functions to approximate the $\ell_0$ norm, and the remaining nonconvex term in the objective can also be expressed as a difference of two convex functions.
(III)
We conduct experiments with the proposed model and algorithm on several benchmark datasets to investigate their efficiency in classification.
The rest of this paper is organized as follows: DC programming and the SSQLSS$^3$VM are briefly presented in Section 2, while Section 3 is devoted to the development of the DCA with the sGS-ADMM for solving the SSQLSS$^3$VM. In Section 4, the computational experiments are reported. Finally, we conclude the paper and discuss future work in Section 5.

2. Preliminaries

In this section, we will recall some elementary concepts of the quadratic kernel-free semi-supervised support vector machine and the outline of DC programming and DCA.

2.1. Sparse Quadratic Kernel-Free Least Squares Semi-Supervised Support Vector Machine

Given a training set consisting of $s$ labeled points $\{(x_i, y_i) \in \mathbb{R}^m \times \{+1, -1\},\ i = 1, \dots, s\}$ and $p$ unlabeled points $x_i \in \mathbb{R}^m,\ i = s+1, \dots, n$ (so that $n = s + p$), our goal is to find a quadratic surface $f(x) = \frac{1}{2}x^T Q x + b^T x + c = 0$ that can directly separate the data into two classes with the largest margin, where $Q \in \mathbb{R}^{m \times m}$ is a symmetric matrix.
According to [20], the S$^3$VM problem can be written as follows:
$$
\begin{aligned}
\min_{Q,b,c,\xi_i,\eta_i,\gamma_i}\quad & \frac{1}{2n}\sum_{i=1}^{n}\|Qx_i+b\|^2 + C_1\sum_{i=1}^{s}\xi_i + C_2\sum_{i=s+1}^{n}\min(\eta_i,\gamma_i)\\
\text{s.t.}\quad & y_i\Big(\tfrac{1}{2}x_i^TQx_i + b^Tx_i + c\Big)\ge 1-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,2,\dots,s,\\
& \tfrac{1}{2}x_i^TQx_i + b^Tx_i + c \ge 1-\eta_i,\ \ \eta_i\ge 0,\ \ i=s+1,s+2,\dots,n,\\
& -\Big(\tfrac{1}{2}x_i^TQx_i + b^Tx_i + c\Big) \ge 1-\gamma_i,\ \ \gamma_i\ge 0,\ \ i=s+1,s+2,\dots,n.
\end{aligned}
$$
Take the upper triangular elements of the matrix $Q$ and stack them in a vector $a$, that is,
$$
a = [a_{11}, a_{12}, \dots, a_{1m}, a_{22}, a_{23}, \dots, a_{2m}, \dots, a_{mm}]^T \in \mathbb{R}^{\frac{m^2+m}{2}}.
$$
If the data point is $x_i = (x_{i1}, \dots, x_{im})^T \in \mathbb{R}^m$, then $h_{x_i}$ is the vector
$$
h_{x_i} = \Big[\tfrac{1}{2}x_{i1}^2,\ x_{i1}x_{i2},\ \dots,\ x_{i1}x_{im},\ \tfrac{1}{2}x_{i2}^2,\ \dots,\ x_{i2}x_{im},\ \dots,\ \tfrac{1}{2}x_{im}^2\Big]^T \in \mathbb{R}^{\frac{m^2+m}{2}}.
$$
Then, we can construct an $m \times \frac{m^2+m}{2}$ matrix $H_{x_i}$ such that $H_{x_i}a = Qx_i$: in the $j$th row of $H_{x_i}$, the entry in the column corresponding to a component of $a$ of the form $a_{jk}$ or $a_{kj}$ is set to $x_{ik}$, and all other entries of that row are set to zero.
Denoting
$$
g_{x_i} = \begin{pmatrix} h_{x_i}\\ x_i\\ 1 \end{pmatrix},\qquad
w = \begin{pmatrix} a\\ b\\ c \end{pmatrix},\qquad
G = \frac{1}{n}\begin{pmatrix}
\sum_{i=1}^{n} H_{x_i}^T H_{x_i} & \sum_{i=1}^{n} H_{x_i}^T & 0\\
\sum_{i=1}^{n} H_{x_i} & nE & 0\\
0 & 0 & 0
\end{pmatrix},
$$
where $E$ denotes the $m \times m$ identity matrix, we have
$$
y_i\Big(\tfrac{1}{2}x_i^TQx_i + b^Tx_i + c\Big) = y_i\big(h_{x_i}^Ta + b^Tx_i + c\big) = y_i\big(g_{x_i}^Tw\big),
$$
$$
\tfrac{1}{2}x_i^TQx_i + b^Tx_i + c = h_{x_i}^Ta + b^Tx_i + c = g_{x_i}^Tw,
$$
and
$$
\frac{1}{2n}\sum_{i=1}^{n}\|Qx_i+b\|_2^2 = \frac{1}{2n}\sum_{i=1}^{n}\|H_{x_i}a+b\|_2^2 = \frac{1}{2}w^TGw.
$$
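To make this vectorization concrete, the following Python sketch (our own illustration, not part of the paper's MATLAB implementation; all helper names are ours) builds $a$, $h_{x_i}$, and $H_{x_i}$ for a random symmetric $Q$ and data point $x$, and numerically checks the identities $h_x^Ta = \tfrac{1}{2}x^TQx$ and $H_xa = Qx$ used above.

```python
import numpy as np

def upper_tri_vec(Q):
    """Stack the upper-triangular entries of symmetric Q row by row: a = [a11, a12, ..., amm]."""
    m = Q.shape[0]
    return np.concatenate([Q[j, j:] for j in range(m)])

def h_vec(x):
    """h_x = [x1^2/2, x1*x2, ..., x1*xm, x2^2/2, ..., xm^2/2], so that h_x^T a = x^T Q x / 2."""
    m = x.size
    rows = []
    for j in range(m):
        row = x[j] * x[j:]          # new array: [x_j^2, x_j*x_{j+1}, ..., x_j*x_m]
        row[0] = 0.5 * x[j] ** 2    # diagonal entry gets the factor 1/2
        rows.append(row)
    return np.concatenate(rows)

def H_mat(x):
    """m-by-(m^2+m)/2 matrix with H_x a = Q x: row j puts x_k in the column of a_{jk} (= a_{kj})."""
    m = x.size
    H = np.zeros((m, m * (m + 1) // 2))
    col = 0
    for j in range(m):
        for k in range(j, m):
            H[j, col] += x[k]        # coefficient of a_{jk} in (Qx)_j
            if k != j:
                H[k, col] += x[j]    # the same entry a_{jk} also feeds (Qx)_k
            col += 1
    return H

rng = np.random.default_rng(0)
m = 4
x = rng.standard_normal(m)
Q = rng.standard_normal((m, m)); Q = (Q + Q.T) / 2   # random symmetric matrix
a = upper_tri_vec(Q)

print(np.isclose(h_vec(x) @ a, 0.5 * x @ Q @ x))     # h_x^T a = x^T Q x / 2
print(np.allclose(H_mat(x) @ a, Q @ x))              # H_x a = Q x
```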
Therefore, the QS$^3$VM can be rewritten as
$$
\begin{aligned}
\min_{w,\xi_i,\eta_i,\gamma_i}\quad & \frac{1}{2}w^TGw + C_1\sum_{i=1}^{s}\xi_i + C_2\sum_{i=s+1}^{n}\min(\eta_i,\gamma_i)\\
\text{s.t.}\quad & y_i\big(g_{x_i}^Tw\big)\ge 1-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,2,\dots,s,\\
& g_{x_i}^Tw\ge 1-\eta_i,\ \ \eta_i\ge 0,\ \ i=s+1,s+2,\dots,n,\\
& -g_{x_i}^Tw\ge 1-\gamma_i,\ \ \gamma_i\ge 0,\ \ i=s+1,s+2,\dots,n.
\end{aligned}
$$
To obtain sparse solutions, we add an $\ell_0$-norm regularization term to the objective function. Hence, the strong sparse quadratic kernel-free least squares semi-supervised support vector machine (SSQLSS$^3$VM) can be written as
$$
\begin{aligned}
\min_{w,\xi_i,\eta_i,\gamma_i}\quad & \lambda\|w\|_0 + \frac{1}{2}w^TGw + C_1\sum_{i=1}^{s}\xi_i + C_2\sum_{i=s+1}^{n}\min(\eta_i,\gamma_i)\\
\text{s.t.}\quad & y_i\big(g_{x_i}^Tw\big)\ge 1-\xi_i,\ \ \xi_i\ge 0,\ \ i=1,2,\dots,s,\\
& g_{x_i}^Tw\ge 1-\eta_i,\ \ \eta_i\ge 0,\ \ i=s+1,s+2,\dots,n,\\
& -g_{x_i}^Tw\ge 1-\gamma_i,\ \ \gamma_i\ge 0,\ \ i=s+1,s+2,\dots,n,
\end{aligned}
$$
where $\|w\|_0$ is defined as the number of nonzero components of $w$.

2.2. Outline of DCA

DC programming and the DCA address the minimization of a function $f$ that is the difference of two convex functions on the space $X = \mathbb{R}^n$ and its dual space $Y$. In general, a DC program is an optimization problem of the form
$$
\alpha = \inf\{f(x) := g(x) - h(x) : x \in \mathbb{R}^n\}, \qquad (P_{dc})
$$
where $g, h$ are lower semi-continuous proper convex functions on $\mathbb{R}^n$.
For a convex function $g$, its conjugate function is defined as
$$
g^*(y) = \sup\{\langle x, y\rangle - g(x) : x \in X\}.
$$
For $\epsilon > 0$ and $x_0 \in \mathrm{dom}\, g$, the symbol $\partial_\epsilon g(x_0)$ denotes the $\epsilon$-subdifferential of $g$ at $x_0$, that is,
$$
\partial_\epsilon g(x_0) = \{y \in Y : g(x) \ge g(x_0) + \langle x - x_0, y\rangle - \epsilon,\ \forall x \in X\},
$$
while $\partial g(x_0)$ stands for the usual (or exact) subdifferential of $g$ at $x_0$.
According to the subdifferential calculus of lower semi-continuous proper convex functions [36], we have
$$
y_0 \in \partial f(x_0) \iff x_0 \in \partial f^*(y_0) \iff \langle x_0, y_0\rangle = f(x_0) + f^*(y_0).
$$
On the basis of the definition of conjugate functions, we have
$$
\alpha = \inf\{f(x) = g(x) - h(x) : x \in X\}
= \inf\big\{g(x) - \sup\{\langle x, y\rangle - h^*(y) : y \in Y\} : x \in X\big\}
= \inf\{\beta(y) : y \in Y\},
$$
with
$$
\beta(y) = \inf\{g(x) - (\langle x, y\rangle - h^*(y)) : x \in X\}.
$$
Then, the following program is called the dual program of $(P_{dc})$:
$$
\alpha = \inf\{h^*(y) - g^*(y) : y \in Y\}. \qquad (D_{dc})
$$
We observe perfect symmetry between the primal and dual programs $(P_{dc})$ and $(D_{dc})$: the dual program of $(D_{dc})$ is exactly $(P_{dc})$.
Definition 1
([37]). A point $x^*$ is said to be a critical point of $g - h$ if
$$
\partial g(x^*) \cap \partial h(x^*) \neq \emptyset.
$$
Theorem 1
([38]). Let $P$ and $D$ denote the solution sets of problems $(P_{dc})$ and $(D_{dc})$, respectively. Then,
$(i)$ $x \in P$ if and only if $\partial_\epsilon h(x) \subset \partial_\epsilon g(x)$ for every $\epsilon > 0$.
$(ii)$ Dually, $y \in D$ if and only if $\partial_\epsilon g^*(y) \subset \partial_\epsilon h^*(y)$ for every $\epsilon > 0$.
$(iii)$ $\bigcup\{\partial h(x) : x \in P\} \subset D \subset \mathrm{dom}\, h^*$.
$(iv)$ $\bigcup\{\partial g^*(y) : y \in D\} \subset P \subset \mathrm{dom}\, g$.
Theorem 2
([38]). Let
$$
P_l = \{x^* \in X : \partial h(x^*) \subset \partial g(x^*)\},
$$
$$
D_l = \{y^* \in Y : \partial g^*(y^*) \subset \partial h^*(y^*)\}.
$$
Then,
$(i)$ If $x^*$ is a local minimizer of $g - h$, then $x^* \in P_l$.
$(ii)$ Let $x^*$ be a critical point of $g - h$ and $y^* \in \partial g(x^*) \cap \partial h(x^*)$. Let $U$ be a neighborhood of $x^*$ such that $U \cap \mathrm{dom}\, g \subset \mathrm{dom}\, h$. If for any $x \in U \cap \mathrm{dom}\, g$ there is $y \in \partial h(x)$ such that $h^*(y) - g^*(y) \ge h^*(y^*) - g^*(y^*)$, then $x^*$ is a local minimizer of $g - h$. More precisely,
$$
g(x) - h(x) \ge g(x^*) - h(x^*), \quad \forall x \in U \cap \mathrm{dom}\, g.
$$
The necessary local optimality condition for the (primal) DC program $(P_{dc})$ is
$$
\partial g(x^*) \cap \partial h(x^*) \neq \emptyset.
$$
According to [38], for each fixed $x^* \in X$, we solve the following optimization problem:
$$
\inf\{h^*(y) - g^*(y) : y \in \partial h(x^*)\}. \qquad (S(x^*))
$$
This is equivalent to
$$
\inf\{\langle x^*, y\rangle - g^*(y) : y \in \partial h(x^*)\}.
$$
In the same way, for each $y^* \in Y$, we define the problem
$$
\inf\{g(x) - h(x) : x \in \partial g^*(y^*)\}. \qquad (T(y^*))
$$
Similarly, it can be rewritten as
$$
\inf\{\langle x, y^*\rangle - h(x) : x \in \partial g^*(y^*)\}.
$$
Let $\mathcal{S}(x^*)$ and $\mathcal{T}(y^*)$ denote the solution sets of problems $(S(x^*))$ and $(T(y^*))$, respectively. Based on the above, we can state the DCA.
Given an initial point $x^0 \in \mathrm{dom}\, g$, we construct two sequences $\{x^k\}$ and $\{y^k\}$ defined by
$$
y^k \in \mathcal{S}(x^k); \qquad x^{k+1} \in \mathcal{T}(y^k).
$$
Namely, at the $k$-th iteration, we calculate
$$
y^k \in \partial h(x^k) = \arg\min\big\{h^*(y) - [g^*(y^{k-1}) + \langle x^k, y - y^{k-1}\rangle] : y \in Y\big\} \quad (\text{since } x^k \in \partial g^*(y^{k-1})),
$$
$$
x^{k+1} \in \partial g^*(y^k) = \arg\min\big\{g(x) - [h(x^k) + \langle y^k, x - x^k\rangle] : x \in X\big\} \quad (\text{since } y^k \in \partial h(x^k)).
$$
According to [36], the sequences $\{x^k\}$ and $\{y^k\}$ in the DCA are well defined if and only if
$$
\mathrm{dom}\,\partial g \subset \mathrm{dom}\,\partial h \quad \text{and} \quad \mathrm{dom}\,\partial h^* \subset \mathrm{dom}\,\partial g^*.
$$
The convergence properties of the DCA and its theoretical basis can be found in [37,39].
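To illustrate the simplified DCA scheme above on a toy problem (our own example, unrelated to the paper's model), the sketch below minimizes $f(x) = \frac{1}{2}\|x\|^2 - \|x\|_1$ with $g(x) = \frac{1}{2}\|x\|^2$ and $h(x) = \|x\|_1$: each iteration picks $y^k \in \partial h(x^k)$ and then minimizes the convex function $g(x) - \langle x, y^k\rangle$, which here has the closed-form solution $x^{k+1} = y^k$.

```python
import numpy as np

def dca_toy(x0, max_iter=100, tol=1e-8):
    """Simplified DCA for f(x) = 0.5*||x||^2 - ||x||_1  (g = 0.5*||.||^2, h = ||.||_1)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        y = np.sign(x)        # y^k is a subgradient of h at x^k (0 is a valid choice where x_i = 0)
        x_new = y             # argmin_x g(x) - <x, y> = y, since grad g(x) = x
        if np.linalg.norm(x_new - x) <= tol:
            return x_new
        x = x_new
    return x

print(dca_toy([0.3, -2.0, 0.0]))   # converges to a critical point of g - h, here [1., -1., 0.]
```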

3. SSQLSS$^3$VM by DCA

The SSQLSS$^3$VM can be rewritten as follows:
$$
\begin{aligned}
\min_{w,\xi,\eta,\gamma}\quad & \lambda\|w\|_0 + \frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + C_2\langle e, \min\{\eta,\gamma\}\rangle\\
\text{s.t.}\quad & D(Aw) + \xi \ge e,\quad Bw + \eta \ge e,\quad -Bw + \gamma \ge e,\quad \xi, \eta, \gamma \ge 0,
\end{aligned}
\qquad (9)
$$
where $A$ is the matrix whose rows are $g_{x_i}^T$ for the labeled data, $D$ is the diagonal matrix whose diagonal elements are the labels, $B$ is the matrix whose rows are $g_{x_i}^T$ for the unlabeled data, and $\xi, \eta, \gamma$ are the slack variables.
We can see that the constraint set is a polyhedral convex set, denoted by $K$. Therefore, (9) can be rewritten as follows:
$$
\begin{aligned}
\min_{w,\xi,\eta,\gamma}\quad & F(w,\xi,\eta,\gamma) = \lambda\|w\|_0 + \frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + C_2\langle e, \min\{\eta,\gamma\}\rangle\\
\text{s.t.}\quad & (w,\xi,\eta,\gamma) \in K.
\end{aligned}
\qquad (10)
$$
A polyhedral DC approximation of the $\ell_0$ norm is given in [40], and its practical effectiveness is verified in [41]. We now apply this approximation to problem (10).
For $x \in \mathbb{R}$, we define the function $\theta$ as follows:
$$
\theta(x) := \min\{1, \lambda_1|x|\} = 1 + \lambda_1|x| - \max\{1, \lambda_1|x|\},
$$
where $\lambda_1 > 0$ is a given parameter. Hence, $\|w\|_0$ can be approximated by $\|w\|_0 \approx \sum_{i=1}^{\frac{m^2+3m+2}{2}} \theta(w_i)$. Absorbing the regularization weight, we redefine $\lambda_1 := \lambda\lambda_1$. Therefore, $\lambda\theta$ is a DC function with the following DC decomposition: $\lambda\theta(x) = g(x) - h(x)$, where
$$
g(x) = \lambda + \lambda_1|x|, \qquad h(x) = \max\{\lambda, \lambda_1|x|\}.
$$
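A quick numerical sanity check of this capped-$\ell_1$ construction (our own illustration; the parameter values below are arbitrary):

```python
import numpy as np

lam, lam1 = 0.5, 4.0                      # example values of lambda and lambda_1 (hypothetical)
x = np.linspace(-2.0, 2.0, 2001)

theta = np.minimum(1.0, lam1 * np.abs(x))                       # capped-l1 surrogate theta(x)
dc    = 1.0 + lam1 * np.abs(x) - np.maximum(1.0, lam1 * np.abs(x))
print(np.allclose(theta, dc))                                   # min{1, t} = 1 + t - max{1, t}

lam1_new = lam * lam1                                           # absorb lambda: lambda_1 := lambda*lambda_1
g = lam + lam1_new * np.abs(x)
h = np.maximum(lam, lam1_new * np.abs(x))
print(np.allclose(lam * theta, g - h))                          # lambda*theta = g - h
```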
According to the above description, problem (10) can be approximated by
$$
\begin{aligned}
\min_{w,\xi,\eta,\gamma}\quad & F(w,\xi,\eta,\gamma) = \sum_{i=1}^{\frac{m^2+3m+2}{2}} \big(g(w_i) - h(w_i)\big) + \frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + C_2\langle e, \min\{\eta,\gamma\}\rangle\\
\text{s.t.}\quad & (w,\xi,\eta,\gamma) \in K.
\end{aligned}
$$
We note that $F(w,\xi,\eta,\gamma)$ is a DC function:
$$
F(w,\xi,\eta,\gamma) = G(w,\xi,\eta,\gamma) - H(w,\xi,\eta,\gamma),
$$
where
$$
G(w,\xi,\eta,\gamma) = \frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + \sum_{i=1}^{\frac{m^2+3m+2}{2}} g(w_i),
$$
and
$$
H(w,\xi,\eta,\gamma) = -C_2\langle e, \min\{\eta,\gamma\}\rangle + \sum_{i=1}^{\frac{m^2+3m+2}{2}} h(w_i).
\qquad (13)
$$
Obviously, $G$ and $H$ are convex functions. Therefore, the approximation of problem (10) has the following form:
$$
\min\{G(w,\xi,\eta,\gamma) - H(w,\xi,\eta,\gamma) : (w,\xi,\eta,\gamma) \in K\}.
\qquad (14)
$$
We can apply the DCA to problem (14) and obtain Algorithm 1 as follows.
Algorithm 1: DCA.
Step 0. Given an initial point $(w^0, \xi^0, \eta^0, \gamma^0)$, set $k \leftarrow 0$;
Step 1. Compute $(\bar{w}^k, \bar{\xi}^k, \bar{\eta}^k, \bar{\gamma}^k) \in \partial H(w^k, \xi^k, \eta^k, \gamma^k)$;
Step 2. Solve the convex program
$$
(w^{k+1}, \xi^{k+1}, \eta^{k+1}, \gamma^{k+1}) = \arg\min\Big\{\frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + \lambda_1\|w\|_1 - \big\langle(\bar{w}^k, \bar{\xi}^k, \bar{\eta}^k, \bar{\gamma}^k), (w, \xi, \eta, \gamma)\big\rangle\Big\}
$$
$$
\text{s.t.}\quad (w, \xi, \eta, \gamma) \in K;
$$
Step 3. If $\|(w^{k+1}, \xi^{k+1}, \eta^{k+1}, \gamma^{k+1}) - (w^k, \xi^k, \eta^k, \gamma^k)\| \le \epsilon$, then stop; otherwise set $k \leftarrow k+1$ and go to Step 1.
Since the computation of a subgradient of $H$ is easy, we can take $(\bar{w}^k, \bar{\xi}^k, \bar{\eta}^k, \bar{\gamma}^k) \in \partial H(w^k, \xi^k, \eta^k, \gamma^k)$ as follows (with $\bar{\xi}^k = 0$, since $H$ does not depend on $\xi$).
First, we compute $\bar{\eta}^k$ componentwise by
$$
\bar{\eta}_i^k = \begin{cases} 0 & \text{if } \eta_i^k \ge \gamma_i^k,\\ -C_2 & \text{if } \eta_i^k < \gamma_i^k.\end{cases}
$$
Then, $\bar{\gamma}^k$ can be obtained by
$$
\bar{\gamma}_i^k = \begin{cases} 0 & \text{if } \gamma_i^k \ge \eta_i^k,\\ -C_2 & \text{if } \gamma_i^k < \eta_i^k.\end{cases}
$$
Finally,
$$
\bar{w}_i^k = \begin{cases} 0 & \text{if } -\frac{1}{\lambda_1} \le w_i^k \le \frac{1}{\lambda_1},\\[2pt] \lambda & \text{if } w_i^k > \frac{1}{\lambda_1},\\[2pt] -\lambda & \text{if } w_i^k < -\frac{1}{\lambda_1}.\end{cases}
$$
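In an implementation, the subgradient step of Algorithm 1 therefore reduces to simple componentwise rules. A minimal Python sketch of the formulas above (our own illustration; variable names are ours, and the $\xi$-component is omitted because it is identically zero):

```python
import numpy as np

def subgrad_H(w, eta, gamma, C2, lam, lam1):
    """Componentwise choice of a subgradient of H at (w, xi, eta, gamma).

    Returns (w_bar, eta_bar, gamma_bar); the xi-part is identically zero
    because H does not depend on xi.
    """
    # subgradient of -C2 * <e, min(eta, gamma)>
    eta_bar = np.where(eta >= gamma, 0.0, -C2)
    gamma_bar = np.where(gamma >= eta, 0.0, -C2)
    # subgradient of the polyhedral part in w: zero on the flat piece around the origin
    w_bar = np.where(w > 1.0 / lam1, lam,
                     np.where(w < -1.0 / lam1, -lam, 0.0))
    return w_bar, eta_bar, gamma_bar

# purely illustrative call with random iterates
rng = np.random.default_rng(1)
w_bar, eta_bar, gamma_bar = subgrad_H(rng.standard_normal(5),
                                      rng.random(3), rng.random(3),
                                      C2=1.0, lam=0.5, lam1=4.0)
```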
Furthermore, we solve the subproblem in Step 2, that is,
$$
\begin{aligned}
\min\quad & \frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + \lambda_1\|w\|_1 - \big\langle(\bar{w}^k, \bar{\xi}^k, \bar{\eta}^k, \bar{\gamma}^k), (w, \xi, \eta, \gamma)\big\rangle\\
\text{s.t.}\quad & D(Aw) + \xi \ge e,\quad Bw + \eta \ge e,\quad -Bw + \gamma \ge e,\quad \xi, \eta, \gamma \ge 0.
\end{aligned}
\qquad (18)
$$
We introduce relaxation variables $r_1, r_2, r_3, z$ into (18) as follows:
$$
\begin{aligned}
\min\quad & \frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + \lambda_1\|z\|_1 - \big\langle(\bar{w}^k, \bar{\xi}^k, \bar{\eta}^k, \bar{\gamma}^k), (w, \xi, \eta, \gamma)\big\rangle\\
\text{s.t.}\quad & D(Aw) + \xi - r_1 = e,\quad Bw + \eta - r_2 = e,\quad -Bw + \gamma - r_3 = e,\quad w - z = 0,\\
& \xi, \eta, \gamma \ge 0,\quad r_1, r_2, r_3 \ge 0.
\end{aligned}
\qquad (19)
$$
The above optimization problem (19) can be written equivalently as the following convex program:
$$
\begin{aligned}
\min\quad & \frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + \lambda_1\|z\|_1 - \big\langle(\bar{w}^k, \bar{\xi}^k, \bar{\eta}^k, \bar{\gamma}^k), (w, \xi, \eta, \gamma)\big\rangle\\
& + \delta_+(\xi) + \delta_+(\eta) + \delta_+(\gamma) + \delta_+(r_1) + \delta_+(r_2) + \delta_+(r_3)\\
\text{s.t.}\quad & D(Aw) + \xi - r_1 = e,\quad Bw + \eta - r_2 = e,\quad -Bw + \gamma - r_3 = e,\quad w - z = 0,
\end{aligned}
\qquad (20)
$$
where
$$
\delta_+(u) = \begin{cases} 0 & \text{if } u \in \mathbb{R}^n_+,\\ +\infty & \text{otherwise.}\end{cases}
$$
In (20), there are three blocks of variables, and only the first block $w$ appears in a smooth term. The second block $\{\xi, \eta, \gamma\}$ and the third block $\{r_1, r_2, r_3, z\}$ involve nonsmooth terms, so we apply the sGS-ADMM to the above problem.
Let $\sigma > 0$ be given. The augmented Lagrangian function for (20) is defined by
$$
\begin{aligned}
L_\sigma(w, \xi, \eta, \gamma, r_1, r_2, r_3, z; s_1, s_2, s_3, s_4)
= {} & \frac{1}{2}w^TGw + C_1\langle e, \xi\rangle + \lambda_1\|z\|_1 - \big\langle(\bar{w}^k, \bar{\xi}^k, \bar{\eta}^k, \bar{\gamma}^k), (w, \xi, \eta, \gamma)\big\rangle\\
& + \delta_+(\xi) + \delta_+(\eta) + \delta_+(\gamma) + \delta_+(r_1) + \delta_+(r_2) + \delta_+(r_3)\\
& + \langle D(Aw) + \xi - r_1 - e, s_1\rangle + \langle Bw + \eta - r_2 - e, s_2\rangle\\
& + \langle -Bw + \gamma - r_3 - e, s_3\rangle + \langle w - z, s_4\rangle\\
& + \frac{\sigma}{2}\|D(Aw) + \xi - r_1 - e\|^2 + \frac{\sigma}{2}\|Bw + \eta - r_2 - e\|^2\\
& + \frac{\sigma}{2}\|{-Bw} + \gamma - r_3 - e\|^2 + \frac{\sigma}{2}\|w - z\|^2.
\end{aligned}
$$
We provide the framework of the sGS-ADMM for solving (20) in Algorithm 2 below; the computation of each step is detailed afterwards.
Algorithm 2: sGS-ADMM.
Let $\sigma > 0$ and $\tau \in (0, (1+\sqrt{5})/2)$ be given parameters. Choose $(w^0, \xi^0, \eta^0, \gamma^0, r_1^0, r_2^0, r_3^0, z^0)$ and $(s_1^0, s_2^0, s_3^0, s_4^0)$, and set $l \leftarrow 0$. Perform the $(l+1)$-th iteration as follows:
Step 1. Compute $w^{l+\frac12} = \arg\min L_\sigma(w, \xi^l, \eta^l, \gamma^l, r_1^l, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)$;
Step 2. Compute $\xi^{l+1} = \arg\min L_\sigma(w^{l+\frac12}, \xi, \eta^l, \gamma^l, r_1^l, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)$,
$\eta^{l+1} = \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta, \gamma^l, r_1^l, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)$,
$\gamma^{l+1} = \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma, r_1^l, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)$;
Step 3. Compute $w^{l+1} = \arg\min L_\sigma(w, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1^{l+1}, r_2^{l+1}, r_3^{l+1}, z^{l+1}; s_1^l, s_2^l, s_3^l, s_4^l)$;
Step 4. Compute $r_1^{l+1} = \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)$,
$r_2^{l+1} = \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1^{l+1}, r_2, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)$,
$r_3^{l+1} = \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1^{l+1}, r_2^{l+1}, r_3, z^l; s_1^l, s_2^l, s_3^l, s_4^l)$,
$z^{l+1} = \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1^{l+1}, r_2^{l+1}, r_3^{l+1}, z; s_1^l, s_2^l, s_3^l, s_4^l)$;
Step 5. Compute $s_1^{l+1} = s_1^l + \tau\sigma\big(D(Aw^{l+1}) + \xi^{l+1} - r_1^{l+1} - e\big)$,
$s_2^{l+1} = s_2^l + \tau\sigma\big(Bw^{l+1} + \eta^{l+1} - r_2^{l+1} - e\big)$,
$s_3^{l+1} = s_3^l + \tau\sigma\big({-Bw^{l+1}} + \gamma^{l+1} - r_3^{l+1} - e\big)$,
$s_4^{l+1} = s_4^l + \tau\sigma\big(w^{l+1} - z^{l+1}\big)$.
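Putting the two levels together, the overall method runs Algorithm 2 inside every iteration of Algorithm 1. A structural skeleton of this nesting (our own sketch; the two problem-specific routines are passed in as callables):

```python
import numpy as np

def dca_with_sgs_admm(x0, subgrad_H, solve_subproblem, max_outer=50, tol=1e-6):
    """Outer DCA loop (Algorithm 1); solve_subproblem runs the inner sGS-ADMM (Algorithm 2).

    x0 is the concatenated iterate (w, xi, eta, gamma) stored as one NumPy vector;
    subgrad_H(x) returns a subgradient of H at x in the same layout.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_outer):
        y_bar = subgrad_H(x)                    # Step 1 of Algorithm 1
        x_new = solve_subproblem(y_bar)         # Step 2: convex program solved by sGS-ADMM
        if np.linalg.norm(x_new - x) <= tol:    # Step 3: stopping test
            return x_new
        x = x_new
    return x
```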
It is obvious that every subproblem is convex and easy to compute. We now compute the subproblems of the sGS-ADMM one by one.
First, we compute the first block, which involves $w$:
$$
\begin{aligned}
w^{l+\frac12} = {} & \arg\min L_\sigma(w, \xi^l, \eta^l, \gamma^l, r_1^l, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{\frac{1}{2}w^TGw - \big\langle w,\ \bar{w}^k - A^*D^*s_1^l - B^*s_2^l + B^*s_3^l - s_4^l\big\rangle\\
& + \frac{\sigma}{2}\|D(Aw) + \xi^l - r_1^l - e\|^2 + \frac{\sigma}{2}\|Bw + \eta^l - r_2^l - e\|^2\\
& + \frac{\sigma}{2}\|{-Bw} + \gamma^l - r_3^l - e\|^2 + \frac{\sigma}{2}\|w - z^l\|^2\Big\}.
\end{aligned}
\qquad (23)
$$
The problem (23) is a convex and smooth optimization problem. Based on the optimality condition of convex programming, we obtain the following result:
$$
\begin{aligned}
w^{l+\frac12} = {} & \big[G + \sigma(A^*D^*DA + 2B^*B + I)\big]^{-1}\big[\bar{w}^k - A^*D^*s_1^l - B^*s_2^l + B^*s_3^l - s_4^l\\
& - \sigma A^*D^*(\xi^l - r_1^l - e) - \sigma B^*(\eta^l - r_2^l - e) + \sigma B^*(\gamma^l - r_3^l - e) + \sigma z^l\big].
\end{aligned}
$$
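Note that the coefficient matrix $G + \sigma(A^*D^*DA + 2B^*B + I)$ in this update is the same at every sGS-ADMM iteration, so an implementation can factor it once and reuse the factorization for both $w^{l+\frac12}$ and $w^{l+1}$. A possible NumPy/SciPy sketch (our own, not the paper's code; all names are ours):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def make_w_solver(G, A, D, B, sigma):
    """Pre-factor M = G + sigma*(A^T D^T D A + 2 B^T B + I) and return a reusable solver.

    G is positive semidefinite and the sigma-term is positive definite
    (cf. Lemma 1, where P^T P is positive definite), so M admits a Cholesky factorization.
    """
    DA = D @ A
    M = G + sigma * (DA.T @ DA + 2.0 * B.T @ B + np.eye(G.shape[0]))
    factor = cho_factor(M)
    return lambda rhs: cho_solve(factor, rhs)

# usage inside the loop: w_half = solve(rhs_l), where rhs_l collects the multiplier
# and residual terms of the update formula above
```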
Then, we compute the subproblems in Step 2, using the result $w^{l+\frac12}$ to solve the second block, whose three parts in the variables $\xi$, $\eta$, and $\gamma$ have the same structure:
$$
\begin{aligned}
\xi^{l+1} = {} & \arg\min L_\sigma(w^{l+\frac12}, \xi, \eta^l, \gamma^l, r_1^l, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{C_1\langle e, \xi\rangle + \delta_+(\xi) + \big\langle D(Aw^{l+\frac12}) + \xi - r_1^l - e,\ s_1^l\big\rangle + \frac{\sigma}{2}\big\|D(Aw^{l+\frac12}) + \xi - r_1^l - e\big\|^2\Big\}\\
= {} & \arg\min\Big\{\delta_+(\xi) + \frac{\sigma}{2}\Big\|\xi + DAw^{l+\frac12} - r_1^l - e + \frac{C_1e + s_1^l}{\sigma}\Big\|^2\Big\}\\
= {} & \Pi_+\Big(r_1^l + e - DAw^{l+\frac12} - \frac{C_1e + s_1^l}{\sigma}\Big).
\end{aligned}
$$
$$
\begin{aligned}
\eta^{l+1} = {} & \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta, \gamma^l, r_1^l, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{-\langle\bar{\eta}^k, \eta\rangle + \delta_+(\eta) + \big\langle Bw^{l+\frac12} + \eta - r_2^l - e,\ s_2^l\big\rangle + \frac{\sigma}{2}\big\|Bw^{l+\frac12} + \eta - r_2^l - e\big\|^2\Big\}\\
= {} & \arg\min\Big\{\delta_+(\eta) + \frac{\sigma}{2}\Big\|\eta + Bw^{l+\frac12} - r_2^l - e + \frac{s_2^l - \bar{\eta}^k}{\sigma}\Big\|^2\Big\}\\
= {} & \Pi_+\Big(r_2^l + e - Bw^{l+\frac12} - \frac{s_2^l - \bar{\eta}^k}{\sigma}\Big).
\end{aligned}
$$
$$
\begin{aligned}
\gamma^{l+1} = {} & \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma, r_1^l, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{-\langle\bar{\gamma}^k, \gamma\rangle + \delta_+(\gamma) + \big\langle {-Bw^{l+\frac12}} + \gamma - r_3^l - e,\ s_3^l\big\rangle + \frac{\sigma}{2}\big\|{-Bw^{l+\frac12}} + \gamma - r_3^l - e\big\|^2\Big\}\\
= {} & \arg\min\Big\{\delta_+(\gamma) + \frac{\sigma}{2}\Big\|\gamma - Bw^{l+\frac12} - r_3^l - e + \frac{s_3^l - \bar{\gamma}^k}{\sigma}\Big\|^2\Big\}\\
= {} & \Pi_+\Big(Bw^{l+\frac12} + r_3^l + e - \frac{s_3^l - \bar{\gamma}^k}{\sigma}\Big).
\end{aligned}
$$
Next, we update w again.
$$
\begin{aligned}
w^{l+1} = {} & \arg\min L_\sigma(w, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1^{l+1}, r_2^{l+1}, r_3^{l+1}, z^{l+1}; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{\frac{1}{2}w^TGw - \big\langle w,\ \bar{w}^k - A^*D^*s_1^l - B^*s_2^l + B^*s_3^l - s_4^l\big\rangle\\
& + \frac{\sigma}{2}\|D(Aw) + \xi^{l+1} - r_1^{l+1} - e\|^2 + \frac{\sigma}{2}\|Bw + \eta^{l+1} - r_2^{l+1} - e\|^2\\
& + \frac{\sigma}{2}\|{-Bw} + \gamma^{l+1} - r_3^{l+1} - e\|^2 + \frac{\sigma}{2}\|w - z^{l+1}\|^2\Big\}.
\end{aligned}
$$
Thus,
$$
\begin{aligned}
w^{l+1} = {} & \big[G + \sigma(A^*D^*DA + 2B^*B + I)\big]^{-1}\big[\bar{w}^k - A^*D^*s_1^l - B^*s_2^l + B^*s_3^l - s_4^l\\
& - \sigma A^*D^*(\xi^{l+1} - r_1^{l+1} - e) - \sigma B^*(\eta^{l+1} - r_2^{l+1} - e) + \sigma B^*(\gamma^{l+1} - r_3^{l+1} - e) + \sigma z^{l+1}\big].
\end{aligned}
$$
Finally, we compute the update of the relaxation variables r 1 , r 2 , r 3 , z .
$$
\begin{aligned}
r_1^{l+1} = {} & \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1, r_2^l, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{\delta_+(r_1) + \big\langle DAw^{l+\frac12} + \xi^{l+1} - r_1 - e,\ s_1^l\big\rangle + \frac{\sigma}{2}\big\|DAw^{l+\frac12} + \xi^{l+1} - r_1 - e\big\|^2\Big\}\\
= {} & \arg\min\Big\{\delta_+(r_1) + \frac{\sigma}{2}\Big\|DAw^{l+\frac12} + \xi^{l+1} - r_1 - e + \frac{s_1^l}{\sigma}\Big\|^2\Big\}\\
= {} & \Pi_+\Big(DAw^{l+\frac12} + \xi^{l+1} + \frac{s_1^l}{\sigma} - e\Big).
\end{aligned}
$$
$$
\begin{aligned}
r_2^{l+1} = {} & \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1^{l+1}, r_2, r_3^l, z^l; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{\delta_+(r_2) + \big\langle Bw^{l+\frac12} + \eta^{l+1} - r_2 - e,\ s_2^l\big\rangle + \frac{\sigma}{2}\big\|Bw^{l+\frac12} + \eta^{l+1} - r_2 - e\big\|^2\Big\}\\
= {} & \arg\min\Big\{\delta_+(r_2) + \frac{\sigma}{2}\Big\|Bw^{l+\frac12} + \eta^{l+1} - r_2 - e + \frac{s_2^l}{\sigma}\Big\|^2\Big\}\\
= {} & \Pi_+\Big(Bw^{l+\frac12} + \eta^{l+1} + \frac{s_2^l}{\sigma} - e\Big).
\end{aligned}
$$
$$
\begin{aligned}
r_3^{l+1} = {} & \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1^{l+1}, r_2^{l+1}, r_3, z^l; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{\delta_+(r_3) + \big\langle {-Bw^{l+\frac12}} + \gamma^{l+1} - r_3 - e,\ s_3^l\big\rangle + \frac{\sigma}{2}\big\|{-Bw^{l+\frac12}} + \gamma^{l+1} - r_3 - e\big\|^2\Big\}\\
= {} & \arg\min\Big\{\delta_+(r_3) + \frac{\sigma}{2}\Big\|{-Bw^{l+\frac12}} + \gamma^{l+1} - r_3 - e + \frac{s_3^l}{\sigma}\Big\|^2\Big\}\\
= {} & \Pi_+\Big({-Bw^{l+\frac12}} + \gamma^{l+1} + \frac{s_3^l}{\sigma} - e\Big).
\end{aligned}
$$
$$
\begin{aligned}
z^{l+1} = {} & \arg\min L_\sigma(w^{l+\frac12}, \xi^{l+1}, \eta^{l+1}, \gamma^{l+1}, r_1^{l+1}, r_2^{l+1}, r_3^{l+1}, z; s_1^l, s_2^l, s_3^l, s_4^l)\\
= {} & \arg\min\Big\{\lambda_1\|z\|_1 + \big\langle w^{l+\frac12} - z,\ s_4^l\big\rangle + \frac{\sigma}{2}\big\|w^{l+\frac12} - z\big\|^2\Big\}\\
= {} & \arg\min\Big\{\lambda_1\|z\|_1 + \frac{\sigma}{2}\Big\|w^{l+\frac12} - z + \frac{s_4^l}{\sigma}\Big\|^2\Big\}\\
= {} & \mathrm{Prox}_{\frac{\lambda_1}{\sigma}\|\cdot\|_1}\Big(w^{l+\frac12} + \frac{s_4^l}{\sigma}\Big).
\end{aligned}
$$
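All of the closed-form updates above rely on only two elementary operations: the projection $\Pi_+$ onto the nonnegative orthant and the proximal operator of the $\ell_1$ norm (componentwise soft thresholding). A minimal sketch of both (our own illustration; names are ours):

```python
import numpy as np

def proj_nonneg(v):
    """Pi_+(v): Euclidean projection onto the nonnegative orthant."""
    return np.maximum(v, 0.0)

def prox_l1(v, t):
    """prox_{t*||.||_1}(v): componentwise soft thresholding with threshold t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# e.g. the z-update reads: z_new = prox_l1(w_half + s4 / sigma, lam1 / sigma)
```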
Lemma 1.
Let
$$
P = \begin{pmatrix} DA\\ B\\ -B\\ I \end{pmatrix},\qquad
Q = \begin{pmatrix} I & 0 & 0\\ 0 & I & 0\\ 0 & 0 & I\\ 0 & 0 & 0 \end{pmatrix},\qquad
R = \begin{pmatrix} -I & 0 & 0 & 0\\ 0 & -I & 0 & 0\\ 0 & 0 & -I & 0\\ 0 & 0 & 0 & -I \end{pmatrix}
$$
denote the coefficient matrices of $w$, $(\xi, \eta, \gamma)$, and $(r_1, r_2, r_3, z)$ in the equality constraints of (20). Then, $P^TP$, $Q^TQ$, and $R^TR$ are positive definite.
Theorem 3
([42]). Suppose that the sequence $\{(w^l, \xi^l, \eta^l, \gamma^l)\}$ is generated by the sGS-ADMM. Then, it converges to a solution of (18).
The proof of Theorem 3 is similar to that in [42], and we omit it here. Before we give the convergence of the DCA, we first provide a useful lemma.
Lemma 2
([38]). Let $r$ be a proper lower semi-continuous convex function and let $\{x^k\}$ be a sequence such that
$(i)$ $x^k \to x^*$;
$(ii)$ there exists a bounded sequence $\{y^k\}$ with $y^k \in \partial r(x^k)$;
$(iii)$ $\partial r(x^*)$ is nonempty.
Then, $\lim_{k\to\infty} r(x^k) = r(x^*)$.
On the basis of the above lemma, we can obtain the following convergence theorem.
Theorem 4
(Convergence of the DCA). Suppose the sequence $\{(w^k, \xi^k, \eta^k, \gamma^k)\}$ is generated by the DCA. Then,
$(i)$ the sequence $\{F(w^k, \xi^k, \eta^k, \gamma^k)\}$ is monotonically decreasing;
$(ii)$ if $(w^*, \xi^*, \eta^*, \gamma^*)$ is an accumulation point of the sequence $\{(w^k, \xi^k, \eta^k, \gamma^k)\}$, then $(w^*, \xi^*, \eta^*, \gamma^*)$ is a critical point of $F$;
$(iii)$ if
$$
w_i^* \notin \Big\{-\frac{1}{\lambda_1},\ \frac{1}{\lambda_1}\Big\},\ \ i = 1, \dots, \frac{m^2+3m+2}{2},
\qquad \eta_i^* \neq \gamma_i^*,\ \ i = s+1, \dots, n,
$$
then $(w^*, \xi^*, \eta^*, \gamma^*)$ is a local minimizer of $F$.
Proof. 
Let $a^k = (w^k, \xi^k, \eta^k, \gamma^k)$ and $b^k = (\bar{w}^k, \bar{\xi}^k, \bar{\eta}^k, \bar{\gamma}^k)$.
$(i)$ Since $b^k \in \partial H(a^k)$, it follows that $H(a^{k+1}) \ge H(a^k) + \langle a^{k+1} - a^k, b^k\rangle$. We have
$$
(G - H)(a^{k+1}) \le G(a^{k+1}) - \langle a^{k+1} - a^k, b^k\rangle - H(a^k). \qquad (35)
$$
Likewise, $a^{k+1} \in \partial G^*(b^k)$ implies $G(a^k) \ge G(a^{k+1}) + \langle a^k - a^{k+1}, b^k\rangle$.
Therefore,
$$
G(a^{k+1}) - \langle a^{k+1} - a^k, b^k\rangle - H(a^k) \le (G - H)(a^k). \qquad (36)
$$
Finally, noting that $G(a^{k+1}) - \langle a^{k+1} - a^k, b^k\rangle - H(a^k) = (H^* - G^*)(b^k)$ (because $b^k \in \partial H(a^k)$ and $a^{k+1} \in \partial G^*(b^k)$), and combining (35) and (36), we get
$$
(G - H)(a^{k+1}) \le (H^* - G^*)(b^k) \le (G - H)(a^k).
$$
$(ii)$ We know that $H$ is a polyhedral convex function. Indeed,
$$
-C_2\langle e, \min\{\eta, \gamma\}\rangle = \max\big\{\langle -C_2(u, e-u), (\eta, \gamma)\rangle : u \in \{0,1\}^{n-s}\big\}, \qquad (38)
$$
$$
\sum_{i=1}^{\frac{m^2+3m+2}{2}} \max\{1, \lambda|w_i|\} = \max\Big\{\langle \mu, \lambda w\rangle + \langle e - |\mu|, e\rangle : \mu \in \{-1, 0, 1\}^{\frac{m^2+3m+2}{2}}\Big\}. \qquad (39)
$$
Combining (38) and (39), we have
$$
H = \max\Big\{\big\langle(\lambda\mu,\ 0,\ -C_2u,\ -C_2(e-u)),\ (w, \xi, \eta, \gamma)\big\rangle + \langle e - |\mu|, e\rangle : u \in \{0,1\}^{n-s},\ \mu \in \{-1, 0, 1\}^{\frac{m^2+3m+2}{2}}\Big\}.
$$
For simplicity, we write
$$
H = \max\{\langle\alpha_i, a\rangle - b_i : i \in I\},
$$
where $a = (w, \xi, \eta, \gamma)$ and $I = \{1, \dots, 2^{n-s} \times 3^{\frac{m^2+3m+2}{2}}\}$.
It is clear that the index set $I$ is finite. In this case, the sequence $\{b^k\}$ is discrete; that is, it takes only finitely many different values, at most $2^{n-s} \times 3^{\frac{m^2+3m+2}{2}}$.
Consider the problems
$$
\min\ G(a) - [\langle\alpha_i, a\rangle - b_i]. \qquad (P_i)
$$
Solving the subproblem in Step 2 at each iteration is equivalent to solving one of the problems $(P_i)$. Thus, the algorithm terminates after at most $2^{n-s} \times 3^{\frac{m^2+3m+2}{2}}$ steps.
Suppose $a^*$ is an accumulation point of the sequence $\{a^k\}$. For the sake of simplicity, we assume (extracting a subsequence if necessary) that
$$
\lim_{k\to\infty} a^k = a^*.
$$
According to the proof of Theorem 3, every sequence $\{a^k\}$ generated by the sGS-ADMM is bounded, so $\{a^k\}$ is bounded; consequently, the sequence $\{b^k\}$ generated from it is also bounded.
We can suppose (extracting a subsequence if necessary) that the sequence $\{b^k\}$ converges to a point $b^* \in \partial H(a^*)$, and according to $(i)$, it follows that
$$
\lim_{k\to\infty}\big\{G(a^k) + G^*(b^k) - \langle a^k, b^k\rangle\big\} = 0.
$$
Thus,
$$
\lim_{k\to\infty}\big\{G(a^k) + G^*(b^k)\big\} = \lim_{k\to\infty}\langle a^k, b^k\rangle = \langle a^*, b^*\rangle.
$$
Set $\theta(a, b) = G(a) + G^*(b)$. It is clear that $\theta$ is a proper lower semi-continuous convex function. By Lemma 2, we have
$$
\theta(a^*, b^*) \le \liminf_{k\to\infty}\theta(a^k, b^k) = \lim_{k\to\infty}\theta(a^k, b^k) = \langle a^*, b^*\rangle,
$$
that is, $G(a^*) + G^*(b^*) \le \langle a^*, b^*\rangle$; combined with the Fenchel–Young inequality, this gives $G(a^*) + G^*(b^*) = \langle a^*, b^*\rangle$. In other words, $b^* \in \partial G(a^*)$.
Thus, $b^* \in \partial G(a^*) \cap \partial H(a^*)$, and $a^*$ is a critical point of $F$.
$(iii)$ The formula (13), the second component of $F$, is a polyhedral convex function. According to the condition (41), $H$ is differentiable at $a^*$. We define $J(a^*) = \{i \in I : F(a^*) = G(a^*) - [\langle\alpha_i, a^*\rangle - b_i]\}$. There exists at least one $j \in J(a^*)$ such that the function $G(\cdot) - [\langle\alpha_j, \cdot\rangle - b_j]$ is convex and coincides with $F$ near $a^*$. Then, there is a neighborhood $U$ of $a^*$ on which $0 \in \partial F(a^*)$ implies $F(a) \ge F(a^*)$. Therefore, $a^*$ is a local minimizer of $F$. □

4. Numerical Experiments

In this section, we evaluate the proposed method. All codes are written in MATLAB, and all computations are performed on a Lenovo IdeaPad with Windows 10, an Intel(R) Core(TM) i5-6200U CPU @ 2.30 GHz, and 4 GB of memory.
In particular, we first apply the method to a simple simulated example. In this dataset, there are 100 data points in each class, 10 of which are labeled. Circles represent the positive class and triangles the negative class; solid markers represent labeled data and hollow markers represent unlabeled data. The distribution of the dataset is shown in Figure 1.
We draw the separating surface obtained by solving the SSQLSS$^3$VM for the simulated dataset in Figure 2. From Figure 2, we can conclude that our method performs well: only three negative-class points lie on the wrong side of the curved surface.
We next perform several numerical experiments on real datasets. We use 10 public datasets (Iris, Skin, Seeds, Pima, Glass, Heart, Hepatitis, Iono, Sonar, and BCI) obtained from the UCI Machine Learning Repository and the benchmark datasets. As shown in the details of the datasets in Table 1, the Iris and Seeds datasets contain multiple classes, and we select only two classes for the numerical experiments. For each dataset, we perform 100 trials, each time randomly selecting the labeled data points, while the rest are treated as unlabeled.
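Given the learned $w = (a, b, c)$, a point $x$ is classified by the sign of the quadratic decision function $f(x) = \frac{1}{2}x^TQx + b^Tx + c = g_x^Tw$, and the misclassification rate is the fraction of labels that disagree with this sign. A small sketch of this evaluation step (our own illustration; here $Q$ is assumed to have been recovered from the upper-triangular part $a$ of $w$):

```python
import numpy as np

def decision_values(X, Q, b, c):
    """f(x) = 0.5 * x^T Q x + b^T x + c, evaluated for each row x of X."""
    return 0.5 * np.einsum('ij,jk,ik->i', X, Q, X) + X @ b + c

def misclassification_rate(X, y, Q, b, c):
    """Fraction of points whose predicted sign disagrees with the true label y in {-1, +1}."""
    pred = np.sign(decision_values(X, Q, b, c))
    return np.mean(pred != y)
```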
To illustrate the effectiveness of our proposed model SSQLSS$^3$VM, as well as of the method, several state-of-the-art alternatives are selected for comparison, such as SSQSSVM, CTSVM, CutS$^3$VM, SVMlin, and LapSVM. First, the misclassification rates of the methods are reported in Table 2, with the best results shown in bold. We observe that the SSQLSS$^3$VM achieves the smallest misclassification rates on most of the datasets. In addition, due to the large size of the Skin dataset, SSQSSVM, CTSVM, CutS$^3$VM, and SVMlin run out of memory, which is a common phenomenon when solving relaxations of the S$^3$VM model; these results are reported as "-". Our method still obtains a low misclassification rate on this dataset, which indicates that it can handle large-scale datasets and again shows its efficiency. The misclassification rates on the Seeds, Glass, and Sonar datasets are only slightly higher than those of the P-ADMM method, while better results are achieved on the remaining datasets. For example, on the Heart dataset we achieve 24.40% versus 28.52%, 30.00%, 31.11%, 37.73%, and 33.33%, improvements of 4.12%, 5.60%, 6.71%, 13.33%, and 8.93%, respectively, which shows that our proposed method is effective.
The average CPU times over the 100 trials are reported in Table 3, with the best results shown in bold. We can observe that SVMlin and P-ADMM are much faster than our method. Since our method nests two algorithms, with the sGS-ADMM as the inner solver and the DCA as the outer loop, it takes longer than methods such as P-ADMM and SVMlin.
Overall, although our method is not the fastest, it has the best misclassification rate on most of the datasets, which shows the effectiveness of our model and algorithm.

5. Conclusions and Discussion

In this paper, we deal with a strong sparse quadratic kernel-free least squares semi-supervised support vector machine model obtained by adding an $\ell_0$-norm regularization term to the objective function. We use DC (difference of convex functions) programming and the DCA (DC algorithm) to solve it. Firstly, we approximate the $\ell_0$ norm by a polyhedral DC function. Secondly, when solving the subproblem, we use the sGS-ADMM due to the existence of the nonsmooth terms. Empirical numerical experiments show the efficiency of the proposed algorithm.
Since the $\ell_0$ norm is highly nonconvex and computationally NP-hard to handle, we use the difference-of-convex approach to overcome this difficulty. A next step is to treat the $\ell_0$-norm regularizer or the $\ell_0$-norm constraint directly when solving the SSQLSS$^3$VM. In addition, the optimality conditions of the model are not established here. These issues will be addressed in our future work.

Author Contributions

Data collection and analysis, J.S.; validation, J.S. and W.Q.; writing—original draft, J.S. and W.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from http://archive.ics.uci.edu/ml/index.php, accessed on 20 January 2022.

Acknowledgments

The authors would like to thank the Associate Editor and the anonymous referee for their helpful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yang, X.; Tan, L.; He, L. A robust least squares support vector machine for regression and classification with noise. Neurocomputing 2014, 140, 41–52. [Google Scholar] [CrossRef]
  2. Nan, S.; Sun, L.; Chen, B.; Lin, Z.; Toh, K.A. Density-dependent quantized least squares support vector machine for large data sets. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 94–106. [Google Scholar] [CrossRef]
  3. Melki, G.; Kecman, V.; Ventura, S.; Cano, A. OLLAWV: Online learning algorithm using worst-violators. Appl. Soft. Comput. 2018, 66, 384–393. [Google Scholar] [CrossRef]
  4. Sun, J.; Fujita, H.; Zheng, Y.; Ai, W. Multi-class financial distress prediction based on support vector machines integrated with the decomposition and fusion methods. Inform. Sci. 2021, 559, 153–170. [Google Scholar] [CrossRef]
  5. Forestier, G.; Wemmert, C. Semi-supervised learning using multiple clusterings with limited labeled data. Inform. Sci. 2016, 361, 48–65. [Google Scholar] [CrossRef] [Green Version]
  6. Tu, E.; Zhang, Y.; Zhu, L.; Yang, J.; Kasabov, N. A graph-based semi-supervised k nearest-neighbor method for nonlinear manifold distributed data classification. Inform. Sci. 2016, 367, 673–688. [Google Scholar] [CrossRef] [Green Version]
  7. Zhou, Z.H.; Li, M. Semisupervised regression with cotraining-style algorithms. IEEE Trans. Knowl. Data Eng. 2007, 19, 1479–1493. [Google Scholar] [CrossRef] [Green Version]
  8. Xu, S.; An, X.; Qiao, X.; Zhu, L.; Li, L. Semi-supervised least-squares support vector regression machines. J. Inf. Comput. Sci. 2011, 8, 885–892. [Google Scholar]
  9. Vapnik, V.; Sterin, A. On structural risk minimization or overall risk in a problem of pattern recognition. Autom. Rem. Contr. 1977, 10, 1495–1503. [Google Scholar]
  10. Bennett, K.; Demiriz, A. Semi-supervised support vector machines. Adv. Neural Inf. Process. Syst. 1999, 11, 368–374. [Google Scholar]
  11. Joachims, T. Transductive inference for text classification using support vector machines. In Proceedings of the Sixteenth International Conference on Machine Learning, Bled, Slovenia, 27–30 June 1999; pp. 200–209. [Google Scholar]
  12. Chapelle, O.; Sindhwani, V.; Keerthi, S.S. Branch and bound for semi-supervised support vector machines. Adv. Neural Inf. Process. Syst. 2007, 19, 217–224. [Google Scholar]
  13. Hoi, S.C.; Jin, R.; Zhu, J.; Lyu, M.R. Semi-supervised SVM batch mode active learning for image retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–7. [Google Scholar]
  14. Zhu, X.; Goldberg, A.B. Introduction to semi-supervised learning. Synth. Lect. Artif. Intell. Mach. Learn. 2009, 3, 1–130. [Google Scholar]
  15. Chapelle, O.; Zien, A. Semi-supervised classification by low density separation. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, Bridgetown, Barbados, 6–8 January 2005; pp. 57–64. [Google Scholar]
  16. Li, Y.F.; Kwok, J.T.; Zhou, Z.H. Semi-supervised learning using label Mean. In Proceedings of the 26th International Conference on Machine Learning, Montreal, QC, Canada, 14–18 June 2009; pp. 1–8. [Google Scholar]
  17. Liu, Y.; Xu, Z.; Li, C. Online semi-supervised support vector machine. Inform. Sci. 2018, 439, 125–141. [Google Scholar] [CrossRef]
  18. Cui, L.; Xia, Y. Semi-supervised sparse least squares support vector machine based on Mahalanobis distance. Appl. Intell. 2022. [Google Scholar] [CrossRef]
  19. Dagher, I. Quadratic kernel-free non-linear support vector machine. J. Glob. Optim. 2007, 41, 15–30. [Google Scholar]
  20. Yan, X.; Bai, Y.; Fang, S.C.; Luo, J. A kernel-free quadratic surface support vector machine for semi-supervised learning. J. Oper. Res. Soc. 2016, 67, 1001–1011. [Google Scholar] [CrossRef]
  21. Zhan, Y.; Bai, Y.; Zhang, W.; Ying, S. A P-ADMM for sparse quadratic kernel-free least squares semi-supervised support vector machine. Neurocomputing 2018, 306, 37–50. [Google Scholar] [CrossRef]
  22. Tian, Y.; Bian, B.; Tang, X.; Zhou, J. A new non-kernel quadratic surface approach for imbalanced data classification in online credit scoring. Inform. Sci. 2021, 563, 150–165. [Google Scholar] [CrossRef]
  23. Gao, Q.Q.; Bai, Y.Q.; Zhan, Y.R. Quadratic kernel-free least square twin support vector machine for binary classification problems. J. Oper. Res. Soc. China 2019, 7, 539–559. [Google Scholar] [CrossRef]
  24. Tibshirani, R. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. 1996, 46, 431–439. [Google Scholar] [CrossRef]
  25. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360. [Google Scholar] [CrossRef]
  26. Pan, L.; Xiu, N.; Fan, J. Optimality conditions for sparse nonlinear programming. Sci. China Math. 2017, 5, 5–22. [Google Scholar] [CrossRef]
  27. Gribonval, R.; Nielsen, M. Sparse representation in union of bases. IEEE Trans. Inf. Theory 2003, 49, 3320–3325. [Google Scholar] [CrossRef] [Green Version]
  28. Zou, H. The adaptive lasso and its oracle properties. J. Am. Stat. Ass. 2006, 101, 1418–1429. [Google Scholar] [CrossRef] [Green Version]
  29. Le Thi, H.A.; Le, H.M.; Nguyen, N.V.; Pham Dinh, T. A DC programming approach for feature selection in support vector machines learning. Adv. Data Anal. Classif. 2008, 2, 259–278. [Google Scholar] [CrossRef]
  30. Le Thi, H.A.; Le, H.M.; Pham Dinh, T. Feature selection in machine learning: An exact penalty approach using a difference of convex function algorithm. Mach. Learn. 2015, 101, 163–186. [Google Scholar] [CrossRef]
  31. Neumann, J.; Schnorr, C.; Steidl, G. Combined SVM-based feature selection and classification. Mach. Learn. 2005, 61, 129–150. [Google Scholar] [CrossRef] [Green Version]
  32. Collobert, R.; Sinz, F.; Weston, J.; Bottou, L. Large scale transductive SVMs. J. Mach. Learn. 2006, 7, 1687–1712. [Google Scholar]
  33. Le Thi, H.A.; Le, H.M.; Pham Dinh, T. Optimization based DC programming and DCA for hierarchical clustering. Eur. J. Oper. Res. 2007, 183, 1067–1085. [Google Scholar]
  34. Liu, Y.; Shen, X.; Doss, H. Multicategory learning and support vector machine: Computational tools. J. Comput. Graph. Stat. 2005, 14, 219–236. [Google Scholar] [CrossRef]
  35. Ronan, C.; Fabian, S.; Jason, W.; Le, B. Trading convexity for scalability. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 201–208. [Google Scholar]
  36. Rockafellar, R.T. Convex Analysis; Princeton University Press: Princeton, NJ, USA, 2015. [Google Scholar]
  37. Tao, P.D.; Souad, E.B. Duality in dc (difference of convex functions) optimization. Subgradient methods. Trends Math. Optim. 1988, 84, 277–293. [Google Scholar]
  38. Tao, P.D.; An, L.H. Convex analysis approach to d.c. programming: Theory, algorithm and applications. Acta Math. Vietnam. 1997, 22, 289–355. [Google Scholar]
  39. Le Thi, H.A.; Pham Dinh, T. The DC (difference of convex functions) programming and DCA revisited with DC models of real world nonconvex optimization problems. Ann. Oper. Res. 2005, 133, 23–46. [Google Scholar]
  40. Peleg, D.; Meir, R. A bilinear formulation for vector sparsity optimization. Signal Process. 2008, 88, 375–389. [Google Scholar] [CrossRef]
  41. Ong, C.S.; Le Thi, H.A. Learning sparse classifiers with difference of convex functions algorithms. Optim. Method. Softw. 2013, 28, 830–854. [Google Scholar] [CrossRef]
  42. Chen, L.; Sun, D.; Toh, K.C. An efficient inexact symmetric Gauss-Seidel based majorized ADMM for high-dimensional convex composite conic programming. Math. Program. 2017, 161, 1–34. [Google Scholar] [CrossRef] [Green Version]
Figure 1. The distribution of the simulated data.
Figure 2. Decision function for the simulated data.
Table 1. Details of the datasets.

Data Set      Features   Total Number   Labeled Number
Iris          4          100            10
Skin          4          245,057        24,505
Seeds         7          140            14
Pima          8          768            40
Glass         10         146            15
Heart         12         270            27
Hepatitis     19         80             8
Iono          34         351            20
Sonar         60         208            20
BCI           117        400            40
Table 2. Misclassification rates (%) for the datasets.

Data Set      P-ADMM   SSQSSVM   CTSVM   CutS3VM   SVMlin   DCA
2circles      2.50     4.50      5.00    20.00     36.50    2.30
Hyperbola     1.50     10.00     11.67   13.43     45.50    1.20
Iris          0.00     4.53      5.00    8.50      0.00     0.00
Skin          8.60     -         -       -         -        5.30
Seeds         5.00     7.86      6.43    8.33      7.14     5.20
Pima          29.81    -         31.64   31.24     60.87    27.70
Glass         0.00     2.05      2.05    10.81     3.64     0.60
Heart         28.52    30.00     31.11   37.73     33.33    24.40
Hepatitis     23.75    33.75     31.25   43.21     50.00    21.40
Iono          11.40    -         9.97    21.65     31.13    11.20
Sonar         23.56    -         25.96   36.54     46.7     23.78
BCI           42.00    -         35.25   53.10     64.17    41.60

Remark: - indicates that the computer ran out of memory.
Table 3. CPU time (s) for the datasets.

Data Set      P-ADMM    SSQSSVM   CTSVM   CutS3VM   SVMlin   DCA
2circles      2.41      137.46    5.27    7.05      0.15     3.47
Hyperbola     2.36      137.60    5.16    6.45      0.15     4.34
Iris          1.38      19.40     2.97    2.28      0.10     3.24
Skin          2907.80   -         -       -         -        3457.42
Seeds         2.31      92.59     4.13    3.00      0.12     3.45
Pima          11.89     -         51.55   26.90     2.08     15.67
Glass         3.02      209.49    2.29    0.97      0.11     5.34
Heart         5.19      1902.90   8.22    2.57      0.29     8.96
Hepatitis     3.59      548.57    1.01    1.14      0.11     6.73
Iono          36.33     -         11.12   6.35      0.50     50.24
Sonar         12.30     -         5.48    1.15      0.54     16.58
BCI           405.89    -         12.91   9.50      4.25     526.73

Remark: - indicates that the computer ran out of memory.