Article

A Novel Twin-Bounded Support Vector Machine with Smooth Generalized Pinball Loss

School of Mathematical Sciences and Geoinformatics, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
* Author to whom correspondence should be addressed.
Mathematics 2026, 14(3), 549; https://doi.org/10.3390/math14030549
Submission received: 30 December 2025 / Revised: 26 January 2026 / Accepted: 27 January 2026 / Published: 3 February 2026
(This article belongs to the Special Issue Advanced Studies in Mathematical Optimization and Machine Learning)

Abstract

We present a one-parameter family of smooth generalized pinball loss functions to overcome the non-differentiability, noise sensitivity, and resampling instability of traditional loss functions such as the hinge loss. These functions make the objective function of the support vector machine (SVM) model twice continuously differentiable. They also improve model performance by reducing noise sensitivity while preserving the sparsity of the solution. Using the same loss, we obtain a novel twin-bounded support vector machine (TBSVM) model with a smooth generalized pinball loss function. We compare the performance of this TBSVM against other contemporary approaches on UCI datasets, offering a comprehensive assessment of its strengths and limitations. The experimental results show that the proposed model achieves its best performance when the TBSVM is combined with the RBFSampler. Additionally, we prove that the novel smooth generalized pinball loss functions approximate the generalized pinball loss function with arbitrary precision in the uniform norm. We further show that the solutions of the proposed SVM and TBSVM models are unique. We also show that they converge to the solutions of the models with the non-smooth generalized pinball loss as the parameter approaches zero.

1. Introduction

The support vector machine (SVM), introduced by Vapnik and Cortes [1], is a machine learning technique founded on the Vapnik–Chervonenkis (VC) dimension and on structural risk minimization theory from statistical learning. It is a powerful and widely used supervised learning algorithm, employed primarily for classification and regression tasks such as text categorization [2] and scene classification [3]. In classification, an SVM maximizes the distance between a separating hyperplane and the nearest samples of each of the two classes, so that the training data are classified correctly with a high level of confidence. Through the introduction of “slack variables” and the “kernel trick”, SVMs can effectively handle high-dimensional data and complex, nonlinear decision boundaries.
The twin support vector machine (TWSVM), an extension of the traditional SVM, has attracted considerable attention for its ability to handle complex data distributions. Khemchandani and Chandra [4] proposed the TWSVM, which uses two non-parallel hyperplanes. Each hyperplane lies close to one of the two classes while keeping a minimum separation from the other class. Because it solves two smaller problems instead of one large one, the TWSVM reduces the computational cost to roughly a quarter of that of the standard SVM. Shao et al. [5] modified the TWSVM into the twin-bounded support vector machine (TBSVM), which implements structural risk minimization and leads to improved generalization performance.
Constructing the hyperplanes in any of these support vector machine models involves solving a constrained convex minimization problem. The most common technique is to formulate the dual optimization problem using Lagrange multipliers and solve it. More recently, methods that solve the primal problem directly have become popular. They use a “loss function” to reformulate the problem as an unconstrained optimization problem. Common loss functions include the hinge loss, the pinball loss [6], and the generalized pinball loss [7], listed in order of increasing complexity. Because these loss functions are not differentiable, efficient numerical methods such as gradient descent or Newton methods either cannot be applied or cannot be applied on a solid theoretical foundation. A theoretical analysis generally requires the objective functions to be of class $C^2$, that is, twice continuously differentiable.
In 2023, Makmuang, Ratiphaphongthon, and Wangkeeree [8] introduced a $C^1$-smooth approximation to the generalized pinball loss function within the standard SVM framework. Their results show that, on average, the method outperforms the baseline models. Similarly, Kai and Zhen [9] introduced a smooth approximation to the pinball loss function in the twin-bounded support vector machine model. Their aim was to mitigate the noise sensitivity and resampling instability associated with the hinge loss function.
A variety of further loss functions have been proposed in the literature:
- In [10], a piecewise-quadratic loss is introduced that smooths the pinball loss and belongs to the class of $C^1$ functions.
- In [11], a truncated $\epsilon$-insensitive pinball loss is investigated, which is only piecewise $C^1$.
- In [12], the pinball loss is modified by an S-curve to obtain a family of $C^1$ loss functions depending on three parameters.
- In [13], a quartic truncated pinball loss is introduced, which is $C^1$ and bounded, and hence non-convex.
- In [14], a rescaled huberized pinball loss is discussed, which again is $C^1$, bounded, and non-convex.
- Finally, recent studies [15] have explored hybrid data–physics loss formulations, which incorporate physical constraints or domain knowledge into the training process. Such approaches may offer improved interpretability, but they are tailored to specific applications and require knowledge of the physical setting.

Thus, there is a scarcity of $C^2$-smooth convex loss functions that approximate the pinball loss and can be applied to a wide variety of data.
The main contributions of this work can be summarized as follows:
- First, we propose a novel one-parameter family of $C^2$-smooth loss functions, which renders the objective function in the unconstrained formulation of the SVM $C^2$-smooth as well. We further show that these smooth functions approximate the non-smooth generalized pinball loss function with arbitrary precision in the uniform norm.
- Second, the proposed smooth loss is incorporated into the twin-bounded support vector machine framework. For this framework, we rigorously prove that the solutions are unique and that they converge to the solutions obtained with the non-smooth generalized pinball loss as the smoothing parameter approaches zero.
- Third, unlike existing smooth pinball or generalized pinball loss formulations, our work provides a complete theoretical foundation for both SVM and TBSVM models, linking smooth loss approximation, optimization stability, and solution behavior.
- Finally, numerical experiments on benchmark datasets demonstrate that the proposed approach achieves competitive accuracy with stable training behavior.
By comparing the TBSVM equipped with this novel loss function against other contemporary approaches, this study aims to provide a comprehensive perspective on the strengths and limitations of the methodology. It concentrates primarily on an extensive comparative analysis of how various loss functions and their generalizations affect model performance. This analysis is complemented by the proof that the proposed smooth loss functions approximate the generalized pinball loss with arbitrary precision in the uniform norm.
The remainder of this paper is organized as follows. Section 2 reviews background material on SVMs, TWSVMs, TBSVMs, and loss functions. Section 3 introduces the proposed smooth generalized pinball loss and presents theoretical analysis. Section 4 reports on numerical experiments, and Section 5 concludes the paper.

2. Background and Literature Review

This section introduces the fundamental mathematical concepts in machine learning as they pertain to the support vector machine models used in this work.

2.1. The Support Vector Machine

Let $X = \{(x_i, y_i) : i = 1, 2, \ldots, m\}$ be a training data set, where $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$. Here, $m$ denotes the number of data samples, $n$ the number of features, and $y_i$ the class to which the sample $x_i$ belongs. A linear support vector machine (SVM) finds the optimal hyperplane
$$f(x) = w^T x + b = 0,$$
which separates the data into two classes and has the largest distance from each of the two classes; this distance is called the margin. The vector $w$ is normal to the hyperplane and determines its orientation in $\mathbb{R}^n$, while the offset $b$ determines its distance from the origin. The decision function of an SVM, which determines the class of a data sample $x$, is obtained from the sign function (sgn) by
$$\operatorname{class}(x) = \operatorname{sgn}(w^T x + b).$$
In practice, the two data classes may not be separable by a single hyperplane. One therefore allows some training samples to be misclassified and finds the hyperplane by solving the unconstrained optimization problem
$$\min_{w, b}\ \frac{1}{2}\left(\|w\|^2 + b^2\right) + \frac{C}{m}\sum_{i=1}^{m} L_{\text{hinge}}\left(1 - y_i(w^T x_i + b)\right), \tag{2}$$
where $L_{\text{hinge}}$ is the hinge loss function, defined on the real line by
$$L_{\text{hinge}}(u) = \max\{0, u\} = \begin{cases} u, & u \ge 0,\\ 0, & u < 0.\end{cases}$$
The second term in (2) measures the amount of misclassification. The parameter $C > 0$ balances maximizing the margin against minimizing classification errors.
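To make objective (2) concrete, the following sketch evaluates it for given $w$ and $b$. It assumes NumPy and a data matrix with one sample per row. The function names and the two-point dataset are hypothetical illustrations, not part of the original method.

```python
import numpy as np


def hinge(u):
    """Hinge loss L_hinge(u) = max{0, u}, applied elementwise."""
    return np.maximum(0.0, u)


def svm_objective(w, b, X, y, C):
    """Primal objective 0.5*(||w||^2 + b^2) + (C/m) * sum_i L_hinge(1 - y_i (w^T x_i + b)).

    X has shape (m, n) with one sample per row; y has entries in {+1, -1}.
    """
    m = X.shape[0]
    margins = 1.0 - y * (X @ w + b)  # loss argument for every sample
    return 0.5 * (w @ w + b * b) + (C / m) * hinge(margins).sum()


# Two well-separated points (hypothetical data).
X = np.array([[2.0, 0.0], [-2.0, 0.0]])
y = np.array([1.0, -1.0])
print(svm_objective(np.array([1.0, 0.0]), 0.0, X, y, C=1.0))   # both margins satisfied
print(svm_objective(np.array([0.25, 0.0]), 0.0, X, y, C=1.0))  # both margins violated
```

With the short normal vector $w = (0.25, 0)$, both samples fall inside the margin. The loss term then outweighs the smaller regularization term.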

2.2. The Twin Support Vector Machine

Consider a binary classification dataset $X$ containing $m_1$ positive (class $+1$) and $m_2$ negative (class $-1$) samples, with $m_1 + m_2 = m$. In a twin support vector machine (TWSVM), two non-parallel hyperplanes are placed so that each lies close to one of the two classes and keeps a margin of at least one from the other. The two non-parallel classification hyperplanes take the form
$$f_1(x) = w_1^T x + b_1 = 0 \quad\text{and}\quad f_2(x) = w_2^T x + b_2 = 0,$$
and should separate the samples correctly. The TWSVM is therefore determined by the following two unconstrained optimization problems:
$$\min_{w_1, b_1}\ \frac{1}{2}\sum_{i=1}^{m_1}\left(w_1^T x_i^{(1)} + b_1\right)^2 + \frac{c_1}{m_2}\sum_{i=1}^{m_2} L_{\text{hinge}}\left(1 + (w_1^T x_i^{(2)} + b_1)\right),$$
and
$$\min_{w_2, b_2}\ \frac{1}{2}\sum_{i=1}^{m_2}\left(w_2^T x_i^{(2)} + b_2\right)^2 + \frac{c_2}{m_1}\sum_{i=1}^{m_1} L_{\text{hinge}}\left(1 - (w_2^T x_i^{(1)} + b_2)\right),$$
where $x_i^{(1)}$ ($i = 1, \ldots, m_1$) and $x_i^{(2)}$ ($i = 1, \ldots, m_2$) denote the data samples in the positive and negative class, respectively. The decision function of the TWSVM is
$$\operatorname{class}(x) = \operatorname{sgn}\left(\frac{w_1^T x + b_1}{\|w_1\|} + \frac{w_2^T x + b_2}{\|w_2\|}\right).$$
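The decision function above can be evaluated as in the following sketch, which assumes NumPy. Three choices are hypothetical: the function name, the tie-breaking rule (a score of exactly zero is assigned to class $+1$), and the example hyperplanes.

```python
import numpy as np


def twsvm_predict(X, w1, b1, w2, b2):
    """Evaluate sgn((w1^T x + b1)/||w1|| + (w2^T x + b2)/||w2||) for every row x of X.

    Ties (score exactly 0) are assigned to class +1; this tie-breaking rule is an assumption.
    """
    score = (X @ w1 + b1) / np.linalg.norm(w1) + (X @ w2 + b2) / np.linalg.norm(w2)
    return np.where(score >= 0, 1, -1)


# Hypothetical 1-D example: hyperplane 1 passes through the positive class (x = 0),
# hyperplane 2 through the negative class (x = -1).
w1, b1 = np.array([1.0]), 0.0
w2, b2 = np.array([1.0]), 1.0
X = np.array([[0.0], [-1.0], [0.4], [-0.9]])
print(twsvm_predict(X, w1, b1, w2, b2))
```

For a positive sample, $f_1 \approx 0$ and $f_2 \ge 1$. For a negative sample, $f_1 \le -1$ and $f_2 \approx 0$. The sign of the sum therefore separates the two cases.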

2.3. The Twin-Bounded Support Vector Machine

In 2011, Shao et al. [5] added a regularization term to the objective functions of the TWSVM. The result is a novel machine learning model called the twin-bounded support vector machine (TBSVM), formulated as
$$\min_{w_1, b_1}\ \frac{1}{2}\sum_{i=1}^{m_1}\left(w_1^T x_i^{(1)} + b_1\right)^2 + \frac{c_3}{2}\left(\|w_1\|^2 + b_1^2\right) + \frac{c_1}{m_2}\sum_{i=1}^{m_2} L_{\text{hinge}}\left(1 + (w_1^T x_i^{(2)} + b_1)\right), \tag{7}$$
and
$$\min_{w_2, b_2}\ \frac{1}{2}\sum_{i=1}^{m_2}\left(w_2^T x_i^{(2)} + b_2\right)^2 + \frac{c_4}{2}\left(\|w_2\|^2 + b_2^2\right) + \frac{c_2}{m_1}\sum_{i=1}^{m_1} L_{\text{hinge}}\left(1 - (w_2^T x_i^{(1)} + b_2)\right). \tag{8}$$
A survey of various twin support vector machine variants can be found in [16].

2.4. Loss Functions

The hinge loss $L_{\text{hinge}}(u)$ is sensitive to noise and unstable under resampling. To resolve this problem, Huang, Shi, and Suykens [6] replaced it in 2013 with the pinball loss in the SVM classifier. They demonstrated experimentally that this reduces noise sensitivity. The pinball loss function is defined as follows:
$$L_\tau(u) = \begin{cases} u, & u \ge 0,\\ -\tau u, & u < 0,\end{cases} \tag{9}$$
where $\tau$ is a non-negative real parameter. However, the pinball loss still exhibits some noise sensitivity, which motivated the $\epsilon$-insensitive pinball loss function, defined as
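$$L_\tau^\epsilon(u) = \begin{cases} u - \epsilon, & u > \epsilon,\\ 0, & -\frac{\epsilon}{\tau} \le u \le \epsilon,\\ -\tau\left(u + \frac{\epsilon}{\tau}\right), & u < -\frac{\epsilon}{\tau},\end{cases} \tag{10}$$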
where $\tau$ and $\epsilon$ are non-negative real values. Recently, Rastogi, Pal, and Chandra [7] generalized the $\epsilon$-insensitive pinball loss by introducing additional parameters. The resulting generalized pinball loss is defined by
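$$L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u) = \begin{cases} \tau_1\left(u - \frac{\epsilon_1}{\tau_1}\right), & u > \frac{\epsilon_1}{\tau_1},\\ 0, & -\frac{\epsilon_2}{\tau_2} \le u \le \frac{\epsilon_1}{\tau_1},\\ -\tau_2\left(u + \frac{\epsilon_2}{\tau_2}\right), & u < -\frac{\epsilon_2}{\tau_2},\end{cases} \tag{11}$$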
where $\tau_1, \tau_2, \epsilon_1, \epsilon_2$ are non-negative parameters.
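The following plain-Python sketch implements the generalized pinball loss, assuming $\tau_1, \tau_2 > 0$ so that the breakpoints are defined. It checks two special cases that follow from the definitions above:
- With $\tau_1 = 1$, $\tau_2 = \tau$, and $\epsilon_1 = \epsilon_2 = \epsilon$, it reduces to the $\epsilon$-insensitive pinball loss.
- If, in addition, $\epsilon = 0$, it reduces to the pinball loss.

The function names and parameter values are illustrative assumptions.

```python
def gen_pinball(u, tau1, tau2, eps1, eps2):
    """Generalized pinball loss L_{tau1,tau2}^{eps1,eps2}(u); requires tau1, tau2 > 0."""
    if u > eps1 / tau1:
        return tau1 * (u - eps1 / tau1)
    if u < -eps2 / tau2:
        return -tau2 * (u + eps2 / tau2)
    return 0.0  # insensitive zone [-eps2/tau2, eps1/tau1]


def pinball(u, tau):
    """Pinball loss L_tau(u)."""
    return u if u >= 0 else -tau * u


def eps_pinball(u, tau, eps):
    """epsilon-insensitive pinball loss L_tau^eps(u); requires tau > 0."""
    if u > eps:
        return u - eps
    if u < -eps / tau:
        return -tau * (u + eps / tau)
    return 0.0


grid = [k / 10 for k in range(-40, 41)]
tau, eps = 0.5, 0.3
# tau1 = 1, tau2 = tau, eps1 = eps2 = eps recovers the epsilon-insensitive pinball loss
assert all(gen_pinball(u, 1.0, tau, eps, eps) == eps_pinball(u, tau, eps) for u in grid)
# additionally setting eps = 0 recovers the pinball loss
assert all(gen_pinball(u, 1.0, tau, 0.0, 0.0) == pinball(u, tau) for u in grid)
print("both reductions hold on the grid")
```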
Makmuang, Ratiphaphongthon, and Wangkeeree [8] introduced a new $C^1$-smooth loss function that approximates the generalized pinball loss function for use in the SVM. It is given by
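$$Q_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu) = \begin{cases} \tau_1\left(u - \frac{\epsilon_1}{\tau_1}\right) - \frac{\tau_1^2}{2}\mu, & \frac{\epsilon_1}{\tau_1} + \tau_1\mu \le u,\\ \frac{1}{2\mu}\left(u - \frac{\epsilon_1}{\tau_1}\right)^2, & \frac{\epsilon_1}{\tau_1} \le u \le \frac{\epsilon_1}{\tau_1} + \tau_1\mu,\\ 0, & -\frac{\epsilon_2}{\tau_2} \le u \le \frac{\epsilon_1}{\tau_1},\\ \frac{1}{2\mu}\left(u + \frac{\epsilon_2}{\tau_2}\right)^2, & -\frac{\epsilon_2}{\tau_2} - \tau_2\mu \le u \le -\frac{\epsilon_2}{\tau_2},\\ -\tau_2\left(u + \frac{\epsilon_2}{\tau_2}\right) - \frac{\tau_2^2}{2}\mu, & u \le -\frac{\epsilon_2}{\tau_2} - \tau_2\mu,\end{cases} \tag{12}$$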
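where $\tau_1, \tau_2, \epsilon_1, \epsilon_2$ are non-negative real values and $u \in \mathbb{R}$. The parameter $\mu \in \mathbb{R}_+$ controls how closely $Q_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(\cdot,\mu)$ approximates $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(\cdot)$.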
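As a sketch, the piecewise definition of $Q$ can be coded directly; the function names and parameter values are hypothetical. The check below looks at the breakpoint $u = \frac{\epsilon_1}{\tau_1} + \tau_1\mu$. There, $Q$ and its derivative are continuous, but the second derivative jumps from $1/\mu$ to $0$, which is why $Q$ is only $C^1$.

```python
def q_loss(u, mu, tau1, tau2, eps1, eps2):
    """C^1 smoothing Q(u, mu) of the generalized pinball loss, piece by piece."""
    a, c = eps1 / tau1, -eps2 / tau2  # breakpoints of the generalized pinball loss
    if u >= a + tau1 * mu:
        return tau1 * (u - a) - tau1**2 * mu / 2
    if u >= a:
        return (u - a) ** 2 / (2 * mu)
    if u >= c:
        return 0.0
    if u >= c - tau2 * mu:
        return (u - c) ** 2 / (2 * mu)
    return -tau2 * (u - c) - tau2**2 * mu / 2


def q_prime(u, mu, tau1, tau2, eps1, eps2):
    """Derivative of Q with respect to u."""
    a, c = eps1 / tau1, -eps2 / tau2
    if u >= a + tau1 * mu:
        return tau1
    if u >= a:
        return (u - a) / mu
    if u >= c:
        return 0.0
    if u >= c - tau2 * mu:
        return (u - c) / mu
    return -tau2


params = dict(mu=0.2, tau1=1.5, tau2=0.5, eps1=0.3, eps2=0.1)  # hypothetical values
knot = params["eps1"] / params["tau1"] + params["tau1"] * params["mu"]
h = 1e-7
value_jump = abs(q_loss(knot - h, **params) - q_loss(knot + h, **params))
slope_jump = abs(q_prime(knot - h, **params) - q_prime(knot + h, **params))
# one-sided difference quotients of Q' on each side of the knot
left_curv = (q_prime(knot - h, **params) - q_prime(knot - 2 * h, **params)) / h
right_curv = (q_prime(knot + 2 * h, **params) - q_prime(knot + h, **params)) / h
print(value_jump, slope_jump, left_curv, right_curv)
```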
Alternatively, Kai and Zhen [9] proposed an infinitely differentiable approximation of the pinball loss function within the twin-bounded support vector machine model. Their aim was to mitigate the noise sensitivity and resampling instability associated with the hinge loss function. Their smooth approximation of the pinball loss function is defined as
$$\phi_\tau(u, \epsilon) = \frac{u + \sqrt{u^2 + 4\epsilon^2}}{2} + \frac{-\tau u + \sqrt{\tau^2 u^2 + 4\epsilon^2}}{2}, \tag{13}$$
where $\tau$ and $\epsilon$ are non-negative real parameters.
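A direct transcription of $\phi_\tau$ into Python is sketched below; the names and values are illustrative. Since $\sqrt{u^2 + 4\epsilon^2} - |u| \le 2\epsilon$, each of the two terms exceeds its non-smooth counterpart by at most $\epsilon$. Hence $\phi_\tau$ lies above the pinball loss by at most $2\epsilon$, with the largest gap at $u = 0$. The loop prints this gap for decreasing $\epsilon$.

```python
import math


def smooth_pinball(u, tau, eps):
    """Smooth pinball approximation phi_tau(u, eps), following the formula above (eps > 0)."""
    plus = (u + math.sqrt(u * u + 4 * eps * eps)) / 2                      # smooths max{0, u}
    minus = (-tau * u + math.sqrt(tau * tau * u * u + 4 * eps * eps)) / 2  # smooths max{0, -tau*u}
    return plus + minus


def pinball(u, tau):
    """Pinball loss L_tau(u)."""
    return u if u >= 0 else -tau * u


tau = 0.5
grid = [k / 100 for k in range(-300, 301)]
for eps in (0.1, 0.01, 0.001):
    gap = max(smooth_pinball(u, tau, eps) - pinball(u, tau) for u in grid)
    print(eps, gap)  # the largest gap occurs at u = 0
```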

3. Proposed Work

This section proposes a novel one-parameter family of $C^2$ loss functions and proves that they converge uniformly to the generalized pinball loss function as the parameter decreases to zero. It then develops and analyzes an SVM model and a TBSVM model that incorporate the new loss.

3.1. The Proposed Smooth Loss Function

To smooth the generalized pinball loss $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$, we define functions $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2} : \mathbb{R}\times\mathbb{R}_+ \to \mathbb{R}_+$ of the real variable $u$, parameterized by an approximation parameter $\mu > 0$, by
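$$P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu) = \begin{cases} \tau_1\left(u - \frac{\epsilon_1}{\tau_1}\right) - \frac{5}{8}\tau_1^2\mu, & \frac{\epsilon_1}{\tau_1} + \tau_1\mu < u,\\ \frac{5}{8\tau_1^2\mu^3}\left(u - \frac{\epsilon_1}{\tau_1}\right)^4 - \frac{1}{4\tau_1^4\mu^5}\left(u - \frac{\epsilon_1}{\tau_1}\right)^6, & \frac{\epsilon_1}{\tau_1} < u \le \frac{\epsilon_1}{\tau_1} + \tau_1\mu,\\ 0, & -\frac{\epsilon_2}{\tau_2} \le u \le \frac{\epsilon_1}{\tau_1},\\ \frac{5}{8\tau_2^2\mu^3}\left(u + \frac{\epsilon_2}{\tau_2}\right)^4 - \frac{1}{4\tau_2^4\mu^5}\left(u + \frac{\epsilon_2}{\tau_2}\right)^6, & -\frac{\epsilon_2}{\tau_2} - \tau_2\mu \le u < -\frac{\epsilon_2}{\tau_2},\\ -\tau_2\left(u + \frac{\epsilon_2}{\tau_2}\right) - \frac{5}{8}\tau_2^2\mu, & u < -\frac{\epsilon_2}{\tau_2} - \tau_2\mu,\end{cases} \tag{14}$$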
where $\tau_1, \tau_2 > 0$ and $\epsilon_1, \epsilon_2 \ge 0$. When $\mu = 0$, the two polynomial pieces have empty domains and the remaining pieces coincide with the generalized pinball loss $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$. Figure 1 displays the graph of $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ for various values of the parameter $\mu$, with $\tau_1 = \tau_2 = \epsilon_1 = \epsilon_2 = 1$ fixed.
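It is straightforward to verify that each $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(\cdot,\mu)$ is a $C^2$ function. At each of the four breakpoints, the values and the first and second derivatives of the adjacent pieces coincide. The functions $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ thus form a family of smoothing functions for $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$. Figure 1 suggests that $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ converges to $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$ as $\mu \to 0^+$, which we now prove.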
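The $C^2$ claim can also be checked numerically. Use the shifted variable $x = u - \frac{\epsilon_1}{\tau_1}$ on the right and $x = u + \frac{\epsilon_2}{\tau_2}$ on the left. In this variable, the polynomial pieces of (14) have the same form on both sides, and the outer linear pieces equal $\tau|x| - \frac{5}{8}\tau^2\mu$.

The plain-Python sketch below compares the value, first derivative, and second derivative of adjacent pieces at the breakpoints. It also confirms $P'' \ge 0$ on the polynomial pieces, which gives convexity. The helper names and test values are hypothetical.

```python
import math


def poly_piece(x, tau, mu):
    """Polynomial piece of P in the shifted variable x (|x| <= tau*mu): value, P', P''."""
    value = 5 * x**4 / (8 * tau**2 * mu**3) - x**6 / (4 * tau**4 * mu**5)
    first = 5 * x**3 / (2 * tau**2 * mu**3) - 3 * x**5 / (2 * tau**4 * mu**5)
    second = 15 * x**2 / (2 * tau**2 * mu**3) - 15 * x**4 / (2 * tau**4 * mu**5)
    return value, first, second


def outer_piece(x, tau, mu):
    """Linear piece of P in the shifted variable x (|x| >= tau*mu): value, P', P''."""
    return tau * abs(x) - 5 / 8 * tau**2 * mu, math.copysign(tau, x), 0.0


def matches(p, q):
    """True if two (value, P', P'') triples agree up to round-off."""
    return all(math.isclose(s, t, rel_tol=1e-9, abs_tol=1e-9) for s, t in zip(p, q))


for tau, mu in [(1.0, 0.5), (2.0, 0.1), (0.7, 1.3)]:
    # outer breakpoints: polynomial piece meets linear piece
    for knot in (tau * mu, -tau * mu):
        assert matches(poly_piece(knot, tau, mu), outer_piece(knot, tau, mu))
    # inner breakpoint x = 0: polynomial piece meets the flat zero piece
    assert poly_piece(0.0, tau, mu) == (0.0, 0.0, 0.0)
    # P'' >= 0 on the polynomial piece (up to round-off), hence P is convex
    assert all(poly_piece(tau * mu * k / 100, tau, mu)[2] >= -1e-12 for k in range(-100, 101))
print("value, P' and P'' match at all breakpoints for the tested parameters")
```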
In the support vector machine models described below, the parameters play the following roles:
- $\tau_1$ and $\tau_2$ control the asymmetric penalties assigned to positive and negative misclassification errors, respectively.
- $\epsilon_1$ and $\epsilon_2$ determine the width of the insensitive zone and thus the tolerance to noise.
- The smoothing parameter $\mu$ governs the trade-off between approximation accuracy and numerical smoothness. Smaller values of $\mu$ yield a closer approximation to the original generalized pinball loss, whereas larger values improve numerical stability during optimization.
In the following proofs, we write $L(u)$ and $P(u,\mu)$ in place of $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$ and $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$, respectively, since the parameters $\tau_1, \tau_2, \epsilon_1, \epsilon_2$ remain fixed.
The next theorem shows that our proposed smooth loss functions indeed approach the generalized pinball loss uniformly as the parameter $\mu$ tends to zero.
Theorem 1.
Let $L(u)$ and $P(u,\mu)$ be defined as in (11) and (14), respectively. Then
(i)
for all $u \in \mathbb{R}$, $0 \le L(u) - P(u,\mu) \le \frac{5}{8}\tau_0^2\mu$, where $\tau_0 = \max\{\tau_1, \tau_2\}$;
(ii)
$\lim_{\mu\to 0^+} P(u,\mu) = L(u)$ uniformly on $\mathbb{R}$.
Proof. 
(i) Let $\epsilon_1, \epsilon_2 \ge 0$ and $\tau_1, \tau_2 > 0$ be fixed, and let $\mu > 0$. In light of the definitions of $L(u)$ and $P(u,\mu)$, we divide the proof into five cases, depending on the value of the variable $u$.
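  • Case 1: $\frac{\epsilon_1}{\tau_1} + \tau_1\mu < u$. We obtain
    $$L(u) - P(u,\mu) = \tau_1\left(u - \frac{\epsilon_1}{\tau_1}\right) - \left[\tau_1\left(u - \frac{\epsilon_1}{\tau_1}\right) - \frac{5}{8}\tau_1^2\mu\right] = \frac{5}{8}\tau_1^2\mu \le \frac{5}{8}\tau_0^2\mu.$$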
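  • Case 2: $\frac{\epsilon_1}{\tau_1} < u \le \frac{\epsilon_1}{\tau_1} + \tau_1\mu$. We have
    $$L(u) - P(u,\mu) = \tau_1\left(u - \frac{\epsilon_1}{\tau_1}\right) - \frac{5}{8\tau_1^2\mu^3}\left(u - \frac{\epsilon_1}{\tau_1}\right)^4 + \frac{1}{4\tau_1^4\mu^5}\left(u - \frac{\epsilon_1}{\tau_1}\right)^6.$$
    Setting $I(u) = L(u) - P(u,\mu)$, we obtain
    $$I'(u) = \tau_1 - \frac{5}{2\tau_1^2\mu^3}\left(u - \frac{\epsilon_1}{\tau_1}\right)^3 + \frac{3}{2\tau_1^4\mu^5}\left(u - \frac{\epsilon_1}{\tau_1}\right)^5.$$
    Let $x = u - \frac{\epsilon_1}{\tau_1}$. Then $x$ lies in the interval $(0, \tau_1\mu]$, and $I(x) = \tau_1 x - \frac{5}{8\tau_1^2\mu^3}x^4 + \frac{1}{4\tau_1^4\mu^5}x^6$. Hence $I'(x) = \tau_1 - \frac{5}{2\tau_1^2\mu^3}x^3 + \frac{3}{2\tau_1^4\mu^5}x^5$ and $I''(x) = -\frac{15}{2\tau_1^2\mu^3}x^2\left(1 - \frac{x^2}{\tau_1^2\mu^2}\right)$.

    The critical points of $I'(x)$ are $x = 0$, $x = -\tau_1\mu$, and $x = \tau_1\mu$, so $I'(x)$ is monotone on $[0, \tau_1\mu]$. As $I'(0) = \tau_1 > 0$ and $I'(\tau_1\mu) = 0$, we have $I'(x) \ge 0$ on $[0, \tau_1\mu]$. Hence $I(u)$ is increasing on $\frac{\epsilon_1}{\tau_1} < u \le \frac{\epsilon_1}{\tau_1} + \tau_1\mu$. Since $I(0) = 0$, in particular,
    $$0 \le L(u) - P(u,\mu) \le I(u)\big|_{u = \frac{\epsilon_1}{\tau_1} + \tau_1\mu} = \frac{5}{8}\tau_1^2\mu \le \frac{5}{8}\tau_0^2\mu.$$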
  • Case 3: $-\frac{\epsilon_2}{\tau_2} \le u \le \frac{\epsilon_1}{\tau_1}$. We obtain
    $$L(u) - P(u,\mu) = 0.$$
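  • Case 4: $-\frac{\epsilon_2}{\tau_2} - \tau_2\mu \le u < -\frac{\epsilon_2}{\tau_2}$. Proceeding as in Case 2, we find that $I(u)$ is decreasing on $-\frac{\epsilon_2}{\tau_2} - \tau_2\mu \le u \le -\frac{\epsilon_2}{\tau_2}$, with $I\left(-\frac{\epsilon_2}{\tau_2}\right) = 0$. Then
    $$0 \le L(u) - P(u,\mu) \le I\left(-\frac{\epsilon_2}{\tau_2} - \tau_2\mu\right) = \frac{5}{8}\tau_2^2\mu \le \frac{5}{8}\tau_0^2\mu.$$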
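  • Case 5: $u < -\frac{\epsilon_2}{\tau_2} - \tau_2\mu$. Proceeding as in Case 1, we obtain
    $$L(u) - P(u,\mu) = -\tau_2\left(u + \frac{\epsilon_2}{\tau_2}\right) - \left[-\tau_2\left(u + \frac{\epsilon_2}{\tau_2}\right) - \frac{5}{8}\tau_2^2\mu\right] = \frac{5}{8}\tau_2^2\mu \le \frac{5}{8}\tau_0^2\mu.$$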
    All five cases together show that
    $$0 \le \sup_{u\in\mathbb{R}}\left[L(u) - P(u,\mu)\right] \le \frac{5}{8}\tau_0^2\mu. \tag{15}$$
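(ii) The bound in (15) is independent of $u$ and tends to zero as $\mu \to 0^+$. By the Squeeze Theorem, $\sup_{u\in\mathbb{R}}|L(u) - P(u,\mu)| \to 0$, that is,
$$\lim_{\mu\to 0^+} P(u,\mu) = L(u)$$
uniformly on $\mathbb{R}$. □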
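The bound of Theorem 1 can be illustrated numerically. The following plain-Python sketch evaluates $L(u) - P(u,\mu)$ on a grid for several values of $\mu$; the parameter values are hypothetical. It uses the form $L(u) = \max\{\tau_1 u - \epsilon_1, 0\} + \max\{-\tau_2 u - \epsilon_2, 0\}$, which is equivalent to (11) when $\tau_1, \tau_2 > 0$. The smallest gap is $0$. The largest equals $\frac{5}{8}\tau_0^2\mu$ and is attained on the outer linear piece with the larger slope.

```python
def gen_pinball(u, tau1, tau2, eps1, eps2):
    """Generalized pinball loss L(u) (tau1, tau2 > 0)."""
    return max(tau1 * u - eps1, 0.0) + max(-tau2 * u - eps2, 0.0)


def p_loss(u, mu, tau1, tau2, eps1, eps2):
    """Proposed C^2 smooth loss P(u, mu)."""
    a, c = eps1 / tau1, -eps2 / tau2
    if u > a + tau1 * mu:
        return tau1 * (u - a) - 5 / 8 * tau1**2 * mu
    if u > a:
        x = u - a
        return 5 * x**4 / (8 * tau1**2 * mu**3) - x**6 / (4 * tau1**4 * mu**5)
    if u >= c:
        return 0.0
    if u >= c - tau2 * mu:
        x = u - c
        return 5 * x**4 / (8 * tau2**2 * mu**3) - x**6 / (4 * tau2**4 * mu**5)
    return -tau2 * (u - c) - 5 / 8 * tau2**2 * mu


tau1, tau2, eps1, eps2 = 1.0, 2.0, 0.5, 1.0  # hypothetical values
grid = [-3 + 6 * k / 6000 for k in range(6001)]
for mu in (0.4, 0.1, 0.01):
    gaps = [gen_pinball(u, tau1, tau2, eps1, eps2) - p_loss(u, mu, tau1, tau2, eps1, eps2)
            for u in grid]
    bound = 5 / 8 * max(tau1, tau2) ** 2 * mu
    print(f"mu={mu}: min gap {min(gaps):.3g}, max gap {max(gaps):.6f}, bound {bound:.6f}")
```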

3.2. The Support Vector Machine with Smooth Loss Function

Consider a dataset $X = \{(x_i, y_i) \mid i = 1, 2, \ldots, m\}$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{+1, -1\}$. We replace the hinge loss $L_{\text{hinge}}(u)$ with the proposed smooth loss $P(u,\mu)$, where $\mu > 0$ is the smoothing parameter. The SVM model (2) then becomes the optimization problem
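$$\min_{\omega}\ \varphi_\mu(\omega) := \frac{1}{2}\|\omega\|^2 + \frac{C}{m}\sum_{i=1}^{m} P\left(1 - y_i\omega^T\bar{x}_i, \mu\right), \tag{16}$$
where $\omega = [w^T, b]^T \in \mathbb{R}^{n+1}$ and $\bar{x}_i = [x_i^T, 1]^T$.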
In the following, we show that each optimization problem in the family defined by (16) admits a unique solution. We also show that these solutions converge to the solution of the exact problem
$$\min_{\omega}\ \psi_0(\omega) := \frac{1}{2}\|\omega\|^2 + \frac{C}{m}\sum_{i=1}^{m} L\left(1 - y_i\omega^T\bar{x}_i\right) \tag{17}$$
as $\mu \to 0^+$. The next theorem shows that the objective function of (16) converges uniformly to the objective function of (17) as $\mu \to 0^+$.
Theorem 2.
Let $\tau_0 = \max\{\tau_1, \tau_2\}$. Then, for all $\omega \in \mathbb{R}^{n+1}$ and $\mu > 0$,
$$0 \le \psi_0(\omega) - \varphi_\mu(\omega) \le \frac{5}{8}C\tau_0^2\mu.$$
Proof. 
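For all $\omega \in \mathbb{R}^{n+1}$, since $L(u) - P(u,\mu) \ge 0$,
$$\begin{aligned} 0 \le \psi_0(\omega) - \varphi_\mu(\omega) &= \frac{C}{m}\sum_{i=1}^{m} L\left(1 - y_i\omega^T\bar{x}_i\right) - \frac{C}{m}\sum_{i=1}^{m} P\left(1 - y_i\omega^T\bar{x}_i, \mu\right)\\ &= \frac{C}{m}\sum_{i=1}^{m}\left[L(u_i) - P(u_i,\mu)\right] \qquad [\text{where } u_i = 1 - y_i\omega^T\bar{x}_i]\\ &\le \frac{C}{m}\sum_{i=1}^{m}\frac{5}{8}\tau_0^2\mu \qquad [\text{by (15) in Theorem 1}]\\ &= \frac{5}{8}C\tau_0^2\mu. \end{aligned}$$
It follows that $\|\psi_0 - \varphi_\mu\|_\infty \le \frac{5}{8}C\tau_0^2\mu$. □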
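Theorem 2 can likewise be checked on random data. The sketch below assumes NumPy; the synthetic data, $C$, $\mu$, and loss parameters are hypothetical. It evaluates $\psi_0$ and $\varphi_\mu$ at random points $\omega$ and prints the gap next to the bound $\frac{5}{8}C\tau_0^2\mu$.

```python
import numpy as np


def gen_pinball(u, tau1, tau2, eps1, eps2):
    """Generalized pinball loss, vectorized: max(tau1*u - eps1, 0) + max(-tau2*u - eps2, 0)."""
    return np.maximum(tau1 * u - eps1, 0.0) + np.maximum(-tau2 * u - eps2, 0.0)


def p_loss(u, mu, tau1, tau2, eps1, eps2):
    """Proposed smooth loss P(u, mu), vectorized with np.select over its five pieces."""
    u = np.asarray(u, dtype=float)
    a, c = eps1 / tau1, -eps2 / tau2
    xr, xl = u - a, u - c
    poly_r = 5 * xr**4 / (8 * tau1**2 * mu**3) - xr**6 / (4 * tau1**4 * mu**5)
    poly_l = 5 * xl**4 / (8 * tau2**2 * mu**3) - xl**6 / (4 * tau2**4 * mu**5)
    return np.select(
        [u > a + tau1 * mu, u > a, u >= c, u >= c - tau2 * mu],
        [tau1 * xr - 5 / 8 * tau1**2 * mu, poly_r, np.zeros_like(u), poly_l],
        default=-tau2 * xl - 5 / 8 * tau2**2 * mu,
    )


def objectives(omega, Xa, y, C, mu, loss_params):
    """Return (psi_0(omega), varphi_mu(omega)); rows of Xa are augmented samples [x_i^T, 1]."""
    u = 1.0 - y * (Xa @ omega)
    reg = 0.5 * omega @ omega
    m = Xa.shape[0]
    psi0 = reg + C / m * gen_pinball(u, *loss_params).sum()
    phi = reg + C / m * p_loss(u, mu, *loss_params).sum()
    return psi0, phi


rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                           # synthetic samples
Xa = np.hstack([X, np.ones((50, 1))])                  # augmented vectors [x_i^T, 1]
y = np.where(X[:, 0] + 0.3 * rng.normal(size=50) > 0, 1.0, -1.0)
C, mu = 2.0, 0.05
loss_params = (1.0, 0.5, 0.2, 0.1)                     # (tau1, tau2, eps1, eps2)
bound = 5 / 8 * C * max(loss_params[:2]) ** 2 * mu
for _ in range(5):
    psi0, phi = objectives(rng.normal(size=4), Xa, y, C, mu, loss_params)
    print(f"gap {psi0 - phi:.6f} <= bound {bound:.6f}")
```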
Finally, we show that the solution of (16) converges to the solution of (17) as $\mu \to 0^+$.
Theorem 3.
Let $\varphi_\mu(\omega)$ and $\psi_0(\omega)$ be defined as in (16) and (17), respectively, and let $\omega^*$ be an optimal solution of problem (17). Then:
(i)
there exists a unique solution $\omega_\mu^*$ of problem (16);
(ii)
$\|\omega_\mu^* - \omega^*\|_2^2 \le \frac{5}{8}C\tau_0^2\mu$;
(iii)
$\omega_\mu^* \to \omega^*$ as $\mu \to 0^+$.
Proof. 
Let $\epsilon_1, \epsilon_2 \ge 0$ and $\tau_1, \tau_2 > 0$ be fixed.
(i) Let $\mu > 0$ be arbitrary but fixed. For $c \in \mathbb{R}$, the set
$$S_c(\varphi_\mu) := \{\omega \in \mathbb{R}^{n+1} : \varphi_\mu(\omega) \le c\}$$
is called a sublevel set. Clearly, we can pick $c > 0$ so that $S_c(\varphi_\mu) \neq \emptyset$. Since $\varphi_\mu$ is continuous, the sublevel set $S_c(\varphi_\mu)$ is closed. Since $P \ge 0$, the definition of $\varphi_\mu$ gives
$$S_c(\varphi_\mu) \subseteq \{\omega \in \mathbb{R}^{n+1} : \omega^T\omega = \|\omega\|_2^2 \le 2c\}.$$
Since the latter set is bounded, $S_c(\varphi_\mu)$ is bounded; being closed and bounded, it is compact in $\mathbb{R}^{n+1}$. By the Extreme Value Theorem, $\varphi_\mu$ attains its minimum over $S_c(\varphi_\mu)$ at some $\omega_\mu^*$. Since $\varphi_\mu > c$ outside $S_c(\varphi_\mu)$, $\omega_\mu^*$ is a global minimizer; that is, problem (16) has a solution.

Next, $P(\cdot,\mu)$ is convex, being $C^2$ with a non-negative second derivative. Moreover, $u_i = 1 - y_i\omega^T\bar{x}_i$ is affine in $\omega$. Therefore, $\frac{C}{m}\sum_{i=1}^{m} P(1 - y_i\omega^T\bar{x}_i, \mu)$ is convex. As $\frac{1}{2}\|\omega\|^2$ is strongly convex with parameter 1, $\varphi_\mu$ is also strongly convex, and strong convexity yields the uniqueness of the solution of problem (16).
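(ii) Let $\omega_\mu^*$ and $\omega^*$ be optimal solutions of (16) and (17), respectively.
Let $\nabla\varphi_\mu(\omega)$ denote the gradient of $\varphi_\mu$ at $\omega$, and let $\xi_0(\omega)$ denote a subgradient of $\psi_0$ at $\omega$; that is,
$$\xi_0(\omega) \in \partial\psi_0(\omega) = \omega - \frac{C}{m}\sum_{i=1}^{m} y_i\bar{x}_i\,\partial L(u_i), \qquad u_i = 1 - y_i\omega^T\bar{x}_i,$$
where
$$\partial L(u) = \begin{cases} \{\tau_1\}, & \frac{\epsilon_1}{\tau_1} < u,\\ [0, \tau_1], & u = \frac{\epsilon_1}{\tau_1},\\ \{0\}, & -\frac{\epsilon_2}{\tau_2} < u < \frac{\epsilon_1}{\tau_1},\\ [-\tau_2, 0], & u = -\frac{\epsilon_2}{\tau_2},\\ \{-\tau_2\}, & u < -\frac{\epsilon_2}{\tau_2}.\end{cases}$$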
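By strong convexity with parameter 1, we obtain, for any subgradient $\xi_0(\omega^*) \in \partial\psi_0(\omega^*)$,
$$\psi_0(\omega_\mu^*) - \psi_0(\omega^*) \ge (\omega_\mu^* - \omega^*)^T\xi_0(\omega^*) + \frac{1}{2}\|\omega_\mu^* - \omega^*\|_2^2,$$
and
$$\varphi_\mu(\omega^*) - \varphi_\mu(\omega_\mu^*) \ge (\omega^* - \omega_\mu^*)^T\nabla\varphi_\mu(\omega_\mu^*) + \frac{1}{2}\|\omega_\mu^* - \omega^*\|_2^2.$$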
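Since $\omega^*$ minimizes the convex function $\psi_0$, we have $0 \in \partial\psi_0(\omega^*)$, so the subgradient at $\omega^*$ in the first inequality may be chosen to be zero. Likewise, $\nabla\varphi_\mu(\omega_\mu^*) = 0$. Thus, we obtain
$$\psi_0(\omega_\mu^*) - \psi_0(\omega^*) \ge \frac{1}{2}\|\omega_\mu^* - \omega^*\|_2^2,$$
and
$$\varphi_\mu(\omega^*) - \varphi_\mu(\omega_\mu^*) \ge \frac{1}{2}\|\omega_\mu^* - \omega^*\|_2^2.$$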
By Theorem 1, we have
$$\psi_0(\omega) - \varphi_\mu(\omega) = \frac{C}{m}\sum_{i=1}^{m} L(u_i) - \frac{C}{m}\sum_{i=1}^{m} P(u_i,\mu) \ge 0 \quad \text{for all } \omega.$$
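Adding the two inequalities above, we obtain
$$\begin{aligned} \|\omega_\mu^* - \omega^*\|_2^2 &\le \psi_0(\omega_\mu^*) - \psi_0(\omega^*) + \varphi_\mu(\omega^*) - \varphi_\mu(\omega_\mu^*)\\ &= \left[\psi_0(\omega_\mu^*) - \varphi_\mu(\omega_\mu^*)\right] - \left[\psi_0(\omega^*) - \varphi_\mu(\omega^*)\right]\\ &\le \psi_0(\omega_\mu^*) - \varphi_\mu(\omega_\mu^*), \end{aligned}$$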
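and, by Theorem 2,
$$0 \le \|\omega_\mu^* - \omega^*\|_2^2 \le \psi_0(\omega_\mu^*) - \varphi_\mu(\omega_\mu^*) = \left|\psi_0(\omega_\mu^*) - \varphi_\mu(\omega_\mu^*)\right| \le \|\psi_0 - \varphi_\mu\|_\infty \le \frac{5}{8}C\tau_0^2\mu.$$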
(iii) Letting $\mu \to 0^+$ in (ii), the Squeeze Theorem yields $\omega_\mu^* \to \omega^*$. □

3.3. The Twin-Bounded Support Vector Machine with Smooth Loss Function

Combining the TBSVM with the new smooth generalized pinball loss function $P(u,\mu)$ is intended to improve the algorithm's performance on complex or noisy data, with which traditional loss functions such as the hinge loss often struggle. We therefore introduce a novel TBSVM with smooth generalized pinball loss to better handle noise sensitivity and imbalanced data. Because the resulting objective functions are $C^2$, they can be minimized with fast gradient-based and quasi-Newton methods, which supports stable computations during optimization.
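Replacing the hinge loss with the generalized pinball loss $L$ and absorbing the factor $\frac{1}{2}$ of the regularization terms into $c_3$ and $c_4$, the TBSVM model (7) and (8) can be written compactly as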
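$$\min_{\omega_1}\ \Phi_0^{(1)}(\omega_1) := \frac{1}{2}\sum_{i=1}^{m_1}\left(\omega_1^T\bar{x}_i^{(1)}\right)^2 + c_3\|\omega_1\|^2 + \frac{c_1}{m_2}\sum_{i=1}^{m_2} L\left(1 + \omega_1^T\bar{x}_i^{(2)}\right), \tag{18}$$
$$\min_{\omega_2}\ \Phi_0^{(2)}(\omega_2) := \frac{1}{2}\sum_{i=1}^{m_2}\left(\omega_2^T\bar{x}_i^{(2)}\right)^2 + c_4\|\omega_2\|^2 + \frac{c_2}{m_1}\sum_{i=1}^{m_1} L\left(1 - \omega_2^T\bar{x}_i^{(1)}\right), \tag{19}$$
where $\omega_1 = [w_1^T, b_1]^T$, $\omega_2 = [w_2^T, b_2]^T$, $\bar{x}_i^{(1)} = [x_i^{(1)T}, 1]^T$, and $\bar{x}_i^{(2)} = [x_i^{(2)T}, 1]^T$.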
We replace the loss in the right-most terms with the new smooth loss function $P(\cdot,\mu)$ and obtain
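$$\min_{\omega_1}\ \phi_\mu^{(1)}(\omega_1) := \frac{1}{2}\sum_{i=1}^{m_1}\left(\omega_1^T\bar{x}_i^{(1)}\right)^2 + c_3\|\omega_1\|^2 + \frac{c_1}{m_2}\sum_{i=1}^{m_2} P\left(1 + \omega_1^T\bar{x}_i^{(2)}, \mu\right), \tag{20}$$
$$\min_{\omega_2}\ \phi_\mu^{(2)}(\omega_2) := \frac{1}{2}\sum_{i=1}^{m_2}\left(\omega_2^T\bar{x}_i^{(2)}\right)^2 + c_4\|\omega_2\|^2 + \frac{c_2}{m_1}\sum_{i=1}^{m_1} P\left(1 - \omega_2^T\bar{x}_i^{(1)}, \mu\right). \tag{21}$$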
Theorem 4.
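Let $\tau_0 = \max\{\tau_1, \tau_2\}$. Then $0 \le \Phi_0^{(1)}(\omega_1) - \phi_\mu^{(1)}(\omega_1) \le \frac{5}{8}c_1\tau_0^2\mu$ and $0 \le \Phi_0^{(2)}(\omega_2) - \phi_\mu^{(2)}(\omega_2) \le \frac{5}{8}c_2\tau_0^2\mu$ for all $\omega_1, \omega_2 \in \mathbb{R}^{n+1}$.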
Proof. 
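By Theorem 1, $0 \le L(u) - P(u,\mu) \le \frac{5}{8}\tau_0^2\mu$ for all $u \in \mathbb{R}$. Then
$$\begin{aligned} 0 \le \Phi_0^{(1)}(\omega_1) - \phi_\mu^{(1)}(\omega_1) &= \frac{c_1}{m_2}\sum_{i=1}^{m_2} L\left(1 + \omega_1^T\bar{x}_i^{(2)}\right) - \frac{c_1}{m_2}\sum_{i=1}^{m_2} P\left(1 + \omega_1^T\bar{x}_i^{(2)}, \mu\right)\\ &= \frac{c_1}{m_2}\sum_{i=1}^{m_2}\left[L(u_i) - P(u_i,\mu)\right] \qquad [\text{where } u_i = 1 + \omega_1^T\bar{x}_i^{(2)}]\\ &\le \frac{c_1}{m_2}\sum_{i=1}^{m_2}\frac{5}{8}\tau_0^2\mu = \frac{5}{8}c_1\tau_0^2\mu, \end{aligned}$$
that is, $\Phi_0^{(1)}(\omega_1) - \phi_\mu^{(1)}(\omega_1) \le \frac{5}{8}c_1\tau_0^2\mu$.
In a similar way, one shows that $\Phi_0^{(2)}(\omega_2) - \phi_\mu^{(2)}(\omega_2) \le \frac{5}{8}c_2\tau_0^2\mu$. □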
Theorem 5.
Let $\omega_1^*$ and $\omega_2^*$ be optimal solutions of problems (18) and (19), respectively. Then
(i)
problems (20) and (21) have unique solutions, denoted $\omega_1^\mu$ and $\omega_2^\mu$, respectively;
(ii)
$\|\omega_1^\mu - \omega_1^*\|_2^2 \le \frac{5}{8}c_1\tau_0^2\mu$ and $\|\omega_2^\mu - \omega_2^*\|_2^2 \le \frac{5}{8}c_2\tau_0^2\mu$;
(iii)
$\omega_1^\mu \to \omega_1^*$ and $\omega_2^\mu \to \omega_2^*$ as $\mu \to 0^+$.
Proof. 
(i) Let $\mu > 0$ be arbitrary but fixed. Pick $\nu > 0$ so that the sublevel set
$$S_\nu(\phi_\mu^{(1)}) := \{\omega_1 \in \mathbb{R}^{n+1} : \phi_\mu^{(1)}(\omega_1) \le \nu\}$$
is nonempty. Since $\phi_\mu^{(1)}$ is continuous, $S_\nu(\phi_\mu^{(1)})$ is closed. Furthermore, since all terms of $\phi_\mu^{(1)}$ are non-negative, its definition gives
$$S_\nu(\phi_\mu^{(1)}) \subseteq \left\{\omega_1 \in \mathbb{R}^{n+1} : \omega_1^T\omega_1 = \|\omega_1\|^2 \le \frac{\nu}{c_3}\right\}.$$
Since the latter set is bounded, $S_\nu(\phi_\mu^{(1)})$ is bounded; being closed and bounded, it is compact in $\mathbb{R}^{n+1}$. By the Extreme Value Theorem, $\phi_\mu^{(1)}$ attains its minimum over $S_\nu(\phi_\mu^{(1)})$. Since $\phi_\mu^{(1)} > \nu$ outside this set, the minimizer solves problem (20); we denote it by $\omega_1^\mu$. As before, $P(\cdot,\mu)$ is convex and $u_i = 1 + \omega_1^T\bar{x}_i^{(2)}$ is affine in $\omega_1$. Hence the term $\frac{c_1}{m_2}\sum_{i=1}^{m_2} P(u_i,\mu)$ is convex.
Next, we show that the first term in (20) is convex in $\omega_1$. Indeed, since the function $y = x^2$ is convex,
$$\left(\left(t v_1 + (1-t) v_2\right)^T\bar{x}_i^{(1)}\right)^2 = \left(t\, v_1^T\bar{x}_i^{(1)} + (1-t)\, v_2^T\bar{x}_i^{(1)}\right)^2 \le t\left(v_1^T\bar{x}_i^{(1)}\right)^2 + (1-t)\left(v_2^T\bar{x}_i^{(1)}\right)^2$$
for all $v_1, v_2 \in \mathbb{R}^{n+1}$ and $t \in [0, 1]$. This shows that $\omega_1 \mapsto (\omega_1^T\bar{x}_i^{(1)})^2$ is convex for every $i$, so $\frac{1}{2}\sum_{i=1}^{m_1}(\omega_1^T\bar{x}_i^{(1)})^2$ is convex. As $c_3\|\omega_1\|^2$ is strongly convex and the remaining terms in (20) are convex, $\phi_\mu^{(1)}$ is strongly convex. Hence the solution of problem (20) is unique. The uniqueness of the solution $\omega_2^\mu$ of problem (21) follows in the same way.
(ii) Let $\omega_1^*$ and $\omega_1^\mu$ be the optimal solutions of (18) and (20), respectively.
Let $\nabla\phi_\mu^{(1)}(\omega_1)$ denote the gradient of $\phi_\mu^{(1)}$, and let $\xi_0^{(1)}(\omega_1)$ denote a subgradient of $\Phi_0^{(1)}$ at $\omega_1$; that is,
$$\xi_0^{(1)}(\omega_1) \in \partial\Phi_0^{(1)}(\omega_1) = \sum_{i=1}^{m_1}\bar{x}_i^{(1)}\bar{x}_i^{(1)T}\omega_1 + 2c_3\omega_1 + \frac{c_1}{m_2}\sum_{i=1}^{m_2}\bar{x}_i^{(2)}\,\partial L\left(1 + \omega_1^T\bar{x}_i^{(2)}\right).$$
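The regularization term $c_3\|\omega_1\|^2$ has Hessian $2c_3 I$ and the remaining terms are convex, so $\Phi_0^{(1)}$ and $\phi_\mu^{(1)}$ are strongly convex with parameter $2c_3$. Assuming $2c_3 \ge 1$, they are in particular strongly convex with parameter 1; for general $c_3 > 0$, the argument below goes through with the bound in (ii) divided by $2c_3$. For any subgradient $\xi_0^{(1)}(\omega_1^*) \in \partial\Phi_0^{(1)}(\omega_1^*)$, we obtain
$$\Phi_0^{(1)}(\omega_1^\mu) - \Phi_0^{(1)}(\omega_1^*) \ge (\omega_1^\mu - \omega_1^*)^T\xi_0^{(1)}(\omega_1^*) + \frac{1}{2}\|\omega_1^\mu - \omega_1^*\|_2^2,$$
and
$$\phi_\mu^{(1)}(\omega_1^*) - \phi_\mu^{(1)}(\omega_1^\mu) \ge (\omega_1^* - \omega_1^\mu)^T\nabla\phi_\mu^{(1)}(\omega_1^\mu) + \frac{1}{2}\|\omega_1^* - \omega_1^\mu\|_2^2.$$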
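Since $\omega_1^*$ minimizes the convex function $\Phi_0^{(1)}$, we have $0 \in \partial\Phi_0^{(1)}(\omega_1^*)$, so the subgradient at $\omega_1^*$ in the first inequality may be chosen to be zero. Likewise, $\nabla\phi_\mu^{(1)}(\omega_1^\mu) = 0$. Thus, we obtain
$$\Phi_0^{(1)}(\omega_1^\mu) - \Phi_0^{(1)}(\omega_1^*) \ge \frac{1}{2}\|\omega_1^\mu - \omega_1^*\|_2^2,$$
and
$$\phi_\mu^{(1)}(\omega_1^*) - \phi_\mu^{(1)}(\omega_1^\mu) \ge \frac{1}{2}\|\omega_1^* - \omega_1^\mu\|_2^2 = \frac{1}{2}\|\omega_1^\mu - \omega_1^*\|_2^2.$$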
By Theorem 1, we have
$$\Phi_0^{(1)}(\omega_1) - \phi_\mu^{(1)}(\omega_1) = \frac{c_1}{m_2}\sum_{i=1}^{m_2} L(u_i^{(1)}) - \frac{c_1}{m_2}\sum_{i=1}^{m_2} P(u_i^{(1)},\mu) \ge 0,$$
where $u_i^{(1)} = 1 + \omega_1^T\bar{x}_i^{(2)}$.
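Adding the two inequalities above, we obtain
$$\begin{aligned} \|\omega_1^\mu - \omega_1^*\|_2^2 &\le \Phi_0^{(1)}(\omega_1^\mu) - \Phi_0^{(1)}(\omega_1^*) + \phi_\mu^{(1)}(\omega_1^*) - \phi_\mu^{(1)}(\omega_1^\mu)\\ &= \left[\Phi_0^{(1)}(\omega_1^\mu) - \phi_\mu^{(1)}(\omega_1^\mu)\right] - \left[\Phi_0^{(1)}(\omega_1^*) - \phi_\mu^{(1)}(\omega_1^*)\right]\\ &\le \Phi_0^{(1)}(\omega_1^\mu) - \phi_\mu^{(1)}(\omega_1^\mu), \end{aligned}$$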
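and, by Theorem 4,
$$0 \le \|\omega_1^\mu - \omega_1^*\|_2^2 \le \Phi_0^{(1)}(\omega_1^\mu) - \phi_\mu^{(1)}(\omega_1^\mu) = \left|\Phi_0^{(1)}(\omega_1^\mu) - \phi_\mu^{(1)}(\omega_1^\mu)\right| \le \|\Phi_0^{(1)} - \phi_\mu^{(1)}\|_\infty \le \frac{5}{8}c_1\tau_0^2\mu.$$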
The inequality $\|\omega_2^\mu - \omega_2^*\|_2^2 \le \frac{5}{8}c_2\tau_0^2\mu$ can be proven in the same way.
(iii) By (ii) and the Squeeze Theorem, $\omega_1^\mu \to \omega_1^*$ as $\mu \to 0^+$. Similarly, $\omega_2^\mu \to \omega_2^*$ as $\mu \to 0^+$. □
In summary, the above theorems establish the uniqueness of solutions and the convergence properties of the proposed smooth loss within the TBSVM framework. From a practical viewpoint, uniform convergence guarantees predictable behavior of the model as the smoothing parameter decreases. The uniqueness of solutions implies robustness with respect to initialization and numerical perturbations. These properties are particularly important for large-scale or noisy datasets, where ill-posed optimization problems may otherwise lead to unstable training or inconsistent classification results.

3.4. Quasi-Newton Smooth Generalized Pinball Twin-Bounded Support Vector Machine

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, named after its developers, is one of the most widely used quasi-Newton algorithms for minimizing smooth functions. We use it to solve the strongly convex $C^2$ problems (20) and (21). As before, we focus on problem (20):
$$\min_{\omega_1}\ \phi_\mu^{(1)}(\omega_1) := \frac{1}{2}\sum_{i=1}^{m_1}\left(\omega_1^T\bar{x}_i^{(1)}\right)^2 + c_3\|\omega_1\|^2 + \frac{c_1}{m_2}\sum_{i=1}^{m_2} P\left(1 + \omega_1^T\bar{x}_i^{(2)}, \mu\right).$$
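If $\omega_1^k$ denotes the value of $\omega_1$ obtained in the $k$-th iteration, then the gradient of the objective function $\phi_\mu^{(1)}$ at $\omega_1^k$ is
$$\nabla\phi_\mu^{(1)}(\omega_1^k) = \sum_{i=1}^{m_1}\bar{x}_i^{(1)}\bar{x}_i^{(1)T}\omega_1^k + 2c_3\omega_1^k + \frac{c_1}{m_2}\sum_{i=1}^{m_2}\bar{x}_i^{(2)}P'(u_i^k,\mu),$$
where $u_i^k = 1 + (\omega_1^k)^T\bar{x}_i^{(2)}$ and the partial derivative of $P(u,\mu)$ with respect to the variable $u$ is
$$P'(u,\mu) = \begin{cases} \tau_1, & \frac{\epsilon_1}{\tau_1} + \tau_1\mu < u,\\ \frac{5}{2\tau_1^2\mu^3}\left(u - \frac{\epsilon_1}{\tau_1}\right)^3 - \frac{3}{2\tau_1^4\mu^5}\left(u - \frac{\epsilon_1}{\tau_1}\right)^5, & \frac{\epsilon_1}{\tau_1} < u \le \frac{\epsilon_1}{\tau_1} + \tau_1\mu,\\ 0, & -\frac{\epsilon_2}{\tau_2} \le u \le \frac{\epsilon_1}{\tau_1},\\ \frac{5}{2\tau_2^2\mu^3}\left(u + \frac{\epsilon_2}{\tau_2}\right)^3 - \frac{3}{2\tau_2^4\mu^5}\left(u + \frac{\epsilon_2}{\tau_2}\right)^5, & -\frac{\epsilon_2}{\tau_2} - \tau_2\mu \le u < -\frac{\epsilon_2}{\tau_2},\\ -\tau_2, & u < -\frac{\epsilon_2}{\tau_2} - \tau_2\mu.\end{cases}$$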
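The gradient formula can be checked against central finite differences. The following sketch assumes NumPy; the helper names, the synthetic two-class data, and all parameter values are hypothetical. The rows of `Xp` and `Xn` are the augmented positive and negative samples $[x^T, 1]$.

```python
import numpy as np


def p_and_dp(u, mu, tau1, tau2, eps1, eps2):
    """Smooth loss P(u, mu) and its derivative with respect to u (vectorized)."""
    a, c = eps1 / tau1, -eps2 / tau2
    xr, xl = u - a, u - c
    conds = [u > a + tau1 * mu, u > a, u >= c, u >= c - tau2 * mu]
    zero = np.zeros_like(u)
    p = np.select(conds, [tau1 * xr - 5 / 8 * tau1**2 * mu,
                          5 * xr**4 / (8 * tau1**2 * mu**3) - xr**6 / (4 * tau1**4 * mu**5),
                          zero,
                          5 * xl**4 / (8 * tau2**2 * mu**3) - xl**6 / (4 * tau2**4 * mu**5)],
                  default=-tau2 * xl - 5 / 8 * tau2**2 * mu)
    dp = np.select(conds, [zero + tau1,
                           5 * xr**3 / (2 * tau1**2 * mu**3) - 3 * xr**5 / (2 * tau1**4 * mu**5),
                           zero,
                           5 * xl**3 / (2 * tau2**2 * mu**3) - 3 * xl**5 / (2 * tau2**4 * mu**5)],
                   default=zero - tau2)
    return p, dp


def objective_and_grad(w, Xp, Xn, c1, c3, mu, loss_params):
    """phi_mu^(1)(w) and its gradient; rows of Xp / Xn are augmented samples [x^T, 1]."""
    r = Xp @ w                  # w^T x_i^(1), positive class
    u = 1.0 + Xn @ w            # u_i = 1 + w^T x_i^(2), negative class
    p, dp = p_and_dp(u, mu, *loss_params)
    m2 = Xn.shape[0]
    f = 0.5 * r @ r + c3 * w @ w + c1 / m2 * p.sum()
    g = Xp.T @ r + 2 * c3 * w + c1 / m2 * Xn.T @ dp
    return f, g


rng = np.random.default_rng(1)
Xp = np.hstack([rng.normal(1.0, 1.0, size=(20, 2)), np.ones((20, 1))])
Xn = np.hstack([rng.normal(-1.0, 1.0, size=(15, 2)), np.ones((15, 1))])
params = dict(c1=1.0, c3=0.1, mu=0.5, loss_params=(1.0, 0.5, 0.2, 0.1))
w = rng.normal(size=3)
f, g = objective_and_grad(w, Xp, Xn, **params)
h = 1e-6
g_fd = np.array([(objective_and_grad(w + h * e, Xp, Xn, **params)[0]
                  - objective_and_grad(w - h * e, Xp, Xn, **params)[0]) / (2 * h)
                 for e in np.eye(3)])
print(np.max(np.abs(g - g_fd)))  # analytic vs. finite-difference gradient
```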
The BFGS method is outlined as follows:
$$\omega_1^{k+1} = \omega_1^k + \alpha_k d_k,$$
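where $d_k = -B_k^{-1}\nabla\phi_\mu^{(1)}(\omega_1^k)$ is the search direction, $B_k$ is the Hessian approximation described below, and the step size $\alpha_k$ is determined by the Armijo condition.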
Let $B_k$ be an approximation of the Hessian matrix $\nabla^2\phi_\mu^{(1)}(\omega_1^k)$. For the next iteration, it is updated by the BFGS formula (the Sherman–Morrison–Woodbury formula yields the equivalent update of the inverse $B_k^{-1}$), that is,
$$B_{k+1} = B_k - \frac{B_k\Delta x_k\Delta x_k^T B_k}{\Delta x_k^T B_k\Delta x_k} + \frac{\Delta g_k\Delta g_k^T}{\Delta x_k^T\Delta g_k},$$
where
$$\Delta x_k = \omega_1^{k+1} - \omega_1^k,$$
and
$$\Delta g_k = \nabla\phi_\mu^{(1)}(\omega_1^{k+1}) - \nabla\phi_\mu^{(1)}(\omega_1^k).$$
The BFGS algorithm for the strongly convex differentiable problem (21) is analogous to that for problem (20).
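The iteration above can be sketched as follows. This is a minimal illustration, not the authors' implementation. The following choices are assumptions rather than details given in the text:
- NumPy is used, and the iteration starts from $B_0 = I$.
- The line search is Armijo backtracking with constant $10^{-4}$ and step halving.
- The update is skipped when the curvature $\Delta x_k^T\Delta g_k$ is not safely positive.

The data and parameter values are hypothetical.

```python
import numpy as np


def p_and_dp(u, mu, tau1, tau2, eps1, eps2):
    """Smooth loss P(u, mu) and its derivative with respect to u (vectorized)."""
    a, c = eps1 / tau1, -eps2 / tau2
    xr, xl = u - a, u - c
    conds = [u > a + tau1 * mu, u > a, u >= c, u >= c - tau2 * mu]
    zero = np.zeros_like(u)
    p = np.select(conds, [tau1 * xr - 5 / 8 * tau1**2 * mu,
                          5 * xr**4 / (8 * tau1**2 * mu**3) - xr**6 / (4 * tau1**4 * mu**5),
                          zero,
                          5 * xl**4 / (8 * tau2**2 * mu**3) - xl**6 / (4 * tau2**4 * mu**5)],
                  default=-tau2 * xl - 5 / 8 * tau2**2 * mu)
    dp = np.select(conds, [zero + tau1,
                           5 * xr**3 / (2 * tau1**2 * mu**3) - 3 * xr**5 / (2 * tau1**4 * mu**5),
                           zero,
                           5 * xl**3 / (2 * tau2**2 * mu**3) - 3 * xl**5 / (2 * tau2**4 * mu**5)],
                   default=zero - tau2)
    return p, dp


def objective_and_grad(w, Xp, Xn, c1, c3, mu, loss_params):
    """phi_mu^(1)(w) and its gradient; rows of Xp / Xn are augmented samples [x^T, 1]."""
    r, u = Xp @ w, 1.0 + Xn @ w
    p, dp = p_and_dp(u, mu, *loss_params)
    m2 = Xn.shape[0]
    return (0.5 * r @ r + c3 * w @ w + c1 / m2 * p.sum(),
            Xp.T @ r + 2 * c3 * w + c1 / m2 * Xn.T @ dp)


def bfgs(fun, w0, tol=1e-7, max_iter=500, c_armijo=1e-4):
    """Minimize fun (returning value and gradient) by BFGS with Armijo backtracking."""
    w = np.asarray(w0, dtype=float)
    f, g = fun(w)
    B = np.eye(w.size)                       # initial Hessian approximation B_0 = I
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -np.linalg.solve(B, g)           # search direction d_k = -B_k^{-1} grad
        alpha = 1.0
        while True:                          # Armijo backtracking by step halving
            f_new, g_new = fun(w + alpha * d)
            if f_new <= f + c_armijo * alpha * (g @ d):
                break
            alpha *= 0.5
            if alpha < 1e-12:                # no further decrease at machine precision
                return w, f, g
        dx, dg = alpha * d, g_new - g
        curv = dx @ dg
        if curv > 1e-10 * (dx @ dx):         # positive for strongly convex objectives
            Bdx = B @ dx
            B = B - np.outer(Bdx, Bdx) / (dx @ Bdx) + np.outer(dg, dg) / curv
        w, f, g = w + dx, f_new, g_new
    return w, f, g


rng = np.random.default_rng(2)
Xp = np.hstack([rng.normal(1.0, 1.0, size=(30, 2)), np.ones((30, 1))])   # class +1
Xn = np.hstack([rng.normal(-1.0, 1.0, size=(25, 2)), np.ones((25, 1))])  # class -1


def fun(w):
    return objective_and_grad(w, Xp, Xn, c1=1.0, c3=0.1, mu=0.1,
                              loss_params=(1.0, 0.5, 0.2, 0.1))


w1_mu, f_min, g_min = bfgs(fun, np.zeros(3))
print(w1_mu, f_min, np.linalg.norm(g_min))
```

Because $\phi_\mu^{(1)}$ is strongly convex, $\Delta x_k^T\Delta g_k > 0$ holds for every nonzero step. The update therefore keeps $B_k$ positive definite, and $d_k$ remains a descent direction.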

3.5. The Kernel Trick

Many real-world datasets have complex structures in which the classes cannot be separated by a hyperplane. To address this, SVM and TBSVM models map the original feature space into a higher-dimensional feature space in which separating hyperplanes can be found. It turns out that the mapping itself need not be known explicitly; this is called the kernel trick, as we now explain.
Let $\Phi : \mathbb{R}^n \to H$ be a mapping into a Hilbert space $H$, called the feature map. Here again, $n$ is the number of features in the given dataset. Let $\tilde{H}$ denote the linear span of $\{\Phi(x_1), \ldots, \Phi(x_m)\}$, so that $\tilde{H}$ is a finite-dimensional subspace of $H$ of dimension at most $m$. For convenience, we relabel the positive and negative data samples as $\{x_i\}_{i=1}^{m_1}$ and $\{x_i\}_{i=m_1+1}^{m}$, respectively. We build our TBSVM in $\tilde{H}$, where (20) and (21) become
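$$\min_{w_1,b_1}\ \frac{1}{2}\sum_{i=1}^{m_1}\left(w_1^T\Phi(x_i)+b_1\right)^2 + c_3\left(\|w_1\|^2 + b_1^2\right) + \frac{c_1}{m_2}\sum_{i=m_1+1}^{m} P\left(1 + \left(w_1^T\Phi(x_i)+b_1\right),\mu\right), \tag{23}$$
$$\min_{w_2,b_2}\ \frac{1}{2}\sum_{i=m_1+1}^{m}\left(w_2^T\Phi(x_i)+b_2\right)^2 + c_4\left(\|w_2\|^2 + b_2^2\right) + \frac{c_2}{m_1}\sum_{i=1}^{m_1} P\left(1 - \left(w_2^T\Phi(x_i)+b_2\right),\mu\right). \tag{24}$$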
Consider the symmetric mapping $K:\mathbb{R}^n\times\mathbb{R}^n\to\mathbb{R}$, called the kernel, given by
$$K(y_1,y_2) = \Phi(y_1)^T\Phi(y_2) = \Phi(y_2)^T\Phi(y_1) = K(y_2,y_1), \qquad y_i\in\mathbb{R}^n.$$
Its Gram matrix is
$$X = \left[K(x_i,x_j)\right]_{i,j} = \begin{pmatrix} K(x_1,x_1) & K(x_1,x_2) & \cdots & K(x_1,x_m)\\ K(x_2,x_1) & K(x_2,x_2) & \cdots & K(x_2,x_m)\\ \vdots & \vdots & \ddots & \vdots\\ K(x_m,x_1) & K(x_m,x_2) & \cdots & K(x_m,x_m) \end{pmatrix}.$$
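To illustrate the Gram matrix, the following sketch assembles $X$ from an arbitrary symmetric kernel function. The helper names `gram_matrix` and `poly2_kernel` are hypothetical. The kernel $K(y_1,y_2) = (y_1^Ty_2)^2$ and the three sample points are chosen for the example only.

```python
import numpy as np


def gram_matrix(points, kernel):
    """Gram matrix X with entries X[i, j] = K(x_i, x_j) for a symmetric kernel K."""
    m = len(points)
    X = np.empty((m, m))
    for i in range(m):
        for j in range(i, m):                # symmetry: K(x_i, x_j) = K(x_j, x_i)
            X[i, j] = X[j, i] = kernel(points[i], points[j])
    return X


def poly2_kernel(y1, y2):
    """Hypothetical example kernel K(y1, y2) = (y1^T y2)^2."""
    return float(np.dot(y1, y2)) ** 2


pts = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
X = gram_matrix(pts, poly2_kernel)
print(X)  # equals [[1, 0, 1], [0, 1, 1], [1, 1, 4]]
```

Filling only the upper triangle and mirroring it halves the number of kernel evaluations. The resulting matrix is symmetric positive semidefinite for any valid kernel.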
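As $w_1, w_2 \in \tilde{H}$, we can write
$$w_1 = \sum_{j=1}^{m}\alpha_j\Phi(x_j) \quad\text{and}\quad w_2 = \sum_{j=1}^{m}\beta_j\Phi(x_j),$$
where $\alpha_j,\beta_j\in\mathbb{R}$ need not be unique. Then (23) becomes
$$\min_{\alpha,b_1}\ \frac{1}{2}\sum_{i=1}^{m_1}\left(\sum_{j=1}^{m}\alpha_j\Phi(x_j)^T\Phi(x_i)+b_1\right)^2 + c_3\left(\sum_{j=1}^{m}\alpha_j\Phi(x_j)^T\sum_{k=1}^{m}\alpha_k\Phi(x_k)+b_1^2\right) + \frac{c_1}{m_2}\sum_{i=m_1+1}^{m}P\left(1+\left(\sum_{j=1}^{m}\alpha_j\Phi(x_j)^T\Phi(x_i)+b_1\right),\mu\right),$$
where $\alpha = (\alpha_1,\alpha_2,\ldots,\alpha_m)^T\in\mathbb{R}^m$, $b_1\in\mathbb{R}$, or equivalently,
$$\min_{\alpha,b_1}\ \frac{1}{2}\sum_{i=1}^{m_1}\left(\sum_{j=1}^{m}K(x_i,x_j)\alpha_j+b_1\right)^2 + c_3\left(\sum_{j,k=1}^{m}\alpha_j\alpha_kK(x_j,x_k)+b_1^2\right) + \frac{c_1}{m_2}\sum_{i=m_1+1}^{m}P\left(1+\left(\sum_{j=1}^{m}K(x_i,x_j)\alpha_j+b_1\right),\mu\right),$$
that is,
$$\min_{\alpha,b_1}\ \frac{1}{2}\sum_{i=1}^{m_1}\left(X_i\alpha+b_1\right)^2 + c_3\left(\alpha^TX\alpha+b_1^2\right) + \frac{c_1}{m_2}\sum_{i=m_1+1}^{m}P\left(1+\left(X_i\alpha+b_1\right),\mu\right), \tag{27}$$
where $X_i$ denotes the $i$-th row of $X$. Similarly, (24) becomes
$$\min_{\beta,b_2}\ \frac{1}{2}\sum_{i=m_1+1}^{m}\left(X_i\beta+b_2\right)^2 + c_4\left(\beta^TX\beta+b_2^2\right) + \frac{c_2}{m_1}\sum_{i=1}^{m_1}P\left(1-\left(X_i\beta+b_2\right),\mu\right) \tag{28}$$
with $\beta = (\beta_1,\beta_2,\ldots,\beta_m)^T$.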
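To solve (27) with a quasi-Newton method, one needs the objective value and its gradient in $(\alpha, b_1)$. Since $X$ is symmetric, $\nabla_\alpha(\alpha^TX\alpha) = 2X\alpha$, and the chain rule gives the remaining terms. The sketch below is our own illustration, and the name `objective_27` is hypothetical. The loss and its derivative are passed in as callables. In practice $P(\cdot,\mu)$ and $\partial P/\partial u$ with fixed $\tau_1,\tau_2,\epsilon_1,\epsilon_2$ would be supplied there.

```python
import numpy as np


def objective_27(alpha, b1, X, pos, neg, c1, c3, loss, dloss):
    """Value and gradient of problem (27) with respect to (alpha, b1).

    X     : m x m symmetric Gram matrix
    pos   : indices of positive samples (i = 1, ..., m1)
    neg   : indices of negative samples (i = m1 + 1, ..., m)
    loss  : vectorized u -> P(u, mu) with its parameters fixed
    dloss : vectorized u -> dP/du(u, mu)
    """
    r = X[pos] @ alpha + b1                  # X_i alpha + b1 over positive samples
    u = 1.0 + (X[neg] @ alpha + b1)          # loss arguments over negative samples
    m2 = len(neg)
    value = 0.5 * r @ r + c3 * (alpha @ X @ alpha + b1**2) + c1 / m2 * loss(u).sum()
    du = dloss(u)
    grad_alpha = X[pos].T @ r + 2.0 * c3 * (X @ alpha) + c1 / m2 * (X[neg].T @ du)
    grad_b1 = r.sum() + 2.0 * c3 * b1 + c1 / m2 * du.sum()
    return value, grad_alpha, grad_b1
```

A central finite-difference comparison of `grad_alpha` and `grad_b1` against `value`, using any smooth stand-in loss, is a quick way to validate such a routine before passing it to a BFGS solver. Problem (28) is handled the same way with the roles of the index sets exchanged and the loss argument $1 - (X_i\beta + b_2)$.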
An unknown sample point $x\in\mathbb{R}^n$ is assigned to class $i$ ($i=+1$ or $i=-1$) by the following rule:
$$\text{class}(x) = \operatorname{sgn}\left(\frac{w_1^T\Phi(x)+b_1}{\|w_1\|} + \frac{w_2^T\Phi(x)+b_2}{\|w_2\|}\right). \tag{29}$$
Since $w_1 = \sum_{i=1}^{m}\alpha_i\Phi(x_i)$, we have
$$w_1^T\Phi(x) = \sum_{i=1}^{m}\alpha_iK(x_i,x)$$
and
$$\|w_1\|^2 = w_1^Tw_1 = \sum_{i,j=1}^{m}\alpha_iK(x_i,x_j)\alpha_j = \alpha^TX\alpha,$$
and similarly for $w_2$, so that (29) becomes
$$\text{class}(x) = \operatorname{sgn}\left(\frac{b_1+\sum_{i=1}^{m}\alpha_iK(x_i,x)}{\sqrt{\alpha^TX\alpha}} + \frac{b_2+\sum_{i=1}^{m}\beta_iK(x_i,x)}{\sqrt{\beta^TX\beta}}\right). \tag{30}$$
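Rule (30) needs only the kernel values $K(x_i,x)$, the Gram matrix, and the coefficients. Below is a minimal sketch. The function name `tbsvm_decide` and the toy numbers are hypothetical. With the linear kernel $K(y,z)=y^Tz$ and two orthonormal training points, $X$ is the identity, which makes the result easy to check by hand.

```python
import numpy as np


def tbsvm_decide(k_x, alpha, b1, beta, b2, X):
    """Label from (30); k_x[i] = K(x_i, x) over the m training points, X is their Gram matrix."""
    d1 = (b1 + alpha @ k_x) / np.sqrt(alpha @ X @ alpha)   # (w1^T Phi(x) + b1) / ||w1||
    d2 = (b2 + beta @ k_x) / np.sqrt(beta @ X @ beta)      # (w2^T Phi(x) + b2) / ||w2||
    return int(np.sign(d1 + d2))                           # 0 only exactly on the boundary


# Toy check with the linear kernel K(y, z) = y^T z.
train = np.array([[1.0, 0.0], [0.0, 1.0]])
X = train @ train.T                                        # identity matrix here
alpha, b1 = np.array([1.0, 0.0]), 0.0                      # w1 = Phi(x_1)
beta, b2 = np.array([0.0, 1.0]), 0.0                       # w2 = Phi(x_2)
for x in (np.array([2.0, 0.5]), np.array([-3.0, 1.0])):
    print(tbsvm_decide(train @ x, alpha, b1, beta, b2, X))  # prints 1, then -1
```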
Equations (27), (28) and (30) show that knowledge of the kernel suffices for building a TBSVM model; the specific feature map Φ need not be known.
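The claim that the kernel alone suffices can be checked numerically for a kernel whose feature map is known explicitly. We use the degree-2 polynomial kernel on $\mathbb{R}^2$, $K(y,z) = (y^Tz)^2$, with $\Phi(y) = (y_1^2, \sqrt{2}\,y_1y_2, y_2^2)$. This is our illustrative choice, not a kernel used in the experiments. The sketch builds $w_1$ explicitly in feature space and compares $w_1^T\Phi(x)$ and $\|w_1\|^2$ with their kernel-only expressions. The random data and seed are hypothetical.

```python
import numpy as np


def phi(y):
    """Explicit feature map of the degree-2 polynomial kernel on R^2 (illustrative choice)."""
    return np.array([y[0] ** 2, np.sqrt(2.0) * y[0] * y[1], y[1] ** 2])


def K(y, z):
    """Kernel K(y, z) = (y^T z)^2, which equals phi(y)^T phi(z)."""
    return float(y @ z) ** 2


rng = np.random.default_rng(1)
pts = rng.normal(size=(5, 2))            # training points x_1, ..., x_m
alpha = rng.normal(size=5)               # kernel coefficients
x = rng.normal(size=2)                   # a new sample

w = sum(a * phi(p) for a, p in zip(alpha, pts))          # w1 built explicitly in feature space
X = np.array([[K(p, q) for q in pts] for p in pts])      # Gram matrix
k_x = np.array([K(p, x) for p in pts])                   # K(x_i, x)

print(np.isclose(w @ phi(x), alpha @ k_x))       # True: w^T Phi(x) = sum_i alpha_i K(x_i, x)
print(np.isclose(w @ w, alpha @ X @ alpha))      # True: ||w||^2 = alpha^T X alpha
```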

4. Numerical Experiments

This section illustrates the performance of our proposed algorithm through experimental results with a selection of nine datasets from the UCI dataset collection [17]. The UCI benchmark datasets provide standardized and widely accepted testbeds for classification algorithms. The datasets chosen are the Australian, Diabetes, Ionosphere, Monk2, Phoneme, Ring, Saheart, Spectfheart, and Twonorm datasets, as shown in Table 1.
All computations were performed in Python 3 using the NumPy and scikit-learn (sklearn) packages on Linux. In the following, we compare three TBSVM algorithms that differ in the loss function used: the smooth approximation to the pinball loss function $\phi_{\tau_1}(u,\epsilon_1)$ of (13), the generalized pinball loss function $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$ of (11), and our smooth generalized pinball loss function $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ of (14).
For each algorithm, a random grid search was used to optimize the parameters of the TBSVM model and, for our smooth generalized pinball loss function $P(u,\mu)$, the smoothing parameter $\mu$; the parameter ranges are listed in Table 2. After obtaining the best parameters, we trained the models with these parameters.
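The random grid search can be sketched with scikit-learn's `ParameterSampler`, which draws random combinations from candidate lists. This is an assumption about tooling, since the text does not say which routine was used. The number of draws (`n_iter=20`), the seed, and the dictionary keys are hypothetical. Only the candidate values are copied from Table 2: the '+' ranges, together with $c_1$ and $c_3$, which enter the positive-plane problem.

```python
from sklearn.model_selection import ParameterSampler

# Candidate values for the positive-plane problem, copied from Table 2 ('+' ranges, c1, c3).
penalties = [0.1, 0.2, 0.4, 0.8, 1.2, 1.6, 2, 3.2, 6, 10, 20, 50, 80]
grid_pos = {
    "eps1": [0.01, 0.05, 0.1, 0.2, 0.25, 0.45, 0.6, 0.7, 0.8, 1, 2],
    "tau1": [0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2],
    "eps2": [0.1, 0.25, 0.45, 0.7, 0.8, 1, 1.5, 2],
    "tau2": [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1, 3.5],
    "mu": [0.01, 0.2, 0.4, 0.6, 0.8, 1, 2],
    "c1": penalties,
    "c3": penalties,
}

# With list-valued entries, ParameterSampler draws distinct grid points without replacement.
candidates = list(ParameterSampler(grid_pos, n_iter=20, random_state=0))
for params in candidates[:3]:
    print(params)  # each candidate would then be scored by five-fold cross-validation
```

Sampling a fixed number of grid points keeps the tuning cost independent of the full grid size, which here exceeds $10^7$ combinations.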
The experimental results are presented in Table 3, Table 4, Table 5 and Table 6. Accuracy (in %), standard deviation, and training time (in seconds) are used for evaluation, denoted as Acc, sd, and time (s), respectively. The numbers in bold show the best results for each row.

4.1. Linear Models

To assess the performance of the classifiers, five-fold cross-validation was employed in all experiments. We considered two cases, "fixed splits" and "variable splits". With fixed splits, the same five-fold cross-validation splits that were used for parameter optimization were also used for evaluation. With variable splits, we report the average results of 50 tests, each using different five-fold cross-validation splits; this is a more realistic scenario than fixed splits.
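The difference between the two protocols can be sketched with scikit-learn's `KFold`. The paper does not state how the splits were randomized, so the shuffling seeds and the toy index set below are hypothetical. Fixed splits reuse the tuning folds, while variable splits draw a fresh shuffle for each of the 50 repetitions.

```python
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(20)  # hypothetical dataset of 20 samples


def five_fold_test_sets(seed):
    """Test-index arrays of a shuffled five-fold split; `seed` fixes the shuffle."""
    kf = KFold(n_splits=5, shuffle=True, random_state=seed)
    return [test for _, test in kf.split(indices)]


# Fixed splits: evaluation reuses exactly the folds used during parameter tuning.
tuning_folds = five_fold_test_sets(seed=0)
evaluation_folds = five_fold_test_sets(seed=0)
print(all(np.array_equal(a, b) for a, b in zip(tuning_folds, evaluation_folds)))  # True

# Variable splits: 50 repetitions, each with a different shuffle; accuracies are averaged.
repetitions = [five_fold_test_sets(seed=s) for s in range(1, 51)]
print(len(repetitions))  # 50
```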
The fixed-split results in Table 3 show that our smooth loss achieved the highest accuracy on three datasets (Diabetes, Spectfheart, and Twonorm). For variable splits, Table 4 shows that our smooth loss achieved the highest accuracy on two datasets (Australian and Ring). Overall, the generalized pinball loss and our smooth loss yield similar accuracies in the linear TBSVM model, although their training times differ. The smooth pinball loss, however, requires less training time than our proposed algorithm on all datasets.
Table 3. Linear TBSVM performance with various loss functions, fixed splits (Acc ± sd).
| Dataset / Loss function | $\phi_{\tau_1}(u,\epsilon_1)$ | $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$ | $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ |
|---|---|---|---|
| Australian | **88.4058 ± 2.2915** | 87.8261 ± 2.1201 | 87.8261 ± 2.5681 |
| time (s) | **0.216790** | 0.370989 | 0.877043 |
| Diabetes | 77.8593 ± 2.5482 | 77.4688 ± 3.6535 | **77.8627 ± 2.2755** |
| time (s) | **0.092696** | 0.163853 | 0.253656 |
| Monk2 | 83.3280 ± 4.5861 | **87.7252 ± 3.9495** | 87.0356 ± 4.0527 |
| time (s) | **0.047582** | 0.230813 | 0.178027 |
| Phoneme | 77.8127 ± 0.7475 | **77.9793 ± 0.4456** | 77.9237 ± 0.5031 |
| time (s) | **0.335106** | 0.480017 | 0.678361 |
| Saheart | **74.0159 ± 5.2307** | 73.8125 ± 4.4559 | 73.8055 ± 3.6154 |
| time (s) | **0.132691** | 0.186257 | 0.246309 |
| Spectfheart | 81.6562 ± 5.7405 | 81.6771 ± 4.9851 | **83.9064 ± 5.6250** |
| time (s) | **0.851100** | 4.725302 | 1.593111 |
| Twonorm | 97.8649 ± 0.2550 | 97.9054 ± 0.3963 | **97.9324 ± 0.3954** |
| time (s) | **0.430177** | 0.704223 | 0.658462 |
| Ring | 76.7973 ± 1.7075 | **76.9189 ± 1.5664** | 76.8378 ± 1.7058 |
| time (s) | **0.556805** | 0.667775 | 0.902595 |
| Ionosphere | **89.4567 ± 2.6563** | 89.1751 ± 2.7936 | 88.8893 ± 3.3052 |
| time (s) | **0.399795** | 0.56281 | 0.680536 |
Table 4. Linear TBSVM performance with various loss functions, variable splits (Acc ± sd).
| Dataset / Loss function | $\phi_{\tau_1}(u,\epsilon_1)$ | $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$ | $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ |
|---|---|---|---|
| Australian | 86.6000 ± 2.3183 | 86.7130 ± 2.3182 | **86.9768 ± 2.5594** |
| time (s) | **0.218793** | 0.362228 | 0.813517 |
| Diabetes | 75.6744 ± 3.1206 | **76.6450 ± 2.7918** | 75.4544 ± 2.8918 |
| time (s) | **0.114313** | 0.159036 | 0.195348 |
| Monk2 | 79.8147 ± 5.4047 | **86.4165 ± 3.6584** | 83.7312 ± 4.2069 |
| time (s) | **0.047766** | 0.210939 | 0.177863 |
| Phoneme | 77.4807 ± 1.0521 | **77.7872 ± 1.1616** | 76.9989 ± 1.4930 |
| time (s) | **0.338369** | 0.497339 | 0.747009 |
| Saheart | 72.2705 ± 3.7282 | **72.6528 ± 4.3875** | 72.6122 ± 3.9976 |
| time (s) | **0.123355** | 0.183273 | 0.270993 |
| Spectfheart | 77.2806 ± 7.2642 | **79.0070 ± 4.9500** | 78.5426 ± 7.4243 |
| time (s) | **0.772305** | 5.034258 | 1.382591 |
| Twonorm | 97.7586 ± 0.3190 | **97.8059 ± 0.3403** | 97.7232 ± 0.3141 |
| time (s) | **0.413597** | 0.710938 | 0.656289 |
| Ring | 76.3803 ± 0.9844 | 76.6319 ± 0.9745 | **76.6335 ± 0.9697** |
| time (s) | **0.538940** | 0.678837 | 0.889948 |
| Ionosphere | 86.9708 ± 3.6478 | **88.3367 ± 3.4795** | 88.1153 ± 3.2979 |
| time (s) | **0.464421** | 0.563288 | 0.726181 |

4.2. Non-Linear Models

Table 5 and Table 6 show the experimental results for non-linear models based on the RBF kernel, one of the most commonly used kernels. This kernel is of the form
$$K(x,y) = e^{-\gamma\|x-y\|^2},$$
where $\gamma > 0$ is a parameter. However, to reduce problem size and thus computation time, we used the RBF sampler (RBFSampler in scikit-learn). This randomized technique avoids evaluating the kernel and forming the large Gram matrices that arise with large datasets, thereby reducing problem size and computation time. It works by approximating the feature map of the RBF kernel, mapping the given data into a vector space of substantially lower dimension than the (infinite-dimensional) RBF feature space, in which the linear support vector machine models can still be applied.
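The idea behind the RBF sampler can be checked numerically. The sketch below uses scikit-learn's `RBFSampler` (random Fourier features) and `rbf_kernel` on hypothetical random data. The values of `gamma` and `n_components` and the seeds are illustrative assumptions, not the settings used for Tables 5 and 6. Inner products of the sampled features approximate the exact kernel matrix, so a linear model trained on `Z` stands in for a kernel model. Here `n_components` exceeds the number of samples only to make the approximation error small. On large datasets the benefit comes from choosing it much smaller than the number of samples.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
data = rng.normal(size=(30, 5))          # hypothetical data: 30 samples, 5 features
gamma = 0.2

exact = rbf_kernel(data, gamma=gamma)    # exact Gram matrix, entries exp(-gamma ||x - y||^2)

sampler = RBFSampler(gamma=gamma, n_components=5000, random_state=0)
Z = sampler.fit_transform(data)          # explicit random features, shape (30, 5000)
approx = Z @ Z.T                         # inner products approximate the RBF kernel

print(Z.shape)                           # (30, 5000)
print(np.abs(exact - approx).max())      # small; shrinks roughly like 1/sqrt(n_components)
```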
Table 5. Kernel TBSVM performance with various loss functions, fixed splits (Acc ± sd).
| Dataset / Loss function | $\phi_{\tau_1}(u,\epsilon_1)$ | $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$ | $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ |
|---|---|---|---|
| Australian | 82.4638 ± 2.3098 | 83.6232 ± 2.8102 | **84.0580 ± 1.8332** |
| time (s) | 1.736451 | **0.666318** | 2.079387 |
| Diabetes | 74.9979 ± 1.8926 | 75.3875 ± 3.2180 | **75.9087 ± 4.1264** |
| time (s) | 5.969881 | 1.443527 | **1.220819** |
| Monk2 | 91.4274 ± 1.7705 | **95.5948 ± 1.5548** | 94.8998 ± 1.5939 |
| time (s) | **0.772924** | 3.983829 | 1.956448 |
| Phoneme | 79.4781 ± 0.5841 | 79.8668 ± 0.6885 | **80.3479 ± 0.9708** |
| time (s) | **2.184906** | 2.325309 | 3.43196 |
| Saheart | 74.0112 ± 4.6302 | 74.0089 ± 5.8561 | **74.2380 ± 4.2839** |
| time (s) | **1.177418** | 1.585177 | 1.247391 |
| Spectfheart | 83.5360 ± 4.6123 | 83.1586 ± 4.0473 | **84.6681 ± 5.5610** |
| time (s) | **0.471961** | 1.081096 | 1.35906 |
| Twonorm | 93.7162 ± 0.8547 | **94.6486 ± 0.5881** | 94.4595 ± 0.7914 |
| time (s) | 4.962333 | **2.749745** | 2.804496 |
| Ring | 95.0676 ± 0.4441 | **96.0000 ± 0.4041** | 95.7973 ± 0.5429 |
| time (s) | **1.994101** | 3.135659 | 3.292427 |
| Ionosphere | 90.3219 ± 2.8753 | **93.1630 ± 2.4558** | 91.7384 ± 3.5440 |
| time (s) | **0.955793** | 2.926564 | 2.245855 |
Table 6. Kernel TBSVM performance with various loss functions, variable splits (Acc ± sd).
| Dataset / Loss function | $\phi_{\tau_1}(u,\epsilon_1)$ | $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$ | $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ |
|---|---|---|---|
| Australian | 79.9333 ± 3.2045 | 81.1130 ± 3.1104 | **81.9826 ± 3.0698** |
| time (s) | 1.100957 | **0.692429** | 1.772433 |
| Diabetes | 70.8914 ± 3.5184 | **74.4581 ± 3.1129** | 73.7404 ± 3.0545 |
| time (s) | 2.557176 | 1.513731 | **1.235133** |
| Monk2 | 89.4627 ± 4.2806 | **94.3527 ± 2.2290** | 93.3532 ± 2.7422 |
| time (s) | **1.751879** | 4.059142 | 2.015005 |
| Phoneme | 78.1296 ± 2.2966 | **79.8060 ± 1.0184** | 78.9168 ± 1.7498 |
| time (s) | 2.553766 | **1.772117** | 4.101333 |
| Saheart | 69.8903 ± 4.3887 | 71.9447 ± 4.3071 | **72.1198 ± 3.8393** |
| time (s) | 1.996333 | 1.579952 | **1.461621** |
| Spectfheart | 72.5744 ± 10.0496 | 79.9280 ± 4.8487 | **80.0032 ± 4.6938** |
| time (s) | **0.864064** | 1.085297 | 1.819914 |
| Twonorm | 93.3878 ± 0.7616 | **94.0049 ± 0.9483** | 93.8722 ± 0.7545 |
| time (s) | 6.243437 | 2.917959 | **2.576174** |
| Ring | **94.9878 ± 0.4759** | 94.9262 ± 1.6240 | 93.4192 ± 3.6378 |
| time (s) | **1.546431** | 3.376859 | 3.962855 |
| Ionosphere | 86.9302 ± 5.1465 | **91.2884 ± 2.8932** | 90.3624 ± 3.3991 |
| time (s) | **1.512256** | 4.066119 | 2.140556 |
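Overall, with fixed splits (Table 5), our proposed smooth loss function delivers the highest accuracy on five datasets (Australian, Diabetes, Phoneme, Saheart, and Spectfheart), and with variable splits (Table 6) on three (Australian, Saheart, and Spectfheart). Moreover, with variable splits, our smooth loss function requires the least training time on three of the nine datasets, namely Diabetes, Saheart, and Twonorm.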

4.3. Noise Sensitivity

The generalized pinball loss function [7] has been shown to reduce noise sensitivity and to improve stability under resampling. To evaluate the noise sensitivity of the algorithms with our smooth loss functions, zero-mean normally distributed noise with standard deviation $r$ was added to the selected UCI datasets, for noise ratios $r$ = 0.02, 0.05, 0.10, 0.15, 0.20, 0.25, and 0.30.
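Below is a minimal sketch of the noise injection. It assumes that the features are standardized first, so that $r$ is measured relative to each feature's standard deviation, and that noise is drawn independently for every entry. Both are our assumptions, since the text does not specify these details. The function name, the toy data, and the seeds are hypothetical.

```python
import numpy as np

NOISE_RATIOS = [0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]  # values of r from the text


def add_gaussian_noise(features, r, seed=None):
    """Copy of `features` plus zero-mean Gaussian noise with standard deviation r."""
    rng = np.random.default_rng(seed)
    return features + rng.normal(loc=0.0, scale=r, size=features.shape)


# Hypothetical data, standardized so that r is on the scale of each feature's std (assumption).
raw = np.random.default_rng(0).normal(loc=5.0, scale=3.0, size=(200, 4))
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0)
noisy_versions = {r: add_gaussian_noise(standardized, r, seed=1) for r in NOISE_RATIOS}
print(sorted(noisy_versions))
```

Each noisy copy would then be passed through the same fixed-split or variable-split protocol as the clean data.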
Since the Twonorm dataset yields the highest accuracies in the linear setting (Tables 3 and 4), its results are presented first. Figure 2 shows the performance of the BFGS algorithm applied to the TBSVM model at different noise levels. The loss functions compared are our proposed loss (labeled BFGS-SGPTBSVM in the figures), the generalized pinball loss (BFGS-GPTBSVM), and the smooth approximation to the pinball loss (BFGS-SPTBSVM).
From Figure 2, we observe that the accuracy of all algorithms tends to decrease as the noise level increases. Figure 3 shows the performance at different noise levels for the Australian dataset. Overall, our smooth generalized pinball loss function retains the noise insensitivity of the generalized pinball loss function and, on the Australian dataset, shows lower noise sensitivity in most cases.
The experimental results indicate that the proposed method performs stably across repeated runs, as reflected by relatively small standard deviations. Moreover, the smoothing parameter $\mu$ allows a controlled trade-off between approximation accuracy and numerical stability. While the proposed smooth generalized pinball loss involves higher-order polynomial terms and may incur additional per-iteration computational cost compared with simpler pinball losses, the resulting twice continuously differentiable objective function enables efficient quasi-Newton optimization. This partially compensates for the increased complexity, leading to acceptable overall training times.
Although widely used, the UCI benchmark datasets may not fully reflect the complexity of highly structured, large-scale, or application-specific data. Therefore, the reported experiments primarily serve to validate the general effectiveness and stability of the proposed loss function. Future work will investigate the performance of the proposed approach on more challenging datasets, including imbalanced and domain-specific problems.

5. Conclusions and Discussion

This paper has introduced a family of smooth generalized pinball loss functions to address the non-differentiability of traditional loss functions such as the hinge loss, the pinball loss, and the generalized pinball loss. In addition, a novel twin-bounded support vector machine model with a smooth generalized pinball loss function was proposed. We proved that the generalized pinball loss function can be approximated by the proposed smooth generalized pinball loss function in the uniform norm with arbitrary precision, and that the solution of our TBSVM model is unique and converges to that of the non-smooth problem as $\mu \to 0$. In the experiments, we selected nine UCI datasets and used a quasi-Newton method (BFGS) to solve the corresponding strongly convex unconstrained optimization problems with twice continuously differentiable objective functions. We then compared the proposed BFGS-SGPTBSVM algorithm with the BFGS-GPTBSVM and BFGS-SPTBSVM algorithms in terms of classification accuracy and computational speed. The numerical experiments show that the proposed BFGS-SGPTBSVM algorithm performs best for the TBSVM with RBFSampler, where it achieves the highest accuracy on five of the nine datasets with fixed splits.
The proposed smooth generalized pinball loss-based TBSVM is particularly suitable for classification problems involving noisy, imbalanced, or asymmetric data distributions, where robustness and stable optimization are critical. For problems requiring explicit physical constraints or domain-specific modeling, hybrid loss formulations may be more appropriate.
In future studies, we plan to assess the performance of our model in experiments with complex, large-scale datasets. Moreover, we will evaluate the sensitivity to hyperparameters and improve the parameter optimization procedure to enhance speed and efficacy. Finally, we will apply our proposed loss function to other support vector machine models.

Author Contributions

Conceptualization, P.Y. and E.S.; Methodology, P.Y. and E.S.; Software, P.S., P.Y. and E.S.; Validation, P.S., P.Y. and E.S.; Formal analysis, P.S., P.Y. and E.S.; Investigation, P.S., P.Y. and E.S.; Resources, P.S., P.Y. and E.S.; Data curation, P.S., P.Y. and E.S.; Writing—original draft, P.S., P.Y. and E.S.; Writing—review and editing, P.S., P.Y. and E.S.; Visualization, P.Y. and E.S.; Supervision, P.Y. and E.S.; Project administration, P.Y. and E.S.; Funding acquisition, P.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Suranaree University of Technology and the Development and Promotion of Science and Technology Talents Project Scholarship.

Data Availability Statement

The original data presented in the study are openly available in the UCI Machine Learning Repository at https://archive.ics.uci.edu (accessed on 5 June 2024).

Acknowledgments

The first author (P.S.) wishes to express thanks for the financial support received from the School of Mathematical Sciences and Geoinformatics at SUT, and through a Scholarship of the Development and Promotion of Science and Technology Talents Project.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SVM: Support Vector Machine
TWSVM: Twin Support Vector Machine
TBSVM: Twin-Bounded Support Vector Machine
BFGS: Broyden–Fletcher–Goldfarb–Shanno

References

1. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
2. Joachims, T. Text categorization with support vector machines: Learning with many relevant features. In Machine Learning: ECML-98, Proceedings of the 10th European Conference on Machine Learning, Chemnitz, Germany, 21–23 April 1998; Nédellec, C., Rouveirol, C., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; pp. 137–142.
3. Yin, H.; Jiao, X.; Chai, Y.; Fang, B. Scene classification based on single-layer SAE and SVM. Expert Syst. Appl. 2015, 42, 3368–3380.
4. Khemchandani, R.; Chandra, S. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 905–910.
5. Shao, Y.H.; Zhang, C.H.; Wang, X.B.; Deng, N.Y. Improvements on twin support vector machines. IEEE Trans. Neural Netw. 2011, 22, 962–968.
6. Huang, X.; Shi, L.; Suykens, J.A. Support vector machine classifier with pinball loss. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 984–997.
7. Rastogi, R.; Pal, A.; Chandra, S. Generalized Pinball Loss SVMs. Neurocomputing 2018, 322, 151–165.
8. Makmuang, D.; Ratiphaphongthon, W.; Wangkeeree, R. Smooth support vector machine with generalized pinball loss for pattern classification. J. Supercomput. 2023, 79, 11684–11706.
9. Li, K.; Lv, Z. Smooth twin bounded support vector machine with pinball loss. Appl. Intell. 2021, 51, 5489–5505.
10. Shi, Y.; Zhang, L.; Wang, Z.; Li, X. Smooth and semi-smooth pinball twin support vector machine. Expert Syst. Appl. 2023, 226, 120084.
11. Shan, X.; Zhang, Z.; Li, X.; Xie, Y.; You, J. Robust online support vector regression with truncated ϵ-insensitive pinball loss. Mathematics 2023, 11, 709.
12. Li, F.; Yang, H. A novel bounded loss framework for support vector machines. Neural Netw. 2024, 178, 104–118.
13. Wang, L.; Liu, Z. The SVM classifier with quartic truncated pinball loss. Appl. Math. 2025, 16, 245–260.
14. Diao, S. Support vector machine classifier with rescaled huberized pinball loss. arXiv 2025, arXiv:2511.22065.
15. Song, L.K.; Tao, F.; Peng, G.Z. Mixed loss-guided modular regression for dependent system reliability. Reliab. Eng. Syst. Saf. 2026, 267, 111898.
16. Ding, S.; Yu, J.; Qi, B.; Huang, H. An overview on twin support vector machines. Artif. Intell. Rev. 2014, 42, 245–252.
17. Dua, D.; Graff, C. UCI Machine Learning Repository. 2019. Available online: http://archive.ics.uci.edu/ml (accessed on 5 June 2024).
Figure 1. $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ for various values of $\mu$.
Figure 2. Performance at different noise levels for the Twonorm dataset. (a) Fixed splits, linear model; (b) fixed splits, RBFSampler; (c) variable splits, linear model; and (d) variable splits, RBFSampler.
Figure 3. Performance at different noise levels for the Australian dataset. (a) Fixed splits, linear model; (b) fixed splits, RBFSampler; (c) variable splits, linear model; and (d) variable splits, RBFSampler.
Table 1. Properties of the nine UCI datasets.
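| Dataset name | Number of instances | Number of features |
|---|---|---|
| Australian | 690 | 14 |
| Diabetes | 768 | 8 |
| Monk2 | 432 | 6 |
| Phoneme | 5404 | 5 |
| Saheart | 462 | 9 |
| Spectfheart | 267 | 44 |
| Twonorm | 7400 | 20 |
| Ring | 7400 | 20 |
| Ionosphere | 351 | 33 |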
Table 2. The range of parameters in the random grid search. The symbol ‘+’ denotes the range for the positive TBSVM, and ‘−’ for the negative TBSVM.
| Model/Loss function | Parameter | Range |
|---|---|---|
| $\phi_{\tau_1}(u,\epsilon_1)$, $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$, $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ | $\epsilon_1^+$ | 0.01, 0.05, 0.1, 0.2, 0.25, 0.45, 0.6, 0.7, 0.8, 1, 2 |
| | $\epsilon_1^-$ | 0.01, 0.05, 0.1, 0.2, 0.25, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 1 |
| | $\tau_1^+$ | 0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2 |
| | $\tau_1^-$ | 0.1, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2 |
| $L_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u)$, $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ | $\epsilon_2^+$ | 0.1, 0.25, 0.45, 0.7, 0.8, 1, 1.5, 2 |
| | $\epsilon_2^-$ | 0.1, 0.25, 0.45, 0.5, 0.7, 0.8, 1, 1.5, 2 |
| | $\tau_2^+$ | 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1, 3.5 |
| | $\tau_2^-$ | 0.1, 0.3, 0.5, 0.7, 0.9, 1, 2 |
| $P_{\tau_1,\tau_2}^{\epsilon_1,\epsilon_2}(u,\mu)$ | $\mu^+$ | 0.01, 0.2, 0.4, 0.6, 0.8, 1, 2 |
| | $\mu^-$ | 0.01, 0.2, 0.4, 0.6, 1, 2 |
| TBSVM penalties | $c_1, c_2, c_3, c_4$ | 0.1, 0.2, 0.4, 0.8, 1.2, 1.6, 2, 3.2, 6, 10, 20, 50, 80 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Srichok, P.; Yimmuang, P.; Schulz, E. A Novel Twin-Bounded Support Vector Machine with Smooth Generalized Pinball Loss. Mathematics 2026, 14, 549. https://doi.org/10.3390/math14030549

AMA Style

Srichok P, Yimmuang P, Schulz E. A Novel Twin-Bounded Support Vector Machine with Smooth Generalized Pinball Loss. Mathematics. 2026; 14(3):549. https://doi.org/10.3390/math14030549

Chicago/Turabian Style

Srichok, Patcharapa, Panu Yimmuang, and Eckart Schulz. 2026. "A Novel Twin-Bounded Support Vector Machine with Smooth Generalized Pinball Loss" Mathematics 14, no. 3: 549. https://doi.org/10.3390/math14030549

APA Style

Srichok, P., Yimmuang, P., & Schulz, E. (2026). A Novel Twin-Bounded Support Vector Machine with Smooth Generalized Pinball Loss. Mathematics, 14(3), 549. https://doi.org/10.3390/math14030549
