1. Introduction
The support vector machine (SVM), as introduced by Vapnik and Cortes [1], is a machine learning technique founded on the principles of the Vapnik–Chervonenkis (VC) dimension and structural risk minimization theory within the realm of statistical learning. It is a powerful and widely used supervised learning algorithm, employed primarily for classification and regression tasks such as text categorization [2] and scene classification [3], to name just two examples. In classification, the SVM seeks a separating hyperplane that maximizes the margin, that is, the distance between the hyperplane and the nearest training points of the two classes, so that the training data are classified correctly with a high level of confidence. Through the introduction of “slack variables” and the “kernel trick”, SVMs can effectively handle high-dimensional data and complex, non-linear decision boundaries.
The twin support vector machine (TWSVM), an extension of the traditional SVM, has garnered considerable attention for its potential to address complex data distributions. Khemchandani and Chandra [4] proposed the TWSVM, which constructs two non-parallel hyperplanes, each lying close to one of the two classes while keeping a minimum distance from the other class. The TWSVM reduces the algorithmic complexity to roughly a quarter of that of the standard SVM, which significantly reduces the computational time. Shao and co-workers [5] modified the TWSVM to obtain the twin-bounded support vector machine (TBSVM), which adds a regularization term implementing the structural risk minimization principle and thereby improves generalization performance.
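The factor of one quarter follows from the usual operation count, sketched here under the standard assumption that solving a quadratic programming problem (QPP) with $m$ constraints costs on the order of $m^3$ operations. The SVM solves a single QPP whose $m = m_1 + m_2$ constraints come from both classes, whereas each of the two TWSVM problems only has constraints for the samples of the other class. For balanced classes, $m_1 \approx m_2 \approx m/2$, so that

$$
\underbrace{\left(\tfrac{m}{2}\right)^{3} + \left(\tfrac{m}{2}\right)^{3}}_{\text{TWSVM: two QPPs}} \;=\; \frac{m^{3}}{4} \;=\; \frac{1}{4}\,\underbrace{m^{3}}_{\text{SVM: one QPP}} .
$$

For strongly unbalanced classes the saving is smaller, since $m_1^3 + m_2^3$ approaches $m^3$ when one class dominates.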
Creating the hyperplanes in any of the support vector machine models involves solving a constrained convex minimization problem. The most common solution technique is to formulate the dual optimization problem by means of Lagrange multipliers and to solve this dual problem. More recently, techniques for solving the primal problem directly have become popular. These methods use a “loss function” to reformulate the problem as an unconstrained optimization problem. Common loss functions include the hinge loss, the pinball loss [6], and the generalized pinball loss [7], listed in order of increasing generality. Because these loss functions are not differentiable, efficient numerical methods such as gradient descent or Newton-type methods either cannot be applied or cannot be applied on a solid theoretical foundation. A theoretical analysis generally requires the objective function to be of class $C^2$, that is, twice continuously differentiable.
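To make the three losses and their kinks concrete, the following NumPy sketch evaluates them on the margin variable $u$ (for instance $u = 1 - y f(x)$). It assumes the standard forms from the cited works: $\max(0, u)$ for the hinge loss, $u$ for $u \ge 0$ and $-\tau u$ for $u < 0$ for the pinball loss, and $\max\{\tau_1 u - \epsilon_1,\, 0,\, -\tau_2 u - \epsilon_2\}$ for the generalized pinball loss. The function names and parameter values are illustrative; the one-sided difference quotients expose the jumps in slope that rule out gradient-based methods with a classical convergence theory.

```python
import numpy as np


def hinge(u):
    """Hinge loss max(0, u) of the margin variable u."""
    return np.maximum(0.0, u)


def pinball(u, tau):
    """Pinball loss: u for u >= 0 and -tau * u for u < 0."""
    return np.where(u >= 0, u, -tau * u)


def generalized_pinball(u, tau1, tau2, eps1, eps2):
    """Generalized pinball loss max{tau1*u - eps1, 0, -tau2*u - eps2}.

    For tau1, tau2 > 0 it vanishes on the insensitive zone [-eps2/tau2, eps1/tau1]
    and has a kink at each end point of that zone.
    """
    u = np.asarray(u, dtype=float)
    return np.maximum(np.maximum(tau1 * u - eps1, 0.0), -tau2 * u - eps2)


def one_sided_slopes(f, u0, h=1e-6):
    """Left and right difference quotients of f at u0; they differ at a kink."""
    return (f(u0) - f(u0 - h)) / h, (f(u0 + h) - f(u0)) / h


tau1, tau2, eps1, eps2 = 1.0, 0.5, 0.2, 0.1   # illustrative values only
gp = lambda u: generalized_pinball(u, tau1, tau2, eps1, eps2)
print(one_sided_slopes(gp, eps1 / tau1))      # kink at u = 0.2: slopes approx (0, 1)
print(one_sided_slopes(gp, -eps2 / tau2))     # kink at u = -0.2: slopes approx (-0.5, 0)
```

Setting $\tau_1 = 1$ and $\epsilon_1 = \epsilon_2 = 0$ recovers the pinball loss with $\tau = \tau_2$, and additionally setting $\tau_2 = 0$ recovers the hinge loss, which is the sense in which the three losses are nested.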
In 2023, Makmuang, Ratiphaphongthon, and Wangkeeree introduced a smooth approximation to the generalized pinball loss function within the standard SVM framework [8]. Their results show that, on average, the proposed method outperforms the baseline models. Similarly, Kai and Zhen [9] introduced a smooth approximation to the pinball loss function in the twin-bounded support vector machine model to mitigate the noise sensitivity and resampling instability associated with the hinge loss function.
A variety of further loss functions have been proposed in the literature. In [10], a piecewise-quadratic loss is introduced that smooths the pinball loss and belongs to the class of $C^1$-functions. In [11], a truncated $\epsilon$-insensitive pinball loss is investigated, which is only piecewise smooth. In [12], the pinball loss is modified by an S-curve to obtain a family of loss functions depending on three parameters. In [13], a quartic truncated pinball loss is introduced, which is differentiable and bounded; hence, it is non-convex. Furthermore, in [14], a rescaled huberized pinball loss is discussed, which again is differentiable, bounded, and non-convex. Finally, recent studies [15] have explored hybrid data–physics loss formulations, which incorporate physical constraints or domain knowledge into the training process. While such approaches may offer improved interpretability, they are tailored to specific applications and require knowledge of the physical setting. Thus, there is a scarcity of $C^2$-smooth convex loss functions that approximate the pinball loss and can be applied to a variety of data.
The main contributions of this work can be summarized as follows. First, we propose a novel one-parameter family of $C^2$-smooth loss functions, which renders the objective function in the unconstrained formulation of the SVM $C^2$-smooth as well. We further show that these smooth functions approximate the non-smooth generalized pinball loss function with arbitrary precision in the uniform norm. Second, we incorporate the proposed smooth loss into the twin-bounded support vector machine framework and rigorously prove the uniqueness of the solutions and their convergence to the solutions obtained with the non-smooth generalized pinball loss as the smoothing parameter approaches zero. Third, unlike existing smooth pinball or generalized pinball loss formulations, our work provides a theoretical foundation for both the SVM and the TBSVM models that links smooth loss approximation, optimization stability, and solution behavior. Finally, numerical experiments on benchmark datasets demonstrate that the proposed approach achieves competitive accuracy with stable training behavior.
By comparing the TBSVM equipped with this novel loss function against other contemporary approaches, this study aims to provide a comprehensive perspective on the strengths and limitations of the methodology. It concentrates on an extensive comparative analysis of how the various loss functions and their generalizations affect model performance.
The remainder of this paper is organized as follows. Section 2 reviews background material on SVMs, TWSVMs, TBSVMs, and loss functions. Section 3 introduces the proposed smooth generalized pinball loss and presents the theoretical analysis. Section 4 reports on numerical experiments, and Section 5 concludes the paper.
3. Proposed Work
This section proposes a novel one-parameter family of loss functions, proves that these functions converge uniformly to the generalized pinball loss function as the parameter decreases, and develops and analyzes an SVM model and a TBSVM model incorporating the novel loss.
3.1. The Proposed Smooth Loss Function
To smooth the generalized pinball loss
, we define functions
:
, which depend on the real variable
u and are parameterized by an approximation parameter
, by
where the parameters
take non-negative real values. Note that when
, we recover the generalized pinball loss
.
Figure 1 displays the graphic representation of
for various values of the parameter
, with
fixed (
).
It is easy to verify that the
are
$C^2$-functions and belong to a category of smoothing functions for the generalized pinball loss.
Figure 1 hints that the mapping
converges to
as the smoothing parameter tends to zero, which we prove below.
In the support vector machine models described below, the parameters $\tau_1$ and $\tau_2$ control the asymmetric penalties assigned to positive and negative misclassification errors, respectively, while $\epsilon_1$ and $\epsilon_2$ determine the width of the insensitivity region and thus the tolerance to noise. The smoothing parameter governs the trade-off between approximation accuracy and numerical smoothness: smaller values yield a closer approximation to the original generalized pinball loss, whereas larger values improve numerical stability during optimization.
In the following proofs, we will use the symbols and in place of and , respectively, since the parameters remain fixed.
The next theorem shows that our proposed smooth loss functions indeed approach the generalized pinball loss uniformly as the parameter tends to zero.
Theorem 1. Let and be defined as in (11) and (14), respectively. Then - (i)
for all , , where = max,
- (ii)
uniformly on .
Proof. (i) Let be fixed and . In light of the definitions of and , we divide the proof into five cases, depending on the values of the variable u.
Case 1:
. We obtain
Case 2:
. We have
Set
, we obtain that
Let
. Then,
x lies in the interval
, and
, so that
. The critical points of the function
are
and
, so that
is monotone on
. As
and
, therefore,
on
, and hence
is an increasing function on
. In particular,
Case 3:
. We obtain
Case 4:
. Proceeding similarly to Case 2, we obtain that
is a decreasing function on
. Then
Case 5:
. Proceeding similarly to Case 1, we obtain
(ii) By the Squeeze Theorem, it follows that
uniformly on
, as
. □
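Theorem 1 can be checked numerically by sampling the uniform-norm error on a fine grid. Because the formula (14) is not reproduced in this sketch, the code uses a different, clearly labeled stand-in: the log-sum-exp ("soft maximum") smoothing of the three affine pieces of the generalized pinball loss, which is convex, infinitely differentiable, and lies between $L$ and $L + \alpha \log 3$ at every point. The parameter values are hypothetical; substituting an implementation of (14) for `lse_smoothing` turns the script into a numerical check of the uniform convergence asserted in Theorem 1.

```python
import numpy as np


def pieces(u, tau1, tau2, eps1, eps2):
    """The three affine pieces whose maximum is the generalized pinball loss."""
    u = np.asarray(u, dtype=float)
    return np.stack([tau1 * u - eps1, np.zeros_like(u), -tau2 * u - eps2])


def generalized_pinball(u, *params):
    """Generalized pinball loss max{tau1*u - eps1, 0, -tau2*u - eps2}."""
    return pieces(u, *params).max(axis=0)


def lse_smoothing(u, alpha, *params):
    """Stand-in smoothing, NOT the loss (14): alpha * log(sum_i exp(piece_i / alpha)).

    Convex, infinitely differentiable, and within [0, alpha*log(3)] of the
    generalized pinball loss at every u.
    """
    return alpha * np.logaddexp.reduce(pieces(u, *params) / alpha, axis=0)


def sup_error(smooth_loss, alpha, params, grid):
    """Grid estimate of sup_u |smooth_loss(u) - generalized_pinball(u)|."""
    diff = smooth_loss(grid, alpha, *params) - generalized_pinball(grid, *params)
    return np.max(np.abs(diff))


params = (1.0, 0.5, 0.2, 0.1)              # hypothetical (tau1, tau2, eps1, eps2)
grid = np.linspace(-5.0, 5.0, 20001)
errors = {}
for alpha in (1.0, 0.1, 0.01, 0.001):
    errors[alpha] = sup_error(lse_smoothing, alpha, params, grid)
    print(f"alpha={alpha:<6} sup error={errors[alpha]:.6f}  bound={alpha * np.log(3):.6f}")
```

A grid only gives a lower estimate of the supremum over $\mathbb{R}$, but since both functions share the same affine tails, the largest deviations occur near the kinks, which the grid resolves.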
3.2. The Support Vector Machine with Smooth Loss Function
Consider a dataset
. We replace the usage of the hinge loss function
with the proposed smooth approximation
, where
is the smoothing parameter. The SVM model (2) then becomes the optimization problem
where
.
In the following, we demonstrate that each optimization problem in the family defined by (16) admits a unique solution, and that, as the smoothing parameter tends to zero, these solutions converge to the solution of the exact problem:
The next theorem shows that the objective function of (16) converges uniformly to the objective function of (17) as the smoothing parameter tends to zero.
Theorem 2. Let = max, then for all and ,
.
Proof. For all
as
,
It follows that
. □
Finally, we show that the solution of (16) converges to the solution of (17) as the smoothing parameter tends to zero.
Theorem 3. Let and be defined as in (16) and (17), respectively, and let be an optimal solution of problem (17). Then: - (i)
There exists a unique solution of problem (16); - (ii)
;
- (iii)
as .
Proof. Let be fixed.
(i) Let
be arbitrary, but given. For
, the set
is called a sublevel set. Clearly, we can pick
so that
. Since
is continuous, the sublevel set
is closed. By the definition of
,
Since the set
is bounded, it follows that
is bounded. Since
is closed and bounded, we obtain that
is a compact set in
. By the Extreme Value Theorem, it follows that
has a minimizer
on
, i.e., problem (16) has a solution. On the other hand,
is convex and
is affine in
; therefore,
is convex. As
is strongly convex, it follows that
is also strongly convex. By strong convexity, the solution of problem (16) is unique.
(ii) Let
and
be optimal solutions of (16) and (17), respectively.
Let
be the gradient of
and
be subgradients of
; that is,
where
By strong convexity with parameter 1, we obtain that
and
The first-order optimality condition for convex functions states that a point is a minimizer if and only if the subdifferential at that point contains zero. Thus, we obtain
and
Consider
and, by Theorem 2, we obtain
(iii) By the above and the Squeeze Theorem, we obtain that as . □
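Part (i) of Theorem 3 can also be observed numerically: because the smoothed objective is strongly convex and twice differentiable, a damped Newton method started from different points should return the same minimizer. The sketch makes two assumptions that the text above does not fix: problem (16) is taken in the form $\tfrac12\|(w,b)\|^2 + C\sum_i L(1 - y_i(w^\top x_i + b))$, and the smooth loss is a log-sum-exp stand-in rather than (14). The data, $C$, the smoothing parameter, and the loss parameters are hypothetical.

```python
import numpy as np

TAU1, TAU2, EPS1, EPS2 = 1.0, 0.5, 0.2, 0.1      # hypothetical loss parameters
SLOPES = np.array([TAU1, 0.0, -TAU2])[:, None]   # derivatives of the three affine pieces


def smooth_loss(u, alpha):
    """Log-sum-exp stand-in (NOT eq. (14)): value, first and second derivative in u."""
    p = np.stack([TAU1 * u - EPS1, np.zeros_like(u), -TAU2 * u - EPS2]) / alpha
    lse = np.logaddexp.reduce(p, axis=0)
    w = np.exp(p - lse)                          # softmax weights over the pieces
    d1 = (w * SLOPES).sum(axis=0)
    d2 = ((w * SLOPES**2).sum(axis=0) - d1**2) / alpha
    return alpha * lse, d1, d2


def objective(z, Xt, y, C, alpha):
    """Assumed form of (16) with z = (w, b) and Xt = [X, 1]: value, gradient, Hessian."""
    u = 1.0 - y * (Xt @ z)
    val, d1, d2 = smooth_loss(u, alpha)
    f = 0.5 * z @ z + C * val.sum()
    g = z - C * Xt.T @ (d1 * y)                  # chain rule: du_i/dz = -y_i [x_i, 1]
    H = np.eye(z.size) + C * (Xt.T * d2) @ Xt    # uses y_i**2 = 1
    return f, g, H


def newton(z0, Xt, y, C, alpha, tol=1e-9, max_iter=100):
    """Damped Newton method with Armijo backtracking."""
    z = np.array(z0, dtype=float)
    for _ in range(max_iter):
        f, g, H = objective(z, Xt, y, C, alpha)
        if np.linalg.norm(g) < tol:
            break
        step = np.linalg.solve(H, -g)            # H - I is positive semidefinite
        t = 1.0
        while objective(z + t * step, Xt, y, C, alpha)[0] > f + 1e-4 * t * (g @ step) and t > 1e-12:
            t *= 0.5
        z = z + t * step
    return z


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.0, (50, 2)), rng.normal(-1.0, 1.0, (50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])
Xt = np.hstack([X, np.ones((100, 1))])

starts = [np.random.default_rng(s).normal(scale=5.0, size=3) for s in (1, 2, 3)]
solutions = np.array([newton(z0, Xt, y, C=1.0, alpha=0.1) for z0 in starts])
print(np.round(solutions, 6))   # all rows should agree
```

Newton's method is used here only because the stand-in loss has an explicit second derivative; the BFGS approach of Section 3.4 needs gradients only.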
3.3. The Twin-Bounded Support Vector Machine with Smooth Loss Function
The combination of the TBSVM with the new smooth generalized pinball loss function is designed to improve the algorithm’s performance on complex or noisy data, with which traditional loss functions such as the hinge loss often struggle. Accordingly, we introduce a novel TBSVM with the smooth generalized pinball loss that aims to reduce noise sensitivity and to better handle imbalanced data. Because the resulting objective functions are twice continuously differentiable, the corresponding optimization problems can, in addition, be solved with fast and stable quasi-Newton methods.
The formulation of the TBSVM model (7) and (8) can be simplified to
where
.
We replace the right-most terms with the new smooth loss function
, and obtain the following structure:
Theorem 4. Let = max. Then and for .
Proof. Recall from the proof of Theorem 1 that
. Then
that is,
.
In a similar way, one shows that . □
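For concreteness, the following sketch writes a smoothed first TBSVM problem in unconstrained form and checks its gradient by finite differences. The form is an assumption based on the standard TBSVM of Shao et al. [5], with the slack variables of the class-B constraints replaced by the loss: $\tfrac{c_3}{2}\|z\|^2 + \tfrac12\|[A\;e]z\|^2 + c_1\sum_j L(1 + [B\;e]_j z)$ with $z = (w_1, b_1)$. The loss is a log-sum-exp stand-in, not (14), and $c_1$, $c_3$, and the data are hypothetical.

```python
import numpy as np

TAU1, TAU2, EPS1, EPS2 = 1.0, 0.5, 0.2, 0.1   # hypothetical loss parameters


def smooth_loss(u, alpha):
    """Log-sum-exp stand-in for the smooth loss (NOT eq. (14)) and its derivative."""
    p = np.stack([TAU1 * u - EPS1, np.zeros_like(u), -TAU2 * u - EPS2]) / alpha
    lse = np.logaddexp.reduce(p, axis=0)
    w = np.exp(p - lse)                          # softmax weights over the pieces
    return alpha * lse, TAU1 * w[0] - TAU2 * w[2]


def tbsvm_plane1(z, A, B, c1, c3, alpha):
    """Assumed smooth TBSVM problem for the hyperplane close to class A; z = (w1, b1).

    f(z) = c3/2 ||z||^2 + 1/2 ||[A e] z||^2 + c1 * sum_j L(1 + [B e]_j z)
    Returns the objective value and its gradient.
    """
    H = np.hstack([A, np.ones((A.shape[0], 1))])   # rows [a_i, 1]
    G = np.hstack([B, np.ones((B.shape[0], 1))])   # rows [b_j, 1]
    r = H @ z              # class-A points should lie close to the plane (r near 0)
    u = 1.0 + G @ z        # margin variable of each class-B point
    val, slope = smooth_loss(u, alpha)
    f = 0.5 * c3 * z @ z + 0.5 * r @ r + c1 * val.sum()
    grad = c3 * z + H.T @ r + c1 * G.T @ slope
    return f, grad


rng = np.random.default_rng(0)
A = rng.normal(1.0, 1.0, (40, 2))    # hypothetical class +1 samples
B = rng.normal(-1.0, 1.0, (30, 2))   # hypothetical class -1 samples
z = rng.normal(size=3)
f, grad = tbsvm_plane1(z, A, B, c1=1.0, c3=0.1, alpha=0.1)

# Central finite differences as a gradient check.
h = 1e-6
fd = np.array([(tbsvm_plane1(z + h * e, A, B, 1.0, 0.1, 0.1)[0]
                - tbsvm_plane1(z - h * e, A, B, 1.0, 0.1, 0.1)[0]) / (2 * h)
               for e in np.eye(3)])
print(np.max(np.abs(fd - grad)))
```

Under the same assumptions, the second problem is obtained by exchanging the roles of $A$ and $B$ and using $u = 1 - [A\;e]z_2$ for the class-A points, with $c_2$ and $c_4$ in place of $c_1$ and $c_3$.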
Theorem 5. Let and be the optimal solutions of problems (18) and (19), respectively. Then - (i)
there exist unique solutions of problems (20) and (21), denoted and , respectively. - (ii)
and ,
- (iii)
and as .
Proof. (i) Let
be arbitrary, but fixed. Pick
, so that the sublevel set
is not empty. Since
is continuous, the sublevel set
is closed. Furthermore, by the definition of
,
Since the set
is bounded, it follows that
is bounded. Since
is closed and bounded, we obtain that
is a compact set in
. By the Extreme Value Theorem, a solution to problem (20) exists, which we denote by
. As before, since
is convex,
is convex.
Next, we show that the first term in (20) is convex in
. In fact, as the function
is convex, we have
. This shows that
is convex
, so that
is convex. As
is strongly convex and the remaining terms in (20) are convex, it follows that
is also strongly convex. Hence, the solution of problem (20) is unique. Proceeding in the same way for problem (21), we obtain a unique solution
.
(ii) Let
and
be the optimal solutions of (18) and (20), respectively.
Let
be the gradient of
and
be a subgradient of
, that is
By strong convexity with parameter 1, we obtain that
and
The first-order optimality condition for convex functions states that a point is a minimizer if and only if the subdifferential at that point contains zero. Thus, we obtain
and
By Theorem 1, we have
where
.
Consider
and by Theorem 4, we obtain
The inequality , can be proven in the same way.
(iii) By the above arguments and the Squeeze Theorem, we have as . Similarly, we obtain as . □
In summary, the above theorems establish the convergence properties of the proposed smooth loss within the TBSVM framework. From a practical viewpoint, uniform convergence guarantees predictable model behavior as the smoothing parameter decreases, while strong convexity, and hence the uniqueness of the solution, makes the trained classifier independent of the initialization and stable under small numerical perturbations. These properties are particularly important for large-scale or noisy datasets, where ill-posed optimization problems may otherwise lead to unstable training or inconsistent classification results.
3.4. Quasi-Newton Smooth Generalized Pinball Twin-Bounded Support Vector Machine
The Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, named after its developers, is one of the most widely used quasi-Newton algorithms. In this section, we apply it to find the minimizer of a strongly convex $C^2$ objective function. As before, we focus on the strongly convex differentiable problem (20):
If
denotes the value of
obtained in the
k-th iteration step, then the gradient of the objective function
at
is
where
and the partial derivative of
with respect to the variable
u is
The BFGS method is outlined as follows:
where
and
is determined by the Armijo condition.
Let
be an approximation of the Hessian matrix
. For the next iteration, this approximation is updated by the Sherman–Morrison–Woodbury formula, that is,
where
and
The BFGS algorithm for the strongly convex differentiable problem (21) proceeds in the same way as for problem (20).
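The iteration above can be sketched in a few lines of NumPy. This is a generic textbook BFGS loop with Armijo backtracking, not the authors' code: it maintains an approximation of the inverse Hessian and updates it with the inverse form of the BFGS formula, which is what the Sherman–Morrison–Woodbury formula yields from the rank-two Hessian update. The constants (Armijo constant $10^{-4}$, step halving, tolerances) are our own choices. For a strongly convex objective such as (20), the curvature condition $s_k^\top y_k > 0$ holds automatically, so the update is always well defined.

```python
import numpy as np


def bfgs_armijo(fun_and_grad, z0, tol=1e-8, max_iter=500, c=1e-4, shrink=0.5):
    """Minimize a smooth function with BFGS and Armijo backtracking line search.

    fun_and_grad(z) must return (f(z), gradient of f at z).
    """
    z = np.array(z0, dtype=float)
    n = z.size
    I = np.eye(n)
    Hinv = I.copy()                          # initial inverse-Hessian approximation
    f, g = fun_and_grad(z)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -Hinv @ g                        # quasi-Newton search direction
        t = 1.0
        while True:                          # Armijo: f(z + t d) <= f(z) + c t g.d
            f_new, g_new = fun_and_grad(z + t * d)
            if f_new <= f + c * t * (g @ d) or t < 1e-12:
                break
            t *= shrink
        s = t * d
        y = g_new - g
        sy = s @ y
        if sy > 1e-12:                       # positive for strongly convex f; guards round-off
            rho = 1.0 / sy
            Hinv = (I - rho * np.outer(s, y)) @ Hinv @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
        z, f, g = z + s, f_new, g_new
    return z


# Usage on a strongly convex quadratic 0.5 z'Qz - q'z, whose minimizer solves Qz = q.
rng = np.random.default_rng(0)
M = rng.normal(size=(5, 5))
Q = M.T @ M + np.eye(5)
q = rng.normal(size=5)
z_star = bfgs_armijo(lambda z: (0.5 * z @ Q @ z - q @ z, Q @ z - q), np.zeros(5))
print(np.allclose(z_star, np.linalg.solve(Q, q)))
```

Any function returning a (value, gradient) pair can be passed in, for example an implementation of problem (20).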
3.5. The Kernel Trick
Many real-world datasets have complex structures in which the classes cannot be separated by a hyperplane. To address this, the SVM and TBSVM models map the original feature space into a higher-dimensional feature space in which separating hyperplanes can be found. It turns out that the mapping itself need not be known; this is the so-called kernel trick, which we now explain.
Let
be a mapping, called the feature map, where
is a Hilbert space. Here again,
n is the number of features in the given dataset. Let
denote the linear span of
, so that
is a finite-dimensional subspace of
that is typically of high dimension. For convenience, we relabel the sets of positive and negative data samples by
and
, respectively. We build our TBSVM in
, where (20) and (21) become
Consider the symmetric mapping
, called the kernel, given by
Its Gram matrix is
As
, we can write
where
need not be unique. Then (23) becomes
where
, or equivalently,
that is,
where
denotes the
i-th row of
. Similarly, (24) changes to
with
.
An unknown sample point
is assigned to class
i by the following rule:
Since
, we have
and
and similarly for
, so that (29) becomes
Equations (27), (28), and (30) show that knowledge of the kernel suffices for building a TBSVM model; the specific feature map
need not be known.
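The identities behind (27), (28), and (30) can be verified numerically with a kernel whose feature map is known explicitly. For $K(x, y) = (x^\top y)^2$ on $\mathbb{R}^2$, one valid feature map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$. The sketch takes hypothetical coefficient vectors, forms $w_k = \sum_i \beta_{ki}\phi(x_i)$, and compares the normalized distances $|w_k^\top\phi(x) + b_k| / \|w_k\|$ computed explicitly with those computed from kernel values only. Assigning a point to the class with the smaller normalized distance is our assumption about the form of the rule (29), following the common TWSVM decision rule.

```python
import numpy as np


def poly2_kernel(X, Y):
    """Homogeneous quadratic kernel K(x, y) = (x . y)^2, evaluated for all row pairs."""
    return (X @ Y.T) ** 2


def poly2_feature_map(X):
    """An explicit feature map with phi(x) . phi(y) = (x . y)^2 on R^2."""
    return np.column_stack([X[:, 0] ** 2, np.sqrt(2.0) * X[:, 0] * X[:, 1], X[:, 1] ** 2])


rng = np.random.default_rng(1)
D = rng.normal(size=(6, 2))       # training samples whose images span the subspace
betas = rng.normal(size=(2, 6))   # hypothetical coefficients of w_1 and w_2
offsets = np.array([0.3, -0.2])   # hypothetical offsets b_1 and b_2
x_new = rng.normal(size=(4, 2))   # unknown sample points

# Explicit route: w_k = sum_i beta_ki * phi(d_i), then |w_k . phi(x) + b_k| / ||w_k||.
W = betas @ poly2_feature_map(D)                                   # shape (2, 3)
dist_explicit = np.abs(poly2_feature_map(x_new) @ W.T + offsets) / np.linalg.norm(W, axis=1)

# Kernel route: w_k . phi(x) = sum_i beta_ki K(d_i, x) and ||w_k||^2 = beta_k^T K beta_k.
K = poly2_kernel(D, D)
numerator = np.abs(poly2_kernel(x_new, D) @ betas.T + offsets)
norms = np.sqrt(np.einsum("ki,ij,kj->k", betas, K, betas))
dist_kernel = numerator / norms

labels = np.argmin(dist_kernel, axis=1) + 1   # assign each point to class 1 or 2
print(np.allclose(dist_explicit, dist_kernel), labels)
```

For kernels such as the RBF kernel, only the kernel route is available, because the feature space is infinite-dimensional.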
4. Numerical Experiments
This section illustrates the performance of our proposed algorithm through experiments on nine datasets from the UCI dataset collection [17]. The UCI benchmark datasets provide standardized and widely accepted testbeds for classification algorithms. The chosen datasets are Australian, Diabetes, Ionosphere, Monk2, Phoneme, Ring, Saheart, Spectfheart, and Twonorm, as shown in Table 1.
All computations were performed in Python 3 with the numpy and sklearn packages under the Linux operating system. In the following, we compare three TBSVM algorithms that differ in the loss function used: the smooth approximation to the pinball loss function (13), the generalized pinball loss function (11), and our smooth generalized pinball loss function (14).
For each algorithm, a random grid search was used to optimize the parameters of the TBSVM model and, for our method, the smoothing parameter of the smooth generalized pinball loss function, as shown in Table 2. After obtaining the best parameters, we trained each algorithm with them.
The experimental results are presented in Tables 3–6. Accuracy (in %), standard deviation, and training time (in seconds) are used for evaluation, denoted as Acc, sd, and time (s), respectively. The numbers in bold show the best results for each row.
4.1. Linear Models
To assess the performance of the classifiers, five-fold cross-validation was employed in all experiments. We distinguish two settings, “fixed splits” and “variable splits”. With fixed splits, the same five-fold cross-validation splits used for parameter optimization were also used for evaluation. With variable splits, we report the average results of 50 tests, each using a different five-fold partition; this is a more realistic scenario than fixed splits, because the evaluation splits then differ from those used during parameter tuning.
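The difference between the two evaluation protocols can be sketched as follows. The code is an illustration under our own assumptions: shuffled five-fold partitions generated from a random seed, a hypothetical nearest-centroid classifier standing in for the TBSVM, and synthetic data. Only the split logic mirrors the text, namely the tuning splits reused once versus 50 fresh partitions.

```python
import numpy as np


def five_fold_splits(n_samples, seed):
    """Shuffled 5-fold partition: list of (train_idx, test_idx) pairs."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n_samples), 5)
    return [(np.concatenate(folds[:k] + folds[k + 1:]), folds[k]) for k in range(5)]


def cv_accuracy(X, y, seed, fit_predict):
    """Mean test accuracy of fit_predict over one shuffled 5-fold partition."""
    accs = [np.mean(fit_predict(X[tr], y[tr], X[te]) == y[te])
            for tr, te in five_fold_splits(len(y), seed)]
    return np.mean(accs)


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier (not a TBSVM), used only to run the protocol."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dists, axis=1)]


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 1.5, (100, 3)), rng.normal(-1.0, 1.5, (100, 3))])
y = np.repeat([1, -1], 100)

TUNING_SEED = 0   # fixed splits: the partition already used for parameter tuning
fixed = cv_accuracy(X, y, TUNING_SEED, nearest_centroid)
variable = [cv_accuracy(X, y, seed, nearest_centroid) for seed in range(1, 51)]
print(f"fixed splits:    {100 * fixed:.2f}%")
print(f"variable splits: {100 * np.mean(variable):.2f}% +/- {100 * np.std(variable):.2f}")
```

With fixed splits, hyperparameters tuned on the same partition can overfit its particular folds, which is why the variable-split estimate is the more conservative of the two.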
The fixed-split results in Table 3 show that our smooth loss achieved the highest accuracy on three datasets (Diabetes, Spectfheart, and Twonorm). For variable splits, Table 4 shows that our smooth loss achieved the highest accuracy on two datasets (Australian and Ring). Overall, the generalized pinball loss and our smooth loss yield similar accuracies in the TBSVM model, although their training times differ. The smooth pinball loss, in turn, requires less training time than our proposed algorithm on all datasets.
Table 3.
Linear TBSVM performance with various loss functions, fixed splits (Acc ± sd).
| Dataset | Smooth pinball loss (13) | Generalized pinball loss (11) | Smooth generalized pinball loss (14) |
|---|---|---|---|
| Australian | 88.4058 ± 2.2915 | 87.8261 ± 2.1201 | 87.8261 ± 2.5681 |
| time (s) | 0.21679 | 0.370989 | 0.877043 |
| Diabetes | 77.8593 ± 2.5482 | 77.4688 ± 3.6535 | 77.8627 ± 2.2755 |
| time (s) | 0.092696 | 0.163853 | 0.253656 |
| Monk2 | 83.3280 ± 4.5861 | 87.7252 ± 3.9495 | 87.0356 ± 4.0527 |
| time (s) | 0.047582 | 0.230813 | 0.178027 |
| Phoneme | 77.8127 ± 0.7475 | 77.9793 ± 0.4456 | 77.9237 ± 0.5031 |
| time (s) | 0.335106 | 0.480017 | 0.678361 |
| Saheart | 74.0159 ± 5.2307 | 73.8125 ± 4.4559 | 73.8055 ± 3.6154 |
| time (s) | 0.132691 | 0.186257 | 0.246309 |
| Spectfheart | 81.6562 ± 5.7405 | 81.6771 ± 4.9851 | 83.9064 ± 5.6250 |
| time (s) | 0.851100 | 4.725302 | 1.593111 |
| Twonorm | 97.8649 ± 0.2550 | 97.9054 ± 0.3963 | 97.9324 ± 0.3954 |
| time (s) | 0.430177 | 0.704223 | 0.658462 |
| Ring | 76.7973 ± 1.7075 | 76.9189 ± 1.5664 | 76.8378 ± 1.7058 |
| time (s) | 0.556805 | 0.667775 | 0.902595 |
| Ionosphere | 89.4567 ± 2.6563 | 89.1751 ± 2.7936 | 88.8893 ± 3.3052 |
| time (s) | 0.399795 | 0.56281 | 0.680536 |
Table 4.
Linear TBSVM performance with various loss functions, variable splits (Acc ± sd).
| Dataset | Smooth pinball loss (13) | Generalized pinball loss (11) | Smooth generalized pinball loss (14) |
|---|---|---|---|
| Australian | 86.6000 ± 2.3183 | 86.7130 ± 2.3182 | 86.9768 ± 2.5594 |
| time (s) | 0.218793 | 0.362228 | 0.813517 |
| Diabetes | 75.6744 ± 3.1206 | 76.6450 ± 2.7918 | 75.4544 ± 2.8918 |
| time (s) | 0.114313 | 0.159036 | 0.195348 |
| Monk2 | 79.8147 ± 5.4047 | 86.4165 ± 3.6584 | 83.7312 ± 4.2069 |
| time (s) | 0.047766 | 0.210939 | 0.177863 |
| Phoneme | 77.4807 ± 1.0521 | 77.7872 ± 1.1616 | 76.9989 ± 1.4930 |
| time (s) | 0.338369 | 0.497339 | 0.747009 |
| Saheart | 72.2705 ± 3.7282 | 72.6528 ± 4.3875 | 72.6122 ± 3.9976 |
| time (s) | 0.123355 | 0.183273 | 0.270993 |
| Spectfheart | 77.2806 ± 7.2642 | 79.0070 ± 4.9500 | 78.5426 ± 7.4243 |
| time (s) | 0.772305 | 5.034258 | 1.382591 |
| Twonorm | 97.7586 ± 0.3190 | 97.8059 ± 0.3403 | 97.7232 ± 0.3141 |
| time (s) | 0.413597 | 0.710938 | 0.656289 |
| Ring | 76.3803 ± 0.9844 | 76.6319 ± 0.9745 | 76.6335 ± 0.9697 |
| time (s) | 0.538940 | 0.678837 | 0.889948 |
| Ionosphere | 86.9708 ± 3.6478 | 88.3367 ± 3.4795 | 88.1153 ± 3.2979 |
| time (s) | 0.464421 | 0.563288 | 0.726181 |
4.2. Non-Linear Models
Tables 5 and 6 show the experimental results obtained with the kernel trick applied to the most commonly used kernel, the radial basis function (RBF) kernel, which has the form
$$K(x, y) = \exp\left(-\gamma \|x - y\|^{2}\right),$$
where $\gamma > 0$ is a parameter. To reduce problem size and thus computation time, however, we used the RBF sampler (RBFSampler in sklearn). This randomized technique avoids working with the kernel and with the huge Gram matrices that arise for large datasets. It approximates the feature map of the RBF kernel by mapping the given data into a vector space of substantially lower dimension than the RBF feature space, in which the linear support vector machine models can still be applied.
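The idea behind the RBF sampler can be illustrated with random Fourier features (Rahimi and Recht), the construction that sklearn's RBFSampler is based on: with weights drawn from $\mathcal{N}(0, 2\gamma I)$ and offsets drawn uniformly from $[0, 2\pi]$, the map $z(x) = \sqrt{2/D}\,\cos(W^\top x + b)$ satisfies $z(x)^\top z(y) \approx \exp(-\gamma\|x - y\|^2)$, with an error that shrinks as the number $D$ of components grows. The NumPy sketch below is our own minimal version, not sklearn's code; $\gamma$, $D$, and the data are hypothetical.

```python
import numpy as np


def make_rff_map(n_features, gamma, n_components, seed=0):
    """Return z(.) with z(x).z(y) ~ exp(-gamma ||x - y||^2).

    The same random (W, b) must be applied to training and test data alike.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_features, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return lambda X: np.sqrt(2.0 / n_components) * np.cos(X @ W + b)


rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
gamma = 0.1
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K_exact = np.exp(-gamma * sq_dists)

max_err = {}
for D in (50, 500, 5000):
    Z = make_rff_map(X.shape[1], gamma, D)(X)    # 200 x D explicit features
    max_err[D] = np.abs(Z @ Z.T - K_exact).max()
    print(D, round(max_err[D], 4))
```

A linear TBSVM trained on the features `Z` then approximates a kernel TBSVM with the RBF kernel, while the problem size depends on $D$ instead of on the number of samples.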
Table 5.
Kernel TBSVM performance with various loss functions, fixed splits (Acc ± sd).
| Dataset | Smooth pinball loss (13) | Generalized pinball loss (11) | Smooth generalized pinball loss (14) |
|---|---|---|---|
| Australian | 82.4638 ± 2.3098 | 83.6232 ± 2.8102 | 84.0580 ± 1.8332 |
| time (s) | 1.736451 | 0.666318 | 2.079387 |
| Diabetes | 74.9979 ± 1.8926 | 75.3875 ± 3.2180 | 75.9087 ± 4.1264 |
| time (s) | 5.969881 | 1.443527 | 1.220819 |
| Monk2 | 91.4274 ± 1.7705 | 95.5948 ± 1.5548 | 94.8998 ± 1.5939 |
| time (s) | 0.772924 | 3.983829 | 1.956448 |
| Phoneme | 79.4781 ± 0.5841 | 79.8668 ± 0.6885 | 80.3479 ± 0.9708 |
| time (s) | 2.184906 | 2.325309 | 3.43196 |
| Saheart | 74.0112 ± 4.6302 | 74.0089 ± 5.8561 | 74.2380 ± 4.2839 |
| time (s) | 1.177418 | 1.585177 | 1.247391 |
| Spectfheart | 83.5360 ± 4.6123 | 83.1586 ± 4.0473 | 84.6681 ± 5.5610 |
| time (s) | 0.471961 | 1.081096 | 1.35906 |
| Twonorm | 93.7162 ± 0.8547 | 94.6486 ± 0.5881 | 94.4595 ± 0.7914 |
| time (s) | 4.962333 | 2.749745 | 2.804496 |
| Ring | 95.0676 ± 0.4441 | 96.0000 ± 0.4041 | 95.7973 ± 0.5429 |
| time (s) | 1.994101 | 3.135659 | 3.292427 |
| Ionosphere | 90.3219 ± 2.8753 | 93.1630 ± 2.4558 | 91.7384 ± 3.5440 |
| time (s) | 0.955793 | 2.926564 | 2.245855 |
Table 6.
Kernel TBSVM performance with various loss functions, variable splits (Acc ± sd).
| Dataset | Smooth pinball loss (13) | Generalized pinball loss (11) | Smooth generalized pinball loss (14) |
|---|---|---|---|
| Australian | 79.9333 ± 3.2045 | 81.1130 ± 3.1104 | 81.9826 ± 3.0698 |
| time (s) | 1.100957 | 0.692429 | 1.772433 |
| Diabetes | 70.8914 ± 3.5184 | 74.4581 ± 3.1129 | 73.7404 ± 3.0545 |
| time (s) | 2.557176 | 1.513731 | 1.235133 |
| Monk2 | 89.4627 ± 4.2806 | 94.3527 ± 2.2290 | 93.3532 ± 2.7422 |
| time (s) | 1.751879 | 4.059142 | 2.015005 |
| Phoneme | 78.1296 ± 2.2966 | 79.8060 ± 1.0184 | 78.9168 ± 1.7498 |
| time (s) | 2.553766 | 1.772117 | 4.101333 |
| Saheart | 69.8903 ± 4.3887 | 71.9447 ± 4.3071 | 72.1198 ± 3.8393 |
| time (s) | 1.996333 | 1.579952 | 1.461621 |
| Spectfheart | 72.5744 ± 10.0496 | 79.9280 ± 4.8487 | 80.0032 ± 4.6938 |
| time (s) | 0.864064 | 1.085297 | 1.819914 |
| Twonorm | 93.3878 ± 0.7616 | 94.0049 ± 0.9483 | 93.8722 ± 0.7545 |
| time (s) | 6.243437 | 2.917959 | 2.576174 |
| Ring | 94.9878 ± 0.4759 | 94.9262 ± 1.6240 | 93.4192 ± 3.6378 |
| time (s) | 1.546431 | 3.376859 | 3.962855 |
| Ionosphere | 86.9302 ± 5.1465 | 91.2884 ± 2.8932 | 90.3624 ± 3.3991 |
| time (s) | 1.512256 | 4.066119 | 2.140556 |
Overall, in the fixed-split experiments (Table 5), our proposed smooth loss function delivers the highest accuracy on five datasets (Australian, Diabetes, Phoneme, Saheart, and Spectfheart). In the variable-split experiments (Table 6), it requires the least training time on three of the nine datasets, namely Diabetes, Saheart, and Twonorm.
4.3. Noise Sensitivity
The generalized pinball loss function [7] has been shown to reduce noise sensitivity and to improve stability under resampling. To evaluate the noise sensitivity of the algorithms with our smooth loss function, zero-mean normally distributed noise with standard deviation r was added to the selected UCI datasets, for noise ratios r = 0.02, 0.05, 0.10, 0.15, 0.20, 0.25, and 0.30.
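A minimal sketch of the noise injection follows. The text does not specify whether r is an absolute standard deviation or a ratio relative to the data scale; the sketch assumes the latter and scales the noise by each feature's standard deviation (for standardized features, the two readings coincide). The function name, the seeds, and the synthetic data are hypothetical.

```python
import numpy as np


def add_gaussian_noise(X, r, seed=0):
    """Return X plus zero-mean Gaussian noise with std r times each feature's std.

    Assumption: r is a ratio relative to the per-feature standard deviation.
    """
    rng = np.random.default_rng(seed)
    return X + rng.normal(size=X.shape) * (r * X.std(axis=0))


ratios = (0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30)
X = np.random.default_rng(3).normal(loc=5.0, scale=[1.0, 10.0], size=(5000, 2))
measured = {}
for r in ratios:
    X_noisy = add_gaussian_noise(X, r, seed=7)
    measured[r] = (X_noisy - X).std(axis=0) / X.std(axis=0)   # should be close to r
    print(r, np.round(measured[r], 3))
```

In a noise-sensitivity study, the noisy copies would typically be generated once per noise level and seed, so that all three loss functions are evaluated on identical perturbed data.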
Since the algorithms perform best on the Twonorm dataset, its results are presented first.
Figure 2 shows the performance of the BFGS algorithm applied to the TBSVM model at different noise levels. Loss functions implemented are our proposed loss (labeled BFGS-SGPTBSVM in the figures), the generalized pinball loss (BFGS-GPTBSVM), and the smooth approximation to the pinball loss (BFGS-SPTBSVM).
From Figure 2, we observe that the accuracy of all algorithms tends to decrease as the noise level increases.
Figure 3 shows the performance at different noise levels for the Australian dataset. We observe that, overall, our smooth generalized pinball loss function retains the noise insensitivity of the generalized pinball loss function and shows lower noise sensitivity on the Australian dataset in most cases.
The experimental results indicate that the proposed method exhibits stable performance across repeated runs, as reflected by relatively small standard deviations. Moreover, the smoothing parameter allows a controlled trade-off between approximation accuracy and numerical stability. Although the proposed smooth generalized pinball loss involves higher-order polynomial terms and may therefore incur a higher per-iteration cost than simpler pinball losses, its twice continuously differentiable objective function allows quasi-Newton methods to be applied on a solid theoretical basis. This partially compensates for the increased per-iteration cost and leads to acceptable overall training times.
Although widely used, the UCI benchmark datasets may not fully reflect the complexity of highly structured, large-scale, or application-specific data. Therefore, the reported experiments primarily serve to validate the general effectiveness and stability of the proposed loss function. Future work will investigate the performance of the proposed approach on more challenging datasets, including imbalanced and domain-specific problems.
5. Conclusions and Discussion
This paper has introduced a family of smooth generalized pinball loss functions to address the non-differentiability of traditional loss functions such as the hinge loss, the pinball loss, and the generalized pinball loss. In addition, a novel twin-bounded support vector machine model with a smooth generalized pinball loss function was proposed. We proved that the generalized pinball loss function can be approximated by the proposed smooth generalized pinball loss functions in the uniform norm with arbitrary precision, and that the solution of our TBSVM model is unique and converges to that of the non-smooth problem. In the experiments, we selected nine UCI datasets and used a quasi-Newton method to solve the corresponding strongly convex unconstrained optimization problems with twice continuously differentiable objective functions. We then compared the proposed BFGS-SGPTBSVM algorithm with the BFGS-GPTBSVM and BFGS-SPTBSVM algorithms in terms of classification accuracy and training time. With the RBF sampler and fixed splits, the proposed BFGS-SGPTBSVM algorithm achieved the highest accuracy on five of the nine datasets.
The proposed smooth generalized pinball loss-based TBSVM is particularly suitable for classification problems involving noisy, imbalanced, or asymmetric data distributions, where robustness and stable optimization are critical. For problems requiring explicit physical constraints or domain-specific modeling, hybrid loss formulations may be more appropriate.
In future studies, we plan to assess the performance of our model by experiments with complex, large-scale datasets. Moreover, we will evaluate sensitivity to hyperparameters and improve the techniques of parameter optimization for speed and efficacy enhancements. Finally, we will further apply our proposed loss function to other support vector machine models.