A Novel Twin-Bounded Support Vector Machine with Smooth Generalized Pinball Loss

Srichok, Patcharapa; Yimmuang, Panu; Schulz, Eckart

doi:10.3390/math14030549

by Patcharapa Srichok, Panu Yimmuang * and Eckart Schulz

School of Mathematical Sciences and Geoinformatics, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand

* Author to whom correspondence should be addressed.

Mathematics 2026, 14(3), 549; https://doi.org/10.3390/math14030549

Submission received: 30 December 2025 / Revised: 26 January 2026 / Accepted: 27 January 2026 / Published: 3 February 2026

(This article belongs to the Special Issue Advanced Studies in Mathematical Optimization and Machine Learning)

Abstract

We present a one-parameter family of smooth generalized pinball loss functions to overcome the non-differentiability, noise sensitivity, and resampling instability inherent in traditional loss functions such as the hinge loss. These functions make the objective function of the support vector machine (SVM) model twice continuously differentiable and improve model performance by reducing noise sensitivity while preserving the sparsity of the solution. In the same way, we obtain a novel twin-bounded support vector machine (TBSVM) model with a smooth generalized pinball loss function. We compare the TBSVM equipped with the novel smooth loss function against other contemporary approaches on UCI datasets, assessing its strengths and limitations. The experimental results show that the proposed model achieves the best performance for the TBSVM with RBFSampler. Additionally, we prove that the generalized pinball loss function can be approximated by the novel smooth generalized pinball loss function in the uniform norm with arbitrary precision. We further show that the solutions of the proposed SVM and TBSVM models are unique and that they converge to the solutions of the models with non-smooth generalized pinball loss as the smoothing parameter approaches zero.

Keywords:

generalized pinball loss function; smooth generalized pinball loss function; support vector machine; RBFSampler

MSC:

68T05; 90C25; 62H30

1. Introduction

The support vector machine (SVM), as introduced by Vapnik and Cortes [1], is a machine learning technique founded on the Vapnik–Chervonenkis (VC) dimension and the structural risk minimization theory of statistical learning. It is a powerful and widely used supervised learning algorithm, primarily employed for classification and regression tasks such as text categorization [2] and scene classification [3]. When applied to classification, it seeks a separating hyperplane that maximizes the distance to the two data classes, so that the training data are classified correctly with a high level of confidence. Through the introduction of “slack variables” and the “kernel trick”, SVMs can effectively handle high-dimensional data and complex, non-linear decision boundaries.

The twin support vector machine (TWSVM), an extension of the traditional SVM, has garnered considerable attention for its potential to address complex data distributions. Khemchandani and Chandra [4] proposed the TWSVM, which consists of two non-parallel hyperplanes, each positioned close to one of the two classes while maintaining a minimum separation distance from the other class. The TWSVM reduces the algorithmic complexity to just a quarter of that of the standard SVM, resulting in a significant reduction of computational time. Shao et al. [5] modified the TWSVM to the twin-bounded support vector machine (TBSVM), which minimizes structural risk and leads to improved generalization performance.

Creating the hyperplanes in any of the support vector machine models involves solving a constrained convex minimization problem. The most common solution technique is to formulate the dual optimization problem and solve it using the method of Lagrange multipliers. More recently, techniques for solving the primal problem directly have become popular. These methods use a “loss function” to reformulate the problem as an unconstrained optimization problem. Common loss functions include the hinge loss, the pinball loss [6], and the generalized pinball loss [7], listed in order of increasing complexity. Because these loss functions are not differentiable, efficient numerical methods such as gradient descent or Newton methods either cannot be applied or lack a solid theoretical foundation when they are. A theoretical analysis generally requires that the objective functions be of

C^{2}

type, that is, twice continuously differentiable.

In 2023, Makmuang, Ratiphaphongthon, and Wangkeeree introduced a

C^{1}

-smooth approximation to the generalized pinball loss function within the standard SVM framework [8]. The results demonstrate that, on average, the proposed method exhibits superior performance compared to the baseline models. Similarly, Kai and Zhen [9] introduced a smooth approximation to the pinball loss function in the twin-bounded support vector machine model to mitigate the noise sensitivity and resampling instability associated with the hinge loss function.

A variety of further loss functions have been proposed in the literature. In [10], a piecewise-quadratic loss is introduced that smooths the pinball loss and belongs to the class of

C^{1}

-functions. In [11], a truncated

ϵ

-insensitive pinball loss is investigated, which is only piecewise

C^{1}

. In [12], the pinball loss is modified by an S-curve to obtain a family of

C^{1}

loss functions depending on three parameters. In [13], a quartic truncated pinball loss is introduced, which is

C^{1}

and bounded; hence, it is non-convex. Furthermore, in [14], a rescaled huberized pinball loss is discussed, which again is

C^{1}

, bounded, and non-convex. Finally, recent studies [15] have explored interesting hybrid data–physics loss formulations, which incorporate physical constraints or domain knowledge into the training process. While such approaches may offer improved interpretability, they are tailored to specific applications and require knowledge of the physical setting. Thus, there is a scarcity of

C^{2}

-smooth convex loss functions that approximate the pinball loss and can be applied to a variety of data.

The main contributions of this work can be summarized as follows. First, we propose a novel one-parameter family of

C^{2}

-smooth loss functions, rendering the objective function in the unconstrained formulation of the SVM

C^{2}

-smooth as well. We further show that the novel smooth functions can approximate the non-smooth generalized pinball loss function with arbitrary precision in the uniform norm. Second, the proposed smooth loss is incorporated into the twin-bounded support vector machine framework, for which we rigorously prove the uniqueness of solutions and their convergence to the solutions obtained from the non-smooth generalized pinball loss as the smoothing parameter approaches zero. Third, unlike existing smooth pinball or generalized pinball loss formulations, our work provides a complete theoretical foundation for both SVM and TBSVM models, linking smooth loss approximation, optimization stability, and solution behavior. Finally, numerical experiments on benchmark datasets demonstrate that the proposed approach achieves competitive accuracy with stable training behavior.

By comparing the TBSVM equipped with this novel loss function against other contemporary approaches, this study aims to provide a comprehensive perspective on the strengths and limitations of this methodology. It concentrates on an extensive comparative analysis of the impact of various loss functions and their generalizations on model performance. Additionally, it proves that the generalized pinball loss function can be approximated arbitrarily well in the uniform norm by the proposed smooth loss functions.

The remainder of this paper is organized as follows. Section 2 reviews background material on SVMs, TWSVMs, TBSVMs, and loss functions. Section 3 introduces the proposed smooth generalized pinball loss and presents theoretical analysis. Section 4 reports on numerical experiments, and Section 5 concludes the paper.

2. Background and Literature Review

This section introduces the fundamental mathematical concepts in machine learning as they pertain to the support vector machine models used in this work.

2.1. The Support Vector Machine

Let

X = {(\vec{x_{i}}, y_{i}) : i = 1, 2, \dots, m}

be a training data set, where

\vec{x_{i}} \in R^{n}

and

y_{i} \in {+ 1, - 1}

. Here, m denotes the number of data samples, n the number of features, and

y_{i}

the class to which a data sample belongs. A linear support vector machine (SVM) finds the optimal hyperplane

f (\vec{x}) = {\vec{w}}^{T} \vec{x} + b = 0,

which separates the data into two classes and has the largest distance from each of the two classes; this distance is called the margin. The vector

\vec{w}

is normal to the hyperplane and determines its direction in n-space, while the offset b reflects its distance from the origin. The decision function of an SVM, which determines the class a data sample

\vec{x}

belongs to, is obtained from the sign function (sgn) by

class (\vec{x}) = sgn ({\vec{w}}^{T} \vec{x} + b) .

(1)

In practice, it may not be possible to properly separate the two data classes by means of a single hyperplane. One thus allows for misclassification of some of the training data samples, and can find the hyperplane by solving an unconstrained optimization problem:

min_{\vec{w}, b} \frac{1}{2} (∥ \vec{w} ∥^{2} + b^{2}) + \frac{C}{m} \sum_{i = 1}^{m} L_{hinge} (1 - y_{i} ({\vec{w}}^{T} \vec{x_{i}} + b)),

(2)

where the function

L_{hinge}

is the hinge loss function defined on the real line by

L_{hinge} (u) = max {0, u} = \{\begin{matrix} u, & u \geq 0, \\ 0, & u < 0 . \end{matrix}

(3)

The term on the right of (2) reflects the amount of misclassification, and the parameter

C > 0

manages the equilibrium between maximizing the margin and minimizing classification errors.
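As an illustration of (2) and (3), the following minimal Python sketch evaluates the regularized hinge-loss objective for a given hyperplane. The function names and the toy data are hypothetical and are not taken from the paper.

```python
def hinge(u):
    """Hinge loss (3): max{0, u}."""
    return max(0.0, u)


def svm_objective(w, b, samples, C):
    """Objective (2) for weights w, offset b and a list of (x, y) pairs."""
    m = len(samples)
    reg = 0.5 * (sum(wj * wj for wj in w) + b * b)
    total = 0.0
    for x, y in samples:
        score = sum(wj * xj for wj, xj in zip(w, x)) + b
        total += hinge(1.0 - y * score)
    return reg + C / m * total


# Hypothetical toy data: one sample per class on the first coordinate axis.
toy = [((2.0, 0.0), +1), ((-2.0, 0.0), -1)]
separating = svm_objective((1.0, 0.0), 0.0, toy, C=1.0)  # both margins equal 2
flipped = svm_objective((-1.0, 0.0), 0.0, toy, C=1.0)    # both samples misclassified
```

For the separating hyperplane only the regularization term contributes, whereas the flipped hyperplane is additionally penalized by the hinge term.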

2.2. The Twin Support Vector Machine

Consider a binary classification dataset X, which contains

m_{1}

positive (class

+ 1

) and

m_{2}

negative (class

- 1

) samples, respectively, with

m_{1} + m_{2} = m

. In a twin support vector machine (TWSVM), two non-parallel hyperplanes are located in a manner where each hyperplane is nearer to one of the two classes and maintains a minimum margin of at least one from the other. The two nonparallel classification hyperplanes take the following form:

f_{1} (\vec{x}) = {\vec{w_{1}}}^{T} \vec{x} + b_{1} = 0 and f_{2} (\vec{x}) = {\vec{w_{2}}}^{T} \vec{x} + b_{2} = 0,

and need to separate the samples correctly. Thus, the TWSVM is determined by the following two unconstrained optimization problems:

min_{\vec{w_{1}}, b_{1}} \frac{1}{2} \sum_{i = 1}^{m_{1}} {({\vec{w_{1}}}^{T} {\vec{x_{i}}}^{(1)} + b_{1})}^{2} + \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} L_{hinge} (1 + ({\vec{w_{1}}}^{T} {\vec{x_{i}}}^{(2)} + b_{1})),

(4)

and

min_{\vec{w_{2}}, b_{2}} \frac{1}{2} \sum_{i = 1}^{m_{2}} {({\vec{w_{2}}}^{T} {\vec{x_{i}}}^{(2)} + b_{2})}^{2} + \frac{c_{2}}{m_{1}} \sum_{i = 1}^{m_{1}} L_{hinge} (1 - ({\vec{w_{2}}}^{T} {\vec{x_{i}}}^{(1)} + b_{2})),

(5)

where

{\vec{x_{i}}}^{(1)}

(i = 1 \dots m_{1})

and

{\vec{x_{i}}}^{(2)}

(i = 1 \dots m_{2})

denote the data samples in the positive and negative class, respectively. The decision function of the TWSVM is

class (\vec{x}) = sign (\frac{{\vec{w_{1}}}^{T} \vec{x} + b_{1}}{∥ \vec{w_{1}} ∥} + \frac{{\vec{w_{2}}}^{T} \vec{x} + b_{2}}{∥ \vec{w_{2}} ∥}) .

(6)

2.3. The Twin-Bounded Support Vector Machine

In 2011, Shao et al. [5] added a regularization term to the objective function of the TWSVM, resulting in a novel machine learning model called the twin-bounded support vector machine (TBSVM). It is formulated as

min_{\vec{w_{1}}, b_{1}} \frac{1}{2} \sum_{i = 1}^{m_{1}} {({\vec{w_{1}}}^{T} {\vec{x_{i}}}^{(1)} + b_{1})}^{2} + \frac{c_{3}}{2} (∥ \vec{w_{1}} ∥^{2} + b_{1}^{2}) + \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} L_{hinge} (1 + ({\vec{w_{1}}}^{T} {\vec{x_{i}}}^{(2)} + b_{1})),

(7)

and

min_{\vec{w_{2}}, b_{2}} \frac{1}{2} \sum_{i = 1}^{m_{2}} {({\vec{w_{2}}}^{T} {\vec{x_{i}}}^{(2)} + b_{2})}^{2} + \frac{c_{4}}{2} (∥ \vec{w_{2}} ∥^{2} + b_{2}^{2}) + \frac{c_{2}}{m_{1}} \sum_{i = 1}^{m_{1}} L_{hinge} (1 - ({\vec{w_{2}}}^{T} {\vec{x_{i}}}^{(1)} + b_{2})) .

(8)

A survey of various twin support vector machine variants can be found in [16].

2.4. Loss Functions

The hinge loss function

L_{hinge} (u)

is unstable under resampling and sensitive to noise. To resolve this problem, in 2013, Huang, Shi, and Suykens [6] replaced it with the pinball loss function in the SVM classifier and demonstrated its effectiveness in mitigating noise sensitivity through experiments. The pinball loss function is defined as

L_{τ} (u) = \{\begin{matrix} u, & u \geq 0, \\ - τ u, & u < 0, \end{matrix}

(9)

where

τ

is a non-negative real parameter. However, the pinball loss function still shows some noise sensitivity. Thus, the

ϵ

-insensitive pinball loss function was created, defined as

L_{τ}^{ϵ} (u) = \{\begin{matrix} u - ϵ, & u > ϵ, \\ 0, & - \frac{ϵ}{τ} \leq u \leq ϵ, \\ - τ (u + \frac{ϵ}{τ}), & u < - \frac{ϵ}{τ}, \end{matrix}

(10)

where

τ, ϵ

are non-negative real values. Recently, Rastogi, Pal, and Chandra [7] introduced a novel loss function generalizing the

ϵ

-insensitive pinball loss by introducing one more parameter. This generalized pinball loss is defined by

L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u) = \{\begin{matrix} τ_{1} (u - \frac{ϵ_{1}}{τ_{1}}), & u > \frac{ϵ_{1}}{τ_{1}}, \\ 0, & - \frac{ϵ_{2}}{τ_{2}} \leq u \leq \frac{ϵ_{1}}{τ_{1}}, \\ - τ_{2} (u + \frac{ϵ_{2}}{τ_{2}}), & u < - \frac{ϵ_{2}}{τ_{2}}, \end{matrix}

(11)

where

τ_{1}, τ_{2}, ϵ_{1}, ϵ_{2}

are non-negative parameters.
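The three piecewise definitions (9)–(11) translate directly into code. The sketch below is only illustrative: the function names are hypothetical, and it assumes strictly positive slope parameters wherever a quotient such as ϵ/τ appears. It also shows how (11) reduces to (10) and (9) for special parameter choices.

```python
def pinball(u, tau):
    """Pinball loss (9)."""
    return u if u >= 0 else -tau * u


def eps_insensitive_pinball(u, tau, eps):
    """Epsilon-insensitive pinball loss (10); assumes tau > 0."""
    if u > eps:
        return u - eps
    if u < -eps / tau:
        return -tau * (u + eps / tau)
    return 0.0


def generalized_pinball(u, tau1, tau2, eps1, eps2):
    """Generalized pinball loss (11); assumes tau1, tau2 > 0."""
    if u > eps1 / tau1:
        return tau1 * (u - eps1 / tau1)
    if u < -eps2 / tau2:
        return -tau2 * (u + eps2 / tau2)
    return 0.0


grid = [k / 4.0 for k in range(-12, 13)]
# With tau1 = 1 and eps1 = eps2 = eps, (11) coincides with (10).
same_as_eps = all(
    generalized_pinball(u, 1.0, 0.5, 0.2, 0.2) == eps_insensitive_pinball(u, 0.5, 0.2)
    for u in grid
)
# With additionally eps = 0, (11) coincides with the plain pinball loss (9).
same_as_pinball = all(
    generalized_pinball(u, 1.0, 0.5, 0.0, 0.0) == pinball(u, 0.5) for u in grid
)
```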

Makmuang, Ratiphaphongthon, and Wangkeeree [8] introduced a new

C^{1}

-smooth loss function that approximates the generalized pinball loss function for use in the SVM, which is given by

Q_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ) = \{\begin{matrix} τ_{1} (u - \frac{ϵ_{1}}{τ_{1}}) - \frac{τ_{1}^{2}}{2} μ, & \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ \leq u, \\ \frac{1}{2 μ} {(u - \frac{ϵ_{1}}{τ_{1}})}^{2}, & \frac{ϵ_{1}}{τ_{1}} \leq u \leq \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ, \\ 0, & - \frac{ϵ_{2}}{τ_{2}} \leq u \leq \frac{ϵ_{1}}{τ_{1}}, \\ \frac{1}{2 μ} {(u + \frac{ϵ_{2}}{τ_{2}})}^{2}, & - \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ \leq u \leq - \frac{ϵ_{2}}{τ_{2}}, \\ - τ_{2} (u + \frac{ϵ_{2}}{τ_{2}}) - \frac{τ_{2}^{2}}{2} μ, & u \leq - \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ, \end{matrix}

(12)

where

τ_{1}, τ_{2}, ϵ_{1}, ϵ_{2}

are non-negative real values,

u \in R

, and

μ \in R_{+}

is a parameter that controls how closely

Q_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (\cdot, μ)

approximates

L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (\cdot)

.
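To make the smoothness class of (12) concrete, the following sketch implements the loss and estimates its curvature by central second differences. The names are hypothetical; the left branches are evaluated through the mirrored variable x = -(u + ϵ_2/τ_2), which is equivalent to (12) because only x, x², and the linear term τ_2 x occur. The estimate is close to 1/μ on the quadratic piece and close to 0 on the linear piece, so the second derivative jumps at the junction and the function is only of class C^{1}.

```python
def smooth_c1(u, mu, tau1=1.0, tau2=1.0, eps1=1.0, eps2=1.0):
    """C^1 loss Q(u, mu) from (12); assumes tau1, tau2, mu > 0."""
    if u >= eps1 / tau1:            # right of the flat region
        x, tau = u - eps1 / tau1, tau1
    elif u <= -eps2 / tau2:         # left of the flat region (mirrored)
        x, tau = -(u + eps2 / tau2), tau2
    else:
        return 0.0
    if x >= tau * mu:               # linear piece
        return tau * x - 0.5 * tau ** 2 * mu
    return x * x / (2.0 * mu)       # quadratic transition piece


def second_difference(f, u, h=1e-3):
    """Central second difference, a numerical estimate of f''(u)."""
    return (f(u + h) - 2.0 * f(u) + f(u - h)) / (h * h)


def q(u):
    """Q with all loss parameters equal to 1 and mu = 1 (illustrative choice)."""
    return smooth_c1(u, mu=1.0)


inside = second_difference(q, 1.5)   # quadratic piece, about 1/mu = 1
outside = second_difference(q, 3.0)  # linear piece, about 0
```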

Alternatively, Kai and Zhen [9] proposed an infinitely differentiable approximation of the pinball loss function within the twin-bounded support vector machine model to mitigate the noise sensitivity and resampling instability associated with the hinge loss function. Their smooth approximation of the pinball loss function is defined as

ϕ_{τ} (u, ϵ) = \frac{u + \sqrt{u^{2} + 4 ϵ^{2}}}{2} + \frac{- τ u + \sqrt{τ^{2} u^{2} + 4 ϵ^{2}}}{2},

(13)

where

τ, ϵ

are non-negative real parameter values.
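A short numerical sketch of (13) is given below; the function names are hypothetical. It relies on an elementary observation that follows directly from (13) and is not quoted from [9]: since 0 < √(a² + 4ϵ²) - |a| ≤ 2ϵ for every real a, the function ϕ_τ(u, ϵ) lies above the pinball loss (9) by at most 2ϵ, with the largest gap at u = 0.

```python
import math


def pinball(u, tau):
    """Pinball loss (9)."""
    return u if u >= 0 else -tau * u


def smooth_pinball(u, tau, eps):
    """Smooth approximation phi_tau(u, eps) from (13)."""
    pos = 0.5 * (u + math.sqrt(u * u + 4.0 * eps * eps))
    neg = 0.5 * (-tau * u + math.sqrt(tau * tau * u * u + 4.0 * eps * eps))
    return pos + neg


grid = [k / 10.0 for k in range(-50, 51)]  # includes u = 0
tau = 0.5                                  # illustrative choice
largest_gap = {}
smallest_gap = {}
for eps in (1.0, 0.1, 0.01):
    gaps = [smooth_pinball(u, tau, eps) - pinball(u, tau) for u in grid]
    largest_gap[eps] = max(gaps)   # attained at u = 0 and equal to 2 * eps
    smallest_gap[eps] = min(gaps)  # stays non-negative
```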

3. Proposed Work

This section proposes a novel one-parameter family of

C^{2}

loss functions, presents the proof of uniform convergence to the generalized pinball loss function with decreasing parameter, and develops and analyzes an SVM model and a TBSVM model incorporating the novel loss.

3.1. The Proposed Smooth Loss Function

To smooth the generalized pinball loss

L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)

, we define functions

P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)

:

R \times R^{+} \to R^{+}

, which depend on the real variable u and are parameterized by an approximation parameter

μ > 0

, by

P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ) = \{\begin{matrix} τ_{1} (u - \frac{ϵ_{1}}{τ_{1}}) - \frac{5}{8} τ_{1}^{2} μ, & \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ < u, \\ \frac{5}{8 τ_{1}^{2} μ^{3}} {(u - \frac{ϵ_{1}}{τ_{1}})}^{4} - \frac{1}{4 τ_{1}^{4} μ^{5}} {(u - \frac{ϵ_{1}}{τ_{1}})}^{6}, & \frac{ϵ_{1}}{τ_{1}} < u \leq \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ, \\ 0, & - \frac{ϵ_{2}}{τ_{2}} \leq u \leq \frac{ϵ_{1}}{τ_{1}}, \\ \frac{5}{8 τ_{2}^{2} μ^{3}} {(u + \frac{ϵ_{2}}{τ_{2}})}^{4} - \frac{1}{4 τ_{2}^{4} μ^{5}} {(u + \frac{ϵ_{2}}{τ_{2}})}^{6}, & - \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ \leq u < - \frac{ϵ_{2}}{τ_{2}}, \\ - τ_{2} (u + \frac{ϵ_{2}}{τ_{2}}) - \frac{5}{8} τ_{2}^{2} μ, & u < - \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ . \end{matrix}

(14)

where the parameters

τ_{1}, τ_{2}, ϵ_{1}, ϵ_{2}

take non-negative real values. Note that in the limit

μ \to 0^{+}

, we recover the generalized pinball loss

L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)

. Figure 1 displays the graphic representation of

P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)

for various values of the parameter

μ

, with

τ_{1}, τ_{2}, ϵ_{1}, ϵ_{2}

fixed (

τ_{1} = τ_{2} = ϵ_{1} = ϵ_{2} = 1

).
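A minimal sketch of (14) for the parameter setting of Figure 1 follows; the function names are hypothetical and strictly positive τ_1, τ_2, μ are assumed. Because only even powers and the linear term occur, both sides of the flat region can be evaluated through the non-negative distance x to that region.

```python
def generalized_pinball(u, tau1=1.0, tau2=1.0, eps1=1.0, eps2=1.0):
    """Generalized pinball loss (11); assumes tau1, tau2 > 0."""
    if u > eps1 / tau1:
        return tau1 * (u - eps1 / tau1)
    if u < -eps2 / tau2:
        return -tau2 * (u + eps2 / tau2)
    return 0.0


def smooth_gp(u, mu, tau1=1.0, tau2=1.0, eps1=1.0, eps2=1.0):
    """Proposed loss P(u, mu) from (14); assumes tau1, tau2, mu > 0."""
    if u > eps1 / tau1:             # right of the flat region
        x, tau = u - eps1 / tau1, tau1
    elif u < -eps2 / tau2:          # left of the flat region (mirrored)
        x, tau = -(u + eps2 / tau2), tau2
    else:
        return 0.0
    if x > tau * mu:                # linear piece
        return tau * x - 0.625 * tau ** 2 * mu
    # polynomial transition piece of degree six
    return 0.625 * x ** 4 / (tau ** 2 * mu ** 3) - 0.25 * x ** 6 / (tau ** 4 * mu ** 5)


# Values for mu = 1 at the junction (u = 2) and on the linear pieces (u = 3, -3).
at_junction = smooth_gp(2.0, 1.0)
on_linear_piece = smooth_gp(3.0, 1.0)
gap_on_linear_piece = generalized_pinball(3.0) - smooth_gp(3.0, 1.0)
```

On the linear pieces the smooth loss lies below the generalized pinball loss by the constant (5/8) τ² μ, which is 0.625 in this setting.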

It is easy to verify that the

P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)

are

C^{2}

-functions and belong to a category of smoothing functions for

L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u) .
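For completeness, the verification can be written out for the right-hand side; the left-hand side is analogous with τ_2 and ϵ_2. The auxiliary symbols p (transition branch) and ℓ (linear branch), both expressed in x = u - ϵ_1/τ_1, are introduced here only for this check.

```latex
\[
\begin{aligned}
p(x)   &= \frac{5}{8\tau_1^{2}\mu^{3}}\,x^{4} - \frac{1}{4\tau_1^{4}\mu^{5}}\,x^{6},
\qquad \ell(x) = \tau_1 x - \tfrac{5}{8}\tau_1^{2}\mu, \\
p'(x)  &= \frac{5}{2\tau_1^{2}\mu^{3}}\,x^{3} - \frac{3}{2\tau_1^{4}\mu^{5}}\,x^{5}, \\
p''(x) &= \frac{15}{2\tau_1^{2}\mu^{3}}\,x^{2}\Bigl(1 - \frac{x^{2}}{\tau_1^{2}\mu^{2}}\Bigr) \ge 0
\quad \text{for } 0 \le x \le \tau_1\mu, \\[4pt]
p(0) &= p'(0) = p''(0) = 0
\quad \text{(junction with the flat region)}, \\[4pt]
p(\tau_1\mu)   &= \tfrac{5}{8}\tau_1^{2}\mu - \tfrac{1}{4}\tau_1^{2}\mu
                = \tfrac{3}{8}\tau_1^{2}\mu = \ell(\tau_1\mu), \\
p'(\tau_1\mu)  &= \tfrac{5}{2}\tau_1 - \tfrac{3}{2}\tau_1 = \tau_1 = \ell'(\tau_1\mu), \\
p''(\tau_1\mu) &= \frac{15}{2\mu} - \frac{15}{2\mu} = 0 = \ell''(\tau_1\mu).
\end{aligned}
\]
```

The function value and the first two derivatives therefore agree at both junctions, and the non-negative second derivative on the transition branch shows that the loss is convex.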

Figure 1 suggests that the mapping

P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)

converges to

L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)

as

μ \to 0^{+}

, which we will prove.

In the support vector machine models described below, the parameters

τ_{1}

and

τ_{2}

control the asymmetric penalties assigned to positive and negative misclassification errors, respectively, while

ε_{1}

and

ε_{2}

determine the width of the insensitivity region and thus tolerance to noise. The smoothing parameter

μ

governs the trade-off between approximation accuracy and the numerical smoothness: smaller values of

μ

yield a closer approximation to the original generalized pinball loss, whereas larger values improve numerical stability during optimization.

In the following proofs, we will use the symbols

L (u)

and

P (u, μ)

in place of

L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)

and

P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)

, respectively, since the parameters

τ_{1}, τ_{2}, ϵ_{1}, ϵ_{2}

remain fixed.

The next theorem shows that our proposed smooth loss functions indeed approach the generalized pinball loss uniformly as the parameter

μ

tends to zero.

Theorem 1.

Let

L (u)

and

P (u, μ)

be defined as in (11) and (14), respectively. Then

(i): for all $u \in R$ , $0 \leq L (u) - P (u, μ) \leq \frac{5}{8} τ_{0}^{2} μ$ , where $τ_{0} = max {τ_{1}, τ_{2}}$ ;
(ii): $lim_{μ \to 0^{+}} P (u, μ) = L (u)$ uniformly on $R$ .

Proof.

(i) Let

ϵ_{1}, ϵ_{2} \geq 0, τ_{1}, τ_{2} > 0

be fixed and

μ > 0

. In light of the definitions of

L (u)

and

P (u, μ)

, we divide the proof into five cases, depending on the values of the variable u.

Case 1: $\frac{ϵ_{1}}{τ_{1}} + τ_{1} μ < u$ . We obtain

$L (u) - P (u, μ) = τ_{1} (u - \frac{ϵ_{1}}{τ_{1}}) - [τ_{1} (u - \frac{ϵ_{1}}{τ_{1}}) - \frac{5}{8} τ_{1}^{2} μ] = \frac{5}{8} τ_{1}^{2} μ \leq \frac{5}{8} τ_{0}^{2} μ .$
Case 2: $\frac{ϵ_{1}}{τ_{1}} < u \leq \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ$ . We have

$L (u) - P (u, μ) = τ_{1} (u - \frac{ϵ_{1}}{τ_{1}}) - [\frac{5}{8 τ_{1}^{2} μ^{3}} {(u - \frac{ϵ_{1}}{τ_{1}})}^{4} - \frac{1}{4 τ_{1}^{4} μ^{5}} {(u - \frac{ϵ_{1}}{τ_{1}})}^{6}] .$

Setting $I (u) = L (u) - P (u, μ)$ , we obtain

$I^{'} (u) = τ_{1} - [\frac{5}{2 τ_{1}^{2} μ^{3}} {(u - \frac{ϵ_{1}}{τ_{1}})}^{3} - \frac{3}{2 τ_{1}^{4} μ^{5}} {(u - \frac{ϵ_{1}}{τ_{1}})}^{5}] .$

Let $x = u - \frac{ϵ_{1}}{τ_{1}}$ . Then x lies in the interval $(0, τ_{1} μ]$ , and $I (x) = τ_{1} x - [\frac{5}{8 τ_{1}^{2} μ^{3}} x^{4} - \frac{1}{4 τ_{1}^{4} μ^{5}} x^{6}]$ , so that $I^{'} (x) = τ_{1} - [\frac{5}{2 τ_{1}^{2} μ^{3}} x^{3} - \frac{3}{2 τ_{1}^{4} μ^{5}} x^{5}]$ . The critical points of the function $I^{'} (x)$ , that is, the zeros of $I^{''} (x)$ , are $x = 0, τ_{1} μ$ and $- τ_{1} μ$ , so that $I^{'} (x)$ is monotone on $[0, τ_{1} μ]$ . Since $I^{'} (0) = τ_{1} > 0$ and $I^{'} (τ_{1} μ) = 0$ , it follows that $I^{'} (x) \geq 0$ on $(0, τ_{1} μ]$ , and hence $I (u)$ is an increasing function on $\frac{ϵ_{1}}{τ_{1}} < u \leq \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ$ with $I (u) \to 0$ as $u \to \frac{ϵ_{1}}{τ_{1}}$ , so that $I (u) \geq 0$ there. In particular,

$L (u) - P (u, μ) \leq I {(u)}_{| u = \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ} = \frac{5}{8} τ_{1}^{2} μ \leq \frac{5}{8} τ_{0}^{2} μ .$
Case 3: $- \frac{ϵ_{2}}{τ_{2}} \leq u \leq \frac{ϵ_{1}}{τ_{1}}$ . We obtain

$L (u) - P (u, μ) = 0 .$
Case 4: $- \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ \leq u < - \frac{ϵ_{2}}{τ_{2}}$ . Proceeding similarly to Case 2, we obtain that $I (u)$ is a decreasing function on $- \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ \leq u \leq - \frac{ϵ_{2}}{τ_{2}}$ . Then

$L (u) - P (u, μ) \leq I (- \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ) = \frac{5}{8} τ_{2}^{2} μ \leq \frac{5}{8} τ_{0}^{2} μ .$
Case 5: $u < - \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ$ . Proceeding similarly to Case 1, we obtain

$L (u) - P (u, μ) = - τ_{2} (u + \frac{ϵ_{2}}{τ_{2}}) - [- τ_{2} (u + \frac{ϵ_{2}}{τ_{2}}) - \frac{5}{8} τ_{2}^{2} μ] = \frac{5}{8} τ_{2}^{2} μ \leq \frac{5}{8} τ_{0}^{2} μ .$

All five cases show that

$0 \leq sup_{u \in R} [L (u) - P (u, μ)] \leq \frac{5}{8} τ_{0}^{2} μ .$

(15)

(ii) Since the bound in (15) does not depend on u and tends to zero as

μ \to 0^{+}

, the Squeeze Theorem yields

lim_{μ \to 0^{+}} P (u, μ) = L (u)

uniformly on

R

. □
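The bound of Theorem 1 can also be checked numerically. The following sketch, with hypothetical names and an illustrative parameter choice, scans a grid of u values and records the smallest and largest value of L(u) - P(u, μ). A grid scan illustrates the statement but does not replace the proof.

```python
def gp_loss(u, t1, t2, e1, e2):
    """Generalized pinball loss (11); assumes t1, t2 > 0."""
    if u > e1 / t1:
        return t1 * (u - e1 / t1)
    if u < -e2 / t2:
        return -t2 * (u + e2 / t2)
    return 0.0


def smooth_gp(u, mu, t1, t2, e1, e2):
    """Proposed loss (14), evaluated via the distance x to the flat region."""
    if u > e1 / t1:
        x, tau = u - e1 / t1, t1
    elif u < -e2 / t2:
        x, tau = -(u + e2 / t2), t2
    else:
        return 0.0
    if x > tau * mu:
        return tau * x - 0.625 * tau ** 2 * mu
    return 0.625 * x ** 4 / (tau ** 2 * mu ** 3) - 0.25 * x ** 6 / (tau ** 4 * mu ** 5)


def gap_range(mu, params, lo=-10.0, hi=10.0, n=20001):
    """Smallest and largest value of L(u) - P(u, mu) on a uniform grid."""
    gaps = []
    for k in range(n):
        u = lo + (hi - lo) * k / (n - 1)
        gaps.append(gp_loss(u, *params) - smooth_gp(u, mu, *params))
    return min(gaps), max(gaps)


PARAMS = (2.0, 0.5, 1.0, 0.5)       # tau1, tau2, eps1, eps2 (illustrative)
TAU0 = max(PARAMS[0], PARAMS[1])
report = {}
for mu in (1.0, 0.1, 0.01):
    smallest, largest = gap_range(mu, PARAMS)
    report[mu] = (smallest, largest, 0.625 * TAU0 ** 2 * mu)  # last entry: bound (15)
```

In this setting the largest gap is attained on the linear piece with slope τ_1 = τ_0, so the bound is reached and shrinks linearly with μ.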

3.2. The Support Vector Machine with Smooth Loss Function

Consider a dataset

X = {(\vec{x_{i}}, y_{i}) | i = 1, 2, \dots, m}, \vec{x_{i}} \in R^{n}, y_{i} \in {+ 1, - 1}

. We replace the hinge loss function

L_{hinge} (u)

with the proposed smooth approximation

P (u, μ)

, where

μ

is the parameter. The SVM model (2) changes to the optimization problem

min_{\vec{ω}} φ_{μ} (\vec{ω}) : = \frac{1}{2} {∥ \vec{ω} ∥}^{2} + \frac{C}{m} \sum_{i = 1}^{m} P (1 - y_{i} {\vec{ω}}^{T} \vec{x_{i}}, μ),

(16)

where

\vec{ω} = {[{\vec{w}}^{T}, b]}^{T} \in R^{n + 1}, \vec{x_{i}} = {[\vec{x_{i}}, 1]}^{T}

.

In the following, we will demonstrate that each optimization problem in the family defined by (16) admits a unique solution, and that the sequence of solutions for this family converges to the solution of the exact problem:

min_{\vec{ω}} ψ_{0} (\vec{ω}) : = \frac{1}{2} {∥ \vec{ω} ∥}^{2} + \frac{C}{m} \sum_{i = 1}^{m} L (1 - y_{i} {\vec{ω}}^{T} \vec{x_{i}}),

(17)

as

μ \to 0^{+}

. The next theorem shows that the objective function of (16) converges to the objective function of (17) uniformly as

μ \to 0^{+}

.

Theorem 2.

Let

τ_{0}

= max

{τ_{1}, τ_{2}}

, then for all

\vec{ω} \in R^{n + 1}

and

μ > 0

,

∥ ψ_{0} (\vec{ω}) - φ_{μ} (\vec{ω}) ∥_{\infty} \leq C (\frac{5}{8} τ_{0}^{2} μ)

.

Proof.

For all

\vec{ω} \in R^{n + 1}

as

L (u) - P (u, μ) \geq 0

,

\begin{matrix} 0 \leq ψ_{0} (\vec{ω}) - φ_{μ} (\vec{ω}) = & \frac{C}{m} \sum_{i = 1}^{m} L (1 - y_{i} {\vec{ω}}^{T} \vec{x_{i}}) - \frac{C}{m} \sum_{i = 1}^{m} P (1 - y_{i} {\vec{ω}}^{T} \vec{x_{i}}, μ) \\ = & \frac{C}{m} \sum_{i = 1}^{m} [L (u_{i}) - P (u_{i}, μ)] [Note that u_{i} = 1 - y_{i} {\vec{ω}}^{T} \vec{x_{i}}] \\ \leq & \frac{C}{m} \sum_{i = 1}^{m} \frac{5}{8} τ_{0}^{2} μ [By (15) in Theorem 1] \\ = & C (\frac{5}{8} τ_{0}^{2} μ) . \end{matrix}

It follows that

∥ ψ_{0} (\vec{ω}) - φ_{μ} (\vec{ω}) ∥_{\infty} \leq C (\frac{5}{8} τ_{0}^{2} μ)

. □
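Theorem 2 can be illustrated on a small example. The sketch below evaluates the objectives (16) and (17) at a few weight vectors and compares their difference with the bound C (5/8) τ_0² μ. The data, the weight vectors, and all names are hypothetical; the samples are stored in augmented form with a trailing 1, as in (16).

```python
def gp_loss(u, t1, t2, e1, e2):
    """Generalized pinball loss (11); assumes t1, t2 > 0."""
    if u > e1 / t1:
        return t1 * (u - e1 / t1)
    if u < -e2 / t2:
        return -t2 * (u + e2 / t2)
    return 0.0


def smooth_gp(u, mu, t1, t2, e1, e2):
    """Proposed loss (14), evaluated via the distance x to the flat region."""
    if u > e1 / t1:
        x, tau = u - e1 / t1, t1
    elif u < -e2 / t2:
        x, tau = -(u + e2 / t2), t2
    else:
        return 0.0
    if x > tau * mu:
        return tau * x - 0.625 * tau ** 2 * mu
    return 0.625 * x ** 4 / (tau ** 2 * mu ** 3) - 0.25 * x ** 6 / (tau ** 4 * mu ** 5)


def objective(omega, data, C, loss):
    """0.5 * ||omega||^2 + (C/m) * sum_i loss(1 - y_i * omega^T x_i)."""
    reg = 0.5 * sum(v * v for v in omega)
    total = 0.0
    for x, y in data:
        total += loss(1.0 - y * sum(a * b for a, b in zip(omega, x)))
    return reg + C / len(data) * total


PARAMS = (1.0, 0.5, 0.1, 0.1)   # tau1, tau2, eps1, eps2 (illustrative)
C, MU = 2.0, 0.2
DATA = [((1.0, 2.0, 1.0), +1), ((2.0, 0.5, 1.0), +1),
        ((-1.0, -1.5, 1.0), -1), ((0.5, 0.2, 1.0), -1)]
BOUND = C * 0.625 * max(PARAMS[0], PARAMS[1]) ** 2 * MU

gaps = []
for omega in [(0.0, 0.0, 0.0), (0.5, 0.5, -0.2), (-1.0, 2.0, 0.3), (3.0, -1.0, 1.0)]:
    psi0 = objective(omega, DATA, C, lambda u: gp_loss(u, *PARAMS))
    phi_mu = objective(omega, DATA, C, lambda u: smooth_gp(u, MU, *PARAMS))
    gaps.append(psi0 - phi_mu)
```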

Finally, we show that the solution of (16) converges to the solution of (17) as

μ \to 0^{+}

.

Theorem 3.

Let

φ_{μ} (\vec{ω})

and

ψ_{0} (\vec{ω})

be defined as in (16) and (17), respectively, and let

{\vec{ω}}^{*}

be an optimal solution of problem (17). Then:

(i): There exists a unique solution ${\vec{ω}}_{μ}^{*}$ of problem (16);
(ii): $∥ {\vec{ω}}_{μ}^{*} - {\vec{ω}}^{*} ∥_{2}^{2} \leq C (\frac{5}{8} τ_{0}^{2} μ)$ ;
(iii): ${\vec{ω}}_{μ}^{*} \to {\vec{ω}}^{*}$ as $μ \to 0^{+}$ .

Proof.

Let

ϵ_{1}, ϵ_{2} \geq 0, τ_{1}, τ_{2} > 0

be fixed.

(i) Let

μ > 0

be arbitrary, but given. For

c \in R

, the set

S_{c} (φ_{μ}) : = {\vec{ω} \in R^{n + 1} : φ_{μ} (\vec{ω}) \leq c}

is called a sublevel set. Clearly, we can pick

c > 0

so that

S_{c} (φ_{μ}) \neq Ø

. Since

φ_{μ}

is continuous, the sublevel set

S_{c} (φ_{μ})

is closed. By the definition of

φ_{μ}

,

S_{c} (φ_{μ}) \subseteq {\vec{ω} \in R^{n + 1} : {\vec{ω}}^{T} \vec{ω} = ∥ \vec{ω} ∥^{2} \leq 2 c} .

Since the set

{\vec{ω} \in R^{n + 1} : {\vec{ω}}^{T} \vec{ω} = ∥ \vec{ω} ∥^{2} \leq 2 c}

is bounded, it follows that

S_{c} (φ_{μ})

is bounded. Being closed and bounded,

S_{c} (φ_{μ})

is a compact set in

R^{n + 1}

. By the Extreme Value Theorem, it follows that

φ_{μ}

has a minimizer

{\vec{ω}}_{μ}^{*}

on

S_{c} (φ_{μ})

, which is also a global minimizer because the objective exceeds c outside the sublevel set; that is, problem (16) has a solution. On the other hand,

P (u, μ)

is convex and

u_{i} = 1 - y_{i} {\vec{ω}}^{T} \vec{x_{i}}

is affine in

\vec{ω}

; therefore,

\frac{C}{m} \sum_{i = 1}^{m} P (1 - y_{i} {\vec{ω}}^{T} \vec{x_{i}}, μ)

is convex. As

\frac{1}{2} {∥ \vec{ω} ∥}^{2}

is strongly convex, it follows that

φ_{μ} (\vec{ω})

is also strongly convex. By strong convexity, the solution of problem (16) is unique.

(ii) Let

{\vec{ω}}_{μ}^{*}

and

{\vec{ω}}^{*}

be optimal solutions of (16) and (17), respectively.

Let

\nabla φ_{μ} (\vec{ω})

be the gradient of

φ_{μ} (\vec{ω})

and

\nabla_{0} (\vec{ω})

be subgradients of

ψ_{0} (\vec{ω})

; that is,

\nabla_{0} (\vec{ω}) \in \partial ψ_{0} (\vec{ω}) = \vec{ω} - \frac{C}{m} \sum_{i = 1}^{m} y_{i} \vec{x_{i}} \partial L (u_{i}), [u_{i} = 1 - y_{i} {\vec{ω}}^{T} \vec{x_{i}}]

where

\partial L (u) = \{\begin{matrix} {τ_{1}}, & \frac{ϵ_{1}}{τ_{1}} < u, \\ [0, τ_{1}], & u = \frac{ϵ_{1}}{τ_{1}}, \\ {0}, & - \frac{ϵ_{2}}{τ_{2}} < u < \frac{ϵ_{1}}{τ_{1}}, \\ [- τ_{2}, 0], & u = - \frac{ϵ_{2}}{τ_{2}}, \\ {- τ_{2}}, & u < - \frac{ϵ_{2}}{τ_{2}} . \end{matrix}

By strong convexity with parameter 1, we obtain that

ψ_{0} ({\vec{ω}}_{μ}^{*}) - ψ_{0} ({\vec{ω}}^{*}) \geq ({\vec{ω}}_{μ}^{*} - {\vec{ω}}^{*}) \nabla_{0} ({\vec{ω}}^{*}) + \frac{1}{2} {∥ {\vec{ω}}_{μ}^{*} - {\vec{ω}}^{*} ∥}_{2}^{2},

and

φ_{μ} ({\vec{ω}}^{*}) - φ_{μ} ({\vec{ω}}_{μ}^{*}) \geq ({\vec{ω}}^{*} - {\vec{ω}}_{μ}^{*}) \nabla φ_{μ} ({\vec{ω}}_{μ}^{*}) + \frac{1}{2} {∥ {\vec{ω}}_{μ}^{*} - {\vec{ω}}^{*} ∥}_{2}^{2} .

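The first-order optimality condition states that the subdifferential of a convex function at a minimizer contains zero, and that the gradient of a differentiable convex function vanishes there. Taking the zero subgradient at the minimizer of (17) and using that the gradient vanishes at the minimizer of (16), we obtain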

ψ_{0} ({\vec{ω}}_{μ}^{*}) - ψ_{0} ({\vec{ω}}^{*}) \geq \frac{1}{2} {∥ {\vec{ω}}_{μ}^{*} - {\vec{ω}}^{*} ∥}_{2}^{2},

and

φ_{μ} ({\vec{ω}}^{*}) - φ_{μ} ({\vec{ω}}_{μ}^{*}) \geq \frac{1}{2} {∥ {\vec{ω}}_{μ}^{*} - {\vec{ω}}^{*} ∥}_{2}^{2} .

By Theorem 1, we have

ψ_{0} (\vec{ω}) - φ_{μ} (\vec{ω}) = \frac{C}{m} \sum_{i = 1}^{m} L (u_{i}) - \frac{C}{m} \sum_{i = 1}^{m} P (u_{i}, μ) \geq 0, \forall \vec{ω} .

Consider

\begin{matrix} ∥ {\vec{ω}}_{μ}^{*} - {\vec{ω}}^{*} ∥_{2}^{2} \leq & ψ_{0} ({\vec{ω}}_{μ}^{*}) - ψ_{0} ({\vec{ω}}^{*}) + φ_{μ} ({\vec{ω}}^{*}) - φ_{μ} ({\vec{ω}}_{μ}^{*}) \\ = & ψ_{0} ({\vec{ω}}_{μ}^{*}) - φ_{μ} ({\vec{ω}}_{μ}^{*}) - [ψ_{0} ({\vec{ω}}^{*}) - φ_{μ} ({\vec{ω}}^{*})] \\ \leq & ψ_{0} ({\vec{ω}}_{μ}^{*}) - φ_{μ} ({\vec{ω}}_{μ}^{*}) . \end{matrix}

and by Theorem 2, we obtain

\begin{matrix} 0 \leq ∥ {\vec{ω}}_{μ}^{*} - {\vec{ω}}^{*} ∥_{2}^{2} \leq ψ_{0} ({\vec{ω}}_{μ}^{*}) - φ_{μ} ({\vec{ω}}_{μ}^{*}) = & | ψ_{0} ({\vec{ω}}_{μ}^{*}) - φ_{μ} ({\vec{ω}}_{μ}^{*}) | \\ \leq & ∥ ψ_{0} - φ_{μ} ∥_{\infty} \\ \leq & C (\frac{5}{8} τ_{0}^{2} μ) . \end{matrix}

(iii) By the above and the Squeeze Theorem, we obtain that

{\vec{ω}}_{μ}^{*} \to {\vec{ω}}^{*}

as

μ \to 0^{+}

. □
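The estimate in Theorem 3 (ii) can be observed on a deliberately simplified instance. In the sketch below the weight is a single scalar (one feature and no offset, an assumption made only so that a plain ternary search suffices), and the minimizers of (16) and (17) are compared for decreasing μ. All names and data are hypothetical.

```python
def gp_loss(u, t1, t2, e1, e2):
    """Generalized pinball loss (11); assumes t1, t2 > 0."""
    if u > e1 / t1:
        return t1 * (u - e1 / t1)
    if u < -e2 / t2:
        return -t2 * (u + e2 / t2)
    return 0.0


def smooth_gp(u, mu, t1, t2, e1, e2):
    """Proposed loss (14), evaluated via the distance x to the flat region."""
    if u > e1 / t1:
        x, tau = u - e1 / t1, t1
    elif u < -e2 / t2:
        x, tau = -(u + e2 / t2), t2
    else:
        return 0.0
    if x > tau * mu:
        return tau * x - 0.625 * tau ** 2 * mu
    return 0.625 * x ** 4 / (tau ** 2 * mu ** 3) - 0.25 * x ** 6 / (tau ** 4 * mu ** 5)


def make_objective(data, C, loss):
    """Scalar version of (16)/(17): 0.5 * w^2 + (C/m) * sum_i loss(1 - y_i * w * x_i)."""
    def objective(w):
        return 0.5 * w * w + C / len(data) * sum(loss(1.0 - y * w * x) for x, y in data)
    return objective


def argmin_convex(f, lo=-10.0, hi=10.0, iters=200):
    """Ternary search for the minimizer of a strictly convex function on [lo, hi]."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if f(a) < f(b):
            hi = b
        else:
            lo = a
    return 0.5 * (lo + hi)


PARAMS = (1.0, 0.5, 0.1, 0.1)   # tau1, tau2, eps1, eps2 (illustrative)
C = 1.0
DATA = [(1.0, +1), (2.0, +1), (-1.5, -1), (0.5, -1)]  # last sample acts as noise

w_star = argmin_convex(make_objective(DATA, C, lambda u: gp_loss(u, *PARAMS)))
squared_distance = {}
for mu in (0.5, 0.1, 0.01):
    w_mu = argmin_convex(make_objective(DATA, C, lambda u, mu=mu: smooth_gp(u, mu, *PARAMS)))
    squared_distance[mu] = (w_mu - w_star) ** 2

TAU0 = max(PARAMS[0], PARAMS[1])
bounds = {mu: C * 0.625 * TAU0 ** 2 * mu for mu in squared_distance}
```

The ternary search is used here only because it needs no derivatives and works for the non-smooth objective as well; in practice the smooth problem would typically be solved by a gradient-based or Newton-type method.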

3.3. The Twin-Bounded Support Vector Machine with Smooth Loss Function

The combination of the TBSVM with the new smooth generalized pinball loss function

P (u, μ)

is designed to improve the algorithm’s performance on complex or noisy data, which traditional loss functions (such as the hinge loss) often struggle with. Consequently, a novel TBSVM with smooth generalized pinball loss is introduced to reduce noise sensitivity and to handle imbalanced data better. Because the resulting objective is smooth, this approach is also intended to make the optimization faster and more stable.

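The formulation of the TBSVM model (7) and (8), with the hinge loss replaced by the generalized pinball loss and the factor 1/2 of the regularization terms absorbed into the regularization constants, can be written compactly as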

min_{\vec{ω_{1}}} Φ_{0}^{(1)} (\vec{ω_{1}}) : = \frac{1}{2} \sum_{i = 1}^{m_{1}} {({\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(1)})}^{2} + c_{3} {∥ \vec{ω_{1}} ∥}^{2} + \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} L (1 + {\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(2)}),

(18)

min_{\vec{ω_{2}}} Φ_{0}^{(2)} (\vec{ω_{2}}) : = \frac{1}{2} \sum_{i = 1}^{m_{2}} {({\vec{ω_{2}}}^{T} {\vec{x_{i}}}^{(2)})}^{2} + c_{4} {∥ \vec{ω_{2}} ∥}^{2} + \frac{c_{2}}{m_{1}} \sum_{i = 1}^{m_{1}} L (1 - {\vec{ω_{2}}}^{T} {\vec{x_{i}}}^{(1)}),

(19)

where

\vec{ω_{1}} = {[{\vec{w_{1}}}^{T}, b_{1}]}^{T}, \vec{ω_{2}} = {[{\vec{w_{2}}}^{T}, b_{2}]}^{T} and {\vec{x_{i}}}^{(1)} = {[{\vec{x_{i}}}^{(1)}, 1]}^{T}, {\vec{x_{i}}}^{(2)} = {[{\vec{x_{i}}}^{(2)}, 1]}^{T}

.

We replace the right-most terms with the new smooth loss function

P (\cdot, μ)

, and obtain the following structure:

min_{\vec{ω_{1}}} ϕ_{μ}^{(1)} (\vec{ω_{1}}) : = \frac{1}{2} \sum_{i = 1}^{m_{1}} {({\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(1)})}^{2} + c_{3} {∥ \vec{ω_{1}} ∥}^{2} + \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} P (1 + {\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(2)}, μ),

(20)

min_{\vec{ω_{2}}} ϕ_{μ}^{(2)} (\vec{ω_{2}}) : = \frac{1}{2} \sum_{i = 1}^{m_{2}} {({\vec{ω_{2}}}^{T} {\vec{x_{i}}}^{(2)})}^{2} + c_{4} {∥ \vec{ω_{2}} ∥}^{2} + \frac{c_{2}}{m_{1}} \sum_{i = 1}^{m_{1}} P (1 - {\vec{ω_{2}}}^{T} {\vec{x_{i}}}^{(1)}, μ) .

(21)
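Since the objective (20) is twice continuously differentiable, its gradient is available in closed form. The following sketch evaluates (20) together with its gradient for augmented samples and compares the gradient with central finite differences. The names, the data, and the parameter values are hypothetical, and the derivative of P is obtained by differentiating (14) branch by branch.

```python
def smooth_gp(u, mu, t1, t2, e1, e2):
    """Proposed loss (14), evaluated via the distance x to the flat region."""
    if u > e1 / t1:
        x, tau = u - e1 / t1, t1
    elif u < -e2 / t2:
        x, tau = -(u + e2 / t2), t2
    else:
        return 0.0
    if x > tau * mu:
        return tau * x - 0.625 * tau ** 2 * mu
    return 0.625 * x ** 4 / (tau ** 2 * mu ** 3) - 0.25 * x ** 6 / (tau ** 4 * mu ** 5)


def smooth_gp_prime(u, mu, t1, t2, e1, e2):
    """Derivative of (14) with respect to u."""
    if u > e1 / t1:
        x, tau, sign = u - e1 / t1, t1, 1.0
    elif u < -e2 / t2:
        x, tau, sign = -(u + e2 / t2), t2, -1.0   # dx/du = -1 on the left
    else:
        return 0.0
    if x > tau * mu:
        return sign * tau
    return sign * (2.5 * x ** 3 / (tau ** 2 * mu ** 3) - 1.5 * x ** 5 / (tau ** 4 * mu ** 5))


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def tbsvm_objective(omega, own, other, c1, c3, mu, params):
    """Objective (20): `own` holds class +1 samples, `other` class -1 samples."""
    fit = 0.5 * sum(dot(omega, x) ** 2 for x in own)
    reg = c3 * dot(omega, omega)
    loss = sum(smooth_gp(1.0 + dot(omega, x), mu, *params) for x in other)
    return fit + reg + c1 / len(other) * loss


def tbsvm_gradient(omega, own, other, c1, c3, mu, params):
    """Gradient of (20) with respect to omega."""
    grad = [2.0 * c3 * w for w in omega]
    for x in own:
        s = dot(omega, x)
        grad = [g + s * xj for g, xj in zip(grad, x)]
    for x in other:
        d = c1 / len(other) * smooth_gp_prime(1.0 + dot(omega, x), mu, *params)
        grad = [g + d * xj for g, xj in zip(grad, x)]
    return grad


def max_gradient_error(omega, args, h=1e-6):
    """Largest deviation between the analytic gradient and central differences."""
    exact = tbsvm_gradient(omega, *args)
    worst = 0.0
    for j in range(len(omega)):
        up = list(omega); up[j] += h
        down = list(omega); down[j] -= h
        approx = (tbsvm_objective(up, *args) - tbsvm_objective(down, *args)) / (2.0 * h)
        worst = max(worst, abs(approx - exact[j]))
    return worst


OWN = [(1.0, 2.0, 1.0), (2.0, 1.0, 1.0)]
OTHER = [(-1.0, -1.0, 1.0), (0.0, -2.0, 1.0), (0.5, 0.5, 1.0)]
PARAMS = (1.0, 0.5, 0.1, 0.1)    # tau1, tau2, eps1, eps2 (illustrative)
OMEGA = [0.3, -0.2, 0.1]
errors = {mu: max_gradient_error(OMEGA, (OWN, OTHER, 1.0, 0.5, mu, PARAMS)) for mu in (0.5, 2.0)}
```

Problem (21) can be treated in the same way by exchanging the roles of the two classes and replacing the loss argument by 1 - ω_2^T x.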

Theorem 4.

Let

τ_{0}

= max

{τ_{1}, τ_{2}}

. Then

∥ Φ_{0}^{(1)} (\vec{ω_{1}}) - ϕ_{μ}^{(1)} (\vec{ω_{1}}) ∥_{\infty} \leq c_{1} (\frac{5}{8} τ_{0}^{2} μ)

and

∥ Φ_{0}^{(2)} (\vec{ω_{2}}) - ϕ_{μ}^{(2)} (\vec{ω_{2}}) ∥_{\infty} \leq c_{2} (\frac{5}{8} τ_{0}^{2} μ)

for

\vec{ω_{1}}, \vec{ω_{2}} \in R^{n + 1}

.

Proof.

Recall from the proof of Theorem 1 that

L (u) - P (u, μ) \leq \frac{5}{8} τ_{0}^{2} μ, \forall u \in R

. Then

\begin{matrix} 0 \leq Φ_{0}^{(1)} (\vec{ω_{1}}) - ϕ_{μ}^{(1)} (\vec{ω_{1}}) = & \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} L ({\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(2)} + 1) - \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} P (1 + {\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(2)}, μ) \\ = & \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} [L (u_{i}) - P (u_{i}, μ)] [Note that u_{i} = 1 + {\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(2)}] \\ \leq & \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} \frac{5}{8} τ_{0}^{2} μ \\ = & c_{1} (\frac{5}{8} τ_{0}^{2} μ), \end{matrix}

that is,

∥ Φ_{0}^{(1)} (\vec{ω_{1}}) - ϕ_{μ}^{(1)} (\vec{ω_{1}}) ∥_{\infty} \leq c_{1} (\frac{5}{8} τ_{0}^{2} μ)

.

In a similar way, one shows that

∥ Φ_{0}^{(2)} (\vec{ω_{2}}) - ϕ_{μ}^{(2)} (\vec{ω_{2}}) ∥_{\infty} \leq c_{2} (\frac{5}{8} τ_{0}^{2} μ)

. □

Theorem 5.

Let

{\vec{ω}}_{1}^{*}

and

{\vec{ω}}_{2}^{*}

be the optimal solutions of problems (18) and (19), respectively. Then

(i): there exist unique solutions of problems (20) and (21), denoted ${\vec{ω}}_{1}^{μ}$ and ${\vec{ω}}_{2}^{μ}$ , respectively;
(ii): $∥ {\vec{ω}}_{1}^{μ} - {\vec{ω}}_{1}^{*} ∥_{2}^{2} \leq c_{1} (\frac{5}{8} τ_{0}^{2} μ)$ and $∥ {\vec{ω}}_{2}^{μ} - {\vec{ω}}_{2}^{*} ∥_{2}^{2} \leq c_{2} (\frac{5}{8} τ_{0}^{2} μ)$ ;
(iii): ${\vec{ω}}_{1}^{μ} \to {\vec{ω}}_{1}^{*}$ and ${\vec{ω}}_{2}^{μ} \to {\vec{ω}}_{2}^{*}$ as $μ \to 0^{+}$ .

Proof.

(i) Let $μ > 0$ be arbitrary, but fixed. Pick $ν > 0$ so that the sublevel set

S_{ν} (ϕ_{μ}^{(1)}) : = \{\vec{ω_{1}} \in R^{n + 1} : ϕ_{μ}^{(1)} (\vec{ω_{1}}) \leq ν\}

is not empty. Since $ϕ_{μ}^{(1)}$ is continuous, the sublevel set $S_{ν} (ϕ_{μ}^{(1)})$ is closed. Furthermore, by the definition of $ϕ_{μ}^{(1)}$,

S_{ν} (ϕ_{μ}^{(1)}) \subseteq \{\vec{ω_{1}} \in R^{n + 1} : {\vec{ω_{1}}}^{T} \vec{ω_{1}} = ∥ \vec{ω_{1}} ∥^{2} \leq \frac{ν}{c_{3}}\} .

Since the set on the right-hand side is bounded, $S_{ν} (ϕ_{μ}^{(1)})$ is bounded as well. Being closed and bounded, $S_{ν} (ϕ_{μ}^{(1)})$ is a compact set in $R^{n + 1}$. By the Extreme Value Theorem, a solution to problem (20) exists, which we denote by ${\vec{ω}}_{1}^{μ}$. As before, since $P (u, μ)$ is convex, $\frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} P (u_{i}, μ)$ is convex.

Next, we show that the first term in (20) is convex in $\vec{ω_{1}}$. In fact, as the function $y = x^{2}$ is convex,

\begin{matrix} {[{(t \vec{v_{1}} + (1 - t) \vec{v_{2}})}^{T} {\vec{x_{i}}}^{(1)}]}^{2} = & {[t {\vec{v_{1}}}^{T} {\vec{x_{i}}}^{(1)} + (1 - t) {\vec{v_{2}}}^{T} {\vec{x_{i}}}^{(1)}]}^{2} \\ \leq & t {({\vec{v_{1}}}^{T} {\vec{x_{i}}}^{(1)})}^{2} + (1 - t) {({\vec{v_{2}}}^{T} {\vec{x_{i}}}^{(1)})}^{2} \end{matrix}

for all $\vec{v_{1}}, \vec{v_{2}} \in R^{n + 1}$ and all $t \in [0, 1]$. This shows that $\vec{ω_{1}} \mapsto {({\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(1)})}^{2}$ is convex for every $i$, so that $\frac{1}{2} \sum_{i = 1}^{m_{1}} {({\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(1)})}^{2}$ is convex. As $c_{3} {∥ \vec{ω_{1}} ∥}^{2}$ is strongly convex and the remaining terms in (20) are convex, it follows that $ϕ_{μ}^{(1)} (\vec{ω_{1}})$ is also strongly convex. By strong convexity, the solution of problem (20) is unique. For problem (21), we proceed in a similar way to obtain a unique solution ${\vec{ω}}_{2}^{μ}$.

(ii) Let ${\vec{ω}}_{1}^{*}$ and ${\vec{ω}}_{1}^{μ}$ be the optimal solutions of (18) and (20), respectively.

Let $\nabla ϕ_{μ}^{(1)} ({\vec{ω}}_{1})$ be the gradient of $ϕ_{μ}^{(1)} ({\vec{ω}}_{1})$ and $\nabla_{0}^{(1)} ({\vec{ω}}_{1})$ be a subgradient of $Φ_{0}^{(1)} ({\vec{ω}}_{1})$, that is,

\nabla_{0}^{(1)} ({\vec{ω}}_{1}) \in \partial Φ_{0}^{(1)} ({\vec{ω}}_{1}) = \sum_{i = 1}^{m_{1}} {\vec{x_{i}}}^{(1)} {[{\vec{x_{i}}}^{(1)}]}^{T} \vec{ω_{1}} + 2 c_{3} \vec{ω_{1}} + \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} {\vec{x_{i}}}^{(2)} \partial L (1 + {\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(2)}) .

By strong convexity with parameter 1, we obtain that

Φ_{0}^{(1)} ({\vec{ω}}_{1}^{μ}) - Φ_{0}^{(1)} ({\vec{ω}}_{1}^{*}) \geq ({\vec{ω}}_{1}^{μ} - {\vec{ω}}_{1}^{*}) \nabla_{0}^{(1)} ({\vec{ω}}_{1}^{*}) + \frac{1}{2} {∥ {\vec{ω}}_{1}^{μ} - {\vec{ω}}_{1}^{*} ∥}_{2}^{2},

and

ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{*}) - ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}) \geq ({\vec{ω}}_{1}^{*} - {\vec{ω}}_{1}^{μ}) \nabla ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}) + \frac{1}{2} {∥ {\vec{ω}}_{1}^{*} - {\vec{ω}}_{1}^{μ} ∥}_{2}^{2} .

The first-order optimality condition for a convex function states that the subdifferential at a minimizer contains zero. Choosing $\nabla_{0}^{(1)} ({\vec{ω}}_{1}^{*}) = \vec{0}$ and using $\nabla ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}) = \vec{0}$, we obtain

Φ_{0}^{(1)} ({\vec{ω}}_{1}^{μ}) - Φ_{0}^{(1)} ({\vec{ω}}_{1}^{*}) \geq \frac{1}{2} {∥ {\vec{ω}}_{1}^{μ} - {\vec{ω}}_{1}^{*} ∥}_{2}^{2},

and

ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{*}) - ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}) \geq \frac{1}{2} ∥ {\vec{ω}}_{1}^{*} - {\vec{ω}}_{1}^{μ} ∥_{2}^{2} = \frac{1}{2} {∥ {\vec{ω}}_{1}^{μ} - {\vec{ω}}_{1}^{*} ∥}_{2}^{2} .

By Theorem 1, we have

Φ_{0}^{(1)} (\vec{ω_{1}}) - ϕ_{μ}^{(1)} (\vec{ω_{1}}) = \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} L (u_{i}^{(1)}) - \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} P (u_{i}^{(1)}, μ) \geq 0,

where $u_{i}^{(1)} = 1 + {\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(2)}$.

Adding the two inequalities obtained from strong convexity gives

\begin{matrix} ∥ {\vec{ω}}_{1}^{μ} - {\vec{ω}}_{1}^{*} ∥_{2}^{2} \leq & Φ_{0}^{(1)} ({\vec{ω}}_{1}^{μ}) - Φ_{0}^{(1)} ({\vec{ω}}_{1}^{*}) + ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{*}) - ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}) \\ = & Φ_{0}^{(1)} ({\vec{ω}}_{1}^{μ}) - ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}) - [Φ_{0}^{(1)} ({\vec{ω}}_{1}^{*}) - ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{*})] \\ \leq & Φ_{0}^{(1)} ({\vec{ω}}_{1}^{μ}) - ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}), \end{matrix}

and by Theorem 4, we obtain

\begin{matrix} 0 \leq ∥ {\vec{ω}}_{1}^{μ} - {\vec{ω}}_{1}^{*} ∥_{2}^{2} \leq Φ_{0}^{(1)} ({\vec{ω}}_{1}^{μ}) - ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}) = & | Φ_{0}^{(1)} ({\vec{ω}}_{1}^{μ}) - ϕ_{μ}^{(1)} ({\vec{ω}}_{1}^{μ}) | \\ \leq & ∥ Φ_{0}^{(1)} - ϕ_{μ}^{(1)} ∥_{\infty} \\ \leq & c_{1} (\frac{5}{8} τ_{0}^{2} μ) . \end{matrix}

The inequality $∥ {\vec{ω}}_{2}^{μ} - {\vec{ω}}_{2}^{*} ∥_{2}^{2} \leq c_{2} (\frac{5}{8} τ_{0}^{2} μ)$ can be proven in the same way.

(iii) By the above arguments and the Squeeze Theorem, we have ${\vec{ω}}_{1}^{μ} \to {\vec{ω}}_{1}^{*}$ as $μ \to 0^{+}$. Similarly, we obtain ${\vec{ω}}_{2}^{μ} \to {\vec{ω}}_{2}^{*}$ as $μ \to 0^{+}$. □

In summary, the above theorems have established the convergence properties of the proposed smooth loss within the TBSVM framework, demonstrating both theoretical soundness and practical relevance. From a practical viewpoint, the established uniform convergence guarantees predictable behavior of the model as the smoothing parameter decreases, while the uniqueness of solutions implies robustness with respect to initialization and numerical perturbations. These properties are particularly important for large-scale or noisy datasets, where ill-posed optimization problems may otherwise lead to unstable training or inconsistent classification results.

3.4. Quasi-Newton Smooth Generalized Pinball Twin-Bounded Support Vector Machine

The Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, named after its developers, is one of the most widely used quasi-Newton algorithms. In this section, we apply it to minimize a strongly convex, $C^{2}$ objective function. As before, we focus on the strongly convex differentiable problem (20):

\begin{matrix} min_{\vec{ω_{1}}} ϕ_{μ}^{(1)} (\vec{ω_{1}}) : = \frac{1}{2} \sum_{i = 1}^{m_{1}} {({\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(1)})}^{2} + c_{3} {∥ \vec{ω_{1}} ∥}^{2} + \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} P (1 + {\vec{ω_{1}}}^{T} {\vec{x_{i}}}^{(2)}, μ) \end{matrix}

If ${\vec{ω_{1}}}_{k}$ denotes the value of $\vec{ω_{1}}$ obtained in the k-th iteration step, then the gradient of the objective function $ϕ_{μ}^{(1)}$ at ${\vec{ω_{1}}}_{k}$ is

\nabla ϕ_{μ}^{(1)} ({\vec{ω_{1}}}_{k}) = \sum_{i = 1}^{m_{1}} {\vec{x_{i}}}^{(1)} {[{\vec{x_{i}}}^{(1)}]}^{T} {\vec{ω_{1}}}_{k} + 2 c_{3} {\vec{ω_{1}}}_{k} + \frac{c_{1}}{m_{2}} \sum_{i = 1}^{m_{2}} {\vec{x_{i}}}^{(2)} \partial P (u_{i_{k}}, μ),

where $u_{i_{k}} = 1 + {\vec{ω_{1}}}_{k}^{T} {\vec{x_{i}}}^{(2)}$ and the partial derivative of $P (u, μ)$ with respect to the variable u is

\partial P (u, μ) = \{\begin{matrix} τ_{1}, & \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ < u, \\ \frac{5}{2 τ_{1}^{2} μ^{3}} {(u - \frac{ϵ_{1}}{τ_{1}})}^{3} - \frac{3}{2 τ_{1}^{4} μ^{5}} {(u - \frac{ϵ_{1}}{τ_{1}})}^{5}, & \frac{ϵ_{1}}{τ_{1}} < u \leq \frac{ϵ_{1}}{τ_{1}} + τ_{1} μ, \\ 0, & - \frac{ϵ_{2}}{τ_{2}} \leq u \leq \frac{ϵ_{1}}{τ_{1}}, \\ \frac{5}{2 τ_{2}^{2} μ^{3}} {(u + \frac{ϵ_{2}}{τ_{2}})}^{3} - \frac{3}{2 τ_{2}^{4} μ^{5}} {(u + \frac{ϵ_{2}}{τ_{2}})}^{5}, & - \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ \leq u < - \frac{ϵ_{2}}{τ_{2}}, \\ - τ_{2}, & u < - \frac{ϵ_{2}}{τ_{2}} - τ_{2} μ . \end{matrix}
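As an illustration, the five-branch derivative and the gradient of (20) can be written in a few lines of numpy. This is a minimal sketch: the function names, the toy data, and the parameter values are assumptions, and the rows of `X1` and `X2` are taken to be the augmented samples ${\vec{x_{i}}}^{(1)}$ and ${\vec{x_{i}}}^{(2)}$.

```python
import numpy as np

def d_smooth_gen_pinball(u, mu, tau1, tau2, eps1, eps2):
    """Derivative of P(u, mu) with respect to u (five-branch formula)."""
    u = np.asarray(u, dtype=float)
    a = u - eps1 / tau1            # offset from the right kink eps1/tau1
    b = u + eps2 / tau2            # offset from the left kink -eps2/tau2
    right = 5 * a**3 / (2 * tau1**2 * mu**3) - 3 * a**5 / (2 * tau1**4 * mu**5)
    left = 5 * b**3 / (2 * tau2**2 * mu**3) - 3 * b**5 / (2 * tau2**4 * mu**5)
    return np.where(a > tau1 * mu, tau1,
           np.where(a > 0, right,
           np.where(b >= 0, 0.0,
           np.where(b >= -tau2 * mu, left, -tau2))))

def grad_phi1(w, X1, X2, c1, c3, mu, tau1, tau2, eps1, eps2):
    """Gradient of the objective in (20) at w."""
    u = 1.0 + X2 @ w
    dP = d_smooth_gen_pinball(u, mu, tau1, tau2, eps1, eps2)
    return X1.T @ (X1 @ w) + 2.0 * c3 * w + (c1 / X2.shape[0]) * (X2.T @ dP)

# Toy data and illustrative parameters (hypothetical values).
rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(5, 4)), rng.normal(size=(6, 4))
grad = grad_phi1(np.zeros(4), X1, X2, c1=1.0, c3=0.1,
                 mu=0.4, tau1=0.8, tau2=0.5, eps1=0.2, eps2=0.3)
print(grad)
```

The derivative is continuous and non-decreasing in u, moving from $- τ_{2}$ to $τ_{1}$, which is consistent with the convexity of $P (u, μ)$ used in Theorem 5.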

The BFGS iteration is

{\vec{ω_{1}}}_{k + 1} = {\vec{ω_{1}}}_{k} + α_{k} d_{k},

(22)

where $d_{k} = - B_{k}^{- 1} \nabla ϕ_{μ}^{(1)} ({\vec{ω_{1}}}_{k})$ is the search direction, $B_{k}$ is an approximation of the Hessian matrix $\nabla^{2} ϕ_{μ}^{(1)} ({\vec{ω_{1}}}_{k})$, and the step size $α_{k}$ is determined by the Armijo condition. For the next iteration, the approximation is updated by the BFGS formula (the corresponding update of $B_{k}^{- 1}$ follows from the Sherman–Morrison–Woodbury formula), that is,

B_{k + 1} = B_{k} - \frac{B_{k} Δ x_{k} Δ x_{k}^{T} B_{k}}{Δ x_{k}^{T} B_{k} Δ x_{k}} + \frac{Δ g_{k} Δ g_{k}^{T}}{Δ x_{k}^{T} Δ g_{k}} .

where

Δ x_{k} = {\vec{ω_{1}}}_{k + 1} - {\vec{ω_{1}}}_{k},

and

Δ g_{k} = \nabla ϕ_{μ}^{(1)} ({\vec{ω_{1}}}_{k + 1}) - \nabla ϕ_{μ}^{(1)} ({\vec{ω_{1}}}_{k}) .

The BFGS algorithm for the strongly convex differentiable problem (21) can be computed similarly to the problem (20).
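The iteration (22) with the update of $B_{k}$ can be sketched as follows. This is an illustrative implementation, not the code used in the experiments: the function name, the tolerances, the backtracking constants, and the identity matrix as initial approximation are assumptions, and a small strongly convex quadratic stands in for $ϕ_{μ}^{(1)}$ so that the result can be compared with a known minimizer.

```python
import numpy as np

def bfgs_armijo(f, grad, w0, tol=1e-8, max_iter=200, sigma=1e-4, beta=0.5):
    """Minimize f by BFGS, updating the Hessian approximation B_k directly."""
    w = np.asarray(w0, dtype=float)
    B = np.eye(w.size)                    # B_0: identity (assumed start)
    g = grad(w)
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:
            break
        d = -np.linalg.solve(B, g)        # d_k = -B_k^{-1} grad f(w_k)
        alpha, fw, slope = 1.0, f(w), g @ d
        # Armijo backtracking: shrink alpha until sufficient decrease holds.
        while alpha > 1e-12 and f(w + alpha * d) > fw + sigma * alpha * slope:
            alpha *= beta
        w_new = w + alpha * d
        g_new = grad(w_new)
        dx, dg = w_new - w, g_new - g
        if dx @ dg > 1e-12:               # curvature condition keeps B positive definite
            Bdx = B @ dx
            B = B - np.outer(Bdx, Bdx) / (dx @ Bdx) + np.outer(dg, dg) / (dx @ dg)
        w, g = w_new, g_new
    return w

# Stand-in objective: f(w) = 0.5 w^T A w - b^T w with A positive definite.
A = np.array([[3.0, 1.0, 0.0], [1.0, 2.0, 0.5], [0.0, 0.5, 1.0]])
b = np.array([1.0, -2.0, 0.5])
w_bfgs = bfgs_armijo(lambda w: 0.5 * w @ A @ w - b @ w,
                     lambda w: A @ w - b, np.zeros(3))
w_exact = np.linalg.solve(A, b)
print(w_bfgs, w_exact)
```

For problem (20), one would pass $ϕ_{μ}^{(1)}$ and its gradient in place of the quadratic. The curvature test before the update is a common safeguard and is not part of the description above.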

3.5. The Kernel Trick

Many real-world datasets have complex structures in which the classes cannot be separated by a hyperplane. To address this, in the SVM or TBSVM models, one maps the original data into a higher-dimensional feature space in which separating hyperplanes can be found. It turns out that the mapping itself need not be known; this is called the kernel trick, as we explain now.

Let $Φ : R^{n} \to H$ be a mapping, called the feature map, where $H$ is a Hilbert space. Here again, n is the number of features in the given dataset. Let $\tilde{H}$ denote the linear span of $\{Φ (\vec{x_{1}}), \dots, Φ (\vec{x_{m}})\}$, so that $\tilde{H}$ is a finite-dimensional subspace of $H$ that is typically of high dimension. For convenience, we relabel the sets of positive and negative data samples by $\{\vec{x_{i}}\}_{i = 1}^{m_{1}}$ and $\{\vec{x_{i}}\}_{i = m_{1} + 1}^{m}$, respectively. We build our TBSVM in $\tilde{H}$, where (20) and (21) become

min_{\vec{w_{1}}, b_{1}} \frac{1}{2} \sum_{i = 1}^{m_{1}} {({\vec{w_{1}}}^{T} Φ (\vec{x_{i}}) + b_{1})}^{2} + c_{3} (∥ \vec{w_{1}} ∥^{2} + b_{1}^{2}) + \frac{c_{1}}{m_{2}} \sum_{i = m_{1} + 1}^{m} P (1 + ({\vec{w_{1}}}^{T} Φ (\vec{x_{i}}) + b_{1}), μ),

(23)

min_{\vec{w_{2}}, b_{2}} \frac{1}{2} \sum_{i = m_{1} + 1}^{m} {({\vec{w_{2}}}^{T} Φ (\vec{x_{i}}) + b_{2})}^{2} + c_{4} (∥ \vec{w_{2}} ∥^{2} + b_{2}^{2}) + \frac{c_{2}}{m_{1}} \sum_{i = 1}^{m_{1}} P (1 - ({\vec{w_{2}}}^{T} Φ (\vec{x_{i}}) + b_{2}), μ) .

(24)

Consider the symmetric mapping $K : R^{n} \times R^{n} \to R$, called the kernel, given by

K (\vec{y_{1}}, \vec{y_{2}}) = Φ {(\vec{y_{1}})}^{T} Φ (\vec{y_{2}}) = Φ {(\vec{y_{2}})}^{T} Φ (\vec{y_{1}}) = K (\vec{y_{2}}, \vec{y_{1}}), \vec{y_{i}} \in R^{n} .

Its Gram matrix on the training samples is

X = {[K (\vec{x_{i}}, \vec{x_{j}})]}_{i, j} = [\begin{matrix} K (\vec{x_{1}}, \vec{x_{1}}) & K (\vec{x_{1}}, \vec{x_{2}}) & \dots & K (\vec{x_{1}}, \vec{x_{m}}) \\ K (\vec{x_{2}}, \vec{x_{1}}) & K (\vec{x_{2}}, \vec{x_{2}}) & \dots & K (\vec{x_{2}}, \vec{x_{m}}) \\ ⋮ & ⋮ & ⋮ & ⋮ \\ K (\vec{x_{m}}, \vec{x_{1}}) & K (\vec{x_{m}}, \vec{x_{2}}) & \dots & K (\vec{x_{m}}, \vec{x_{m}}) \end{matrix}] .

As $\vec{w_{1}}, \vec{w_{2}} \in \tilde{H}$, we can write

\vec{w_{1}} = \sum_{j = 1}^{m} α_{j} Φ (\vec{x_{j}}) \quad \text{and} \quad \vec{w_{2}} = \sum_{j = 1}^{m} β_{j} Φ (\vec{x_{j}})

where the coefficients $α_{j}, β_{j} \in R$ need not be unique. Then (23) becomes

\begin{matrix} min_{\vec{α}, b_{1}} & \frac{1}{2} \sum_{i = 1}^{m_{1}} {({(\sum_{j = 1}^{m} α_{j} Φ (\vec{x_{j}}))}^{T} Φ (\vec{x_{i}}) + b_{1})}^{2} + c_{3} ({(\sum_{j = 1}^{m} α_{j} Φ (\vec{x_{j}}))}^{T} (\sum_{k = 1}^{m} α_{k} Φ (\vec{x_{k}})) + b_{1}^{2}) \\ & + \frac{c_{1}}{m_{2}} \sum_{i = m_{1} + 1}^{m} P (1 + \sum_{j = 1}^{m} α_{j} Φ {(\vec{x_{j}})}^{T} Φ (\vec{x_{i}}) + b_{1}, μ), \end{matrix}

(25)

where $\vec{α} = {(α_{1}, α_{2}, \dots, α_{m})}^{T} \in R^{m}$ and $b_{1} \in R$, or equivalently,

\begin{matrix} min_{\vec{α}, b_{1}} & \frac{1}{2} \sum_{i = 1}^{m_{1}} {(\sum_{j = 1}^{m} K (\vec{x_{i}}, \vec{x_{j}}) α_{j} + b_{1})}^{2} + c_{3} ((\sum_{j, k = 1}^{m} α_{j} α_{k} K (\vec{x_{j}}, \vec{x_{k}})) + b_{1}^{2}) \\ & + \frac{c_{1}}{m_{2}} \sum_{i = m_{1} + 1}^{m} P (1 + \sum_{j = 1}^{m} K (\vec{x_{i}}, \vec{x_{j}}) α_{j} + b_{1}, μ), \end{matrix}

(26)

that is,

min_{\vec{α}, b_{1}} \frac{1}{2} \sum_{i = 1}^{m_{1}} {(X_{i} \vec{α} + b_{1})}^{2} + c_{3} ({\vec{α}}^{T} X \vec{α} + b_{1}^{2}) + \frac{c_{1}}{m_{2}} \sum_{i = m_{1} + 1}^{m} P (1 + (X_{i} \vec{α} + b_{1}), μ),

(27)

where $X_{i}$ denotes the i-th row of $X$. Similarly, (24) changes to

min_{\vec{β}, b_{2}} \frac{1}{2} \sum_{i = m_{1} + 1}^{m} {(X_{i} \vec{β} + b_{2})}^{2} + c_{4} ({\vec{β}}^{T} X \vec{β} + b_{2}^{2}) + \frac{c_{2}}{m_{1}} \sum_{i = 1}^{m_{1}} P (1 - (X_{i} \vec{β} + b_{2}), μ)

(28)

with $\vec{β} = {(β_{1}, β_{2}, \dots, β_{m})}^{T}$.

An unknown sample point $\vec{x} \in R^{n}$ is assigned to class $i$ ($i = + 1$ or $i = - 1$) by the following:

class (\vec{x}) = sgn [\frac{{\vec{w_{1}}}^{T} Φ (\vec{x}) + b_{1}}{∥ \vec{w_{1}} ∥} + \frac{{\vec{w_{2}}}^{T} Φ (\vec{x}) + b_{2}}{∥ \vec{w_{2}} ∥}] .

(29)

Since $\vec{w_{1}} = \sum_{i = 1}^{m} α_{i} Φ (\vec{x_{i}})$, we have

{\vec{w_{1}}}^{T} Φ (\vec{x}) = \sum_{i = 1}^{m} α_{i} K (\vec{x_{i}}, \vec{x}),

and

∥ \vec{w_{1}} ∥^{2} = {\vec{w_{1}}}^{T} \vec{w_{1}} = \sum_{i, j = 1}^{m} α_{i} K (\vec{x_{i}}, \vec{x_{j}}) α_{j} = {\vec{α}}^{T} X \vec{α},

and similarly for $\vec{w_{2}}$, so that (29) becomes

class (\vec{x}) = sgn [\frac{b_{1} + \sum_{i = 1}^{m} α_{i} K (\vec{x_{i}}, \vec{x})}{\sqrt{{\vec{α}}^{T} X \vec{α}}} + \frac{b_{2} + \sum_{i = 1}^{m} β_{i} K (\vec{x_{i}}, \vec{x})}{\sqrt{{\vec{β}}^{T} X \vec{β}}}] .

(30)

Equations (27), (28) and (30) show that knowledge of the kernel suffices for building a TBSVM model; the specific feature map $Φ$ need not be known.
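The two identities used to pass from (29) to (30) can be verified on a kernel whose feature map is known explicitly. The sketch below uses the quadratic kernel $K (\vec{x}, \vec{y}) = {({\vec{x}}^{T} \vec{y})}^{2}$, which is chosen only for illustration and is not the kernel used in the experiments; the data and the coefficients are random placeholders.

```python
import numpy as np

def phi(x):
    """Explicit feature map of the quadratic kernel: all products x_a * x_b."""
    return np.outer(x, x).ravel()

def kernel(x, y):
    """Quadratic kernel K(x, y) = (x^T y)^2 = phi(x)^T phi(y)."""
    return float(x @ y) ** 2

rng = np.random.default_rng(0)
X_train = rng.normal(size=(6, 3))     # m = 6 samples, n = 3 features (toy data)
alpha = rng.normal(size=6)            # hypothetical expansion coefficients
x_new = rng.normal(size=3)

# Route 1: build w_1 explicitly in the feature space.
w1 = sum(a * phi(x) for a, x in zip(alpha, X_train))
score_explicit = w1 @ phi(x_new)
norm_explicit = w1 @ w1

# Route 2: use kernel evaluations only, as in (27) and (30).
G = np.array([[kernel(xi, xj) for xj in X_train] for xi in X_train])
k_new = np.array([kernel(xi, x_new) for xi in X_train])
score_kernel = alpha @ k_new
norm_kernel = alpha @ G @ alpha

print(score_explicit, score_kernel, norm_explicit, norm_kernel)
```

Both routes agree up to rounding, while the second never evaluates `phi`; this is the property that makes kernels with high-dimensional or unknown feature maps usable.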

4. Numerical Experiments

This section illustrates the performance of our proposed algorithm through experimental results with a selection of nine datasets from the UCI dataset collection [17]. The UCI benchmark datasets provide standardized and widely accepted testbeds for classification algorithms. The datasets chosen are the Australian, Diabetes, Ionosphere, Monk2, Phoneme, Ring, Saheart, Spectfheart, and Twonorm datasets, as shown in Table 1.

All computations were performed with Python 3 using the numpy and sklearn packages under the Linux operating system. In the following, we present a comparison of three TBSVM algorithms that differ in the loss functions used: the smooth approximation to the pinball loss function $ϕ_{τ_{1}} (u, ϵ_{1})$ of (13), the generalized pinball loss function $L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)$ of (11), and our smooth generalized pinball loss function $P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$ of (14).

For each algorithm, random grid search was used to optimize the parameters of the TBSVM model and the parameter $μ$ of our smooth generalized pinball loss function $P (u, μ)$, as shown in Table 2. After having obtained the best parameters, we trained our proposed algorithm using these parameters.

The experimental results are presented in Table 3, Table 4, Table 5 and Table 6. Accuracy (in %), standard deviation, and training time (in seconds) are used for evaluation, denoted as Acc, sd, and time (s), respectively. The numbers in bold show the best results for each row.

4.1. Linear Models

To assess the performance of the classifiers, five-fold cross-validation was employed for all experiments. We divided the experiments into two cases, “fixed splits” and “variable splits”. In the case of fixed splits, the same five-fold cross-validation splits used in parameter optimization were also used in evaluation. In the case of variable splits, the average results of 50 different tests are reported, where each test used different five-fold cross-validation splits; this is a more realistic scenario than fixed splits.
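The difference between the two cases can be made concrete with a small sketch. The helper name and the seeds are assumptions; the text does not state how the splits were generated, so shuffling with a seeded random generator is only one possible realization.

```python
import numpy as np

def five_fold_indices(n_samples, seed):
    """Return five (train_idx, test_idx) pairs from one shuffled partition."""
    perm = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(perm, 5)
    return [(np.concatenate(folds[:k] + folds[k + 1:]), folds[k])
            for k in range(5)]

n = 690                                   # e.g., the Australian dataset (Table 1)
# Fixed splits: one partition, reused for parameter search and evaluation.
fixed = five_fold_indices(n, seed=0)
# Variable splits: 50 tests, each with its own partition.
variable = [five_fold_indices(n, seed=s) for s in range(1, 51)]
print(len(fixed), len(variable))
```

In the variable case, the reported accuracy would be the average over the 50 tests, so the tuned parameters are evaluated on partitions they were not selected on.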

The results of the fixed-split experiments in Table 3 show that our smooth loss achieved the highest accuracy on three datasets (Diabetes, Spectfheart, Twonorm). In the variable-split experiments, Table 4 shows that our smooth loss achieved the highest accuracy on two datasets (Australian and Ring). Overall, the generalized pinball loss and our smooth loss show similar accuracies in the TBSVM model, although their training times differ. The smooth pinball loss requires less training time than our proposed algorithm on all datasets.

Table 3. Linear TBSVM performance with various loss functions, fixed splits (Acc ± sd).

	Loss Function
Dataset	$ϕ_{τ_{1}} (u, ϵ_{1})$	$L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)$	$P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$
Australian	88.4058 ± 2.2915	87.8261 ± 2.1201	87.8261 ± 2.5681
time (s)	0.21679	0.370989	0.877043
Diabetes	77.8593 ± 2.5482	77.4688 ± 3.6535	77.8627 ± 2.2755
time (s)	0.092696	0.163853	0.253656
Monk2	83.3280 ± 4.5861	87.7252 ± 3.9495	87.0356 ± 4.0527
time (s)	0.047582	0.230813	0.178027
Phoneme	77.8127 ± 0.7475	77.9793 ± 0.4456	77.9237 ± 0.5031
time (s)	0.335106	0.480017	0.678361
Saheart	74.0159 ± 5.2307	73.8125 ± 4.4559	73.8055 ± 3.6154
time (s)	0.132691	0.186257	0.246309
Spectfheart	81.6562 ± 5.7405	81.6771 ± 4.9851	83.9064 ± 5.6250
time (s)	0.851100	4.725302	1.593111
Twonorm	97.8649 ± 0.2550	97.9054 ± 0.3963	97.9324 ± 0.3954
time (s)	0.430177	0.704223	0.658462
Ring	76.7973 ± 1.7075	76.9189 ± 1.5664	76.8378 ± 1.7058
time (s)	0.556805	0.667775	0.902595
Ionosphere	89.4567 ± 2.6563	89.1751 ± 2.7936	88.8893 ± 3.3052
time (s)	0.399795	0.56281	0.680536

Table 4. Linear TBSVM performance with various loss functions, variable splits (Acc ± sd).

	Loss Function
Dataset	$ϕ_{τ_{1}} (u, ϵ_{1})$	$L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)$	$P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$
Australian	86.6000 ± 2.3183	86.7130 ± 2.3182	86.9768 ± 2.5594
time (s)	0.218793	0.362228	0.813517
Diabetes	75.6744 ± 3.1206	76.6450 ± 2.7918	75.4544 ± 2.8918
time (s)	0.114313	0.159036	0.195348
Monk2	79.8147 ± 5.4047	86.4165 ± 3.6584	83.7312 ± 4.2069
time (s)	0.047766	0.210939	0.177863
Phoneme	77.4807 ± 1.0521	77.7872 ± 1.1616	76.9989 ± 1.4930
time (s)	0.338369	0.497339	0.747009
Saheart	72.2705 ± 3.7282	72.6528 ± 4.3875	72.6122 ± 3.9976
time (s)	0.123355	0.183273	0.270993
Spectfheart	77.2806 ± 7.2642	79.0070 ± 4.9500	78.5426 ± 7.4243
time (s)	0.772305	5.034258	1.382591
Twonorm	97.7586 ± 0.3190	97.8059 ± 0.3403	97.7232 ± 0.3141
time (s)	0.413597	0.710938	0.656289
Ring	76.3803 ± 0.9844	76.6319 ± 0.9745	76.6335 ± 0.9697
time (s)	0.538940	0.678837	0.889948
Ionosphere	86.9708 ± 3.6478	88.3367 ± 3.4795	88.1153 ± 3.2979
time (s)	0.464421	0.563288	0.726181

4.2. Non-Linear Models

Table 5 and Table 6 show the experimental results for non-linear models based on the most commonly used kernel, the RBF kernel. This kernel is of the form

K (\vec{x}, \vec{y}) = e^{- γ ∥ \vec{x} - \vec{y} ∥^{2}}

where $γ > 0$ is a parameter. However, to reduce problem size and thus decrease computation time, we have made use of the RBF sampler. This is a randomized technique that avoids evaluating the kernel and forming the huge Gram matrices that arise with large datasets. It functions by approximating the feature map of the RBF kernel, mapping the given data into a vector space of substantially lower dimension than the RBF feature space, in which the linear support vector machine models can still be applied.
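The idea behind the RBF sampler can be sketched with random Fourier features, the technique that sklearn's `RBFSampler` class is based on. The code below is a plain numpy illustration under assumed values of $γ$ and of the number of components; it is not the configuration used in the experiments.

```python
import numpy as np

def random_fourier_features(X, gamma, n_components, seed=0):
    """Map X so that Z @ Z.T approximates exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # Frequencies drawn from N(0, 2*gamma*I), phases uniform on [0, 2*pi).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_components))
    offset = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + offset)

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))                       # toy data
gamma = 0.5                                       # illustrative value
Z = random_fourier_features(X, gamma, n_components=4000)

sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K_exact = np.exp(-gamma * sq_dist)
max_error = np.max(np.abs(Z @ Z.T - K_exact))
print(max_error)
```

A linear TBSVM trained on the rows of `Z` then plays the role of the kernel model, and the approximation error shrinks as the number of components grows.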

Table 5. Kernel TBSVM performance with various loss functions, fixed splits (Acc ± sd).

	Loss Function
Dataset	$ϕ_{τ_{1}} (u, ϵ_{1})$	$L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)$	$P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$
Australian	82.4638 ± 2.3098	83.6232 ± 2.8102	84.0580 ± 1.8332
time (s)	1.736451	0.666318	2.079387
Diabetes	74.9979 ± 1.8926	75.3875 ± 3.2180	75.9087 ± 4.1264
time (s)	5.969881	1.443527	1.220819
Monk2	91.4274 ± 1.7705	95.5948 ± 1.5548	94.8998 ± 1.5939
time (s)	0.772924	3.983829	1.956448
Phoneme	79.4781 ± 0.5841	79.8668 ± 0.6885	80.3479 ± 0.9708
time (s)	2.184906	2.325309	3.43196
Saheart	74.0112 ± 4.6302	74.0089 ± 5.8561	74.2380 ± 4.2839
time (s)	1.177418	1.585177	1.247391
Spectfheart	83.5360 ± 4.6123	83.1586 ± 4.0473	84.6681 ± 5.5610
time (s)	0.471961	1.081096	1.35906
Twonorm	93.7162 ± 0.8547	94.6486 ± 0.5881	94.4595 ± 0.7914
time (s)	4.962333	2.749745	2.804496
Ring	95.0676 ± 0.4441	96.0000 ± 0.4041	95.7973 ± 0.5429
time (s)	1.994101	3.135659	3.292427
Ionosphere	90.3219 ± 2.8753	93.1630 ± 2.4558	91.7384 ± 3.5440
time (s)	0.955793	2.926564	2.245855

Table 6. Kernel TBSVM performance with various loss functions, variable splits (Acc ± sd).

	Loss Function
Dataset	$ϕ_{τ_{1}} (u, ϵ_{1})$	$L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)$	$P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$
Australian	79.9333 ± 3.2045	81.1130 ± 3.1104	81.9826 ± 3.0698
time (s)	1.100957	0.692429	1.772433
Diabetes	70.8914 ± 3.5184	74.4581 ± 3.1129	73.7404 ± 3.0545
time (s)	2.557176	1.513731	1.235133
Monk2	89.4627 ± 4.2806	94.3527 ± 2.2290	93.3532 ± 2.7422
time (s)	1.751879	4.059142	2.015005
Phoneme	78.1296 ± 2.2966	79.8060 ± 1.0184	78.9168 ± 1.7498
time (s)	2.553766	1.772117	4.101333
Saheart	69.8903 ± 4.3887	71.9447 ± 4.3071	72.1198 ± 3.8393
time (s)	1.996333	1.579952	1.461621
Spectfheart	72.5744 ± 10.0496	79.9280 ± 4.8487	80.0032 ± 4.6938
time (s)	0.864064	1.085297	1.819914
Twonorm	93.3878 ± 0.7616	94.0049 ± 0.9483	93.8722 ± 0.7545
time (s)	6.243437	2.917959	2.576174
Ring	94.9878 ± 0.4759	94.9262 ± 1.6240	93.4192 ± 3.6378
time (s)	1.546431	3.376859	3.962855
Ionosphere	86.9302 ± 5.1465	91.2884 ± 2.8932	90.3624 ± 3.3991
time (s)	1.512256	4.066119	2.140556

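In the fixed-split experiments (Table 5), our proposed smooth loss function delivers the highest accuracy on five datasets (Australian, Diabetes, Phoneme, Saheart, Spectfheart); in the variable-split experiments (Table 6), it does so on three datasets (Australian, Saheart, Spectfheart). Moreover, in Table 6 our smooth loss function has the shortest training time on three of the nine datasets, namely Diabetes, Saheart, and Twonorm.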

4.3. Noise Sensitivity

The generalized pinball loss function [7] has been shown to reduce noise sensitivity and improve stability under resampling. To evaluate the noise sensitivity of our smooth loss function, normally distributed noise with mean zero and standard deviation r was added to the selected UCI datasets, for r = 0.02, 0.05, 0.10, 0.15, 0.20, 0.25, and 0.30.
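A minimal sketch of this perturbation is given below. It assumes that the noise is added independently to every feature value and that r is an absolute standard deviation; whether r was instead scaled by the spread of each feature is not stated, so this reading is an assumption, as are the function name and the seed.

```python
import numpy as np

def add_gaussian_noise(X, r, seed=0):
    """Return X plus zero-mean Gaussian noise with standard deviation r."""
    rng = np.random.default_rng(seed)
    return X + rng.normal(loc=0.0, scale=r, size=X.shape)

noise_levels = [0.02, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
X_clean = np.zeros((20000, 5))            # placeholder feature matrix
noisy = {r: add_gaussian_noise(X_clean, r) for r in noise_levels}
print({r: round(float(noisy[r].std()), 4) for r in noise_levels})
```

Each noisy copy would then be passed through the same cross-validation procedure as the clean data.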

As the Twonorm dataset is the dataset that gives the best performance, its results are presented first. Figure 2 shows the performance of the BFGS algorithm applied to the TBSVM model at different noise levels. Loss functions implemented are our proposed loss (labeled BFGS-SGPTBSVM in the figures), the generalized pinball loss (BFGS-GPTBSVM), and the smooth approximation to the pinball loss (BFGS-SPTBSVM).

From Figure 2, we observe that as noise increases, the accuracy of all algorithms tends to decrease. Figure 3 shows the performance of different noise levels for the Australian dataset. We observe that, overall, our smooth generalized pinball loss function retains the noise insensitivity property of the generalized pinball loss function and shows lower noise sensitivity on the Australian dataset in most cases.

The experimental results indicate that the proposed method exhibits stable performance across repeated runs, as reflected by relatively small standard deviations. Moreover, the smoothing parameter $μ$ allows a controlled trade-off between approximation accuracy and numerical stability. While the proposed smooth generalized pinball loss involves higher-order polynomial terms and may introduce additional per-iteration computational cost compared with simpler pinball losses, the resulting twice continuously differentiable objective function is well suited to quasi-Newton optimization. This partially compensates for the increased complexity, leading to acceptable overall training times.

Although widely used, the UCI benchmark datasets may not fully reflect the complexity of highly structured, large-scale, or application-specific data. Therefore, the reported experiments primarily serve to validate the general effectiveness and stability of the proposed loss function. Future work will investigate the performance of the proposed approach on more challenging datasets, including imbalanced and domain-specific problems.

5. Conclusions and Discussion

This paper has introduced a family of smooth generalized pinball loss functions to address the issues associated with non-differentiability found in traditional loss functions, such as the hinge loss, pinball loss, and generalized pinball loss function. In addition, a novel twin-bounded support vector machine model with a smooth generalized pinball loss function is proposed. We proved that the generalized pinball loss function can be approximated by the proposed smooth generalized pinball loss function in the uniform norm with arbitrary precision, and that the solution of our TBSVM model is unique and converges to that of the non-smooth problem. In experiments, we selected nine UCI datasets and used a quasi-Newton method to solve the corresponding strongly convex unconstrained optimization problems with twice continuously differentiable objective functions. We then compared the proposed BFGS-SGPTBSVM algorithm with BFGS-GPTBSVM and BFGS-SPTBSVM algorithms in terms of classification performance, accuracy, and computational speed. From the numerical experiments, we found that the proposed BFGS-SGPTBSVM algorithm shows the best performance for the TBSVM with RBFSampler.

The proposed smooth generalized pinball loss-based TBSVM is particularly suitable for classification problems involving noisy, imbalanced, or asymmetric data distributions, where robustness and stable optimization are critical. For problems requiring explicit physical constraints or domain-specific modeling, hybrid loss formulations may be more appropriate.

In future studies, we plan to assess the performance of our model by experiments with complex, large-scale datasets. Moreover, we will evaluate sensitivity to hyperparameters and improve the techniques of parameter optimization for speed and efficacy enhancements. Finally, we will further apply our proposed loss function to other support vector machine models.

Author Contributions

Conceptualization, P.Y. and E.S.; Methodology, P.Y. and E.S.; Software, P.S., P.Y. and E.S.; Validation, P.S., P.Y. and E.S.; Formal analysis, P.S., P.Y. and E.S.; Investigation, P.S., P.Y. and E.S.; Resources, P.S., P.Y. and E.S.; Data curation, P.S., P.Y. and E.S.; Writing—original draft, P.S., P.Y. and E.S.; Writing—review and editing, P.S., P.Y. and E.S.; Visualization, P.Y. and E.S.; Supervision, P.Y. and E.S.; Project administration, P.Y. and E.S.; Funding acquisition, P.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Suranaree University of Technology and the Development and Promotion of Science and Technology Talents Project Scholarship.

Data Availability Statement

The original data presented in the study are openly available in the UCI Machine Learning Repository at https://archive.ics.uci.edu (accessed on 5 June 2024).

Acknowledgments

The first author (P.S.) wishes to express thanks for the financial support received from the School of Mathematical Sciences and Geoinformatics at SUT, and through a Scholarship of the Development and Promotion of Science and Technology Talents Project.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

SVM	Support Vector Machine
TWSVM	Twin Support Vector Machine
TBSVM	Twin-Bounded Support Vector Machine
BFGS	Broyden–Fletcher–Goldfarb–Shanno

References

1. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
2. Joachims, T. Text categorization with support vector machines: Learning with many relevant features. In Machine Learning: ECML-98, Proceedings of the 10th European Conference on Machine Learning, Chemnitz, Germany, 21–23 April 1998; Nédellec, C., Rouveirol, C., Eds.; Springer: Berlin/Heidelberg, Germany, 1998; pp. 137–142.
3. Yin, H.; Jiao, X.; Chai, Y.; Fang, B. Scene classification based on single-layer SAE and SVM. Expert Syst. Appl. 2015, 42, 3368–3380.
4. Khemchandani, R.; Chandra, S. Twin support vector machines for pattern classification. IEEE Trans. Pattern Anal. Mach. Intell. 2007, 29, 905–910.
5. Shao, Y.H.; Zhang, C.H.; Wang, X.B.; Deng, N.Y. Improvements on twin support vector machines. IEEE Trans. Neural Netw. 2011, 22, 962–968.
6. Huang, X.; Shi, L.; Suykens, J.A. Support vector machine classifier with pinball loss. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 984–997.
7. Rastogi, R.; Pal, A.; Chandra, S. Generalized Pinball Loss SVMs. Neurocomputing 2018, 322, 151–165.
8. Makmuang, D.; Ratiphaphongthon, W.; Wangkeeree, R. Smooth support vector machine with generalized pinball loss for pattern classification. J. Supercomput. 2023, 79, 11684–11706.
9. Li, K.; Lv, Z. Smooth twin bounded support vector machine with pinball loss. Appl. Intell. 2021, 51, 5489–5505.
10. Shi, Y.; Zhang, L.; Wang, Z.; Li, X. Smooth and semi-smooth pinball twin support vector machine. Expert Syst. Appl. 2023, 226, 120084.
11. Shan, X.; Zhang, Z.; Li, X.; Xie, Y.; You, J. Robust online support vector regression with truncated ϵ-insensitive pinball loss. Mathematics 2023, 11, 709.
12. Li, F.; Yang, H. A novel bounded loss framework for support vector machines. Neural Netw. 2024, 178, 104–118.
13. Wang, L.; Liu, Z. The SVM classifier with quartic truncated pinball loss. Appl. Math. 2025, 16, 245–260.
14. Diao, S. Support vector machine classifier with rescaled huberized pinball loss. arXiv 2025, arXiv:2511.22065.
15. Song, L.K.; Tao, F.; Peng, G.Z. Mixed loss-guided modular regression for dependent system reliability. Reliab. Eng. Syst. Saf. 2026, 267, 111898.
16. Shifei, D.; Junzhao, Y.; Bingjuan, Q.; Huajuan, H. An overview on twin support vector machines. Artif. Intell. Rev. 2014, 42, 245–252.
17. Dua, D.; Graff, C. UCI Machine Learning Repository. 2019. Available online: http://archive.ics.uci.edu/ml (accessed on 5 June 2024).

Figure 1. $P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$ for various values of $μ$.

Figure 2. Performance at different noise levels for the Twonorm dataset. (a) Fixed splits, linear model; (b) fixed splits, RBFSampler; (c) variable splits, linear model; (d) variable splits, RBFSampler.

Figure 3. Performance at different noise levels for the Australian dataset. (a) Fixed splits, linear model; (b) fixed splits, RBFSampler; (c) variable splits, linear model; (d) variable splits, RBFSampler.

Table 1. Properties of the nine UCI datasets.

Dataset Name	Number of Instances	Number of Features
Australian	690	14
Diabetes	768	8
Monk2	432	6
Phoneme	5404	5
Saheart	462	9
Spectfheart	267	44
Twonorm	7400	20
Ring	7400	20
Ionosphere	351	33

Table 2. The range of parameters in the random grid search. The symbol ‘+’ denotes the range for the positive TBSVM, and ‘−’ for the negative TBSVM.

Model/Loss Function	Parameter Ranges for the Loss Functions
$ϕ_{τ_{1}} (u, ϵ_{1})$ , $L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)$ , $P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$	$ϵ_{1}^{+} : 0.01, 0.05, 0.1, 0.2, 0.25, 0.45, 0.6, 0.7, 0.8, 1, 2$
	$ϵ_{1}^{-} : 0.01, 0.05, 0.1, 0.2, 0.25, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 1$
	$τ_{1}^{+} : 0.1, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2$
	$τ_{1}^{-} : 0.1, 0.3, 0.4, 0.45, 0.5, 0.6, 0.7, 0.8, 0.9, 1, 1.2$
$L_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u)$ , $P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$	$ϵ_{2}^{+} : 0.1, 0.25, 0.45, 0.7, 0.8, 1, 1.5, 2$
	$ϵ_{2}^{-} : 0.1, 0.25, 0.45, 0.5, 0.7, 0.8, 1, 1.5, 2$
	$τ_{2}^{+} : 0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1, 3.5$
	$τ_{2}^{-} : 0.1, 0.3, 0.5, 0.7, 0.9, 1, 2$
$P_{τ_{1}, τ_{2}}^{ϵ_{1}, ϵ_{2}} (u, μ)$	$μ^{+} : 0.01, 0.2, 0.4, 0.6, 0.8, 1, 2$
	$μ^{-} : 0.01, 0.2, 0.4, 0.6, 1, 2$
TBSVM penalties	$c_{1}, c_{2}, c_{3}, c_{4} : 0.1, 0.2, 0.4, 0.8, 1.2, 1.6, 2, 3.2, 6, 10, 20, 50, 80$
