Robust Support Vector Data Description with Truncated Loss Function for Outliers Depression

Support vector data description (SVDD) is widely regarded as an effective technique for addressing anomaly detection problems. However, its performance can significantly deteriorate when the training data are affected by outliers or mislabeled observations. This study introduces a universal truncated loss function framework into the SVDD model to enhance its robustness and employs the fast alternating direction method of multipliers (ADMM) algorithm to solve various truncated loss functions. Moreover, the convergence of the fast ADMM algorithm is analyzed theoretically. Within this framework, we developed the truncated generalized ramp, truncated binary cross entropy, and truncated linear exponential loss functions for SVDD. We conducted extensive experiments on synthetic and real-world datasets to validate the effectiveness of these three SVDD models in handling data with different noise levels, demonstrating their superior robustness and generalization capabilities compared to other SVDD models.


Introduction
Anomaly detection refers to the identification of data points in a dataset that deviate from normal behavior. These deviations are known as anomalies or outliers in various application domains. This mode of detection is extensively used in real-world settings, including credit card fraud detection, insurance fraud detection, cybersecurity intrusion detection, error detection in security systems, and military activity monitoring [1,2]. However, the acquisition of anomaly data in practical applications, such as medical diagnostics, machine malfunction detection, and circuit quality inspection, is expensive [3,4]. Consequently, there is significant interest in one-class classification (OCC) problems, where training samples only include normal data (also known as target data) or normal data with a small number of anomalies (also referred to as non-target data) [5-7]. In this context, it is important to define the following terms:

• Normal data: Data points that conform to the characteristics and behavior patterns of the majority of data points within a dataset. They represent the normal operating state of a system or process.
• Anomalies: Data points that significantly deviate from normal patterns and usually reflect actual problems or critical events in the system.
• Outliers: Data points that are significantly different from other data points in the dataset, which may be due to natural fluctuations, special circumstances, or noise.
• Noise: Irregular, random errors or fluctuations, usually caused by measurement errors or data entry mistakes, that do not reflect the actual state of the system.
Entropy 2024, 26, 628. https://doi.org/10.3390/e26080628

Support vector data description (SVDD) is a method extensively used for one-class classification (OCC) [8]. The core idea of SVDD is to construct a hyper-sphere of minimal volume that encompasses all (or most) of the target class samples. Data points inside the hyper-sphere are considered normal, while those outside are considered anomalies. SVDD can be easily integrated with popular kernel methods or some deep neural network models [7], making it highly scalable and flexible. Due to these attractive features, SVDD has garnered significant attention and has been extensively developed. Although SVDD is regarded as an effective technique for anomaly detection problems, it remains sensitive to outliers and noise present in training datasets. In real-world scenarios, various issues, such as instrument failure, formatting errors, and unrepresentative sampling, result in datasets with anomalies, which degrade the performance of SVDD [9,10].
Existing methods for mitigating the impact of noise are typically categorized as follows. To reduce the impact of outliers on OCC methods, researchers have attempted to remove outliers through data preprocessing. Stanley Fong used methods such as cluster analysis to remove anomalies from the training set to achieve a robust classifier [11]. Breunig et al. assigned an outlier score to each sample in the dataset by estimating its local density, a method known as the local outlier factor (LOF) [12]. Zheng et al. used LOF to filter raw samples and remove outliers [13]. Khan et al. [14] and Andreou and Karathanassi [15] calculated the interquartile range (IQR) of training samples, which indicates a boundary beyond which samples are marked as outliers and removed. Clustering (k-means, DBSCAN) or LOF methods can significantly reduce noise points during data preprocessing, thereby improving the quality and effectiveness of subsequent model training. For datasets with obvious noise and clear distribution characteristics, preprocessing can be very effective in enhancing model performance. However, preprocessing methods have several issues: they require additional computational resources and time, especially with large datasets, potentially making the preprocessing step time-consuming. Moreover, there is a risk of overfitting and of inadvertently deleting useful normal data points, impacting the model's ability to accurately detect anomalies. Additionally, these preprocessing methods are sensitive to parameter settings, necessitating careful selection to achieve satisfactory results.
Zhao et al. proposed the dynamic radius SVDD method that accounts for hyper-sphere radius information and the existing data distribution [16,17].This approach achieves a more flexible decision boundary by assigning different radii to different samples.Moreover, the framework for the dynamic radius SVDD method is based on the traditional SVDD framework.However, if the traditional SVDD has not undergone adequate training, the good performance of these dynamic approaches will be difficult to guarantee.
Density-weighted SVDD, position-weighted SVDD, Stahel-Donoho outlier-weighted SVDD, global plus local joint-weighted SVDD, and confidence-weighted SVDD are examples of weighted methods [18][19][20][21][22][23][24].These methods assign smaller weights to sparse data, which are commonly outliers, thus excluding them from the sphere.These methods balance the target class data and outliers in the training phase, thus enhancing the classification performance, especially when the data are contaminated by outliers.However, when the number of outliers in the dataset increases and they form sparse clusters, the number of outliers might surpass that of normal samples.In such cases, weighted methods assign higher weights to the outliers and lower weights to the normal samples, leading to decreased algorithm performance.
The convexity of the hinge loss function makes the SVDD algorithm sensitive to outliers. To address this issue, Xing et al. proposed a robust least squares one-class support vector machine (OCSVM) that employs a bounded, non-convex entropic loss function instead of the unbounded convex quadratic loss function used in traditional least squares OCSVM [25]. Tian et al. introduced the ramp loss function to the traditional OCSVM to create the Ramp-OCSVM model [26,27]; the non-convex nature of the ramp loss makes this model more robust than the traditional OCSVM. Xing et al. enhanced the robustness of the OCSVM by introducing a re-scaled hinge loss function [28]. Additionally, Zhong et al. proposed a new robust SVDD method, called pinball loss SVDD [29], to perform OCC tasks when the data are contaminated by outliers. The pinball loss function ensures minimal dispersion at the center of the sphere, thus creating a tighter decision boundary. Recently, Zheng introduced a mixed exponential loss function to the design of the SVDD model, enhancing its robustness and making its implementation easier [30].
Extensive research has shown that, because unbounded convex loss functions are sensitive to anomalies, loss functions that are bounded or have bounded influence functions are more robust to outliers. To address this issue, researchers introduced an upper limit to unbounded loss functions, effectively preventing them from increasing beyond a certain point. The truncated loss function thus makes the SVDD model more robust. The advantages of truncated loss functions include:

• Robustness to noise: Truncated loss functions can enhance the model's robustness and stability by limiting the impact of outliers without removing data points.
• Reduction of error propagation: In anomaly detection tasks, outliers may contribute disproportionately to the loss function, leading to error propagation and model instability. Truncated loss functions can effectively reduce error propagation caused by outliers, thereby improving overall model performance.
• Generalization ability: Truncated loss functions can prevent the model from overfitting to outliers, enhancing its generalization ability. They are well-suited to various types of datasets and noise conditions, particularly when noise is not obvious or easily detectable.
However, robust SVDD algorithms still face considerable challenges. They are designed to address specific types of losses and lack an appropriate framework for constructing robust loss functions; thus, researchers must learn different algorithms and modify loss functions before use. Since truncated loss functions are often non-differentiable, methods such as the difference of convex functions algorithm (DCA) [31] and the concave-convex procedure (CCCP) [32,33] are commonly employed to solve them. For some truncated loss functions, the DCA cannot ensure straightforward decompositions or the direct use of comprehensive convex toolboxes, potentially increasing development and maintenance costs [34]. At present, no unified framework exists in the literature for the design of robust loss functions, nor a unified optimization algorithm. Therefore, even though it is challenging, providing a new bounded strategy for the SVDD model is crucial, with the potential for developing more efficient and universally applicable solutions.
In response to the issues outlined above, this study proposes a universal framework for the truncated loss functions of the SVDD model. To address the non-differentiable, non-convex optimization problem introduced by the truncated loss function, we employ the fast ADMM algorithm. Our contributions are as follows:
• We define a universal truncated loss framework that smoothly and adaptively bounds loss functions while preserving their symmetry and sparsity.
• To solve different truncated loss functions, we propose a unified proximal operator algorithm.
• We introduce a fast ADMM algorithm to handle any truncated loss function within a unified scheme.
• We implement the proposed robust SVDD model on various datasets with different noise intensities. The experimental results on real datasets show that the proposed model exhibits superior resistance to outliers and noise compared to more traditional methods.
The remainder of this paper is organized as follows: Section 2: We review related support vector data description (SVDD) models, providing a foundational understanding of the existing methodologies and their limitations.
Section 3: We propose a general framework for truncated loss functions. Within this framework, we examine the proximal operators of representative loss functions and present a universal algorithm for solving them.
Section 4: This section introduces the SVDD model that utilizes the truncated loss function, detailing its structure and theoretical framework.
Section 5: A new algorithm for solving the SVDD model with truncated loss functions is presented. This section also includes an analysis of the algorithm's convergence properties, ensuring that the method is both robust and reliable.
Section 6: Numerical experiments and parameter analysis are conducted to validate the effectiveness of the proposed model. This section provides empirical evidence of the model's performance across various datasets and noise scenarios.
Section 7: The conclusion summarizes the findings and contributions of the study and discusses potential future research directions.

Related Works
SVDD has been widely applied in anomaly detection, and numerous learning algorithms based on SVDD have been proposed. In this section, we provide a brief overview of these algorithms.

SVDD
The goal of SVDD is to discover a hyper-sphere that encompasses target samples, while excluding non-target samples located outside it [8]. The objective function of SVDD is represented by the following equation:

min_{R,µ,ξ} R² + C Σ_{i=1}^{n} ξ_i,  s.t. ∥φ(x_i) − µ∥² ≤ R² + ξ_i, ξ_i ≥ 0, i = 1, …, n, (1)

where µ and R represent the center and radius of the hyper-sphere, C is a regularization parameter, ξ_i is a slack variable, and φ(·) is a feature mapping. By using Lagrange multipliers and incorporating the constraints into the objective function, the dual problem of Equation (1) can be expressed as:

max_α Σ_{i=1}^{n} α_i K(x_i, x_i) − Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j K(x_i, x_j),  s.t. Σ_{i=1}^{n} α_i = 1, 0 ≤ α_i ≤ C, (2)

where α_i is the Lagrange multiplier and K(·,·) is the kernel function. The optimization problem in Equation (2) is a standard quadratic programming problem, and α_i can be obtained using quadratic programming algorithms. µ and R can then be calculated as:

µ = Σ_{i=1}^{n} α_i φ(x_i),  R² = K(x_s, x_s) − 2 Σ_{i=1}^{n} α_i K(x_i, x_s) + Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j K(x_i, x_j), (3)

where x_s represents a support vector. The decision function for a test sample z is:

f(z) = ∥φ(z) − µ∥² − R². (4)

If f(z) ≤ 0, then sample z belongs to the target class; otherwise, z is considered a non-target class sample.
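As an illustrative sketch (not the paper's implementation), the decision rule f(z) = ∥φ(z) − µ∥² − R² can be evaluated purely through kernel values once the multipliers α are known. Here the Gaussian kernel width σ and the multipliers are hand-supplied assumptions rather than the output of a quadratic programming solver:

```python
import numpy as np

def rbf(x, z, sigma=1.0):
    # Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def svdd_decision(z, X, alpha, R2, sigma=1.0):
    # f(z) = K(z, z) - 2 sum_i alpha_i K(x_i, z)
    #        + sum_{i,j} alpha_i alpha_j K(x_i, x_j) - R^2;
    # f(z) <= 0 means z falls inside the hyper-sphere (target class).
    cross = sum(a * rbf(x, z, sigma) for a, x in zip(alpha, X))
    quad = sum(ai * aj * rbf(xi, xj, sigma)
               for ai, xi in zip(alpha, X) for aj, xj in zip(alpha, X))
    return 1.0 - 2.0 * cross + quad - R2
```

With a single training point and α = (1), the center coincides with that point, so f vanishes there and is positive for distant test points.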

Robust SVDD Variants
Due to its sensitivity to outliers, the classification performance of SVDD significantly deteriorates when the data are contaminated. Thus, to enhance the robustness of SVDD, various improved SVDD methods have been proposed over the past decades.

Weighted SVDDs
One common approach is the use of weighted SVDD methods, where different slack variables are assigned different weights [18-24]. Although the specific methods for weight distribution vary, they can generally be represented in a unified form:

min_{R,µ,ξ} R² + C Σ_{i=1}^{n} w_i ξ_i,  s.t. ∥φ(x_i) − µ∥² ≤ R² + ξ_i, ξ_i ≥ 0, (6)

where {w_i}_{i=1}^{n} represents pre-calculated weights. The density weighting method computes these weights from the k-nearest neighbors [20], where x_i^k denotes the k-th nearest neighbor of x_i, and d(x_i, x_i^k) denotes the Euclidean distance between x_i and x_i^k. Because outliers are typically located in relatively low-density areas, their distance to neighboring samples is greater than that of normal samples, which results in smaller weights. The R-SVDD algorithm constructs weights by introducing a local density for each data point based on truncated distances [19],
where d_ij denotes the distance between x_i and x_j, and d_c is the truncation distance. Other weight-calculation methods can be found in [21-24]. The dual problem of Equation (6) has the same form as Equation (2), except that each Lagrange multiplier has an upper limit of α_i ≤ w_i C.
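The idea behind these weighting schemes can be sketched as follows. This is a simplified illustration (weights inversely related to the mean k-nearest-neighbor distance, normalized to a maximum of 1), not the exact formula of [19,20]:

```python
import numpy as np

def knn_density_weights(X, k=3):
    # Illustrative density weighting: each sample's weight decreases with
    # its mean distance to its k nearest neighbours, so sparse points
    # (likely outliers) receive smaller weights. Normalised so the densest
    # point has weight 1.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # mean distance to the k nearest neighbours (column 0 is the self-distance)
    knn_mean = np.sort(D, axis=1)[:, 1:k + 1].mean(axis=1)
    w = 1.0 / (1.0 + knn_mean)
    return w / w.max()
```

A point far from the main cluster ends up with the smallest weight, so its slack variable is penalized less heavily and it is more easily left outside the sphere.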

Pinball Loss SVDD
The pinball loss SVDD (Pin-SVDD) modifies the hinge loss SVDD by replacing its loss function with the pinball loss function [29]. This modification enables a more robust handling of outliers by adjusting the sensitivity toward deviations depending on their direction relative to the decision boundary. In the resulting optimization problem, 0 < τ ≤ 1 is a constant. With the use of the Lagrange multipliers method, the dual optimization problem in Equation (11) can be expressed in terms of h = −(1 − τ)²/(τnC + 1) and g_1 = 1 − τ. Zhong demonstrated that using the pinball loss to minimize scattering at the center of the sphere enhances the robustness of the developed model.

SVDD with Mixed Exponential Loss Function
Zheng used a mixed exponential loss function to design a classification model [30]. In the corresponding optimization problem, ρ(ξ_i) is a mixed exponential function of ξ_i, where τ_1 > 0 and τ_2 > 0 are two scale parameters, and 0 ≤ λ ≤ 1 is a mixture parameter used to balance the contributions of the two exponential functions. The mixed exponential loss function highlights the importance of samples belonging to the target class while reducing the influence of samples that are outliers. This approach significantly enhances the robustness of the SVDD model and achieves more accurate and stable anomaly detection results in various settings by preserving the integrity of the target class and diminishing the effect of potential anomalies.

Truncated Loss Function
SVDD models with unbounded loss functions can achieve satisfactory results in noise-free scenarios. However, the unbounded growth of these loss functions causes the model to collapse when it is subjected to noise. Therefore, truncating the SVDD model's loss function makes it more robust. The general definition of a truncated loss function is:

L_ϕ(u, δ) = min{ϕ(u), δ}, (15)

where δ > 0 is a constant, and ϕ(u) is an unbounded loss function such that ϕ(u) = 0 when u ≤ 0. Since ϕ(u) is an abstract function, this general form of the truncated loss function covers several loss functions. The three specific truncated loss functions we created in our study are as follows:
1. Truncated generalized ramp loss function L_{ϕ_R}(u, δ);
2. Truncated binary cross entropy loss function L_{ϕ_f}(u, δ), where ϕ_f(u) = log(1 + u/θ), u, θ > 0;
3. Truncated linear exponential loss function L_{ϕ_L}(u, δ).
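The general truncation L_ϕ(u, δ) = min{ϕ(u), δ}, with zero loss for u ≤ 0, can be sketched directly. ϕ_f follows the form given in the text, while the ramp-type and linear exponential base losses below are illustrative stand-ins for the paper's exact definitions:

```python
import numpy as np

def truncate(phi, u, delta):
    # Truncated loss: 0 for u <= 0, and min(phi(u), delta) otherwise, so no
    # single sample can contribute more than delta to the objective.
    u = np.asarray(u, dtype=float)
    return np.where(u <= 0.0, 0.0, np.minimum(phi(np.maximum(u, 0.0)), delta))

phi_ramp = lambda u: u                                 # ramp-type base loss
phi_f = lambda u, theta=1.0: np.log(1.0 + u / theta)   # binary cross entropy
phi_L = lambda u, a=1.0: np.exp(a * u) - a * u - 1.0   # linear exponential
```

Boundedness and sparsity are immediate: large u is capped at δ, and non-positive u incurs zero loss.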
Assuming the truncation point u = δ′, the mathematical properties of the three truncated loss functions presented above can be summarized as follows: for samples with u ≤ 0, the loss value is 0; for samples with u > δ′, the loss value is capped at δ. Thus, the general truncated loss function exhibits sparsity and robustness to outliers. In the next section, we provide explicit expressions for the proximal operators of these truncated loss functions.

Proximal Operators of Truncated Loss Functions
Definition 1 (Proximal Operator [35]). Assume f : R → R is a proper lower-semi-continuous loss function. The proximal operator of f(u) at x ∈ R is defined as:

prox_{λf}(x) = arg min_{u∈R} { f(u) + (1/(2λ)) (u − x)² }.

When f(u) is a convex loss function, the proximal operator is single-valued; when f(u) is a non-convex loss function, it may be multi-valued.
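For intuition, the definition can be evaluated numerically. The grid search below is a brute-force sketch (bounds and resolution are arbitrary choices of ours), useful for sanity-checking closed-form proximal operators such as those derived in the lemmas that follow:

```python
import numpy as np

def prox_numeric(f, x, lam, lo=-10.0, hi=10.0, num=200001):
    # Numerically evaluate prox_{lam f}(x) = argmin_u f(u) + (u - x)^2 / (2 lam)
    # on a dense grid. For non-convex f the objective may have several local
    # minima; the grid search returns an approximation of a global one.
    u = np.linspace(lo, hi, num)
    objective = f(u) + (u - x) ** 2 / (2.0 * lam)
    return float(u[np.argmin(objective)])
```

For f ≡ 0 the operator is the identity, and for f(u) = |u| it reduces to soft-thresholding, which gives quick checks.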
The expressions for the proximal operators are as follows.

Lemma 2. The explicit expression of the L_{ϕ_R}(u, δ) proximal operator is as follows: 1. When 0 < λ < 2δv², the explicit expression of the L_{ϕ_R}(u, δ) proximal operator is given by Equation (18).

Proof of Lemma 2. The following conclusion can be reached by comparing the values of φ at the candidate minimizers, which means u* = u*_5 = x. According to cases (1.1)-(1.5), Equation (18) can be derived. When λ ≥ 2δv², Equation (19) can be derived by a similar comparison of the values of φ. □

Lemma 3. The proximal operator of the truncated loss function is given by Equation (20), where u*_1, u*_2, u*_3, and u*_4 represent the minimizers of the corresponding piecewise function, and φ_1(u*_1), φ_2(u*_2), φ_3(u*_3), and φ_4(u*_4) represent its minimal values.

Proof of Lemma 3. It can be deduced from Equations (15) and (16) that prox_{λL} is attained at a local minimum of a piecewise function whose candidate minimizers include u*_4 = 0 and u*_5 = x. The following conclusions can be determined by comparing the minimal values: when the corresponding conditions, including φ_4(u*_4) < φ_3(u*_3), are met, it follows that u* = u*_4 = 0; (1.6) when x < 0, it follows that u* = u*_5 = x. Thus, Equation (20) can be derived. □

The Use of the Proximal Operator Algorithm to Solve Truncated Loss Functions
When ϕ(u) in the truncated loss function is a monotonic, non-piecewise function, and v(x) = arg min_u { ϕ(u) + (1/(2λ))(u − x)² } can be expressed explicitly, the proximal operator of the truncated loss function can be calculated using Formula (20). In practical applications, however, it is sometimes impossible to obtain an explicit expression for v(x); for example, ϕ_L(u) = exp(ayu) − ayu − 1 does not admit one. The calculation of the proximal operator in such scenarios is discussed below.
For φ(u) = ϕ(u) + (1/(2λ))(u − x)², if it is smooth and has a second derivative, the problem is a smooth unconstrained optimization problem. Newton's method is used to find the minimizer of φ(u) due to its high convergence rate. The gradient and Hessian of problem (16) are given by Formulas (21) and (22), and the minimizer u*_3 and the minimal value φ_3(u*_3) can be obtained with Newton's method for a given x.
If an explicit expression for v(x) cannot be obtained, the calculation of the proximal operator follows the same process as in Lemma 3. Once the minimizer u*_3 and the value φ_3(u*_3) are obtained, the proximal operator can be calculated. When u*_3 satisfies the corresponding comparison conditions, including φ_3(u*_3) < φ_4(u*_4), we derive u* = u*_3; therefore, Formula (20) can be modified accordingly. Based on the analysis presented above, the procedure for solving the proximal operator of the truncated loss function is given in Algorithm 1.

Algorithm 1: Solving the proximal operator of the truncated loss function
Input: x, δ, λ, ϕ(u)
Output: proximal operator u*
1: Choose an initial point u^(0) = x.
2: While t ≤ MAX do
3: Calculate g^(t) according to Formula (21).
4: If ∥g^(t)∥ < eps, stop the loop; the approximate solution u*_3 = u^(t+1) is obtained.
5: Calculate H^(t) according to Formula (22).
6: Calculate the Newton iteration direction d from the Newton iteration equation.
7: Solve for the step size θ using the backtracking Armijo method, and update u^(t) = u^(t−1) + θd.
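Steps 3-7 of Algorithm 1 can be sketched for the linear exponential base loss. This assumes the simplified form ϕ_L(u) = exp(au) − au − 1 with a single slope parameter a (an assumption of ours), so that the gradient and Hessian of φ(u) = ϕ_L(u) + (u − x)²/(2λ) are available in closed form:

```python
import math

def newton_prox_phiL(x, lam, a=1.0, max_iter=50, eps=1e-10):
    # Newton's method with backtracking Armijo line search for
    # min_u phi_L(u) + (u - x)^2 / (2 lam), phi_L(u) = exp(a u) - a u - 1.
    phi = lambda u: math.exp(a * u) - a * u - 1.0 + (u - x) ** 2 / (2.0 * lam)
    grad = lambda u: a * math.exp(a * u) - a + (u - x) / lam
    hess = lambda u: a * a * math.exp(a * u) + 1.0 / lam
    u = x                                  # initial point u^(0) = x
    for _ in range(max_iter):
        g = grad(u)
        if abs(g) < eps:                   # gradient small enough: done
            break
        d = -g / hess(u)                   # Newton direction
        t = 1.0
        # backtracking Armijo condition: sufficient decrease along d
        while phi(u + t * d) > phi(u) + 1e-4 * t * g * d and t > 1e-12:
            t *= 0.5
        u += t * d
    return u
```

Because φ is strictly convex here, the iteration converges to the unique minimizer u*_3, at which the gradient vanishes.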

Robust SVDD Model
Formula (1) for the SVDD model can be rewritten as follows:

min_{R,µ} R² + C Σ_{i=1}^{n} [∥φ(x_i) − µ∥² − R²]_+, (24)

where [u]_+ = max(0, u) is the hinge loss function. Since the hinge loss function is sensitive to outliers, it can be replaced with the truncated loss function from Formula (15). Thus, the objective function of the robust SVDD model is:

min_{R,µ} R² + C Σ_{i=1}^{n} L_ϕ(∥φ(x_i) − µ∥² − R², δ). (25)

As the truncated loss function is non-differentiable, solving the objective function of the robust SVDD model is a non-convex optimization problem, and it cannot be solved using standard SVDD model methods.
Theorem 1 (Nonparametric Representation Theorem [37]). Suppose we are given a non-empty set χ; a positive definite real-valued kernel K : χ × χ → R; a training sample (x_i, y_i) (i ∈ N_m) ∈ χ × R; a strictly monotonically increasing real-valued function g on [0, +∞); an arbitrary cost function c : χ × R^{2m} → R ∪ {∞}; and a class of functions F. Then, any f ∈ F minimizing the regularized risk function admits a representation in terms of a set of coefficient vectors a, and the center is the optimal solution of problem (25). Therefore, Formula (25) can be transformed into problem (28), which represents the single-class SVDD model. To handle data that include negative samples, these samples must be integrated into the SVDD model; the center then becomes µ = Σ_{i=1}^{n} a_i y_i ϕ(x_i), and the objective function of the robust SVDD model is problem (29). When a_y = a ⊙ y (the element-wise product), it follows that µ = Σ_{i=1}^{n} a_{y,i} ϕ(x_i), and problem (29) can be rewritten in the matrix form (30). This study discusses the use of the SVDD model as a solution for data with negative samples; the Lagrangian function for problem (30) is given in (31), and the KKT conditions for problem (30) are given in (32), where (R²*, a*_y, u*) represents any KKT point. The generalized non-smooth optimization problem can be represented as follows [38]:

min_{s,q} f(s) + c l(q), s.t. q + g(s) = 0, (33)

where f(·) and g(·) are continuously differentiable functions R^n → R, and l(q) is a non-smooth function R^n → R. Problem (31) is a form of this generalized non-smooth optimization problem, with s = (R², a_y), q = u, f(s) = s_1, and g(s) representing the optimization model's nonlinear equality constraints. In this study, the fast ADMM algorithm was employed to solve problem (33); the algorithm is introduced in the next section.

Fast ADMM Algorithm
The previous section presented the optimization model of the robust SVDD. Since this model includes nonlinear equality constraints, the fast ADMM method was used to solve Formula (33). The augmented Lagrangian function of the robust SVDD model is given in Formula (34), where λ ∈ R^m represents the vector of Lagrange multipliers, β > 0 represents the penalty factor, and g(a_y) = K_a − 2K_b a_y + a_y^T K a_y D.

Fast ADMM Algorithm Framework
Based on the augmented Lagrangian function presented above, we obtain the following framework for the fast ADMM algorithm, with update steps given in Formulas (35)-(38):

1. Computing u^{k+1}

The solution of u^{k+1} in Formula (35) is equivalent to a proximal-operator problem. Therefore, the calculation of u^{k+1} can be transformed into the computation of the proximal operator, which can be determined using Formula (20) or Algorithm 1.

2. Computing a_y^{k+1}

When solving for the variable a_y^{k+1}, if a closed-form solution for this subproblem is not available, an optimization algorithm must be used for an iterative solution, which results in slow computation. Thus, to solve this problem more efficiently, we used a linearization technique.
Given a convex differentiable function φ, the Bregman distance between x, y ∈ dom(φ) is defined as:

Δφ(x, y) = φ(x) − φ(y) − ⟨∇φ(y), x − y⟩. (40)

When φ(x) = (1/2)∥x∥², the Bregman distance is Δφ(x, y) = (1/2)∥x − y∥². Formula (36) is then equivalent to a linearized subproblem, and with the use of Formula (42), we obtain the update in Formula (43), where µ ≥ β∥∇g(a_y^k)∥².
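The Bregman distance and its quadratic special case can be sketched as follows (function names are ours):

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    # Bregman distance: D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
    x, y = np.asarray(x, float), np.asarray(y, float)
    return float(phi(x) - phi(y) - np.dot(grad_phi(y), x - y))

phi_sq = lambda v: 0.5 * float(np.dot(v, v))   # phi(x) = (1/2) ||x||^2
grad_sq = lambda v: v                          # its gradient is x itself
```

For φ(x) = ½∥x∥² the distance collapses to ½∥x − y∥², which is exactly the quadratic proximity term used in the linearized update.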

3. Computing R^{2(k+1)}

If φ(x) = (1/2)∥x∥², based on Formula (40), we obtain that Formula (37) is equivalent to Formula (45), which represents a convex quadratic programming problem. By solving Equation (46), we also solve Formula (45). The value of R^{2(k+1)} is then directly updated using Formula (47).
Based on the above analysis, the framework of our method is summarized in Algorithm 2.
Algorithm 2: Fast ADMM algorithm
Input: K_a, K_b, D
Output: R^{2k}, a_y^k
1: Take an initial point (R^{20}, a_y^0, u^0, η^0) and k_MAX.
2: While the stop condition is not satisfied and k ≤ k_MAX, do
3: Calculate a_y^k according to Formula (43).
4: Calculate R^{2k} according to Formula (47).
5: If the proximal operator of ϕ(u) has an explicit expression, use Formula (20) to calculate u^k.
6: If the proximal operator of ϕ(u) does not have an explicit expression, use Algorithm 1 to calculate u^k.
7: Calculate η^k according to Formula (38).
8: Update k = k + 1.
9: End
10: Output the final solution (R^{2k}, a_y^k).

Global Convergence Analysis of the Fast ADMM Algorithm
In this section, we provide a convergence analysis of the fast ADMM algorithm. Specifically, the convergence of the algorithm is established using Lemma 7, which is proven below. According to Formulas (35)-(38), the optimality conditions for each update iteration of the fast ADMM algorithm can be written accordingly, where ∇g(a_y) = 2D a_y^T K − 2K_b^T.
Lemma 4. Assume w^k = (R^{2k}, a_y^k, u^k, η^k) is a sequence generated by the fast ADMM algorithm; then, for any k ≥ 1, a bound holds in which λ_min(DD^T) represents the strictly positive minimum eigenvalue of DD^T.
Proof of Lemma 4. The result follows from the optimality conditions of the iteration together with the Cauchy-Schwarz inequality. □

Lemma 5. Assume w^k = (R^{2k}, a_y^k, u^k, η^k) is a sequence generated by the fast ADMM algorithm; then, for any k ≥ 1, a descent inequality holds for the augmented Lagrangian.

Proof of Lemma 5. From the definition of the augmented Lagrangian function in Formula (34) and from Formula (48), substituting Formula (53) into (51) yields the result. □

Lemma 6. Assume w^k = (R^{2k}, a_y^k, u^k, η^k) is a sequence generated by the fast ADMM algorithm; then, for k ≥ 1, a further descent inequality holds.

Proof of Lemma 6. By the definition of the augmented Lagrangian function in Formula (34), the first-order condition for convex functions, and the convexity of g(a_y), substituting Formula (56) yields the result. □

Lemma 7. Assume w^k = (R^{2k}, a_y^k, u^k, η^k) is a sequence generated by the fast ADMM algorithm; if µ ≥ β∥∇g(a_y^k)∥² and 1/(λ_min(DD^T)β) ≤ 1/2, then for any k ≥ 1, the augmented Lagrangian decreases monotonically.

Proof of Lemma 7. Combining Lemmas 5 and 6 with the definition of the augmented Lagrangian function shows that, when µ ≥ β∥∇g(a_y^k)∥² and 1/(λ_min(DD^T)β) ≤ 1/2 are met, L_β(R², a_y, u, η) monotonically decreases, which ensures that the fast ADMM algorithm converges. □

Fast ADMM Algorithm Termination Conditions
Based on Formula (35), the u^{k+1} that minimizes L_β(R^{2k}, a_y^k, u, η^k) can be derived, which yields Equation (60). Equation (60) represents the residual of the dual feasibility condition; hence, S_1^{k+1} represents the dual residual of the iteration.
Equations (61)-(63) represent the primal residuals of the iteration. The KKT conditions of problem (32) consist of four components, corresponding to the primal and dual residuals. Both types of residuals gradually converge to zero under the fast ADMM algorithm.
The fast ADMM algorithm terminates when both the primal and dual residuals are sufficiently small, i.e., when their maximum falls below a tolerance ε_tol > 0. Typically, ε_tol is selected such that 10^{-4} ≤ ε_tol ≤ 10^{-3}; in this study, ε_tol = 10^{-3}. This ensures that the algorithm terminates when the solution's accuracy is within an acceptable range.
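The termination test amounts to a small helper (names are ours):

```python
def admm_should_stop(primal_residuals, dual_residual, eps_tol=1e-3):
    # Terminate when the largest primal residual and the dual residual are
    # both below eps_tol (this study uses eps_tol = 1e-3).
    worst = max(max(abs(r) for r in primal_residuals), abs(dual_residual))
    return worst < eps_tol
```

Tightening eps_tol trades extra iterations for a more accurate KKT point.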

Experiment
In this section, we describe extensive experiments conducted on various synthetic and UCI datasets to validate the effectiveness and robustness of the three truncated loss function SVDD algorithms proposed in this study, which are listed below. We also compared the hinge loss SVDD [8], DW-SVDD [20], R-SVDD [19], and GL-SVDD [16] algorithms to validate their performance.
ϕ_R-SVDD: SVDD algorithm with a truncated generalized ramp loss function.
ϕ_f-SVDD: SVDD algorithm with a truncated binary cross entropy loss function.
ϕ_L-SVDD: SVDD algorithm with a truncated linear exponential loss function.

Experimental Setup
This subsection introduces the evaluation metrics, kernels, and parameter settings used in the experiments.

Evaluation Metrics
We used the G-mean and F_1-score as evaluation metrics to assess the performance of the different methods described in this study [30,39-41]. They are based on TP, FN, FP, and TN, where TP denotes the number of target data predicted as target data, FN denotes the number of target data predicted as outlier data, FP denotes the number of outlier data predicted as target data, and TN denotes the number of outlier data predicted as outlier data.
It can be seen from (65) and (66) that the G-mean provides a good balance between recall and specificity, while the F_1-score balances precision and recall. Hence, both metrics measure the performance of the proposed method comprehensively.
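From the four counts, the two metrics can be computed as follows; this is a direct transcription of the standard definitions (G-mean as the geometric mean of recall and specificity, F_1 as the harmonic mean of precision and recall):

```python
import math

def gmean_f1(tp, fn, fp, tn):
    # G-mean = sqrt(recall * specificity);
    # F1 = 2 * precision * recall / (precision + recall).
    recall = tp / (tp + fn)           # true positive rate on target data
    specificity = tn / (tn + fp)      # true negative rate on outlier data
    precision = tp / (tp + fp)
    gmean = math.sqrt(recall * specificity)
    f1 = 2.0 * precision * recall / (precision + recall)
    return gmean, f1
```

A method can score well on one metric and poorly on the other, which is why both are reported.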

Kernels
The Gaussian kernel function has strong mapping ability and is the kernel most commonly used to handle nonlinear classification problems, and we used it in our experiments. The Gaussian kernel function is defined as:

K(x_i, x_j) = exp(−∥x_i − x_j∥² / (2σ²)),

where σ > 0 is the kernel width parameter.

Synthetic Datasets with Noise
In this study, we constructed two synthetic datasets: circular and banana-shaped, to test the robustness of the three proposed truncated loss function support vector data description (SVDD) algorithms.To verify the robustness of the algorithms in handling different types of noise, we introduced two types of noise:

•
Neighboring noise: Noise points are randomly distributed near the normal samples but do not completely overlap with them, forming a relatively dispersed distribution. This noise simulates the boundary ambiguity common in practical applications.

•
Regional noise: Noise points are randomly distributed within a specified area, forming sparse clusters. This noise simulates local anomalies that may occur in practical applications.
The synthetic datasets are described below:
1. Circular Dataset

•
Normal samples: This consists of 300 two-dimensional sample points distributed in a circular pattern, forming the normal-sample distribution.

•
Noise samples: This consists of 40 noise points of two types: 20 noise points randomly distributed near the normal sample points (neighboring noise, showing a dispersed distribution), and another 20 noise points randomly distributed within a specified area, forming sparse clusters (regional noise).

•
Illustration: Figure 1a shows the circular dataset, where blue points represent normal samples and red points represent noise.

2. Banana-Shaped Dataset
•
Normal samples: This consists of 300 two-dimensional sample points distributed along curved lines, resembling a banana shape.

•
Noise samples: This consists of 40 noise points, divided into two types: 10 noise points randomly distributed near the banana-shaped normal sample points, and another 30 noise points randomly distributed within a specified area, forming sparse clusters.

•
Illustration: Figure 1b shows the banana-shaped dataset, where blue points represent normal samples, and red points represent noise samples.
By choosing these noise distributions and quantities, we aimed to simulate different real-world noise scenarios and test the robustness of the proposed algorithms under varying noise conditions.
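As an illustration, the circular dataset and its two noise types could be generated along the following lines; the ring radius, noise spreads, and the cluster region here are illustrative assumptions, not the exact values used in the paper:

```python
import math
import random

def make_circular_dataset(n_normal=300, n_near_noise=20, n_cluster_noise=20, seed=0):
    """Sketch of the circular dataset: normal points on a noisy ring, plus
    neighboring noise scattered around the ring and regional noise clustered
    in a small square. All numeric parameters below are assumed values."""
    rng = random.Random(seed)
    normal = []
    for _ in range(n_normal):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r = rng.gauss(1.0, 0.05)            # tight ring of radius ~1 (assumed)
        normal.append((r * math.cos(theta), r * math.sin(theta)))
    near_noise = []
    for _ in range(n_near_noise):
        theta = rng.uniform(0.0, 2.0 * math.pi)
        r = rng.gauss(1.0, 0.25)            # dispersed around the ring (neighboring noise)
        near_noise.append((r * math.cos(theta), r * math.sin(theta)))
    cluster_noise = [(rng.uniform(1.6, 1.9), rng.uniform(1.6, 1.9))
                     for _ in range(n_cluster_noise)]  # sparse cluster in a fixed region
    return normal, near_noise + cluster_noise
```

The banana-shaped dataset would follow the same pattern, with the normal samples drawn along curved lines instead of a ring.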
For the circular dataset, the classification results of the SVDD, DW-SVDD, DBSCAN, ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD algorithms are presented in Figure 2. The SVDD algorithm accurately identifies only 7 noise points as outliers, while misclassifying the other 33 noise points as normal values; its performance is therefore severely affected by noise. Unlike SVDD, DW-SVDD accurately identified 22 noise points as outliers, while misclassifying the 18 remaining clustered noise points as normal values. DW-SVDD is based on a weighting idea in which outliers are assumed to be sparse data; a smaller weight is assigned to these sparse data, which are then excluded from the sphere. This weighted method struggles to exclude clustered noise, since 20 clustered noise points are present in the circular dataset. DBSCAN accurately identified the 20 sparsely distributed noise points but incorrectly classified the 20 clustered noise points as normal values; hence, DBSCAN cannot accurately identify clustered noise. The ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD algorithms successfully identified all of the noise points, and their classification boundaries encompassed all of the target samples.

The classification results of the SVDD, DW-SVDD, DBSCAN, ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD algorithms for the banana-shaped dataset are presented in Figure 3. SVDD identifies only 2 noise points, misclassifying the remaining noise points as normal values. DW-SVDD accurately identified 9 noise points distributed around the banana-shaped data, while misclassifying the remaining noise points as normal values. ϕR-SVDD accurately identified 35 noise points as outliers, while misclassifying the remaining noise points as normal values and erroneously classifying 16 target values as outliers. ϕf-SVDD accurately identified 36 noise points as outliers, while misclassifying the remaining noise points as normal values and erroneously classifying 14 target values as outliers. ϕL-SVDD accurately identified 38 noise points as outliers, while misclassifying the remaining noise points as normal values and erroneously classifying 13 target values as outliers.
From the experiments on the two synthetic datasets, it can be seen that the DBSCAN algorithm is very effective in handling obvious noise but cannot accurately identify noise that is less obvious or close to normal data. The DW-SVDD and DBSCAN methods cannot accurately identify sparse clustered noise, whereas ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD can effectively identify clustered noise in the dataset. Figures 2 and 3 show that the classification boundaries of ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD are tighter and smoother than those of the SVDD and DW-SVDD methods. Therefore, the three truncated loss function SVDD algorithms proposed in this study are more robust to noise on both synthetic datasets.

UCI Datasets with the Presence of Noise
To further validate the effectiveness of the proposed method, we used eight standard datasets obtained from the UCI machine learning repository to conduct our experiments [43]. For each standard dataset, one class of samples was used as normal data, and the remaining classes were used as outlier data. Moreover, to eliminate the impact of data scale, each feature of every dataset was normalized to [0, 1]. Grid search was used to determine the optimal parameters in all of the experiments, and the final result of each experiment was the average over ten repetitions. For each dataset, parts of the target and non-target data were randomly selected for the training set, and the remaining target and non-target data formed the test set.
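The per-feature normalization to [0, 1] described above can be sketched in plain Python as follows (in practice a library scaler would typically be used):

```python
def min_max_normalize(X):
    """Scale each feature (column) of X, a list of equal-length rows, to [0, 1].
    Constant columns are mapped to 0.0 to avoid division by zero."""
    cols = list(zip(*X))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)]
            for row in X]
```

Note that in a rigorous setup the column minima and maxima would be computed on the training split only and then applied to the test split.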

Training Dataset without Non-Target Data
For each dataset, 70% of the normal data were randomly selected for the training set, and noise-contaminated data were added to it. Noise data were created by changing the labels of non-target data from −1 to 1, with noise proportions of 10%, 20%, 30%, 40%, and 50% of non-target data added; the test set consisted of the remaining target and non-target data.
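The label-flipping noise injection can be sketched as follows, interpreting the noise proportion as a fraction of the target-sample count (an assumption; the paper does not spell out the base of the percentage):

```python
import random

def add_label_noise(targets, non_targets, ratio, seed=0):
    """Build a training set of target data plus a fraction `ratio` of non-target
    points whose labels are flipped from -1 to +1, i.e., mislabeled as targets."""
    rng = random.Random(seed)
    n_noise = int(round(ratio * len(targets)))   # assumption: ratio relative to target count
    noise = rng.sample(list(non_targets), n_noise)
    X = list(targets) + noise
    y = [1] * len(X)                             # every training point carries the target label
    return X, y
```

A robust SVDD model should place most of these flipped points outside the learned sphere despite their +1 training labels.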
Table 1 presents the G-mean averages of the different methods across 10 trials, with the best results presented in bold. In the 40 experimental datasets, ϕR-SVDD and ϕf-SVDD achieved higher G-means than the benchmark methods on 35 datasets, while ϕL-SVDD achieved higher G-means on 34 datasets. As the proportion of noise-contaminated data increases, the classification accuracy of all models generally declines; however, the ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD algorithms show less of a decline in accuracy than the other algorithms. Taking the Iris dataset as an example, when the proportion of noise-contaminated data r increases from 0.1 to 0.5, the G-means of ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD decline by 13.02%, 13.02%, and 11.5%, while those of DW-SVDD, R-SVDD, and GL-SVDD decline by 19.55%, 16.6%, and 17.7%, respectively.
Table 2 shows the F1-score averages of the different methods over 10 trials. In the 40 experimental datasets, ϕR-SVDD and ϕf-SVDD achieved higher F1-scores than the compared methods on 32 datasets, while ϕL-SVDD had higher F1-scores on 31 datasets. ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD demonstrate higher classification accuracy on most test datasets than the current noise-resistant SVDD models (i.e., DW-SVDD, R-SVDD, and GL-SVDD), indicating that the use of a truncated loss function makes SVDD less sensitive to noise.

Training Dataset with Non-Target Data
For each dataset, the training set was randomly selected to consist of 70% of the target data and 20% of the non-target data, with noise proportions of 10%, 20%, 30%, 40%, and 50% of non-target data added as noise; the test set included the remaining target and non-target data.
Table 3 presents the G-mean averages from 10 trials for the different methods, with the best results presented in bold. In the 40 experimental datasets, ϕR-SVDD and ϕf-SVDD achieved higher G-means than the benchmark methods on 35 datasets, while ϕL-SVDD achieved a higher G-mean on 33 datasets. As the amount of data contaminated by noise increases, the ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD algorithms exhibit less of a decline in accuracy than the other algorithms; they present almost no decline in G-mean on datasets such as Blood, Iris, and Haberman. Table 4 shows the F1-score averages of the different methods over 10 trials. In the 40 experimental datasets, ϕR-SVDD achieved higher F1-scores than the compared methods on 35 datasets, ϕf-SVDD performed better on 32 datasets, and ϕL-SVDD on 31 datasets. From Tables 1-4, it is evident that ϕR-SVDD, ϕf-SVDD, and ϕL-SVDD achieve better experimental performance than the other four methods, demonstrating superior noise resistance due to the use of the truncated loss function.

Conclusions
This study aimed to enhance the robustness and effectiveness of the SVDD algorithm. We proposed a general truncated loss function framework for this algorithm, which uses bounded loss functions to mitigate the impact of outliers. Because the truncated loss functions are nondifferentiable, we employed the fast ADMM algorithm to solve the SVDD model with a truncated loss function, handling different truncated loss functions within a unified framework. In this context, the truncated generalized ramp, truncated binary cross entropy, and truncated linear exponential loss functions for the SVDD algorithm were constructed, and extensive experiments show that these three SVDD models are more robust than other SVDD models in most cases. However, the method still has the following shortcomings. First, introducing the truncated loss function increases the complexity of model training, as some truncated loss functions do not admit explicit expressions for their proximal operators, which requires additional computational overhead. These factors may limit the application of the method to large-scale datasets. To overcome this limitation, future work could employ a distributed computing framework to accelerate the training process of the ADMM algorithm. Second, the truncated loss function SVDD introduces new free parameters, which increases the time required for grid search parameter selection. When the data scale is large, the computation time of the grid search may become unacceptable. To address this drawback, algorithms such as Bayesian optimization could be used to find the optimal parameters, further improving model performance and optimization efficiency. Finally, for extremely noisy data, the truncated loss function may not completely eliminate the impact of noise. In this case, clustering algorithms such as DBSCAN can be combined with our approach: DBSCAN is first used to preprocess the data and remove noise, and the proposed method is then applied to detect anomalies.

Figure 1. Two synthetic datasets, where blue samples are normal and the samples marked with a red cross represent noise. (a) Circular dataset. (b) Banana-shaped dataset.


Table 1. G-mean values of different methods for UCI datasets with noise (not containing non-target data).

Table 2. F1-score values of different methods for UCI datasets with noise (not containing non-target data).

Table 3. G-mean values of different methods for UCI datasets with noise (containing non-target data).

Table 4. F1-score values of different methods for UCI datasets with noise (containing non-target data).