Article

Graph-Attention-Regularized Deep Support Vector Data Description for Semi-Supervised Anomaly Detection: A Case Study in Automotive Quality Control

Department of Industrial Engineering, Faculty of Engineering, King Abdulaziz University, Jeddah 21589, Saudi Arabia
Mathematics 2025, 13(23), 3876; https://doi.org/10.3390/math13233876
Submission received: 17 October 2025 / Revised: 19 November 2025 / Accepted: 2 December 2025 / Published: 3 December 2025
(This article belongs to the Special Issue Data Mining and Machine Learning with Applications, 2nd Edition)

Abstract

This paper addresses semi-supervised anomaly detection in settings where only a small subset of normal data can be labeled. Such conditions arise, for example, in industrial quality control of windshield wiper noise, where expert labeling is costly and limited. Our objective is to learn a one-class decision boundary that leverages the geometry of unlabeled data while remaining robust to contamination and scarcity of labeled normals. We propose a graph-attention-regularized deep support vector data description (GAR-DSVDD) model that combines a deep one-class enclosure with a latent k-nearest-neighbor graph whose edges are weighted by similarity- and score-aware attention. The resulting loss integrates (i) a distance-based enclosure on labeled normals, (ii) a graph smoothness term on squared distances over the attention-weighted graph, and (iii) a center-pull regularizer on unlabeled samples to avoid over-smoothing and boundary drift. Experiments on a controlled simulated dataset and an industrial windshield wiper acoustics dataset show that GAR-DSVDD consistently improves the F1 score under scarce label conditions. On average, F1 increases from 0.78 to 0.84 on the simulated benchmark and from 0.63 to 0.86 on the industrial case study relative to the best competing baseline.

1. Introduction

Anomaly detection is a critical problem in machine learning, underpinning applications such as process fault detection in industrial manufacturing [1], cardiac anomaly identification from ambulatory ECG signals [2], cyber-intrusion and malicious object control in networked systems [3], and early fire detection in complex, non-fire environments using advanced SVDD-based monitoring [4]. The primary objective is to learn a representation of normal operating conditions and identify deviations as anomalies. Unlike standard supervised classification, anomaly detection is characterized by severe class imbalance, where anomalies are rare, diverse, and often scarcely labeled, while normal samples are abundant. However, despite this abundance, labeling a sample as truly normal is often costly and challenging. In many operational settings, normal labeling requires expert review against process specifications, additional quality-control checks (sometimes destructive or time-consuming), or prolonged observation to rule out hidden faults. These steps consume engineering time and test resources and can cause production downtime, making exhaustive normal labeling economically impractical at scale. For instance, in automotive quality management, the current practice for windshield wiper noise relies on expert assessment. After the windshield wiper operation sound is recorded and mel frequency cepstral coefficients (MFCCs) and spectrogram features are computed, an expert manually inspects these representations and labels the noise types [5]. This practice is costly and time-consuming and would benefit from a model that can learn to identify wiper faults from a few labeled recordings.
Early efforts in one-class classification introduced support vector data description (SVDD) and the one-class support vector machine (OC-SVM) as core frameworks. SVDD learns a minimum-radius hypersphere that encloses the normal training data, predicting observations outside the decision boundary as anomalies [6]. The OC-SVM learns a maximum-margin separator from the origin in feature space to capture the support of the normal distribution [7]. While these models produce compact, interpretable decision boundaries, they generally require a clean set of normal samples and do not utilize unlabeled data, thereby limiting their effectiveness when normal labeling is costly and unlabeled observations are abundant.
Graph-based semi-supervised learning offers a systematic approach to utilize unlabeled data by representing adjacency and a manifold structure. The basic concepts of spectral graphs and Laplacian regularization define the cluster assumption through the concept of smoothness across neighborhoods [8,9]. Recent studies and applications in graph anomaly detection indicate extensive utilization across nodes, edges, subgraphs, and whole graphs [10,11,12]. SVDD has been extended using semi-supervised methods that utilize the structure of unlabeled data while employing a small number of labeled-normal observations to guide the boundary. In order to leverage large unlabeled data without hard pseudo-labels, a common practice is to enhance the SVDD objective with manifold/graph regularization so that points connected on a similarity graph remain close in the learned representation [8,9]. One SVDD variant, graph-based semi-supervised SVDD (S3SVDD), leverages a global/local geometric structure with limited labels by utilizing a $k$-nearest neighbor ($k$-NN) spectral graph and a Laplacian smoothness term in the SVDD objective to exploit unlabeled data [13]. Separately, a manifold-regularized SVDD for noisy label detection was introduced, showing that incorporating a graph Laplacian into the SVDD objective improves robustness to label noise while still leveraging abundant unlabeled data [14]. In addition, a semi-supervised convolutional neural network (CNN) with SVDD has demonstrated practical viability in industrial monitoring [15]. While traditional graph-based SVDD variants have shown potential, these methods typically rely on fixed, pseudo-labeled neighbor graphs (e.g., $k$-NN with a chosen metric and $k$) and uniform edge weights, which can over-connect across data density breaks, propagate contamination from suspect (anomalous) neighbors, and require careful tuning [8,13,16,17,18].
The introduction of deep learning has led to significant advances in deep anomaly detection. Deep SVDD (DeepSVDD) extended the SVDD framework by combining it with a deep encoder that maps data into a latent representation enclosed by a hypersphere [19]. Autoencoders and their variants have been widely adopted to learn low-dimensional embeddings and reconstruction-based anomaly scores in industrial and time-series contexts [20,21]. Additionally, self-supervised anomaly detection has emerged as a promising direction by utilizing contrastive learning and related pretext objectives that enhance representation quality under limited labels [22,23]. However, common deep anomaly detection methods often use objectives that are indirectly aligned with the decision boundary (e.g., reconstruction error), leverage unlabeled data using proxy tasks rather than task-aware constraints in the decision boundary learner, and offer limited interpretability of decisions [24,25,26,27,28,29].
Despite this progress, important limitations remain across these lines of work. As discussed above, classical one-class and SVDD-based approaches assume access to a clean, well-labeled normal set and do not exploit the structure of abundant unlabeled data. Graph-based SVDD variants partially address this by introducing manifold regularization over similarity graphs, but they rely on fixed, uniformly weighted neighborhoods that are sensitive to density breaks and contamination from suspect neighbors [8,13,16,17,18]. Deep anomaly detection methods offer flexible representations, yet they typically optimize proxy objectives such as reconstruction error and use unlabeled data through indirect pretext tasks, while providing limited interpretability of individual decisions [24,25,26,27,28,29]. Consequently, there is a need for a semi-supervised anomaly detection framework that (i) directly integrates unlabeled data into the one-class objective, (ii) uses a flexible, attention-weighted graph to suppress contaminated neighbors and respect density breaks, and (iii) preserves instance-level interpretability, all within a unified deep SVDD formulation.
Motivated by the aforementioned limitations, this paper presents graph-attention-regularized deep SVDD (GAR-DSVDD). Specifically, we develop a semi-supervised, graph-regularized deep SVDD variant that learns a deep encoder and center-based one-class model end-to-end, exploiting unlabeled structure via an attention-weighted $k$-NN graph and an unlabeled center-pull term. To build this graph, we introduce an attention-weighted latent $k$-NN mechanism that assigns similarity- and score-aware importance to neighbors, emphasizing prototypical normal observations and down-weighting suspect or anomalous samples, thereby mitigating label scarcity and limiting error propagation from contaminated edges. Unlike iterative pseudo-labeling methods, our training uses unlabeled data directly through the center-pull and graph smoothness regularizers, which stabilize optimization and reduce computation. For safety-critical deployment, per-instance attention weights provide transparent explanations over the top-$k$ neighbors. In addition, to the best of our knowledge, this is the first implementation of GAR-DSVDD in a semi-supervised automotive quality-control setting, detecting reversal-noise anomalies in windshield wiper acoustics. Comprehensive experiments on a simulated dataset and an industrial windshield wiper acoustics case study in a semi-supervised setting show that GAR-DSVDD achieves superior detection of anomalies compared with classical and deep learning baseline methods.
The remainder of this paper is structured as follows. Section 2 reviews preliminaries and notation for semi-supervised anomaly detection. Section 3 presents the proposed GAR-DSVDD method. Section 4 describes the experimental setup and reports results on simulated datasets and the windshield wiper acoustics case study, along with sensitivity analyses. Section 5 discusses findings, conclusions, and potential future work directions.

2. Preliminaries

2.1. Data and Basic Notation

We are given a training dataset $X = \{x_i \mid x_i \in \mathbb{R}^p,\ i = 1, \dots, n\}$ that decomposes into two disjoint subsets: a small set of labeled-normal samples $X_l^+$ and a large unlabeled set $X_u$, with
$$X = X_l^+ \cup X_u, \qquad X_l^+ \cap X_u = \emptyset.$$
Let $L \subseteq \{1, \dots, n\}$ be the index set of labeled-normal observations and $U = \{1, \dots, n\} \setminus L$ the unlabeled indices, with $|L| = l$, $|U| = u$, and label rate $\rho = l/n$. The unlabeled pool is assumed predominantly normal and may contain a small contamination of anomalies, upper-bounded by $\varepsilon \in [0, 1)$. Features are standardized per coordinate (z-scoring). A learnable encoder $\phi_\theta : \mathbb{R}^p \to \mathbb{R}^s$ maps each input to a latent representation, where $\theta$ denotes the encoder's trainable parameters; we write $z_i = \phi_\theta(x_i)$, collect the latents as $\{z_i\}_{i=1}^n$, and stack them row-wise in $Z = [z_1, \dots, z_n] \in \mathbb{R}^{n \times s}$. We use $\|\cdot\|_2$ for the Euclidean norm and $\langle \cdot, \cdot \rangle$ for the standard inner product in $\mathbb{R}^s$. This setting follows the deep one-class paradigm: the normal class is to be enclosed within a compact latent region while avoiding pseudo-labels for the unlabeled pool. Instead, $X_u$ informs the latent geometry via a graph defined over the latents $z_i$ (details in Section 3).
Prior work motivating this formulation includes Deep SVDD-style one-class modeling and graph-based semi-supervised learning that leverages unlabeled geometry [8,9,19].

2.2. Latent One-Class Score

We summarize the normal set by a latent center $c \in \mathbb{R}^s$ and a soft margin (squared radius) used for score calibration,
$$m = \operatorname{softplus}(\eta), \qquad \eta \in \mathbb{R}.$$
Given latents $\{z_i\}_{i=1}^n$ with $z_i = \phi_\theta(x_i)$, the per-sample anomaly score is
$$f_i = \|z_i - c\|_2^2 - m. \tag{1}$$
A sample is more anomalous as $f_i$ increases; the decision boundary is $f = 0$. Equation (1) gives the deep "hypersphere" view of SVDD in latent space, where $c$ summarizes the normal set and $\phi_\theta$ maps inputs to latents. The offset $m > 0$ specifies the boundary tolerance (squared radius) via a softplus parameterization, ensuring positivity and smooth gradients near the boundary. The center $c$ is typically initialized as the mean of $\{z_i : x_i \in X_l^+\}$ to stabilize early training. The score is scale-aware: increasing $m$ relaxes the boundary, whereas decreasing $m$ tightens the enclosure, preserving the boundary-level interpretation (distance to $c$ versus tolerance $m$). During training, $m$ is treated as a calibration constant, so choosing a train-quantile threshold on $f$ is equivalent to thresholding the same quantile of $d^2 = \|z - c\|_2^2$.
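The score in Equation (1) can be illustrated with a minimal sketch. The function names, the toy latents, and the value of $\eta$ below are hypothetical and chosen only for illustration; the latents are assumed to be already produced by the encoder.

```python
import math

def softplus(eta: float) -> float:
    """Numerically stable softplus, log(1 + exp(eta)); always positive."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))

def anomaly_score(z, c, eta):
    """f = ||z - c||_2^2 - m with m = softplus(eta), as in Equation (1)."""
    d2 = sum((zi - ci) ** 2 for zi, ci in zip(z, c))
    return d2 - softplus(eta)

c = [0.0, 0.0]    # toy latent center
eta = 0.0         # hypothetical value, so m = softplus(0) = ln 2
inside = anomaly_score([0.1, 0.2], c, eta)    # small distance: negative score
outside = anomaly_score([2.0, 1.0], c, eta)   # large distance: positive score
```

Because $m$ is a constant offset, ranking samples by $f$ or by $d^2$ gives the same order, which is why a quantile threshold on one is equivalent to the same quantile on the other.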

3. Graph-Attention-Regularized Deep SVDD

We now detail GAR-DSVDD. The objective has three parts: (i) a one-class enclosure on labeled-normal observations, (ii) an unlabeled neighbor smoothness on squared distances over an attention-weighted, directed, row-normalized $k$-NN graph, and (iii) a center-pull regularizer on unlabeled samples.

3.1. Latent Attention-Weighted $k$-NN Graph

We construct a directed, row-normalized graph $G = (V, E)$ over the latent set $\{z_i\}_{i=1}^n$, where each latent is produced by the encoder, $z_i = \phi_\theta(x_i)$, and the latents are stacked row-wise as $Z = [z_1, \dots, z_n] \in \mathbb{R}^{n \times s}$. For each node $i$, define the directed neighborhood $N_k(i)$ as the indices of the $k$ nearest neighbors of $z_i$ under a base metric (Euclidean by default). Each candidate edge $(i, j)$ carries a base affinity $\kappa_{\text{base}}(z_i, z_j) \in [0, 1]$, chosen as either a Gaussian kernel or a constant:
$$\kappa_{\text{base}}(z_i, z_j) = \exp\!\left(-\|z_i - z_j\|_2^2 / \sigma^2\right) \quad \text{or} \quad \kappa_{\text{base}}(z_i, z_j) \equiv 1.$$
To emphasize true normal observations and suppress suspect connections (possible anomalies), we compute score-aware attention on each edge. With $H$ heads, the per-head logits are
$$e_{ij}^{(h)} = \frac{\langle q_h(z_i), v_h(z_j) \rangle}{\sqrt{s_{\text{att}}}} - \gamma \left[ \max(0, f_i) + \max(0, f_j) \right], \qquad h = 1, \dots, H,$$
where $q_h, v_h : \mathbb{R}^s \to \mathbb{R}^{s_{\text{att}}}$ are linear projections (shared across pairs for head $h$), $s_{\text{att}}$ is the attention width, $\gamma \ge 0$ is a contamination-reducing coefficient, and $f_i = \|z_i - c\|_2^2 - m$ is the current one-class score (Equation (1)). We normalize within $N_k(i)$ using a softmax scaled by a temperature $\tau_a$:
$$\alpha_{ij}^{(h)} = \frac{\exp\!\left(e_{ij}^{(h)} / \tau_a\right)}{\sum_{u \in N_k(i)} \exp\!\left(e_{iu}^{(h)} / \tau_a\right)}, \qquad j \in N_k(i), \quad \tau_a > 0,$$
and aggregate heads by averaging:
$$\tilde{a}_{ij} = \frac{1}{H} \sum_{h=1}^{H} \alpha_{ij}^{(h)}.$$
We integrate attention with the base affinity using $\mu \in [0, 1]$ and then row-normalize the outgoing weights:
$$\tilde{w}_{ij} = (1 - \mu)\, \kappa_{\text{base}}(z_i, z_j) + \mu\, \tilde{a}_{ij} \quad (j \in N_k(i)), \qquad \hat{w}_{ij} = \frac{\tilde{w}_{ij}}{\sum_{u \in N_k(i)} \tilde{w}_{iu}}.$$
Edges outside $N_k(i)$ have weight 0 ($\tilde{w}_{ij} = 0$ if $j \notin N_k(i)$). Row normalization preserves scale across heterogeneous densities (outgoing weights sum to 1) and reduces error propagation by down-weighting high-score (potentially anomalous) edges during training. We compute attention weights without backpropagation and keep them fixed between graph refreshes. Prior work motivating the use of attention graphs to interpret unlabeled data geometry includes [8,9,30].
To keep training tractable, we do not materialize a full graph at every step. Instead, we rebuild a latent-space $k$-NN index periodically and, for each mini-batch, form edges as the union of (i) cached neighbors of the batch points and (ii) within-batch $k$-NN edges, which preserves the local geometry while keeping computation practical.
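The graph construction above can be sketched in plain Python for a small, fully materialized latent set. This is an illustrative sketch, not the authors' implementation: it assumes a scaled dot-product logit divided by the square root of the attention width, a Gaussian base affinity, brute-force neighbor search, and randomly drawn projection matrices `Q` and `V` standing in for the learned projections $q_h$ and $v_h$. All function and variable names are hypothetical.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def attention_knn_graph(Z, f, Q, V, k=2, sigma=1.0, gamma=1.0, tau=1.0, mu=0.5):
    """Return {i: {j: w_hat_ij}}: a directed, row-normalized k-NN graph.

    Z: latents; f: current one-class scores; Q, V: one (s_att x s) projection
    matrix per head. Brute-force neighbor search, for illustration only.
    """
    n, H = len(Z), len(Q)
    graph = {}
    for i in range(n):
        # k nearest neighbors of z_i under the Euclidean metric (excluding i)
        nbrs = sorted((j for j in range(n) if j != i),
                      key=lambda j: sq_dist(Z[i], Z[j]))[:k]
        a_bar = {j: 0.0 for j in nbrs}
        for h in range(H):
            q_i = matvec(Q[h], Z[i])
            scale = math.sqrt(len(q_i))  # assumed sqrt(s_att) scaling
            logits = {}
            for j in nbrs:
                dot = sum(a * b for a, b in zip(q_i, matvec(V[h], Z[j])))
                damp = gamma * (max(0.0, f[i]) + max(0.0, f[j]))
                logits[j] = (dot / scale - damp) / tau
            top = max(logits.values())  # subtract the max for a stable softmax
            exps = {j: math.exp(e - top) for j, e in logits.items()}
            total = sum(exps.values())
            for j in nbrs:
                a_bar[j] += exps[j] / total / H  # average over heads
        # blend with the Gaussian base affinity, then row-normalize
        w = {j: (1 - mu) * math.exp(-sq_dist(Z[i], Z[j]) / sigma ** 2)
                + mu * a_bar[j] for j in nbrs}
        row_sum = sum(w.values())
        graph[i] = {j: w_ij / row_sum for j, w_ij in w.items()}
    return graph

# Toy example: three points near the center and one far away.
rng = random.Random(0)
Z = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [3.0, 3.0]]
c, m = [0.0, 0.0], 0.5
f = [sq_dist(z, c) - m for z in Z]
Q = [[[rng.gauss(0, 1) for _ in range(2)] for _ in range(3)] for _ in range(2)]
V = [[[rng.gauss(0, 1) for _ in range(2)] for _ in range(3)] for _ in range(2)]
graph = attention_knn_graph(Z, f, Q, V, k=2)
```

In a training loop, the returned weights would be treated as constants between graph refreshes, in line with the statement that attention weights are computed without backpropagation.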

3.2. GAR-DSVDD Loss Components

3.2.1. Unlabeled Geometry via Neighbor Smoothness on Squared Distances

Let
$$d_i^2 = \|z_i - c\|_2^2, \qquad d = (d_1^2, \dots, d_n^2).$$
Unlabeled samples help create a smooth decision boundary by discouraging sharp variations of $d_i^2$ along high-weight edges of the attention-weighted graph (described in Section 3.1):
$$\mathcal{L}_{\text{graph}} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j \in N_k(i)} \hat{w}_{ij} \left( d_i^2 - d_j^2 \right)^2. \tag{3}$$
Penalizing differences of $d_i^2$ (rather than feature vectors) aligns the boundary with high-density regions without collapsing representations and preserves the simple one-class test rule on $f$. The derivatives of $\mathcal{L}_{\text{graph}}$ are shown in Appendix A.
During training, we rebuild the graph periodically and hold $\hat{w}_{ij}$ fixed within each interval, so we do not backpropagate through the weights in this term.
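A minimal sketch of the neighbor-smoothness term, assuming the graph is stored as a dictionary of fixed outgoing weights (a hypothetical representation chosen for readability) and the squared distances are already computed:

```python
def graph_smoothness_loss(d2, graph):
    """(1/n) * sum_i sum_{j in N_k(i)} w_hat_ij * (d2_i - d2_j)^2."""
    n = len(d2)
    return sum(w * (d2[i] - d2[j]) ** 2
               for i, row in graph.items()
               for j, w in row.items()) / n

# Toy squared distances and a row-normalized directed graph (hypothetical values).
d2 = [0.0, 1.0, 4.0]
graph = {0: {1: 1.0}, 1: {0: 0.5, 2: 0.5}, 2: {1: 1.0}}

loss = graph_smoothness_loss(d2, graph)                          # (1 + 5 + 9) / 3
shifted = graph_smoothness_loss([v + 10.0 for v in d2], graph)   # same value
```

The second call shows that adding a constant to every squared distance leaves the loss unchanged: the term constrains only differences along edges, not the overall level of $d^2$.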

3.2.2. Labeled-Normal Enclosure

Labeled-normal observations $X_l^+$ anchor the hypersphere in latent space through a distance-based objective:
$$\mathcal{L}_{\text{enc}} = \frac{1}{|X_l^+|} \sum_{x_i \in X_l^+} \|z_i - c\|_2^2 + \beta \|c\|_2^2, \tag{4}$$
where $z_i = \phi_\theta(x_i)$, $c \in \mathbb{R}^s$ is a latent center summarizing the normal set, and $\beta \ge 0$ weakly centers the hypersphere. This formulation directly penalizes the squared distance of labeled-normal observations from $c$, yielding stable gradients and a simple coupling to the encoder. The gradients of $\mathcal{L}_{\text{enc}}$ are shown in Appendix A.
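As a worked illustration of how $\beta$ weakly centers the hypersphere, the stationarity condition of the enclosure loss with respect to $c$ can be written out. This is a sketch under the assumption that the latents $z_i$ are held fixed; it is not taken from Appendix A and may differ in form from the expressions given there.

$$
\frac{\partial \mathcal{L}_{\text{enc}}}{\partial c}
= -\frac{2}{|X_l^+|} \sum_{x_i \in X_l^+} (z_i - c) + 2 \beta c = 0
\quad \Longrightarrow \quad
c^\star = \frac{1}{1 + \beta}\, \bar{z}_l,
\qquad
\bar{z}_l = \frac{1}{|X_l^+|} \sum_{x_i \in X_l^+} z_i .
$$

With $\beta = 0$ the minimizer is the mean of the labeled-normal latents, which matches the initialization of $c$; with $\beta > 0$ it is shrunk toward the origin by the factor $1/(1+\beta)$.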

3.2.3. Unlabeled Center Pull

To stabilize training when the labeled-normal set $X_l^+$ is small, we add a label-free pull of the unlabeled latents toward the center $c$. This acts on raw squared distances $d_i^2 = \|z_i - c\|_2^2$ and complements the neighbor smoothness. The graph term penalizes differences of $d_i^2$ along edges and is largely insensitive to the global level of $d_i^2$; with few labeled observations, this can cause scale drift and a poorly calibrated score $f = d^2 - m$. Moreover, early graphs may include contaminated edges (unlabeled anomalies can appear inside or near the hypersphere), so smoothing alone can propagate their influence. The center pull softly anchors the mean of $d^2$ for the predominantly normal unlabeled pool, reducing drift and limiting the influence of contamination while the graph term aligns local variations.
We define
$$\mathcal{L}_{\text{cp}} = \frac{1}{|X_u|} \sum_{x_i \in X_u} \|z_i - c\|_2^2. \tag{5}$$
This imposes a global constraint by shrinking the mean of $d_i^2$ over $x_i \in X_u$, which pulls the unlabeled latents toward $c$. Unlike $\mathcal{L}_{\text{graph}}$, which is local and relative, $\mathcal{L}_{\text{cp}}$ fixes the overall scale and prevents drift when labels are scarce, improving calibration of $f = d^2 - m$. The gradients of $\mathcal{L}_{\text{cp}}$ are shown in Appendix A.
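The center-pull term and its effect can be sketched as follows. As a simplification, the gradient step below is applied directly to toy latents; in actual training the gradient would flow through the encoder parameters instead. Names and values are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def center_pull_loss(Z_u, c):
    """Mean squared distance of the unlabeled latents to the center c."""
    return sum(sq_dist(z, c) for z in Z_u) / len(Z_u)

def center_pull_step(Z_u, c, lr):
    """One gradient step on the latents, using dL/dz_i = 2 (z_i - c) / |X_u|."""
    n = len(Z_u)
    return [[zi - lr * 2.0 * (zi - ci) / n for zi, ci in zip(z, c)]
            for z in Z_u]

Z_u = [[1.0, 0.0], [0.0, 2.0], [-3.0, 0.0]]   # toy unlabeled latents
c = [0.0, 0.0]
before = center_pull_loss(Z_u, c)              # (1 + 4 + 9) / 3
after = center_pull_loss(center_pull_step(Z_u, c, lr=0.3), c)
```

Unlike the neighbor-smoothness term, this loss changes when all squared distances are shifted by a constant, which is what lets it fix the overall scale of $d^2$.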

3.3. GAR-DSVDD Overall Objective Function

The total loss combines unlabeled geometry, labeled-normal enclosure, and unlabeled center pull:
$$\min_{\theta, c}\ \mathcal{L}(\theta, c) = \lambda_u \underbrace{\mathcal{L}_{\text{graph}}}_{\text{unlabeled geometry}} + \underbrace{\mathcal{L}_{\text{enc}}}_{\text{labeled normals}} + \lambda_{cp} \underbrace{\mathcal{L}_{\text{cp}}}_{\text{unlabeled center pull}}, \qquad \lambda_u, \lambda_{cp} \ge 0, \tag{6}$$
with $\mathcal{L}_{\text{graph}}$, $\mathcal{L}_{\text{enc}}$, and $\mathcal{L}_{\text{cp}}$ as in Equations (3)–(5), respectively. The scalars $\lambda_u$ and $\lambda_{cp}$ control the strengths of the graph regularizer and the unlabeled center pull.
Together, $\mathcal{L}_{\text{enc}}$, $\mathcal{L}_{\text{graph}}$, and $\mathcal{L}_{\text{cp}}$ play complementary roles. $\mathcal{L}_{\text{enc}}$ anchors the center $c$ using trusted labeled normals, preventing center drift. $\mathcal{L}_{\text{graph}}$ enforces local consistency of squared distances on the attention-weighted, row-normalized $k$-NN graph, transferring the structure from the unlabeled pool without collapsing features. Lastly, $\mathcal{L}_{\text{cp}}$ imposes a global moment constraint on unlabeled data, fixing the overall scale of $d^2$ and improving the calibration of the score $f = d^2 - m$.
We train $(\theta, c)$ jointly using the AdamW optimizer. The attention-weighted $k$-NN graph is recomputed every $T$ epochs and held fixed within those intervals. No gradients flow through the attention computation or through $m$. At deployment, a test observation $x_{\text{test}}$ is labeled as an anomaly if its score $f_{\text{test}}$ exceeds a train-quantile threshold $\tau_{\text{thr}}$; otherwise, it is considered normal.
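How the three terms combine can be sketched as a single function evaluated on precomputed latents. This is an illustration only: the index lists, the dictionary graph format, and all numeric values are hypothetical, and a real implementation would compute the loss inside an automatic-differentiation framework so that gradients reach the encoder.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def total_loss(Z, c, labeled, unlabeled, graph, lam_u, lam_cp, beta):
    """lam_u * L_graph + L_enc + lam_cp * L_cp on fixed latents Z."""
    d2 = [sq_dist(z, c) for z in Z]
    # labeled-normal enclosure with weak centering of c
    l_enc = sum(d2[i] for i in labeled) / len(labeled) \
        + beta * sum(ci ** 2 for ci in c)
    # unlabeled center pull
    l_cp = sum(d2[i] for i in unlabeled) / len(unlabeled)
    # neighbor smoothness of squared distances over fixed edge weights
    l_graph = sum(w * (d2[i] - d2[j]) ** 2
                  for i, row in graph.items()
                  for j, w in row.items()) / len(Z)
    return lam_u * l_graph + l_enc + lam_cp * l_cp

Z = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]        # toy latents, d2 = [1, 4, 9]
graph = {0: {1: 1.0}, 1: {2: 1.0}, 2: {1: 1.0}}
value = total_loss(Z, [0.0, 0.0], labeled=[0], unlabeled=[1, 2],
                   graph=graph, lam_u=0.3, lam_cp=2.0, beta=0.1)
```

For the toy values above, the three terms are $59/3$, $1$, and $6.5$, so the weighted sum is $0.3 \cdot 59/3 + 1 + 2 \cdot 6.5 = 19.9$.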
In summary, GAR-DSVDD combines a deep one-class boundary with graph-based semi-supervision in a way that is both robust and label-efficient. By regularizing squared distances (scores) rather than features over an attention-weighted latent $k$-NN graph, it aligns the decision boundary with the data manifold without collapsing representations and preserves the simple test-time rule $f(x) > \tau_{\text{thr}}$. Score-aware attention selectively down-weights suspicious neighbors, curbing over-smoothing from contaminated edges and density breaks. In addition to this local, edge-wise smoothing of $d^2$, an unlabeled center pull imposes a global moment constraint that stabilizes the overall scale of the distance field, improving score calibration when labeled-normal observations are scarce. Because unlabeled data enter only through these geometric regularizers (no pseudo-labels), a small labeled set is enough to anchor the hypersphere while the unlabeled pool shapes the boundary. Thus, the proposed method effectively utilizes information from labeled and unlabeled observations for enhanced anomaly detection in semi-supervised settings. Algorithm 1 summarizes the GAR-DSVDD training and inference procedures, and Figure 1 illustrates the framework for GAR-DSVDD.
From a computational standpoint, the only additional overhead relative to DeepSVDD is the periodic reconstruction of the latent $k$-NN graph, which we perform once every $T$ epochs. Between refreshes, the cached neighbor sets and edge weights are reused. As such, the per-iteration cost is essentially that of DeepSVDD plus the graph-smoothness loss. Memory usage is modest since we store only $k$ neighbors per point. In terms of stability, the $k$-NN graph and its attention weights are kept fixed within each refresh interval, and in our experiments on both datasets, we did not observe training instabilities or divergence.
Algorithm 1: GAR-DSVDD (training)
Inputs: $X_l^+$, $X_u$; encoder $\phi_\theta$; graph parameters $k, \sigma, \mu, H, \tau_a, \gamma$ ($\tau_a$ is the attention softmax temperature); loss weights $\lambda_u, \lambda_{cp}, \beta$; refresh period $T$; total epochs $T_{\text{epochs}}$.
Outputs: $\theta$, $c$, and the decision threshold $\tau_{\text{thr}}$.
  • Initialization: compute $z_i = \phi_\theta(x_i)$ for $x_i \in X_l^+$; set $c \leftarrow \frac{1}{|X_l^+|} \sum_{x_i \in X_l^+} z_i$.
  • Build graph ($t = 0$): compute latents for all data; form the $k$-NN neighborhoods; compute multi-head attentions $\alpha_{ij}^{(h)}$ with temperature $\tau_a$; average to $\tilde{a}_{ij}$; integrate with the base affinity via $\mu$; row-normalize to obtain $\hat{w}_{ij}$.
  • For $t = 1, \dots, T_{\text{epochs}}$:
    (a) Forward on mini-batch $B$: $z_i = \phi_\theta(x_i)$, $d_i^2 = \|z_i - c\|_2^2$, and, for attention/monitoring, $f_i = \|z_i - c\|_2^2 - m$.
    (b) Loss: $\mathcal{L} = \lambda_u \mathcal{L}_{\text{graph}} + \mathcal{L}_{\text{enc}} + \lambda_{cp} \mathcal{L}_{\text{cp}}$.
    (c) Backpropagation and AdamW updates for $(\theta, c)$.
    (d) Graph refresh: if $t \bmod T = 0$, recompute latents, rebuild the $k$-NN neighborhoods and attentions, and row-normalize to obtain $\hat{w}_{ij}$.
Threshold selection: $\tau_{\text{thr}} \leftarrow$ quantile of $\{f_i : x_i \in X_l^+\}$, or select via validation.
Inference (testing)
Given $x_{\text{new}}$:
  • Compute the latent $z_{\text{new}} = \phi_\theta(x_{\text{new}})$.
  • Score $f(x_{\text{new}}) = \|z_{\text{new}} - c\|_2^2 - m$.
  • Predict anomaly if $f(x_{\text{new}}) > \tau_{\text{thr}}$; otherwise, predict normal.
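The threshold-selection and inference steps above can be sketched as follows, assuming latents have already been produced by a trained encoder. The quantile level $q = 0.95$ is the one reported for the simulated study; the linear-interpolation quantile, the function names, and the toy values are assumptions made for illustration.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantile(values, q):
    """Empirical quantile with linear interpolation, q in [0, 1]."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

def fit_threshold(Z_labeled, c, m, q=0.95):
    """tau_thr = q-quantile of the scores f_i over labeled-normal latents."""
    return quantile([sq_dist(z, c) - m for z in Z_labeled], q)

def predict(z_new, c, m, tau_thr):
    """Return 1 (anomaly) if f(x_new) > tau_thr, else 0 (normal)."""
    return int(sq_dist(z_new, c) - m > tau_thr)

# Toy labeled-normal latents around a center at the origin (hypothetical values).
Z_l = [[0.1, 0.0], [0.0, 0.2], [0.3, 0.0], [0.0, 0.4]]
c, m = [0.0, 0.0], 0.1
tau_thr = fit_threshold(Z_l, c, m)
labels = [predict(z, c, m, tau_thr) for z in ([0.1, 0.1], [2.0, 2.0])]
```

By construction, roughly a fraction $1 - q$ of the labeled normals fall above the threshold, so $q$ trades false alarms on normals against missed anomalies.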

3.4. GAR-DSVDD Hyperparameter Tuning and Selection

The proposed GAR-DSVDD method contains several hyperparameters associated with the encoder architecture, the latent graph, and the regularization terms. The graph structure is controlled by the neighborhood size $k$, the Gaussian kernel width $\sigma$, the integration factor $\mu$, the number of attention heads $H$, the attention temperature $\tau_a$, and the contamination-reducing coefficient $\gamma$. The learning objective includes the graph smoothness weight $\lambda_u$ and the center-pull weight $\lambda_{cp}$. The choice of latent dimensionality and network depth governs the encoder capacity. The hyperparameters are selected on a validation set: a grid search over logarithmic ranges can be used for the scalar coefficients (e.g., $\lambda_u$, $\lambda_{cp}$), and a discrete set of values can be used for structural parameters such as $k$, $H$, and the latent dimension.
Among these, $\lambda_u$ and $\lambda_{cp}$ play the most critical role in the GAR-DSVDD loss function. Increasing $\lambda_u$ strengthens the effect of the graph smoothness term, allowing information from the latent $k$-NN graph to propagate more strongly and typically leading to a broader decision boundary, whereas decreasing $\lambda_u$ limits the impact of the graph regularizer and yields a narrower and more concentrated boundary around labeled normals. The parameter $\lambda_{cp}$ scales the center-pull term on unlabeled samples: larger values of $\lambda_{cp}$ give unlabeled points greater influence on the learned center and radius, encouraging the model to expand the hypersphere to cover more of the unlabeled distribution. In contrast, smaller values of $\lambda_{cp}$ keep the decision boundary more tightly governed by the labeled normals and the graph term.
Moreover, the neighborhood size $k$, the attention temperature $\tau_a$, the contamination-reducing coefficient $\gamma$, and the integration factor $\mu$ play a crucial role in the graph structure. The neighborhood size $k$ in the $k$-NN graph controls the density and spatial extent of the latent graph. Increasing $k$ produces a denser graph that propagates smoothness over larger neighborhoods and tends to expand the effective decision boundary, whereas decreasing $k$ focuses interactions on more local neighborhoods and results in a narrower, more localized boundary. The attention temperature $\tau_a$ governs the sharpness of the attention distribution: smaller $\tau_a$ yields more peaked weights over a few neighbors, leading to sharper but potentially less stable boundaries, and larger $\tau_a$ produces smoother, more diffuse weights that stabilize convergence but weaken the effect of attention reweighting. The coefficient $\gamma$ controls how strongly high anomaly scores penalize edges in the attention logits. Larger $\gamma$ values reduce the attention on edges connected to high-scoring (suspect) nodes, weakening their connections in the latent graph and limiting how much contamination can propagate through the smoothness term. However, very large $\gamma$ can also remove useful structure by overly suppressing these edges. In contrast, when $\gamma$ is small, the damping is weak and attention is dominated by similarity, so the graph behaves more like a standard latent $k$-NN kernel and the influence of contaminated regions can propagate more easily. Finally, the integration factor $\mu \in [0, 1]$ controls the blend between the base $k$-NN affinity and the attention-reweighted graph. When $\mu$ is close to 0, the graph is similar to a standard kernel $k$-NN structure, while increasing $\mu$ toward 1 makes the boundary predominantly shaped by attention weights.
In practice, we find that moderate values of $\lambda_u$, $\lambda_{cp}$, $k$, $\tau_a$, $\gamma$, and $\mu$ provide stable convergence and well-calibrated boundaries, whereas performance degrades only when these hyperparameters are pushed to extreme values.
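The effect of the attention temperature described above can be made concrete with a small sketch. The logits and temperature values are hypothetical and serve only to contrast a peaked and a diffuse weight distribution over three neighbors.

```python
import math

def softmax_with_temperature(logits, tau):
    """Softmax of logits / tau, computed stably by subtracting the max logit."""
    top = max(logits)
    exps = [math.exp((e - top) / tau) for e in logits]
    total = sum(exps)
    return [v / total for v in exps]

logits = [2.0, 1.0, 0.0]                                # hypothetical edge logits
sharp = softmax_with_temperature(logits, tau=0.25)     # small tau: peaked weights
diffuse = softmax_with_temperature(logits, tau=4.0)    # large tau: near-uniform
```

With the small temperature almost all weight falls on the top-scoring neighbor, whereas with the large temperature the three weights are close to uniform, so the reweighting has little effect.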

4. Experiments

To evaluate the proposed method, we apply it in a semi-supervised setting to two problems: a simulated dataset, used to visualize its behavior, and an industrial case study in automotive quality management, where windshield wiper reversal noise is detected.

4.1. Experimental Setup

Experiments are conducted on a simulated two-dimensional dataset and a real industrial windshield wiper acoustics dataset to compare the proposed GAR-DSVDD against established methods, including DeepSVDD, OCSVM, classical SVDD, and S3SVDD. All deep models share the same encoder capacity and optimization schedule for fairness. Threshold calibration follows each method’s policy: for GAR-DSVDD and DeepSVDD, we use a train-quantile rule at a fixed level over labeled-normal scores, whereas OCSVM, SVDD, and S3SVDD use their native decision functions. Across both experiments, training follows a semi-supervised setting: a small subset of labeled-normal samples and a large unlabeled pool that may include a small contamination of anomalies (bounded by ε ). We construct four disjoint partitions: labeled-normal observations, unlabeled, validation, and a held-out test set.
Performance is evaluated on the held-out test split using four measures derived from the confusion counts—true positives (TPs), false positives (FPs), true negatives (TNs), and false negatives (FNs).
Overall accuracy (fraction of correctly classified instances):
$$\text{Accuracy} = \frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}.$$
Detection rate (also called recall or true positive rate), measuring how many anomalies are correctly detected:
$$\text{Detection Rate} = \frac{\text{TP}}{\text{TP} + \text{FN}}.$$
The F1 score summarizes the trade-off between precision and detection rate under class imbalance. Define precision as
$$\text{Precision} = \frac{\text{TP}}{\text{TP} + \text{FP}},$$
then
$$F_1 = \frac{2 \cdot \text{Precision} \times \text{Detection Rate}}{\text{Precision} + \text{Detection Rate}} = \frac{2\,\text{TP}}{2\,\text{TP} + \text{FP} + \text{FN}}.$$
Balanced accuracy averages sensitivity (detection rate) and specificity (true negative rate). First, define specificity as
$$\text{Specificity} = \frac{\text{TN}}{\text{TN} + \text{FP}},$$
then
$$\text{Balanced Accuracy} = \frac{1}{2} \left( \text{Detection Rate} + \text{Specificity} \right).$$
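The four measures can be computed directly from the confusion counts, as in the sketch below. The counts are hypothetical and are not taken from the reported experiments; the function assumes all denominators are nonzero.

```python
def detection_metrics(tp, fp, tn, fn):
    """Accuracy, detection rate, precision, F1, specificity, balanced accuracy."""
    detection_rate = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "detection_rate": detection_rate,
        "precision": precision,
        "f1": 2 * precision * detection_rate / (precision + detection_rate),
        "specificity": specificity,
        "balanced_accuracy": 0.5 * (detection_rate + specificity),
    }

# Hypothetical counts, with anomalies as the positive class.
metrics = detection_metrics(tp=90, fp=10, tn=80, fn=20)
```

For these counts, the harmonic-mean form of F1 agrees with the closed form $2\,\text{TP} / (2\,\text{TP} + \text{FP} + \text{FN}) = 180/210$.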
For each dataset, we test the directional alternative $H_1$: GAR-DSVDD > baseline on the F1 score using paired $t$-tests and Wilcoxon signed-rank tests across seeds and report their $p$-values.
All experiments were run on a Windows workstation with a 13th Gen Intel Core i9-13900K CPU (3.00 GHz), 64 GB RAM, and an NVIDIA RTX 4090 GPU.

4.2. Simulated Data

In the simulated study, we use a two-dimensional dataset consisting of a normal cluster and an anomalous cluster. We generate $n_{\text{norm}} = 1000$ normal observations and $n_{\text{anom}} = 500$ anomalous observations using a Gaussian mixture. The normal samples are generated from a cluster centered at $(0, 0)$ and the anomalous samples from a cluster centered at $(3, 0)$, both with standard deviation 0.8 in each coordinate. Class labels are assigned as $y = 0$ for normals and $y = 1$ for anomalies. The data are then split class-wise into 60% for training, 20% for validation, and 20% for testing. Within the training normals, only a small fraction $\rho = 0.03$ is treated as labeled-normal observations; the remaining training normals are combined with the training subset of anomalies to form an unlabeled pool with contamination level $\varepsilon = 0.20$. This construction provides a controlled, low-dimensional setting with scarce labeled normals and a contaminated unlabeled pool, allowing us to visualize and compare the decision boundaries learned by GAR-DSVDD and the baselines. Figure 2 shows the training dataset used for the experiment, where labeled and unlabeled observations are used to train GAR-DSVDD and S3SVDD, while the other methods use labeled observations only. We perform 10 experiments with different random seeds for data generation and splitting.
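The data construction described above can be sketched as follows. The cluster centers, standard deviation, sample sizes, split proportions, and label rate are taken from the text; the function name, the use of Python's `random` module, and the shuffling scheme are assumptions. The text does not spell out how the stated contamination level of the unlabeled pool is enforced, so this sketch simply pools all remaining training normals with the training anomalies and does not attempt to match it.

```python
import random

def make_simulated_split(seed=0, n_norm=1000, n_anom=500, sd=0.8, rho=0.03):
    rng = random.Random(seed)
    normals = [(rng.gauss(0.0, sd), rng.gauss(0.0, sd)) for _ in range(n_norm)]
    anomalies = [(rng.gauss(3.0, sd), rng.gauss(0.0, sd)) for _ in range(n_anom)]

    def split(items):
        """Shuffle, then split 60% / 20% / 20% using integer arithmetic."""
        items = items[:]
        rng.shuffle(items)
        a, b = len(items) * 6 // 10, len(items) * 8 // 10
        return items[:a], items[a:b], items[b:]

    norm_tr, norm_va, norm_te = split(normals)
    anom_tr, anom_va, anom_te = split(anomalies)

    n_lab = round(rho * len(norm_tr))          # scarce labeled normals
    return {
        "labeled": norm_tr[:n_lab],
        # assumption: no subsampling to reach a target contamination level
        "unlabeled": norm_tr[n_lab:] + anom_tr,
        "validation": [(x, 0) for x in norm_va] + [(x, 1) for x in anom_va],
        "test": [(x, 0) for x in norm_te] + [(x, 1) for x in anom_te],
    }

data = make_simulated_split(seed=0)
```

Repeating the call with different `seed` values gives independent realizations, in the spirit of the 10 repeated experiments.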
All deep methods share the same encoder ϕ θ for fairness: a two-layer MLP with hidden widths 64 64 mapping x z = ϕ θ x R 16 with ReLU activations. Our method constructs a latent k -NN graph G on { z i } with k = 9 and attention weights with 8 heads, per-head projection width s a t t   =   32 , temperature τ a = 1 , score-damping coefficient γ = 1 , blend factor μ = 1 . The attention-weighted graph is recomputed every T = 5 epochs and held fixed between refreshes; no gradients flow through the attention computation. We optimize the objective in Equation (6) using AdamW for 120 epochs with refresh rate T = 5 and learning rate 5 × 10 4 , graph regularizer strength λ u = 1300 , and unlabeled center-pull strength λ c p = 0.9 . Lastly, we calibrate m by setting it to the same train-quantile used for threshold selection over labeled normals.
As for the remaining methods, DeepSVDD is trained on labeled normals, representing a supervised deep one-class baseline, using the same applicable parameters of the proposed method, OSCVM uses ν = 0.1 , SVDD uses radial basis function (RBF) kernel parameter γ = 1 , and S3SVDD uses RBF kernel parameter γ = 1 with k = 9. The deep methods use a train-quantile rule at q = 0.95 , and the native decision function is used for the rest.
The performance results of the proposed methods compared to baseline methods over all testing performance metrics for an experiment are shown in Table 1. All methods achieve very high detection rates (0.99–1.00); however, they differ greatly in correctly identifying normal observations. DeepSVDD, OCSVM, SVDD, and S3SVDD all reach near-perfect anomaly detection rates, yet their specificity remains low (0.11–0.53), which leads to reduced accuracy (0.56–0.76) and F1 scores (0.69–0.81). In contrast, GAR-DSVDD attains both a high detection rate (0.99) and a much higher specificity (0.84), resulting in the best overall accuracy (0.92), F1 (0.92), and balanced accuracy (0.92) among all methods. This indicates that GAR-DSVDD detects almost all anomalies while correctly identifying most normal samples, avoiding the narrow one-class boundaries produced by the baselines.
Table 2 examines robustness across 10 simulated experiments with different random seeds. GAR-DSVDD consistently achieves the highest F1 scores, with a mean of 0.84 and standard deviation 0.04, whereas the best competing baseline (OCSVM) reaches a mean F1 of 0.78 with standard deviation 0.02, and the remaining methods obtain mean F1 scores between 0.70 and 0.74. The Wilcoxon signed-rank and paired t-tests on F1 (bottom rows of Table 2) yield p-values lower than $1.3 \times 10^{-3}$ for all pairwise comparisons with GAR-DSVDD, indicating that the observed improvements are statistically significant across different data realizations.
These quantitative patterns are consistent with the geometric behavior and design of the methods. DeepSVDD trains only on labeled-normal observations and learns a tight hypersphere in latent space. Thus, many normal observations in low-density regions fall outside the boundary, increasing the false-positive rate and lowering accuracy and F1 despite a high detection rate. OCSVM lacks representation learning and geometry-aware regularization and therefore constructs a narrow boundary directly in the input space where detection remains high, but specificity and overall accuracy suffer. Similarly, classical SVDD does not utilize unlabeled-geometry guidance and builds its decision boundary solely from labeled normals, thus yielding a tight hypersphere that produces many false positives. Graph-based SVDD variants such as S3SVDD introduce manifold regularization on a fixed k-NN graph, which can improve robustness, but the use of uniformly weighted edges on a static graph makes them sensitive to density breaks and contamination from suspect neighbors; hence, their specificity and balanced accuracy remain limited.
Meanwhile, GAR-DSVDD addresses these limitations using mechanisms grounded in prior work on graph-regularized learning and deep one-class anomaly detection. The score-smoothness term on the latent k-NN graph follows the manifold-regularization principle [8,9,11], which encourages samples connected on the data manifold to receive similar anomaly scores and has been shown to improve generalization when unlabeled data are available. The attention mechanism on graph edges is inspired by graph-attention networks [30] and allows the model to down-weight suspect or cross-cluster neighbors, reducing the propagation of contamination from anomalous points and keeping the boundary aligned with density valleys. At the same time, the one-class enclosure term extends DeepSVDD-style hypersphere training [19] with explicit use of the unlabeled structure, and the center-pull term on unlabeled samples stabilizes the decision region under label scarcity by letting the abundant unlabeled data influence the center and radius in a controlled way. Together, these components enable GAR-DSVDD to achieve near-perfect detection while substantially improving specificity, accuracy, and F1 relative to classical SVDD, S3SVDD, OCSVM, and DeepSVDD, as evidenced by the results in Table 1 and Table 2.
The obtained anomaly scores and their quantiles are shown in Figure 3. In the figure, the 95th-percentile contour of the anomaly score is drawn to indicate the decision boundary for the deep methods, whereas the native decision boundary is shown for the shallow methods. Note that DeepSVDD and GAR-DSVDD use a single hypersphere in latent space as their decision region based on the learned quantile. Once this sphere is mapped back to input space through the encoder’s nonlinear mapping, it can look complex or even disconnected in the figure, despite being one sphere in the learned latent space.
In our implementation on the simulated dataset, the latent k-NN graph reconstruction (including attention computation and row normalization) took 0.075 s per refresh on average, the total training time per run was 5.66 s, and the full test-time inference pass over the held-out test set took 0.0004 s. These timings indicate that the periodic graph refresh is computationally affordable and adds limited overhead to the GAR-DSVDD training time.

4.3. Case Study: Windshield Wiper Acoustics

This section presents our case study on windshield wiper acoustics. The windshield wiper sound recordings were gathered under tightly controlled laboratory conditions. Each sample was captured from production wiper assemblies inside an anechoic chamber while a sprinkler system supplied water to replicate rainfall. To eliminate confounding noise sources, the vehicle’s engine remained off, and the wiper motor was powered by an external supply. A fixed in-cabin microphone recorded the acoustic pressure signal, which was calibrated and expressed in decibels at a constant sampling rate. Domain specialists reviewed the recordings and their MFCC features to identify reversal-noise fault phenomena and assign labels for normal and faulty operations. The dataset consists of 120 windshield wiper recordings, of which 61 correspond to normal operation and 59 to faulty (anomalous) operation. The specialists’ annotations are treated as ground truth for evaluation.
Reversal noise arises when the wiper changes direction at the end of its sweep. Prior studies on windshield wiper noise report that reversal events behave primarily as impact-type noises with dominant energy below about 500 Hz [31]. However, they also exhibit friction-induced noise components extending into the 500–3500 Hz band, where human hearing is highly sensitive [32]. Therefore, reliable detection of these noises directly affects vehicle manufacturing quality. Even under controlled anechoic conditions, these reversal events show substantial variability in timing, amplitude, and spectral balance across cycles and across wipers, driven by factors such as blade wear, glass curvature, contact pressure, wiping speed, and water-film thickness. As a result, faulty reversal noise can partially overlap with the spectral content of normal wiping sounds, making it a challenging anomaly detection problem.
Each observation is a single full recording summarized as a fixed-length vector of recording descriptors, in line with recent acoustic research on sound analysis. We compute 32 MFCCs to capture the spectral envelope on a perceptual scale, together with first- and second-order deltas to encode short-term dynamics; these cepstral families remain standard and effective in contemporary studies. We also include a 12-bin chroma profile to reflect tonal/resonant structure and simple spectral/temporal statistics, namely spectral centroid, spectral bandwidth, spectral roll-off at 95%, root-mean-square (RMS) energy, and zero-crossing rate (ZCR), which are routinely used and compared in modern journal work. For every multi-frame descriptor with $r$ coefficients, we pool across time by concatenating the per-coefficient mean and standard deviation, so each such descriptor contributes $2r$ values; scalar descriptors contribute two values (mean and standard deviation). Summing all parts yields a 226-dimensional vector of means and standard deviations across time per recording, which we use for the reversal-noise dataset.
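The stated dimensionality can be checked with a short calculation. The sketch below assumes that the first- and second-order delta families each have the same 32 coefficients as the MFCCs, which is the reading under which the count comes out to 226; the dictionary keys are labels chosen here for illustration.

```python
# Coefficients per frame for each multi-frame descriptor family.
# Assumption: each delta family has as many coefficients as the MFCCs (32).
multi_frame = {"mfcc": 32, "mfcc_delta": 32, "mfcc_delta2": 32, "chroma": 12}

# Scalar (one value per frame) descriptors listed in the text.
scalar = ["spectral_centroid", "spectral_bandwidth", "spectral_rolloff_95", "rms", "zcr"]

def feature_dimension(multi_frame, scalar):
    """Mean and standard deviation over time give two values per coefficient."""
    return 2 * sum(multi_frame.values()) + 2 * len(scalar)

print(feature_dimension(multi_frame, scalar))  # 2 * 108 + 2 * 5 = 226
```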
These feature choices align with recent studies documenting their efficacy and definitions across environmental-sound, speech-emotion, smart-audio, and urban-acoustic applications [33,34,35]. Moreover, since MFCC-based descriptors are already standard in conventional windshield wiper noise evaluation [5], using an MFCC-based fixed-length representation keeps the GAR-DSVDD windshield reversal-noise detector consistent with established practice and allows performance differences to be attributed primarily to the anomaly detection framework rather than to a specialized feature extractor. Figure 4 shows normal and anomalous operation recordings of windshield wipers with their corresponding MFCC features on the log scale. Table 3 summarizes the feature extraction methods used for acoustic analysis of the windshield wiper sound recordings.
We follow the same semi-supervised training setting as in the simulated study, where the training set includes a small subset of labeled-normal recordings and a large unlabeled pool with mild anomaly contamination. The remainder of the dataset is split into a validation set and a held-out test set. Specifically, we divide the data into 60% for training, 20% for validation, and 20% for testing. Within the training portion, we label 10% of the normal observations ($\rho = 0.10$) and place the rest into the unlabeled set with an anomaly contamination ratio $\varepsilon = 0.10$.
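One way to assemble such a training set is sketched below. It assumes that the contamination ratio denotes the fraction of anomalies within the unlabeled pool and that surplus training anomalies are simply left out; both are assumptions for illustration, as are the function name and the hypothetical recording identifiers.

```python
import random

def make_semi_supervised_train(normal_ids, anomaly_ids, rho=0.10, eps=0.10, seed=0):
    """Split training recordings into labeled normals and a contaminated unlabeled pool."""
    rng = random.Random(seed)
    normals = list(normal_ids)
    anomalies = list(anomaly_ids)
    rng.shuffle(normals)
    rng.shuffle(anomalies)

    n_lab = max(1, round(rho * len(normals)))       # labeled-normal budget
    labeled = normals[:n_lab]
    unl_normals = normals[n_lab:]

    # Choose the anomaly count a so that a / (len(unl_normals) + a) is close to eps.
    n_anom = min(len(anomalies), round(eps * len(unl_normals) / (1.0 - eps)))
    unlabeled = unl_normals + anomalies[:n_anom]
    rng.shuffle(unlabeled)
    return labeled, unlabeled

# Hypothetical identifiers: 37 normal and 35 anomalous training recordings.
labeled, unlabeled = make_semi_supervised_train(range(0, 37), range(100, 135))
print(len(labeled), len(unlabeled))
```

With these hypothetical counts, only a handful of labeled normals remain, which illustrates how scarce the supervision is at $\rho = 0.10$ on a dataset of this size.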
All deep methods use the shared encoder $\phi_\theta$: a two-layer MLP ($64 \to 64 \to 16$) with ReLU activations. For our method, GAR-DSVDD, we build a latent k-NN graph with $k = 3$ and multi-head attention with 8 heads, per-head projection width $s_{att} = 32$, temperature $\tau_a = 1$, score-damping coefficient $\gamma = 1$, and blend factor $\mu = 1$. The attention-weighted graph is recomputed every $T = 5$ epochs and held fixed between refreshes; no gradients flow through the attention computation. The objective in Equation (6) is optimized for 120 epochs using AdamW (learning rate $5 \times 10^{-4}$), with graph regularizer strength $\lambda_u = 8$ and unlabeled center-pull strength $\lambda_{cp} = 1000$. As before, $m$ is calibrated by setting it to the same train-quantile used for threshold selection over labeled normals.
Baseline methods are configured to match their best settings from the wiper runs: DeepSVDD is trained on labeled-normal observations, representing a supervised deep one-class baseline, with the same encoder/optimizer and a soft-boundary objective regularizer value of 1000; OCSVM uses $\nu = 0.1$ with an RBF kernel ($\gamma = 0.01$); SVDD uses an RBF kernel with $\gamma = 1.0$; and S3SVDD employs a k-NN graph with $k = 3$ and RBF $\gamma = 0.01$. The deep methods apply a train-quantile rule with $q = 0.95$, while OCSVM, SVDD, and S3SVDD use their native decision functions.
The performance of GAR-DSVDD and the existing methods across all test metrics on the industrial windshield wiper dataset is presented in Table 4. All methods achieve a perfect detection rate (1.0), but the baseline methods completely fail to recognize normal recordings: DeepSVDD, OCSVM, SVDD, and S3SVDD all obtain a specificity of 0, which yields a balanced accuracy of 0.50, an overall accuracy of only 0.42, and identical F1 scores of 0.59. In other words, these baseline models classify essentially all test recordings as anomalous, reflecting the extreme difficulty of the task under semi-supervised conditions with very few labeled-normal observations. In contrast, GAR-DSVDD maintains perfect detection while substantially improving specificity (0.86), which leads to the highest accuracy (0.92), F1 score (0.91), and balanced accuracy (0.93) among all methods. Thus, on this challenging industrial case study, GAR-DSVDD is the only method that simultaneously detects all faults and correctly retains most normal recordings as non-anomalous.
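The relationship between these metrics can be made explicit with a small calculation from confusion counts, treating anomalies as the positive class. The counts used below (a 24-recording test set with 10 anomalies and 14 normals) are not reported in this section; they are an assumption chosen because they reproduce the rounded values quoted above.

```python
def one_class_metrics(tp, fn, tn, fp):
    """Evaluation metrics with anomalies as the positive class."""
    detection = tp / (tp + fn)                # detection rate (recall on anomalies)
    specificity = tn / (tn + fp)              # fraction of normals kept as normal
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    f1 = 2 * tp / (2 * tp + fp + fn)
    balanced = 0.5 * (detection + specificity)
    return {
        "detection": round(detection, 2),
        "specificity": round(specificity, 2),
        "accuracy": round(accuracy, 2),
        "f1": round(f1, 2),
        "balanced_accuracy": round(balanced, 2),
    }

# Assumed counts: every anomaly detected, 12 of 14 normals retained.
gar = one_class_metrics(tp=10, fn=0, tn=12, fp=2)
# Assumed counts: every recording flagged as anomalous.
baseline = one_class_metrics(tp=10, fn=0, tn=0, fp=14)
print(gar)
print(baseline)
```

Under these assumed counts, flagging every recording yields an accuracy equal to the anomaly share of the test set and a balanced accuracy of exactly 0.50, which is why the four baselines report identical figures.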
Table 5 further analyzes robustness over 10 different train/validation/test splits generated with different random seeds. GAR-DSVDD achieves a mean F1 score of 0.86 with a standard deviation of 0.10, whereas the baseline methods cluster around a mean F1 of 0.62–0.63 with a standard deviation of 0.11. The Wilcoxon signed-rank and paired t-tests on F1 yield p-values below $9.8 \times 10^{-4}$ for all pairwise comparisons with GAR-DSVDD, indicating that the improvement is statistically significant and consistent across different realizations of the semi-supervised splits.
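To make the paired testing procedure concrete, the following sketch computes an exact one-sided Wilcoxon signed-rank p-value by enumerating all sign assignments. The F1 values are illustrative placeholders and not the values in Table 5, the function name is introduced here, and this section does not state whether the reported tests were exact or one-sided. For reference, with 10 paired runs in which every difference favors the same method, the smallest attainable exact one-sided p-value is $1/1024 \approx 9.8 \times 10^{-4}$.

```python
from itertools import product

def wilcoxon_exact_greater(a, b):
    """Exact one-sided Wilcoxon signed-rank test of H1: a tends to exceed b."""
    diffs = [x - y for x, y in zip(a, b) if x != y]   # drop zero differences
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:                                      # average ranks over ties
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 each difference is positive or negative with probability 1/2.
    count = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if w >= w_plus - 1e-12:
            count += 1
    return w_plus, count / 2 ** n

# Placeholder F1 scores over 10 paired runs (not taken from the paper).
f1_a = [0.90, 0.85, 0.88, 0.80, 0.95, 0.70, 0.92, 0.86, 0.83, 0.89]
f1_b = [0.62, 0.60, 0.65, 0.58, 0.70, 0.50, 0.66, 0.63, 0.61, 0.64]
w_plus, p = wilcoxon_exact_greater(f1_a, f1_b)
print(w_plus, p)  # 55.0 0.0009765625
```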
These findings mirror the behavior observed in the simulated study and can be explained by the underlying geometry and learning mechanisms of the models. The baseline methods operate with very limited labeled-normal data and do not effectively exploit the structure of the unlabeled recordings when constructing their decision boundaries. As a result, they tend to learn very narrow one-class regions that treat most test recordings as anomalous, leading to perfect detection but extremely poor specificity and accuracy.
In contrast, GAR-DSVDD leverages an attention-weighted latent k-NN graph over labeled-normal and unlabeled samples to regularize anomaly scores along the data manifold, in line with graph- and manifold-regularized learning principles [8,9,11]. The attention mechanism, similar in spirit to graph-attention networks [30], allows the model to weaken edges in mixed or uncertain neighborhoods and emphasize reliable neighbors, reducing the propagation of anomalous influence through the graph and keeping the decision boundary aligned with low-density regions. In parallel, the one-class enclosure term extends DeepSVDD-style hypersphere training [19] to use the unlabeled structure, while the center-pull regularizer on unlabeled observations stabilizes the decision region under label scarcity. Together, these components enable GAR-DSVDD to achieve perfect detection with higher specificity, accuracy, and F1 than classical SVDD, S3SVDD, OCSVM, and DeepSVDD on the windshield wiper dataset.
Furthermore, the neighbor attention weights serve as per-instance attributions, indicating which historical normal recordings most strongly influence an observation’s score. In an industrial setting, such attributions can streamline expert review and reduce the volume of fully verified normal labels required over time, aligning GAR-DSVDD with label-efficiency needs in automotive quality control.
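As a minimal illustration of how such attributions might be read off, the sketch below ranks the neighbors of one observation by attention weight. The function name, the recording identifiers, and the weights are hypothetical and serve only to show the idea.

```python
def top_attributions(attention_row, neighbor_ids, top=3):
    """Return the `top` neighbors with the largest attention weights."""
    pairs = sorted(zip(neighbor_ids, attention_row), key=lambda p: p[1], reverse=True)
    return pairs[:top]

# Hypothetical neighbors of one recording and their row-normalized weights.
ids = ["rec_a", "rec_b", "rec_c"]
weights = [0.1, 0.6, 0.3]
print(top_attributions(weights, ids, top=2))  # [('rec_b', 0.6), ('rec_c', 0.3)]
```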
Lastly, regarding computational time in the case study, the latent k-NN graph reconstruction, including attention computation and row normalization, required 0.008 s per refresh on average, with a total training time of 3.11 s per run. A full test-time inference pass over the held-out test set took 0.0004 s. These times indicate that, in the industrial application as well, the graph-refresh mechanism adds only a minor training overhead.
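A rough estimate of the refresh share of training time follows from these figures. The sketch assumes one refresh every 5 epochs over 120 epochs (24 refreshes in total) and that the reported total training time includes the refreshes; neither detail is stated explicitly here, so the result is indicative only.

```python
def refresh_overhead(epochs, refresh_every, secs_per_refresh, total_train_secs):
    """Estimate the number of graph refreshes and their share of training time."""
    n_refresh = -(-epochs // refresh_every)          # ceiling division
    refresh_secs = n_refresh * secs_per_refresh
    return n_refresh, refresh_secs, refresh_secs / total_train_secs

# Case-study figures quoted in the text: 0.008 s per refresh, 3.11 s per run.
n_refresh, refresh_secs, share = refresh_overhead(120, 5, 0.008, 3.11)
print(n_refresh, round(refresh_secs, 3), round(share, 3))
```

Under these assumptions, the refreshes account for roughly 0.19 s of the 3.11 s run, or about 6% of the training time.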

4.4. Sensitivity Analysis

In this section, we perform a sensitivity analysis on the simulated data. In Table 6 and Figure 5, we show the effect of increasing the labeled-normal ratio $\rho$ on the F1 performance of the methods. Across labeled-normal ratios $\rho \in \{0.10, 0.20, 0.50\}$, the results show a clear label-efficiency advantage for our semi-supervised GAR-DSVDD in the low-label setting, with supervised baselines gradually converging as $\rho$ increases. With scarce labels, GAR-DSVDD leads, consistent with its attention-weighted graph regularization that leverages the unlabeled structure while mitigating contamination. As the label budget grows to a moderate level, GAR-DSVDD remains best, but the gap narrows, particularly with DeepSVDD, which benefits directly from richer labeled evidence. OCSVM improves steadily with more labels yet trails the deep methods, while classical SVDD consistently lags despite incremental gains. S3SVDD exhibits non-monotonic behavior, sensitive to label density and split composition, but becomes more competitive with an increased labeling ratio. In summary, when only a small portion of the data can be labeled (our target operating point), GAR-DSVDD outperforms all alternatives; as labeling increases, purely supervised approaches like DeepSVDD catch up and reach similar performance, whereas OCSVM and SVDD continue to benefit but remain behind, and S3SVDD’s competitiveness depends on the labeling setting.
Moreover, by setting $\rho = 0.03$, we analyze the effect of $\lambda_u$ on the F1 and accuracy of GAR-DSVDD, as illustrated in Figure 6. For small $\lambda_u$ (1–500), little importance is given to unlabeled geometry; thus, the unlabeled information is underutilized, leading to a narrow decision boundary and weaker results. Increasing $\lambda_u$ gives more importance to the graph-loss term (unlabeled geometry), which expands the decision boundary with consistent gains in performance. The performance is highest near $\lambda_u \approx 1300$, where the boundary best balances the influence of labeled data, unlabeled geometry, and the center-pull term on unlabeled observations. At higher $\lambda_u$ values, excessive emphasis on unlabeled geometry over-smooths the scores over the graph, widening the decision boundary and allowing anomalous unlabeled observations to fall inside it, thereby degrading performance. These findings indicate that $\lambda_u$ should be carefully selected via validation so that the decision boundary keeps anomalous unlabeled observations outside while expanding enough to keep normal observations inside.
Taken together with the robustness results over multiple seeds reported in Table 2 and Table 5, these sensitivity curves indicate that GAR-DSVDD maintains its performance advantage across random splits, label budgets, and a broad range of λ u values, suggesting that the method is robust to both data variability and reasonable hyperparameter choices rather than relying on a narrowly tuned configuration.

5. Discussion

Across both the simulated experiment and the industrial windshield wiper case study, the results indicate a consistent pattern where GAR-DSVDD preserves the high detection rates of SVDD-based and deep one-class baselines while substantially improving the trade-off between detection and false alarms under semi-supervised conditions with scarce labeled normals and a contaminated unlabeled pool. The baseline methods either learn tight or overly narrow one-class boundaries that reject many normal samples, or fail to exploit the unlabeled structure, which leads to low specificity and reduced overall utility in practice. By contrast, GAR-DSVDD achieves a more favorable balance between detecting rare faults and retaining normal observations across different data realizations and experimental seeds, and, together with the sensitivity analysis over label budgets and graph-regularization strengths, this indicates that the method is robust to data variability and does not rely on finely tuned hyperparameters.
These empirical findings are in line with the architectural choices underlying GAR-DSVDD and with prior work on graph-regularized and deep one-class anomaly detection. Classical SVDD and OCSVM operate solely on labeled normals and ignore the geometry of unlabeled data, while DeepSVDD introduces representation learning but still does not use the unlabeled structure when shaping the decision region. Graph-based SVDD variants add manifold regularization on a fixed similarity graph, but uniformly weighted edges on a static k-NN graph remain vulnerable to density breaks and contamination from anomalous neighbors. GAR-DSVDD combines a DeepSVDD-style one-class enclosure with a graph smoothness term on a latent k-NN graph and an attention mechanism that reweights edges by neighborhood reliability. This attention-weighted graph encourages anomaly scores to follow the data manifold while respecting density valleys and down-weighting suspect neighbors, and the center-pull term on unlabeled samples stabilizes the decision region when labeled normals are scarce. Together, these components provide a plausible explanation for the systematic gains observed over SVDD, S3SVDD, OCSVM, and DeepSVDD in both simulated and real-world data.
From an application standpoint, the windshield wiper case study illustrates that GAR-DSVDD is well-suited to industrial quality-control scenarios where only a small portion of normal operation can be exhaustively verified and most data are available as unlabeled production records. In such settings, methods that label the majority of samples as anomalous are difficult to deploy, as they generate excessive false alarms and impose a high validation burden on domain experts. GAR-DSVDD’s ability to leverage a contaminated unlabeled pool, maintain high detection, and substantially reduce false positives directly addresses these constraints. Moreover, the neighbor attention weights provide per-instance attributions that highlight which historical normal recordings most strongly influence a given score, offering a degree of interpretability that can support expert diagnosis and prioritization.

6. Conclusions

Identifying anomalies is a critical problem across a wide range of application areas, as anomalies provide information about deviations from normal behavior in critical environments. Among available approaches, SVDD-based procedures remain attractive for their compact decision boundaries and test-time simplicity.
In this paper, we presented GAR-DSVDD for semi-supervised anomaly detection. The method encloses labeled-normal observations with a deep one-class boundary while propagating score-level smoothness over an attention-weighted latent k-NN graph to leverage abundant unlabeled data. The attention mechanism reduces over-smoothing from questionable neighbors and respects local density breaks, improving specificity without sacrificing detection. On a two-cluster simulated dataset, GAR-DSVDD achieved the highest accuracy, F1 score, and balanced accuracy among all methods (e.g., mean F1 of 0.84 versus 0.78 for the best baseline over 10 runs), with paired Wilcoxon and t-tests indicating statistically significant gains. On the industrial windshield wiper acoustics dataset, GAR-DSVDD maintained perfect detection while increasing specificity from 0.00 (for all baselines) to 0.86 and raising mean F1 from approximately 0.62–0.63 for competing methods to 0.86 across multiple splits. Qualitatively, these results reflect a more robust decision boundary that follows the data manifold, aligns with density valleys, and avoids the narrow one-class regions that cause classical SVDD, S3SVDD, OCSVM, and DeepSVDD to misclassify normal observations as anomalous. In addition, the neighbor attention weights provide per-instance attributions that highlight which historical normal observations most strongly influence a given score, offering useful diagnostic insight in industrial settings where label efficiency and interpretability are important.
As for future work, many industrial datasets contain multiple normal classes (e.g., distinct operating conditions or product variants). Our current formulation targets a single normal class; extending it to multi-normal settings is a natural next step. One direction is to learn several hyperspheres using a shared encoder, where each hypersphere encloses its corresponding normal class while discouraging overlap with the others; the graph attention can be made class-aware to limit cross-class leakage. Finally, online updates with periodic graph refresh and automatic selection of graph/attention hyperparameters would enhance robustness in evolving production environments.

Funding

The project was funded by KAU Endowment (WAQF) at King Abdulaziz University, Jeddah, Saudi Arabia.

Data Availability Statement

The industrial windshield wiper acoustic data presented in this study are available from the corresponding author upon reasonable request, owing to confidentiality restrictions. The code for the simulated dataset can be found at https://github.com/alhinditaha/GAR-DSVDD (accessed on 16 October 2025).

Acknowledgments

The project was funded by KAU Endowment (WAQF) at King Abdulaziz University, Jeddah, Saudi Arabia. The authors, therefore, acknowledge with thanks WAQF and the Deanship of Scientific Research (DSR) for technical and financial support.

Conflicts of Interest

The author declares no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AdamW: Adaptive Moment Estimation with (decoupled) Weight decay
DeepSVDD: Deep Support Vector Data Description
GAR-DSVDD: Graph-Attention-Regularized Deep Support Vector Data Description
k-NN: k-Nearest Neighbors
MFCC: Mel-Frequency Cepstral Coefficients
OCSVM: One-Class Support Vector Machine
ReLU: Rectified Linear Unit
S3SVDD: Graph-based Semi-Supervised Support Vector Data Description
SVDD: Support Vector Data Description

Appendix A. Gradients of the GAR-DSVDD Objective

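Here, we derive the gradients of the three loss components in GAR-DSVDD defined in Equations (3)–(5), with $d_i^2 = \lVert z_i - c \rVert^2$. The gradients of $L_{\mathrm{graph}}$, defined in Equation (3), are

$$\frac{\partial L_{\mathrm{graph}}}{\partial d_i^2} = \frac{2}{n}\left[\sum_{j \in N_k(i)} \hat{w}_{ij}\left(d_i^2 - d_j^2\right) + \sum_{u:\, i \in N_k(u)} \hat{w}_{ui}\left(d_i^2 - d_u^2\right)\right],$$

and by the chain rule

$$\frac{\partial d_i^2}{\partial z_i} = 2\left(z_i - c\right), \qquad \frac{\partial d_i^2}{\partial c} = -2\left(z_i - c\right), \qquad \frac{\partial z_i}{\partial \theta} = \nabla_\theta \phi_\theta(x_i),$$

so

$$\frac{\partial L_{\mathrm{graph}}}{\partial z_i} = \frac{4}{n}\left(z_i - c\right)\left[\sum_{j \in N_k(i)} \hat{w}_{ij}\left(d_i^2 - d_j^2\right) + \sum_{u:\, i \in N_k(u)} \hat{w}_{ui}\left(d_i^2 - d_u^2\right)\right],$$

$$\frac{\partial L_{\mathrm{graph}}}{\partial c} = -\frac{4}{n}\sum_{i=1}^{n}\left(z_i - c\right)\left[\sum_{j \in N_k(i)} \hat{w}_{ij}\left(d_i^2 - d_j^2\right) + \sum_{u:\, i \in N_k(u)} \hat{w}_{ui}\left(d_i^2 - d_u^2\right)\right],$$

and

$$\frac{\partial L_{\mathrm{graph}}}{\partial \theta} = \sum_{i=1}^{n}\left(\nabla_\theta \phi_\theta(x_i)\right)^{T}\frac{\partial L_{\mathrm{graph}}}{\partial z_i}.$$

The gradients of $L_{\mathrm{enc}}$, defined in Equation (4), are

$$\frac{\partial L_{\mathrm{enc}}}{\partial z_i} = \begin{cases}\dfrac{2}{|X_l^{+}|}\left(z_i - c\right) & \text{for } x_i \in X_l^{+},\\[4pt] 0 & \text{otherwise},\end{cases} \qquad \frac{\partial L_{\mathrm{enc}}}{\partial c} = -\frac{2}{|X_l^{+}|}\sum_{x_i \in X_l^{+}}\left(z_i - c\right) + 2\beta c.$$

Since $z_i = \phi_\theta(x_i)$, by the chain rule

$$\frac{\partial L_{\mathrm{enc}}}{\partial \theta} = \frac{2}{|X_l^{+}|}\sum_{x_i \in X_l^{+}}\left(\nabla_\theta \phi_\theta(x_i)\right)^{T}\left(z_i - c\right).$$

The gradients of $L_{\mathrm{cp}}$, defined in Equation (5), are

$$\frac{\partial L_{\mathrm{cp}}}{\partial z_i} = \begin{cases}\dfrac{2}{|X_u|}\left(z_i - c\right) & \text{for } x_i \in X_u,\\[4pt] 0 & \text{otherwise},\end{cases} \qquad \frac{\partial L_{\mathrm{cp}}}{\partial c} = -\frac{2}{|X_u|}\sum_{x_i \in X_u}\left(z_i - c\right).$$

Since $z_i = \phi_\theta(x_i)$, by the chain rule

$$\frac{\partial L_{\mathrm{cp}}}{\partial \theta} = \frac{2}{|X_u|}\sum_{x_i \in X_u}\left(\nabla_\theta \phi_\theta(x_i)\right)^{T}\left(z_i - c\right).$$

Finally, combining these components, the gradients of the full GAR-DSVDD objective

$$L(\theta, c) = \lambda_u L_{\mathrm{graph}} + L_{\mathrm{enc}} + \lambda_{cp} L_{\mathrm{cp}}$$

used in Equation (6) and Algorithm 1 are

$$\frac{\partial L}{\partial \theta} = \lambda_u \frac{\partial L_{\mathrm{graph}}}{\partial \theta} + \frac{\partial L_{\mathrm{enc}}}{\partial \theta} + \lambda_{cp}\frac{\partial L_{\mathrm{cp}}}{\partial \theta}, \qquad \frac{\partial L}{\partial c} = \lambda_u \frac{\partial L_{\mathrm{graph}}}{\partial c} + \frac{\partial L_{\mathrm{enc}}}{\partial c} + \lambda_{cp}\frac{\partial L_{\mathrm{cp}}}{\partial c}.$$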
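The gradient expressions in this appendix can be checked numerically. The sketch below compares the analytic gradient with respect to the center against central finite differences, working directly on latent codes so that no encoder is needed. The loss forms are inferred from the gradient expressions rather than copied from Equations (3)–(5): a weighted sum of squared differences of squared distances for the graph term, a mean squared distance plus a $\beta \lVert c \rVert^2$ penalty for the enclosure term, and a mean squared distance for the center-pull term. These forms, the function names, and all numerical values are assumptions for illustration.

```python
import numpy as np

def objective(c, z, W, lab, unl, lam_u, lam_cp, beta):
    """Assumed objective, evaluated on latent codes z with center c."""
    d2 = ((z - c) ** 2).sum(axis=1)                  # squared distances d_i^2
    diff = d2[:, None] - d2[None, :]                 # d_i^2 - d_j^2
    l_graph = (W * diff ** 2).sum() / len(z)
    l_enc = d2[lab].mean() + beta * float(c @ c)
    l_cp = d2[unl].mean()
    return lam_u * l_graph + l_enc + lam_cp * l_cp

def grad_c(c, z, W, lab, unl, lam_u, lam_cp, beta):
    """Analytic gradient with respect to c, following the appendix expressions."""
    n = len(z)
    d2 = ((z - c) ** 2).sum(axis=1)
    diff = d2[:, None] - d2[None, :]
    # Outgoing edges (i -> j) plus incoming edges (u -> i) for every node i.
    bracket = (W * diff).sum(axis=1) + (W.T * diff).sum(axis=1)
    g_graph = -(4.0 / n) * (bracket[:, None] * (z - c)).sum(axis=0)
    g_enc = -2.0 * (z[lab] - c).mean(axis=0) + 2.0 * beta * c
    g_cp = -2.0 * (z[unl] - c).mean(axis=0)
    return lam_u * g_graph + g_enc + lam_cp * g_cp

def numeric_grad_c(c, *args, h=1e-5):
    """Central finite differences of the objective with respect to c."""
    g = np.zeros_like(c)
    for a in range(len(c)):
        e = np.zeros_like(c)
        e[a] = h
        g[a] = (objective(c + e, *args) - objective(c - e, *args)) / (2 * h)
    return g

# Small random problem: 12 latent points in 3 dimensions, sparse row-normalized weights.
rng = np.random.default_rng(0)
z = rng.normal(size=(12, 3))
c = rng.normal(size=3)
W = rng.random((12, 12)) * (rng.random((12, 12)) < 0.3)
np.fill_diagonal(W, 0.0)
W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
args = (z, W, np.arange(0, 4), np.arange(4, 12), 2.0, 0.5, 0.1)
print(np.max(np.abs(grad_c(c, *args) - numeric_grad_c(c, *args))))
```

Agreement between the two gradients supports the sign convention in which every derivative with respect to the center carries a factor of $-(z_i - c)$.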

References

1. Cai, L.; Yin, H.; Lin, J.; Zhou, H.; Zhao, D. A relevant variable selection and SVDD-based fault detection method for process monitoring. IEEE Trans. Autom. Sci. Eng. 2022, 20, 2855–2865.
2. Li, H.; Boulanger, P. A survey of heart anomaly detection using ambulatory Electrocardiogram (ECG). Sensors 2020, 20, 1461.
3. Sakong, W.; Kim, W. An adaptive policy-based anomaly object control system for enhanced cybersecurity. IEEE Access 2024, 12, 55281–55291.
4. Alhindi, T.J.; Alturkistani, O.; Baek, J.; Jeong, M.K. Multi-class support vector data description with dynamic time warping kernel for monitoring fires in diverse non-fire environments. IEEE Sens. J. 2025, 25, 21958–21970.
5. Alhindi, T.J.; Baek, J.; Jeong, Y.-S.; Jeong, M.K. Orthogonal binary singular value decomposition method for automated windshield wiper fault detection. Int. J. Prod. Res. 2024, 62, 3383–3397.
6. Tax, D.M.; Duin, R.P. Support vector data description. Mach. Learn. 2004, 54, 45–66.
7. Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the support of a high-dimensional distribution. Neural Comput. 2001, 13, 1443–1471.
8. Zhu, X.; Ghahramani, Z.; Lafferty, J.D. Semi-supervised learning using gaussian fields and harmonic functions. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 912–919.
9. Belkin, M.; Niyogi, P.; Sindhwani, V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res. 2006, 7, 2399–2494.
10. Luo, X.; Wu, J.; Yang, J.; Xue, S.; Peng, H.; Zhou, C.; Chen, H.; Li, Z.; Sheng, Q.Z. Deep graph level anomaly detection with contrastive learning. Sci. Rep. 2022, 12, 19867.
11. Ma, X.; Wu, J.; Xue, S.; Yang, J.; Zhou, C.; Sheng, Q.Z.; Xiong, H.; Akoglu, L. A comprehensive survey on graph anomaly detection with deep learning. IEEE Trans. Knowl. Data Eng. 2021, 35, 12012–12038.
12. Qiao, H.; Tong, H.; An, B.; King, I.; Aggarwal, C.; Pang, G. Deep graph anomaly detection: A survey and new perspectives. IEEE Trans. Knowl. Data Eng. 2025, 37, 5106–5126.
13. Duong, P.; Nguyen, V.; Dinh, M.; Le, T.; Tran, D.; Ma, W. Graph-based semi-supervised support vector data description for novelty detection. In Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN), Killarney, Ireland, 12–17 July 2015; pp. 1–6.
14. Wu, X.; Liu, S.; Bai, Y. The manifold regularized SVDD for noisy label detection. Inf. Sci. 2023, 619, 235–248.
15. Peng, D.; Liu, C.; Desmet, W.; Gryllias, K. Semi-supervised CNN-based SVDD anomaly detection for condition monitoring of wind turbines. In Proceedings of the International Conference on Offshore Mechanics and Arctic Engineering, Boston, MA, USA, 7–8 December 2022; p. V001T001A019.
16. Von Luxburg, U. A tutorial on spectral clustering. Stat. Comput. 2007, 17, 395–416.
17. Zelnik-Manor, L.; Perona, P. Self-tuning spectral clustering. Adv. Neural Inf. Process. Syst. 2004, 17, 1601–1608.
18. Song, Y.; Zhang, J.; Zhang, C. A survey of large-scale graph-based semi-supervised classification algorithms. Int. J. Cogn. Comput. Eng. 2022, 3, 188–198.
19. Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep one-class classification. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4393–4402.
20. Pota, M.; De Pietro, G.; Esposito, M. Real-time anomaly detection on time series of industrial furnaces: A comparison of autoencoder architectures. Eng. Appl. Artif. Intell. 2023, 124, 106597.
21. Neloy, A.A.; Turgeon, M. A comprehensive study of auto-encoders for anomaly detection: Efficiency and trade-offs. Mach. Learn. Appl. 2024, 17, 100572.
22. Tack, J.; Mo, S.; Jeong, J.; Shin, J. Csi: Novelty detection via contrastive learning on distributionally shifted instances. Adv. Neural Inf. Process. Syst. 2020, 33, 11839–11852.
23. Hojjati, H.; Ho, T.K.K.; Armanfard, N. Self-supervised anomaly detection in computer vision and beyond: A survey and outlook. Neural Netw. 2024, 172, 106106.
24. Chalapathy, R.; Chawla, S. Deep learning for anomaly detection: A survey. arXiv 2019, arXiv:1901.03407.
25. Pang, G.; Shen, C.; Cao, L.; Hengel, A.V.D. Deep learning for anomaly detection: A review. ACM Comput. Surv. (CSUR) 2021, 54, 1–38.
26. Ruff, L.; Kauffmann, J.R.; Vandermeulen, R.A.; Montavon, G.; Samek, W.; Kloft, M.; Dietterich, T.G.; Müller, K.-R. A unifying review of deep and shallow anomaly detection. Proc. IEEE 2021, 109, 756–795.
27. Li, Z.; Zhu, Y.; Van Leeuwen, M. A survey on explainable anomaly detection. ACM Trans. Knowl. Discov. Data 2023, 18, 1–54.
28. Bouman, R.; Heskes, T. Autoencoders for Anomaly Detection are Unreliable. arXiv 2025, arXiv:2501.13864.
29. Kim, S.; Lee, S.Y.; Bu, F.; Kang, S.; Kim, K.; Yoo, J.; Shin, K. Rethinking reconstruction-based graph-level anomaly detection: Limitations and a simple remedy. Adv. Neural Inf. Process. Syst. 2024, 37, 95931–95962.
30. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. Stat 2017, 1050, 10–48550.
31. Salleh, I.; Zain, M.M.; Bakar, A.A. Modeling and simulation of acting force on a flexible automotive wiper. Appl. Model. Simul. 2018, 2, 51–58.
32. Reddyhoff, T.; Dobre, O.; Le Rouzic, J.; Gotzen, N.-A.; Parton, H.; Dini, D. Friction induced vibration in windscreen wiper contacts. J. Vib. Acoust. 2015, 137, 041009.
33. Bansal, A.; Garg, N.K. Environmental Sound Classification: A descriptive review of the literature. Intell. Syst. Appl. 2022, 16, 200115.
34. Madanian, S.; Chen, T.; Adeleye, O.; Templeton, J.M.; Poellabauer, C.; Parry, D.; Schneider, S.L. Speech emotion recognition using machine learning—A systematic review. Intell. Syst. Appl. 2023, 20, 200266.
35. Mannem, K.R.; Mengiste, E.; Hasan, S.; de Soto, B.G.; Sacks, R. Smart audio signal classification for tracking of construction tasks. Autom. Constr. 2024, 165, 105485.
Figure 1. Overview of the proposed GAR-DSVDD framework for semi-supervised anomaly detection.
Figure 2. Simulated training dataset for one run of the simulated experiment: labeled normals (green), unlabeled normals (blue), and unlabeled anomalies (orange) in two-dimensional space.
Figure 3. Decision boundaries of GAR-DSVDD and baseline methods on the simulated dataset. The white dashed line shows the decision boundary learned by each method. GAR-DSVDD learns a boundary that closely matches the normal region while excluding anomalies by exploiting unlabeled data through its attention-weighted graph and deep structure.
Figure 4. Example of windshield wiper sound recordings and their log-scaled MFCC representations for (a) normal and (b) anomalous operations.
Figure 5. Test F1 versus labeled-normal ratio $\rho$. GAR-DSVDD achieves the highest F1 when labels are scarce ($\rho = 0.10$) and remains competitive as $\rho$ increases, while DeepSVDD improves with more labels. OCSVM increases steadily but lags behind the deep methods; SVDD performs worst overall; S3SVDD shows non-monotonic behavior with a dip at $\rho = 0.20$.
Figure 6. Effect of the graph-regularization weight $\lambda_u$ on GAR-DSVDD performance at $\rho = 0.03$. F1 (left axis) and accuracy (right axis) improve as $\lambda_u$ increases from 1 to 1300, peaking near $\lambda_u \approx 1300$, then declining for larger values, indicating over-regularization beyond the optimum.
Table 1. Test performance of GAR-DSVDD and baseline methods on one simulated dataset experiment.

| Method | Accuracy | F1 | Detection Rate | Precision | Specificity | Balanced Accuracy |
|---|---|---|---|---|---|---|
| GAR-DSVDD | 0.92 | 0.92 | 0.99 | 0.86 | 0.84 | 0.92 |
| DeepSVDD | 0.56 | 0.69 | 1.00 | 0.53 | 0.11 | 0.56 |
| OCSVM | 0.76 | 0.81 | 0.99 | 0.68 | 0.53 | 0.76 |
| SVDD | 0.71 | 0.77 | 0.99 | 0.63 | 0.43 | 0.71 |
| S3SVDD | 0.69 | 0.76 | 0.99 | 0.62 | 0.38 | 0.69 |
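The columns of Table 1 are not independent: F1 is the harmonic mean of precision and detection rate (recall), and balanced accuracy is the mean of detection rate and specificity. The following minimal sketch recomputes both from the rounded table entries as a consistency check. It assumes these standard definitions, which this section does not restate, and the function names are illustrative only.

```python
# Rounded values transcribed from Table 1, in the order:
# (detection rate, precision, specificity, reported F1, reported balanced accuracy)
TABLE_1 = {
    "GAR-DSVDD": (0.99, 0.86, 0.84, 0.92, 0.92),
    "DeepSVDD": (1.00, 0.53, 0.11, 0.69, 0.56),
    "OCSVM": (0.99, 0.68, 0.53, 0.81, 0.76),
    "SVDD": (0.99, 0.63, 0.43, 0.77, 0.71),
    "S3SVDD": (0.99, 0.62, 0.38, 0.76, 0.69),
}


def f1_from(precision, recall):
    """Harmonic mean of precision and detection rate (recall)."""
    return 2 * precision * recall / (precision + recall)


def balanced_accuracy_from(recall, specificity):
    """Mean of the detection rate and the specificity."""
    return (recall + specificity) / 2


def check_rows(table, tol=0.011):
    """Recompute F1 and balanced accuracy per method.

    tol allows for the two-decimal rounding of the table inputs.
    Returns {method: (recomputed F1, recomputed balanced accuracy, agrees)}.
    """
    out = {}
    for method, (rec, prec, spec, f1, bacc) in table.items():
        f1_hat = f1_from(prec, rec)
        bacc_hat = balanced_accuracy_from(rec, spec)
        agrees = abs(f1_hat - f1) <= tol and abs(bacc_hat - bacc) <= tol
        out[method] = (round(f1_hat, 3), round(bacc_hat, 3), agrees)
    return out


if __name__ == "__main__":
    for method, result in check_rows(TABLE_1).items():
        print(method, result)
```

Because the inputs are already rounded to two decimals, the recomputed values can differ from the reported ones in the last digit; the tolerance accounts for that.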
Table 2. Test F1 score of GAR-DSVDD and baseline methods over 10 simulated dataset experiments.

| Experiment (Over Different Seeds) | GAR-DSVDD | DeepSVDD | OCSVM | SVDD | S3SVDD |
|---|---|---|---|---|---|
| 1 | 0.86 | 0.68 | 0.74 | 0.74 | 0.74 |
| 2 | 0.83 | 0.70 | 0.78 | 0.78 | 0.78 |
| 3 | 0.80 | 0.73 | 0.80 | 0.70 | 0.70 |
| 4 | 0.92 | 0.69 | 0.81 | 0.77 | 0.76 |
| 5 | 0.88 | 0.69 | 0.75 | 0.67 | 0.67 |
| 6 | 0.80 | 0.72 | 0.80 | 0.79 | 0.79 |
| 7 | 0.81 | 0.69 | 0.76 | 0.68 | 0.69 |
| 8 | 0.86 | 0.68 | 0.79 | 0.75 | 0.74 |
| 9 | 0.78 | 0.69 | 0.76 | 0.76 | 0.76 |
| 10 | 0.85 | 0.68 | 0.76 | 0.77 | 0.77 |
| Mean | 0.84 | 0.70 | 0.78 | 0.74 | 0.74 |
| Standard deviation | 0.04 | 0.02 | 0.02 | 0.04 | 0.04 |
| p-value (Wilcoxon) | - | $9.8 \times 10^{-4}$ | $9.8 \times 10^{-4}$ | $9.8 \times 10^{-4}$ | $9.8 \times 10^{-4}$ |
| p-value (paired t-test) | - | $7.3 \times 10^{-6}$ | $1.3 \times 10^{-3}$ | $4.2 \times 10^{-4}$ | $4.3 \times 10^{-4}$ |
Table 3. Summary of the features extracted for acoustic analysis of windshield wiper operation sound recordings.

| Feature Family | Recording Dimension (Means and Standard Deviations) * | Brief Description | Reference |
|---|---|---|---|
| MFCC (32) | 64 | Cepstral summary of spectral envelope on the mel scale, a widely adopted baseline in recent ESC/SER studies. | [33,34] |
| ΔMFCC (32) | 64 | First-order derivative of MFCCs capturing short-term spectral dynamics. | [34] |
| Δ2MFCC (32) | 64 | Second-order derivative (acceleration) of MFCCs, emphasizing rapid spectral change. | [34] |
| Chroma (12) | 24 | Energy folded into 12 pitch-class bins; it reflects tonal/resonant structure seen in mechanical acoustics. | [33,34] |
| Spectral centroid | 2 | Power-weighted mean frequency (proxy for “brightness”). | [33,34] |
| Spectral bandwidth | 2 | Spread around the centroid (spectral dispersion). | [33] |
| Spectral roll-off (95%) | 2 | Frequency below which 95% of energy lies (high-frequency content indicator). | [33,34] |
| RMS energy | 2 | Framewise signal power (overall loudness proxy). | [33,34] |
| ZCR | 2 | Sign-change rate (simple proxy for roughness/high-frequency content). | [33,34] |

* Total observation dimension: 1 by $2 \times (32 + 32 + 32 + 12 + 1 + 1 + 1 + 1 + 1) = 226$.
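The dimension bookkeeping in Table 3 can be sketched as follows: each recording yields 113 framewise coefficients, and every coefficient is summarized by its mean and standard deviation over time, giving a 226-dimensional vector. This is an illustration under assumptions, not the author's pipeline: the function names and toy frames are hypothetical, the use of the population standard deviation and the means-then-deviations ordering are not stated in this section, and the actual feature extraction from audio is omitted.

```python
from statistics import mean, pstdev

# Framewise coefficient counts per feature family, as listed in Table 3.
FAMILY_SIZES = {
    "mfcc": 32,
    "delta_mfcc": 32,
    "delta2_mfcc": 32,
    "chroma": 12,
    "spectral_centroid": 1,
    "spectral_bandwidth": 1,
    "spectral_rolloff_95": 1,
    "rms_energy": 1,
    "zcr": 1,
}


def summarize(frames):
    """Collapse a list of per-frame coefficient lists into one vector.

    The result holds the per-coefficient means followed by the
    per-coefficient standard deviations (ordering is an assumption).
    """
    columns = list(zip(*frames))  # one tuple per coefficient, across frames
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]  # population std: an assumption
    return means + stds


def toy_frames(n_frames=5):
    """Hypothetical stand-in for real framewise features (no audio involved)."""
    width = sum(FAMILY_SIZES.values())
    return [[float(t * width + k) for k in range(width)] for t in range(n_frames)]


if __name__ == "__main__":
    vector = summarize(toy_frames())
    print(sum(FAMILY_SIZES.values()), len(vector))  # 113 226
```

Summarizing over time makes the representation independent of recording length, at the cost of discarding the temporal order of the frames.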
Table 4. Test performance of GAR-DSVDD and baseline methods on the industrial windshield wiper dataset.

| Method | Accuracy | F1 | Detection Rate | Precision | Specificity | Balanced Accuracy |
|---|---|---|---|---|---|---|
| GAR-DSVDD | 0.92 | 0.91 | 1 | 0.83 | 0.86 | 0.93 |
| DeepSVDD | 0.42 | 0.59 | 1 | 0.42 | 0 | 0.5 |
| OCSVM | 0.42 | 0.59 | 1 | 0.42 | 0 | 0.5 |
| SVDD | 0.42 | 0.59 | 1 | 0.42 | 0 | 0.5 |
| S3SVDD | 0.42 | 0.59 | 1 | 0.42 | 0 | 0.5 |
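The four baseline rows of Table 4 are identical, with a detection rate of 1 and a specificity of 0. That pattern is what a detector produces when it flags every test sample as anomalous. The sketch below reproduces the row from confusion counts under that reading. The counts (42 anomalies, 58 normals) are hypothetical, chosen only to match the reported precision of 0.42, and the source does not state that the baselines behaved this way.

```python
def metrics(tp, fp, tn, fn):
    """Standard binary metrics with the anomaly class as positive."""
    total = tp + fp + tn + fn
    recall = tp / (tp + fn) if tp + fn else 0.0          # detection rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "f1": f1,
        "detection_rate": recall,
        "precision": precision,
        "specificity": specificity,
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical test set: 42 anomalies and 58 normals (not from the source).
N_ANOMALIES, N_NORMALS = 42, 58

# A detector that flags everything: all anomalies become true positives,
# all normals become false positives.
flag_everything = metrics(tp=N_ANOMALIES, fp=N_NORMALS, tn=0, fn=0)

if __name__ == "__main__":
    for name, value in flag_everything.items():
        print(f"{name}: {value:.2f}")
```

Under this reading, the F1 of 0.59 reflects only the anomaly prevalence in the test split, not any discriminative ability, which is why balanced accuracy sits at 0.5.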
Table 5. Test F1 score of GAR-DSVDD and baseline methods over 10 industrial windshield wiper experiments, where different seeds are used to split the dataset into training, validation, and testing.

| Experiment (Over Different Seeds) | GAR-DSVDD | DeepSVDD | OCSVM | SVDD | S3SVDD |
|---|---|---|---|---|---|
| 1 | 0.86 | 0.55 | 0.55 | 0.55 | 0.55 |
| 2 | 0.93 | 0.8 | 0.8 | 0.82 | 0.8 |
| 3 | 0.76 | 0.5 | 0.5 | 0.52 | 0.5 |
| 4 | 0.79 | 0.60 | 0.63 | 0.63 | 0.63 |
| 5 | 0.67 | 0.63 | 0.63 | 0.63 | 0.63 |
| 6 | 0.91 | 0.59 | 0.59 | 0.59 | 0.59 |
| 7 | 0.83 | 0.45 | 0.45 | 0.45 | 0.45 |
| 8 | 0.97 | 0.74 | 0.74 | 0.74 | 0.74 |
| 9 | 0.96 | 0.7 | 0.7 | 0.7 | 0.7 |
| 10 | 0.90 | 0.63 | 0.63 | 0.69 | 0.63 |
| Mean | 0.86 | 0.62 | 0.62 | 0.63 | 0.62 |
| Standard deviation | 0.10 | 0.11 | 0.11 | 0.11 | 0.11 |
| p-value (Wilcoxon) | - | $9.8 \times 10^{-4}$ | $9.8 \times 10^{-4}$ | $9.8 \times 10^{-4}$ | $9.8 \times 10^{-4}$ |
| p-value (paired t-test) | - | $1.7 \times 10^{-5}$ | $2.2 \times 10^{-5}$ | $3.4 \times 10^{-5}$ | $2.2 \times 10^{-5}$ |
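The Wilcoxon row of Table 5 shows the same value, $9.8 \times 10^{-4}$, for every baseline. A possible explanation is that GAR-DSVDD wins on all 10 seeds, in which case an exact one-sided signed-rank test returns $1/2^{10} \approx 9.8 \times 10^{-4}$ whatever the margins. The sketch below checks this by enumerating sign assignments, using only the standard library. Whether the author used a one-sided exact test is an assumption, not something this section states, and the function names are illustrative.

```python
from itertools import product
from statistics import mean, stdev

# F1 scores transcribed from Table 5 (seeds 1 to 10).
GAR = [0.86, 0.93, 0.76, 0.79, 0.67, 0.91, 0.83, 0.97, 0.96, 0.90]
DEEPSVDD = [0.55, 0.8, 0.5, 0.60, 0.63, 0.59, 0.45, 0.74, 0.7, 0.63]


def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def exact_one_sided_wilcoxon(x, y, ndigits=2):
    """Exact p-value for H1: x tends to exceed y (paired samples).

    Differences are rounded to the table precision so that equal
    margins are treated as ties; zero differences are dropped.
    """
    diffs = [round(a - b, ndigits) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    ranks = average_ranks([abs(d) for d in diffs])
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # Under H0 every sign pattern is equally likely: count the patterns
    # whose negative-rank sum is at most the observed one.
    hits = sum(
        1
        for signs in product((False, True), repeat=len(ranks))
        if sum(r for r, neg in zip(ranks, signs) if neg) <= w_minus
    )
    return hits / 2 ** len(ranks)


if __name__ == "__main__":
    print(exact_one_sided_wilcoxon(GAR, DEEPSVDD))  # 0.0009765625
    print(round(mean(GAR), 2), round(stdev(GAR), 2))  # 0.86 0.1
```

With only 10 paired runs, $1/1024$ is the smallest p-value this test can produce, so the identical entries say that GAR-DSVDD was ahead on every seed but carry no information about the size of the gap; the paired t-test row is the one that reflects margins. The last line also suggests that the reported standard deviation matches the sample (n − 1) estimate.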
Table 6. Effect of labeled-normal ratio ($\rho$) on F1 for GAR-DSVDD and baselines (DeepSVDD, OCSVM, SVDD, S3SVDD).

| $\rho$ | GAR-DSVDD | DeepSVDD | OCSVM | SVDD | S3SVDD |
|---|---|---|---|---|---|
| 0.1 | 0.91 | 0.85 | 0.87 | 0.68 | 0.88 |
| 0.2 | 0.91 | 0.90 | 0.88 | 0.71 | 0.77 |
| 0.5 | 0.93 | 0.93 | 0.91 | 0.77 | 0.90 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Alhindi, T.J. Graph-Attention-Regularized Deep Support Vector Data Description for Semi-Supervised Anomaly Detection: A Case Study in Automotive Quality Control. Mathematics 2025, 13, 3876. https://doi.org/10.3390/math13233876