1. Introduction
Anomaly detection is a critical task in numerous real-world applications, including identifying malicious behaviors in social networks [1,2], detecting financial fraud in transaction networks [3,4], and discovering anomalous patterns in industrial automation systems [5,6,7], among others. With the widespread adoption of graph-structured data across diverse domains, graph anomaly detection (GAD) has attracted increasing attention [8], since accurately identifying anomalies in graph-structured data is crucial for maintaining the integrity and security of complex systems [9,10].
While significant progress has been made in this field, existing methods still face notable challenges. Early research primarily relied on statistical and community/clustering-based techniques. For instance, OddBall [11] detects anomalies by analyzing node degrees, weight distributions, or local structural patterns (e.g., star/clique subgraphs). NetWalk [12] introduced a real-time anomaly detection framework for dynamic networks, which jointly learns low-dimensional node representations via clique embedding and deep autoencoders while dynamically updating cluster centers using streaming K-means for anomaly identification. However, these methods typically rely on local structural features, making them less effective at capturing complex relationships and global patterns in graphs. With the advancement of deep learning, graph neural networks (GNNs) have emerged as a powerful tool for GAD. CmaGraph [13] pioneered the integration of dynamic community evolution into anomaly detection: its three-module architecture (community detection, metric enhancement, and one-class anomaly scoring) reconstructs intra-/inter-community node distances, significantly improving sensitivity to structural anomalies. FAGAD [14] tackles the loss of anomaly signals caused by the low-pass filtering of GNNs, proposing a frequency-adaptive GNN that enhances high-frequency anomaly feature extraction via all-pass/low-pass/high-pass signal fusion and a self-supervised bootstrapping strategy, without requiring labeled data. G3AD [15] addresses the adverse effects of anomalies on GNNs by designing a dual-auxiliary encoder (decoupling attributes and topology) and an adaptive caching mechanism, which prevents the model from directly reconstructing anomaly-contaminated graph structures and thereby improves robustness against subtle structural deviations in unsupervised settings. However, most existing GNN-based methods focus solely on either node attributes [16,17] or topological structures, resulting in limited sensitivity to structural anomalies. This limitation becomes particularly evident when anomalies manifest as subtle structural deviations rather than distinct attribute differences. For example, as shown in Figure 1, banks have identified anomalous credit card transactions in which small purchases are immediately followed by large withdrawals across thousands of accounts. While individual accounts appear normal, fraudsters mask cash-out activities by coordinating transactions across controlled accounts, and such evasive patterns remain undetectable through neighborhood or attribute analysis alone. By encoding structural similarities, shared behaviors among fraudulent nodes can be highlighted, such as connections to 5–10 peripheral accounts or 90% of funds flowing through these nodes, revealing hidden relationships that expose entire fraud networks rather than isolated suspicious accounts.
Current methods exhibit three key limitations: (1) Over-reliance on reconstruction error: Many existing approaches, such as autoencoder-based methods [18,19], detect anomalies by measuring reconstruction errors on node attributes or graph structure. However, these methods often fail to capture discriminative features, limiting their ability to identify anomalies in complex graph structures. (2) Inadequate exploitation of structural similarity: Most current methods poorly exploit structural similarity between nodes. For instance, SVD [20] and eigenvalue decomposition [21] applied to adjacency or Laplacian matrices are sensitive to minor graph perturbations and struggle to distinguish nodes with subtle yet critical structural differences. Similarly, random walk-based methods such as DeepWalk [22] and Node2Vec [23] often fail to assign similar embeddings to nodes with identical structural roles but distant graph positions. (3) Disjoint handling of attributes and structure: Many methods focus exclusively on either node attributes [24,25] or graph topology, such as DBSCAN [26] and Struc2Vec [27], neglecting their synergistic integration. This separation reduces sensitivity to complex anomalies that exhibit deviations in both attributes and structure [28].
To address these limitations, we present DPGAD, a structure-aware dual-path attention method for graph node anomaly detection. Specifically, DPGAD employs a dual-path attention mechanism to jointly model attribute features and structural similarity features in a unified framework, while an adaptive gating mechanism dynamically balances the contributions of these features, enabling robust detection across diverse anomaly types (a minimal illustrative sketch of this gated fusion is given after the contribution list). Our main contributions are summarized as follows:
- (1) We propose a novel dual similarity attention mechanism that captures both attribute and structural similarities among graph nodes. This mechanism enhances the model's ability to detect anomalies, especially those defined by structural pattern deviations.
- (2) A learnable gating mechanism dynamically adjusts the fusion of attribute and structural features. This allows DPGAD to focus on task-critical patterns, enhancing robustness against diverse anomalies.
- (3) Extensive experiments on real-world datasets demonstrate that DPGAD outperforms state-of-the-art baselines, with notable accuracy gains on structure-sensitive anomalies.
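To make the fusion concrete, the following minimal PyTorch sketch shows how a learned gate can interpolate between an attribute-path embedding and a structural-similarity-path embedding. The module name, the sigmoid gate over concatenated embeddings, and the dimensions are illustrative assumptions, not the exact DPGAD implementation.

```python
import torch
import torch.nn as nn


class DualPathFusion(nn.Module):
    """Illustrative gated fusion of attribute and structural-similarity
    embeddings (a sketch of the idea, not the exact DPGAD architecture)."""

    def __init__(self, dim: int):
        super().__init__()
        # The gate is computed from both paths and squashed to (0, 1).
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, h_attr: torch.Tensor, h_struct: torch.Tensor) -> torch.Tensor:
        # g -> 1 favors the attribute path; g -> 0 favors the structural path.
        g = self.gate(torch.cat([h_attr, h_struct], dim=-1))
        return g * h_attr + (1.0 - g) * h_struct


if __name__ == "__main__":
    h_attr = torch.randn(8, 64)    # attribute-path embeddings for 8 nodes
    h_struct = torch.randn(8, 64)  # structural-similarity-path embeddings
    fused = DualPathFusion(64)(h_attr, h_struct)
    print(fused.shape)  # torch.Size([8, 64])
```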
4. Experiment
4.1. Experimental Environment
The experiments were run on Microsoft Windows with an Intel(R) Core(TM) i9-10900K CPU @ 3.70 GHz and an NVIDIA GeForce RTX 3090 GPU. All code is implemented in Python 3.11.11 with PyTorch 2.4.1, and PyTorch Geometric is used for the graph-related computation.
4.2. Datasets
This section describes the datasets used to evaluate DPGAD’s performance.
Table 1 summarizes their key statistics.
Weibo: Derived from user interactions on Tencent Weibo, this dataset includes location-based posting patterns and text features extracted via bag-of-words modeling. Users who publish pairs of posts within a short time window (e.g., 60 s) and engage in at least five such activities are labeled as suspicious; the others are considered normal.
Reddit: This dataset captures user–subreddit interactions on Reddit. Posts from users and subreddits are converted into feature vectors using Linguistic Inquiry and Word Count (LIWC) categories. Users banned from subreddits are flagged as anomalous.
Disney: Collected from Amazon’s movie co-purchase network, this dataset includes pricing, review counts, and ratings. Anomaly labels are determined via majority voting among high school students.
Books: Extracted from Amazon’s book co-purchase network, this dataset contains pricing, review volume, and ratings. Anomalies are labeled based on Amazon’s amazonfail tags.
Amazon: This dataset comprises instrument reviews from Amazon.com, designed to detect paid users posting fake reviews. It includes manually curated user features and behavioral statistics.
Tolokers: Sourced from Toloka’s crowdsourcing platform, this dataset tracks worker profiles and task performance. An edge connects two workers if they collaborate on the same task. The goal is to predict which workers are banned in a given project.
YelpChi: Built from Yelp.com reviews, this dataset targets anomalous reviews such as unfair promotions or defamatory comments. Suspicious reviews are flagged based on biased or malicious content.
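All of these benchmarks are attributed graphs with binary node-level anomaly labels. As a point of reference, the sketch below shows how such a dataset is commonly represented as a PyTorch Geometric Data object; the tensors are toy placeholders, not the actual datasets.

```python
import torch
from torch_geometric.data import Data

# Toy placeholder graph: 5 nodes, 8-dimensional attributes, 4 undirected edges.
x = torch.randn(5, 8)                                   # node attribute matrix
edge_index = torch.tensor([[0, 1, 1, 2, 2, 3, 3, 4],
                           [1, 0, 2, 1, 3, 2, 4, 3]])   # COO edge list (both directions)
y = torch.tensor([0, 0, 1, 0, 0])                       # 1 = anomalous node, 0 = normal

data = Data(x=x, edge_index=edge_index, y=y)
print(data)  # Data(x=[5, 8], edge_index=[2, 8], y=[5])
```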
4.3. Baseline
LOF [41]: Local Outlier Factor (LOF) quantifies node anomalies based on their isolation relative to neighboring nodes. LOF relies solely on node features, with neighborhoods selected via k-nearest neighbors (KNN).
DIF [42]: Deep Isolation Forest (DIF) employs a novel representation scheme, where randomly initialized neural networks project raw data into randomized embeddings.
MLPAE [24]: MLPAE adopts a Multilayer Perceptron (MLP) as both encoder and decoder to reconstruct node features, with the reconstruction loss serving as the anomaly score for each node.
GCN: The Graph Convolutional Network (GCN) is the most representative graph convolution-based embedding method, which obtains node embeddings by aggregating the features of neighboring nodes.
GraphSAGE [35]: GraphSAGE learns diverse aggregation functions, including mean, LSTM, and pooling aggregators, to integrate neighborhood information effectively.
GCNAE [19]: GCNAE utilizes a Variational Graph Autoencoder (VGAE), where the encoder learns node embeddings and the decoder reconstructs both the adjacency matrix and node attributes. Anomalies are detected via the reconstruction error.
GAT [25]: GAT enhances GCNs by adaptively weighting neighbor contributions through attention mechanisms.
DOMINANT [18]: DOMINANT is a deep graph autoencoder that learns node representations via a shared encoder while separately reconstructing the adjacency and attribute matrices. The anomaly score combines weighted structural and attribute reconstruction errors.
DONE [43]: DONE employs dual autoencoders to encode topological structure and node attributes independently. Cross-modal interactions capture anomaly patterns, while a unified loss function jointly optimizes node embeddings and anomaly scores.
AdONE [43]: AdONE extends DONE by integrating adversarial learning. A generator–discriminator framework refines node representations, distinguishing normal from anomalous patterns.
AnomalyDAE [44]: AnomalyDAE leverages dual autoencoders with GATs to encode the adjacency matrix and node features into separate embeddings. Attention mechanisms model asymmetric node interactions.
GAAN [45]: GAAN adopts a Generative Adversarial Network (GAN) framework for outlier detection. The generator learns the distribution of normal nodes, while the discriminator differentiates real from generated nodes. Anomaly scores derive from the discriminator outputs.
DiffGAD [34]: DiffGAD introduces a discriminative content-guided generation paradigm. It extracts discriminative features by contrasting unconditional and conditional diffusion models and then computes reconstruction scores as node anomaly metrics.
4.4. Evaluation
To comprehensively evaluate the model's performance, this paper adopts the following core evaluation metrics: ROC-AUC, F1-score, Recall, and Average Precision (AP). These metrics measure model performance from different perspectives. ROC-AUC (Receiver Operating Characteristic–Area Under Curve) is a widely used tool for assessing binary classification performance; it captures the relationship between the True Positive Rate (TPR) and False Positive Rate (FPR) across varying thresholds, and the AUC value ranges from 0 to 1, with higher values indicating stronger discriminative ability. Recall measures a model's ability to identify positive (abnormal) samples, i.e., the proportion of actual abnormal samples correctly identified by the model. The F1-score is the harmonic mean of precision and Recall, providing a balanced assessment of both. Average Precision (AP) measures a model's precision at different Recall levels; it is calculated as a weighted average of the precision values across these levels, offering a more comprehensive evaluation of model performance across thresholds. The calculation formulas for these metrics are given as follows:
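With TP, FP, and FN denoting the numbers of true positives, false positives, and false negatives, respectively, these metrics take their standard forms:

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad
\mathrm{AP} = \sum_{n} \left( R_n - R_{n-1} \right) P_n,
\]

where \(P_n\) and \(R_n\) denote the precision and Recall at the \(n\)-th score threshold, and ROC-AUC is computed as the area under the TPR–FPR curve traced by sweeping the decision threshold.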
4.5. Experimental Results
To comprehensively evaluate the effectiveness of the proposed DPGAD, we conducted systematic experiments on multiple real-world datasets. The experiments primarily address three key questions: (1) Does DPGAD significantly outperform existing methods in overall performance? (2) Do structural similarity features play a crucial role in the GAD task? (3) Does DPGAD exhibit strong robustness to noise?
4.5.1. Performance Analysis
This section presents a comprehensive comparison with state-of-the-art methods using the AUC and F1-score metrics. The ROC-AUC results are summarized in Table 2, while the F1-scores are reported in Table 3. The Recall and AP results are shown in Appendix A.
From the experimental results on the two evaluation metrics, we can derive the following observations: (1) DPGAD demonstrates robust anomaly detection across all datasets, achieving the highest AUC on six datasets and close-to-optimal performance on Amazon; with an average AUC improvement of 9.69% over the second-best method, DPGAD significantly outperforms existing graph anomaly detection methods. (2) DPGAD maintains strong F1-scores across all datasets, with a 12.97% average improvement over the second-best method, indicating its insensitivity to class imbalance and highlighting its robustness and stability. (3) The dataset-level results further highlight the benefit of modeling structural similarity: DPGAD surpasses its competitors by 5.63% (AUC) and 12.9% (F1) where anomalies exhibit a higher clustering coefficient than normal nodes (0.400 vs. 0.301), indicating structural deviations, and it achieves gains of 27.63% (AUC) and 28% (F1) over the runner-up on a small-scale dataset; unlike baselines that overfit on this small-scale dataset, DPGAD effectively isolates anomalies by leveraging structural distinctions during encoding.
4.5.2. Ablation Experiment
Effectiveness of structural similarity feature extraction. To validate the importance of structural similarity feature extraction in GAD, we design con_MLP, a standalone model combining structural similarity embeddings with an MLP head. Experiments are conducted on seven datasets to evaluate its effectiveness. We further compare its ROC-AUC performance against two baselines: (1) MLPAE (an MLP-based autoencoder) and (2) a GAT model trained solely on attribute features. The results are summarized in Table 4.
Comparative experiments reveal that using structural similarity features alone achieves competitive performance compared to MLP-based architectures that rely solely on node attributes. This demonstrates the effectiveness of structural features in graph anomaly detection tasks. Notably, the con_MLP variant significantly outperforms attribute-only models on the Weibo and Disney datasets, indicating a heightened sensitivity of anomalous nodes to structural patterns in these datasets, which further explains the superior overall performance observed in Section 4.5.1.
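As a concrete illustration, the con_MLP ablation amounts to scoring nodes with an MLP head on top of precomputed structural-similarity embeddings alone. The sketch below shows this idea with illustrative layer sizes and a placeholder embedding tensor, not the exact architecture used in the experiments.

```python
import torch
import torch.nn as nn


class ConMLP(nn.Module):
    """Ablation-style scorer: an MLP head over structural-similarity embeddings
    only (illustrative sketch, not the paper's exact con_MLP architecture)."""

    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),      # one anomaly logit per node
        )

    def forward(self, struct_emb: torch.Tensor) -> torch.Tensor:
        return self.head(struct_emb).squeeze(-1)


if __name__ == "__main__":
    struct_emb = torch.randn(100, 32)          # placeholder structural embeddings
    scores = torch.sigmoid(ConMLP(32)(struct_emb))
    print(scores.shape)  # torch.Size([100])
```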
The effect of the gating coefficient on performance. In the adaptive gated fusion unit, the model reduces to a standard GAT when the gating coefficient equals 1, while it becomes a structure-similarity-based GAT when the gating coefficient equals 0. Analyzing different gating coefficients therefore helps reveal the importance of structural similarity features in GAD tasks. To evaluate the impact of the gating coefficient on model performance, we conduct experiments on Weibo, Reddit, Disney, and Books by fixing different gating values and analyzing their effects across datasets. Specifically, we report the ROC-AUC performance at epoch = 500 with gating values ranging from 0.1 to 1.0 over the Weibo, Reddit, Disney, and Books datasets.
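One reading consistent with this description, with notation chosen here purely for illustration, is a convex combination of the two paths,

\[
\mathbf{h}_i = g_i \, \mathbf{h}_i^{\mathrm{attr}} + \left( 1 - g_i \right) \mathbf{h}_i^{\mathrm{struct}}, \qquad g_i \in [0, 1],
\]

so that a gating value of 1 recovers the purely attribute-driven (standard GAT) path and a value of 0 the purely structure-similarity-driven path; in this ablation the gate is fixed to a constant rather than learned.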
Figure 4 reveals that the optimal gating value varies across datasets. A higher value indicates DPGAD's stronger reliance on attribute features, while a lower value suggests greater dependence on structural similarity features. Table 5 reports the mean gating values when DPGAD achieves peak performance after 500 training epochs. These values align closely with the optimal values in Figure 4, demonstrating that the adaptive gating unit converges to appropriate coefficients during training. Notably, the gating coefficients corresponding to peak performance consistently fall below 0.5. As shown in Figure 5, we analyze the distribution of gating coefficients after 500 training epochs on Weibo and Disney. The results reveal that most nodes exhibit gating values below 0.5; more importantly, the vast majority of anomalous nodes show gating coefficients under 0.5, indicating that DPGAD's performance gains stem from its effective utilization of structural dependencies. This further highlights the critical role of structural similarity in graph node anomaly detection.
Robustness Testing in Complex Scenarios. To evaluate DPGAD's robustness and applicability in complex scenarios, we conducted robustness tests. These tests simulate real-world challenges in consumer applications, including data sparsity, disordered review content, and adversarial users evading keyword-based detection. Specifically, we injected varying levels of noise (10%, 20%, 30%) into the attribute features of four datasets and measured DPGAD's ROC-AUC performance under the different noise scales. The results are shown in Figure 6.
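A minimal sketch of one way such attribute noise can be injected is given below. Gaussian corruption of a randomly chosen fraction of nodes is an assumption made for illustration; the exact noise model may differ.

```python
import torch


def inject_attribute_noise(x: torch.Tensor, ratio: float, std: float = 1.0) -> torch.Tensor:
    """Corrupt the attributes of a random `ratio` of nodes with Gaussian noise.
    Illustrative assumption only; the paper's exact noise model is not specified here."""
    x_noisy = x.clone()
    num_noisy = int(ratio * x.size(0))
    idx = torch.randperm(x.size(0))[:num_noisy]           # nodes to perturb
    x_noisy[idx] += std * torch.randn(num_noisy, x.size(1))
    return x_noisy


if __name__ == "__main__":
    x = torch.randn(1000, 16)
    for ratio in (0.1, 0.2, 0.3):                         # 10%, 20%, 30% noise levels
        x_noisy = inject_attribute_noise(x, ratio)
        # Fraction of nodes whose attributes were actually perturbed (~ratio).
        print(ratio, (x_noisy != x).any(dim=1).float().mean().item())
```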
DPGAD maintains consistent performance across multiple datasets under varying noise levels (10%, 20%, 30%). While the ROC-AUC scores decrease at higher noise levels, the degradation remains within acceptable bounds, demonstrating the model's noise tolerance. Notably, on the Weibo and Disney datasets, DPGAD shows less than a 3% performance drop when the noise level increases from 10% to 30%, indicating superior robustness. The gating analysis in Section 4.5.2 further revealed that DPGAD primarily relies on structural similarity features, which explains its resilience to attribute noise across all four datasets.
Performance evaluation of different types of anomalies. To assess DPGAD’s adaptability to diverse anomalies, we conducted experiments on the Weibo dataset. However, the publicly available dataset does not specify distinct anomaly types. Therefore, we employed K-means clustering to categorize anomalies automatically. Specifically, we first concatenated each node’s structural and attribute features as input. Then, we computed the Silhouette Coefficient for different K values to determine the optimal number of clusters. On the Weibo dataset, the best-performing K was 3, indicating three distinct anomaly types. The distribution of these three anomaly categories is visualized in
Figure 7 (left).
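A minimal sketch of this clustering step with scikit-learn is shown below; the concatenated feature matrix is a random placeholder, and the range of candidate K values is an illustrative choice.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Placeholder for the concatenated [structural || attribute] features of anomalous nodes.
node_feats = np.random.rand(300, 48)

best_k, best_score = None, -1.0
for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(node_feats)
    score = silhouette_score(node_feats, labels)   # higher = better-separated clusters
    if score > best_score:
        best_k, best_score = k, score

print(f"best K = {best_k} (silhouette = {best_score:.3f})")
```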
The confusion matrix in
Figure 7b demonstrates DPGAD’s classification performance on clustered data, where each cell value represents the proportion relative to its class total. The results indicate DPGAD achieves strong recognition across all categories, confirming its generalization capability for diverse anomaly types.
4.5.3. Visual Analysis
Feature visualization in the training process. To visually validate the effectiveness of the attribute feature embeddings and structural similarity embeddings, we visualize both representations, along with their gated fusion, at early (epoch = 1) and late (epoch = 500) training stages. Using t-SNE [46], we project the attribute feature (AF) and structural similarity feature (SSF) embeddings into 2D space, with normal and anomalous nodes color-coded for distinction. The results are shown in Figure 8 and Figure 9.
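A minimal sketch of the t-SNE projection behind these plots is shown below; the embeddings and labels are random placeholders standing in for the learned representations and ground-truth anomaly labels.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Placeholders for learned node embeddings and ground-truth anomaly labels.
embeddings = np.random.rand(500, 64)
labels = np.random.randint(0, 2, size=500)        # 1 = anomalous, 0 = normal

coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
plt.scatter(coords[labels == 0, 0], coords[labels == 0, 1], s=5, c="tab:blue", label="normal")
plt.scatter(coords[labels == 1, 0], coords[labels == 1, 1], s=5, c="tab:red", label="anomalous")
plt.legend()
plt.savefig("tsne_embeddings.png")
```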
From the visualization results, we observe the following: (1) Combining the features distinguishes and isolates anomalous nodes from normal ones more effectively than using attribute features or structural similarity features alone. This indicates that feature fusion enables the model to learn more comprehensive node representations, thereby improving detection accuracy and robustness. The improvement is particularly pronounced on Weibo and Disney, aligning with DPGAD's superior performance on these datasets. (2) At epoch = 1, node distributions using only AF exhibit significant overlap between normal (blue) and anomalous (red) nodes, with no clear separation. However, after incorporating SSF, anomalies begin to diverge from normal nodes even in the early training stages. This demonstrates that SSF plays a critical role in capturing anomalous patterns, especially when AF provides limited discriminative signal. (3) DPGAD's advantages are particularly evident on small-scale datasets such as Disney. By jointly leveraging AF and SSF, the model cleanly separates the minority anomalous nodes from the majority of normal ones, confirming DPGAD's consistent detection efficacy on small datasets, even when anomalies are scarce.
Visualization of classification performance. To visually compare different graph anomaly detection methods on real-world datasets, we visualize how effectively each model separates normal nodes (blue) from anomalies (red). Specifically, we analyze the feature distributions of the deep learning-based methods after 500 training epochs, with the results shown in Figure 10.
The visualization results demonstrate that DPGAD achieves sharper cluster separation between normal and anomalous nodes, with minimal overlap between the two categories. Although GCN and GraphSAGE show clear clustering of anomalous nodes, they fail to effectively separate anomalies from normal nodes, leaving significant overlap regions between the two groups, which explains their inferior performance compared with DPGAD. This observation indicates DPGAD's enhanced sensitivity and precision in detecting structural anomalies within graph data. Notably, DPGAD maintains robust detection performance on the Weibo dataset, effectively identifying anomalous nodes despite their low population ratio (10.3%). This confirms DPGAD's capability to detect minority anomaly groups within imbalanced graph datasets.