Article

DualGAD: A Generalist Graph Anomaly Detection Method via Dual-Encoder Architecture

by Jizhao Liu, Shuo Mao *, Shuqin Zhang, Fangfang Shan and Jun Li
School of Computer Science, Zhongyuan University of Technology, Zhengzhou 451191, China
* Author to whom correspondence should be addressed.
Information 2026, 17(5), 416; https://doi.org/10.3390/info17050416
Submission received: 5 March 2026 / Revised: 18 April 2026 / Accepted: 24 April 2026 / Published: 27 April 2026
(This article belongs to the Topic Machine Learning and Data Mining: Theory and Applications)

Abstract

Due to the capability of graph structures to model complex relationships, graph anomaly detection has significant application value in various domains, including financial fraud detection, network security, and fake account identification. Traditional graph anomaly detection methods follow a specialized paradigm of “one dataset, one model”, which requires retraining or fine-tuning models for each new domain. This approach faces critical challenges in practical applications, namely high deployment costs and limited generalization capability. To address this problem, generalist graph anomaly detection aims to achieve the goal of “train once, apply across domains”. However, existing generalist methods primarily rely on graph neural networks to implicitly learn structural information, where the learned structural representations are tightly coupled with specific topology distributions, resulting in limited structural stability under domain shifts. To address this limitation, we propose DualGAD, a generalist graph anomaly detection method via a dual-encoder architecture. In particular, DualGAD introduces explicit structural modeling that characterizes the relative topological deviation of nodes with respect to the overall graph structure, thereby enhancing structural invariance across heterogeneous domains. This method separately models node attribute information and explicit graph structural information via an attribute feature encoder and an explicit structural feature encoder, and adopts an “attribute-dominant, structure-complementary” fusion strategy to achieve collaborative modeling. Experiments on eight real datasets demonstrate that DualGAD achieves an average improvement of 3.12% in AUROC compared to the strongest baseline methods, exhibiting significant cross-domain generalization capability.

1. Introduction

In the modern information society, complex systems can often be modeled as graph structures, where nodes represent entities and edges capture relationships between entities [1]. For instance, users and transactions form graph structures in financial transaction networks, users and their following relationships constitute graph networks in social media platforms, and papers with their citation relationships also exhibit graph characteristics in academic citation networks [2]. Graph anomaly detection aims to identify objects in graphs (such as nodes, edges, subgraphs, or entire graphs) that differ significantly from the majority patterns, where these anomalies typically indicate important events such as fraud, attacks, or failures. Therefore, graph anomaly detection plays an essential role in ensuring the reliability and security of complex networked systems. However, graph anomaly detection faces unique challenges. Unlike traditional data, graph data possesses dual characteristics: nodes not only have their own attribute features, such as users’ age and income, but are also embedded within specific structural relationships, such as connection patterns with other users. As illustrated in Figure 1, this dual nature leads to diverse anomaly patterns: some anomalous nodes deviate from normal patterns in their connection structures while maintaining normal attribute features, known as structural anomalies [3,4], whereas others exhibit abnormal attribute features but maintain normal connection patterns, referred to as contextual anomalies [5]. This complexity, combined with the high cost of obtaining labeled data, makes graph anomaly detection an extremely challenging research problem [6].
Graph anomaly detection methods have evolved from specialized to generalized approaches. Traditional statistical methods primarily rely on topological features to identify anomalous nodes [7,8,9], offering good interpretability but struggling to handle complex nonlinear patterns. With the advancement of deep learning, graph neural network (GNN)-based methods have become mainstream, which can be categorized into supervised and unsupervised approaches based on whether labeled information is utilized. Supervised methods such as GCN [10], GAT [11], and BWGNN [12] model anomaly detection as a binary classification problem, learning node representations through message passing mechanisms and fusing attribute and structural information. Unsupervised methods such as DOMINANT [13], CoLA [14], and HCM-A [15] learn normal patterns through proxy tasks like reconstruction error and contrastive learning, without relying on labeled data. However, these methods still follow the traditional paradigm of “one dataset, one model,” requiring retraining or fine-tuning for each new domain, which poses critical challenges of high deployment costs and limited generalization capability in practical applications. To overcome these limitations, generalist graph anomaly detection has emerged, aiming to achieve the goal of “train once, apply across domains.” Existing generalist methods such as ARC [16] adopt few-shot learning strategies, while UNPrompt [17] employs zero-shot inference, training unified models on multiple source domain datasets and then directly applying them to different target domains without retraining or fine-tuning. This paradigm can significantly reduce deployment costs and leverage cross-domain knowledge to enhance detection performance.
Although generalist graph anomaly detection has achieved certain progress, existing methods still suffer from evident limitations in structural information modeling. Most approaches, such as ARC [16] and UNPrompt [17], primarily rely on GNN neighborhood aggregation mechanisms, implicitly learning structural information while modeling attribute information. However, GNN aggregation is typically restricted to local connection patterns within 1–3 hops, making it difficult to capture global topological characteristics such as degree distribution and clustering coefficients. More importantly, since structural representations are implicitly learned under specific topological statistics, their expressive results are often strongly coupled with graph scale, connectivity density, and degree distribution. When domain shifts occur, such distribution-dependent structural embeddings are prone to representation drift, leading to inconsistent structural semantics and thus weakening the model’s cross-domain generalization ability and detection stability. Therefore, a key gap in current generalist graph anomaly detection lies in how to construct structural representations with stable semantics across heterogeneous graph domains, ensuring that structural information remains consistent and transferable across different domains, rather than relying on implicitly learned embeddings tightly coupled with specific topology distributions.
Based on this insight, this paper proposes DualGAD. By introducing explicit structural modeling, we redesign structural statistics to characterize the relative topological deviation of nodes with respect to the overall graph structure, rather than relying on absolute structural values. Through this relative modeling strategy, structural representations reduce their dependence on graph scale and density, thereby enhancing cross-domain invariance and semantic stability. Specifically, DualGAD adopts a dual-encoder architecture to model attribute information and explicit structural information separately. The attribute encoder learns semantic representations through GNNs, while the explicit structural encoder extracts structural statistics such as degree discreteness, triangle density, neighborhood overlap, and in–out degree ratio, and transforms them into structural embeddings through normalization and neural encoding. In this way, the instability of implicit structural modeling under cross-domain settings can be effectively mitigated. Considering that attribute information typically contains richer semantic cues and plays a dominant role in anomaly detection tasks, whereas structural information mainly serves as an auxiliary enhancement, DualGAD adopts an “attribute-dominant, structure-complementary” lightweight fusion strategy to collaboratively integrate the embeddings from the two encoders. Combined with a few-shot graph attention mechanism, DualGAD achieves cross-domain anomaly detection. Experimental results on multiple real-world datasets demonstrate that the proposed method significantly improves detection performance and cross-domain generalization ability, validating the effectiveness of explicit structural modeling in generalist graph anomaly detection.
In summary, our contributions are as follows:
  • We propose the DualGAD dual-encoder architecture, which is the first to introduce explicit structural modeling into generalist graph anomaly detection and effectively alleviates structural instability under cross-domain settings.
  • We design a structural feature construction method that characterizes relative topological deviation, together with an “attribute-dominant, structure-complementary” lightweight fusion strategy, enabling effective collaborative modeling of attribute and structural information.
  • We conduct extensive experimental validation across multiple real-world datasets, demonstrating the effectiveness and superiority of DualGAD, showing remarkable cross-domain generalization performance, and validating the effectiveness of explicit structural modeling in generalist graph anomaly detection.

2. Related Work

2.1. Topological Feature-Based Graph Anomaly Detection

Graph anomaly detection aims to identify nodes or edges that significantly deviate from normal patterns in graph-structured data. Early approaches primarily relied on explicit graph topological features. Ghani et al. [7] identified anomalous nodes through frequent subgraph pattern analysis, while Eberle et al. [8] leveraged degree distribution deviations for anomaly detection. Centrality-based methods [18,19] identified nodes with importance anomalies by computing metrics such as betweenness centrality and closeness centrality. These methods offer excellent interpretability and can directly extract topological features. However, they struggle to handle complex nonlinear anomaly patterns and cannot effectively integrate node attribute information, limiting their performance when dealing with large-scale complex graph data.

2.2. Deep Learning-Based Graph Anomaly Detection

With the development of graph neural networks, deep learning-based GAD methods have become mainstream and significantly improved detection performance. Based on the utilization of labeled information, they can be categorized into supervised and unsupervised approaches. Supervised methods model anomaly detection as a binary classification problem. Representative methods include GCN [10], which extends semi-supervised learning to anomaly detection, GAT [11], which introduces attention mechanisms to assign neighbor weights, and BWGNN [12], which applies Beta wavelet spectral filters to capture the high-frequency signals associated with anomalies. These methods perform excellently with sufficient labeled data but face challenges of high annotation costs and label imbalance. Unsupervised methods learn normal patterns through proxy tasks such as reconstruction error and contrastive learning. DOMINANT [13] identifies anomalies by minimizing structural and attribute reconstruction errors, CoLA [14] learns consistent representations based on contrastive learning, AnomalyDAE [20] utilizes dual-encoders to separately model structural and attribute information, and HCM-A [15] employs hybrid clustering mechanisms to handle anomaly detection in attributed graphs. These deep learning methods have achieved superior performance on individual datasets, but they still follow the traditional paradigm of “one dataset, one model.” Moreover, methods such as DOMINANT [13], AnomalyDAE [20], DGNN [21], and ARANE [22] adopt dual-encoder designs, identifying different anomaly patterns through decoupled structural and attribute information modeling. Although these methods can effectively handle attribute information and structural information separately, they still learn structural information implicitly, failing to explicitly control the types of structural features being learned.
How to effectively combine explicit structural features from traditional graph theory to enhance cross-domain generalization capability remains a direction worth exploring.

2.3. Generalist Graph Anomaly Detection

Generalist graph anomaly detection aims to develop unified models capable of performing anomaly detection across multiple graph datasets. Unlike traditional methods that require training separate models for each dataset, generalist GAD pursues the goal of “train once, apply across domains.” Existing generalist GAD methods fall into two categories: few-shot learning-based approaches and zero-shot inference-based approaches. ARC [16] adopts a few-shot learning strategy, drawing inspiration from in-context learning in large language models to perform anomaly detection using a few normal samples. UNPrompt [17] is a zero-shot generalist graph anomaly detection method that employs a unified prompting strategy, enabling direct anomaly detection without requiring labeled samples from target domains. These methods have made progress in cross-domain detection, but they primarily learn structural information implicitly through graph neural networks. In contrast, the DualGAD method proposed in this paper directly extracts explicit graph topological features and implements an “attribute-dominant, structure-complementary” fusion strategy through a dual-encoder design, thereby achieving more effective cross-domain generalization.

3. Preliminary Knowledge

3.1. Notations

Let $G = (V, E)$ be an attributed graph with $n$ nodes and $m$ edges, where $V = \{v_1, \ldots, v_n\}$ represents the node set, $E$ denotes the edge set, and node attributes are described by the feature matrix $X \in \mathbb{R}^{n \times d}$, with each row $X_i$ representing the feature vector of node $v_i$. The topological structure of the graph is represented by the adjacency matrix $A$, where $A_{ij} = 1$ if and only if there exists an edge $(v_i, v_j) \in E$. To facilitate graph neural network processing, we define the normalized adjacency matrix $\tilde{A}$, obtained through symmetric normalization: $\tilde{A} = D^{-1/2} A D^{-1/2}$, where $D$ is the degree matrix. In GAD, the node set can be partitioned into a set of normal nodes $V_n$ and a set of anomalous nodes $V_a$. Typically, the number of normal nodes significantly exceeds that of anomalous nodes, i.e., $|V_n| \gg |V_a|$. The label vector $y \in \{0, 1\}^n$ indicates the status of each node, where $y_i = 1$ denotes that node $v_i$ is anomalous, and $y_i = 0$ indicates a normal node.
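As a minimal sketch of the symmetric normalization above, the following NumPy snippet computes the normalized adjacency matrix for a small dense graph. The function name `normalize_adjacency` and the treatment of isolated nodes (their rows and columns are left at zero) are assumptions made for illustration and are not specified in the paper.

```python
import numpy as np

def normalize_adjacency(A):
    """Return D^{-1/2} A D^{-1/2} for a dense, symmetric 0/1 adjacency matrix."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)                        # node degrees (diagonal of D)
    inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    inv_sqrt[nonzero] = deg[nonzero] ** -0.5   # assumption: isolated nodes stay at 0
    return inv_sqrt[:, None] * A * inv_sqrt[None, :]

# Toy path graph 1 - 0 - 2
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])
A_tilde = normalize_adjacency(A)
```

Each entry is scaled by the inverse square roots of both endpoint degrees, so the result stays symmetric.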

3.2. Traditional GAD Problem

Traditional Graph Anomaly Detection (GAD) aims to identify anomalous nodes within a given graph $G = (V, E, X)$. Conventional GAD methods typically focus on performing model training and anomaly detection within the same graph. Specifically, given a graph $G$, the anomaly scoring model $f$ is optimized on $G$. Formally, GAD aims to learn an anomaly scoring function $f : V \to \mathbb{R}$ to detect anomalies within the same graph in either a supervised or unsupervised manner. The scoring function is expected to generate higher anomaly scores for anomalous nodes compared to normal nodes, i.e., $f(v_i) < f(v_j)$ when $v_i \in V_n$ and $v_j \in V_a$.
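The ordering requirement on the scoring function can be checked empirically. The sketch below measures the fraction of (anomalous, normal) node pairs that a set of scores ranks correctly, which corresponds to the AUROC metric reported in the abstract. The helper name `pairwise_auroc` and the toy scores are illustrative assumptions; the quadratic loop is only suitable for small examples.

```python
def pairwise_auroc(scores, labels):
    """Fraction of (anomalous, normal) pairs where the anomaly scores higher.

    Ties count as half a correct pair. labels: 1 = anomalous, 0 = normal.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
perfect = pairwise_auroc([0.1, 0.2, 0.9, 0.8], labels)   # every pair ordered correctly
mixed = pairwise_auroc([0.1, 0.9, 0.2, 0.8], labels)     # half of the pairs ordered correctly
```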

3.3. Generalist GAD Problem

Generalist GAD aims to learn a universal model $f$ on training graphs such that the model can directly adapt to target graphs from different domains without any fine-tuning or retraining. Formally, let $D_{train} = \{G_1, G_2, \ldots, G_t\}$ be the training dataset from different application domains, and let $D_{test} = \{G_1, G_2, \ldots, G_s\}$ be the test dataset from different application domains, where $D_{train} \cap D_{test} = \emptyset$, and the datasets in the training and test sets may come from completely different distributions and application domains. The goal of generalist GAD is to train a universal GAD model $f$ on $D_{train}$ such that $f$ can identify anomalies in any test graph dataset $G \in D_{test}$.

4. Methodology

This section proposes DualGAD, a generalist graph anomaly detection method based on dual-encoder architecture. As illustrated in Figure 2, this method comprises four core components: (1) cross-domain feature alignment module; (2) attribute feature encoder; (3) explicit structural feature encoder; (4) few-shot graph attention detector. Through this dual-encoder architecture design, DualGAD significantly improves cross-domain generalization performance while maintaining computational efficiency.

4.1. Cross-Domain Feature Alignment

Feature Projection. To address the issue of inconsistent attribute dimensions across graphs, we employ Principal Component Analysis (PCA) to unify the feature dimensions of multiple graph datasets to the same dimension. PCA projects the high-dimensional and heterogeneous node attributes from different graphs into a shared low-dimensional space, eliminating dimension discrepancies across domains. Specifically, given any graph $G^{(i)}$ from $D_{train}$ and $D_{test}$ with attributes $X^{(i)} \in \mathbb{R}^{n^{(i)} \times d^{(i)}}$, we transform it to $\tilde{X}^{(i)}$ with a common dimension $d_u$, where $d_u$ is the pre-defined unified feature dimension for all cross-domain graphs. The transformed feature matrix is defined as:
$$\tilde{X}^{(i)} \in \mathbb{R}^{n^{(i)} \times d_u} = \mathrm{PCA}(X^{(i)}) = X^{(i)} W^{(i)},$$
where $W^{(i)} \in \mathbb{R}^{d^{(i)} \times d_u}$ is the linear projection matrix determined by the dataset. To maintain generality, $W^{(i)}$ can be obtained using common dimensionality reduction methods, such as Singular Value Decomposition (SVD) [23] and Principal Component Analysis (PCA) [24].
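The projection step can be sketched with an SVD-based PCA that is fitted separately on each graph. The function name `pca_project`, the mean-centering of features, and the zero-padding used when a graph has fewer than the target number of attribute dimensions are assumptions for illustration; the paper only states that the projection matrix is dataset-specific.

```python
import numpy as np

def pca_project(X, d_u):
    """Project an (n, d) attribute matrix to (n, d_u) with per-graph PCA."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0, keepdims=True)          # assumption: centre each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = min(d_u, Vt.shape[0])                       # number of available components
    Z = Xc @ Vt[:k].T                               # plays the role of X W
    if k < d_u:                                     # assumption: zero-pad short graphs
        Z = np.hstack([Z, np.zeros((X.shape[0], d_u - k))])
    return Z

rng = np.random.default_rng(0)
X1 = rng.normal(size=(20, 7))     # graph with 7 attribute dimensions
X2 = rng.normal(size=(15, 12))    # graph with 12 attribute dimensions
Z1, Z2 = pca_project(X1, 4), pca_project(X2, 4)
```

Both graphs end up with four columns, so a single model can consume them regardless of their original attribute dimension.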
Smoothness-based Feature Reordering. Although PCA projection resolves the dimension inconsistency issue, challenges remain in the attribute correspondence relationships of cross-domain features. To achieve effective feature alignment, we adopt the smoothness-based feature reordering strategy proposed by Liu et al. [16]. The core idea of this method is to rank and align features according to their contribution to the anomaly detection task. Specifically, for the $k$-th feature dimension in graph $G = (V, E, X)$, its smoothness is defined as:
$$S_k(X) = -\frac{1}{|E|} \sum_{(v_i, v_j) \in E} \left(x_{i,k} - x_{j,k}\right)^2,$$
where $x_{i,k}$ denotes the $k$-th attribute feature of node $v_i$, and $x_{j,k}$ denotes the $k$-th attribute feature of node $v_j$ connected to $v_i$. A lower $S_k$ indicates significant variation of the $k$-th feature between connected nodes, suggesting that this feature corresponds to high-frequency graph signals and exhibits strong heterogeneity. This smoothness-based feature reordering strategy is supported by existing generalist graph anomaly detection methods, such as ARC [16], which have observed that features with lower $S_k$ are consistently more effective at distinguishing anomalies across most datasets. Specifically, features with lower $S_k$ exhibit larger attribute differences between connected nodes, making them inherently more sensitive to anomalous nodes that deviate from typical local neighborhood patterns. Therefore, we reorder the feature dimensions of all datasets in ascending order of $S_k$, such that features with the highest anomaly sensitivity are positioned at the front.
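A minimal sketch of the reordering step follows. It takes the smoothness of a feature to be the negated mean squared difference over edges, so that a lower value means more variation between connected nodes, as the text states. The function names and the edge-list input format are assumptions for illustration.

```python
import numpy as np

def smoothness(X, edges):
    """S_k for every feature column k; edges is a list of (i, j) node pairs."""
    X = np.asarray(X, dtype=float)
    src, dst = np.asarray(edges).T
    diff = X[src] - X[dst]                        # one row per edge
    return -(diff ** 2).sum(axis=0) / len(edges)

def reorder_by_smoothness(X, edges):
    """Sort feature columns in ascending order of S_k (least smooth first)."""
    order = np.argsort(smoothness(X, edges), kind="stable")
    return np.asarray(X)[:, order], order

# Path graph 0 - 1 - 2: feature 0 is constant, feature 1 spikes at node 1.
X = np.array([[0.0, 0.0],
              [0.0, 5.0],
              [0.0, 0.0]])
edges = [(0, 1), (1, 2)]
S = smoothness(X, edges)
X_sorted, order = reorder_by_smoothness(X, edges)
```

In this toy graph the spiking feature has the lower smoothness value and is therefore moved to the front.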

4.2. Attribute Feature Encoder

After feature alignment, we design a dual-encoder architecture to generate node embeddings. This architecture contains two complementary encoding modules: an attribute feature encoder and an explicit structural feature encoder. Unlike existing methods that employ single GNN encoders or dual-encoder approaches relying on implicit structural modeling, our dual-encoder architecture enhances anomaly detection capability through collaborative modeling of explicit structural feature extraction and attribute information, enabling more comprehensive modeling of graph anomaly patterns.
The attribute feature encoder achieves implicit structural learning through the neighborhood aggregation mechanism of graph neural networks, focusing on capturing attribute features within the graph structure. Specifically, our encoder consists of three steps: multi-hop propagation, shared MLP-based transformation, and ego-neighbor residual operation. First, we perform propagation on the aligned feature matrix $\tilde{X} = X^{(0)}$ for $L$ iterations, and then conduct transformation on the initial and propagated features with a shared MLP network:
$$X^{(l)} = \tilde{A} X^{(l-1)}, \quad Z^{(l)} = \mathrm{MLP}(X^{(l)}),$$
where $l \in \{0, 1, \ldots, L\}$, $X^{(l)}$ is the propagated feature matrix at the $l$-th iteration, $Z^{(l)}$ is the transformed representation matrix at the $l$-th iteration, and $\tilde{A}$ is the normalized adjacency matrix. $\mathrm{MLP}(\cdot)$ denotes a multi-layer perceptron, a standard feed-forward neural network that applies non-linear transformations to the node features at each layer. After obtaining $Z^{(0)}, \ldots, Z^{(L)}$, we calculate the residual representation by taking the difference between $Z^{(l)}$ and $Z^{(0)}$:
$$R^{(l)} = Z^{(l)} - Z^{(0)},$$
where $R^{(l)}$ is the residual matrix at the $l$-th iteration. These multi-hop residual representations are then concatenated to form the final embedding $H_1$:
$$H_1 = \mathrm{Concat}\left(R^{(1)}, R^{(2)}, \ldots, R^{(L)}\right),$$
where $H_1 \in \mathbb{R}^{n \times d_u}$ is the output embedding matrix of the attribute encoder. This residual design can effectively capture local feature propagation information at different hop distances.
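The three encoder steps can be sketched as follows. The function name `residual_embedding`, the two-layer `toy_mlp`, and all layer sizes are hypothetical choices for illustration. In this sketch the output width is L times the MLP output width; how the paper obtains its stated embedding dimension is not specified here.

```python
import numpy as np

def residual_embedding(X, A_norm, mlp, L=2):
    """Multi-hop propagation, shared MLP, and ego-neighbor residuals."""
    feats = [np.asarray(X, dtype=float)]           # X^(0)
    for _ in range(L):
        feats.append(A_norm @ feats[-1])           # X^(l) = A_norm X^(l-1)
    Z = [mlp(x) for x in feats]                    # same MLP applied at every hop
    R = [Z[l] - Z[0] for l in range(1, L + 1)]     # R^(l) = Z^(l) - Z^(0)
    return np.concatenate(R, axis=1)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 3))

def toy_mlp(x):
    return np.maximum(x @ W1, 0.0) @ W2            # hypothetical 2-layer MLP with ReLU

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
d = A.sum(axis=1)
A_norm = A / np.sqrt(np.outer(d, d))               # D^{-1/2} A D^{-1/2}
X = rng.normal(size=(3, 4))
H1 = residual_embedding(X, A_norm, toy_mlp, L=2)
```

If propagation leaves the features unchanged (for example with an identity matrix in place of the normalized adjacency), every residual is zero, which illustrates that the embedding encodes how a node differs from its multi-hop neighborhood.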

4.3. Explicit Structural Feature Encoder

The explicit structural feature encoder is dedicated to extracting explicit structural information independent of attribute features. We design an augmented structural feature extractor that systematically extracts four types of complementary structural features, which characterize the structural properties of nodes from different perspectives.
Degree Discreteness. This feature measures the relative position of a node’s degree in the degree distribution. It quantifies how much a node’s degree deviates from the average connectivity level of the network. The degree discreteness for node $v_i$ is defined as:
$$f_1(i) = \tanh\left(\frac{|d_i - \mu_d|}{\sigma_d + \varepsilon}\right),$$
where $\mu_d$ and $\sigma_d$ are the mean and standard deviation of all node degrees in the graph, respectively, and $\varepsilon$ is a small constant to prevent division by zero and ensure numerical stability.
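As an illustration, degree discreteness can be computed directly from a list of node degrees. The function name and the use of the population standard deviation are assumptions of this sketch.

```python
import math

def degree_discreteness(degrees, eps=1e-8):
    """tanh(|d_i - mean| / (std + eps)) for every node degree."""
    n = len(degrees)
    mu = sum(degrees) / n
    # Assumption: population standard deviation (divide by n, not n - 1).
    sigma = math.sqrt(sum((d - mu) ** 2 for d in degrees) / n)
    return [math.tanh(abs(d - mu) / (sigma + eps)) for d in degrees]

# Star graph: one hub of degree 4 and four leaves of degree 1.
f1 = degree_discreteness([4, 1, 1, 1, 1])
```

Here the hub lies two standard deviations from the mean degree and each leaf half a standard deviation, so the hub receives the larger value.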
Triangle Density. This feature approximates the common neighbor count through degree aggregation to compute triangle participation, which is particularly suitable for structural anomaly detection in social networks and academic citation networks. Here, we define $w_{ij}$ as the degree similarity weight between node $v_i$ and node $v_j$. The triangle density for node $v_i$ is defined as:
$$f_2(i) = \tanh\left(\frac{\sum_{j \in N(v_i)} \min(d_i, d_j) \cdot w_{ij}}{d_i}\right),$$
where $w_{ij} = \frac{\min(d_i, d_j)}{\max(d_i, d_j)} \times 0.1$, $N(v_i)$ represents the neighbor set of node $v_i$, and $d_i$ and $d_j$ are the degrees of nodes $v_i$ and $v_j$, respectively. The weight $w_{ij}$ is based on degree similarity to modulate contribution strength, with the denominator $d_i$ providing normalization to ensure comparability across nodes with different degrees.
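A small sketch of this degree-based approximation on an adjacency-list graph is given below. The function name, the dictionary input format, and the score of zero for isolated nodes are assumptions for illustration.

```python
import math

def triangle_density(adj):
    """adj maps each node to the set of its neighbours (undirected graph)."""
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    out = {}
    for i, nbrs in adj.items():
        if deg[i] == 0:
            out[i] = 0.0                      # assumption: isolated nodes score 0
            continue
        total = 0.0
        for j in nbrs:
            lo, hi = min(deg[i], deg[j]), max(deg[i], deg[j])
            w_ij = lo / hi * 0.1              # degree similarity weight
            total += lo * w_ij
        out[i] = math.tanh(total / deg[i])    # normalise by the node's own degree
    return out

# Triangle graph: every node has degree 2.
f2 = triangle_density({0: {1, 2}, 1: {0, 2}, 2: {0, 1}})
```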
Neighborhood Overlap. This feature is based on hypergeometric distribution theory to calculate the expected common neighbor count, measuring the connection tightness between a node and its neighbors. The neighborhood overlap for node $v_i$ is defined as:
$$f_3(i) = \frac{1}{|E(v_i)|} \sum_{u \in E(v_i)} \frac{(d_{v_i} - 1)(d_u - 1)}{n - 2},$$
where $E(v_i)$ represents the edge set connected to node $v_i$, $d_{v_i}$ denotes the degree of node $v_i$, $d_u$ represents the degree of node $u$, and $n$ is the total number of nodes in the graph.
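The expected common-neighbor count can be sketched as below. The sketch interprets the sum as running over the neighbours reached by the edges incident to the node; this reading, the function name, and the value of zero for nodes without edges (or graphs with at most two nodes) are assumptions.

```python
def neighborhood_overlap(adj):
    """Mean expected number of common neighbours between a node and each neighbour."""
    n = len(adj)
    deg = {v: len(nbrs) for v, nbrs in adj.items()}
    out = {}
    for v, nbrs in adj.items():
        if not nbrs or n <= 2:
            out[v] = 0.0                      # assumption: undefined cases score 0
            continue
        expected = [(deg[v] - 1) * (deg[u] - 1) / (n - 2) for u in nbrs]
        out[v] = sum(expected) / len(nbrs)
    return out

# Complete graph on 4 nodes: every adjacent pair shares exactly 2 neighbours.
K4 = {v: {u for u in range(4) if u != v} for v in range(4)}
f3 = neighborhood_overlap(K4)
```

On the complete graph the expectation coincides with the true common-neighbor count, which makes it a convenient sanity check.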
In–Out Degree Ratio. This feature measures the imbalance between a node’s in-degree and out-degree, specifically designed to detect structural anomalies in directed graphs, such as spam accounts in social networks and patterns of non-reciprocal citations in academic networks. The in–out degree ratio for node $v_i$ is defined as:
$$f_4(i) = \tanh\left(\frac{\left|\log\left(d_i^{\mathrm{in}} / (d_i^{\mathrm{out}} + \varepsilon)\right) - \mu\right|}{\sigma + \varepsilon}\right),$$
where $d_i^{\mathrm{in}}$ and $d_i^{\mathrm{out}}$ represent the in-degree and out-degree of node $v_i$, respectively, $\mu$ and $\sigma$ are the mean and standard deviation of this feature across all nodes, $\log(\cdot)$ denotes the logarithmic transformation to reflect relative strength, and $\varepsilon$ is a small constant to avoid division by zero and ensure numerical stability.
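The following sketch computes this feature from in-degree and out-degree lists. Note one deliberate deviation that is an assumption of the sketch: the small constant is also added to the numerator so that a zero in-degree does not produce the logarithm of zero, whereas the formula in the text adds it only to the denominator. The function name and the population standard deviation are likewise illustrative choices.

```python
import math

def in_out_degree_ratio(in_deg, out_deg, eps=1e-8):
    """tanh of the standardised log in/out degree ratio for every node."""
    # Assumption: eps in the numerator as well, to avoid log(0).
    raw = [math.log((i + eps) / (o + eps)) for i, o in zip(in_deg, out_deg)]
    n = len(raw)
    mu = sum(raw) / n
    sigma = math.sqrt(sum((r - mu) ** 2 for r in raw) / n)   # population std
    return [math.tanh(abs(r - mu) / (sigma + eps)) for r in raw]

# Three balanced nodes and one node that receives far more edges than it sends.
f4 = in_out_degree_ratio(in_deg=[1, 1, 1, 8], out_deg=[1, 1, 1, 1])
```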
Based on the definitions of the four structural features above, we convert the features from “absolute values” to “relative deviation degrees” through global topological statistical correction and node intrinsic attribute normalization. This ensures that each feature value only reflects the statistical deviation of a node relative to the graph itself, rather than absolute magnitudes. Therefore, regardless of whether the total number of nodes is thousands or tens of thousands, and whether the edge density is sparse or dense, the distributions of these four features are normalized to a similar scale, endowing them with inherent cross-domain invariance.
This invariance is crucial for generalist anomaly detection: anomaly detection essentially judges whether a node “relatively deviates” from normal patterns, rather than relying on absolute values. As long as the features maintain the same statistical meaning across different domains (e.g., a Z-score greater than 1 indicates significant deviation), the decision boundary learned by the model on one domain can be directly transferred to another domain without fine-tuning. In contrast, the implicitly learned structural embeddings of GNNs are tightly bound to the degree distribution and community structure of a specific graph; when the domain changes, the embedding distribution drifts, causing the decision boundary to fail. Precisely because of these properties, DualGAD achieves strong cross-domain generalization across diverse graph data with large differences in node count and connection density, and its cross-domain generalization performance outperforms existing methods that rely on implicit structure learning.
Structural Embedding Generation. Then we map the four extracted structural features to a low-dimensional space through a lightweight two-layer MLP encoder with ReLU activation, and finally output the encoded structural feature embeddings. The encoder adopts a fully connected architecture to transform handcrafted structural statistics into learnable low-dimensional embeddings. The final structural feature embedding is represented as $H_2$:
$$H_2 = \mathrm{Encoder}\left([f_1, f_2, f_3, f_4]\right).$$
The two-layer design balances representation capacity and computational efficiency, while ReLU introduces necessary non-linearity without extra overhead. Through this design, we systematically integrate the structural prior knowledge of traditional graph theory methods and achieve end-to-end learnable representations through a deep learning framework.
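A forward pass of such an encoder might look like the sketch below. The hidden width of 16, the output width of 8, the random weights, and the absence of an activation after the second layer are all assumptions; the paper specifies only a two-layer MLP with ReLU.

```python
import numpy as np

def structural_encoder(F, W1, b1, W2, b2):
    """Two fully connected layers with a ReLU in between."""
    hidden = np.maximum(F @ W1 + b1, 0.0)   # layer 1 + ReLU
    return hidden @ W2 + b2                 # layer 2 (assumption: no output activation)

rng = np.random.default_rng(0)
F = rng.uniform(0.0, 1.0, size=(6, 4))      # six nodes, columns f1..f4
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 8)), np.zeros(8)
H2 = structural_encoder(F, W1, b1, W2, b2)
```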

4.4. Dual-Encoder Fusion Mechanism

To effectively integrate the complementary information from the two encoders, we further design a lightweight dual-encoder fusion mechanism. Because the outputs of the two encoders differ in dimension, we first map the structural feature embeddings to a unified representation space through a linear projection layer. The projected structural representation $H_3$ is defined as:
$$H_3 = H_2 W_{\mathrm{proj}} + b_{\mathrm{proj}},$$
where $W_{\mathrm{proj}}$ and $b_{\mathrm{proj}}$ are the projection layer parameters. The final representation adopts an “attribute-dominant, structure-complementary” strategy to combine the embedding representations of the two encoders, achieving collaborative modeling. The final output node embedding is defined as follows:
$$H_{\mathrm{final}} = \alpha \cdot H_1 + (1 - \alpha) \cdot H_3,$$
where $\alpha \in [0, 1]$ is the weight coefficient for dual-encoder fusion, which balances the contributions of attribute feature embedding $H_1$ and structural feature embedding $H_3$. A larger $\alpha$ indicates a stronger dominance of attribute features. According to the subsequent parameter sensitivity analysis in our paper, we select the weight $\alpha = 0.7$ in our model, which not only reflects our “attribute-dominant, structure-complementary” strategy but also maintains excellent performance.
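The projection and fixed-weight fusion amount to two lines of linear algebra, sketched below. The function name `fuse` and the toy shapes and values are illustrative assumptions; the default weight of 0.7 is the value reported in the text.

```python
import numpy as np

def fuse(H1, H2, W_proj, b_proj, alpha=0.7):
    """Project structural embeddings to H1's width, then blend with a fixed weight."""
    H3 = H2 @ W_proj + b_proj                  # linear projection of H2
    return alpha * H1 + (1.0 - alpha) * H3     # attribute-dominant combination

H1 = np.ones((3, 6))                           # toy attribute embeddings
H2 = np.ones((3, 2))                           # toy structural embeddings
W_proj, b_proj = np.zeros((2, 6)), np.full(6, 2.0)
H_final = fuse(H1, H2, W_proj, b_proj)         # 0.7 * 1 + 0.3 * 2 = 1.3 everywhere
```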
Through this fusion strategy, our method can simultaneously identify two types of anomalies: contextual (attribute) anomalies and structural anomalies. For contextual anomalies, whose attribute features deviate from normal distributions while their connection patterns remain normal, the attribute encoder plays the dominant identification role; for structural anomalies, which have abnormal connection patterns but normal attribute features, the structural encoder provides crucial complementary information [3]. Unlike attention fusion, gated fusion, or other mechanisms that introduce extra learnable parameters for dynamic weight adjustment, our fixed-weight combination introduces no learnable fusion weights and therefore no additional training overhead. This not only reduces model complexity but also yields lower performance fluctuation across datasets than dynamic fusion strategies, making it more suitable for few-shot cross-domain generalization. As verified by the comparison experiments in Section 5.2.2, this design yields superior detection performance while maintaining high efficiency. Thus, we adopt fixed-weight fusion as the default strategy in DualGAD.

4.5. Few-Shot Graph Attention Detection

Attention Mechanism Design. The core innovation of this module lies in extending the GAT attention mechanism from intra-graph neighborhoods to cross-domain support set-query set scenarios, achieving anomaly detection through attention reconstruction with few normal samples. Specifically, we partition the fused node embedding matrix by index into two parts: support node embeddings $H_s \in \mathbb{R}^{n_s \times d_e}$ and query node embeddings $H_q \in \mathbb{R}^{n_q \times d_e}$, where $n_s$ denotes the number of nodes in the support set, $n_q$ denotes the number of nodes in the query set, and $d_e$ represents the dimension of node embedding features. The core idea is to leverage normal nodes in the support set to reconstruct query set nodes, where normal nodes can be well reconstructed by support set samples, while anomalous nodes are difficult to reconstruct.
To measure the correlation between each query node and each node in the support set, we adopt a GAT-style parameterized attention mechanism. Here, $W$ is the learnable linear projection weight matrix in the attention mechanism, used to transform node embeddings; $a$ is the learnable attention weight vector, and $a^{T}$ denotes its transpose. For query node $v_i^{q}$ and support node $v_j^{s}$, the attention score is computed as:
$$e_{ij} = \mathrm{LeakyReLU}\left(a^{T}\left[W H_{q_i} \,\Vert\, W H_{s_j}\right]\right),$$
where $\Vert$ denotes vector concatenation. To obtain an effective attention weight distribution, we apply softmax normalization to the computed attention scores, ensuring that all support node weights corresponding to each query node sum to 1. Finally, we utilize the computed attention weights to perform a weighted combination of support set node features, thereby reconstructing the representation of query nodes. The query node reconstruction formula is as follows:
$$\tilde{H}_{q_i} = \sum_{j=1}^{n_s} \mathrm{softmax}(e_{ij})\, H_{s_j}.$$
Anomaly Score Calculation. Given query nodes, we calculate the Euclidean distance between the original input and the reconstructed representation to quantify the anomaly degree. The anomaly score function is defined as:
$$f(v_i) = \left\lVert \tilde{H}_{q_i} - H_{q_i} \right\rVert_2 = \sqrt{\sum_{j=1}^{d_e} \left(\tilde{H}_{q_i, j} - H_{q_i, j}\right)^2},$$
where the second subscript $j$ indexes the $d_e$ embedding dimensions.
Based on the above definition, the attention weights are non-negative and sum to one (due to softmax), so the reconstructed embedding is a convex combination of the support node embeddings. Normal query nodes, being similar to the support set, lie within or near the convex hull of the support embeddings and therefore yield a small reconstruction error. Anomalous nodes deviate from the normal distribution and fall outside the hull, resulting in a larger reconstruction error. This gap in reconstruction error is what makes anomalies separable. Because the original and reconstructed embeddings reside in the same representation space, the reconstruction distance directly reflects the degree to which a node deviates from normal patterns: the higher the anomaly score, the more likely the node is to be anomalous.
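The reconstruction-based scoring described above can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the trained model: the projection `W` is set to the identity, the attention vector `a` and all embeddings are hypothetical values, and the LeakyReLU slope of 0.2 is an assumed default.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, h):
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def attention_weights(h_q, H_s, W, a):
    """Softmax-normalised GAT-style scores of one query against every support node."""
    wq = matvec(W, h_q)
    scores = []
    for h_s in H_s:
        concat = wq + matvec(W, h_s)  # list concatenation: [W h_q || W h_s]
        scores.append(leaky_relu(sum(ai * ci for ai, ci in zip(a, concat))))
    return softmax(scores)


def reconstruct(h_q, H_s, W, a):
    """Convex combination of support embeddings weighted by attention."""
    alpha = attention_weights(h_q, H_s, W, a)
    dim = len(H_s[0])
    return [sum(alpha[j] * H_s[j][k] for j in range(len(H_s))) for k in range(dim)]


def anomaly_score(h_q, H_s, W, a):
    """Euclidean distance between a query embedding and its reconstruction."""
    h_rec = reconstruct(h_q, H_s, W, a)
    return math.sqrt(sum((r - q) ** 2 for r, q in zip(h_rec, h_q)))


# Hypothetical toy data: three "normal" support embeddings in 2-D.
H_s = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]   # identity projection (assumption)
a = [0.5, -0.5, 0.5, -0.5]     # arbitrary attention vector (assumption)

score_normal = anomaly_score([0.3, 0.3], H_s, W, a)      # inside the support hull
score_anomalous = anomaly_score([5.0, 5.0], H_s, W, a)   # far outside the hull
```

Because every reconstruction stays inside the triangle spanned by the three support points, the query at (5, 5) cannot be reconstructed closely whatever the attention weights are, whereas the query at (0.3, 0.3) always can.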

4.6. Loss Function and Training Strategy

Contrastive Learning Loss. We employ a contrastive learning paradigm to weight query nodes through normal nodes in the support set. The training objective is to make normal nodes obtain high weighted similarity, while anomalous nodes obtain low weighted similarity. Given a query node $v_i$ with its corresponding embedding $H_q^{i}$, weighted embedding $\tilde{H}_q^{i}$, and ground truth label $y_i$, the sample-level loss function can be written as:
$$\mathcal{L} = \begin{cases} 1 - \cos\left( H_q^{i}, \tilde{H}_q^{i} \right) & \text{if } y_i = 0 \\ \max\left( 0, \cos\left( H_q^{i}, \tilde{H}_q^{i} \right) - \varepsilon \right) & \text{if } y_i = 1 \end{cases},$$
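A minimal sketch of this sample-level loss follows, with label 0 for normal nodes and 1 for anomalous nodes as in the formula. The margin value `eps=0.5` and the function names are assumptions chosen for illustration; the text does not state the margin used in the experiments.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sample_loss(h_q, h_rec, y, eps=0.5):
    """Pull normal nodes (y = 0) toward their reconstruction; push anomalous
    nodes (y = 1) away until the similarity falls below the margin eps."""
    sim = cosine(h_q, h_rec)
    if y == 0:
        return 1.0 - sim
    return max(0.0, sim - eps)
```

For a normal node the loss vanishes only when embedding and reconstruction point in the same direction, while an anomalous node stops contributing once its similarity drops below the margin.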
Test-time Inference. During test-time inference, following the standard few-shot setting, we randomly sample a small number of normal nodes from the target domain graph $G \in D_{test}$ as the support set $H_s$ (10 normal nodes in all experiments). The support set is drawn entirely from the unseen target domain rather than from the training domains, and it provides the normal-pattern prior for test-time detection. The node sampling strategy randomly samples a portion of normal nodes as positive query samples and an equal number of anomalous nodes as negative query samples. Specifically, given the normal-node prompts of the target graph at test time, we first use the prompted normal nodes as the support set and treat all remaining nodes as the query set. We then compute the anomaly score of each query node through the support-set-guided attention mechanism, with all model parameters frozen and no fine-tuning on the target domain, and finally perform anomaly detection by ranking the anomaly scores. The entire test-time inference process is formally summarized in Algorithm 1.
Algorithm 1 The inference algorithm of DualGAD
Require: Test dataset $D$ with few-shot normal nodes $\{v_s^{1}, \ldots, v_s^{n_s}\}$
Ensure: Well-trained model weight parameters
  1: Align features in $G$ via Equation (1)
  2: Obtain $X$, $E$, $V$ from $G$
  3: for $l = 1$ to $K$ do
  4:     $Z^{[l]} \leftarrow$ Propagate and transform $X = X^{[0]}$ via Equation (3)
  5:     $R^{[l]} \leftarrow$ Calculate residual of $Z^{[l]}$ via Equation (4)
  6: end for
  7: $H_1 \leftarrow$ Concatenate $[R^{[1]} \,\|\, \cdots \,\|\, R^{[K]}]$ via Equation (5)
  8: $f(i) \leftarrow$ Extract structural features via Equation (6)
  9: $H_2 \leftarrow$ Encode structural features via Equation (10)
 10: $H_{final} \leftarrow$ Fuse $H_1$ and $H_2$ via Equation (12)
 11: $H_q, H_s \leftarrow$ Separate query and support node sets and indexing from $H_{final}$
 12: $\tilde{H}_q \leftarrow$ Calculate cross attention from $H_q$, $H_s$ via Equation (13)
 13: $d \leftarrow$ Compute the L2 distance between $\tilde{H}_q$ and $H_q$ via Equation (15)
 14: return $d$ as the anomaly scores $f(\cdot)$ for query nodes

5. Experiments

5.1. Experimental Setup

Datasets. We evaluate our proposed framework using ten real-world datasets spanning several domains: citation networks (PubMed [25], Cora [26], CiteSeer [27], ACM [28]), social networks (Flickr [29], Facebook [30], Weibo [31], Reddit [31], Questions [32]), and co-review networks (Amazon [33]). Among them, Reddit and Questions are specifically adopted for comparative experiments to further validate the generalization of our method. Table 1 summarizes their key statistics. Specifically, the datasets vary widely in node count (from 1081 to 48,921), edge density (sparse citation graphs vs. dense social networks), and feature dimensionality (from 25 to 12,047), ensuring diverse evaluation conditions. They also differ in graph types (undirected, directed, and heterogeneous structures) and attribute types (textual, behavioral, and statistical features), covering comprehensive heterogeneity for cross-domain evaluation. Two datasets (PubMed and Flickr) are used for training, and the remaining eight are used to test cross-domain generalization performance. A brief description of each dataset is as follows:
  • PubMed [25]: A citation network of diabetes-related medical papers. Nodes represent papers, and edges denote citation relationships. Anomalies are papers with unusual citation patterns that deviate from typical research field norms.
  • Cora [26]: A citation network of machine learning papers from multiple categories (e.g., neural networks, probabilistic methods). Nodes are papers with bag-of-words features. Anomalies are papers whose citation patterns deviate from their category norms.
  • CiteSeer [27]: A citation network of computer science papers covering fields such as AI, databases, and IR. Nodes are papers with bag-of-words features. Anomalies are papers with abnormal cross-domain citations or unconventional patterns.
  • ACM [28]: An academic network derived from the ACM Digital Library. Nodes include papers and authors, with edges representing citations or authorship. Anomalies are authors with unusual publication patterns or papers with abnormal citation behaviors.
  • Flickr [29]: A photo-sharing social network where nodes are users and edges indicate following relationships. User features include tags, group memberships, and text content. Anomalies are spam accounts or users with abnormal behavioral patterns.
  • Facebook [30]: A social network where nodes are users and edges denote friendships. User features include basic profile information and activity statistics. Anomalies are users who artificially construct connection clusters or whose attributes deviate from those of their neighbors.
  • Weibo [31]: A microblog social network from Tencent Weibo. Nodes are users, edges indicate following relationships, and features include location and bag-of-words. Anomalies are suspicious users who post continuously within time windows.
  • Reddit [31]: A network of forum posts from Reddit. Nodes are users, and edges represent interactions. Banned users are labeled as anomalies. Node features are vectors derived from the text of users' posts.
  • Questions [32]: A question-answering network from Yandex Q. Nodes are users, and edges indicate Q&A interactions within a one-year timeframe. Anomalies are users with abnormal interaction patterns. Node features are derived from FastText embeddings of user descriptions.
  • Amazon [33]: A product co-purchase network from Amazon’s musical instruments category. Nodes are products/users, and edges represent co-purchases or reviews. Anomalies are fraudulent products or review users identified by helpful vote ratios.
Baseline Methods. We compare our proposed framework against 8 state-of-the-art baseline methods, categorized into three classes: supervised graph anomaly detection (GCN [10], GAT [11], BWGNN [12]), unsupervised graph anomaly detection (DOMINANT [13], CoLA [14], HCM-A [15]), and generalist graph anomaly detection (ARC [16], UNPrompt [17]).
  • GCN [10]: A classic graph convolutional model that aggregates neighbor features to learn node representations, widely used as a backbone for anomaly detection.
  • GAT [11]: Introduces an attention mechanism into graph convolution to adaptively weight neighbors, enhancing the discriminative capacity of node embeddings.
  • BWGNN [12]: Uses band-pass filters in spatial and spectral domains to alleviate the right-shift issue and capture both high- and low-frequency graph signals.
  • DOMINANT [13]: An unsupervised model that reconstructs both topology and attributes, identifying anomalies via reconstruction errors.
  • CoLA [14]: Performs unsupervised anomaly detection by contrastive learning on local neighborhood information.
  • HCM-A [15]: Uses hypergraph and multi-view contrastive learning to capture high-order relationships for robust anomaly detection.
  • ARC [16]: A few-shot generalist GAD model that uses residual graph encoders and cross-attention for cross-domain transfer.
  • UNPrompt [17]: A zero-shot generalist method that learns universal neighborhood prompts without fine-tuning on target domains.
Evaluation Metrics. We employ AUROC (Area Under the ROC Curve) and AUPRC (Area Under the Precision–Recall Curve) as evaluation metrics. These two metrics are widely used in anomaly detection tasks and can comprehensively evaluate model performance.
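For reference, both metrics can be computed from anomaly scores and binary labels with a few lines of plain Python. The sketch below uses the pairwise-ranking form of AUROC and estimates AUPRC by average precision; the choice of average precision as the AUPRC estimator is an assumption, since the text does not specify how the curve is integrated, and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """Probability that a random anomaly outranks a random normal node (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each true anomaly.
    Tied scores are ranked in input order (a simplification)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive label")
    return precision_sum / hits


# Hypothetical toy example: label 1 marks an anomaly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = auroc(labels, scores)              # 0.75
toy_auprc = average_precision(labels, scores)  # (1/1 + 2/3) / 2, about 0.833
```

AUROC is insensitive to the anomaly ratio, whereas average precision drops quickly when a few normal nodes receive high scores, which is why the two are usually reported together on imbalanced data.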
Hyper-parameters. We conduct empirical grid search, and all hyper-parameters are chosen from the following fixed candidate sets. Given the cross-domain few-shot setting, where no labeled validation data is available for the target domains, we adopt a cross-domain generalization-oriented tuning strategy: the optimal hyper-parameter combination is selected based on the average AUROC performance across all target test datasets. This setting simulates a practical scenario where hyper-parameters are fixed once and applied to unseen domains. The same set of hyper-parameters is used consistently across all datasets for fair comparison and reproducibility.
  • Attribute-structure fusion weight: $\{0.0, 0.2, 0.4, 0.6, 0.8, 1.0\}$;
  • Number of propagation hops: $\{1, 2, 3, 4\}$;
  • Hidden feature dimension: $\{64, 128, 256, 512, 1024\}$;
  • Learning rate: $\{10^{-5}, 5 \times 10^{-5}, 10^{-4}, 5 \times 10^{-4}\}$;
  • Number of network layers: $\{2, 3, 4, 5\}$;
  • Number of prompt nodes: $\{5, 10, 15, 20\}$;
  • Dropout rate: $\{0.0, 0.1, 0.2, 0.3\}$.
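To make the size of this search concrete, the candidate sets above can be enumerated as a Cartesian grid. The sketch below is illustrative only: the dictionary keys are hypothetical names, the learning rates are read as negative powers of ten, and `mean_target_auroc` stands in for whatever routine trains a model and returns its average AUROC over the target datasets.

```python
from itertools import product

# Candidate sets from the list above (key names are hypothetical).
search_space = {
    "fusion_weight": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "num_hops": [1, 2, 3, 4],
    "hidden_dim": [64, 128, 256, 512, 1024],
    "learning_rate": [1e-5, 5e-5, 1e-4, 5e-4],
    "num_layers": [2, 3, 4, 5],
    "num_prompt_nodes": [5, 10, 15, 20],
    "dropout": [0.0, 0.1, 0.2, 0.3],
}


def iter_configs(space):
    """Yield every hyper-parameter combination as a dict."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def select_best(space, mean_target_auroc):
    """Return the configuration with the highest average AUROC.

    `mean_target_auroc` is a placeholder callable: config dict -> float.
    """
    return max(iter_configs(space), key=mean_target_auroc)


num_configs = sum(1 for _ in iter_configs(search_space))  # 6*4*5*4*4*4*4 = 30720
```

An exhaustive sweep over all 30,720 combinations would be expensive, so in practice such grids are often searched one or two parameters at a time while the others are held fixed; the text does not say which variant was used.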
Experimental Environment. All experiments are conducted on a server equipped with an NVIDIA RTX 3090 GPU, using Python 3.8, PyTorch 2.1.2, and PyTorch Geometric 2.3.1 along with its dependencies (including torch-sparse, torch-cluster, and torch-spline-conv). For each method, we report the mean and standard deviation over 3 runs with different random seeds.

5.2. Experimental Results

5.2.1. Performance Comparison

Table 2 compares the anomaly detection performance of DualGAD with the baseline methods across six datasets. DualGAD outperforms all baselines on the majority of datasets, achieving an average AUROC improvement of 3.12% over ARC, the strongest generalist method, and shows clear advantages over both traditional supervised and unsupervised methods. Notably, although the model is trained on only two datasets (PubMed and Flickr), it performs well across the remaining six test datasets from different domains, particularly on academic network datasets. DualGAD achieves the best results on 4 datasets, validating the effectiveness and cross-domain adaptation capability of the dual-space representation learning framework.
To support these performance improvements with statistical evidence, we conduct a Wilcoxon signed-rank test on the six test datasets, comparing DualGAD with the baseline method ARC. As illustrated in Figure 3, the overall distribution of AUROC achieved by DualGAD is superior to that of ARC. The test results are $W = 21.0$, $p = 0.0156 < 0.05$, indicating that DualGAD is significantly better than ARC (marked with * in the figure). This statistical evidence supports the effectiveness of our proposed method.
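The reported statistic can be checked by hand. With six paired datasets, W = 21 equals 1 + 2 + ... + 6, the largest possible rank sum, which occurs only when every dataset favors the same method. Assuming an exact one-sided test with no ties or zero differences (the test variant is an assumption, as it is not stated in the text), the p-value follows from enumerating all sign patterns:

```python
from itertools import product


def exact_one_sided_p(n, w_plus):
    """P(W+ >= w_plus) under the null for n untied, non-zero paired differences.

    Under the null hypothesis each rank 1..n carries a positive sign with
    probability 1/2, so all 2**n sign patterns are equally likely.
    """
    ranks = range(1, n + 1)
    count = 0
    for signs in product((0, 1), repeat=n):
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus:
            count += 1
    return count / 2 ** n


p_value = exact_one_sided_p(n=6, w_plus=21)  # 1/64 = 0.015625
```

Only one of the 64 sign patterns reaches W = 21, so the value 0.0156 is also the smallest p-value this test can produce with six datasets.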
Having statistically validated the performance superiority of DualGAD, we next analyze its training dynamics to ensure stable optimization. To validate the convergence of the proposed method, we record the training loss across three independent trials with different random seeds (Figure 4). The training loss consistently decreases across all three trials and exhibits highly consistent trends, demonstrating the stability and reproducibility of the training process. The loss decreases rapidly in the first 30 epochs and gradually slows down, eventually plateauing after approximately 80 epochs, confirming that the optimization process converges. We select 65 epochs as the training duration based on the optimal cross-domain detection performance observed at this point. Further training beyond 65 epochs leads to marginal performance degradation on target domains due to overfitting to source domain distributions, which is a common phenomenon in cross-domain learning. Therefore, an early stopping strategy based on cross-domain validation performance is adopted to achieve the best trade-off between training sufficiency and cross-domain generalization.
In addition to convergence behavior, we evaluate the robustness of DualGAD under domain shift through a domain shift stability analysis. We construct three additional source domain combinations (PubMed + Questions, Flickr + Reddit, Questions + Reddit) with consistent hyperparameters and experimental settings. As shown in Table 3, the model remains stable across most target domains: AUROC variations for five out of six datasets (Cora, CiteSeer, ACM, Facebook, and Weibo) are within 5 points, with an overall average AUROC ranging from 80.31 to 85.39. This stability benefits from our normalized explicit structural features, which provide cross-domain invariance by capturing relative topological deviations rather than absolute values.
A notable exception is the Amazon dataset: performance drops markedly when training on structurally divergent source domains (e.g., Questions + Reddit, AUROC = 54.85). As a dense co-review network with complex multi-type relations, Amazon differs sharply from the sparse and simple structure of Questions + Reddit. In contrast, when the source domains include Flickr (e.g., Flickr + Reddit, AUROC = 77.49), performance recovers significantly, since Flickr is a dense, high-connectivity social network with structural patterns compatible with Amazon. These results further validate the stability and generalization ability of DualGAD under various domain shift conditions.

5.2.2. Parameter Sensitivity Analysis

In this section, we analyze the impact of two key hyperparameters: the fusion weight $\alpha$ and the number of layers. Figure 5 shows the sensitivity analysis results on 6 datasets (Cora, CiteSeer, ACM, Facebook, Weibo, and Amazon), which span citation networks, social networks, and co-review networks with distinct structural and attribute characteristics. These parameters respectively control the fusion ratio of the dual-encoders and the representation learning depth of the attribute feature encoder, directly affecting the model’s cross-domain generalization capability and detection accuracy.
Fusion Weight $\alpha$. The fusion weight $\alpha$ controls the fusion ratio of the dual-encoders. As illustrated in Figure 5a, the optimal $\alpha$ values cluster in the range [0.6, 0.8] for most datasets in terms of AUROC, which supports our “attribute-dominant, structure-complementary” design philosophy: attribute information (e.g., paper keywords in citation networks, user behavioral features) provides core semantic clues for anomaly detection, while structural features supplement it with topological constraints to reduce false positives. Consistent trends are observed for AUPRC in Figure 5b, further confirming the fusion strategy’s robustness across metrics. Notably, Facebook achieves peak performance at $\alpha = 0.0$, which stems from its unique data characteristics: Facebook users’ attribute features (e.g., basic profiles) are relatively homogeneous, making structural patterns (e.g., artificial friend clusters or isolated nodes with abnormal connection densities) more discriminative for identifying fake accounts. This exception highlights the adaptability of our fusion strategy to domain-specific properties.
To verify the performance of our fusion mechanism, we compare the fixed-weight fusion of the attribute and structure spaces in DualGAD with two typical dynamic fusion strategies: attention-based fusion and gated fusion. As shown in Table 4, DualGAD achieves the highest average AUROC of 85.39% and outperforms both dynamic variants on four out of six datasets. The Wilcoxon signed-rank test confirms these improvements are statistically significant ($p < 0.05$). Given its performance and its lightweight nature, which requires no additional training overhead, we adopt this fixed-weight fusion strategy as the default choice in DualGAD.
Number of Layers. The number of layers controls the representation learning depth of the model, and we vary the ’num_layers’ parameter within the range [2, 6]. Figure 5c demonstrates that optimal performance occurs at 2–3 layers across datasets, consistent with over-smoothing observations in GNNs. Shallow architectures can effectively capture local structural correlations while preserving discriminative features for anomaly detection, avoiding excessive feature mixing. In contrast, deeper networks (≥4 layers) show clear performance degradation, as node embeddings tend to converge and become indistinguishable between normal and anomalous nodes. A shallow configuration therefore offers a good balance between detection accuracy and computational efficiency, which is a useful guideline for practical deployment.

5.2.3. Ablation Study

To validate the effectiveness of each component in the DualGAD dual-encoder architecture, we design comprehensive ablation experiments. The experiments include three parts: major component ablation, structural feature ablation, and dimensionality reduction ablation, which respectively validate the contributions of the dual-encoder design, the four explicit structural features, and the dimensionality reduction module for cross-domain alignment.
Major Component Ablation. We design three variants: w/o Attr (removing the attribute feature encoder), w/o Struct (removing the explicit structural feature encoder), and the complete DualGAD framework. As shown in Table 5, removing the attribute feature encoder leads to a significant average AUROC drop of 23.52% (from 85.39% to 61.87%), confirming that attribute information provides core semantic clues for identifying contextual anomalies (e.g., abnormal content in citation papers or deviant user behaviors). In contrast, removing the explicit structural feature encoder results in a moderate average drop of 5.23% (from 85.39% to 80.16%), indicating that structural features play a critical complementary role: they compensate for GNNs’ limitations in capturing global topological patterns and enhance detection of structural anomalies (e.g., artificial connection clusters in social networks). These results validate the necessity of the dual-encoder design: attribute and structural information are mutually reinforcing, and their collaborative modeling is key to achieving strong cross-domain generalization. To further verify that the ablation conclusions are not sensitive to the number of few-shot normal samples, we repeated the ablation experiments under different support set sizes (shot = 2, 5, 10, 20). The results in Table 6 consistently show the same trend across all shot sizes, confirming the robustness of our findings.
Structural Feature Ablation. To evaluate the individual contributions of the four explicit structural features (degree discreteness, triangle density, neighborhood overlap, in–out degree ratio), we conduct ablation experiments by removing each feature individually. The results in Table 7 show that removing any single structural feature leads to consistent performance degradation, demonstrating that all four features make positive and non-redundant contributions. Degree discreteness captures global degree distribution anomalies, triangle density reflects local connection tightness, neighborhood overlap measures neighbor similarity, and in–out degree ratio targets directed graph-specific anomalies (e.g., one-way following in social networks). Specifically, these four features are standard graph-theoretic metrics [34,35], and their complementarity arises from targeting distinct structural dimensions—spanning global vs. local topology and undirected vs. directed properties—which ensures no redundant information overlap. Our ablation results empirically confirm their complementary roles in anomaly detection.
The minimal performance gap between different single-feature ablation variants further validates the good complementarity of the designed structural features, which collectively cover diverse topological characteristics and adapt to different types of graph datasets.
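To give a feel for the four kinds of structural signal discussed above, the sketch below computes textbook versions of them on a small directed edge list. These are common graph-theoretic stand-ins chosen for illustration (degree z-score, local clustering coefficient, mean Jaccard overlap with neighbors, add-one-smoothed in/out ratio); they are assumptions and need not match the exact definitions or normalization used by DualGAD.

```python
import math
from itertools import combinations


def structural_features(num_nodes, directed_edges):
    """Return (degree_dev, triangle_density, overlap, in_out) per node."""
    out_deg = [0] * num_nodes
    in_deg = [0] * num_nodes
    nbrs = [set() for _ in range(num_nodes)]  # undirected neighbourhoods
    for u, v in directed_edges:
        out_deg[u] += 1
        in_deg[v] += 1
        nbrs[u].add(v)
        nbrs[v].add(u)

    deg = [len(n) for n in nbrs]
    mean = sum(deg) / num_nodes
    # Population standard deviation; fall back to 1.0 for regular graphs.
    std = math.sqrt(sum((d - mean) ** 2 for d in deg) / num_nodes) or 1.0

    feats = []
    for v in range(num_nodes):
        # 1) Degree deviation: z-score against the global degree distribution.
        degree_dev = (deg[v] - mean) / std
        # 2) Triangle density: fraction of neighbour pairs that are connected.
        pairs = deg[v] * (deg[v] - 1) / 2
        links = sum(1 for a, b in combinations(nbrs[v], 2) if b in nbrs[a])
        triangle_density = links / pairs if pairs else 0.0
        # 3) Neighbourhood overlap: mean Jaccard similarity with each neighbour.
        jac = [len(nbrs[v] & nbrs[u]) / len(nbrs[v] | nbrs[u]) for u in nbrs[v]]
        overlap = sum(jac) / len(jac) if jac else 0.0
        # 4) In-out degree ratio with add-one smoothing to avoid division by zero.
        in_out = (in_deg[v] + 1) / (out_deg[v] + 1)
        feats.append((degree_dev, triangle_density, overlap, in_out))
    return feats


# Hypothetical toy graph: a directed triangle 0 -> 1 -> 2 -> 0 plus a pendant edge 0 -> 3.
features = structural_features(4, [(0, 1), (1, 2), (2, 0), (0, 3)])
```

In this toy graph the pendant node 3 has zero triangle density and zero overlap but the highest in-out ratio, while the triangle nodes 1 and 2 show the opposite pattern, which illustrates how the four signals can point at different nodes.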
Dimensionality Reduction Ablation. To verify the effectiveness of PCA in cross-domain feature alignment, we compare it with two typical nonlinear dimensionality reduction methods, t-SNE and UMAP. As shown in Table 8, PCA significantly outperforms both t-SNE and UMAP across all datasets. UMAP leads to severe performance degradation since it overfits the source domain distribution and breaks the cross-domain consistency. t-SNE restricts the embedding dimension to 2D, resulting in heavy information loss. In contrast, PCA provides a stable, domain-agnostic projection that well balances efficiency and cross-domain generalization, which is why we choose it as the default dimensionality reduction module. Furthermore, these nonlinear methods introduce extremely high computational and memory overhead during feature alignment, which violates the efficiency and scalability requirements of generalist graph anomaly detection systems. In comparison, PCA is lightweight and computationally efficient, making it well-suited for processing large-scale and diverse graph datasets in our cross-domain setting.
Despite its advantages, we acknowledge that PCA, as a linear method, inherently assumes linear relationships between features and may fail to capture complex, high-order attribute interactions in heterogeneous and high-dimensional graph data. This limitation could restrict its ability to model fine-grained patterns that are critical for anomaly detection, such as non-linear correlations between node attributes and structural roles. While more expressive nonlinear representation techniques (e.g., t-SNE, UMAP) have the potential to capture these complex interactions and improve domain adaptation, our empirical results show that they introduce severe overfitting to the source domain, degrade cross-domain generalization, and significantly increase computational cost. Therefore, exploring lightweight, domain-invariant nonlinear alignment methods remains an important future direction to further enhance the transferability of our framework.

5.3. Complexity Analysis

Theoretical Complexity Analysis. Let $N$ and $E$ denote the number of nodes and edges, $d$ the input feature dimension, $h$ the hidden dimension, $k$ the propagation depth, $L$ the number of MLP layers, $n_q$ the number of query nodes, and $n_s$ the number of support/prompt nodes. The overall time complexity of our method is dominated by $k$-hop sparse feature propagation and query-support cross-attention, yielding $O(kEd + Ndh + NLh^2 + E + N + n_q n_s h)$ operations. The four explicit structural features can be computed with only $O(E + N)$ complexity, and the two-layer structural encoder adds $O(Nh)$, which is negligible compared with the main terms. For sparse graphs where $E \ll N^2$, our method scales nearly linearly with the number of edges. The space complexity is dominated by storing node representations and the sparse graph structure, resulting in $O(Nd + kNd + E + n_q n_s)$ memory usage.
Efficiency Analysis. To assess the runtime efficiency of the proposed DualGAD, we compare the training and inference time on the target datasets with the same training epochs and on the identical training set. As shown in Table 9, DualGAD demonstrates comparable runtime performance with the fastest method ARC, and significantly outperforms the generalist GAD method UNPrompt and the traditional unsupervised method AnomalyDAE, as well as GAT and BWGNN in terms of efficiency. Additionally, although DualGAD incorporates a dual-encoder architecture and an explicit structural feature extraction module, it still maintains extremely high computational efficiency, which is on par with the high-performance method ARC.
Furthermore, the peak GPU memory usage of our model during training and inference is 3315.45 MB, and the peak CPU memory usage is 1291.12 MB. These results sufficiently demonstrate that while achieving improved cross-domain generalization performance, DualGAD does not introduce significant computational and memory overhead, thus exhibiting an extremely low hardware requirement and great potential for practical deployment.

5.4. Visualization

In this section, we conduct two types of visualization experiments to intuitively demonstrate the effectiveness and interpretability of our model. Specifically, we perform t-SNE embedding visualization on the Cora dataset and SHAP-based interpretability analysis on the Weibo dataset. We select these two representative datasets instead of all datasets for the following reasons: on the one hand, visualizing all datasets would lead to repetition and redundancy without providing additional information; on the other hand, these two datasets cover the main characteristics of heterogeneous graph data and sufficiently verify the generalization and interpretability of our model.
To interpret the learned structural embeddings, we perform t-SNE visualization analysis on the Cora dataset, selected from our six test datasets for two key reasons: first, Cora is the most widely used standard benchmark in graph anomaly detection, ensuring high representativeness of the results. Second, it exhibits the largest distribution gap with our training datasets (PubMed and Flickr) in both node attribute semantics and graph topological patterns, which best demonstrates the cross-domain generalization of our framework. As shown in Figure 6, the three embeddings present distinct separation behaviors. The attribute-only embedding achieves only limited separability, with most anomalous nodes scattered among the clusters of normal nodes. The explicit structural embedding alone shows almost no discriminative power, with a silhouette score of only 0.060, which verifies its auxiliary role in our framework. In contrast, the fused embedding from our dual-encoder framework achieves a silhouette score of 0.264, where anomalous nodes form a clear, independent cluster with a sharp boundary from normal nodes. This demonstrates that our explicit structural modeling provides critical complementary information to attribute features, significantly enhancing the model’s ability to distinguish anomalous nodes from normal ones in cross-domain scenarios.
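For readers unfamiliar with the silhouette score quoted above, the following sketch computes it from scratch for labeled points. It uses Euclidean distance and the standard per-point definition (b - a) / max(a, b); the toy one-dimensional points are hypothetical, and whether the paper computes the score on the t-SNE coordinates or on the original embeddings is not stated, so no attempt is made to reproduce the reported values.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points; singleton clusters contribute 0."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")

    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


# Hypothetical 1-D embeddings: two well-separated groups versus interleaved labels.
points = [[0.0], [1.0], [10.0], [11.0]]
separated = silhouette_score(points, [0, 0, 1, 1])    # close to 0.9
interleaved = silhouette_score(points, [0, 1, 0, 1])  # negative
```

Scores near 1 indicate compact, well-separated groups, values near 0 indicate overlapping groups, and negative values indicate points that sit closer to the other group than to their own.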
We further conduct a SHAP-based analysis on the Weibo dataset under the final model setting ($\alpha = 0.7$), where the model is attribute-dominant and the structural branch serves as a complementary component. We select Weibo because the in–out degree ratio, one of our structural features, is defined for directed graphs, and Weibo meets this requirement. Because the structural branch is only complementary, the absolute SHAP values of the structural features are relatively small. As shown in Figure 7, the SHAP beeswarm plot provides a direct explanation of how structural deviations affect anomaly scores. In particular, higher Neighborhood Overlap values tend to produce positive SHAP contributions, and several nodes exhibit notably large positive impacts, indicating that abnormal neighborhood interaction patterns strongly increase anomaly scores. Triangle Density also shows positive contributions for a subset of nodes, suggesting that deviations in local clustering structure provide additional evidence for anomaly detection. In contrast, Degree Dispersion exhibits both positive and negative SHAP values, implying that its effect depends on how a node deviates from the global degree distribution.
The contribution of InOutDegreeRatio is relatively concentrated around zero, showing that direction-related degree imbalance is less influential than neighborhood inconsistency in Weibo. These observations demonstrate that the explicit structural branch offers interpretable evidence of how different structural deviations contribute to anomaly scoring, thereby enhancing the transparency of anomaly detection.

6. Conclusions

The DualGAD method proposed in this paper effectively addresses the limitations of structural modeling in generalist graph anomaly detection. Through dual-encoder architecture design and an “attribute-dominant, structure-complementary” fusion strategy, this method achieves an average improvement of 3.12% compared to the strongest baseline methods on 8 real datasets, demonstrating excellent cross-domain generalization capability. Ablation experiments validate the effectiveness of each component. Despite significant progress, the method’s performance on extremely imbalanced data and large-scale graphs still requires further optimization, as the normal-prior contrastive learning loss is prone to being dominated by majority normal samples for imbalanced data, and explicit structural feature extraction brings non-negligible computational overhead and degraded pattern representativeness for large-scale sparse graphs. Future work will further explore the application potential of this framework on more complex network structures and focus on optimizing the model for imbalanced data and large-scale graph scenarios.

Author Contributions

Conceptualization, S.M.; methodology, S.M.; software, S.M.; validation, S.M.; formal analysis, S.M.; investigation, S.M.; resources, J.L. (Jizhao Liu); data curation, S.M.; writing—original draft preparation, S.M.; writing—review and editing, J.L. (Jizhao Liu), S.Z., F.S. and J.L. (Jun Li); visualization, S.M.; supervision, J.L. (Jizhao Liu); project administration, J.L. (Jizhao Liu); funding acquisition, J.L. (Jizhao Liu). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Key Backbone Teachers Support Program of Zhongyuan University of Technology under Project GG202417, and in part by the Key Research and Development Program of Henan under Grant 251111212000.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All datasets used in this study are publicly available.

Acknowledgments

The authors have reviewed and edited the output and take full responsibility for the content of this publication. The authors sincerely thank the editors and reviewers for their professional insights, which have greatly improved the technical depth and communicative clarity of this article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P.S. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
  2. Qiao, H.; Tong, H.; An, B.; King, I.; Aggarwal, C.; Pang, G. Deep graph anomaly detection: A survey and new perspectives. IEEE Trans. Knowl. Data Eng. 2025; in press.
  3. Bandyopadhyay, S.; N, L.; Vivek, S.V.; Murty, M.N. Outlier resistant unsupervised deep architectures for attributed network embedding. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; ACM: New York, NY, USA, 2020; pp. 25–33.
  4. Lin, Y.; Tang, J.; Zi, C.; Zhao, H.V.; Yao, Y.; Li, J. UniGAD: Unifying multi-level graph anomaly detection. Adv. Neural Inf. Process. Syst. 2024, 37, 136120–136148.
  5. Ma, X.; Wu, J.; Xue, S.; Yang, J.; Zhou, C.; Sheng, Q.Z.; Akoglu, L. A comprehensive survey on graph anomaly detection with deep learning. IEEE Trans. Knowl. Data Eng. 2021, 35, 12012–12038.
  6. Qiao, H.; Wen, Q.; Li, X.; Lim, E.P.; Pang, G. Generative semi-supervised graph anomaly detection. Adv. Neural Inf. Process. Syst. 2024, 37, 4660–4688.
  7. Ghani, R.; Senator, T.E.; Bradley, P.; Parekh, R.; He, J. (Eds.) Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; ACM: New York, NY, USA, 2013.
  8. Eberle, W.; Holder, L. Anomaly detection in data represented as graphs. Intell. Data Anal. 2007, 11, 663–689.
  9. Henderson, K.; Gallagher, B.; Eliassi-Rad, T.; Tong, H.; Basu, S.; Akoglu, L.; Li, L. RolX: Structural role extraction & mining in large graphs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; ACM: New York, NY, USA, 2012; pp. 1231–1239.
  10. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
  11. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
  12. Zhang, Y.; Pal, S.; Coates, M.; Ustebay, D. Bayesian graph convolutional neural networks for semi-supervised classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 5829–5836.
  13. Ding, K.; Li, J.; Bhanushali, R.; Liu, H. Deep anomaly detection on attributed networks. In Proceedings of the 2019 SIAM International Conference on Data Mining, Calgary, AB, Canada, 2–4 May 2019; SIAM: Philadelphia, PA, USA, 2019; pp. 594–602.
  14. Liu, Y.; Li, Z.; Pan, S.; Gong, C.; Zhou, C.; Karypis, G. Anomaly detection on attributed networks via contrastive self-supervised learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2378–2392.
  15. Huang, T.; Pei, Y.; Menkovski, V.; Pechenizkiy, M. Hop-count based self-supervised anomaly detection on attributed networks. In Proceedings of the Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Grenoble, France, 19–23 September 2022; Springer: Cham, Switzerland, 2022; pp. 225–241.
  16. Liu, Y.; Li, S.; Zheng, Y.; Chen, Q.; Zhang, C.; Pan, S. ARC: A generalist graph anomaly detector with in-context learning. Adv. Neural Inf. Process. Syst. 2024, 37, 50772–50804.
  17. Niu, C.; Qiao, H.; Chen, C.; Chen, L.; Pang, G. Zero-shot generalist graph anomaly detection with unified neighborhood prompts. arXiv 2024, arXiv:2410.14886.
  18. Hassanzadeh, R.; Nayak, R.; Stebila, D. Analyzing the effectiveness of graph metrics for anomaly detection in online social networks. In Proceedings of the International Conference on Web Information Systems Engineering, Paphos, Cyprus, 19–21 November 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 624–630.
  19. Bonacich, P. Some unique properties of eigenvector centrality. Soc. Netw. 2007, 29, 555–564.
  20. Fan, H.; Zhang, F.; Li, Z. AnomalyDAE: Dual autoencoder for anomaly detection on attributed networks. In Proceedings of the ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 4–8 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 5685–5689.
  21. Wang, J.; Guo, J.; Sun, Y.; Gao, J.; Wang, S.; Yang, Y.; Yin, B. DGNN: Decoupled graph neural networks with structural consistency between attribute and graph embedding representations. IEEE Trans. Big Data 2024, 11, 1813–1827.
  22. Tian, C.; Zhang, F.; Wang, R. Adversarial regularized attributed network embedding for graph anomaly detection. Pattern Recognit. Lett. 2024, 183, 111–116.
  23. Stewart, G.W. On the early history of the singular value decomposition. SIAM Rev. 1993, 35, 551–566.
  24. Abdi, H.; Williams, L.J. Principal component analysis. Wiley Interdiscip. Rev. Comput. Stat. 2010, 2, 433–459.
  25. Zhang, Z.; Hensley, C.; Chen, Z. Improving node classification with neural tangent kernel: A graph neural network approach. In Proceedings of the International Conference on Machine Learning, Pattern Recognition and Automation Engineering, Singapore, 7–9 August 2024; ACM: New York, NY, USA, 2024; pp. 93–97.
  26. Tang, L.; Liu, H. Relational learning via latent social dimensions. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Paris, France, 28 June–1 July 2009; ACM: New York, NY, USA, 2009; pp. 817–826.
  27. Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. AI Mag. 2008, 29, 93.
  28. Yuan, X.; Zhou, N.; Yu, S.; Huang, H.; Chen, Z.; Xia, F. Higher-order structure based anomaly detection on attributed networks. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2691–2700.
  29. Zhang, H.; Zhou, Y.; Xu, H.; Shi, J.; Lin, X.; Gao, Y. Graph neural network approach with spatial structure to anomaly detection of network data. J. Big Data 2025, 12, 105.
  30. Xu, Z.; Huang, X.; Zhao, Y.; Dong, Y.; Li, J. Contrastive attributed network anomaly detection with data augmentation. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Chengdu, China, 16–19 May 2022; Springer: Cham, Switzerland, 2022; pp. 444–457.
  31. Kumar, S.; Zhang, X.; Leskovec, J. Predicting dynamic embedding trajectory in temporal interaction networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; ACM: New York, NY, USA, 2019; pp. 1269–1278.
  32. Platonov, O.; Kuznedelev, D.; Diskin, M.; Babenko, A.; Prokhorenkova, L. A critical look at the evaluation of GNNs under heterophily: Are we really making progress? arXiv 2023, arXiv:2302.11640.
  33. Dou, Y.; Liu, Z.; Sun, L.; Deng, Y.; Peng, H.; Yu, P.S. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Galway, Ireland, 19–23 October 2020; ACM: New York, NY, USA, 2020; pp. 315–324.
  34. Durak, N.; Pinar, A.; Kolda, T.G.; Seshadhri, C. Degree relations of triangles in real-world networks and graph models. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management, Maui, HI, USA, 29 October–2 November 2012; ACM: New York, NY, USA, 2012; pp. 1712–1716.
  35. Meghanathan, N. A greedy algorithm for neighborhood overlap-based community detection. Algorithms 2016, 9, 8.

Figure 1. Illustration of two types of anomalies in attributed networks. (1) Structural anomalies: abnormal connection patterns but normal attributes; (2) contextual anomalies: abnormal attributes but normal connection patterns.
Figure 2. Overall architecture of the DualGAD framework, comprising four core modules: cross-domain feature alignment, attribute feature propagation space encoder, structural topology space encoder, and few-shot graph attention detection.
Figure 3. Boxplot comparison of AUROC performance between ARC and DualGAD (Ours) across six test datasets. The asterisk (*) indicates statistical significance (Wilcoxon signed-rank test, p < 0.05).
Figure 4. Training loss convergence. Average training loss versus epoch for three independent trials with different random seeds.
Figure 5. Impact of key parameters on model performance. (a) Impact of fusion weight α on AUROC. (b) Impact of fusion weight α on AUPRC. (c) Impact of network depth on AUROC performance.
Figure 6. t-SNE visualization of node embeddings on the Cora dataset. From left to right: attribute-only embedding (silhouette score: 0.173), where anomalous nodes are scattered among normal nodes; structural-only embedding (silhouette score: 0.060), with weak discriminative power; and our dual-encoder fused embedding (silhouette score: 0.264), where anomalous nodes form a clear and distinct cluster.
Figure 7. SHAP beeswarm plot of explicit structural features on the Weibo dataset. Each point represents a node, and the horizontal axis indicates the SHAP value, i.e., the contribution of a feature to the anomaly score. Positive SHAP values indicate increasing anomaly scores. Colors represent feature values, where red and blue denote high and low values, respectively.
Table 1. Key statistics of the real-world GAD datasets.

| Dataset | Type | Nodes | Edges | Features | Anomalies (Rate) |
|---|---|---|---|---|---|
| PubMed | Citation Networks | 19,717 | 44,338 | 500 | 600 (3.04%) |
| Flickr | Social Networks | 7575 | 239,738 | 12,047 | 450 (5.94%) |
| Cora | Citation Networks | 2708 | 5429 | 1433 | 150 (5.53%) |
| CiteSeer | Citation Networks | 3327 | 4732 | 3703 | 150 (4.50%) |
| ACM | Citation Networks | 16,484 | 71,980 | 8337 | 597 (3.62%) |
| Facebook | Social Networks | 1081 | 55,104 | 576 | 25 (2.31%) |
| Weibo | Social Networks | 8405 | 407,963 | 400 | 868 (10.30%) |
| Reddit | Social Networks | 10,984 | 168,016 | 64 | 366 (3.33%) |
| Questions | Social Networks | 48,921 | 153,540 | 301 | 1460 (2.98%) |
| Amazon | Co-review | 10,244 | 175,608 | 25 | 693 (6.76%) |

Table 2. Anomaly detection performance comparison in terms of AUROC and AUPRC (percentages). Best results are highlighted in bold.

| Metric | Category | Method | Cora | CiteSeer | ACM | Amazon | Facebook | Weibo |
|---|---|---|---|---|---|---|---|---|
| AUROC | Supervised Methods | GCN | 59.64 ± 8.30 | 60.27 ± 8.11 | 60.49 ± 9.65 | 46.63 ± 3.47 | 29.51 ± 4.86 | 76.64 ± 17.69 |
| AUROC | Supervised Methods | GAT | 50.06 ± 2.65 | 51.59 ± 3.49 | 48.79 ± 2.73 | 50.52 ± 17.22 | 51.88 ± 2.16 | 53.06 ± 7.48 |
| AUROC | Supervised Methods | BWGNN | 54.06 ± 3.27 | 52.61 ± 2.88 | 67.59 ± 0.70 | 55.26 ± 16.95 | 45.84 ± 4.97 | 53.38 ± 1.61 |
| AUROC | Unsupervised Methods | DOMINANT | 72.23 ± 0.34 | 74.69 ± 0.32 | 74.34 ± 0.12 | 59.06 ± 2.80 | 49.92 ± 0.55 | **92.21 ± 0.10** |
| AUROC | Unsupervised Methods | CoLA | 67.62 ± 4.26 | 70.75 ± 3.42 | 69.11 ± 0.67 | 52.51 ± 6.66 | 64.70 ± 18.86 | 31.55 ± 6.02 |
| AUROC | Unsupervised Methods | HCM-A | 56.45 ± 4.93 | 55.54 ± 4.07 | 57.69 ± 3.59 | 42.20 ± 0.55 | 36.57 ± 10.72 | 71.89 ± 2.79 |
| AUROC | General Methods | UNPrompt | 64.98 ± 0.35 | 71.78 ± 1.09 | 74.00 ± 0.15 | 79.35 ± 1.27 | **80.92 ± 0.85** | 88.68 ± 1.35 |
| AUROC | General Methods | ARC | 87.32 ± 0.79 | 90.74 ± 0.53 | 79.98 ± 0.24 | 78.98 ± 2.43 | 67.48 ± 0.37 | 89.13 ± 0.42 |
| AUROC | General Methods | DualGAD (Ours) | **93.72 ± 0.82** | **95.13 ± 0.09** | **83.36 ± 0.56** | **82.04 ± 1.28** | 68.50 ± 1.07 | 89.63 ± 0.26 |
| AUPRC | Supervised Methods | GCN | 7.41 ± 1.55 | 6.40 ± 1.40 | 5.27 ± 1.12 | 6.96 ± 2.04 | 1.59 ± 0.11 | 67.21 ± 15.20 |
| AUPRC | Supervised Methods | GAT | 6.49 ± 0.84 | 5.58 ± 0.62 | 4.70 ± 0.75 | 15.74 ± 17.85 | 3.14 ± 0.37 | 33.34 ± 9.80 |
| AUPRC | Supervised Methods | BWGNN | 7.25 ± 0.80 | 6.35 ± 0.73 | 7.14 ± 0.20 | 13.12 ± 11.82 | 2.54 ± 0.63 | 12.13 ± 0.71 |
| AUPRC | Unsupervised Methods | DOMINANT | 21.35 ± 0.74 | 23.02 ± 1.55 | 22.74 ± 0.95 | 7.48 ± 0.46 | 3.56 ± 0.15 | **77.69 ± 1.43** |
| AUPRC | Unsupervised Methods | CoLA | 13.91 ± 5.56 | 19.51 ± 3.73 | 8.48 ± 0.51 | 7.27 ± 1.13 | 15.19 ± 11.04 | 8.03 ± 1.19 |
| AUPRC | Unsupervised Methods | HCM-A | 6.41 ± 1.33 | 4.76 ± 0.51 | 4.41 ± 0.63 | 5.64 ± 0.09 | 2.23 ± 0.76 | 27.20 ± 5.53 |
| AUPRC | General Methods | UNPrompt | 12.00 ± 1.32 | 19.38 ± 2.01 | 20.50 ± 0.37 | 18.92 ± 0.76 | **17.85 ± 1.52** | 60.21 ± 1.37 |
| AUPRC | General Methods | ARC | 50.28 ± 1.23 | 46.35 ± 0.81 | **40.86 ± 1.14** | **28.10 ± 4.95** | 6.34 ± 0.28 | 65.45 ± 1.01 |
| AUPRC | General Methods | DualGAD (Ours) | **59.12 ± 4.13** | **56.45 ± 1.76** | 40.28 ± 1.29 | 26.61 ± 2.16 | 6.60 ± 1.03 | 66.38 ± 0.74 |

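
The headline average gain can be checked directly from the AUROC rows of Table 2. The short script below is only a worked-arithmetic sketch: the two lists are copied from the ARC and DualGAD (Ours) rows, and the variable names are illustrative rather than taken from the authors' code. It suggests that the reported 3.12 is an absolute difference in percentage points between the two six-dataset averages (3.125 before rounding), while the relative gain over ARC would be roughly 3.8%.

```python
# AUROC means (%) copied from the ARC and DualGAD (Ours) rows of Table 2.
DATASETS = ["Cora", "CiteSeer", "ACM", "Amazon", "Facebook", "Weibo"]
ARC = [87.32, 90.74, 79.98, 78.98, 67.48, 89.13]
DUALGAD = [93.72, 95.13, 83.36, 82.04, 68.50, 89.63]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


arc_avg = mean(ARC)
dualgad_avg = mean(DUALGAD)

# Absolute gap in percentage points, and the same gap relative to ARC.
gap_points = dualgad_avg - arc_avg
relative_gain = 100.0 * gap_points / arc_avg

# Per-dataset gaps show where the average gain comes from.
per_dataset_gap = {
    name: dualgad - arc for name, arc, dualgad in zip(DATASETS, ARC, DUALGAD)
}

print(f"ARC average AUROC:      {arc_avg:.4f}")
print(f"DualGAD average AUROC:  {dualgad_avg:.4f}")
print(f"Gap (percentage points): {gap_points:.4f}")
print(f"Gap relative to ARC (%): {relative_gain:.4f}")
for name, gap in per_dataset_gap.items():
    print(f"  {name:<9} {gap:+.2f}")
```

The per-dataset breakdown shows that the largest gains are on Cora and CiteSeer and the smallest on Facebook and Weibo.
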
Table 3. Performance comparison under different training source domain settings (AUROC, %).

| Training Source | Cora | CiteSeer | ACM | Amazon | Facebook | Weibo | Avg |
|---|---|---|---|---|---|---|---|
| PubMed + Flickr | 93.72 | 95.13 | 83.36 | 82.04 | 68.50 | 89.63 | 85.39 |
| PubMed + Questions | 95.57 | 95.56 | 79.39 | 62.32 | 73.19 | 89.21 | 82.54 |
| Flickr + Reddit | 85.34 | 90.74 | 85.65 | 77.49 | 62.04 | 89.12 | 81.73 |
| Questions + Reddit | 93.81 | 94.98 | 78.59 | 54.85 | 70.57 | 89.05 | 80.31 |

Table 4. Ablation study on fusion strategies (AUROC ± std).

| Fusion Method | Cora | CiteSeer | ACM | Amazon | Facebook | Weibo |
|---|---|---|---|---|---|---|
| Attention Fusion | 82.97 ± 0.93 | 85.12 ± 2.19 | 80.27 ± 0.87 | 85.59 ± 0.09 | 69.82 ± 1.79 | 72.84 ± 3.07 |
| Gated Fusion | 87.26 ± 1.12 | 89.47 ± 0.67 | 74.84 ± 0.51 | 71.28 ± 0.65 | 72.59 ± 1.11 | 88.53 ± 0.53 |
| DualGAD | 93.72 ± 0.82 | 95.13 ± 0.09 | 83.36 ± 0.56 | 82.04 ± 1.28 | 68.50 ± 1.07 | 89.63 ± 0.26 |

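
To make the idea of an “attribute-dominant, structure-complementary” fusion with a fusion weight α concrete, the sketch below shows one simple way such a strategy could be realized: a weighted sum in which the attribute embedding enters with weight 1 and the structural embedding with a smaller weight α. This is an assumption for illustration only. The exact fusion formula used by DualGAD is not reproduced in this part of the article, and the names `l2_normalize`, `fuse`, and the default `alpha=0.1` are hypothetical.

```python
def l2_normalize(vec):
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = sum(x * x for x in vec) ** 0.5
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(attr_emb, struct_emb, alpha=0.1):
    """Hypothetical attribute-dominant fusion: h = h_attr + alpha * h_struct.

    Both embeddings are L2-normalized first so that alpha alone controls
    how much the structural view contributes. This form is an assumption,
    not the formula from the paper.
    """
    if len(attr_emb) != len(struct_emb):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is expected to lie in [0, 1]")
    attr = l2_normalize(attr_emb)
    struct = l2_normalize(struct_emb)
    return [a + alpha * s for a, s in zip(attr, struct)]


# Toy two-dimensional embeddings for a single node.
attribute_only = fuse([3.0, 4.0], [0.0, 2.0], alpha=0.0)
fused = fuse([3.0, 4.0], [0.0, 2.0], alpha=0.5)
print(attribute_only)
print(fused)
```

With α = 0 the fused vector reduces to the normalized attribute embedding, so the structural view can only add to the attribute view in this formulation, never replace it.
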
Table 5. Ablation study results for main components (AUROC).

| Variant | Cora | CiteSeer | ACM | Facebook | Weibo | Amazon | Avg |
|---|---|---|---|---|---|---|---|
| w/o Attr | 74.15 | 75.56 | 63.82 | 48.93 | 43.33 | 65.41 | 61.87 |
| w/o Struct | 88.34 | 89.76 | 78.91 | 63.12 | 84.22 | 76.58 | 80.16 |
| DualGAD | 93.72 | 95.13 | 83.36 | 68.50 | 89.63 | 82.04 | 85.39 |

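Table 6. Ablation results under different support set sizes (shot). AUROC (%).

| Dataset | Ours (Shot = 2) | w/o Attr (Shot = 2) | w/o Struct (Shot = 2) | Ours (Shot = 5) | w/o Attr (Shot = 5) | w/o Struct (Shot = 5) | Ours (Shot = 10) | w/o Attr (Shot = 10) | w/o Struct (Shot = 10) | Ours (Shot = 20) | w/o Attr (Shot = 20) | w/o Struct (Shot = 20) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cora | 93.03 | 74.23 | 86.97 | 93.29 | 74.22 | 87.00 | 93.72 | 74.40 | 85.26 | 93.22 | 74.25 | 86.97 |
| CiteSeer | 93.34 | 75.19 | 91.83 | 93.37 | 75.18 | 91.90 | 95.13 | 75.16 | 91.30 | 93.36 | 75.14 | 91.95 |
| ACM | 76.88 | 72.72 | 79.84 | 76.88 | 72.74 | 79.90 | 83.36 | 72.49 | 78.98 | 76.89 | 72.74 | 79.92 |
| Facebook | 70.86 | 74.60 | 70.64 | 70.49 | 74.60 | 70.59 | 68.50 | 74.44 | 68.37 | 70.31 | 74.63 | 70.49 |
| Weibo | 88.46 | 42.60 | 87.95 | 88.61 | 42.52 | 88.19 | 89.63 | 43.33 | 88.92 | 88.62 | 42.61 | 88.32 |
| Amazon | 80.87 | 55.16 | 63.76 | 80.98 | 55.15 | 63.66 | 82.04 | 55.17 | 67.21 | 80.31 | 55.15 | 63.97 |
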
Table 7. Ablation study results for structural features (AUROC).

| Variant | Cora | CiteSeer | ACM | Facebook | Weibo | Amazon | Avg |
|---|---|---|---|---|---|---|---|
| w/o Degree | 93.21 | 94.58 | 82.87 | 67.92 | 89.15 | 81.52 | 84.87 |
| w/o Triangle | 93.45 | 94.89 | 83.01 | 68.11 | 89.32 | 81.78 | 85.09 |
| w/o Overlap | 93.18 | 94.67 | 82.94 | 68.23 | 89.28 | 81.69 | 84.99 |
| w/o InOut | 93.72 | 95.13 | 80.78 | 68.50 | 89.41 | 81.75 | 84.88 |
| DualGAD | 93.72 | 95.13 | 83.36 | 68.50 | 89.63 | 82.04 | 85.39 |

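
The feature families ablated in Table 7 (degree, triangle, overlap) are standard explicit graph statistics. The sketch below shows one way they could be computed for an undirected graph and then expressed as deviations from the graph-wide mean, in the spirit of the "relative topological deviation" described in the abstract. It is a minimal illustration under stated assumptions: the exact feature definitions and normalization used by DualGAD are not given in this part of the article, the overlap definition (shared neighbors divided by the union of neighbors excluding the two endpoints) is one common choice, the z-score step is an assumption, and the function names are hypothetical. The directed InOut feature is omitted.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def structural_features(edges):
    """Per-node (degree, triangle count, mean neighborhood overlap)."""
    adj = build_adjacency(edges)
    feats = {}
    for u, nbrs in adj.items():
        degree = len(nbrs)
        # Summing common neighbors over all neighbors counts each
        # triangle through u twice, hence the division by 2.
        triangles = sum(len(nbrs & adj[v]) for v in nbrs) // 2
        overlaps = []
        for v in nbrs:
            union = (nbrs | adj[v]) - {u, v}
            overlaps.append(len(nbrs & adj[v]) / len(union) if union else 0.0)
        feats[u] = (float(degree), float(triangles), mean(overlaps))
    return feats


def relative_deviation(feats):
    """Z-score each feature against the whole graph (assumed normalization)."""
    nodes = sorted(feats)
    columns = list(zip(*(feats[n] for n in nodes)))
    stats = [(mean(col), pstdev(col)) for col in columns]
    return {
        n: tuple((x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(feats[n], stats))
        for n in nodes
    }


# Toy graph: a triangle 0-1-2 with a pendant node 3 attached to node 2.
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
features = structural_features(edges)
deviation = relative_deviation(features)
for node in sorted(features):
    print(node, features[node], deviation[node])
```

Because the deviations are measured against each graph's own statistics, the resulting values are comparable across graphs of different size and density, which is the motivation for a graph-relative normalization of this kind.
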
Table 8. Ablation study on different dimensionality reduction methods (AUROC).

| Method | Cora | CiteSeer | ACM | Facebook | Weibo | Amazon | Avg |
|---|---|---|---|---|---|---|---|
| PCA (Ours) | 93.72 | 95.13 | 83.36 | 68.50 | 89.63 | 82.04 | 85.39 |
| t-SNE | 81.03 | 87.06 | 59.14 | 68.73 | 57.96 | 70.47 | 70.73 |
| UMAP | 63.42 | 64.55 | 56.34 | 59.65 | 52.36 | 64.52 | 60.14 |

Table 9. Training and inference times (s) of different methods.

| Methods | AnomalyDAE | GAT | BWGNN | UNPrompt | ARC | DualGAD |
|---|---|---|---|---|---|---|
| Training Time | 86.04 | 2.43 | 4.86 | 3.86 | 0.35 | 0.39 |
| Inference Time | 264.29 | 300.90 | 330.99 | 105.17 | 0.15 | 0.25 |

