AddAG-AE: Anomaly Detection in Dynamic Attributed Graph Based on Graph Attention Network and LSTM Autoencoder

Recently, anomaly detection in dynamic networks has received increased attention due to massive network-structured data arising in many fields, such as network security, intelligent transportation systems, and computational biology. However, many existing methods in this area fail to fully leverage all available information from dynamic networks. Additionally, most of these methods are supervised or semi-supervised algorithms that require labeled data, which may not always be feasible in real-world scenarios. In this paper, we propose AddAG-AE, a general dynamic graph anomaly-detection framework that can fuse node attributes and spatiotemporal information to detect anomalies in an unsupervised manner. The framework consists of two main components. The first component is a feature extractor composed of a dual autoencoder, which captures a joint representation of both the network structure and node attributes in a latent space. The second component is an anomaly detector that combines a Long Short-Term Memory AutoEncoder (LSTM-AE) and a predictor, effectively identifying abnormal snapshots among mostly normal graph snapshots. Experimental results show that, compared with the baselines, the proposed method has broad applicability and higher robustness on three datasets with different sparsity.


Introduction
With the advancement of science and technology, numerous domains, such as network security, intelligent transportation systems, social media, and computational biology [1][2][3][4], are producing large amounts of network-structured data composed of many interdependent objects and time-varying components. However, these data often contain anomalies (unusual patterns or behaviors that significantly deviate from most of the data), which are typically associated with network attacks in network security and network fraud in social networks. Therefore, effectively detecting anomalies in network-structured data is of great significance for mitigating potential risks, monitoring system status, and ensuring system security.
Due to the high complexity of these data, integrating information fully and reasonably from various dimensions and effectively identifying anomalies has become a significant challenge for anomaly detection. To obtain as much information as possible from the network, attributed graphs are usually adopted in contemporary research [5]. However, with both structural and attribute features contained in the graphs, graph anomaly detection (GAD) in attributed networks raises a more complex problem in non-Euclidean space. The authors in [6][7][8] detected anomalous nodes in static attributed graphs and achieved better performance than their baselines. However, those methods, based on static attributed graphs, usually ignore the dynamic evolution of graph structures and node attributes. There are also some papers focusing on anomaly detection in dynamic graphs. Some literature [9,10] has explored traditional outlier detection algorithms such as Robust Random Cut Forest (RRCF) [11] and Isolation Forest [12], while others explore anomaly detection in dynamic attributed graphs using deep-learning approaches [13]. Graph Convolutional Network (GCN) [14], Graph Attention Network (GAT) [15], GraphSAGE [16], and AutoEncoder (AE) [17] are representative methods for dealing with graph data. However, the above methods neglect the temporal information of graph data. Long Short-Term Memory (LSTM) [18], Gated Recurrent Unit (GRU) [19], and forecasting-based models are well designed for dealing with temporal information. However, they neglect both the attribute information of the nodes and the structural information of the graph. To leverage all available information in the network effectively, attempts have been made [20][21][22] to combine both types of techniques in anomaly detection for specific application domains. These efforts have demonstrated the feasibility and efficiency of detecting anomalies in graph-structured data.
Among the above-mentioned methods, it is evident that each offers a novel algorithmic solution with high performance. However, some of these methods have not fully integrated all the features of network data, including structural, attribute, and temporal factors in dynamic networks. Additionally, certain methods rely on labeled data, which is sometimes impractical in real-world scenarios. Another significant limitation of existing works is that these methods apply only to specific types of network datasets. For instance, one method may demonstrate better performance on networks with fewer nodes and dense connections between them but suffer a severe decline when applied to sparse networks. Correspondingly, a method that excels on sparse networks may exhibit worse performance when dealing with dense networks.
To alleviate the aforementioned problems, we propose a novel framework, AddAG-AE, for anomaly detection in dynamic attributed networks. AddAG-AE can reasonably combine network structural, node attribute, and temporal information. The framework consists of two main components, namely the feature extractor and the anomaly detector. The feature extractor includes a structure autoencoder and an attribute autoencoder, which reconstruct the original node attributes and network adjacency matrix to obtain a joint representation vector in the latent space. The anomaly detector, composed of an LSTM-AE and a forecasting-based model, detects anomalies with the node attribute vectors, adjacency matrix, and representation vectors as input. The main contributions of this work are summarized as follows:

• We propose an anomaly-detection framework, named AddAG-AE, that effectively integrates spatiotemporal, structural, and attribute information to achieve higher accuracy in anomaly detection in a self-supervised manner. It addresses the issue of integrating different types of information and improves applicability to graphs with different structures (sparsity levels).

• In the graph embedding phase, we design and implement a new encoding-decoding mechanism based on GAT, which makes full use of the horizontal and vertical dimensions of the latent matrix, mining the potential associations between different types of information and enhancing the effectiveness of node representations.

• In the anomaly-detection phase, a joint optimization objective is introduced to effectively integrate the reconstruction model and the prediction model. The reconstruction loss in this joint objective takes into account latent vector reconstruction, graph structure reconstruction, and attribute reconstruction, effectively improving the model's robustness and performance in anomaly detection.
• We conduct extensive experiments on three different kinds of datasets, including densely connected graphs, sparsely connected graphs, and graphs with frequent changes in connections or weights. The results show the broader applicability and superior performance of AddAG-AE for anomaly detection on different kinds of graphs.

Related Work
In this section, we first review the existing methods for anomaly detection, including some typical frameworks and deep-learning-based ones, and then summarize some approaches relevant to our work.

Anomaly Detection in Dynamic Networks
In anomaly detection for dynamic networks, in order to represent real-world dynamic networks with evolving relationships between real objects and their attributes, we usually need to model them as dynamic graphs. The benefits and limitations of existing work are shown in Table 1. Typical methods, such as StreamSpot [10], Spotlight [9], and Snapsketch [23], detect anomalies by mapping the graphs modeled from real network data to graph vectors in a sketching space through a special sketching method, and then classifying them using typical one-class classification algorithms. These works take full advantage of network structure features but cannot maintain and process time-varying components.
Recently, an increasing number of deep-learning-based methods have been proposed to tackle this issue. The authors in [24] proposed a novel anomaly-detection framework, named GmapAD, which fully explores the structural and attribute information within and between graphs. The framework utilizes representative nodes in the graph to map it to a new feature space and employs traditional machine-learning classifiers for anomaly detection. Although this method extensively exploits structural and attribute information at the graph level, it overlooks the long- and short-term patterns of nodes. In order to reasonably combine all possible information of dynamic graph data, the authors in [22] proposed a semi-supervised model for anomalous edge detection, named AddGraph, which employs an attention-based GRU to capture hidden information for long-term patterns and a temporal GCN to handle graph structural information for short-term patterns of the nodes as input to the GRU at the current timestamp. However, this method assumes that the training data are ideal, containing no anomalous edges at the initial timestamps. The authors in [25] proposed an unsupervised model, DeepSphere, which first incorporates hypersphere learning into the LSTM-AE and can overcome problems that may seriously degrade the quality of the neural network during training by learning the boundary between normal and abnormal data. In practice, this method performs well for relatively stable dense graphs but loses the ability to detect anomalies when applied to unstable sparse graphs. Additionally, it does not consider graph structure.

Table 1. Benefits and limitations of existing methods.

Method | Benefits | Limitations
StreamSpot [10] | Broad adaptability to graph types; fast processing speed | Easily affected by noise; disregards contextual relationships
Spotlight [9] | Simple, fast, and scalable | Only suitable for specific graph types
Snapsketch [23] | Strong feature representation; real-time anomaly detection | Only suitable for specific datasets; susceptible to experience
GmapAD [24] | Full structural and attribute information | Vulnerable to data size; high computational complexity
AddGraph [22] | Needs only a few labeled data; spatiotemporal information | Highly sensitive to noise
DeepSphere [25] | No manual labeling required | Not suitable for sparse graphs
Factorization [26][27][28] | High processing efficiency; capable of large-scale graphs | Sensitive to network topology structures
Random Walk [29,30] | Suitable for few samples and missing data | Not suitable for large graphs or highly heterogeneous graphs
Deep Learning [31] | Robustness on sparse networks | High computational complexity; easily affected by hyperparameters

Graph Embedding Techniques and Deep Autoencoder
Graph Embedding aims at obtaining the embedding vector of a node or graph in latent space, which can sufficiently preserve valuable information about graph structural data. Corresponding techniques [32] have been widely exploited. Most can be grouped into three kinds: (i) factorization-based methods, such as Locally Linear Embedding (LLE) [26], Laplacian Eigenmaps [27], and Graph Factorization (GF) [28]; (ii) random walk-based methods, such as DeepWalk [29] and Node2vec [30]; and (iii) deep-learning-based methods, such as Structural Deep Network Embedding (SDNE) [31], GCN, GAT, and others.
Factorization-based methods and random walk-based methods focus on the preservation of structural similarity only. Deep-learning-based methods, especially Graph Neural Network (GNN)-related techniques, with their strong capability to fuse structural and content features, have attracted extensive interest recently. GCN generalizes the idea of the convolution model to non-Euclidean space so that it can process graph structural features and node content features in general graphs, not only regular ones, by aggregating the representations of the nodes themselves and their one-step neighbors. Building on this idea, GAT improves the aggregation manner with a self-attention layer that learns related weights for different neighbors, using neighbor information more reasonably than GCN. Additionally, it can further enhance the capability of capturing information by applying multi-head attention.
General GNN methods are well designed for combining all possible features of graph data, but, in dynamic graphs, the temporal features reflecting the evolution process of the graph structure have not been considered. Naturally, an architecture that can capture temporal information needs to be added for dynamic graph anomaly detection. In time-series anomaly detection, Recurrent Neural Networks (RNNs) [33], designed for capturing temporal information, have been widely used. To overcome the gradient vanishing and exploding problems of the standard RNN, the authors in [18] proposed LSTM, which successfully solved the long-term dependency problem. Combining the idea of deep autoencoding, LSTM-AE has become a reasonable choice to capture temporal information in dynamic graphs and achieve anomaly detection by reconstructing the original input data. Additionally, inspired by the work of the authors in [34], we introduce a forecasting-based model that detects anomalies by predicting the errors of the next timestamp, as a complementary method to improve the overall performance of our model under the joint optimization strategy.

Problem Definition
In this section, we introduce the notations and definitions for our framework. The notations used in this article are listed in Table 2.

Table 2. Notations.

Symbol | Description
G | Dynamic attributed graph stream
ν^t | The set of nodes (t denotes timestamp t)
N | The number of nodes
M | The number of edges
d | The dimension of the nodes' attributes
X̂^t | Reconstructed attribute matrix after GAT decoding
S^t(V_i) | Anomalous score of node i

Definition 1: Dynamic Attributed Graph
A dynamic attributed graph stream can be defined as G = {ν^t, ε^t, χ^t}, t = 1, ..., T. Next, we set G(t) = (V^t, A^t, X^t) as a snapshot in G, where t is the timestamp, V^t = {V^t_i}, i = 1, ..., N, is the node set, V^t_i is the i-th node, and each row of the attribute matrix X^t represents a node's attribute vector. A^t is the adjacency matrix, with A^t_{i,j} = ω, where ω is the weight of the edge. We adopt unweighted graphs in this work, so A^t_{i,j} ∈ {0, 1}. In addition, A^t_{i,j} = 1 when i = j, because GAT-AE considers each node's self-connection in Equations (1)-(4).
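As a concrete illustration of this definition (a minimal sketch, not taken from the paper's implementation; all sizes and values are made up), a snapshot G(t) with self-connections can be represented as:

```python
import numpy as np

N, d = 4, 3  # illustrative sizes: 4 nodes, 3-dimensional attributes

# Unweighted adjacency matrix A^t with entries in {0, 1};
# self-connections are set to 1, as required by the GAT-AE encoder.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
A = A + np.eye(N)  # add node-self connections (A^t[i, i] = 1)

# Attribute matrix X^t: each row is one node's attribute vector.
X = np.random.default_rng(0).normal(size=(N, d))

# Neighborhood N^t(i) = {j | A^t[i, j] > 0}, used by the GAT aggregation.
neighbors = [np.flatnonzero(A[i] > 0) for i in range(N)]
print(neighbors[0])  # node 0's neighborhood includes itself: [0 1]
```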

Definition 2: Anomaly Detection
Given a graph stream {G(t)}, our goal is to detect anomalous snapshots (i.e., anomalous graphs) where the state of the whole system at the current timestamp differs from most other timestamps. For each node i at any timestamp t, we learn a function f(V_i) that reflects the node's anomaly probability. Additionally, this function can serve as a score for node anomaly. If the average anomaly score of all nodes in the graph corresponding to a specific timestamp t is greater than a certain threshold, the graph is defined as an anomalous graph.
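Under this definition, the graph-level decision reduces to averaging the node scores and thresholding; a minimal sketch (the scores and the threshold here are illustrative, not values from the paper):

```python
import numpy as np

def is_anomalous_graph(node_scores, threshold):
    """Flag a snapshot as anomalous when the mean node anomaly
    score f(V_i) at this timestamp exceeds the threshold."""
    return float(np.mean(node_scores)) > threshold

scores = np.array([0.1, 0.2, 0.9, 0.15])  # illustrative f(V_i) values
print(is_anomalous_graph(scores, threshold=0.3))  # mean = 0.3375 -> True
```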

Proposed Method
In this section, we elaborate on the proposed AddAG-AE framework, an overview of which is illustrated in Figure 1a. AddAG-AE contains two parts, namely graph embedding and anomaly detection. The former can be viewed as a feature extractor aiming to learn node-embedding vectors that fuse graph structural and node attribute information, while the latter is an anomaly detector aiming to detect anomalies with temporal information considered. We make corresponding improvements in both phases based on the original algorithms.

Graph Embedding Based on GAT-AE
The details of GAT-AE are shown in Figure 1b. In a dynamic attributed graph stream, each snapshot contains graph structural and node attribute features. Therefore, we aggregate the information of a node with its neighbors in the current graph at timestamp t to obtain its latent representation Z^t_i as follows:

Z^t_i = σ( Σ_{j ∈ N^t(i)} β_{i,j} X^t_j W ),    (1)

where X^t_i ∈ R^{1×d} is node i's attribute feature, W ∈ R^{d×d′} is the weight matrix (d′ denotes the embedding dimension of the GAT layer), whose parameters are learned by the encoder and shared among all nodes at all timestamps, and N^t(i) = {j | A^t_{ij} > 0} is the set of node i's neighbors. The attention score β_{i,j} can be computed as

e_{i,j} = LeakyReLU( o^Tr (X^t_i W ⊕ X^t_j W) ),    (2)
β_{i,j} = SoftMax_j(e_{i,j}) = exp(e_{i,j}) / Σ_{k ∈ N^t(i)} exp(e_{i,k}),    (3)

where o ∈ R^{2d′} is a vector of learned parameters for the attention mechanism, ⊕ denotes the concatenation of two vectors, and Tr denotes the transpose of a matrix. The attention score is computed by the LeakyReLU function and normalized by the SoftMax function. After obtaining the representations Z^t_i of all nodes, we decode them to reconstruct the original graph G(t) with a structure decoder and an attribute decoder. For graph structural decoding, we take the latent representations as the structure decoder's input and compute their inner product to obtain the reconstructed adjacency matrix

Â^t = σ( Z^t (Z^t)^Tr ),    (4)

where Z^t ∈ R^{N×d′} is the matrix stacking the representations Z^t_1, Z^t_2, ..., Z^t_N of all nodes at time step t, (Z^t)^Tr denotes the transpose of Z^t, and Â^t ∈ R^{N×N} denotes the reconstructed adjacency matrix. For node attribute decoding, we improve the idiomatic decoding manner [35] by employing two fully connected layers applied along two different directions, which exploit more information from the nodes' latent representations, and reconstruct the attribute matrix as

H^t = σ( W_1 Z^t + B_1^Tr ),    (5)
X̂^t = σ( H^t W_2 + B_2 ),    (6)

where W_1 ∈ R^{N×N} and B_1 ∈ R^{d′×N} denote the weights and bias in one dimension, and W_2 ∈ R^{d′×d} and B_2 ∈ R^{N×d} denote the weights and bias in the other dimension. These parameters are learned by the decoder and shared across all timestamps. X̂^t is the reconstructed attribute matrix at timestamp t. As shown above, the reconstruction model includes two parts (i.e., graph structural and node attribute reconstruction). Correspondingly, the losses of both parts ought to be taken into consideration. Therefore, the loss function of reconstruction errors can be formulated as follows:

L_1 = α L_S + (1 − α) L_A,    (7)

where α is a tradeoff parameter that balances the graph structure and node attribute reconstruction errors. L_S is a reconstruction loss derived from a kind of maximum likelihood estimation, which can be written as

L_S = −(1/N²) Σ_{i,j=1}^{N} [ A^t_{ij} log Â^t_{ij} + (1 − A^t_{ij}) log(1 − Â^t_{ij}) ],    (8)

where N is the number of nodes, A^t_{ij} is the real element value of the adjacency matrix, and Â^t_{ij} is the predictive value from Equation (4). For the node attribute reconstruction, we take the mean square error between the original node attributes and the reconstructed vectors as the loss function:

L_A = (1/N) Σ_{i=1}^{N} ||X^t_i − X̂^t_i||²,    (9)

where X^t_i and X̂^t_i are node i's attribute vector and the corresponding reconstructed vector, respectively. After training with the aforementioned losses, we retain only the GAT encoding part (i.e., Equations (1)-(3), corresponding to the encoding layer). Next, we describe the graph embedding based on GAT using an example. For a given graph at a specific time step, we have the node attribute matrix X ∈ R^{N×d} and the adjacency matrix A ∈ R^{N×N} (represented via the neighborhood sets N^t(i) as in Equation (1)). We input these matrices into the GAT layer and, after applying Equation (1), obtain the embedding matrix Z ∈ R^{N×d′}.
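The encode-decode pipeline above can be sketched as follows. This is a hedged, single-head NumPy illustration with randomly initialized parameters, assuming σ is the sigmoid function; it only shows the data flow of Equations (1)-(6), not a trained model:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, dp = 4, 3, 2          # nodes, attribute dim d, embedding dim d'

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
leaky_relu = lambda x: np.where(x > 0, x, 0.2 * x)

A = np.eye(N); A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1.0  # with self-loops
X = rng.normal(size=(N, d))                                  # attributes X^t
W = rng.normal(size=(d, dp))                                 # shared weights
o = rng.normal(size=2 * dp)                                  # attention params

H = X @ W                    # projected features X^t_i W
Z = np.zeros((N, dp))
for i in range(N):
    nbrs = np.flatnonzero(A[i] > 0)
    # Eq. (2): e_ij = LeakyReLU(o^Tr (X_i W ⊕ X_j W))
    e = np.array([leaky_relu(o @ np.concatenate([H[i], H[j]])) for j in nbrs])
    beta = np.exp(e - e.max()); beta /= beta.sum()   # Eq. (3): softmax
    Z[i] = sigmoid(beta @ H[nbrs])                   # Eq. (1): aggregation

A_hat = sigmoid(Z @ Z.T)                             # Eq. (4): structure decoder

# Eqs. (5)-(6): attribute decoder along two directions of the latent matrix
W1, B1 = rng.normal(size=(N, N)), rng.normal(size=(dp, N))
W2, B2 = rng.normal(size=(dp, d)), rng.normal(size=(N, d))
X_hat = sigmoid(sigmoid(W1 @ Z + B1.T) @ W2 + B2)

print(Z.shape, A_hat.shape, X_hat.shape)  # (4, 2) (4, 4) (4, 3)
```

Note how the two decoder weight matrices act on different axes of Z: W1 mixes across nodes (rows), while W2 maps the latent dimension back to the attribute dimension (columns), which is the "two directions" idea described above.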

Anomaly Detection
After the above process of the GAT-AE, we obtain embedding vectors of all nodes in all graphs, which represent the structure and attribute features in latent space. However, until now, the time-dependency of the same node at different time steps has not been taken into account for anomaly detection. Furthermore, a node with a high anomalous score in Section 4.1 only reflects the degree of deviation from other normal nodes in the same graph (the same timestamp), but not the degree of deviation from the node's historical features; i.e., we are more likely to score nodes by comparing the state of node i at the current timestamp t with its historical states at timestamps t − 1, t − 2, .... Thus, to incorporate temporal information into anomaly detection, the detector includes two parts: a reconstruction-based model aiming at capturing the data distribution of the entire graph stream, and a forecasting-based model aimed at predicting the value at the next timestamp. During this process, the parameters of both models are updated simultaneously. Finally, the loss function can be defined as follows:

L_2 = L_rec + L_for,    (10)

where L_rec and L_for denote the reconstruction loss and forecasting loss, respectively.

Reconstruction-Based Model
We adopt an LSTM-AE as the reconstruction model to reconstruct the embedding vectors of the GAT-AE and find anomalous nodes in the data. Note that we reshape the output tensor of the GAT-AE so that the embedding vectors of the same node i at different timestamps form a time sequence Z^1_i, Z^2_i, ..., Z^T_i (denoted X_i) fed into the LSTM-AE. In the encoding process, the encoder transforms X_i into a hidden representation with a regular LSTM model, as follows:

h^t_en = LSTM_en(Z^t_i, h^{t−1}_en),    (11)

where h^t_en ∈ R^{1×d′} denotes the hidden vector of node i after processing by the LSTM encoder (d′ denotes the embedding dimension of the LSTM layer). In the decoding process, we take the last hidden vector h^T_en as the decoder's initial state to reconstruct the sequence X_i:

Ẑ^{t−1}_i, h^{t−1}_de = LSTM_de(Ẑ^t_i, h^t_de),    (12)

where Ẑ^{t−1}_i is the reconstruction vector at timestamp t − 1. As Equation (12) shows, we use the output of timestamp t as the input of timestamp t − 1 in the decoding phase. Finally, the training loss function of our model is computed as follows:

L_rec = (1/N) Σ_{i=1}^{N} Σ_{t=1}^{T} ||Z^t_i − Ẑ^t_i||²,    (13)

where Z^t_i and Ẑ^t_i are the feature vector and reconstructed vector, respectively, and N is the number of nodes. The number of layers in the LSTM can be set to different values. We maintain the same number of layers for both the encoding LSTM and decoding LSTM. Based on comprehensive performance considerations, the number of layers in the LSTM encoder-decoder is set to two in this paper (for the relevant experiments, refer to Section 5.4.3).
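The reversed decoding order and the reconstruction loss can be sketched as follows; a simple tanh recurrent cell stands in for the LSTM gates (an assumption made to keep the sketch short), since the point is the data flow, not the gating:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, dp = 3, 5, 4   # nodes, timestamps, latent dim d'

# Per-node sequences X_i = (Z^1_i, ..., Z^T_i) from the GAT-AE.
Z = rng.normal(size=(N, T, dp))

# Stand-in recurrent cells (a real model would use LSTM gates here).
W_en = rng.normal(size=(2 * dp, dp)) * 0.1
W_de = rng.normal(size=(2 * dp, dp)) * 0.1

def cell(x, h, W):
    return np.tanh(np.concatenate([x, h]) @ W)

L_rec = 0.0
for i in range(N):
    h = np.zeros(dp)
    for t in range(T):                      # encode forward in time
        h = cell(Z[i, t], h, W_en)
    Z_hat = np.zeros((T, dp))
    x, h_de = Z[i, T - 1], h                # last hidden state seeds the decoder
    for t in range(T - 1, -1, -1):          # decode in reverse order
        h_de = cell(x, h_de, W_de)
        Z_hat[t] = h_de
        x = Z_hat[t]                        # output at t feeds step t - 1
    L_rec += np.sum((Z[i] - Z_hat) ** 2)

L_rec /= N                                  # average over nodes
print(round(float(L_rec), 3))
```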

Forecasting-Based Model
First, we stack fully connected layers that take the hidden vector h^t_en of the LSTM-AE as input, with ReLU as the nonlinear activation function, to extract features from the hidden vectors that contain all spatiotemporal information for the node:

Z̃^t_i = ReLU( FC(h^t_en) ),    (14)

Then, we stack Z̃^t_1, Z̃^t_2, ..., Z̃^t_N at time step t into a matrix Z̃^t and use Equations (4)-(6) to decode Z̃^t, predicting the next timestamp's original attribute vectors and adjacency matrix. The loss function of the predictor can be formulated as follows:

L_for = Σ_{t=1}^{T} [ (1/N) Σ_{i=1}^{N} ||X^t_i − X̃^t_i||² − (1/N²) Σ_{i,j=1}^{N} ( A^t_{i,j} log Ã^t_{i,j} + (1 − A^t_{i,j}) log(1 − Ã^t_{i,j}) ) ],    (15)

where X^t_i and X̃^t_i are the original attribute vector and the predicted vector of node i using our predictor, respectively; A^t_{i,j} and Ã^t_{i,j} denote the real existing edge and the predicted probability value of the edge, respectively; N is the number of nodes; and t ranges from 1 to T, denoting the timestamp. Finally, we adopt a node's LSTM reconstruction error plus its forecasting error as its anomalous score, which can be formulated as follows:

S^t(V_i) = ||Z^t_i − Ẑ^t_i||² + ||X^t_i − X̃^t_i||²,    (16)

where S^t(V_i) is the anomalous score of node i at timestamp t. This shows that AddAG-AE can detect anomalous nodes in dynamic graphs. However, in practical applications, when working with a graph stream, the focus often shifts to assessing whether the whole system state at a specific timestamp (i.e., the statistical properties of a graph) is anomalous or not. To address this, we can evaluate the mean value of the anomalous scores, denoted as S(t) = (1/N) Σ_{i=1}^{N} S^t(V_i), or consider the number of nodes whose anomalous score exceeds a particular threshold, denoted as S(t) = |{i | S^t(V_i) > λ}|, as the evaluation metric for determining whether the current network is anomalous. We adopt the first one in this article.
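Combining the two error terms into node-level and graph-level scores can be sketched as follows (all tensors here are random placeholders standing in for the model outputs):

```python
import numpy as np

def node_scores(Z, Z_hat, X, X_tilde):
    """S^t(V_i): LSTM reconstruction error plus forecasting error per node."""
    rec_err = np.sum((Z - Z_hat) ** 2, axis=1)        # ||Z^t_i - Ẑ^t_i||^2
    for_err = np.sum((X - X_tilde) ** 2, axis=1)      # ||X^t_i - X̃^t_i||^2
    return rec_err + for_err

def graph_score(scores):
    """S(t): mean node anomaly score, used to flag anomalous snapshots."""
    return float(np.mean(scores))

rng = np.random.default_rng(2)
N, dp, d = 5, 4, 3
Z, Z_hat = rng.normal(size=(N, dp)), rng.normal(size=(N, dp))
X, X_tilde = rng.normal(size=(N, d)), rng.normal(size=(N, d))
s = node_scores(Z, Z_hat, X, X_tilde)
print(s.shape, graph_score(s) >= 0)  # (5,) True
```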

Datasets
In this paper, we evaluate AddAG-AE on three commonly used real-world datasets: Enron Mail (a graph whose connections change frequently, Table 3), NYC Taxi Trips (a dense graph, Table 4), and IDS 2017 (a sparse graph, Table 5). After processing and calculation, the statistics of nodes (#v) and edges (#e) of the different datasets are shown in Table 6, where the symbol "#" abbreviates "number". For Enron Mail, we focus on the data since 1999 and select the top 147 individuals with the highest numbers of sent or received messages as nodes. The dataset is segmented into graph streams by day. Since there are no directly labeled anomalies, graphs were marked as anomalous if they related to a major scandal event.

Evaluation Metrics
To verify the performance of the model more comprehensively, this paper adopts the AUC score, precision, recall, and loss as evaluation metrics. Among them, the AUC score is the main evaluation metric, which has been widely used in many anomaly-detection methods. Specifically, the AUC score is the area under the ROC (receiver operating characteristic) curve, which reflects how well the model ranks the samples. The ROC curve plots the true positive rate against the false positive rate. We treat the anomalous graphs we labeled as positive samples and sort all samples by the anomalous score provided by the model. Therefore, a higher AUC score reflects better anomaly-detection performance. In addition, to confirm the effectiveness of the model, we add two supplementary metrics, namely recall and precision.
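Since the AUC score is the main metric, a minimal rank-based computation (the Mann-Whitney U formulation, assuming no tied scores) can be written as:

```python
import numpy as np

def auc_score(labels, scores):
    """AUC via the rank-sum (Mann-Whitney U) formulation: the probability
    that a randomly chosen anomalous graph outranks a normal one."""
    labels, scores = np.asarray(labels), np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Anomalous graphs (label 1) mostly receive higher anomaly scores here.
y = [0, 0, 1, 0, 1]
s = [0.1, 0.4, 0.35, 0.2, 0.8]
print(auc_score(y, s))  # positive ranks 3 and 5 -> 5/6 ≈ 0.833
```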

Experimental Setup
In the experiments, we implement AddAG-AE using the Adam optimizer with a learning rate of 0.001 and weight decay of 0.001 on the three commonly used datasets. The dimensions of the GAT-AE embedding are 2d, d, and 1.5d for the NYC Taxi Trips, IDS 2017, and Enron Mail datasets, respectively, where d is the dimension of the node attribute vector. α is set to 0.1, 0.5, and 0.3 for NYC Taxi Trips, IDS 2017, and Enron Mail, respectively. The number of both GAT layers and LSTM layers is set to two for all datasets.

Experimental Results
We compare AddAG-AE with the following six anomaly-detection methods.

Performance Evaluation
The anomaly-detection performance (AUC scores, precision, and recall in the testing phase) of all baseline methods and AddAG-AE on the three datasets is reported in Table 7, where the numbers in bold represent the optimal results for the corresponding metrics. The related analyses and evaluations are as follows:

• Spotlight [9]: This identifies an anomalous graph by focusing on sudden changes in localized graph structures, which is effective for IDS 2017, because its graph structure changes drastically when a network attack occurs in an otherwise stable network. NYC Taxi Trips, however, has dense connections between nodes, and Spotlight's manner of acquiring features can leave the embedding-vector changes of certain special anomalous graph structures inconspicuous. For Enron Mail, many normal graphs have graph vectors similar to those of anomalous graphs owing to its special sketching method, which leads to its poor performance.

• LSTM-AE [36]: This displays worse precision, recall, and AUC scores on all three datasets and is even worse than random guessing on IDS 2017. The main reason is that the method completely disregards the graph structure and its evolution over time. For IDS 2017, with many nodes and sparse connections between them, the simple encode-decode strategy cannot generate effective node representations.

• DeepSphere [25]: This combines an LSTM-AE and hypersphere learning, and learns a compact boundary to distinguish normal and abnormal data. For NYC Taxi Trips and Enron Mail, which have fewer nodes, it can effectively capture the structural differences between normal and abnormal graphs. DeepSphere's embedding manner expands the adjacency matrix of a graph into a high-dimensional vector simply by concatenating every row of the adjacency matrix. For IDS 2017, which has more than 1000 nodes, this manner generates a sparse vector of more than one million dimensions per graph, which may be the main reason for the poor performance.

• AnomalyDAE [7]: This has better performance on NYC Taxi Trips and IDS 2017, while it exhibits ordinary performance on Enron Mail. It is worth noting that it achieves the best recall and better precision on the NYC Taxi Trips data. The method acquires node representations using a dual autoencoder, which is sensitive to node attributes and graph structure. Thus, it has generally better performance on different datasets, especially on dense graphs.

• Dominant [37]: This is stable on the three datasets, although it ignores the dynamic evolution of the same node over time. It adopts a single graph encoder while neglecting complex interactions between the graph structure and node attributes. This may make it suitable for Enron Mail, where it achieves the best AUC score.

• GmapAD [24]: GmapAD exhibits stable and better precision, recall, and AUC scores on the three datasets. It cleverly combines multiple algorithms, such as graph neural networks, differential evolution, and graph mapping, leading to a significant recognition effect on anomalous graphs. It performs better on dense graphs (NYC Taxi Trips) and slightly worse on sparse graphs.
• AddAG-AE: This shows the best AUC score on NYC Taxi Trips and IDS 2017, and better performance on Enron Mail. In addition, it outperforms all baselines in precision on all three datasets, and has the best recall on IDS 2017 and Enron Mail. We improved the GAT decoding process and enabled the model to capture a richer set of structure and attribute information. As a result, it produces more effective node representations, which can increase precision and AUC scores. Then, based on the traditional LSTM-AE, we design and incorporate a novel prediction module to aid in anomaly detection, which ensures the stability of the model. On the NYC Taxi Trips dataset, in terms of recall, the best method is not AddAG-AE but AnomalyDAE. The reason may be that, in the GAT decoding stage, AnomalyDAE jointly considers the hidden vectors generated by GAT and the embedded attribute vectors to reconstruct the attribute vectors, while AddAG-AE ignores this point. For Enron Mail, whose graph structure is derived from email-exchange statistics, more normal graphs are similar to the anomalous graphs, which leads to slightly lower AUC scores. The experimental results are illustrated in Figure 2, with the loss curves during the training phase for the three datasets depicted in Figure 2a and the AUC-score comparison of the different methods across the three datasets shown in Figure 2b. It can be observed that AddAG-AE increases the AUC score by 2.8% and 2.3%, respectively, on NYC Taxi Trips and IDS 2017 compared to the corresponding second-best model, and its performance is approximately level with the best model on Enron Mail. In addition, Table 7 shows that AddAG-AE achieves gains of 5.3%, 2.6%, and 1.5% in precision on NYC Taxi Trips, IDS 2017, and Enron Mail, respectively, and increases recall by 3.8% and 2.2% on IDS 2017 and Enron Mail. In summary, compared with all baselines, AddAG-AE has stable and better performance in three metrics on three datasets.

Ablation Experiment
In this section, we carry out a comparative study by contrasting the AddAG-AE method with the unenhanced method (AddAG-AE without 2D-MLP) and the method lacking the SA-decoding improvement (AddAG-AE without SA-decoding). Evaluating this control group through the AUC score can further verify the necessity and influence of the improvements of AddAG-AE. The experimental results are shown in Table 8.
In graph embedding, we improve the decoding manner by considering two-dimensional information (2D-MLP), compared with the idiomatic manner [35]. The experimental results showed that it improves the AUC scores by 1.11%, 2.08%, and 4.24% on the three real-world datasets, respectively. In the anomaly-detection phase, we improved the forecasting model by introducing structure and attribute re-decoding of the latent tensor (SA-decoder), compared with [38]. The experimental results showed that it improves the AUC scores by 0.96%, 1.46%, and 3.10% on the three real-world datasets, respectively. This proves that the two improvements are beneficial to performance.

Parameter Sensitivity

In this section, we mainly investigate the influence of hyperparameters on model performance, including the embedding vector's dimension d′ for GAT-AE, the tradeoff parameter α, and the number of GAT and LSTM layers.
Figure 3 shows the AUC scores under different hyperparameters on the three datasets. Because suitable values of d depend on the node-attribute dimension and vary greatly across datasets, we use the ratio of the embedding dimension to the node-attribute dimension as the x-axis instead of d itself (taking 1/2, 1, 3/2, . . .). As shown in Figure 3a, the ratio at which performance peaks differs across datasets: it is larger for dense graphs and smaller for sparse ones, owing to differences in the information density of the graph. The tradeoff parameter α shows a similar trend: the AUC score first increases, then stays flat, and finally decreases as α grows from 0 to 1.0, which is especially obvious on IDS 2017 and Enron Mail. Although the peak positions differ across the three datasets, this confirms the importance of jointly considering the contributions of network structure and node attributes in attributed-graph anomaly detection. Regarding the number of network layers, which involves two parameters (the number of GAT encoder layers and the number of LSTM encoder layers), and taking the NYC Taxi Trips dataset as an example, the two-layer model performs best.
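The sweep described above can be organized as a simple grid search over the dimension ratio and α. Here `evaluate_auc` is a stand-in for training the model and scoring it on held-out snapshots; it is not part of the paper's code, and the toy scoring function below merely mirrors the reported single-peak trend.

```python
from itertools import product

def grid_search(ratios, alphas, evaluate_auc):
    """Return the (ratio, alpha) pair with the highest AUC and that AUC."""
    best, best_auc = None, float("-inf")
    for ratio, alpha in product(ratios, alphas):
        auc = evaluate_auc(ratio, alpha)
        if auc > best_auc:
            best, best_auc = (ratio, alpha), auc
    return best, best_auc

# Toy stand-in: AUC peaks at ratio = 1.0, alpha = 0.5, falling off on either
# side, as in the unimodal curves of Figure 3.
toy = lambda r, a: 1.0 - 0.1 * abs(r - 1.0) - 0.2 * abs(a - 0.5)
params, auc = grid_search([0.5, 1.0, 1.5], [0.2, 0.5, 0.8], toy)
```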

Discussion
In many practical application scenarios, anomalies in network-structured data often signal potential risks in the underlying system. Accurately and effectively detecting such anomalies is therefore of great significance for preventing threats, monitoring system status, and so on. In this paper, AddAG-AE is proposed to solve these problems: it jointly considers the graph structure, node attributes, and temporal information to detect anomalous graphs (or nodes) by effectively combining a GAT dual autoencoder, an LSTM-AE, and a forecasting-based model. First, it solves the problem that different sources of information are difficult to fuse in dynamic-graph anomaly detection. Second, we improve the GAT dual autoencoder by considering two-dimensional information in the graph-embedding phase. Additionally, the dual self-decoding mechanism of GAT is introduced, which improves the model's performance in the anomaly-detection stage. The whole framework detects anomalies in a self-supervised manner, without depending on labeled data, making it more applicable to real-world situations. Experiments on three commonly used real-world datasets show that AddAG-AE achieves superior AUC score, precision, and recall, and is broadly applicable to different kinds of datasets. Compared with the second-best baseline, its AUC score increases by 2.8% and 2.3% on NYC Taxi Trips and IDS 2017, respectively, and is approximately on par with Dominant on Enron Mail; precision improves by 5.3%, 2.6%, and 1.5% on NYC Taxi Trips, IDS 2017, and Enron Mail, respectively; and recall increases by 3.8% and 2.2% on IDS 2017 and Enron Mail, respectively.
However, our exploration of graph attributes remains preliminary rather than comprehensive. For example, this paper uses only the weight attribute of nodes, leaving out categorical attributes (such as the attack type in IDS 2017) and edge attributes, which are important in some specialized networks. In future work, we plan to explore more node and edge features and will attempt to turn the model into an end-to-end algorithm without sacrificing its current advantages.

Figure 2 .
Figure 2. Performance comparison of AddAG-AE. (a) Loss of AddAG-AE on three datasets. (b) AUC scores of different approaches on three datasets.

Table 1 .
Benefits and limitations of related work.

Table 3 .
Sample data of Enron Mail.

Table 4 .
Sample data of NYC Taxi Trips.

Table 6 .
Statistics of the used real-world datasets.
• A. NYC Taxi Trips. This dataset contains NYC taxi trips from 2009 to 2018, including details such as time and coordinates. In the experiment, we extracted the data from October 2015 to January 2016 and converted it into daily graph snapshots. The trips are divided into 56 zones using K-means clustering; the zones represent the nodes of the graph, and the trips between zones represent the edges. For each node, the trips originating from that zone are aggregated and normalized into a single attribute vector. Special days with unusual traffic patterns are flagged as anomalies for analysis.
• B. IDS 2017. This dataset was captured on 5 July 2017 and includes source IP, target IP, attack type, timestamp, and other fields, yielding more than 640,000 edges and five attack types. After processing, the network flow was modeled into 5-minute snapshots, and a snapshot was marked as anomalous if it contained at least 200 attack edges.
• C. Enron Mail. This dataset consists of 352,550 emails exchanged between Enron employees from 1979 to 2002.
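The windowing and labeling procedure for IDS 2017 can be sketched as follows. The 5-minute window and the 200-attack-edge threshold follow the description above; the flat record format and the "BENIGN" label for non-attack flows are simplifying assumptions for illustration.

```python
from collections import defaultdict

WINDOW = 5 * 60         # 5-minute windows, in seconds
ATTACK_THRESHOLD = 200  # a window with >= 200 attack edges is anomalous

def build_snapshots(records):
    """Group flow records (src, dst, attack_type, timestamp) into
    per-window edge lists, labeling each window by its attack-edge count."""
    windows = defaultdict(list)
    for src, dst, attack_type, ts in records:
        windows[ts // WINDOW].append((src, dst, attack_type))
    snapshots = {}
    for w, edges in windows.items():
        n_attacks = sum(1 for _, _, t in edges if t != "BENIGN")
        snapshots[w] = {"edges": edges,
                        "anomalous": n_attacks >= ATTACK_THRESHOLD}
    return snapshots

# Toy example: 250 DoS edges fall in the first window (anomalous),
# one benign edge falls in the second window (normal).
recs = [("a", "b", "DoS", 10)] * 250 + [("c", "d", "BENIGN", 400)]
snaps = build_snapshots(recs)
```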

Table 8 .
AUC scores of ablation experiment on three datasets.