Systematic Review

Systematic Review of Graph Neural Network for Malicious Attack Detection

by Sarah Mohammed Alshehri *, Sanaa Abdullah Sharaf and Rania Abdullrahman Molla
Computer Science Department, King Abdulaziz University, Jeddah 21589, Saudi Arabia
* Author to whom correspondence should be addressed.
Information 2025, 16(6), 470; https://doi.org/10.3390/info16060470
Submission received: 21 March 2025 / Revised: 5 May 2025 / Accepted: 27 May 2025 / Published: 2 June 2025

Abstract

As cyberattacks continue to rise alongside the rapid expansion of digital systems, effective threat detection remains a critical yet challenging task. While several machine learning approaches have been proposed, the use of graph neural networks (GNNs) for cyberattack detection has not yet been systematically explored in depth. This paper presents a systematic literature review (SLR) that analyzes 28 recent academic studies published between 2020 and 2025, retrieved from major databases including IEEE, ACM, Scopus, and Springer. The review focuses on evaluating how GNN models are applied in detecting various types of attacks, particularly those targeting IoT environments, web services, phishing, and network traffic. Studies were classified based on the type of dataset, GNN model architecture, and attack domain. Additionally, key limitations and future research directions were extracted and analyzed. The findings provide a structured comparison of current methodologies and highlight gaps that warrant further exploration. This review contributes a focused perspective on the potential of GNNs in cybersecurity and offers insights to guide future developments in the field.

1. Introduction

In recent years, deep learning (DL) methods have been increasingly adopted in cybersecurity research to address the growing complexity and sophistication of cyberattacks. Among these methods, Graph Neural Networks (GNNs) have gained notable attention for their ability to model structured relationships within data, offering promising results in various security domains. Despite the increasing number of studies applying GNNs to attack detection tasks, there remains a need for a comprehensive review that organizes and analyzes these efforts through a practical lens. This paper presents a systematic literature review (SLR) of selected academic works that utilized GNN-based models for cyberattack detection, focusing on the types of attacks addressed, the characteristics of the datasets used, and the specific GNN architectures employed.
The review includes 28 recent studies and categorizes them by attack type (such as DDoS, phishing, and injection), by application domain (such as networks, IoT, databases, and the web), and by the data formats utilized. It also highlights patterns in how GNNs have been applied across these contexts and identifies common limitations reported in the literature. Special attention is given to areas that have received relatively less focus, such as web-based attacks, which are often addressed using other deep learning approaches.
This study aims to provide researchers and practitioners with a structured overview of the current landscape, emphasizing practical usage, existing challenges, and opportunities for future research. The paper is organized as follows: Section 2 covers the background and related studies; Section 3 and Section 4 outline the methodology and research questions; Section 5 presents the analysis and discussion of the findings; and Section 6 concludes the paper and proposes future directions.

2. Background and Previous Work

The remainder of this section is organized as follows: Section 2.1 provides background on machine learning; Section 2.2 offers an overview of deep learning algorithms and their application to attack detection; and Section 2.3 presents the related work.
Over the years, researchers in artificial intelligence have explored a wide range of machine learning and deep learning models to address security challenges. This section outlines the evolution of these models in the context of cyberattack detection, starting with conventional ML methods, progressing to deep learning techniques, and finally focusing on GNNs, which are the core of this study.

2.1. Machine Learning Algorithm

Machine learning is a subset of artificial intelligence focused on developing algorithms capable of executing specified tasks in real-world applications. It predates deep learning and relies on a broad family of algorithms that are robust and widely employed in practical applications, each with properties suited to particular tasks. The traditional approach to solving computer problems is to write a program with complicated, hard-coded instructions, which requires a highly skilled engineer with in-depth knowledge of the task at hand.
Machine learning (ML) is a significant field in artificial intelligence that uses various approaches to solve problems. Rather than being explicitly programmed, ML systems are trained on data [1]: during training, the system is presented with input data from which it learns the required behavior. Nowadays, powerful technologies are no longer limited to computer systems and web networks, and the substantial amounts of data they generate require both processing and security.
ML algorithms have the advantage of processing such data and extracting accurate results; in some situations, they can even exceed human performance. Machine learning algorithms can be classified into three distinct categories based on the nature of the input data and the output [2].

2.1.1. Supervised Learning Models

Models in supervised learning learn by analyzing input data, which can be text, images, or arrays of numbers, together with the corresponding output data, learning the mapping between input–output pairs. This enables precise and accurate predictions, including on unseen data.

2.1.2. Unsupervised Learning Models

Models in unsupervised learning learn by finding similarities in the data. Such models do not rely on labels, and the results they generate are clusters.

2.1.3. Reinforcement Learning Models

Reinforcement learning operates as an intermediate framework between supervised and unsupervised learning. Although some form of supervision exists, reinforcement models obtain feedback from their interactions with the environment, and this feedback indicates the model’s effectiveness [2].
ML techniques such as KNN and XGBoost have shown high accuracy in intrusion detection and anomaly classification tasks, particularly in datasets like CIDDS-001. They are commonly used for detecting network-based attacks and phishing but often require manual feature engineering [3,4].
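A minimal sketch of such a classical ML pipeline (here with a k-nearest-neighbor classifier from scikit-learn) is shown below; the synthetic feature matrix, labels, and train/test split are purely illustrative assumptions rather than the setup of any cited study:

```python
# Sketch: classical ML baseline for intrusion detection, in the spirit of
# the KNN/tree-based approaches cited above. Assumes flow features have
# already been extracted into a numeric matrix (placeholder data here).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))       # placeholder flow features
y = rng.integers(0, 2, size=1000)     # 0 = benign, 1 = attack

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Manual scaling/feature engineering is part of the burden that deep
# models later automate.
scaler = StandardScaler().fit(X_train)
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(scaler.transform(X_train), y_train)
print(classification_report(y_test, clf.predict(scaler.transform(X_test))))
```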
While ML methods such as SVM, decision trees, and random forests have shown utility in early intrusion detection systems, recent years have witnessed a shift toward deep learning models due to their superior ability to extract features automatically and handle complex data structures.

2.2. Deep Learning Algorithm

Deep learning (DL) is considered a particular type of machine learning based on neural network structures. Deep learning is particularly effective at accomplishing a wide variety of goals and can perform tasks with high accuracy and efficiency via incremental stages [5]. The primary differences between ML and DL are as follows:
  • In a classical ML model, the classification task is separate from feature selection, and the two procedures cannot be integrated with one another to enhance performance [6]. DL, by contrast, handles this problem by integrating the two procedures into a single phase, which facilitates effective detection and classification of phishing attacks [7].
  • Third-party services and manual feature engineering are still required for ML [8]. On the other hand, DL models are capable of learning and extracting features automatically without the guidance of a human.
There are numerous deep learning algorithms that are applied to address various kinds of issues as well as perform a variety of tasks. Some of the most common algorithms include the following.

2.2.1. Multilayer Perceptrons (MLPs)

A multi-layer perceptron (MLP) is a feedforward artificial neural network model. It comprises numerous connected neurons that operate concurrently to accomplish specific tasks. An MLP consists of input and output layers, in addition to one or more hidden layers, and each node applies an activation function such as a sigmoid or radial basis function. The MLP’s basic mechanism is the successive transition of signals through multiple layers. Training proceeds in three stages: first, an input pattern X from the dataset is presented; then the output is generated and compared with the desired output, and the error signal between the network output and the intended output initiates the next phase; finally, the neural weights are updated. This procedure is repeated for the next input vector until every instance in the training set has been processed [9]. MLPs have been used for anomaly detection in cyberattacks targeting networks, demonstrating competitive results on datasets like ASNM-NPBO, especially in handling obfuscated threats [10].
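The following sketch illustrates this input–hidden–output structure and the iterative weight-update loop using PyTorch; the layer sizes, activation, and synthetic data are illustrative assumptions, not taken from the cited studies:

```python
# Sketch: a minimal MLP classifier with one hidden layer.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim=20, hidden=64, n_classes=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Sigmoid(),  # hidden layer with sigmoid activation
            nn.Linear(hidden, n_classes),             # output layer (logits)
        )

    def forward(self, x):
        return self.net(x)

model = MLP()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(128, 20)            # placeholder feature vectors
y = torch.randint(0, 2, (128,))     # placeholder labels
for _ in range(5):                  # forward pass, compare with target, update weights
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```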

2.2.2. Radial Basis Function Network (RBFNs)

In contrast to most neural network architectures, which employ nonlinear activation functions across multiple layers, a radial basis function network (RBFN) includes only an input layer, a hidden layer, and an output layer. Data are first received by the input layer and subsequently transmitted to the hidden layer for processing. The radial-basis hidden layer is the most significant factor distinguishing RBFNs from the majority of other neural network types, and the output layer is used for prediction tasks such as classification [11].
RBFN is a strong neural network that demonstrates exceptional capabilities in binary and classification problems, such as attack detection. According to Rapaka et al. [12], the radial basis function network (RBFN) is the classical model of the RBF network, initially created for anomaly detection. A hierarchical intrusion detection system that works simultaneously across several RBFN layers was presented by Jiang et al. [13] to enable real-time execution.

2.2.3. Recurrent Neural Networks (RNNs)

Recurrent neural networks (RNNs) use repeated computations to process data and are well suited to systems that rely on cyclic connections. The RNN architecture consists of a series of interconnected steps, with each step utilizing the output of the previous one. Among DL algorithms, RNNs are the most suitable for processing sequential input, such as natural language, and they are effective for sequence recognition across text, audio, and handwriting [14].
In discriminative learning, the most significant issue is the expense and time required to collect labeled data [15]. This makes it a difficult task for studies to find the most suitable algorithm for a specific cybersecurity application. The model’s efficiency and accuracy might be compromised, and wasted effort might result from the selection of an inappropriate algorithm, which could lead to unpredictable results [16].
Figure 1 illustrates a recurrent neural network (RNN). At each time step (t), a single input x(t) is accepted by the RNN unit, which then provides a corresponding output y(t).
RNNs have been applied in detecting malicious URLs by analyzing raw URL sequences without manual feature extraction, showing potential for integrating with browser-based security modules [17].

2.2.4. Self-Organizing Maps (SOMs)

The self-organizing map (SOM) is a type of neural network that facilitates the ordered, non-linear representation of high-dimensional input data on the components of a structured, low-dimensional array. On a two-dimensional map, SOMs convert non-linear statistical relationships between data points in high-dimensional space into geometric representations [18]. Studies have demonstrated that the SOM algorithm can simulate the growth of a variety of topographical response areas in the human brain. Owing to its modeling capabilities, such as maps of optical features and speech phonemes, SOMs have been successfully employed in practical applications including automatic speech recognition, image analysis, and industrial processes [19]. In attack detection, SOMs have been used in research such as [20] and achieved efficient results due to their ability to capture and accurately classify attacks. Their limitation is that attacks exhibiting normal behavior may not be detected, which motivates the search for more accurate and intelligent approaches.

2.2.5. Autoencoders

An autoencoder is an unsupervised algorithm composed of two functions: an encoder, which transforms the input data into a compact code held in a hidden layer, and a decoder, which reproduces the input from that code at the output. During training, the reconstruction error is progressively reduced [21]. Autoencoders are widely used for feature extraction from datasets in artificial intelligence, although they are constrained by the requirement for substantial computational resources. In cyber security, some studies have applied them to attack detection tasks such as network malware detection, reporting higher accuracy than algorithms such as KNN and SVM [22].
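A minimal sketch of this encoder–decoder arrangement for anomaly detection is given below, assuming the model is trained on benign traffic only and that a high reconstruction error flags a potential attack; the dimensions, training loop, and threshold are illustrative assumptions:

```python
# Sketch: autoencoder-based anomaly detection via reconstruction error.
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, in_dim=20, code_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, code_dim), nn.ReLU())
        self.decoder = nn.Linear(code_dim, in_dim)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

benign = torch.randn(512, 20)              # train on benign traffic only (placeholder)
for _ in range(10):
    opt.zero_grad()
    loss = loss_fn(model(benign), benign)  # minimize reconstruction error
    loss.backward()
    opt.step()

# At test time, samples whose reconstruction error exceeds a chosen
# threshold are flagged as anomalous (threshold is an assumption).
sample = torch.randn(1, 20)
error = loss_fn(model(sample), sample).item()
is_attack = error > 0.5
```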

2.2.6. Long Short-Term Memory Networks (LSTMs)

The long short-term memory (LSTM) network is a more sophisticated variant of the recurrent neural network (RNN) designed to improve performance. One of its distinguishing characteristics is its ability to manage feedback connections effectively. Its numerous applications include image classification and pattern/sequence recognition. An LSTM network has three primary components: input, output, and forget gates. The LSTM model is capable of storing and retrieving data processed in earlier stages, which helps it learn from previous experience; thus, the model can identify the most important features for subsequent phases and achieve high accuracy. Figure 2 illustrates the LSTM structure [23].
As an RNN variant, LSTM is among the most promising algorithms for recognition and text classification. RNNs are well suited to sequential data because of their ability to retain a memory of prior inputs, which makes them among the best algorithms for language modeling and text generation. However, RNNs have several problems that affect their performance. LSTM was introduced as an improved form of the RNN that efficiently and reliably addresses these limitations. LSTMs are especially beneficial where recognizing text context and semantics is crucial. Sentiment analysis, text classification, and natural language processing are well suited to both RNNs and LSTMs, which are particularly proficient at analyzing sequences of data. According to the authors in [24], the vanishing and exploding gradient issues present in traditional RNNs are resolved by LSTM models, rendering them suitable for processing time-series sequence data.
LSTMs, particularly Bi-LSTMs with attention mechanisms, have been effective in phishing detection, achieving high performance in classifying URL sequences and showing potential in multiclass malicious URL classification [25].
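The sketch below illustrates a character-level Bi-LSTM URL classifier of the kind described above, operating directly on raw URL strings; the character encoding, embedding/hidden dimensions, and example URLs are assumptions for illustration only:

```python
# Sketch: character-level Bi-LSTM for URL classification (untrained).
import torch
import torch.nn as nn

class BiLSTMURLClassifier(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=32, hidden=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden, n_classes)

    def forward(self, char_ids):            # (batch, url_length)
        x = self.embed(char_ids)
        out, _ = self.lstm(x)                # (batch, url_length, 2*hidden)
        return self.fc(out[:, -1, :])        # classify from the last time step

def encode(url, max_len=60):
    ids = [ord(c) % 128 for c in url[:max_len]]
    return ids + [0] * (max_len - len(ids))  # pad with zeros

batch = torch.tensor([encode("http://example.com/login"),
                      encode("http://paypa1-secure.xyz/verify")])
logits = BiLSTMURLClassifier()(batch)        # shape illustration only
```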
These advantages make LSTMs and CNNs well suited to detecting malicious webpages. Each deep learning algorithm has distinct advantages that can be the subject of further research, as well as shortcomings that require development. In order to detect malicious activities with sufficient accuracy, each investigation should assess the strengths and weaknesses of each algorithm and then analyze the characteristics of the problem to determine the most suitable model for its solution.
Throughout the implementation of the RNN algorithm, the most common issues were the vanishing and exploding gradients, as previously mentioned; LSTM was a successful solution that has been employed to address the limitations of RNN [16,26].

2.2.7. Deep Belief Networks (DBNs)

Deep belief networks (DBNs) are algorithms that rely on unsupervised learning, created by stacking two or more restricted Boltzmann machines (RBMs). They demonstrate robust performance by conducting unsupervised training for each layer separately [27]. During the pre-training phase, each layer’s initial features are extracted; this is then followed by a fine-tuning phase, while the top layer performs classification using a softmax layer [28]. As shown in Figure 3, DBNs comprise visible and hidden layers. In cyber security, many recent studies have addressed malicious attack detection using DBNs, which achieved efficient results compared to machine learning algorithms [29].

2.2.8. Restricted Boltzmann Machines (RBMs)

A deep generative model is constructed by restricted Boltzmann machine (RBM) algorithms, which rely on unsupervised learning. Layers in the RBM model fall into two categories: visible and hidden. The hidden part may comprise numerous layers and contains latent variables whose values are not observed, whilst the visible layer includes the known input parameters. Features are extracted from a dataset and subsequently transmitted to the next layer as hidden variables in a hierarchical structure [31].
RBMs have been implemented in many applications, such as smart cities and IoT intrusion detection systems [32,33]. However, they have several limitations, such as the requirement for significant computational resources and the limited feature representation of single RBMs. In smart city scenarios, their performance tends to degrade as the number of layers increases, and they struggle with multi-class classification and real-valued data representation, which reduces their reliability [32].

2.2.9. Generative Adversarial Network (GANs)

Generative adversarial networks (GANs) are a deep learning architecture designed to produce data through the use of two model types: a generative model and a discriminative model. Data such as text and images are first generated by the generative model and are then classified by the discriminative model [34].
The generative model will generate data by utilizing random noise, as demonstrated in Figure 4. Nevertheless, the discriminative model is designed to differentiate between legitimate training data samples and other samples generated by the generative model.
In this illustration, D(x) represents a binary classifier whose output labels a sample as either fake (generated) or real. The accuracy of the two models is inversely related: as one improves, the other is penalized, and the correct/incorrect classification measure drives model updates during each iteration [21].
In the cyber security field, GANs can achieve promising results, such as detecting zero-day attacks in Internet of Things (IoT) environments by generating samples that imitate such attacks [35]. GANs have also been applied in other cybersecurity studies [36] due to their capability to train under different attack scenarios.
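The adversarial interplay described above can be summarized in a short training loop; the sketch below, with illustrative dimensions and placeholder data, shows how the discriminator D(x) and the generator are updated in alternation:

```python
# Sketch: minimal GAN training loop (generator vs. discriminator).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 20))   # noise -> synthetic sample
D = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()
real = torch.randn(64, 20)                  # placeholder "real" attack features

for _ in range(5):
    # 1) Train D to label real samples 1 and generated samples 0.
    fake = G(torch.randn(64, 16)).detach()
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # 2) Train G so that D classifies its samples as real.
    fake = G(torch.randn(64, 16))
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```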

2.2.10. Convolutional Neural Networks (CNNs)

Convolutional Neural Networks (CNNs) are specifically designed to analyze visual inputs, enabling the extraction of relevant features and the precise analysis of visual data. The process begins with the aggregation of a significant dataset of photos to train the model. The inputs are processed through a filtering layer to improve their features before the establishment of the convolutional layer. Subsequently, the maximum pooling function is applied to find and extract matrices containing the greatest values, which are then used to configure the fully connected neural networks that generate outputs [14]. Figure 5 shows how convolutional neural networks (CNNs) work.
Convolutional neural network (CNN) techniques have become popular for their fast and effective extraction of features from complicated, unprocessed input. The complexity of the network is reduced, and the learning process increases, leading to greater strength and better results with CNN architecture. The architecture of CNNs produces excellent results when analyzing grid structures with two-dimensional data, such as videos and images because of the robust relationship between neural networks and pixels [37]. CNNs have been widely used in phishing and website attack detection by extracting character-level and content-based features from URLs and webpages, enabling efficient early-stage classification [38].
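A minimal sketch of such a character-level CNN is given below (embedding, convolution, global max pooling, then a fully connected classifier); all dimensions and the toy input are illustrative assumptions rather than the architecture of any cited study:

```python
# Sketch: character-level CNN for URL/webpage classification (untrained).
import torch
import torch.nn as nn

class CharCNN(nn.Module):
    def __init__(self, vocab_size=128, embed_dim=32, n_filters=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=5, padding=2)
        self.pool = nn.AdaptiveMaxPool1d(1)          # global max pooling
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, char_ids):                     # (batch, seq_len)
        x = self.embed(char_ids).transpose(1, 2)     # (batch, embed_dim, seq_len)
        x = torch.relu(self.conv(x))
        x = self.pool(x).squeeze(-1)                 # (batch, n_filters)
        return self.fc(x)

logits = CharCNN()(torch.randint(0, 128, (4, 60)))   # 4 character sequences of length 60
```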
Various deep learning models have been employed in cybersecurity to address different attack types and data structures. Models like CNNs, LSTMs, and RNNs have shown high effectiveness in text-based scenarios, such as detecting malicious URLs, phishing websites, and classifying web content, due to their ability to capture sequential dependencies and contextual features [25,38]. In contrast, models such as MLPs, RBFNs, and SOMs have been widely used in anomaly detection and intrusion detection systems, particularly for analyzing network-based data. These models benefit from their structural simplicity and pattern recognition capabilities, though they often struggle with detecting stealthy or behaviorally normal attacks [13,20]. Techniques like autoencoders, DBNs, and RBMs have also contributed to malware and intrusion detection tasks, offering improved performance in feature learning, but they typically require intensive computational resources and may lack generalization [29,32]. While each model has unique advantages, most of them fail to represent the rich relational structures inherent in cybersecurity data, which has led to growing interest in GNNs for more structured and adaptive attack detection solutions.
Despite the success of deep learning techniques in attack detection, most of these models are designed for grid-like data. To capture more complex relationships, especially in network and security data that are inherently graph-structured, graph neural networks have emerged as a promising alternative.

2.2.11. Graph Neural Networks (GNNs)

Graph neural networks (GNNs) are a field of deep learning. Conventional deep learning models such as CNNs have demonstrated exceptional capabilities in dealing with Euclidean data, including text, images, and videos, achieving excellent performance while reducing associated errors and problems.
Despite the impressive capabilities of various deep learning techniques, there is a challenge in dealing with applications that involve data generated from non-Euclidean domains. The complex interactions and interdependencies between the objects are well represented by graphs, which are commonly used to represent these kinds of data. This emphasizes the significance of graphs, as they can represent extremely complex data. In several recent applications, including cybersecurity, natural language processing, social networks, and protein folding, graph neural networks (GNNs) have demonstrated the capability to address and manage various forms of complex data. For instance, social networking data encompass a substantial volume of interrelated information, including accounts, updates, images, profiles, and reposts. Similarly, in the domain of cybersecurity, complex systems are designed to address spy accounts and other associated accounts [39].
A graph is a data structure that contains a set of elements (nodes) and the connections between them (edges). Building on the considerable capabilities of machine learning methods, many studies have indicated a growing interest in analyzing graphs through these approaches. Each node in a graph is characterized by its own distinct features, in addition to the features of its neighboring nodes.
As shown in the equation below [40], the purpose of a graph neural network (GNN) is to learn a state embedding h_v for each node v: an s-dimensional vector that encodes the relevant information about the node’s neighborhood and from which an output o_v can be generated. The state is computed by a local transition function f, a parametric function shared among all nodes that updates each node’s state from the inputs in its neighborhood, while the output is produced by a local output function g [40]. The final definitions of h_v and o_v are as follows:
h_v = f(x_v, x_co[v], h_ne[v], x_ne[v]);  o_v = g(h_v, x_v)
where x_v denotes the features of node v, x_co[v] the features of its incident edges, h_ne[v] the states of its neighbors, and x_ne[v] the features of its neighbors.
Figure 6 shows that the graph neural network has a graph as input and prediction as output.
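To make this update concrete, the sketch below iterates the local transition function over a toy graph until the node states roughly stabilize and then applies a local output function; the graph, weight matrices, and mean aggregation are illustrative assumptions:

```python
# Sketch: iterative node-state update h_v = f(x_v, h_ne[v]) and readout o_v = g(h_v).
import numpy as np

rng = np.random.default_rng(0)
# 4-node toy graph given as adjacency lists (neighborhoods).
neighbors = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
X = rng.normal(size=(4, 8))                 # node features x_v
H = np.zeros((4, 8))                        # node states h_v, initialized to zero
W_self, W_neigh, W_out = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))

def f(x_v, h_neighbors):                    # local transition function (mean aggregation)
    agg = h_neighbors.mean(axis=0) if len(h_neighbors) else np.zeros(8)
    return np.tanh(x_v @ W_self + agg @ W_neigh)

for _ in range(3):                          # iterate the state update a few times
    H = np.stack([f(X[v], H[neighbors[v]]) for v in range(4)])

O = H @ W_out                               # local output function g produces o_v per node
```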
Several graph neural network (GNN) models have been proposed in recent years for cybersecurity applications; however, graph convolutional networks (GCNs) and graph attention networks (GATs) have emerged as the most widely adopted and studied frameworks within the past five years due to their strong performance and adaptability across various attack detection applications:
  • The graph convolutional network (GCN) is a commonly used algorithm for classification purposes. Figure 7 shows that the information of each node’s neighborhood can be aggregated by applying convolutional operations on the signal. This model has demonstrated high accuracy in various classification tasks [41].
In a graph, GCN is a multi-layer algorithm that is implemented by gathering embeddings and evaluating the properties of each node and its neighbors. Nevertheless, the GCN is restricted to collecting data from a node’s near neighbors when employing a single layer of convolution. By incorporating multiple GCN layers, this limitation can be reduced. Assuming each vertex is self-connected, the adjacency matrix A of a graph G = (V, E) consists of diagonal elements representing self-connections, while any matrix entry A[i, j] denotes the existence of an edge between node vi and node vj [41]. For studies that utilize data that can be represented graphically, GCN can be implemented due to its efficiency and capability to manage these types of data, such as text [42] and images [43]. The process of learning data on graphs encounters several challenges, including the fact that most connection patterns between data are complex and diverse. Furthermore, graph models are incompatible with specific kinds of data, including images and text, which are unstructured. Experimental studies suggest that a potential solution lies in ensuring the preservation of graph properties by obtaining knowledge of graph representation in a low-dimensional Euclidean space. This approach, known as representation learning, has demonstrated its significant success in multiple domains. The deep structure feature of graph convolutional network models allows them to be leveraged in order to enhance representation learning [44]. The development of a deep architecture that is capable of more effectively capturing the complicated structural patterns present in graphs remains a significant challenge [41].
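The propagation rule of a single GCN layer with self-connections can be summarized as H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W); the sketch below applies it to a toy adjacency matrix, with all values chosen purely for illustration:

```python
# Sketch: one GCN layer with symmetric normalization over a toy graph.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)    # toy adjacency matrix
A_hat = A + np.eye(4)                         # add self-connections
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt      # symmetric normalization

rng = np.random.default_rng(0)
H = rng.normal(size=(4, 8))                   # input node features
W = rng.normal(size=(8, 4)) * 0.1             # learnable weights (fixed here)

# One layer aggregates only immediate neighbors; stacking layers widens
# the receptive field, as discussed above.
H_next = np.maximum(A_norm @ H @ W, 0)        # ReLU(A_norm H W)
```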
The graph attention network (GAT) is a variant of graph neural networks (GNNs) that has been specifically designed for classification objectives. It efficiently captures complex details and structural information about the graph by using attention mechanisms to weigh the importance of neighboring nodes. One of the primary advantages of the GAT model is its scalability, rendering it a highly effective tool for large-scale graph analysis. Additionally, GAT can take edge features into account, which further enhances its accuracy and utility [45]. The effectiveness of the attention mechanism has been demonstrated in a variety of deep learning applications, including speech, image recognition, and natural language processing. The attention function generates a weighted sum of the values by combining a query with multiple key–value pairs. The output’s shape depends on the level of information being addressed, whether spatial, temporal, or any other component. In particular, graph attention networks (GATs) are a neural network architecture specifically intended for graph-structured data. The limitations of previous techniques that depend on graph convolutions or their variants are resolved by the use of self-attentional layers. A defining feature of GATs is their attention mechanism, which allows the network to prioritize critical nodes in the graph rather than treating all nodes equally; this is particularly advantageous in scenarios where nodes possess varying levels of importance. GATs extend the attention mechanism by employing multi-head attention, allowing the model to capture diverse interactions between nodes more effectively.
As illustrated in Figure 8, the approach of multi-head attention involves the incorporation of a vertex and its adjacent vertices. To generate the attention vector, the features extracted from the central vertex and each of its neighbors are subsequently combined through either concatenation or averaging [46].
The features of all nodes in GAT are represented as h = {h1, h2, …, hN}, where N is the total number of nodes and hi is the feature vector of node i. To process these features, GAT applies a linear transformation to each node’s features using a shared weight matrix W, which improves the capacity for efficient processing. The fundamental concept behind graph attention networks (GATs) is the calculation of attention coefficients: the model focuses on the features of neighboring nodes when updating a node’s features. Using the transformed features Whi and Whj, each node i calculates an attention score e_ij for every neighbor j, which essentially establishes how important the features of node j are to node i. The attention values are normalized using the softmax function to ensure they are comparable [45]. Due to its ability to focus on different neighbors based on the input features and its efficient use of attention mechanisms, the graph attentional layer is a robust algorithm that can operate effectively on complex and diverse graph data.
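The attention computation described above can be sketched as follows for a single head: transform node features with W, score each neighbor with a shared attention vector, normalize the scores e_ij with a softmax, and aggregate; the toy graph and all dimensions are assumptions for illustration:

```python
# Sketch: single-head graph attention layer over a toy graph.
import numpy as np

rng = np.random.default_rng(0)
neighbors = {0: [1, 2], 1: [0], 2: [0]}        # toy adjacency lists
h = rng.normal(size=(3, 8))                     # node features h_1..h_N
W = rng.normal(size=(8, 4)) * 0.1               # shared linear transformation
a = rng.normal(size=(8,)) * 0.1                 # attention vector over [Wh_i || Wh_j]

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

Wh = h @ W
h_out = np.zeros_like(Wh)
for i, nbrs in neighbors.items():
    js = [i] + nbrs                             # attend over self and neighbors
    e = np.array([leaky_relu(np.concatenate([Wh[i], Wh[j]]) @ a) for j in js])
    alpha = np.exp(e) / np.exp(e).sum()         # softmax-normalized attention coefficients
    h_out[i] = np.tanh(sum(alpha[k] * Wh[j] for k, j in enumerate(js)))
```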
To illustrate the general processing pipeline of graph neural network models in the context of cyberattack detection, the following diagram presents a comparative architecture involving GCN, GAT, and general GNN approaches (Figure 9). The architecture is divided into key phases: preprocessing, graph embedding, and detection. Each branch reflects the typical structure and application focus of the corresponding model type.
Graph neural networks (GNNs) have become an important tool in cybersecurity because of their ability to handle structured and relational data. Unlike other deep learning models, GNNs can capture the connections between different elements such as users, devices, and traffic, which helps in identifying complex threats. They have been applied in various areas like intrusion detection, phishing detection, IoT security, and social network analysis. With the increasing complexity of cyberattacks, GNNs provide a promising approach for building smarter and more effective security systems.

2.3. Related Work

This section provides an overview of recent review and survey papers that focus on the application of graph neural networks (GNNs) in security-related domains. Although systematic reviews remain limited in this area, several surveys have explored the potential of GNNs in diverse security tasks.
Varlamis et al. [3] presented a broad review on fake news detection using GNNs, highlighting the use of GCN, AGNN, and GAE models in social media contexts. While the paper demonstrated the potential of GNNs in classifying news into real or fake, it primarily emphasized content-based and context-based detection strategies without integrating structural and textual features effectively. Similarly, Kim et al. [47] focused on graph anomaly detection using GNNs, categorizing methods based on anomaly types (e.g., node-level, edge-level, and subgraph-level). The paper provided a useful taxonomy but did not investigate cyberattack-specific use cases or datasets, limiting its applicability to security threat modeling. In another study, Zhang et al. [48] surveyed privacy-preserving GNN techniques, particularly in defense against inference and extraction attacks. However, the review concentrated more on privacy challenges in general graph applications rather than threats specific to network and cybersecurity contexts.
In contrast to these studies, this review aims to bridge that gap by systematically analyzing how GNN models have been used across a variety of cyberattacks—including phishing, DDoS, and injections—highlighting the strengths, limitations, and future research directions in the context of graph-based learning. Among these, phishing detection has received more focused attention in ML and DL reviews but has rarely been examined using GNNs. This paper is not limited to phishing but emphasizes it due to its relevance and underrepresentation in graph-based research.

3. Method

The research questions and the approach that were employed in this study are presented in this section. A systematic literature review methodology was employed to collect and evaluate the current knowledge related to attack detection. This SLR investigates the following research questions (RQ):
  • What types of graph neural network (GNN) models are currently used in cyberattack detection research?
  • How are GNN methods applied to detect and classify different types of cyberattacks?
  • What public datasets have been used in the reviewed studies for model training and evaluation?
  • What performance evaluation metrics are commonly used in GNN-based attack detection?
  • Which GNN models demonstrated the best performance across different attack detection scenarios?
  • What are the key challenges and research gaps identified in current GNN-based cyberattack detection studies?
  • What future research directions have been proposed for advancing the use of GNNs in cybersecurity?

Study Strategy

A systematic literature review was conducted following the PRISMA methodology (Supplementary Materials). The search process targeted major academic databases, including IEEE Xplore, ACM Digital Library, Scopus, and Springer. The search was guided by the keywords: “graph neural network”, “attack”, “malicious”, and “detection”, with filters applied to include studies published within the past five years to ensure relevance to current advancements. Initially, 200 studies were identified. Titles and abstracts were manually screened, leading to the exclusion of 142 irrelevant records. The remaining 58 studies underwent full-text review, of which 30 were excluded due to insufficient information, lack of technical depth, or the absence of GNN implementation. Ultimately, 28 studies met the inclusion criteria. Studies were selected based on their relevance to cyberattack detection using artificial intelligence, with a particular focus on those employing deep learning techniques, especially GNN models. Manual qualitative synthesis was performed to extract key aspects such as attack types, datasets used, GNN architectures, reported performance, and future research directions. Only the highest reported performance was considered in cases involving multiple datasets, while studies with non-standard metrics or unclear results were excluded to maintain methodological consistency.

4. Primary Studies

4.1. Study Selection

This study presents a systematic analysis of selected research on cyber-attack detection. Key information was extracted from each study, including publication year, attack types, model specifications, operational mechanisms, and weaknesses. These data were organized into a comparative table, facilitating easy reference and analysis of various detection methodologies. Table 1 shows the criteria that we used to filter out irrelevant studies to answer our research questions.
As shown in Table 1, GCN is the most commonly used model, appearing in 16 studies, because it effectively analyzes structured and static data. In contrast, GAT is used less often (in only five studies); it performs well with evolving data, such as web traffic, although it requires higher computational power.

4.2. Dataset

In this research, we examined more than 25 different datasets across the reviewed studies, which are described in Table 1. Our analysis revealed that PhishTank and OpenPhish were the most commonly utilized datasets for detecting malicious Web URLs and HTML. Additionally, some authors created specific datasets for their studies. Weibo and Pheme were the most frequently used datasets in studies that applied algorithms to social media platforms, while CTU-13 was the preferred choice for traffic-tracking studies. In conclusion, each field, such as bot detection, spam, traffic detection, and web detection, has its preferred dataset, as explained in this section. Available datasets for attack detection studies using GNN are shown in Table 2.

5. Discussion

5.1. Limitations and Challenges

Different disciplines encounter a variety of challenges and limitations. Common concerns among half of the studies were the limited quantity of datasets and models employed in the comparative analyses. These challenges significantly impact the efficiency of the results and the model’s practical applicability. Therefore, most of the future directions for these papers aim to increase the number of datasets and models to expand the field of comparison and work towards improving performance.

5.2. Performance Evaluation and Metric

Performance results were extracted based on the available evaluation metrics, including precision, recall, F1-score, and accuracy. For studies that evaluated models on multiple datasets, the best reported performance was selected for comparison. Studies that did not provide clearly interpretable results or used non-standard evaluation metrics were excluded from the quantitative analysis. However, key methodological aspects of these studies were considered in the discussion where relevant.
It is important to note that the models were evaluated on different datasets within their respective studies. Therefore, Table 3 is not intended to provide a direct or unified benchmark comparison but rather to give insight into each model’s reported performance within its original experimental context.
While the datasets vary, one consistent pattern across studies was the preference for F1-score due to its balanced treatment of false positives and false negatives, especially in imbalanced scenarios such as DDoS or botnet detection. This explains why F1-score was widely used and generally showed favorable results. In cybersecurity, managing both false positives and false negatives is crucial, and F1-score effectively addresses this challenge. In Table 3, we summarize the reported model results.

5.3. Future Research Directions

As reflected in Table 4, the most frequently suggested research directions focused on expanding dataset diversity and volume, enhancing detection performance, and addressing model limitations in handling encrypted or complex traffic. Several studies aimed to improve model interpretability, scalability, and efficiency, particularly for dynamic environments such as fake news detection or encrypted threats. Other directions included extending model application to new domains such as NLP, code analysis, or multiclass classification. Notably, a number of studies did not present any explicit future plans, which suggests a lack of long-term research vision in certain areas. These observations underscore the importance of structured planning to guide the practical deployment and advancement of GNN models in cybersecurity.

5.4. Comparative Analysis of GNN Models

One of the main objectives of this review is to provide a comparative analysis of GNN-based methods for cyberattack detection. This section summarizes the strengths, limitations, and appropriate use cases of the most common GNN models identified in the reviewed studies.
It is difficult to rank the algorithms from best to worst because they are applied in various contexts, as discussed throughout this review. Each study typically selects the most effective model based on its specific data type and attack scenario.
The graph convolutional network (GCN) is the most widely used model. It performs well in structured environments, such as malware detection, social networks, and web analysis, where relationships between nodes are relatively stable. However, GCN struggles with dynamic graphs in which edge relationships frequently change.
In contrast, the graph attention network (GAT) appeared in fewer studies but achieved better results in dynamic data settings, such as web traffic and real-time attack detection. GAT’s attention mechanisms allow it to model complex and evolving relationships, although it comes with a higher computational cost.
Some papers employed general GNN architectures without specialization, particularly in tasks like DDoS detection and malicious IP classification. These models offer flexibility but typically do not outperform specialized approaches like GCN or GAT. In summary, GCN is better suited for well-structured, static data, while GAT offers advantages in scenarios involving dynamic, evolving graphs. The choice of model depends heavily on the nature of the dataset and the specific threat being addressed.

5.5. Tools and Dataset Diversity

A variety of datasets, in addition to frameworks such as PyTorch, were observed across the reviewed studies. This variation was closely tied to the type of attack and the context in which each study operated. For example, Pheme and Weibo were commonly utilized in detecting fake news and attacks on social networks. This diversity illustrates the interdisciplinary and evolving nature of GNN-based cybersecurity research, which spans multiple data types and application domains.

5.6. Summary of Research Questions

Based on the analysis of the selected studies, the following findings address the research questions posed in this review:
RQ1: 
A variety of GNN models have been employed, including graph convolutional networks (GCN), graph attention networks (GAT), and general GNN frameworks, each chosen based on the attack type and data structure.
RQ2: 
GNNs were applied for detecting and classifying attacks such as phishing, botnet, DDoS, and intrusion attempts in IoT and web environments. They were integrated into end-to-end pipelines including data preprocessing, graph construction, and classification stages.
RQ3: 
Public datasets such as CTU-13, BoT-IoT, CICIDS2017, Pheme, and Weibo were commonly used depending on the attack domain, for instance, network traffic or social media.
RQ4: 
F1-score was the most frequently used metric due to its suitability for imbalanced datasets. Other metrics included accuracy, precision, and recall.
RQ5: 
GCN was the most widely used and generally performed well for structured data like network traffic, while GAT showed superior performance in dynamic and evolving data environments such as social media.
RQ6: 
Major challenges included limited datasets, lack of standard benchmarks, scalability issues, and insufficient model interpretability.
RQ7: 
Future directions focused on enhancing model scalability, using more diverse datasets, exploring under-investigated attack types, and improving explainability and cross-domain adaptability.

6. Conclusions and Future Work

As cybersecurity threats continue to evolve, there is a growing need for advanced detection techniques that can adapt to complex and dynamic attack patterns. This review explored the application of graph neural network (GNN) methodologies in the detection of various cyber threats, including emerging challenges such as advanced persistent threats (APTs). By systematically analyzing 28 recent studies, this work provided a structured overview of how different GNN models are applied across domains such as IoT, social networks, and phishing detection. The studies were categorized based on the type of attack, dataset used, and GNN architecture, while also highlighting common limitations and future research directions.
Despite promising results, the implementation of GNNs in cybersecurity remains limited and fragmented. Future efforts should aim to diversify the datasets, explore less-studied attack types, and expand the use of advanced GNN variants. This review lays the groundwork for more targeted research and encourages the development of robust, scalable GNN-based solutions for real-world cyberattack detection.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/info16060470/s1, PRISMA.

Author Contributions

S.M.A. led the research process, including the literature review, methodology development, experimentation, and drafting of the manuscript. S.A.S. and R.A.M. provided critical academic supervision, conducted thorough proofreading, and offered insightful recommendations that enriched the structure and content of the paper, including the addition of sections and clarification of overlooked aspects. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CNN: Convolutional neural networks
DBNs: Deep belief networks
DL: Deep learning
GANs: Generative adversarial networks
GAT: Graph attention network
GCN: Graph convolutional networks
GNN: Graph neural networks
LSTM: Long short-term memory networks
ML: Machine learning
RBMs: Restricted Boltzmann machines
RNN: Recurrent neural networks
SOMs: Self-organizing maps

References

  1. Ongsulee, P. Artificial Intelligence, Machine Learning and Deep Learning. In Proceedings of the 2017 15th International Conference on ICT and Knowledge Engineering (ICT&KE), Bangkok, Thailand, 22–24 November 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 1–6. [Google Scholar]
  2. Simeone, O. A Very Brief Introduction to Machine Learning with Applications to Communication Systems. IEEE Trans. Cogn. Commun. Netw. 2018, 4, 648–664. [Google Scholar] [CrossRef]
  3. Varlamis, I.; Michail, D.; Glykou, F.; Tsantilas, P. A Survey on the Use of Graph Convolutional Networks for Combating Fake News. Future Internet 2022, 14, 70. [Google Scholar] [CrossRef]
  4. Thapa, N.; Liu, Z.; Kc, D.B.; Gokaraju, B.; Roy, K. Comparison of Machine Learning and Deep Learning Models for Network Intrusion Detection Systems. Future Internet 2020, 12, 167. [Google Scholar] [CrossRef]
  5. Dhingra, M.; Jain, M.; Jadon, R.S. Role of Artificial Intelligence in Enterprise Information Security: A Review. In Proceedings of the 2016 Fourth International Conference on Parallel, Distributed and Grid Computing (PDGC), Waknaghat, India, 22–24 December 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 188–191. [Google Scholar]
  6. Apruzzese, G.; Colajanni, M.; Ferretti, L.; Guido, A.; Marchetti, M. On the Effectiveness of Machine and Deep Learning for Cyber Security. In Proceedings of the 2018 10th International Conference on Cyber Conflict (CyCon), Tallinn, Estonia, 29 May–1 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 371–390. [Google Scholar]
  7. Ahmad, R.; Alsmadi, I. Machine Learning Approaches to IoT Security: A Systematic Literature Review. Internet Things 2021, 14, 100365. [Google Scholar] [CrossRef]
  8. Aljofey, A.; Jiang, Q.; Qu, Q.; Huang, M.; Niyigena, J.-P. An Effective Phishing Detection Model Based on Character Level Convolutional Neural Network from URL. Electronics 2020, 9, 1514. [Google Scholar] [CrossRef]
  9. Odeh, A.J.; Keshta, I.; Abdelfattah, E. Efficient Detection of Phishing Websites Using Multilayer Perceptron. Int. J. Interact. Mob. Technol. (iJIM) 2020, 14, 22–31. [Google Scholar] [CrossRef]
  10. Teoh, T.T.; Chiew, G.; Franco, E.J.; Ng, P.C.; Benjamin, M.P.; Goh, Y.J. Anomaly Detection in Cyber Security Attacks on Networks Using MLP Deep Learning. In Proceedings of the 2018 International Conference on Smart Computing and Electronic Enterprise (ICSCEE), Kuala Lumpur, Malaysia, 11–12 July 2018; pp. 1–5. [Google Scholar]
  11. Hwang, Y.-S.; Bang, S.-Y. An Efficient Method to Construct a Radial Basis Function Neural Network Classifier. Neural Netw. 1997, 10, 1495–1503. [Google Scholar] [CrossRef]
  12. Rapaka, A.; Novokhodko, A.; Wunsch, D. Intrusion Detection Using Radial Basis Function Network on Sequences of System Calls. In Proceedings of the International Joint Conference on Neural Networks, Portland, OR, USA, 20–24 July 2003; IEEE: Piscataway, NJ, USA, 2003; pp. 1820–1825. [Google Scholar]
  13. Jiang, J.; Zhang, C.; Kamel, M. RBF-Based Real-Time Hierarchical Intrusion Detection Systems. In Proceedings of the International Joint Conference on Neural Networks, Portland, OR, USA, 20–24 July 2003; IEEE: Piscataway, NJ, USA, 2003; Volume 2, pp. 1512–1516. [Google Scholar]
  14. Berman, D.S.; Buczak, A.L.; Chavis, J.S.; Corbett, C.L. A Survey of Deep Learning Methods for Cyber Security. Information 2019, 10, 122. [Google Scholar] [CrossRef]
  15. Mahdavifar, S.; Ghorbani, A.A. Application of Deep Learning to Cybersecurity: A Survey. Neurocomputing 2019, 347, 149–176. [Google Scholar] [CrossRef]
  16. Sarker, I.H. Deep Cybersecurity: A Comprehensive Overview from Neural Network and Deep Learning Perspective. SN Comput. Sci. 2021, 2, 154. [Google Scholar] [CrossRef]
  17. Arivukarasi, M.; Antonidoss, A. Performance Analysis of Malicious URL Detection by Using RNN and LSTM. In Proceedings of the 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), Erode, India, 11–13 March 2020; pp. 454–458. [Google Scholar]
  18. Alhoniemi, E.; Hollmén, J.; Simula, O.; Vesanto, J. Process Monitoring and Modeling Using the Self-Organizing Map. Integr. Comput. Aided Eng. 1999, 6, 3–14. [Google Scholar] [CrossRef]
  19. Marković, V.S.; Marjanović Jakovljević, M.; Njeguš, A. Anomalies Detection in the Application Logs Using Kohonen SOM Machine Learning Algorithm. In Proceedings of the SINTEZA 2020—International Scientific Conference on Information Technology and Data Related Research, Belgrade, Serbia, 26 June 2020; pp. 275–282. [Google Scholar] [CrossRef]
  20. Ramadas, M.; Ostermann, S.; Tjaden, B. Detecting Anomalous Network Traffic with Self-Organizing Maps; Springer: Berlin/Heidelberg, Germany, 2003; pp. 36–54. [Google Scholar]
  21. Mohammadi, M.; Al-Fuqaha, A.; Sorour, S.; Guizani, M. Deep Learning for IoT Big Data and Streaming Analytics: A Survey. IEEE Commun. Surv. Tutor. 2018, 20, 2923–2960. [Google Scholar] [CrossRef]
  22. Yousefi-Azar, M.; Varadharajan, V.; Hamey, L.; Tupakula, U. Autoencoder-Based Feature Learning for Cyber Security Applications. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, USA, 14–19 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 3854–3861. [Google Scholar]
  23. Wong, M.C.K. Deep Learning Models for Malicious Web Content Detection: An Enterprise Study; University of Toronto: Toronto, ON, Canada, 2019; ISBN 1392398290. [Google Scholar]
  24. Do, N.Q.; Selamat, A.; Krejcar, O.; Herrera-Viedma, E.; Fujita, H. Deep Learning for Phishing Detection: Taxonomy, Current Challenges and Future Directions. IEEE Access 2022, 10, 36429–36463. [Google Scholar] [CrossRef]
  25. Ren, F.; Jiang, Z.; Liu, J. A Bi-Directional LSTM Model with Attention for Malicious URL Detection. In Proceedings of the 2019 IEEE 4th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chengdu, China, 20–22 December 2019; pp. 300–305. [Google Scholar]
  26. Wu, Y.; Wei, D.; Feng, J. Network Attacks Detection Methods Based on Deep Learning Techniques: A Survey. Secur. Commun. Netw. 2020, 2020, 8872923. [Google Scholar] [CrossRef]
  27. Hinton, G.E.; Osindero, S.; Teh, Y.-W. A Fast Learning Algorithm for Deep Belief Nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef]
  28. Zhang, Q.; Yang, L.T.; Chen, Z.; Li, P. A Survey on Deep Learning for Big Data. Inf. Fusion 2018, 42, 146–157. [Google Scholar] [CrossRef]
  29. Li, Y.; Ma, R.; Jiao, R. A Hybrid Malicious Code Detection Method Based on Deep Learning. Int. J. Secur. Its Appl. 2015, 9, 205–216. [Google Scholar] [CrossRef]
  30. Asharf, J.; Moustafa, N.; Khurshid, H.; Debie, E.; Haider, W.; Wahab, A. A Review of Intrusion Detection Systems Using Machine and Deep Learning in Internet of Things: Challenges, Solutions and Future Directions. Electronics 2020, 9, 1177. [Google Scholar] [CrossRef]
  31. Hinton, G.E. A Practical Guide to Training Restricted Boltzmann Machines; Springer: Berlin/Heidelberg, Germany, 2012; pp. 599–619. [Google Scholar]
  32. Elseaidy, A.; Munasinghe, K.S.; Sharma, D.; Jamalipour, A. Intrusion Detection in Smart Cities Using Restricted Boltzmann Machines. J. Netw. Comput. Appl. 2019, 135, 76–83. [Google Scholar] [CrossRef]
  33. Fiore, U.; Palmieri, F.; Castiglione, A.; De Santis, A. Network Anomaly Detection with the Restricted Boltzmann Machine. Neurocomputing 2013, 122, 13–23. [Google Scholar] [CrossRef]
  34. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. Adv. Neural Inf. Process. Syst. 2014, 2, 2672–2680. [Google Scholar]
  35. Hiromoto, R.E.; Haney, M.; Vakanski, A. A Secure Architecture for IoT with Supply Chain Risk Management. In Proceedings of the 2017 9th IEEE International Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications (IDAACS), Bucharest, Romania, 21–23 September 2017; IEEE: Piscataway, NJ, USA, 2017; Volume 1, pp. 431–435. [Google Scholar]
  36. Zhao, X.; Fok, K.W.; Thing, V.L.L. Enhancing Network Intrusion Detection Performance Using Generative Adversarial Networks. Comput. Secur. 2024, 145, 104005. [Google Scholar] [CrossRef]
  37. Geetha, R.; Thilagam, T. A Review on the Effectiveness of Machine Learning and Deep Learning Algorithms for Cyber Security. Arch. Comput. Methods Eng. 2021, 28, 2861–2879. [Google Scholar] [CrossRef]
  38. Pooja, A.L.; Sridhar, M. Analysis of Phishing Website Detection Using CNN and Bidirectional LSTM. In Proceedings of the 2020 4th International Conference on Electronics, Communication and Aerospace Technology (ICECA), Coimbatore, India, 5–7 November 2020; pp. 1620–1629. [Google Scholar]
  39. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and Applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  40. Glisic, S.G.; Lorenzo, B. Graph Neural Networks; Wiley: Hoboken, NJ, USA, 2022. [Google Scholar]
  41. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  42. Yao, L.; Mao, C.; Luo, Y. Graph Convolutional Networks for Text Classification. Proc. AAAI Conf. Artif. Intell. 2019, 33, 7370–7377. [Google Scholar] [CrossRef]
  43. Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Plaza, A.; Chanussot, J. Graph Convolutional Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 5966–5978. [Google Scholar] [CrossRef]
  44. Zhang, S.; Tong, H.; Xu, J.; Maciejewski, R. Graph Convolutional Networks: A Comprehensive Review. Comput. Soc. Netw. 2019, 6, 11. [Google Scholar] [CrossRef]
  45. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph Attention Networks. arXiv 2017, arXiv:1710.10903. [Google Scholar]
  46. Ma, X.; Yin, Y.; Jin, Y.; He, M.; Zhu, M. Short-Term Prediction of Bike-Sharing Demand Using Multi-Source Data: A Spatial-Temporal Graph Attentional LSTM Approach. Appl. Sci. 2022, 12, 1161. [Google Scholar] [CrossRef]
  47. Kim, H.; Lee, B.S.; Shin, W.Y.; Lim, S. Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges. IEEE Access 2022, 10, 111820–111829. [Google Scholar] [CrossRef]
  48. Zhang, Y.; Zhao, Y.; Li, Z.; Cheng, X.; Wang, Y.; Kotevska, O.; Derr, T. A Survey on Privacy in Graph Neural Networks: Attacks, Preservation, and Applications. IEEE Trans. Knowl. Data Eng. 2024, 36, 7497–7515. [Google Scholar] [CrossRef]
  49. Zhao, C.; Xin, Y.; Li, X.; Zhu, H.; Yang, Y.; Chen, Y. An Attention-Based Graph Neural Network for Spam Bot Detection in Social Networks. Appl. Sci. 2020, 10, 8160. [Google Scholar] [CrossRef]
  50. Ren, Y.; Wang, B.; Zhang, J.; Chang, Y. Adversarial Active Learning Based Heterogeneous Graph Neural Network for Fake News Detection. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 452–461. [Google Scholar]
  51. Wang, Y.; Qian, S.; Hu, J.; Fang, Q.; Xu, C. Fake News Detection via Knowledge-Driven Multimodal Graph Convolutional Networks. In Proceedings of the 2020 International Conference on Multimedia Retrieval, Dublin, Ireland, 8–11 June 2020; ACM: New York, NY, USA, 2020; pp. 540–547. [Google Scholar]
  52. Guo, Z.; Tang, L.; Guo, T.; Yu, K.; Alazab, M.; Shalaginov, A. Deep Graph Neural Network-Based Spammer Detection under the Perspective of Heterogeneous Cyberspace. Future Gener. Comput. Syst. 2021, 117, 205–218. [Google Scholar] [CrossRef]
  53. Ouyang, L.; Zhang, Y. Phishing Web Page Detection with HTML-Level Graph Neural Network. In Proceedings of the 2021 IEEE 20th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Shenyang, China, 20–22 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 952–958. [Google Scholar]
  54. Fang, Y.; Huang, C.; Zeng, M.; Zhao, Z.; Huang, C. JStrong: Malicious JavaScript Detection Based on Code Semantic Representation and Graph Neural Network. Comput. Secur. 2022, 118, 102715. [Google Scholar] [CrossRef]
  55. Ariyadasa, S.; Fernando, S.; Fernando, S. Combining Long-Term Recurrent Convolutional and Graph Convolutional Networks to Detect Phishing Sites Using URL and HTML. IEEE Access 2022, 10, 82355–82375. [Google Scholar] [CrossRef]
  56. Huang, Y.; Negrete, J.; Wagener, J.; Fralick, C.; Rodriguez, A.; Peterson, E.; Wosotowsky, A. Graph Neural Networks and Cross-Protocol Analysis for Detecting Malicious IP Addresses. Complex Intell. Syst. 2023, 9, 3857–3869. [Google Scholar] [CrossRef]
  57. Zhang, B.; Li, J.; Chen, C.; Lee, K.; Lee, I. A Practical Botnet Traffic Detection System Using GNN; Springer: Cham, Switzerland, 2022; pp. 66–78. [Google Scholar]
  58. Pan, Y.; Cai, L.; Leng, T.; Zhao, L.; Ma, J.; Yu, A.; Meng, D. AttackMiner: A Graph Neural Network Based Approach for Attack Detection from Audit Logs; Springer: Cham, Switzerland, 2023; pp. 510–528. [Google Scholar]
  59. Yang, Z.; Pei, W.; Chen, M.; Yue, C. WTAGRAPH: Web Tracking and Advertising Detection Using Graph Neural Networks. In Proceedings of the 2022 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 22–26 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1540–1557. [Google Scholar]
  60. Zhou, X.; Liang, W.; Li, W.; Yan, K.; Shimizu, S.; Wang, K.I.-K. Hierarchical Adversarial Attacks Against Graph-Neural-Network-Based IoT Network Intrusion Detection System. IEEE Internet Things J. 2022, 9, 9310–9319. [Google Scholar] [CrossRef]
  61. Li, Y.; Li, R.; Zhou, Z.; Guo, J.; Yang, W.; Du, M.; Liu, Q. GraphDDoS: Effective DDoS Attack Detection Using Graph Neural Networks. In Proceedings of the 2022 IEEE 25th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Hangzhou, China, 4–6 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1275–1280. [Google Scholar]
  62. Lo, W.W.; Kulatilleke, G.; Sarhan, M.; Layeghy, S.; Portmann, M. XG-BoT: An Explainable Deep Graph Neural Network for Botnet Detection and Forensics. Internet Things 2023, 22, 100747. [Google Scholar] [CrossRef]
  63. Ren, W.; Song, X.; Hong, Y.; Lei, Y.; Yao, J.; Du, Y.; Li, W. APT Attack Detection Based on Graph Convolutional Neural Networks. Int. J. Comput. Intell. Syst. 2023, 16, 184. [Google Scholar] [CrossRef]
  64. Gao, M.; Wu, L.; Li, Q.; Chen, W. Anomaly Traffic Detection in IoT Security Using Graph Neural Networks. J. Inf. Secur. Appl. 2023, 76, 103532. [Google Scholar] [CrossRef]
  65. Bao, H.; Li, W.; Wang, X.; Tang, Z.; Wang, Q.; Wang, W.; Liu, F. Payload Level Graph Attention Network for Web Attack Traffic Detection; Springer: Cham, Switzerland, 2023; pp. 394–407. [Google Scholar]
66. Maksimoski, A.; Woungang, I.; Traore, I.; Dhurandher, S.K. Botnet Detection Mechanism Using Graph Neural Network; Springer: Cham, Switzerland, 2023; pp. 247–257. [Google Scholar]
  67. Lin, H.-C.; Wang, P.; Lin, W.-H.; Lin, Y.-H.; Chen, J.-H. Malware Detection and Classification by Graph Neural Network. In Proceedings of the 2023 IEEE 5th Eurasia Conference on IOT, Communication and Engineering (ECICE), Yunlin, Taiwan, 27–29 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 623–625. [Google Scholar]
  68. Hosseini, D.; Jin, R. Graph Neural Network Based Approach for Rumor Detection on Social Networks. In Proceedings of the 2023 International Conference on Smart Applications, Communications and Networking (SmartNets), Istanbul, Turkey, 25–27 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
  69. Hussain, S.; Nadeem, M.; Baber, J.; Hamdi, M.; Rajab, A.; Al Reshan, M.S.; Shaikh, A. Vulnerability detection in Java source code using a quantum convolutional neural network with self-attentive pooling, deep sequence, and graph-based hybrid feature extraction. Sci. Rep. 2024, 14, 7406. [Google Scholar] [CrossRef] [PubMed]
  70. Shi, S.; Chen, J.; Wang, Z.; Zhang, Y.; Zhang, Y.; Fu, C.; Qiao, K.; Yan, B. SStackGNN: Graph Data Augmentation Simplified Stacking Graph Neural Network for Twitter Bot Detection. Int. J. Comput. Intell. Syst. 2024, 17, 106. [Google Scholar] [CrossRef]
  71. Zhang, G.; Zhang, S.; Yuan, G. Bayesian Graph Local Extrema Convolution with Long-tail Strategy for Misinformation Detection. ACM Trans. Knowl. Discov. Data 2024, 18, 89. [Google Scholar] [CrossRef]
  72. Zhou, Y.; Pang, A.; Yu, G. Clip-GCN: An adaptive detection model for multimodal emergent fake news domains. Complex Intell. Syst. 2024, 10, 5153–5170. [Google Scholar] [CrossRef]
  73. Altaf, T.; Wang, X.; Ni, W.; Yu, G.; Liu, R.P.; Braun, R. GNN-Based Network Traffic Analysis for the Detection of Sequential Attacks in IoT. Electronics 2024, 13, 2274. [Google Scholar] [CrossRef]
  74. Pattanaik, B.; Mandal, S.; Tripathy, R.M.; Sekh, A.A. Rumor detection using dual embeddings and text-based graph convolutional network. Discov. Artif. Intell. 2024, 4, 86. [Google Scholar] [CrossRef]
  75. Chukka, R.B.; Suneetha, M.; Ahmed, M.A.; Babu, P.R.; Ishak, M.K.; Alkahtani, H.K.; Mostafa, S.M. Hybridization of synergistic swarm and differential evolution with graph convolutional network for distributed denial of service detection and mitigation in IoT environment. Sci. Rep. 2024, 14, 30868. [Google Scholar] [CrossRef]
  76. Guo, W.; Du, W.; Yang, X.; Xue, J.; Wang, Y.; Han, W.; Hu, J. MalHAPGNN: An Enhanced Call Graph-Based Malware Detection Framework Using Hierarchical Attention Pooling Graph Neural Network. Sensors 2025, 25, 374. [Google Scholar] [CrossRef]
Figure 1. RNN architecture [14].
Figure 2. LSTM Cell [23].
Figure 3. Deep Belief Network [30].
Figure 4. Generative Adversarial Network [30].
Figure 5. CNN model [37].
Figure 6. The general design pipeline for a GNN model [40].
Figure 7. Graph Convolution Network [41].
Figure 8. Multi-Head Attention Method [45].
Figure 9. Comparative Architecture of GNN Models for Cyberattack Detection.
Table 1. Summary of GNN algorithms.
No | Study | Year | Algorithm | Description | Limitations
1 | [49] | 2020 | GAT | The paper proposes constructing a detection model by aggregating neighbor relationships and features; because the model can learn complex patterns, it can integrate different types of neighbor relationships. | The method relies on an abundance of hidden features that may not always be available, and the detection model depends on structural data, which may affect accuracy.
2 | [50] | 2020 | GNN | The model applies a novel hierarchical attention mechanism over a heterogeneous information network (HIN) to learn node representations; AA-HGNN then detects fake news by classifying new nodes during active learning. | Performance may be inadequate on large and complex datasets.
3 | [51] | 2020 | GCN | A unified detection framework combines visual and textual information; to enrich the semantics, the model converts this content into a graph and derives external real-world knowledge, which is represented as nodes. | The high cost of the extraction process may limit its scalability.
4 | [52] | 2021 | GNN | The proposed model strengthens feature representations and generates richer feature spaces. | The model may not be equipped to handle spammers who use advanced programs to conceal their tracks and identities.
5 | [53] | 2021 | GCN | The model builds a graph from the DOM trees of HTML source code and employs an embedding method together with recurrent neural networks (RNNs) to identify important node features and represent their semantics. | The method that uses URLs as input was found to be less accurate than methods that use HTML as input.
6 | [54] | 2022 | GAT | The model embeds the nodes and edges of the graph and generates an abstract syntax tree that incorporates flow information. | Lack of up-to-date data and of a comprehensive dataset covering a wide range of malicious code, including JavaScript.
7 | [55] | 2022 | GCN | The model comprises two components, URLDET and HTMLDET, built with long-term recurrent convolutional networks (LRCN) and GCN to process URLs and HTML, respectively. | At times, the model may not be saved correctly, which could adversely affect its performance.
8 | [56] | 2022 | GCN | Malicious IPs are identified using both a novel approach and a random forest model; web-protocol and targeted email features are combined, and discriminant features are incorporated into the graph. | The model's features were not enhanced enough to significantly impact its performance.
9 | [57] | 2022 | GNN | The research proposes a novel traffic-detection approach that uses graph neural network (GNN) models for node classification. The system consists of three modules, for data processing, data classification, and result visualization, which together perform data encoding, feature extraction, and visualization. | The network topology diagram was tested only on botnet traffic; other anomalous traffic, such as virus or worm traffic, was not tested.
10 | [58] | 2022 | GAT | The model combines deep learning techniques with causal analysis of provenance graphs: input audit logs are converted into provenance graphs, and key patterns are built to determine whether an attack has occurred. | N/A
11 | [59] | 2022 | GNN | HTTP network traffic is represented as a homogeneous multigraph, and web tracking and advertising detection is formulated as GNN-based edge representation learning. | The CDN server may experience performance issues when detecting new WTA requests.
12 | [60] | 2022 | GCN | This research introduces a hierarchical adversarial attack (HAA) method, a level-aware black-box adversarial attack technique aimed at GNN-based intrusion detection systems in resource-constrained IoT environments. A shadow GNN model is built using an intelligent mechanism that includes a map to generate adversarial samples, and the approach accurately identifies important features. | N/A
13 | [61] | 2022 | GNN | The model creates an endpoint traffic graph that includes structural relationships. | The DDoS dataset may not have been tested effectively across different sizes or traffic volumes.
14 | [62] | 2023 | GCN | XG-BoT uses a graph isomorphism network with reversible residual connections to learn important node representations from botnet communication networks. | The model experiences issues with its network communications.
15 | [63] | 2023 | GCN | The paper presents a method for detecting advanced persistent threat (APT) attacks. Graph convolutional networks (GCN) are used to identify vulnerabilities and establish relationships between existing APT threats by analyzing the names of software security entities from CAPEC, CWE, and CVE and using these data to form a graph representing APT attack practices. | There is a possibility of information loss when simplifying the heterogeneous APT attack graph.
16 | [64] | 2023 | GCN | The IoT network is represented as a graph whose nodes are designed to optimize efficiency; a meta-path-based aggregation strategy enhances the graph convolutional network and produces low-dimensional representations of the graph nodes. | Only two downloaded datasets were used to test the model.
17 | [65] | 2023 | GAT | An independent graph is generated for each pre-processed payload, with node representations shared through a global feature matrix; the model then uses GAT to train graph classification models. | The model may not be capable enough to effectively tackle the problem of traffic encryption.
18 | [66] | 2023 | GCN | A graph neural network (GNN) is employed to efficiently detect malicious botnet activities using supervised learning. The model was evaluated on labeled datasets using five metrics. | The model may be less effective when compared with other benchmark models and datasets.
19 | [67] | 2023 | GCN | Malware and its variants are classified using a graph convolutional network (GCN) that determines their potential features; Cuckoo Sandbox logs are also employed. | The model may be less effective than other models in comparison.
20 | [68] | 2023 | GAT | The model consists of three steps: it first constructs a graph, then incorporates different features into this graph-based representation, and finally employs a GAT to integrate neighboring information with these features. | The model's results are based on inadequate data.
21 | [69] | 2024 | GCN | A hybrid GCN-RFEMLP model integrated with CodeBERT utilizes quantum convolutional neural networks with self-attentive pooling to identify vulnerabilities in Java code by analyzing code patterns and structures. | N/A
22 | [70] | 2024 | GNN | The stacked graph neural network employs streamlined graph stacking techniques to enhance data augmentation, with the goal of improving GNN performance in semi-supervised learning tasks. | N/A
23 | [71] | 2024 | GNN | A Bayesian graph convolutional network incorporates local extrema convolution to tackle challenges in long-tail graph data, enhancing node representation. | N/A
24 | [72] | 2024 | GCN | The Clip-GCN model comprises three modules: cross-modal feature extraction, domain detection, and news detection. The feature extraction module captures semantic relationships between modalities, the domain detection module extracts domain-invariant features, and the news detection module uses cross-domain knowledge to assess news authenticity. | N/A
25 | [73] | 2024 | GCN | The model uses a novel GGCN architecture combined with sequential analysis to capture and analyze the temporal dynamics of network traffic, enhancing its ability to detect and respond to evolving cyber threats. | Multiclass classification increases complexity due to feature overlap and data imbalance, affecting generalization. The model struggled with the diverse attacks in the Mirai dataset, leading to lower performance than on BoT-IoT.
26 | [74] | 2024 | GCN | A novel rumor detection model integrates dual embeddings from BERT and GPT with a graph-based approach, constructing text-based graphs to capture contextual relationships between rumors and their propagation on social media. | The model performed well on PHEME but struggled on Twitter15, possibly due to data quality, hyperparameter tuning, or graph structure. Relying solely on BERT and GPT may be insufficient, and the reasons for the performance issues lack experimental validation.
27 | [75] | 2024 | GCN | The SSODE-GCNDM method begins with Z-score normalization to standardize the input data, employs the SSO-DE approach for feature selection, and uses the GCN technique to detect and mitigate attacks; finally, the NGO method fine-tunes the GCN parameters to enhance effectiveness. | The model is sensitive to data quality, prone to overfitting with high-dimensional data, and limited by its reliance on specific optimization algorithms.
28 | [76] | 2025 | GCN | The MalHAPGNN model utilizes enhanced call graphs and a GNN to identify malware behaviors by examining call sequences and execution patterns in programs. | N/A
Note: “N/A” (Not Applicable) indicates that the original study did not explicitly report any limitations. This does not imply that the study has no limitations.
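Most of the architectures summarized in Table 1 build on the graph convolution propagation rule of Kipf and Welling [41] or its attention-based variants [45]. For orientation only, the sketch below, written in plain PyTorch with illustrative layer sizes and a toy graph that are assumptions rather than details from any surveyed study, shows the node-classification setup many of these detectors follow: each node (for example, a flow endpoint, URL, or user account) receives a benign/malicious score after two rounds of neighborhood aggregation.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One layer of the Kipf-Welling rule: H' = D^-1/2 (A + I) D^-1/2 H W."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        a_hat = adj + torch.eye(adj.size(0), device=adj.device)          # add self-loops
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)                        # D_hat^{-1/2}
        norm_adj = deg_inv_sqrt[:, None] * a_hat * deg_inv_sqrt[None, :] # symmetric normalization
        return norm_adj @ self.linear(x)                                 # propagate, then transform

class GCNNodeClassifier(nn.Module):
    """Two-layer GCN scoring each node (e.g., IP, URL, account) as benign or malicious."""
    def __init__(self, in_dim: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.gc1 = GCNLayer(in_dim, hidden_dim)
        self.gc2 = GCNLayer(hidden_dim, num_classes)

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        h = F.relu(self.gc1(x, adj))
        return self.gc2(h, adj)   # raw logits; pair with cross-entropy loss during training

# Toy example: 4 nodes, 8 features each, small undirected adjacency matrix (all values illustrative).
adj = torch.tensor([[0., 1., 0., 0.],
                    [1., 0., 1., 0.],
                    [0., 1., 0., 1.],
                    [0., 0., 1., 0.]])
x = torch.randn(4, 8)
model = GCNNodeClassifier(in_dim=8, hidden_dim=16, num_classes=2)
logits = model(x, adj)            # shape (4, 2): per-node benign/malicious scores
```
GAT-based studies in the table replace the fixed symmetric normalization above with learned attention coefficients over each node's neighbors; the overall node-scoring pipeline is otherwise similar.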
Table 2. List of available datasets for attack detection using GNN.
No | Detection Task | Dataset | Data Type | Description | Related Work
1 | Bot Detection | Twitter 1KS-10KN | Twitter user activities | Uses Twitter user activity and interaction patterns, such as retweets and followers, to build graphs that help identify automated bots. | [49]
2 | Fake Content Detection | PolitiFact, BuzzFeed | Social media posts, images | Models content relationships using social media posts, images, and links to detect fake or misleading information. | [50]
3 |  | Pheme, WEIBO | Tweets, news, and images |  | [51], [72], [71]
4 | Spam Detection | Weibo | Social media posts, accounts | Analyzes social platform posts and user accounts, using graph-based models to identify repetitive or coordinated spam behavior. | [52]
5 | Web Content | OpenPhish, PhishTank | URLs and HTML | Focuses on detecting malicious or deceptive web content such as phishing pages and scripts. | [53], [54]
6 |  |  |  |  | [55]
7 |  | McAfee | Emails, web, and DNS |  | [56]
8 | Malware | Tranco | Benign and malicious JavaScript code, DDoS | Involves datasets collected from scripts or APKs labeled as malicious or benign. | [54], [67], [69], [76]
9 | Audit Log Detection | DARPA TC | Audit log activities | Uses audit logs from monitored systems to build temporal graphs, allowing the detection of advanced persistent threats. | [58]
10 | Advertising Detection | Chromium | HTTP requests, DOM, API access for webpages | Detects abnormal advertising and tracking behaviour using user interaction data such as Chromium activities. | [59]
11 |  | UNSW-SOSR2019, BoT-IoT, and Mirai | IoT attack traces |  | [60], [73]
12 | Traffic Detection | CIC-IDS2017, CIC-DoS2017 | DoS and DDoS traffic | Utilizes network flow datasets to generate communication graphs between nodes, helping detect patterns of DDoS or other volumetric attacks. | [60], [61]
13 |  | CTU-13 | Network flows, HTTPS, DDoS |  | [62], [66]
14 |  |  |  |  |
15 |  | CIDA | IP traces |  | [57]
16 |  | BoT-IoT, ISOT, CICDDoS2019 | Network traffic |  | [64], [75]
17 |  | CSIC2010, FWAF, TBWIDD, BDCI2022 | HTTPS traffic |  | [65]
18 | Web, Application, and Service Content | CVE, CWE, CAPEC | CVE, CWE, CAPEC, and APT reports | Analyzes source code and threat intelligence to detect structural software vulnerabilities by modeling functions or APIs. | [63], [70]
19 | Rumour Source Detection | Pheme, Twitter15 | Tweets, posts | Models tweet threads and reply interactions to trace back the source of misinformation using graph models. | [68], [74]
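Several of the traffic-oriented entries in Table 2 (e.g., CIC-IDS2017, CTU-13, BoT-IoT) are described as being converted into communication graphs before a GNN is applied. The sketch below is only an illustration of that preprocessing idea; the flow fields (src_ip, dst_ip, bytes, packets) and the node/edge layout are hypothetical and not taken from any of the cited studies.
```python
import torch

# Hypothetical flow records: (src_ip, dst_ip, bytes, packets)
flows = [
    ("10.0.0.1", "10.0.0.2", 1200, 10),
    ("10.0.0.2", "10.0.0.3", 300, 4),
    ("10.0.0.1", "10.0.0.3", 90000, 700),
]

# One node per distinct IP address, indexed deterministically.
nodes = {ip: i for i, ip in enumerate(sorted({ip for f in flows for ip in f[:2]}))}

# edge_index holds (source, target) node pairs; edge_attr holds per-flow features.
edge_index = torch.tensor(
    [[nodes[s] for s, d, *_ in flows],
     [nodes[d] for s, d, *_ in flows]],
    dtype=torch.long,
)
edge_attr = torch.tensor([[b, p] for *_, b, p in flows], dtype=torch.float)

print(nodes)       # {'10.0.0.1': 0, '10.0.0.2': 1, '10.0.0.3': 2}
print(edge_index)  # shape (2, 3): one column per observed flow
print(edge_attr)   # shape (3, 2): bytes and packets per flow
```
The resulting edge_index and edge_attr tensors are the usual inputs to message-passing models such as the GCN sketch after Table 1, with per-node features either aggregated from incident flows or initialized from host-level attributes.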
Table 3. Summary of GNN model results.
Study | Precision (%) | Recall (%) | F1-Score (%) | Accuracy (%)
[49] | 93 | 88 | 91 | -
[68] | - | - | 80 | 82
[54] | 99.93 | 99.96 | 99.96 | 99.95
[58] | 100.00 | 99.12 | 99.56 | 97.72
[50] | 72.11 | 69.09 | 70.57 | 73.51
[52] | - | - | 93.46 | -
[61] | 95.05 | 94.07 | 94.56 | 97.51
[51] | 87.62 | 87.65 | 87.64 | 87.56
[56] | 88.90 | 58.03 | 90.22 | 85.28
[53] | 93.45 | 80.25 | 86.34 | 95.50
[55] | 96.40 | 96.44 | 96.42 | 96.42
[57] | 99.4 | 98.4 | 98.91 | -
[59] | 98.38 | 96.25 | 97.18 | 97.90
[62] | 99.63 | 99.42 | 99.52 | -
[63] | 95.8 | 95.8 | 95.8 | 95.9
[64] | 93.2 | 99.7 | 96.3 | -
[66] | 100 | 77 | 87 | 78
[67] | 96.86 | 96.72 | - | 97.78
[69] | 98.00 | 96.00 | 97.00 | 99.00
[70] | - | - | 97.59 | 97.59
[71] | - | - | 88.39 | 86.81
[72] | 87.56 | 87.01 | 87.06 | 87.79
[73] | 98.95 | 98.68 | 98.88 | 98.86
[74] | - | - | - | 88.64
[75] | 99.72 | 99.62 | 99.67 | 99.62
[76] | 98.96 | 98.78 | 98.87 | 98.90
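For readers cross-checking the values in Table 3, the reported metrics follow their standard definitions in terms of true/false positives and negatives; the formulas below are a reference formulation rather than a computation reproduced from any individual study.
```latex
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}, \qquad
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```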
Table 4. Summary of future study directions.
Study | Future Direction
[49] | The authors aim to enhance the performance of spam bot detection in social networks.
[51] | The authors intend to employ an improved method and extract visual information to improve the current model's ability to identify and classify fake news.
[54] | Future work will focus on large-scale detection of harmful behavior and analysis of JavaScript code.
[56] | The authors intend to perform experiments on larger datasets and investigate approximation methods that cover other forms of malicious network traffic, such as worms or viruses.
[57] | The current model was not tested on other malicious traffic, such as worm or virus attack traffic; the authors intend to expand the testing process accordingly.
[59] | The authors plan to study more capable attacks to test the effectiveness of the model's performance.
[63] | The authors intend to extract additional threat indicators from APT threat data to improve performance.
[64] | The authors aim to enhance the efficiency of the current model to better address the trend toward traffic encryption.
[66] | The authors want to evaluate the model on many different datasets and to compare its performance with existing benchmark models.
[68] | Given the constraints of the current rumour analysis dataset, the authors intend to build a more varied dataset.
[69] | The authors intend to extend the system to additional programming languages to evaluate its effectiveness across various codebases, and to investigate applications in NLP tasks to improve time efficiency, reduce costs, and optimize memory usage.
[71] | The authors intend to extend the current neural network into hyperbolic space, which could improve the learning of propagation structures.
[73] | The authors intend to focus on improving the model's performance in multiclass classification tasks.
[76] | The authors intend to offer more detailed insights into malware functions, thereby improving the interpretability of malware detection results.
Note: Only studies that explicitly stated future research directions are included in this table.
