A Causal Graph-Based Approach for APT Predictive Analytics

Liu, Haitian; Jiang, Rong

doi:10.3390/electronics12081849


by Haitian Liu *,† and Rong Jiang †

College of Computer Science and Technology, National University of Defense Technology, Changsha 410073, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Electronics 2023, 12(8), 1849; https://doi.org/10.3390/electronics12081849

Submission received: 9 March 2023 / Revised: 30 March 2023 / Accepted: 10 April 2023 / Published: 13 April 2023

(This article belongs to the Section Networks)


Abstract:

In recent years, complex multi-stage cyberattacks have become more common, and audit log data are a good source of information for monitoring them online. However, predicting cyber threat events based on audit logs remains an open research problem. This paper explores advanced persistent threat (APT) audit log information and uses a combination of causal graphs and deep learning techniques to perform predictive analysis of APTs. The study focuses on two different methods of constructing malicious activity scenarios: one based on malicious entity evolving graphs and one based on malicious entity neighborhood graphs. Deep learning networks are then utilized to learn from past malicious activity scenarios and predict specific malicious attack events. To validate the effectiveness of this approach, audit log data published by DARPA’s Transparent Computing Program and restored by ATLAS are used to demonstrate the confidence of the prediction results and to recommend the most likely malicious events by Top-N ranking.

Keywords:

APT; causal graph; evolving graph; neighborhood graph; deep learning; prediction

1. Introduction

Cyberattacks have become more common; they often cause significant economic damage and can even hinder the operation of core public services. In addition, advanced, persistent cyber threats have recently re-emerged due to the advent of the Internet of Things and the increased number of interconnected devices [1]. For example, in May 2017, the “WannaCry” ransomware attack was detected after targeting 200,000 servers in over 150 countries [2]. In the same year, another form of the same attack caused disruptions to most government websites and several companies in Ukraine, and eventually, the attack spread globally [3].

The forms of network attacks are complex and diverse, and more types of attacks have changed from simple one-step attacks to new composite attacks [4]. The techniques used by attackers to attack computer systems and networks have reached an unprecedented level of sophistication: attackers combine multiple steps to achieve their goals in a premeditated manner, as exemplified by advanced persistent threats (APTs) [5,6].

An APT requires the execution of a series of attack stages. The individual stages may appear benign or malicious, and occasionally every attack stage can behave as a benign stage without raising any suspicion. In addition, attacks may last for weeks or years, and traditional intrusion detection systems (IDSs) may not be able to detect these attacks due to the time variation between attack stages [3]. A new intrusion detection model is necessary to address these threats, identify ongoing attacks early, and anticipate the attacker’s further strategies as much as possible. This approach will provide network analysts with a foundation for preventing attacks. Yet, detection and prediction of various types of dynamic attacks is always a challenging task [7].

In this regard, a variety of different mechanisms can be used to achieve the detection and prediction of multi-stage attacks. These mechanisms include discrete models such as attack graphs, Bayesian networks, Markov models, and game theory or continuous models such as time series and grey models. Among the various graph models, causal graphs appear to be an ideal threat analysis approach, linking causal events in a system, with powerful semantic representation and attack history correlation capabilities.

Audit log data are a good source of information for online monitoring and anomaly/attack detection, because they record system status and significant events at various critical moments to help debug performance problems and failures and to support root cause analysis. In addition, as system logs record noteworthy events that occur during active running processes, such log data are available in almost all computer systems. This universal availability makes audit logs a natural basis for constructing causal graphs, which is already a very common practice.

However, the prediction of multi-stage attacks based on causal graphs remains an open problem, and previous research on cyber threat events based on causal graphs has mostly stopped at the detection of malicious events that have already occurred and the tracing of attack scenarios, while rarely considering the speculation of specific malicious attacks that will occur next.

Predicting malicious events in multi-stage attacks from audit log data and causal graphs raises a number of challenges, including but not limited to the following.

(1): The log data themselves are unstructured and may come from different operating platforms; their format and semantics may vary from platform to platform, and it is already challenging to use unstructured logs to diagnose problems.
(2): How can one reduce log complexity, minimize data storage size, and balance the space efficiency of causal graph storage with the time efficiency for attack investigation while maintaining the original semantics in audit log data?
(3): Even within the same type of attack, there is no definitive pattern indicating that a malicious event will always precede or follow another. It is possible to observe unrelated noisy events first or multiple malicious events occurring simultaneously from different adversary groups with distinct attacks.
(4): How can one design an efficient and robust malicious event prediction algorithm that ensures prediction accuracy while minimizing the response time for investigative forensics?
(5): There is no standard public dataset that provides real log data from different multi-stage attacks and, similarly, no solid standard to quantify and measure the malicious event prediction performance of such an architecture.

In order to meet the above challenges, this paper proposes a novel multi-stage attack malicious event predictive analysis framework, combined with natural language processing and deep learning techniques. Log data are generated by programs that follow a rigorous set of logic and control streams, similar to natural language. Event log entries can be viewed as sequence elements that follow specific patterns and grammar rules, despite being more structured and constrained. It is believed that different attacks may share similar abstract attack strategies and that the key stages usually share comparable patterns at the entity and action level, resulting in the triggering of a similar sequence of malicious events. When forecasting, it is possible to predict the remaining malicious events that will occur next, given a series of known malicious events. The task at hand involves developing a sequence prediction function that accurately learns and predicts the sequence of malicious events:

f : \{e_{1}, e_{2}, \dots, e_{t}\} \to e_{t + 1} \quad (1)

which accepts a variable-length input sequence \{e_{1}, e_{2}, \dots, e_{t}\} and predicts the target event e_{t + 1}. Based on the advantages of deep learning itself, our prediction system should be able to understand and make predictions given an event sequence of variable length as context, which will better meet real-world scenarios and different prediction needs.
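As an illustration of how training data for such a function might be prepared, the following minimal Python sketch turns one ordered event sequence into (context, next event) pairs with contexts of growing length. The function name `prefix_target_pairs`, the `min_context` parameter, and the event tokens are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def prefix_target_pairs(
    events: Sequence[str], min_context: int = 1
) -> List[Tuple[List[str], str]]:
    """Turn e_1..e_n into pairs ({e_1..e_t}, e_{t+1}) for every valid t."""
    pairs = []
    for t in range(min_context, len(events)):
        # The context is the variable-length prefix; the target is the next event.
        pairs.append((list(events[:t]), events[t]))
    return pairs


# Hypothetical lemmatized event tokens, for illustration only.
sequence = [
    "process_read_file",
    "process_connect_web",
    "process_write_file",
    "process_execute_file",
]
pairs = prefix_target_pairs(sequence)
for context, target in pairs:
    print(len(context), "->", target)
```

Each pair corresponds to one application of f in Equation (1); a sequence model would consume the context and be trained to output the target.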

Although fixed-length prediction may not always be ideal, it still holds value. Evaluating different sequence lengths helps to determine whether the predicted results are primarily due to long-term memory or short-term memory. Furthermore, it provides insight into the circumstances under which the best results for predicting malicious events can be achieved.

This article is organized as follows. Section 2 presents related work on cyber threat prediction analysis, focusing on different use cases based on graph models. Section 3 introduces the definitions and the overall framework, and Section 4 describes the proposed architecture in more detail. Section 5 gives the performance evaluation and measurement of the model. Finally, Section 6 summarizes the paper and outlines future work.

2. Related Work

In common network attack detection, the analysis and investigation of an attack often begin only after the attack is completed. Predictive analysis of network attacks, by contrast, focuses on timeliness and takes effect sooner, so that users can intervene in ongoing attacks or system performance issues.

Typically, predictive methods in cybersecurity use discrete models to represent attacks or network security situations. Clear examples are graphical models of attack processes or game-theoretic representations of interactions between attackers and defenders.

Figure 1 provides a simple profile of the cybersecurity use case, based on the discrete models that compose the approach under consideration. When it comes to predictive analytics of multi-stage attacks, the focus is primarily on attack projection. This involves recording the attacker’s behavior and constructing an attack description for future reference. If a series of events conform to the attack pattern, it can be assumed that the attack will continue along the same lines. In addition, researchers may be more interested in predicting novel attacks rather than analyzing previously observed attacks. Alternatively, researchers may prioritize forecasting the overall security situation rather than examining individual attacks.

Among the various graph models, the attack graph is a graphical representation of an attack scenario that was proposed by Phillips and Swiler [8] in 1998 and quickly became a popular formal attack representation. In 2003, Hughes et al. [9] provided an effective method for analyzing and predicting network threats based on network models, which is considered one of the earliest practical approaches to attack graphs and an effective static analysis method. Based on this, Polatidis et al. [10,11] constructed attack graphs using information about the underlying infrastructure and proposed a method for predicting network attacks using attack graphs and recommendation systems.

Another practical approach for attack prediction is using Bayesian networks, which are closely related to prediction methods based on attack graphs, as Bayesian networks are often constructed based on attack graphs. For example, Bayesian attack graphs are attack graphs in the form of Bayesian networks [12].

Hidden Markov models (HMMs) have been widely used in intrusion detection and attack prediction methods due to their ability to eliminate the dependence on complete information in graphical models, particularly when unobservable states and transitions exist. An early example is the alert correlation and prediction system proposed by Farhadi et al. [13] in 2011, which uses the Attack Scenario Extraction Algorithm (ASEA) to correlate and extract important alerts and then applies HMM for predictive analysis of intruder behavior. Another example is a new method based on HMM proposed by P. Holgado et al. [14] in 2020, which considers hidden states as similar stages of specific types of attacks and can easily adapt to multi-stage attacks and anticipate the attacker’s subsequent stages. Similarly, T. Shawly et al. [15] proposed a novel framework in 2021 based on HMM modeling to address the challenges of modeling and detecting complex network attacks (such as multiple interleaved attacks), which have not been addressed by previous methods.

Unlike the various graphical methods mentioned above, knowledge graphs are more geared toward dealing with larger and more dynamically changing real-time network attacks. For example, Jia Yan et al. [7] proposed a method for network security knowledge graph and deduction rules based on the five-tuple model in 2018. Qi et al. [4] further stored prior knowledge in the network security knowledge graph and attack rule library as computer-understandable data and then mined attack chains from massive data with temporal and spatial constraints, thus proposing an attack analysis framework for a network attack and defense testing platform.

In contrast to other graph models, causal/dependency graphs are often not directly applied to the problem of proactive attack prediction analysis but are widely used as a promising tool for APT attack detection. A causal graph offers a strong abstract representation and relatively high efficiency: it abstracts the interactions between components in opaque systems in a high-fidelity and visible way and links events in the system by cause and effect, regardless of the time between events. Thus, a comprehensive understanding of the entire attack is possible, which provides a natural observation platform for the predictive analysis of cyberattacks.

Causal graphs, also known as dependency graphs or provenance graphs, are more commonly used in the detection of APTs and the backtracking of attack scenarios. In Backtracker [16,17], researchers first explored the problem of piecing together the causal chains leading to an attack, i.e., the concept of attack tracing, based on the dependency graph for OS-level attack tracing, where backtracking is able to traverse the entire historical context of system execution given a detection point. Subsequent studies [18,19] have improved the accuracy of the dependency chains constructed by Backtracker. However, these efforts run in a purely forensic setting, i.e., backtracking all relevant events of the entire attack scenario, which requires a complete traceability graph and excessive manual intervention and is neither timely nor efficient. Such approaches cannot cope with the analysis of attack activities executed in real time, much less support proactive attack prediction.

First, a comprehensive causal graph-based threat analysis system can be divided into three modules: the data collection module, the data management module, and the threat detection module. Each module contains several components that address different research questions. In the end-to-end model, each module can be considered independently of the others. In the proposed causal graph-based malicious event prediction model, the first two modules are not significantly different from those of other traceability/causal graph-based threat detection systems.

Second, an ideal traceability graph-based threat analysis system needs to consider three attributes simultaneously: fast response, high efficiency, and high accuracy [20]. However, even after pruning, the size of the causal graph is very large. Therefore, threat analysis based on causal graphs may introduce high space and computational overhead. In previous work on causal graph-based threat detection, many attempts have been made by researchers to find a balance between these three properties. Based on the main detection designs, these approaches can be classified into three categories.

The tag propagation-based approaches (Hossain et al., 2017 [21]; Milajerdi et al., 2019 [22]) store system execution history incrementally in tags and use the tag propagation process to trace causality. These algorithms have roughly linear time complexity. Moreover, they can take streaming graphs as input and respond quickly. The anomaly detection approaches (Hassan et al., 2019 [23]; Liu et al., 2018 [24]; Xie et al., 2018, 2021 [25,26]) try to identify anomalous interactions between nodes. To do so, they model normal behavior by collecting historical data or data from parallel systems. The graph-matching-based approaches (Han et al., 2020 [27]; Liu et al., 2019 [28]) try to identify suspicious behavior by matching sub-structures in graphs. However, graph matching is computationally complex, so researchers have tried to extract graph features through graph embedding or graph sketching algorithms or to use approximation methods.

As a powerful artificial intelligence technology, deep learning has been widely applied in fields such as computer vision, natural language processing, and bioinformatics. It can learn complex patterns and relationships and extract valuable information from large amounts of data. This makes it possible to combine deep learning with various traditional models and algorithms, greatly improving the automation and efficiency of those models. In recent years, it has also been used in threat prediction tasks in various settings. For example, DeepLog, proposed by Du et al. [29], uses a deep neural network model with long short-term memory (LSTM) to model system logs as natural language sequences. However, although it makes predictions, it is essentially an anomaly detection model rather than a prediction model. If the error between the predicted and observed value vectors is within the high confidence interval of the Gaussian distribution, the parameter value vector of the incoming log entry is considered normal; otherwise, it is considered abnormal. DeepAG [30] builds on DeepLog and proposes a new method for threat detection and attack path prediction using bi-directional deep learning. Unlike the previous two methods, Tiresias [31] does not consider system logs but models security events themselves and demonstrates the feasibility of predicting security events through a recurrent neural network with recurrent memory cells, filling the gap in predicting the specific steps that attackers will take when carrying out attack activities. In addition, there are other examples of attack prediction, such as the prediction of system calls [32] and the combination of attack prediction and network security situation forecasting, using deep learning to predict different types of threats [33,34]. Prior to this, research on predicting attacks focused more on binary results. For example, Bilge et al. [35] proposed a system that analyzes the binary appearance logs of machines to predict which machines are at risk of infection, that is, whether attacks will occur.

3. Preliminaries

3.1. Definitions

Causal Graph. A causal graph G is a data structure extracted from audit logs, typically used for provenance tracing, that indicates causal relationships between subjects (e.g., processes) and objects (e.g., files or connections). The causal graph consists of nodes, which represent subjects and objects, connected by edges, which represent actions (e.g., read or connect) between subjects and objects. In this study, a directed cyclic causal graph is considered, with edges pointing from a subject (source) to an object (destination).

Neighborhood Graph [36]. Given a causal graph G, two nodes u and v are said to be neighbors if they are connected by an edge. The neighborhood of a node n is the subgraph of G consisting of n, its neighboring nodes, and the edges between n and those neighbors. Similarly, given a set of nodes {n₁, n₂, …, n_k}, a unified neighborhood graph is created by extracting all of these nodes together with the edges that connect them to their neighbors.
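The neighborhood definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is held as a plain list of (source, action, destination) edges, and the function name `neighborhood_graph` and the node labels are hypothetical.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, action, destination)


def neighborhood_graph(
    edges: List[Edge], seeds: Set[str]
) -> Tuple[Set[str], List[Edge]]:
    """Return the seed nodes, their neighbors, and the edges joining them."""
    # Keep every edge that touches at least one seed node.
    kept_edges = [e for e in edges if e[0] in seeds or e[2] in seeds]
    nodes = set(seeds)
    for src, _, dst in kept_edges:
        nodes.update((src, dst))
    return nodes, kept_edges


# Hypothetical toy graph.
edges = [
    ("A", "read", "B"),
    ("B", "write", "C"),
    ("C", "connect", "D"),
    ("E", "read", "F"),
]
nodes, sub_edges = neighborhood_graph(edges, {"B"})
print(sorted(nodes), sub_edges)
```

Passing several seed nodes at once yields the unified neighborhood graph described above.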

T-Evolving Graph [16]. Each entity in the evolving graph has a time threshold associated with it, which is the longest time an event can occur and be considered relevant to that entity. Based on this, the malicious entity detected at the current time point is used for initialization, and back-tracing is performed based on its acceptable time threshold T. By taking into account all the events that would have an impact on the malicious entity before T, a T-evolving graph based on the malicious entity is constructed.

Entity. The entity e is the unique system subject or object extracted from the causal graph, where it is represented as a node. The entities under consideration comprise processes, files, and network connections (i.e., IP addresses and domain names). For example, winword.exe_21 is a subject that represents a process instance of the MS Word application with a process name and ID, while 192.0.0.1:80 is an object that represents an IP address 192.0.0.1 with a port number 80.

Event. An event ε is a 4-tuple (src, action, dest, t), where the source (src) and the destination (dest) are the two entities associated with an action and t is the timestamp at which the event occurred. For example, given an edge labeled with the action open and the timestamp t from node Firefox.exe to node Word.doc, (Firefox.exe, open, Word.doc, t) is the event in which the Firefox process opens the Word file at time t.
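To make the entity and event definitions concrete, the sketch below models an event as an immutable record and groups events by source entity in time order. The class name `Event`, the helper `build_causal_graph`, and the integer timestamps are assumptions made for illustration; the entity names are the examples used in the definitions above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    src: str     # subject entity, e.g. a process instance
    action: str  # e.g. "open", "read", "connect"
    dest: str    # object entity, e.g. a file or an IP address with port
    t: int       # timestamp (integer ticks here, purely illustrative)


def build_causal_graph(events: Iterable[Event]) -> Dict[str, List[Event]]:
    """Adjacency list: each source entity maps to its outgoing events by time."""
    graph: Dict[str, List[Event]] = defaultdict(list)
    for ev in sorted(events, key=lambda e: e.t):
        graph[ev.src].append(ev)
    return dict(graph)


events = [
    Event("Firefox.exe", "open", "Word.doc", 2),
    Event("winword.exe_21", "connect", "192.0.0.1:80", 5),
    Event("Firefox.exe", "connect", "192.0.0.1:80", 1),
]
graph = build_causal_graph(events)
print([e.action for e in graph["Firefox.exe"]])
```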

3.2. Architecture Overview

Figure 2 depicts the proposed malicious event prediction framework, which combines natural language processing and deep learning techniques integrated into the causal graph. The framework is divided into two components, based on the training of the deep learning model and the real-time working process: (a) deep model learning based on malicious event sequences and (b) predictive analysis of malicious events in real time.

During the deep model learning based on malicious event sequences, existing attack patterns are mined using causal graphs, and their internal logic is understood through a concept similar to natural language processing. These attack patterns are then learned using deep learning techniques.

To increase the automation of multi-stage attack analysis, support fast detection and real-time analysis, and reduce storage and operation overhead, historical audit log data from various sources are transformed into a platform-independent causal graph by extracting essential system operation events and storing them in a graph data structure. The causal graph structure is stored in a graph database, which is a commonly used NoSQL database that stores data as nodes with edges and provides a semantic query interface for network analysts. This enables the execution of graph algorithms, such as backtracking and graph alignment, with ease.

Next, known malicious event sequences are extracted from the graph database, and the sequences are converted into a generalized context that represents the sequence pattern of semantic interpretation using word form reduction (lemmatization). The known multi-stage attack scenarios are then restored by combining the malicious event sequences after lemmatization. This paper considers both the neighborhood graph and the T-evolving graph for malicious event sequence construction. Lemmatization enables the effective grouping of words into different granularity individual terms for different levels of word forms, meeting the needs of network analysts for various levels of attack analysis and malicious event prediction.

To increase the level of automation during attack analysis and minimize the need for expert analysis knowledge, word embedding [37] is utilized to map word sequences to real vectors. Multi-stage attack scenarios across different instances are learned using suitable deep learning models, similar to audio generation and short sentence text completion. This enables the model to predict potential future attacks, automate the recommendation of possible future malicious events, and provide network analysts with the probability associated with the potential occurrence of a malicious event.

During the predictive analysis of malicious events in real time, network analysts can start the attack investigation from unknown audit logs with identified attack symptom entities (e.g., malware or suspicious host names) and restore a partial graph of the attack scenario that has occurred so far, based on the concepts of neighborhood graphs and T-evolving graphs. Even within the same type of attack campaign, there is no absolutely reliable pattern in which one malicious event follows or precedes another. Therefore, following the idea of Top-N recommendation systems, this paper converts the prediction of malicious events into a problem of ranking probabilities of occurrence: the model assigns a prediction score to each possible option for the next malicious event, and the N most likely malicious events are kept in the attack scenario graph and recommended to the network analysts.
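The ranking step can be illustrated with a short Python sketch. It assumes that the model emits one raw score per candidate event and that the scores are normalized with a softmax; the text above does not specify the normalization, so this choice, the function name `top_n_events`, and the candidate tokens are all assumptions.

```python
import math
from typing import Dict, List, Tuple


def top_n_events(scores: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Softmax-normalize raw scores and return the N most probable candidates."""
    if not scores:
        return []
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    probs = {name: v / total for name, v in exps.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical scores for three candidate next events.
scores = {
    "process_connect_web": 2.0,
    "process_write_file": 1.0,
    "process_read_file": 0.1,
}
ranked = top_n_events(scores, n=2)
for name, prob in ranked:
    print(f"{name}: {prob:.3f}")
```

Returning the probabilities along with the event names lets an analyst see how confident the model is in each recommendation.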

4. Proposed Architecture

4.1. Dependency Graph Management

As a first phase, the security analyst needs to deploy collectors on the target host to collect source information. There are two types of collectors: coarse-grained collectors that track system-level information flow and fine-grained collectors that track intra-process information flow. The system-level causal graph treats system-level entities as vertices and operations between entities as edges, generating a time-stamped stream of events that are directed and indicate data flow or control flow.

However, even with a coarse-grained collector, the size of the causal graph is very large, and the number of audit events reported on a formal enterprise network can easily reach billions to tens of billions per day. A graph database was utilized in this study, which extracted critical information from system events, stored all data as nodes with edges, and offered a semantic query interface. This made it relatively easy to execute graph algorithms; however, using graph databases still requires loading large amounts of data for long-running attack campaigns.

Pruning the causal graph is therefore essential, provided that it does not compromise the effectiveness of model learning of malicious event sequences or the accuracy of event prediction. Three basic graph reduction algorithms were utilized in this study to optimize the causal graph, as depicted in Figure 3 [38,39].

First, all nodes and edges that were not reachable from the attack symptom nodes (entities that may pose a threat) were eliminated. As with node F in Figure 3, all subgraphs of the causal graph that consisted only of benign entities were removed. Because this step relies on the global reachability of the causal graph, it does not affect the semantic relations related to malicious activities.
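A minimal sketch of this first reduction step is given below. It assumes that reachability is evaluated while ignoring edge direction, which the text does not state explicitly; the function name `prune_unreachable` and the toy graph are hypothetical and do not reproduce Figure 3.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, action, destination)


def prune_unreachable(edges: List[Edge], symptom_nodes: Set[str]) -> List[Edge]:
    """Keep only edges whose endpoints are connected to some symptom node."""
    # Assumption: direction is ignored when testing reachability.
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for src, _, dst in edges:
        adjacency[src].add(dst)
        adjacency[dst].add(src)

    reachable = set(symptom_nodes)
    queue = deque(symptom_nodes)
    while queue:  # breadth-first search from all symptom nodes
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in reachable:
                reachable.add(neighbor)
                queue.append(neighbor)

    return [e for e in edges if e[0] in reachable and e[2] in reachable]


edges = [
    ("A", "read", "B"),
    ("B", "write", "C"),
    ("F", "read", "G"),  # component with no symptom node
]
pruned = prune_unreachable(edges, {"A"})
print(pruned)
```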

Second, this study used the causality-preserving reduction (CPR) approach [38], which maintained the ability to perform causal analysis on causal graphs. A simple and intuitive definition of causality is that the first write to an object affects subsequent reads. Therefore, to avoid changing the causality between objects, CPR will only remove any duplicate writes/reads between a pair of objects, with no reads/writes to the target object, for example, the two actions T2.Read and T5.Read in Figure 3. Although the algorithm may lose statistical information, such as access frequency, etc., this was not the primary focus of our consideration.
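The idea of removing duplicate events can be sketched as follows. This is a deliberately conservative simplification and not the full algorithm of [38]: a repeated (src, action, dest) event is dropped only when no other event has touched either endpoint since the previous identical event. The function name `drop_redundant_events` and the sample events are hypothetical.

```python
from typing import Dict, List, Tuple

Event = Tuple[str, str, str, int]  # (src, action, dest, timestamp)


def drop_redundant_events(events: List[Event]) -> List[Event]:
    """Remove back-to-back repeats of the same (src, action, dest) event."""
    last_seen: Dict[str, Tuple[str, str, str]] = {}  # entity -> latest event key
    kept: List[Event] = []
    for src, action, dest, t in sorted(events, key=lambda e: e[3]):
        key = (src, action, dest)
        # Redundant only if the latest event on BOTH endpoints is this same key,
        # i.e. nothing else interleaved that could change causality.
        if last_seen.get(src) == key and last_seen.get(dest) == key:
            continue
        kept.append((src, action, dest, t))
        last_seen[src] = key
        last_seen[dest] = key
    return kept


events = [
    ("P", "read", "F", 1),
    ("P", "read", "F", 2),   # duplicate read with nothing in between: dropped
    ("Q", "write", "F", 3),  # a write to F changes what later reads observe
    ("P", "read", "F", 4),   # kept, because F was written at time 3
]
reduced = drop_redundant_events(events)
print(reduced)
```

As the paragraph above notes, a reduction of this kind discards statistics such as access frequency while keeping the causal order intact.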

Finally, although CPR preserves the semantics in the causal graph well, it has a limited data reduction rate. To further compress the causal graph, this study additionally used the full dependence-preserving reduction (FDR) [39] approach. The dependence-preserving reduction based on the causal graph considers only the basic operations on the causal graph, i.e., backward tracing and forward tracing. The aim of this reduction is to preserve forward and backward reachability at every instant; more precisely, backward forensic analysis and forward analysis from any node identify exactly the same set of nodes before and after the reduction. For instance, if nodes B1 and B2 in Figure 3 represented the same node, the nodes and edges were combined by FDR for optimization. Although this process may compromise some of the original temporal order of events used to construct the event sequence, it does not affect the semantics of the temporal order of events for the attack scenario construction.

4.2. Malicious Sequence Learning

Malicious event sequences with temporal relationships were extracted from the causal graph database through malicious scenario construction in two ways. Possible high-level semantic relationships in different malicious event sequences were extracted using word form reduction (lemmatization). After that, this study used word embeddings to transform the sequences into real vectors and learned the attack campaigns based on temporal sequences with the proposed deep learning model.

4.2.1. Malicious Events Sequence Lemmatization

Log entries, represented by log events, are considered as sequential elements that follow specific patterns and syntax rules. However, in many cases, there is no need to consider the exact entities that make up each log event or the precise representation of the operations. For example, the entities */svchost.exe_896 and */svchost.exe_920 are not fundamentally different from each other.

Lemmatization is often applied in natural language processing to group different inflected forms of a word into a single word [40]. Lemmatization can be utilized to transform the sequences into a generalized text that represents the sequence patterns for semantic interpretation. This technique preserves the original semantics of the complete sequence and facilitates model learning for the essence of attack strategies in different attack campaigns.

Based on [36], Table 1 gives the classification of the generalized types of entities and actions in log events. It comprises 16 types of entity tokens and 14 types of action tokens, which reduce the inflectional and derivationally related forms of words to a common base form. The vocabularies were classified into four types based on the coarse-grained semantics of the words: process, file, web, and action. All entities and actions of log events could be categorized into these tokens. To better extract the possible attack patterns and attack strategies in different attack campaigns, the granularity of the tokens could be further adjusted.
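A coarse-grained version of this generalization might look like the Python sketch below. It uses only the three coarse entity categories named above (process, file, web) rather than the 16 entity tokens of Table 1, and the regular expressions, the fallback to `file`, and the function names are assumptions made for illustration.

```python
import re

# Assumed patterns: an IPv4 address with optional port, and a process instance
# written as "<name>.exe_<pid>" as in the examples in the text.
_IP_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$")
_PROCESS_RE = re.compile(r"\.exe_\d+$", re.IGNORECASE)


def lemmatize_entity(name: str) -> str:
    """Map a concrete entity name to a coarse-grained token."""
    if _IP_RE.match(name):
        return "web"
    if _PROCESS_RE.search(name):
        return "process"
    return "file"  # fallback chosen for this sketch only


def lemmatize_event(src: str, action: str, dest: str) -> str:
    """Generalize one event into a three-word phrase of tokens."""
    return f"{lemmatize_entity(src)} {action.lower()} {lemmatize_entity(dest)}"


print(lemmatize_entity("*/svchost.exe_896"))
print(lemmatize_event("winword.exe_21", "Connect", "192.0.0.1:80"))
```

Under this mapping, process instances that differ only in their process ID collapse to the same token, which is the effect described for the two svchost.exe entities.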

4.2.2. Attack Scenario Graph Construction

This section describes how abstract attack scenarios are obtained from the causal graph by recovering the attack stories from known malicious events. This allows the deep learning model to learn different attack activities well and provides a precondition for the subsequent predictive analysis of multi-stage attacks.

Based on the set of events after lemmatization, this paper considered both the neighborhood graph and T-evolving graph for malicious event sequence extraction from two different perspectives in the attack story recovering and attack scenario graph construction. Key malicious event sequences were extracted for the same attack activity by commencing from different detection points. This approach enabled the restoration of different possible sub-attack scenarios, which allowed the deep learning model to learn various forms of the same attack activity from different perspectives.

Evolving Graph-Based Attack Scenario Construction

Assuming the administrator detected the infected system and identified at least one detection point, a new causal graph was constructed using back-tracing. The graph was constructed from the graph database of all entities and actions that affected the state of the detection point. Events that triggered dependencies between entities recorded at runtime were used to achieve this, where one entity affected the state of another entity [41].

It is not necessary or desirable to consider the complete causal graph when understanding the attack. Instead, the focus should be on only the relevant parts of the causal graph, treating it as an evolving graph to facilitate comprehension [16]. The T-evolving graph was initialized with the entity associated with the detection point, and the time threshold T associated with an object was the earliest time at which the analyst knew that the object's state was impaired. Because the log was processed in reverse time order, all events encountered in the log after the detection point would occur before the time threshold of all entities currently in the graph.

As shown in Figure 4, the complete causal graph is shown on the left, and the reduced 6-evolving graph is shown on the right at time point 6. Initializing malicious entity A as the detection point, the current time point was 6, i.e., the analyst detected malicious entity A at time point 6 and started the construction of the attack scenario based on the malicious entity. The highest priority was given to the event at time 5, and events after that time point were ignored because events after that time would not be causally related to any object in the current graph (malicious entity A). Similarly, the event at time 4 was ignored because this event was not causally related to any entity in the current graph (malicious entity A with entity B). Once the sequence of malicious events, shown at the bottom of Figure 4, was obtained, the causal chain was inferred, and the next temporal event that was likely to be causally related to detection point A was used as the prediction target (new malicious event).
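The back-tracing idea can be sketched in a few lines of Python. This is a simplified illustration, not the authors' implementation: events are assumed to be (source, destination, time) tuples, and the threshold update rule (an entity's threshold becomes the latest time at which it affected something already in the graph) follows the conventional back-tracking formulation. The toy log below is invented for the example and does not reproduce Figure 4.

```python
def build_evolving_graph(events, detection_point, detection_time):
    """Back-trace from a detection point and return the kept events in time order.

    events: iterable of (source, destination, time) tuples (assumed format).
    """
    # Time threshold per entity: only events that reached the entity
    # before this time can have influenced the state we care about.
    threshold = {detection_point: detection_time}
    kept = []
    # Process the log in reverse time order.
    for source, destination, time in sorted(events, key=lambda e: e[2], reverse=True):
        if destination in threshold and time < threshold[destination]:
            kept.append((source, destination, time))
            # The source now belongs to the graph; keep its latest relevant time.
            threshold[source] = max(threshold.get(source, time), time)
    kept.reverse()  # restore chronological order
    return kept


toy_log = [
    ("C", "B", 3),
    ("D", "E", 4),  # unrelated to anything in the graph, so it is skipped
    ("B", "A", 5),
    ("A", "F", 7),  # happens after detection, so it is skipped
    ("D", "C", 2),
]
chain = build_evolving_graph(toy_log, detection_point="A", detection_time=6)
```

With entity A detected at time 6, the returned chain contains only the events that could have influenced A, ordered from earliest to latest.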

The generated attack scenario graph is a subgraph of the original causal graph. It gives a useful picture of the events leading to the detection point; in particular, it significantly reduces the number of entities and actions that the analyst must examine to understand the attack, and it highlights the causal sequence of the attack scenario itself.

Neighborhood Graph-Based Attack Scenario Construction

In contrast to constructing an attack scenario graph based on an evolving graph, the concept of a neighborhood graph was introduced [36]. The sequence of relevant attack events in the causal graph was extracted, starting from multiple detection points, to reconstruct the attack story. When constructing the attack scenario graph through the neighborhood graph, multiple known malicious entities were utilized to extract attack event sequences for training a deep learning model. Only the malicious events that had the most direct association with the entities were used as ground-truths.

To construct an attack scenario graph by extracting sequences of relevant attack events from neighborhood graphs, it is first necessary to obtain all potential malicious entities from third-party knowledge and construct their subsets that contain two or more entities. In practice, even the attack entities contained in the same attack activity are not always the same, and a network analyst may likewise be unable to obtain all the malicious entity markers. However, the number of malicious entities itself is usually not large, because attackers usually try to hide and minimize their activity traces, which makes their attack patterns easier to learn.

For example, in Figure 5, the full causal graph based on the set of two malicious entities {A, E} and the set of non-malicious entities {B, C, D, F} is given, corresponding to the malicious event sequence and the neighborhood graph. If the source or target node of the edge in the causal graph was a malicious entity, the event was marked as a malicious event, and the extracted malicious events could be converted into a sequence based on the timestamp order. The extracted malicious event sequence could be constructed as an attack scenario graph based on the neighborhood graph by keeping the sub-graphs about the malicious entities in the original causal graph. During the prediction of malicious events, the focus was on the next temporal event that was causally related to any malicious entity on the neighborhood graph (malicious events), as this was the target for prediction.
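The marking rule described above (an event is malicious if its source or target is a malicious entity) can be sketched as follows. The event format (source, action, destination, timestamp) and the toy events are assumptions for illustration; only the entity sets {A, E} and {B, C, D, F} are taken from the text.

```python
def extract_malicious_sequence(events, malicious_entities):
    """Keep events that touch a malicious entity, ordered by timestamp.

    events: iterable of (source, action, destination, timestamp) tuples
    (assumed format).
    """
    malicious = set(malicious_entities)
    hits = [e for e in events if e[0] in malicious or e[2] in malicious]
    return sorted(hits, key=lambda e: e[3])


# Invented toy events over the entities A to F; A and E are malicious.
toy_events = [
    ("E", "connect", "F", 5),
    ("B", "write", "C", 1),
    ("A", "read", "B", 2),
    ("C", "fork", "D", 3),
    ("D", "write", "E", 4),
    ("B", "read", "F", 6),
]
sequence = extract_malicious_sequence(toy_events, {"A", "E"})
```

Events between non-malicious entities (for example B to C) are dropped even when they lie on a causal path to a malicious entity, which is the trade-off relative to the evolving graph.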

Compared with the malicious event sequence extraction and attack scenario graph construction based on evolving graphs, attack story recovering through the concept of neighborhood graphs requires network analysts to mine all malicious entities present in the system as much as possible, and it only considers malicious events directly related to malicious entities and ignores other causal events in the causal graph that are indirectly causally related to them.

4.2.3. Deep Learning Model

Here a deep learning network based on word embedding and a variant of the bi-directional long short-term memory (BiLSTM) network was used to learn existing attack patterns by extracting the key malicious event sequences.

Word embedding is common in machine learning, in both natural language processing (NLP) systems and question-and-answer systems, and it is often used in the first layer of neural network learning to convert positive integers (subscripts) into vectors with fixed sizes. Considering how to preserve the semantic relevance of the preceding and following semantics in various attack patterns, as well as the semantic relationship of the context, word embedding was used to transform the sequences into numeric vectors. The different index numbers given to each word in the malicious event sequence were converted into embedding vectors by the embedding layer.
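Before an embedding layer can be applied, each token must be assigned an integer index. The sketch below shows one plausible way to do this; reserving index 0 for padding and left-padding to a fixed length are assumptions borrowed from common practice, not details stated in the paper.

```python
def build_vocabulary(sequences):
    """Assign a distinct positive index to each token; 0 is reserved for padding."""
    vocab = {"<pad>": 0}
    for seq in sequences:
        for token in seq:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sequence, vocab, max_len):
    """Convert tokens to indices, keep the most recent max_len, and left-pad with 0."""
    ids = [vocab[token] for token in sequence][-max_len:]
    return [0] * (max_len - len(ids)) + ids


# Each malicious event is flattened into its (source, action, destination) tokens.
token_sequences = [
    ["programfiles_process", "write", "user_file"],
    ["system32_process", "read", "user_file"],
]
vocab = build_vocabulary(token_sequences)
encoded = encode(token_sequences[1], vocab, max_len=5)
```

The resulting index lists are what an embedding layer would map to fixed-size vectors. Tokens missing from the vocabulary raise a KeyError in this sketch; a real pipeline would need an explicit policy for unseen tokens.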

Due to the good results of convolutional neural networks (CNNs) applied on graph-type data, CNNs were added to BiLSTM networks for the initial feature extraction of numeric vectors, preserving key semantic features and contextual relationships to facilitate further attack pattern learning in BiLSTM networks. One-dimensional convolutional layers have been shown to be effective in learning spatial features for tasks such as learning adjacent words.

Considering that the BiLSTM network structure could learn the contextual relationship of the attack process better than a single LSTM structure, a stacked combination of BiLSTM and LSTM structures was used to increase the nonlinearity of the network, which highlighted the strength of the front-to-back information propagation while considering the information relationship in both directions.

Finally, the discrimination of the probability of the occurrence of different tokens at the next time point was performed by means of a fully connected (dense) layer.

4.3. Attack Predictive Analytics

Before network analysts can begin predictive analysis, they often need to detect some of the early signs of a multi-stage attack. For common network intrusions, there are many ways to detect hazards. For example, a network or host firewall can notice processes that perform port scans or launch denial-of-service attacks; tools such as TripWire [42] can detect modified system files; tools such as Snort IDS [43] can monitor and retrieve network traffic data through officially obtained or custom rule sets to detect different attack methods and provide real-time alerts on attacks; and sandboxing tools can notice a program making impermissible or unusual system call patterns [44,45] or executing external code [46], etc.

The term “detection point” is used to refer to the key point for building an attack scenario graph that contains all attack event entities and actions when recovering the story of a network attack through a causal graph. Here the detection point will necessarily be a malicious entity, which may be a deleted, modified, or appended file, a known malicious hostname, a process or payload that behaves abnormally or suspiciously, etc. The list of malicious entities can be manually specified or obtained through third-party network intrusion/attack analysis. There is ample relevant research available that can mine suspicious malicious entities from the causal graph. For example, with deep learning algorithms, all possible malicious entities in log events can be identified uniformly starting from a single detection point [36].

After obtaining the known attack detection points, the attack scenario can be constructed, and the new malicious event sequences based on neighborhood graphs or evolving graphs can be extracted. Then, the obtained malicious event sequences are fed into the trained deep learning model through word embedding. Attack story recovering based on neighborhood graphs and evolving graphs has its own advantages and disadvantages. The malicious events included in the neighborhood graph are those most directly related to the malicious entities, but it must detect as many malicious entities as possible before it can obtain enough causal chains for malicious event prediction. The evolving graph, on the contrary, requires only a single malicious entity to start malicious event prediction directly, but since it introduces a large number of events that are not directly causally related to the malicious entity as part of the causal chain underlying the prediction, this will lead to a large number of false attack dependencies.

Figure 6 illustrates the process of performing the source, action, and destination prediction for malicious events. After the deep learning model was trained on existing attack patterns, a variable-length input sequence $\{e_{1}, e_{2}, \dots, e_{t}\}$ was provided, and a prediction score for the subject (source), operation (action), and object (destination) of the next malicious event $e_{t+1}$ was generated, resulting in a prediction for the most probable next malicious event. Owing to the strengths of deep learning, our prediction system should be able to make predictions from variable-length event sequences as context by learning from previous historical experience, which better fits real-world scenarios and different prediction needs.
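One way to turn a single malicious event sequence into supervised examples for next-event prediction is to pair every prefix with the event that follows it. The sketch below assumes that events are (source, action, destination) token triples and that each prefix of at least min_history events is used; the paper does not state how its training samples were cut, so this is an illustrative choice.

```python
def make_training_pairs(events, min_history=1):
    """Pair each prefix e_1..e_t with the next event e_(t+1)."""
    pairs = []
    for t in range(min_history, len(events)):
        pairs.append((events[:t], events[t]))
    return pairs


toy_sequence = [
    ("IP_Address", "connected_remote_ip", "programfiles_process"),
    ("programfiles_process", "write", "user_file"),
    ("system32_process", "read", "user_file"),
]
pairs = make_training_pairs(toy_sequence)

# The target triple supplies one label for each of the three prediction heads.
history, (next_source, next_action, next_destination) = pairs[-1]
```

Because the prefixes have different lengths, they would be padded or truncated to a common length before being batched.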

5. Results

5.1. Implementation

The code for this experiment was implemented mainly in Python. This study used Anaconda3 to build a development environment and then implemented the deep learning network and malicious event prediction algorithm with TensorFlow 2.3 and its high-level application programming interface (API), Keras 2.4.3.

This study used the self-developed JanusGraph native graph database for log event storage and management, while supporting a variety of databases as storage backends, which could be combined with existing big data processing frameworks in Apache to provide graph-based big data analysis capabilities.

5.2. Datasets

The lack of publicly available attack datasets and syslog data is inherently a challenge for APT attack detection and predictive analysis. Researchers can combine natural language processing techniques to automatically extract attack behavior data directly from APT analysis reports published by major security companies [47] and further transform them into causal graphs. However, the lack of real available attack environments and the corresponding standard dataset itself remains unavoidable.

To address this issue, Alsaheel et al. [36] reduced and simulated 10 relatively realistic APT attacks from APT research reports from known security firms and generated 24 h of audit log data from them, which consisted mainly of system events, in addition to DNS queries and browser events. In this paper, the causal graphs generated based on them were used as the baseline experimental dataset, and each attack that exploited different vulnerabilities is detailed in Table 2.

This dataset consisted of 6.7 GB of audit log data, with an average of 20 K unique entities generated from ten attack simulations and over 200 K events per attack campaign, with more than 17 K malicious events directly related to malicious entities. Among them, S-1 to S-4 were logs formed by attacks on single hosts, while M-1 to M-6 were attacks that spanned multiple hosts (multi-host attacks). These attacks covered many different attack characteristics, such as phishing email links, phishing email attachments, injection, and lateral movements (e.g., leaking sensitive data).

5.3. Malicious Event Prediction Performance Evaluation

The effectiveness of performing predictive analytics of malicious events is evaluated in Table 3, Table 4, Table 5 and Table 6. The single-host dataset (S-1) and the multi-host dataset (M-2) were each used as a test set, while the other APT attack campaigns served as the training data. Among them, S-1 was a single-host strategic web compromise attack campaign, which exploited the same 2015-5122 vulnerability as M-1, while M-2 was a multi-host targeted GOV phishing attack campaign, which exploited the 2015-5199 vulnerability only. However, it could be seen that the different attack campaigns all shared the same attack features.

To evaluate the effectiveness of deep learning-based prediction methods, various forms of LSTM models, the Transformer model [54], and the temporal convolutional network (TCN) model [55] were considered in the study. Both the Transformer model’s encoder layer (Transformer-Encoder, TE) and the TCN model’s causal convolution layer and residual block module (Causal Convolution Residual, CCR) were used for feature extraction, serving as a comparison with the proposed model. A unified softmax layer was used for the probability output. For example, in the TCN model, only three layers of feature extraction were used, achieved by stacking three layers of causal convolution and residual block modules. In addition to the deep models proposed in Section 4.2.3, basic LSTM models and LSTM models with convolutional layers for feature extraction were also tested as ablation experiments.

In Table 3 and Table 4, the tested models based on the evolving graph were used to predict the source, action, and destination of malicious events on both the single-host dataset (S-1) and multi-host dataset (M-2). In Table 5 and Table 6, the prediction results were analyzed based on the attack scenarios constructed using the neighborhood graph. In both cases, the evaluation included the loss value of the binary_crossentropy loss function for predicting the malicious events, which provided insight into the model’s fitting performance, as well as the accuracy for correctly predicting the classification, which intuitively judged the reliability of the model’s predictions. (The optimal values for each column in tables are highlighted in bold.)

The experimental results showed that there was a significant difference in the accuracy of the prediction results for different test sets, and generally, the predictive models showed better predictability for the APT-M-2 test set. This will be further analyzed in the sequence length evaluation in Section 5.5.

In this study, the proposed model was preferred for predicting malicious events as it generally yielded better accuracy. However, it was evident that this advantage was not absolute, particularly from the loss value analysis, which suggested that the model may not converge well, thus affecting the confidence in the prediction results. For instance, for the same prediction result, two different probability outputs of (Pro: 0.51) and (Pro: 0.91) could yield the same result but with completely different levels of confidence.
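The point about confidence can be made concrete with a small calculation. The sketch below uses the standard definition of binary cross-entropy averaged over the output units; the two-unit output vectors are invented for illustration and only reuse the probabilities 0.51 and 0.91 from the text.

```python
import math


def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the output units (standard definition)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


target = [1, 0]                # the first token is the correct one
uncertain = [0.51, 0.49]       # correct prediction, low confidence
confident = [0.91, 0.09]       # correct prediction, high confidence

loss_uncertain = binary_crossentropy(target, uncertain)
loss_confident = binary_crossentropy(target, confident)

# Both outputs select the same token when only the arg max is considered.
same_choice = uncertain.index(max(uncertain)) == confident.index(max(confident))
```

Both outputs count as a hit for accuracy, yet the loss of the uncertain output is roughly seven times larger (about 0.67 versus 0.09), which is why accuracy and loss can rank models differently.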

Furthermore, the ablation experiments showed that a simple LSTM model was clearly unable to predict malicious events accurately. However, feature extraction with a convolutional layer followed by stacked BiLSTM layers effectively improved the stability of the model, which was reflected in its high accuracy in most cases. From this perspective, the construction of the model was effective.

Interestingly, for the proposed model, the prediction accuracy for each element of the malicious event triplet in the test set reached a range of 96–99%. Although this inevitably accompanied some overfitting effects, in terms of results, it had already learned well the development rules of events under different circumstances. However, due to the differences between the training set and the test set, even using more complex models cannot guarantee a significant improvement in performance on the test dataset. In this case, we need other ways to obtain higher performance gains.

5.4. Varying Potential Options-Based Prediction Analysis

The absolute sequence of malicious events associated with a particular attack activity cannot be guaranteed, even for the same attack activity. There is no absolute pattern that an event will follow or precede another event, and there is also no guarantee whether other security events unrelated to the coordinated attack will be observed in the causal graph, which includes but is not limited to different attack activities from different groups of adversaries occurring at the same time. The expectation is to provide multiple possible options for the next malicious event simultaneously, rather than a single predictive target.

In this paper, the obtained malicious event sequences were inputted into a deep learning model with a word embedding layer. The model generated a prediction score for the next malicious event, providing varying potential options for the related entities and actions, resulting in the most likely N recommendations. For example, in Table 7 and Table 8, based on the CCR, TE, and the proposed stacked model, the impact of the number of recommendations on the prediction accuracy is demonstrated in the Top-1 to Top-3 range for both APT-S-1 and APT-M-2 malicious activities. (The optimal values for each column in tables are highlighted in bold.)
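Top-N accuracy counts a prediction as correct when the true token appears among the N highest-scoring candidates. A minimal sketch, assuming each prediction is available as a dictionary from candidate token to score (a hypothetical format; the toy scores are invented):

```python
def top_n_accuracy(score_rows, true_labels, n):
    """Fraction of cases whose true label is among the n highest-scoring candidates."""
    hits = 0
    for scores, truth in zip(score_rows, true_labels):
        ranked = sorted(scores, key=scores.get, reverse=True)[:n]
        hits += truth in ranked
    return hits / len(true_labels)


toy_scores = [
    {"a": 0.6, "b": 0.3, "c": 0.1},
    {"a": 0.5, "b": 0.2, "c": 0.3},
    {"a": 0.2, "b": 0.7, "c": 0.1},
]
toy_truth = ["a", "c", "a"]

top1 = top_n_accuracy(toy_scores, toy_truth, n=1)
top2 = top_n_accuracy(toy_scores, toy_truth, n=2)
```

In this toy data only one of three cases is correct at Top-1, while all three are covered at Top-2, which illustrates why a small increase in N can raise the reported accuracy considerably.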

In the prediction, also based on the Top-N method, $N_{1}$ event sources were predicted first. For the known set of malicious event sources, the action and the destination of the events related to the sources with the highest probability were then predicted. The prediction scores of the different (source, action) pairs and (source, action, destination) malicious event triples were given, and the most probable N malicious events were finally obtained. The Top-N malicious events were kept in the attack scenario graphs and presented as recommendations. The value of N depended on the desired accuracy threshold θ; that is, N was chosen so that the sum of the predicted probabilities output by the model over all predicted events satisfied Sum(Pro) > θ.
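The threshold rule Sum(Pro) > θ can be read as: take candidates in descending order of probability until their cumulative probability exceeds θ. The sketch below implements that reading; picking the smallest such N and the dictionary input format are assumptions, and the candidate names and scores are invented.

```python
def select_top_n(scores, theta):
    """Return the smallest set of top candidates whose probabilities sum above theta."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen, cumulative = [], 0.0
    for candidate, prob in ranked:
        chosen.append((candidate, prob))
        cumulative += prob
        if cumulative > theta:
            break
    return chosen


toy_candidates = {"x": 0.5, "y": 0.3, "z": 0.15, "w": 0.05}
loose = select_top_n(toy_candidates, theta=0.7)
strict = select_top_n(toy_candidates, theta=0.9)
```

A higher θ yields more recommendations: here θ = 0.7 keeps two candidates and θ = 0.9 keeps three. If θ is never exceeded, the sketch returns every candidate.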

In fact, this presentation was far more valuable than predicting only the most likely malicious event: even if one of the most probable malicious events did not occur at the next event point, there was a high probability that it would occur afterward, unless a strict order of precedence existed among the events.

5.5. Sequence Length Evaluation

Generally, understanding how deep learning models work is challenging as they are often viewed as black boxes. To consider more complex working scenarios, various time series prediction models including RNNs rely not only on long-term memory but also on short-term memory, especially in filtering out noise.

The objective of this study was to determine the relative influence of long-term memory and short-term memory in decision making. Short-term memory refers to a system that relies on only a few elements of the sequence to make a decision, specifically, the elements closest to the system’s prediction target. On the other hand, long-term memory refers to a system that utilizes the entire sequence or a significant portion of it to predict the next security event.

Intuitively, an increase in the number of observed events is not expected to improve the performance of the model if short-term memory dominates. To determine the type of influence that may be more significant to the model, the analysis of neural network weights was avoided, and instead, the impact of different historical event sequence lengths on the final prediction accuracy was examined.
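This kind of evaluation can be sketched as a loop that truncates the history to the k most recent events before each prediction. The function below is model-agnostic; the predict callable and the toy predictor are hypothetical stand-ins for a trained network and are not part of the paper.

```python
def accuracy_by_history_length(predict, sequences, lengths):
    """Accuracy of next-event prediction when only the last k events are visible."""
    results = {}
    for k in lengths:
        hits = total = 0
        for seq in sequences:
            for t in range(1, len(seq)):
                history = seq[max(0, t - k):t]  # keep at most k recent events
                hits += predict(history) == seq[t]
                total += 1
        results[k] = hits / total if total else 0.0
    return results


def toy_predict(history):
    """Hypothetical predictor that needs two events of context to output 'c'."""
    return "c" if history[-2:] == ["a", "b"] else "b"


curve = accuracy_by_history_length(toy_predict, [["a", "b", "c"]], lengths=[1, 2])
```

Here the accuracy rises from 0.5 to 1.0 when the window grows from one event to two, which is the signature of a predictor that depends on more than the most recent event.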

As shown in Figure 7 and Figure 8, the prediction accuracy for different prediction targets was given for historical event lengths of 1–40 in two different test sets. It can be seen that the overall prediction accuracy was higher in the APT-M-2 test set. For both test sets, a sufficiently long historical event length could significantly improve the prediction performance in most cases.

Similarly, sequence length evaluation can also reveal the specific reasons for the prediction performance of different models on different datasets. As shown in Table 5, for the APT-S-1 test set, the source of the event often could not be predicted well when predicting malicious events based on the malicious event sequences extracted from the neighborhood graph. In Figure 7, it can be seen that the CCR, TE, and proposed models made erroneous predictions for this target under the influence of both short-term and long-term memory. Taking the causal convolution residual (CCR) structure as an example, a longer historical event length generally improved the prediction accuracy up to a length of 35, but the accuracy declined sharply when the historical event length exceeded 35. Therefore, controlling the historical event length can often greatly improve the model performance.

5.6. Case Study for APT Predictive Analytics

Finally, the proposed model was demonstrated through a case study that depicted its operation using system-level causal graphs and predictive attack analytics. The case study refers to the M-5-Pony campaign attack discussed in Section 5.2, where log data from the initial stages of the attack were analyzed. The native graph database facilitated the extraction of attack scenario graphs for the current Pony campaign attack from the overall causal graph using the method outlined in Section 4.2.

As depicted in Figure 9, the malicious file (ID: 41459768; Name: c:/users/aalsahee/downloads/msf.doc) was discovered at the latest time point. The original attack scenario graph was constructed using the attack evolving graph, as shown in Figure 9a. However, the current time node was traced back based on the preceding 20 events only. After the lemmatization was completed, the simplified scenario was obtained as shown in Figure 9b.

A careful analysis of the entire back-tracing process required us to first focus on events that were directly linked to the msf.doc file, such as <firefox.exe write(18) msf.doc> and <explorer.exe read(20) msf.doc>. These events illustrated how the malicious file msf.doc compromised other processes on the machine. The rationalization based on this attack scenario was that the process was ongoing and would continue, i.e., the highest probability event prediction given by the model <programfiles_process write-or-read user_file\msf.doc> could be derived from this trend. In fact, in the following time, by <c:/programfiles/*/winword.exe_35 write ~$msf.doc>, the malicious file would further compromise the word process on the victim machine.

Figure 10 illustrates the long-term perspective of the Pony campaign attack using log data spanning a longer time scale than the case study. This approach provided a more comprehensive understanding of the attack campaign. It depicted the transition from the local Firefox browser establishing a connection with the remote server (event <192.168.223.3 connected_remote_ip(2) c:/programfiles/mozillafirefox/firefox.exe_3776>) to the C&C server communicating with the local mshta.exe program (event <192.168.223.3 connected_remote_ip(20) c:/windows/system32/mshta.exe_264>).

This included the main part of the compromising process starting from acquiring the msf.doc malicious file (event <0xalsaheel.com web_request(7) 0xalsaheel.com/msf.doc>) to that document compromising other processes on the victim’s machine. Based on that scenario, the two most likely new events to occur were <IP_Address connected_session session> and <system32_process connect connection>, with the former correctly predicting the upcoming event (event <192.168.223.3 connected_session(21) session_192.168.223.130_51794>), while the latter, coincidentally, further referred to another event after that (event <c:/windows/system32/mshta.exe_264 connect(23) connection_192.168.223.130_192.168.223.3>). As a neighborhood graph that contained only part of the earlier malicious campaign, it revealed enough information to us.

In contrast to the neighborhood graph, the evolving graph did not need additional known malicious entities as a basis for predictive analytics; it could start from a single malicious event, trace back the whole chain of events leading to that event, and use the chain as a dependency for inference analysis. However, because it lacked a global perspective on the whole APT attack campaign, it could not easily depict the dynamic global picture of how the attack activity developed, and short-term attack back-tracing could only be used to understand the local working mechanism. The advantages and disadvantages of the two cannot be easily judged in short-term prediction analytics, but long-term prediction of the whole attack campaign requires richer and more refined prior knowledge, for which attack scenarios built on a collection of multiple malicious entities would be a better choice.

6. Discussion

Predictive analysis and detection analysis of network attacks are important technologies in the field of network security, but their purposes and advantages are different. In general, predictive analysis predicts potential security issues in the future by analyzing and modeling existing data. Compared with attack detection, it can help security teams detect potential threats before attacks occur, greatly improve security response speed, enhance security defense capabilities, provide sufficient time for security teams to respond to threats, and take corresponding measures to prevent them. However, compared with the relatively mature attack detection and prevention measures, how to effectively study opponents’ attack strategies and make effective predictions is still a major challenge in attack prevention.

Although there have been many studies on attack patterns in the previous literature, in fact, due to the task being too generic, there are not many common elements in the proposed approaches. For example, the attack graph mentioned earlier in Section 2 is more concerned with static configurations, HMM is concerned with alert correlation, and the causal graph is often constructed through log events, making it impossible to compare their performance through conventional methods. In this regard, this paper compared and analyzed the predictive research on network attacks in recent years mentioned earlier in Table 9 and summarized the application backgrounds, specific attack types, methods used, and datasets used for testing in different studies. In addition, some of the methods used in some papers have certain similarities but are used for different research purposes. For example, in [14,15], although the same method is used, they have different application purposes, with the latter focusing on detection at the attack stage and the former focusing on predicting alerts.

Complex multi-step attacks, such as APTs, pose a significant threat to network security due to their diversity and complexity. Attackers develop specialized attack strategies and plans based on the characteristics and vulnerabilities of the target system to ensure the success and persistence of the attack. Moreover, in many case studies, these behaviors lasted for several months or even years, as intruders repeated the exfiltration process many times. This also provides network analysts with certain information for attack analysis. Due to this complexity, detecting and predicting APT attacks has always been a central focus of security prevention work.

In the study, attack scenarios were learned through a simple deep learning model, and in Section 5, different deep learning structures were further discussed for attack pattern learning and prediction performance. The study found that by slightly controlling the potential options for predicting malicious events, the accuracy of event prediction could be effectively controlled in a high range. Further sequence length evaluation revealed that with sufficient historical data, prediction accuracy could be greatly improved.

However, due to the characteristics of the current deep models and the structure of the attack scenario graph that is being used, it is difficult to guarantee long-term prediction of malicious events and even more difficult to complete the global generation of the entire attack scenario. In addition, it is still necessary to manually intervene in how to restore the lemmatized malicious events to the actual system entities. Furthermore, determining how to better construct the attack scenario graph; effectively extract events that are most directly related to the attack activity; and exclude irrelevant, duplicate, and redundant events are necessary in order to intuitively and effectively standardize the causal relationship of the attack activity process. For example, in the neighborhood graph, all events directly related to malicious entities are considered malicious events, but in fact, only events that actually cause damage to the system directly or indirectly are needed.

7. Conclusions

To help users better understand the attack strategies of intruders, it is important to establish a nonlinear dependency model and construct an attack graph. To this end, this paper proposed a new framework for predicting and analyzing APT attack events based on causal graphs. This was a new attempt to use system log data to predict future attack events. Based on the events obtained from the log data, two methods for constructing attack scenario graphs were proposed in this paper, namely, the evolving graph and the neighborhood graph. The former constructed attack scenarios by back-tracing from a single malicious event, while the latter constructed attack scenarios composed of multiple malicious entities and multiple malicious events. On this basis, sequence-based analytics were used with deep models to predict possible consecutive attack events.

By processing log data into events, the semantic gap between different types of logs can be effectively eliminated. However, this may also affect the semantic information expression of log data. A more ideal approach is to reserve certain feature values for event entities, which will provide richer information for lemmatized event sequences. In future work, we will attempt to use graph neural networks to learn from historical attack scenarios and improve the ability of deep models to predict long-term attack event sequences.

Author Contributions

Conceptualization, H.L.; data curation, H.L.; formal analysis, H.L.; funding acquisition, R.J.; investigation, H.L.; methodology, H.L.; project administration, R.J.; resources, H.L.; software, H.L. and R.J.; supervision, R.J.; validation, H.L.; visualization, R.J.; writing—original draft, H.L.; writing—review and editing, H.L. and R.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2022YFB3104103) and the National Natural Science Foundation of China (No. 62072131).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://github.com/purseclab/ATLAS (accessed on 10 October 2021).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ghafir, I.; Kyriakopoulos, K.G.; Lambotharan, S.; Aparicio-Navarro, F.J.; Assadhan, B.; Binsalleeh, H.; Diab, D.M. Hidden Markov models and alert correlations for the prediction of advanced persistent threats. IEEE Access 2019, 7, 99508–99520.
2. CNET. ‘Wannacry’ Ransomware: Everything You Need to Know. Available online: https://www.windowscentral.com/wannacry-ransomware-attack-windows (accessed on 22 October 2017).
3. Washington Post. Massive Cyberattack Hits Europe with Widespread Ransom Demands. Available online: https://www.thegazette.com/nation-world/massive-cyberattack-hits-europe-with-widespread-ransom-demands (accessed on 22 October 2017).
4. Qi, Y.; Jiang, R.; Jia, Y.; Li, A. Attack Analysis Framework for Cyber-Attack and Defense Test Platform. Electronics 2020, 9, 1413.
5. Steinberger, J.; Sperotto, A.; Golling, M.; Baier, H. How to exchange security events? Overview and evaluation of formats and protocols. In Proceedings of the 2015 IFIP/IEEE International Symposium on Integrated Network Management (IM), Ottawa, ON, Canada, 11–15 May 2015; pp. 261–269.
6. Kaspersky. What Is WannaCry Ransomware. Available online: https://usa.kaspersky.com/resource-center/threats/ransomware-wannacry (accessed on 10 January 2022).
7. Jia, Y.; Qi, Y.; Shang, H.; Jiang, R.; Li, A. A Practical Approach to Constructing a Knowledge Graph for Cybersecurity. Engineering 2018, 4, 117–133.
8. Phillips, C.; Swiler, L.P. A Graph-Based System for Network-Vulnerability Analysis. In Proceedings of the Workshop New Security Paradigms, Charlottesville, VA, USA, 22–26 September 1998; pp. 71–79.
9. Hughes, T.; Sheyner, O. Attack Scenario Graphs for Computer Network Threat Analysis and Prediction. Complexity 2003, 9, 15–18.
10. Polatidis, N.; Pimenidis, E.; Pavlidis, M.; Kameas, A. Recommender Systems Meeting Security: From Product Recommendation to Cyber-Attack Prediction. In Proceedings of the Engineering Applications of Neural Networks: 18th International Conference, Athens, Greece, 25–27 August 2017; pp. 508–519.
11. Polatidis, N.; Pimenidis, E.; Pavlidis, M.; Papastergiou, S.; Mouratidis, H. From product recommendation to cyber-attack prediction: Generating attack graphs and predicting future attacks. Evol. Syst. 2020, 11, 479–490.
12. Ramaki, A.A.; Khosravi-Farmad, M.; Bafghi, A.G. Real time alert correlation and prediction using Bayesian networks. In Proceedings of the 12th International Iranian Society of Cryptology Conference on Information Security and Cryptology (ISCISC), Rasht, Iran, 8–10 September 2015; pp. 98–103.
13. Farhadi, H.; AmirHaeri, M.; Khansari, M. Alert correlation and prediction using data mining and HMM. ISeCure 2011, 3, 77–101.
14. Holgado, P.; Villagrá, V.A.; Vazquez, L. Real-time multistep attack prediction based on hidden markov models. IEEE Trans. Dependable Secur. Comput. 2017, 17, 134–147.
15. Shawly, T.; Elghariani, A.; Kobes, J.; Ghafoor, A. Architectures for Detecting Interleaved Multi-Stage Network Attacks Using Hidden Markov Models. IEEE Trans. Dependable Secur. Comput. 2021, 18, 2316–2330.
16. King, S.T.; Chen, P.M. Backtracking intrusions. In Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles (SOSP), Bolton, NY, USA, 19–22 October 2003.
17. King, S.T.; Mao, Z.M.; Lucchetti, D.G.; Chen, P.M. Enriching intrusion alerts through multi-host causality. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 8–11 February 2005.
18. Lee, K.H.; Zhang, X.; Xu, D. High accuracy attack provenance via binary-based execution partition. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 24–27 February 2013.
19. Ma, S.; Zhang, X.; Xu, D. ProTracer: Towards practical provenance tracing by alternating between logging and tainting. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 21–24 February 2016.
20. Li, Z.; Chen, Q.A.; Yang, R.; Chen, Y.; Ruan, W. Threat Detection and Investigation with System-level Provenance Graphs: A Survey. Comput. Secur. 2021, 106, 102282.
21. Hossain, M.N.; Milajerdi, S.M.; Wang, J.; Eshete, B.; Gjomemo, R.; Sekar, R.; Stoller, S.D.; Venkatakrishnan, V.N. SLEUTH: Real-time Attack Scenario Reconstruction from COTS Audit Data. In Proceedings of the USENIX Security Symposium, Vancouver, BC, Canada, 16–18 August 2017.
22. Milajerdi, S.M.; Eshete, B.; Gjomemo, R.; Venkatakrishnan, V.N. Poirot: Aligning Attack Behavior with Kernel Audit Records for Cyber Threat Hunting. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security, London, UK, 11–15 November 2019; pp. 1795–1812.
23. Hassan, W.U.; Guo, S.; Li, D.; Chen, Z.; Jee, K.; Li, Z.; Bates, A. NoDoze: Combatting Threat Alert Fatigue with Automated Provenance Triage. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 24–27 February 2019.
24. Liu, Y.; Zhang, M.; Li, D.; Jee, K.; Li, Z.; Wu, Z.; Rhee, J.; Mittal, P. Towards a Timely Causality Analysis for Enterprise Security. In Proceedings of the Network and Distributed System Security Symposium, San Diego, CA, USA, 18–21 February 2018.
25. Xie, Y.; Feng, D.; Hu, Y.; Li, Y.; Sample, S.; Long, D.L. Pagoda: A Hybrid Approach to Enable Efficient Real-time Provenance Based Intrusion Detection in Big Data Environments. IEEE Trans. Dependable Secur. Comput. 2018, 17, 1283–1296.
26. Xie, Y.; Wu, Y.; Feng, D.; Long, D. P-Gaussian: Provenance-Based Gaussian Distribution for Detecting Intrusion Behavior Variants Using High Efficient and Real Time Memory Databases. IEEE Trans. Dependable Secur. Comput. 2021, 18, 2658–2674.
27. Han, X.; Pasquier, T.; Bates, A.; Mickens, J.; Seltzer, M. Unicorn: Runtime Provenance-Based Detector for Advanced Persistent Threats. arXiv, 2020; arXiv:2001.01525.
28. Liu, F.; Wen, Y.; Zhang, D.; Jiang, X.; Xing, X.; Meng, D. Log2vec: A Heterogeneous Graph Embedding Based Approach for Detecting Cyber Threats within Enterprise. In Proceedings of the 2019 ACM SIGSAC Conference, London, UK, 11–15 November 2019.
29. Du, M.; Li, F.; Zheng, G.; Srikumar, V. Deeplog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1285–1298.
30. Li, T.; Jiang, Y.; Lin, C.; Obaidat, M.S.; Shen, Y.; Ma, J. Deepag: Attack graph construction and threats prediction with bi-directional deep learning. IEEE Trans. Dependable Secur. Comput. 2022, 20, 740–757.
31. Shen, Y.; Mariconti, E.; Vervier, P.A.; Stringhini, G. Tiresias: Predicting security events through deep learning. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, Toronto, ON, USA, 15–19 October 2018; pp. 592–605.
32. Lv, S.; Wang, J.; Yang, Y.; Liu, J. Intrusion prediction with system-call sequence-to-sequence model. IEEE Access 2018, 6, 71413–71421.
33. Yin, K.; Yang, Y.; Yao, C.; Yang, J. Long-Term Prediction of Network Security Situation Through the Use of the Transformer-Based Model. IEEE Access 2022, 10, 56145–56157.
34. Hu, C.; Liu, G.; Li, M. A Network Security Situation Prediction Method Based on Attention-CNN-BiGRU. In Proceedings of the 2022 IEEE 25th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Hangzhou, China, 4–6 May 2022; pp. 257–262.
35. Bilge, L.; Han, Y.; Dell’Amico, M. Riskteller: Predicting the risk of cyber incidents. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 1299–1311.
36. Alsaheel, A.; Nan, Y.; Ma, S.; Yu, L.; Walkup, G.; Celik, Z.B.; Zhang, X.; Xu, D. ATLAS: A Sequence-based Learning Approach for Attack Investigation. In Proceedings of the USENIX Security Symposium, USENIX Association, Virtual, 11–13 August 2021.
37. Young, T.; Hazarika, D.; Poria, S.; Cambria, E. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 2018, 13, 55–75.
38. Xu, Z.; Wu, Z.; Li, Z.; Jee, K.; Rhee, J.; Xiao, X.; Xu, F.; Wang, H.; Jiang, G. High Fidelity Data Reduction for Big Data Security Dependency Analyses. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 504–516.
39. Hossain, M.N.; Wang, J.; Sekar, R.; Stoller, S.D. Dependence-Preserving Data Compaction for Scalable Forensic Analysis. In Proceedings of the USENIX Security Symposium, Baltimore, MD, USA, 15–17 August 2018.
40. Plisson, J.; Lavrac, N.; Mladenic, D.; Grobelnik, M. A Rule Based Approach to Word Lemmatization. In Proceedings of the 7th International Multi Conference Information Society IS, Ljubljana, Slovenia, 11–15 October 2004.
41. Lamport, L. Time, Clocks, and the Ordering of Events in a Distributed System. In Concurrency: The Works of Leslie Lamport; Malkhi, D., Ed.; Morgan & Claypool Publishers: San Rafael, CA, USA, 2019; pp. 179–196.
42. Kim, G.H.; Spafford, E.H. The Design and Implementation of Tripwire: A File System Integrity Checker. In Proceedings of the 1994 ACM Conference on Computer and Communications Security (CCS), Fairfax, VA, USA, 2–4 November 1994.
43. CISCO & Affiliates. Snort 3 Is Available. Available online: https://www.snort.org (accessed on 13 June 2021).
44. Forrest, S.; Hofmeyr, S.A.; Somayaji, A.; Longstaff, T.A. A Sense of Self for Unix Processes. In Proceedings of the 1996 IEEE Symposium on Computer Security and Privacy, Oakland, CA, USA, 6–8 May 1996.
45. Goldberg, I.; Wagner, D.; Thomas, R.; Brewer, E.A. A Secure Environment for Untrusted Helper Applications. In Proceedings of the 1996 USENIX Technical Conference, San Jose, CA, USA, 22–25 July 1996.
46. Kiriansky, V.; Bruening, D.; Amarasinghe, S. Secure Execution Via Program Shepherding. In Proceedings of the 2002 USENIX Security Symposium, San Francisco, CA, USA, 5–9 August 2002.
47. Satvat, K.; Gjomemo, R.; Venkatakrishnan, V.N. Extractor: Extracting attack behavior from threat reports. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy (EuroS&P), Vienna, Austria, 6–10 September 2021; pp. 598–615.
48. FireEye Threat Intelligence. Second Adobe Flash Zeroday CVE-2015-5122 from Hackingteam Exploited in Strategic Web Compromise Targeting Japanese Victims. Available online: https://www.fireeye.com/blog/threat-research/2015/07/second_adobe_flashz0.html (accessed on 6 June 2020).
49. Li, B.; Chen, J.C. Exploit Kits in 2015: Flash Bugs, Compromised Sites, Malvertising Dominate. Available online: https://blog.trendmicro.com/trendlabs-securityintelligence/exploit-kits-2015-flash-bugscompromised-sites-malvertising-dominate/ (accessed on 6 June 2020).
50. Cedrick Ramos. Spam Campaigns with Malware Exploiting CVE-2017-11882 Spread in Australia and Japan. Available online: https://www.trendmicro.com/vinfo/us/threat-encyclopedia/spam/3655/spam-campaigns-with-malware-exploiting-cve201711882-spread-in-australia-and-japan (accessed on 1 March 2017).
51. Jiang, G.; Mohandas, R.; Leathery, J.; Berry, A.; Galang, L. CVE-2017-0199: In the Wild Attacks Leveraging HTA Handler. Available online: https://www.fireeye.com/blog/threat-research/2017/04/cve-2017-0199-hta-handler.html (accessed on 6 June 2020).
52. Paganini, P. Phishing Campaigns Target Us Government Agencies Exploiting Hacking Team Flaw CVE-2015-5119. Available online: https://securityaffairs.co/wordpress/38707/cyber-crime/phishing-cve-2015-5119.html (accessed on 6 June 2020).
53. Trend Micro. Rig Exploit Kit Now Using CVE-2018-8174 to Deliver Monero Miner. Available online: https://www.trendmicro.com/en_us/research/18/e/rig-exploit-kit-now-using-cve-2018-8174-to-deliver-monero-miner.html (accessed on 6 June 2020).
54. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
55. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv, 2018; arXiv:1803.01271.

Figure 1. Cybersecurity use case profiling.

Figure 2. Overview of proposed architecture.

Figure 3. Graph optimization. An example causal graph before (left) and after (right) reduction. The graph contains one malicious entity, C; the other entities, {A, B1, B2, D, F} (left) and {A, B, D} (right), are non-malicious.

Figure 4. The 6-evolving graph. An example of generating an evolving graph (right) from a causal graph (left) by backtracking from the current time point T6 through all events to time point T1. The graph contains one malicious entity, A; the other entities, {B, C, D, E, F} (left) and {B, C, D, F} (right), are non-malicious.

Figure 5. Neighborhood graph. An example of generating a neighborhood graph (right) from a causal graph (left). The graph contains two malicious entities, {A, E}; the other entities, {B, C, D, F} (left) and {B, D, F} (right), are non-malicious.

Figure 6. Attack predictive analytics.

Figure 7. Sequence length evaluation based on APT-S-1. (a) CCR. (b) TE. (c) Proposed.

Figure 8. Sequence length evaluation based on APT-M-2. (a) CCR. (b) TE. (c) Proposed.

Figure 9. Case study based on an evolving graph. An example of malicious event prediction based on an evolving graph. Because some entity names are long, a temporary ID (which carries no statistical significance) was randomly generated for each node, for example a malicious file (File ID: 41459768; File Name: c:/users/aalsahee/downloads/msf.doc). (a) The original evolving graph. (b) The evolving graph after lemmatization.

Figure 10. Case study based on a neighborhood graph. An example of malicious event prediction based on a neighborhood graph; a temporary ID was randomly generated for each node. Compared with the example in Figure 9, it contains more attack information. (a) The original neighborhood graph. (b) The neighborhood graph after lemmatization.

Table 1. Abstracted vocabulary set for lemmatization.

Type	Generalized Vocabulary
Process	user_process, programfiles_process, system32_process, windows_process, other_process
File	user_file, programfiles_file, system32_file, windows_file, combined_files, other_file
Web	IP_Address, domain_name, web_object, connection, session
Action	connected_remote_ip, connected_session, read, write, delete, execute, executed, fork, connect, resolve, web_request, refer, bind, sock_send
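As a rough illustration of how concrete entity names could be mapped onto the generalized vocabulary of Table 1, the following sketch lemmatizes process and file paths by matching path fragments. The matching rules, the function names, and the example process path are assumptions made for illustration; the actual abstraction rules are not specified in this table.

```python
def lemmatize_entity(name, kind):
    """Map a concrete process or file path to a generalized token.

    kind is "process" or "file". The path-fragment rules below are
    illustrative assumptions, not the rules used in the paper.
    """
    path = name.lower().replace("\\", "/")
    if "/windows/system32/" in path:
        prefix = "system32"
    elif "/windows/" in path:
        prefix = "windows"
    elif "/program files" in path:
        prefix = "programfiles"
    elif "/users/" in path:
        prefix = "user"
    else:
        prefix = "other"
    return f"{prefix}_{kind}"


def lemmatize_event(src, action, dst):
    """Lemmatize one (source, action, destination) event.

    src and dst are (name, kind) pairs; the action is kept unchanged.
    """
    return (lemmatize_entity(*src), action, lemmatize_entity(*dst))


# The file name is taken from the Figure 9 caption; the process is hypothetical.
event = lemmatize_event(
    ("C:\\Windows\\System32\\cmd.exe", "process"),
    "read",
    ("c:/users/aalsahee/downloads/msf.doc", "file"),
)
print(event)
```

After this step, events from different hosts or log sources that differ only in concrete names collapse to the same token triple, which is what allows a sequence model to learn from them.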

Table 2. Overview of implemented APT attacks.

ID	APT Campaign	Exploited CVE	PL	PA	INJ	IG	BD	LM	DE	Size (MB)	Total Entities	Total Events
S-1	Strategic web compromise [48]	2015-5122	✓		✓	✓	✓		✓	381	7468	95 K
S-2	Malvertising dominate [49]	2015-3105	✓		✓	✓	✓		✓	990	34,021	397 K
S-3	Spam campaign [50]	2017-11882		✓	✓	✓	✓		✓	521	8998	128 K
S-4	Pony campaign [51]	2017-0199		✓	✓	✓	✓		✓	448	13,037	125 K
M-1	Strategic web compromise [48]	2015-5122	✓		✓	✓	✓	✓	✓	851	17,599	251 K
M-2	Targeted GOV phishing [52]	2015-5119	✓		✓	✓	✓	✓	✓	819	24,496	284 K
M-3	Malvertising dominate [49]	2015-3105	✓		✓	✓	✓	✓	✓	496	24,481	334 K
M-4	Monero miner by Rig [53]	2018-8174		✓	✓	✓	✓	✓	✓	653	15,409	258 K
M-5	Pony campaign [51]	2017-0199	✓		✓	✓	✓	✓	✓	878	35,709	258 K
M-6	Spam campaign [50]	2017-11882		✓	✓	✓	✓	✓	✓	725	19,666	354 K

PL: phishing email link. PA: phishing email attachment. INJ: injection. IG: information gathering. BD: backdoor. LM: lateral movement. DE: data exfiltration.

Table 3. Performance based on APT-S-1 and evolving graph.

Models	Src Loss	Src Accuracy	Act Loss	Act Accuracy	Dest Loss	Dest Accuracy
LSTM	0.131	0.460	0.213	0.295	0.146	0.351
CNN + LSTM	0.136	0.461	0.212	0.296	0.151	0.350
CCR	0.153	0.771	0.122	0.651	0.058	0.819
TE	0.108	0.608	0.127	0.601	0.055	0.763
Proposed	0.095	0.773	0.192	0.742	0.073	0.890

Table 4. Performance based on APT-M-2 and evolving graph.

Models	Src Loss	Src Accuracy	Act Loss	Act Accuracy	Dest Loss	Dest Accuracy
LSTM	0.129	0.576	0.220	0.366	0.157	0.149
CNN + LSTM	0.127	0.579	0.224	0.366	0.049	0.806
CCR	0.181	0.786	0.125	0.691	0.064	0.872
TE	0.085	0.697	0.136	0.659	0.059	0.833
Proposed	0.176	0.798	0.155	0.695	0.102	0.796

Table 5. Performance based on APT-S-1 and neighborhood graph.

Models	Src Loss	Src Accuracy	Act Loss	Act Accuracy	Dest Loss	Dest Accuracy
LSTM	0.131	0.586	0.200	0.464	0.135	0.503
CNN + LSTM	0.128	0.586	0.125	0.792	0.028	0.917
CCR	0.118	0.586	0.139	0.729	0.035	0.927
TE	0.102	0.593	0.108	0.674	0.022	0.909
Proposed	0.157	0.582	0.117	0.738	0.028	0.928

Table 6. Performance based on APT-M-2 and neighborhood graph.

Models	Src Loss	Src Accuracy	Act Loss	Act Accuracy	Dest Loss	Dest Accuracy
LSTM	0.101	0.759	0.177	0.562	0.109	0.651
CNN + LSTM	0.101	0.775	0.088	0.782	0.116	0.650
CCR	0.105	0.691	0.139	0.729	0.073	0.922
TE	0.062	0.798	0.081	0.749	0.021	0.923
Proposed	0.108	0.704	0.116	0.790	0.020	0.930

Table 7. Performance based on APT-S-1.

Graph	Model	Src Top-1	Src Top-2	Src Top-3	Act Top-1	Act Top-2	Act Top-3	Dest Top-1	Dest Top-2	Dest Top-3
Evolving graph	CCR	0.771	0.876	0.915	0.651	0.880	0.907	0.819	0.848	0.931
	TE	0.608	0.781	0.845	0.601	0.803	0.870	0.763	0.904	0.950
	Proposed	0.773	0.886	0.942	0.742	0.886	0.930	0.890	0.920	0.984
Neighborhood graph	CCR	0.586	0.651	0.816	0.729	0.858	0.880	0.927	0.968	0.978
	TE	0.593	0.793	0.894	0.674	0.860	0.897	0.909	0.975	0.993
	Proposed	0.582	0.880	0.951	0.738	0.873	0.904	0.928	0.981	0.989

Table 8. Performance based on APT-M-2.

Graph	Model	Src Top-1	Src Top-2	Src Top-3	Act Top-1	Act Top-2	Act Top-3	Dest Top-1	Dest Top-2	Dest Top-3
Evolving graph	CCR	0.786	0.858	0.904	0.691	0.890	0.913	0.872	0.913	0.930
	TE	0.697	0.824	0.915	0.659	0.798	0.884	0.833	0.888	0.919
	Proposed	0.798	0.878	0.907	0.695	0.857	0.895	0.796	0.909	0.935
Neighborhood graph	CCR	0.691	0.894	0.955	0.729	0.855	0.931	0.922	0.980	0.991
	TE	0.798	0.891	0.956	0.749	0.928	0.965	0.923	0.985	0.996
	Proposed	0.704	0.858	0.952	0.790	0.947	0.971	0.930	0.994	0.998
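The Top-N columns in Tables 7 and 8 can be read with the following minimal sketch, which assumes the usual definition: a prediction counts as correct when the true label is among the N highest-scoring candidates. The function name and the toy scores are hypothetical and do not come from the experiments.

```python
def top_n_accuracy(scores, targets, n):
    """Fraction of samples whose true class index is among the n best-scored classes.

    scores  : list of per-class score lists, one per sample
    targets : list of true class indices, one per sample
    """
    hits = 0
    for row, target in zip(scores, targets):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if target in ranked[:n]:
            hits += 1
    return hits / len(targets)


# Toy scores over three candidate tokens for four samples.
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.5, 0.4],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
targets = [0, 2, 0, 1]

for n in (1, 2, 3):
    print(n, top_n_accuracy(scores, targets, n))
```

Under this definition Top-N accuracy cannot decrease as N grows, which matches the pattern in both tables, where every Top-3 value is at least as large as the corresponding Top-1 value.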

Table 9. Comparison of approaches for predictive analysis.

Ref.	Target Scenario	Cyber Attack	Architecture	Det.	Pred.	Dataset Name or Source	Dataset Description
[10,11]	To predict the possible paths	Maritime supply chain attack	Recommender system		√	From maritime supply chain IT infrastructure	Hardware, software assets, numerous vulnerabilities
[14]	To foresee the next steps	Multi-step attacks	HMM		√	Virtual scenario deployed with VNX and DARPA2000	Generated alerts by Snort IDS
[15]	To detect and track the progress	Interleaved multi-stage network attacks	HMM	√		DARPA2000	Generated alerts by Snort IDS
[29]	To detect and diagnose anomaly	Anomaly	LSTM	√		HDFS, and Open-Stack log	System log
[30]	To detect threats and predict attack paths	Multi-step attacks	Bi-directional LSTM	√	√	HDFS, Open-Stack, PageRank, and BGL	System log
[31]	To predict security events		Modified LSTM		√	Security event data collected from Symantec	Security events
[32]	To predict a sequence of system calls		Sequence-to-sequence with attention	√	√	ADFA-LD	Consists of system call traces
[33]	To predict network security situation		TCN-Transformer		√	UNSW-NB15 and CSE-CIC-IDS2018	Traffic features
[34]	To predict network security situation		Attention-CNN-BiGRU		√	Situation data released by CNCERT/CC and KDDCUP99	Five situation indicators and traffic features

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Liu, H.; Jiang, R. A Causal Graph-Based Approach for APT Predictive Analytics. Electronics 2023, 12, 1849. https://doi.org/10.3390/electronics12081849

