1. Introduction
Cyberattacks have become more common; they often cause significant economic damage and can even hinder the operation of core public services. In addition, advanced persistent cyber threats have recently re-emerged with the advent of the Internet of Things and the growing number of interconnected devices [
1]. For example, in May 2017, the “WannaCry” ransomware attack was detected after targeting 200,000 servers in over 150 countries [
2]. In the same year, another form of the same attack caused disruptions to most government websites and several companies in Ukraine, and eventually, the attack spread globally [
3].
The forms of network attacks are complex and diverse, and many attacks have shifted from simple one-step attacks to new composite attacks [4]. The techniques used by attackers against computer systems and networks have reached an unprecedented level of sophistication, combining multiple steps to achieve their goals in a premeditated manner, as exemplified by advanced persistent threats (APTs) [5,6].
A multi-stage attack requires the execution of a series of attack stages; however, an individual stage may be benign or malicious when viewed in isolation, and occasionally every stage can appear benign without raising any suspicion. In addition, attacks may last for weeks or years, and traditional intrusion detection systems (IDSs) may not be able to detect these attacks due to the time gaps between attack stages [
3]. A new intrusion detection model is necessary to address these threats, identify the ongoing attacks early, and anticipate the attacker’s further strategies as much as possible. This approach will provide network analysts with a foundation for preventing attacks. Yet, detection and prediction of various types of dynamic attacks is always a challenging task [
7].
In this regard, a variety of different mechanisms can be used to achieve the detection and prediction of multi-stage attacks. These mechanisms include discrete models such as attack graphs, Bayesian networks, Markov models, and game theory or continuous models such as time series and grey models. Among the various graph models, causal graphs appear to be an ideal threat analysis approach, linking causal events in a system, with powerful semantic representation and attack history correlation capabilities.
Audit log data are a good source of information for online monitoring and anomaly/attack detection because they record system status and significant events at critical moments, which helps with debugging performance problems and failures and with root cause analysis. In addition, because system logs record noteworthy events that occur in actively running processes, such log data are available in almost all computer systems. Audit logs are therefore a natural basis for constructing causal graphs, and doing so is a very common practice.
However, the prediction of multi-stage attacks based on causal graphs remains an open problem. Previous research on cyber threat events based on causal graphs has mostly stopped at detecting malicious events that have already occurred and tracing attack scenarios, and has rarely considered predicting the specific malicious events that will occur next.
Predicting malicious events in multi-stage attacks based on audit log data and causal graphs raises a number of challenges, including but not limited to the following.
- (1) The log data themselves are unstructured and may come from different operating platforms, so their format and semantics may vary from platform to platform; using unstructured logs to diagnose problems is already challenging.
- (2) How can one reduce log complexity, minimize data storage size, and balance the space efficiency of causal graph storage with the time efficiency for attack investigation while maintaining the original semantics in audit log data?
- (3) Despite experiencing the same type of attack, there is no definitive pattern indicating that a malicious event will always precede or follow another. It is possible to observe unrelated noisy events first or multiple malicious events occurring simultaneously from different adversary groups with distinct attacks.
- (4) How can one design an efficient and robust malicious event prediction algorithm that ensures prediction accuracy while minimizing the response time for investigative forensics?
- (5) There is a lack of a standard public dataset that provides real log data from different multi-stage attacks and, similarly, a lack of a solid standard to quantify and measure the malicious event prediction performance of such an architecture.
To meet the above challenges, this paper proposes a novel predictive analysis framework for malicious events in multi-stage attacks that combines natural language processing and deep learning techniques. Log data are generated by programs that follow a rigorous set of logic and control flows, much as natural language follows grammar. Event log entries can therefore be viewed as sequence elements that follow specific patterns and grammar rules, although they are more structured and constrained than natural language. It is assumed that different attacks may share similar abstract attack strategies and that their key stages usually share comparable patterns at the entity and action level, which triggers similar sequences of malicious events. Given a series of known malicious events, it is therefore possible to predict the malicious events that will occur next. The task is to learn a sequence prediction function that accepts a variable-length input sequence of malicious events and predicts the target event. Because deep learning models can handle such inputs, the prediction system should be able to make predictions given an event sequence of variable length as context, which better matches real-world scenarios and different prediction needs.
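The variable-length formulation can be illustrated with a minimal sketch, assuming events are represented as (source, action, destination) triples and that every prefix of a known malicious event sequence serves as context for the event that follows it. The function name `make_prefix_targets` and the example entities are hypothetical and are not taken from the proposed framework.

```python
from typing import List, Sequence, Tuple

# Assumption: an event is reduced to a (source, action, destination) triple.
Event = Tuple[str, str, str]


def make_prefix_targets(
    events: Sequence[Event], min_context: int = 1
) -> List[Tuple[List[Event], Event]]:
    """Turn one ordered event sequence into (context, next-event) pairs.

    Each context is a prefix of the sequence, so contexts have different
    lengths; the target is the event that immediately follows the prefix.
    """
    pairs = []
    for i in range(min_context, len(events)):
        pairs.append((list(events[:i]), events[i]))
    return pairs


# Hypothetical three-event attack sequence.
attack = [
    ("firefox.exe", "connect", "192.0.0.1:80"),
    ("firefox.exe", "write", "payload.exe"),
    ("payload.exe", "read", "secret.doc"),
]
pairs = make_prefix_targets(attack)
for context, target in pairs:
    print(len(context), "->", target)
```

A sequence of k events yields k - 1 training pairs under this scheme, with context lengths ranging from 1 to k - 1.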
Although fixed-length prediction may not always be ideal, it still holds value. The use of sequence length evaluation helps to determine whether the predicted results are primarily due to long-term memory or short-term memory. Furthermore, it provides insight into the circumstances under which the best results for predicting malicious events can be achieved. This article is organized as follows.
Section 2 presents related work on cyber threat prediction analysis, focusing on different use cases based on graph models. Section 3 introduces the definitions and the overall framework, and Section 4 describes the proposed architecture in more detail. Section 5 presents the performance evaluation of the model. Finally, Section 6 summarizes the paper and outlines future work.
2. Related Work
In conventional network attack detection, the analysis and investigation of an attack often begin only after the attack is completed. Predictive analysis of network attacks, by contrast, places more emphasis on timeliness and takes effect sooner, so that users can intervene in ongoing attacks or system performance issues.
Typically, predictive methods in cybersecurity use discrete models to represent attacks or network security situations. Clear examples are graphical models of attack processes or game-theoretic representations of interactions between attackers and defenders.
Figure 1 provides a simple profile of the cybersecurity use case, based on the discrete models that compose the approach under consideration. When it comes to predictive analytics of multi-stage attacks, the focus is primarily on attack projection. This involves recording the attacker’s behavior and constructing an attack description for future reference. If a series of events conform to the attack pattern, it can be assumed that the attack will continue along the same lines. In addition, researchers may be more interested in predicting novel attacks rather than analyzing previously observed attacks. Alternatively, researchers may prioritize forecasting the overall security situation rather than examining individual attacks.
Among the various graph models, the attack graph is a graphical representation of an attack scenario proposed by Phillips and Swiler [
8] in 1998 and quickly became a popular formal attack representation. Hughes et al. [
9] provided an effective method for analyzing and predicting network threats based on network models in 2003, which is considered as one of the earliest practical approaches to attack graphs and an effective static analysis method. Based on this, Polatidis et al. [
10,
11] constructed attack graphs using information about the underlying infrastructure and proposed a method for predicting network attacks using attack graphs and recommendation systems.
Another practical approach for attack prediction is using Bayesian networks, which are closely related to prediction methods based on attack graphs, as Bayesian networks are often constructed based on attack graphs. For example, Bayesian attack graphs are attack graphs in the form of Bayesian networks [
12].
Hidden Markov models (HMMs) have been widely used in intrusion detection and attack prediction methods due to their ability to eliminate the dependence on complete information in graphical models, particularly when unobservable states and transitions exist. An early example is the alert correlation and prediction system proposed by Farhadi et al. [
13] in 2011, which uses the Attack Scenario Extraction Algorithm (ASEA) to correlate and extract important alerts and then applies HMM for predictive analysis of intruder behavior. Another example is a new method based on HMM proposed by P. Holgado et al. [
14] in 2020, which considers hidden states as similar stages of specific types of attacks and can easily adapt to multi-stage attacks and anticipate the attacker’s subsequent stages. Similarly, T. Shawly et al. [
15] proposed a novel framework in 2021 based on HMM modeling to address the challenges of modeling and detecting complex network attacks (such as multiple interleaved attacks), which have not been addressed by previous methods.
Unlike the various graphical methods mentioned above, knowledge graphs are more geared toward dealing with larger and more dynamically changing real-time network attacks. For example, Jia Yan et al. [
7] proposed a method for network security knowledge graph and deduction rules based on the five-tuple model in 2018. Qi et al. [
4] further stored prior knowledge in the network security knowledge graph and attack rule library as computer-understandable data and then mined attack chains from massive data with temporal and spatial constraints, thus proposing an attack analysis framework for a network attack and defense testing platform.
In contrast to other graph models, causal/dependency graphs are often not directly applied to proactive attack prediction but are widely used as a promising tool for APT attack detection. They offer a strong abstract representation and relatively high efficiency, abstracting the interactions between components in opaque systems in a high-fidelity and visible way that links events in the system by cause and effect regardless of the time between them. A comprehensive understanding of the entire attack is thus possible, which provides a natural observation platform for the predictive analysis of cyberattacks.
Causal graphs, also known as dependency graphs or provenance graphs, are more commonly used for APT detection and the backtracking of attack scenarios. In Backtracker [16,17], researchers first explored the problem of piecing together the causal chains leading to an attack, i.e., the concept of attack tracing, using a dependency graph for OS-level attack tracing, where backtracking can traverse the entire historical context of system execution from a given detection point. Subsequent studies [
18,19] have improved the accuracy of the dependency chains constructed by Backtracker. However, these efforts operate in a purely forensic setting, i.e., they backtrack all relevant events of the entire attack scenario, which requires a complete provenance graph and extensive manual intervention and is neither timely nor efficient. They cannot cope with the analysis of attack activities executed in real time, much less support proactive attack prediction.
A current causal graph-based threat analysis system can, first, be divided into three modules: the data collection module, the data management module, and the threat detection module. Each module contains several components that address different research questions, and in an end-to-end model, each module can be considered independently of the others. In the proposed causal graph-based malicious event prediction model, the first two modules are not significantly different from those of other provenance/causal graph-based threat detection systems.
Second, an ideal provenance graph-based threat analysis system needs to consider three attributes simultaneously: fast response, high efficiency, and high accuracy [20]. However, even after pruning, the causal graph is very large, so threat analysis based on causal graphs may introduce high space and computational overhead. In previous work on causal graph-based threat detection, researchers have made many attempts to balance these three properties.
Finally, these approaches can be broadly divided into three categories based on the main design options for attack detection. The tag propagation-based approaches (Hossain et al., 2017 [21]; Milajerdi et al., 2019 [22]) store the system execution history incrementally in tags and use the tag propagation process to trace causality. These algorithms have roughly linear time complexity; moreover, they can take streaming graphs as input and respond quickly. The anomaly detection-based approaches (Hassan et al., 2019 [23]; Liu et al., 2018 [24]; Xie et al., 2018, 2021 [25,26]) try to identify anomalous interactions between nodes. To do so, they model normal behavior by collecting historical data or data from parallel systems. The graph matching-based approaches (Han et al., 2020 [27]; Liu et al., 2019 [28]) try to identify suspicious behavior by matching substructures in graphs. However, graph matching is computationally complex, so researchers have tried to extract graph features through graph embedding or graph sketching algorithms or to use approximation methods.
As a powerful artificial intelligence technology, deep learning has been widely applied in fields such as computer vision, natural language processing, and bioinformatics. It can learn complex patterns and relationships and extract valuable information from large amounts of data, which makes it possible to combine it with various traditional models and algorithms and greatly improve their automation and efficiency. In recent years, it has also been used in threat prediction tasks in various application settings. For example, DeepLog, proposed by Du et al. [29], uses a deep neural network with long short-term memory (LSTM) to model system logs as natural language sequences. However, although it makes predictions, it is essentially an anomaly detection model rather than a prediction model: if the error between the predicted and observed parameter value vectors is within the high-confidence interval of a Gaussian distribution, the parameter value vector of the incoming log entry is considered normal; otherwise, it is considered abnormal. DeepAG [30] builds on DeepLog and proposes a method for threat detection and attack path prediction using bi-directional deep learning. Unlike the previous two methods, Tiresias [31] does not consider system logs but models security events themselves and demonstrates the feasibility of predicting security events through a recurrent neural network with recurrent memory cells, filling the gap in predicting the specific steps that attackers will take when carrying out attack activities. In addition, there are other examples of attack prediction, such as the prediction of system calls [
32] and the combination of attack prediction and network security situation forecasting, using deep learning to predict different types of threats [
33,
34]. Prior to this, research on predicting attacks focused more on binary results. For example, Bilge et al. [
35] proposed a system that analyzes the binary appearance logs of machines to predict which machines are at risk of infection, that is, whether attacks will occur.
3. Preliminaries
3.1. Definitions
Causal Graph. A causal graph G is a data structure extracted from audit logs, typically used for provenance tracing, that indicates causal relationships between subjects (e.g., processes) and objects (e.g., files or connections). The causal graph consists of nodes, which represent subjects and objects, connected by edges, which represent actions (e.g., read or connect) between subjects and objects. In this study, a directed cyclic causal graph is considered, with edges pointing from a subject (source) to an object (destination).
Neighborhood Graph [36]. Given a causal graph, two nodes u and v are said to be neighbors if they are connected by an edge. The neighborhood of node n is a subgraph of G consisting of node n and the neighboring nodes of node n, together with the edges of node n. Similarly, a uniform neighborhood graph is created by extracting all nodes {n_1, n_2, …, n_n} and the edges that connect them to their neighbors.
T-Evolving Graph [
16]. Each entity in the evolving graph has a time threshold associated with it, which is the longest time an event can occur and be considered relevant to that entity. Based on this, the malicious entity detected at the current time point is used for initialization, and back-tracing is performed based on its acceptable time threshold T. By taking into account all the events that would have an impact on the malicious entity before T, a T-evolving graph based on the malicious entity is constructed.
Entity. The entity e is the unique system subject or object extracted from the causal graph, where it is represented as a node. The entities under consideration comprise processes, files, and network connections (i.e., IP addresses and domain names). For example, winword.exe_21 is a subject that represents a process instance of the MS Word application with a process name and ID, while 192.0.0.1:80 is an object that represents an IP address 192.0.0.1 with a port number 80.
Event. An event ε is a 4-tuple (src, action, dest, t), where the source (src) and the destination (dest) are the two entities associated with an action and t is the timestamp that shows when the event occurred. For example, given a source entity Firefox.exe, an action open, a destination entity Word.doc, and a timestamp t, (Firefox.exe, open, Word.doc, t) is the event in which the Firefox process opens the Word file at time t.
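The definitions above can be made concrete with a small sketch. It assumes that entities are identified by plain strings, that an event becomes relevant to the back-trace as soon as it touches an already relevant entity, and that the time threshold T is the same for all entities; the class `CausalGraph`, its method names, and the example events are hypothetical illustrations rather than the implementation used in this study.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Event:
    """An event (src, action, dest, t) as defined above."""
    src: str
    action: str
    dest: str
    t: float


class CausalGraph:
    """Nodes are entity names; each event is a directed edge src -> dest."""

    def __init__(self, events: Iterable[Event]):
        self.events = sorted(events, key=lambda e: e.t)
        self.incident = defaultdict(list)  # entity -> events touching it
        for e in self.events:
            self.incident[e.src].append(e)
            if e.dest != e.src:
                self.incident[e.dest].append(e)

    def neighborhood(self, nodes: Set[str]) -> List[Event]:
        """Edges that connect the given nodes to their neighbors, by time."""
        seen = set()
        for n in nodes:
            seen.update(self.incident.get(n, []))
        return sorted(seen, key=lambda e: e.t)

    def t_evolving(self, entity: str, detected_at: float, threshold: float) -> List[Event]:
        """Back-trace from a detected entity within the time threshold.

        Simplifying assumption: walking backward in time, an event is kept
        if it touches an entity that is already relevant, and both of its
        endpoints then become relevant too.
        """
        relevant = {entity}
        kept = []
        for e in reversed(self.events):
            if not (detected_at - threshold <= e.t <= detected_at):
                continue
            if e.src in relevant or e.dest in relevant:
                relevant.update((e.src, e.dest))
                kept.append(e)
        return kept[::-1]


# Hypothetical events; names and timestamps are illustrative only.
e1 = Event("firefox.exe_7", "connect", "192.0.0.1:80", 10.0)
e2 = Event("firefox.exe_7", "write", "dropper.exe", 20.0)
e3 = Event("winword.exe_21", "open", "report.doc", 25.0)
e4 = Event("dropper.exe", "read", "secret.doc", 30.0)
graph = CausalGraph([e1, e2, e3, e4])

near = graph.neighborhood({"dropper.exe"})
recent = graph.t_evolving("dropper.exe", detected_at=30.0, threshold=15.0)
wider = graph.t_evolving("dropper.exe", detected_at=30.0, threshold=25.0)
```

In this example the neighborhood graph of `dropper.exe` contains only the two events incident to it, while a larger threshold lets the back-trace also reach the earlier network connection made by the process that wrote the file.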
3.2. Architecture Overview
Figure 2 depicts the proposed malicious event prediction framework, which combines natural language processing and deep learning techniques integrated into the causal graph. The framework is divided into two components, based on the training of the deep learning model and the real-time working process: (a) deep model learning based on malicious event sequences and (b) predictive analysis of malicious events in real time.
During the deep model learning based on malicious event sequences, existing attack patterns are mined using causal graphs, and their internal logic is understood through a concept similar to natural language processing. These attack patterns are then learned using deep learning techniques.
To increase the automation of multi-stage attack analysis, support fast detection and real-time analysis, and reduce storage and operation overhead, historical audit log data from various sources are transformed into a platform-independent causal graph by extracting essential system operation events and storing them in a graph data structure. The causal graph structure is stored in a graph database, which is a commonly used NoSQL database that stores data as nodes with edges and provides a semantic query interface for network analysts. This enables the execution of graph algorithms, such as backtracking and graph alignment, with ease.
Next, known malicious event sequences are extracted from the graph database and converted into a generalized context that captures the semantic pattern of each sequence using word form reduction (lemmatization). The known multi-stage attack scenarios are then restored by combining the lemmatized malicious event sequences. This paper considers both the neighborhood graph and the T-evolving graph for malicious event sequence construction. Lemmatization groups words into terms of different granularity, which meets the needs of network analysts for attack analysis and malicious event prediction at various levels.
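As an illustration of lemmatization at different granularities, the following sketch maps concrete entity names to more general terms. The three levels ("instance", "name", "type") and the pattern-based rules are assumptions made for this example only; the actual vocabulary and rules of the framework are not reproduced here.

```python
import re
from typing import List


def entity_type(name: str) -> str:
    """Coarse entity category, guessed from the name (hypothetical heuristic)."""
    if re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}(:\d+)?", name):
        return "connection"
    if re.search(r"\.exe(_\d+)?$", name, flags=re.IGNORECASE):
        return "process"
    return "file"


def lemmatize_entity(name: str, level: str) -> str:
    """Reduce an entity name to the requested granularity.

    "instance" keeps the full name, "name" drops a trailing process ID,
    and "type" keeps only the coarse entity category.
    """
    lowered = name.lower()
    if level == "instance":
        return lowered
    if level == "name":
        return re.sub(r"_\d+$", "", lowered)
    if level == "type":
        return entity_type(name)
    raise ValueError(f"unknown level: {level}")


def lemmatize_event(src: str, action: str, dest: str, level: str = "type") -> List[str]:
    """Turn one event into a short 'sentence' of generalized tokens."""
    return [lemmatize_entity(src, level), action.lower(), lemmatize_entity(dest, level)]


coarse = lemmatize_event("winword.exe_21", "open", "Report.doc", level="type")
finer = lemmatize_event("winword.exe_21", "open", "Report.doc", level="name")
```

Coarser levels make sequences from different attack instances look alike, which helps generalization, whereas finer levels preserve detail that an analyst may need during investigation.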
To increase the level of automation during attack analysis and minimize the need for expert analysis knowledge, word embedding [
37] is utilized to map word sequences to real vectors. Multi-stage attack scenarios across different instances are learned using suitable deep learning models, similar to audio generation and short sentence text completion. This enables the model to predict potential future attacks, automate the recommendation of possible future malicious events, and provide network analysts with the probability associated with the potential occurrence of a malicious event.
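The mapping from word sequences to real vectors can be sketched as a vocabulary lookup followed by an embedding table. This is a minimal, dependency-free illustration: the helper names, the `<pad>` and `<unk>` tokens, left-padding, and the randomly initialized table are assumptions for this example, whereas in a trained model the table would be learned, for instance by an embedding layer of a deep learning library.

```python
import random
from typing import Dict, List, Sequence


def build_vocab(sequences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign an integer ID to every token; 0 and 1 are reserved."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(seq: Sequence[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to IDs, keep the most recent max_len, and left-pad."""
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in seq][-max_len:]
    return [vocab["<pad>"]] * (max_len - len(ids)) + ids


def init_embeddings(vocab_size: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Random stand-in for a learned embedding table."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)] for _ in range(vocab_size)]


def embed(ids: Sequence[int], table: List[List[float]]) -> List[List[float]]:
    """Look up one real-valued vector per token ID."""
    return [table[i] for i in ids]


# Hypothetical lemmatized event sequences.
corpus = [["process", "open", "file"], ["process", "connect", "connection"]]
vocab = build_vocab(corpus)
table = init_embeddings(len(vocab), dim=4)
ids = encode(["process", "open", "file"], vocab, max_len=5)
vectors = embed(ids, table)
```

Unknown tokens fall back to `<unk>`, so a sequence observed at prediction time can still be encoded even if it contains terms that never appeared during training.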
During the real-time predictive analysis of malicious events, network analysts can start the attack investigation from unknown audit logs with identified attack symptom entities (e.g., malware or suspicious host names) and restore a partial graph of the attack scenario that has occurred so far, based on the concepts of neighborhood graphs and T-evolving graphs. Even within the same type of attack campaign, there is no fixed pattern in which one malicious event always follows or precedes another. Following the idea of Top-N recommendation systems, this paper therefore converts the prediction of malicious events into a problem of ranking probabilities of occurrence: the model gives a prediction score for each possible option for the next malicious event, and the N most likely malicious events are kept in the attack scenario graphs and recommended to the network analysts.
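The ranking step can be sketched as follows, assuming the model outputs one raw score per candidate and that a softmax turns these scores into probabilities. The candidate actions and score values are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def top_n(candidates: Sequence[str], scores: Sequence[float], n: int) -> List[Tuple[str, float]]:
    """Return the n most probable candidates with their probabilities."""
    probs = softmax(scores)
    ranked = sorted(zip(candidates, probs), key=lambda cp: cp[1], reverse=True)
    return ranked[:n]


# Hypothetical candidate actions and model scores for the next event.
actions = ["read", "write", "connect", "delete"]
scores = [2.0, 0.5, 1.0, -1.0]
recommended = top_n(actions, scores, n=2)
for action, prob in recommended:
    print(f"{action}: {prob:.2f}")
```

Returning the probabilities alongside the candidates lets the analyst see how strongly the model prefers each recommendation rather than only their order.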
5. Results
5.1. Implementation
The code for this experiment was implemented mostly in Python. This study used Anaconda3 to build the development environment and implemented the deep learning network and malicious event prediction algorithm with TensorFlow 2.3 and its corresponding high-level application programming interface (API), Keras 2.4.3.
This study used the open-source JanusGraph graph database for log event storage and management. JanusGraph supports a variety of databases as storage backends and can be combined with existing big data processing frameworks from Apache to provide graph-based big data analysis capabilities.
5.2. Datasets
The lack of publicly available attack datasets and syslog data is inherently a challenge for APT attack detection and predictive analysis. Researchers can combine natural language processing techniques to automatically extract attack behavior data directly from APT analysis reports published by major security companies [
47] and further transform them into causal graphs. However, the lack of real available attack environments and the corresponding standard dataset itself remains unavoidable.
To address this issue, Alsaheel et al. [36] reproduced and simulated 10 relatively realistic APT attacks from APT research reports published by well-known security firms and generated 24 h of audit log data from them, consisting mainly of system events together with DNS queries and browser events. In this paper, the causal graphs generated from these logs were used as the baseline experimental dataset, and each attack, which exploited different vulnerabilities, is detailed in Table 2.
This dataset consisted of 6.7 GB of audit log data, with an average of 20 K unique entities generated from the ten attack simulations and over 200 K events per attack campaign, with more than 17 K malicious events directly related to malicious entities. Among them, S-1 to S-4 were attacks on single hosts, while M-1 to M-6 were multi-host attacks that spanned multiple hosts. These attacks covered many different attack characteristics, such as phishing email links, phishing email attachments, injection, and lateral movements (e.g., leaking sensitive data).
5.3. Malicious Event Prediction Performance Evaluation
The effectiveness of the predictive analysis of malicious events is evaluated in Table 3, Table 4, Table 5 and Table 6. The single-host dataset (S-1) and the multi-host dataset (M-2) were each used as test sets, while the other APT attack campaigns served as the training datasets. S-1 was a single-host strategic web compromise attack campaign, which exploited the same 2015-5122 vulnerability as M-1, while M-2 was a multi-host targeted GOV phishing attack campaign, which exploited the 2015-5199 vulnerability only. Nevertheless, the different attack campaigns all shared the same attack features.
To evaluate the effectiveness of deep learning-based prediction methods, various forms of LSTM models, the Transformer model [54], and the temporal convolutional network (TCN) model [55] were considered in the study. Both the Transformer model’s encoder layer (Transformer-Encoder, TE) and the TCN model’s causal convolution layer and residual block module (Causal Convolution Residual, CCR) were used for feature extraction, serving as comparisons with the proposed model. A unified softmax layer was used for the probability output. In the TCN model, for example, only three layers of feature extraction were used, obtained by stacking three causal convolution and residual block modules. In addition to the deep models proposed in Section 4.2.3, basic LSTM models and LSTM models with convolutional layers for feature extraction were also tested as ablation experiments.
In Table 3 and Table 4, the tested models based on the evolving graph were used to predict the source, action, and destination of malicious events on both the single-host dataset (S-1) and the multi-host dataset (M-2). In Table 5 and Table 6, the prediction results were analyzed based on the attack scenarios constructed using the neighborhood graph. In both cases, the evaluation included the value of the binary_crossentropy loss function for predicting the malicious events, which provided insight into the model’s fitting performance, as well as the classification accuracy, which gave an intuitive measure of the reliability of the model’s predictions. (The optimal values for each column in the tables are highlighted in bold.)
The experimental results showed a significant difference in prediction accuracy between the test sets; in general, the predictive models achieved higher accuracy on the APT-M-2 test set. This is analyzed further in the sequence length evaluation in Section 5.5.
In this study, the proposed model was preferred for predicting malicious events as it generally yielded better accuracy. However, it was evident that this advantage was not absolute, particularly from the loss value analysis, which suggested that the model may not converge well, thus affecting the confidence in the prediction results. For instance, for the same prediction result, two different probability outputs of (Pro: 0.51) and (Pro: 0.91) could yield the same result but with completely different levels of confidence.
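The difference between the two outputs can be made concrete with a short calculation. As a simplifying assumption, only the negative log-likelihood of the correct class is considered here; the full binary cross-entropy over a one-hot target would add further terms for the other classes.

```python
import math


def true_class_loss(p: float) -> float:
    """Negative log-likelihood contributed by the correct class."""
    return -math.log(p)


# Both outputs lead to the same predicted class (the probability exceeds
# that of every alternative), so accuracy cannot distinguish them.
low_confidence_loss = true_class_loss(0.51)   # about 0.673
high_confidence_loss = true_class_loss(0.91)  # about 0.094

print(f"Pro 0.51 -> loss {low_confidence_loss:.3f}")
print(f"Pro 0.91 -> loss {high_confidence_loss:.3f}")
```

Under this simplification the low-confidence prediction incurs roughly seven times the loss of the high-confidence one, which is why the loss value is reported alongside the accuracy.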
Furthermore, the ablation experiments showed that a simple LSTM model was unable to predict malicious events accurately. However, using a feature extraction method with a convolution layer and stacked BiLSTM layers effectively improved the stability of the model, as reflected in its high accuracy in most cases. From this perspective, the construction of the model was effective.
Interestingly, for the proposed model, the prediction accuracy for each element of the malicious event triplet in the test set reached a range of 96–99%. Although this was inevitably accompanied by some overfitting, the model had learned well how events develop under different circumstances. However, due to the differences between the training set and the test set, even more complex models cannot guarantee a significant improvement in performance on the test dataset. In this case, other ways are needed to obtain higher performance gains.
5.4. Varying Potential Options-Based Prediction Analysis
The absolute sequence of malicious events associated with a particular attack activity cannot be guaranteed, even for the same attack activity. There is no absolute pattern that an event will follow or precede another event, and there is also no guarantee whether other security events unrelated to the coordinated attack will be observed in the causal graph, which includes but is not limited to different attack activities from different groups of adversaries occurring at the same time. The expectation is to provide multiple possible options for the next malicious event simultaneously, rather than a single predictive target.
In this paper, the obtained malicious event sequences were fed into a deep learning model with a word embedding layer. The model generated a prediction score for the next malicious event, providing varying potential options for the related entities and actions and resulting in the N most likely recommendations. For example, Table 7 and Table 8 show, for the CCR, TE, and proposed stacked models, the impact of the number of recommendations on the prediction accuracy in the Top-1 to Top-3 range for both APT-S-1 and APT-M-2 malicious activities. (The optimal values for each column in the tables are highlighted in bold.)
The prediction also followed the Top-N method, with event sources predicted first. For the known set of malicious event sources, the action and the destination of the events related to the sources with the highest probability were then predicted. Prediction scores were given for the different (source, action) pairs and (source, action, destination) malicious event triples, and the N most probable malicious events were finally obtained. The Top-N malicious events were kept in the attack scenario graphs and presented as recommendations. The value of N depended on the desired accuracy threshold θ: N was chosen so that the sum of the predicted probabilities output by the model over all recommended events satisfied Sum(Pro) > θ.
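One way to read the threshold rule is sketched below: candidates are added in order of decreasing probability until their cumulative probability exceeds θ, and the number of candidates added is N. The function name `recommend_until` and the example probabilities are hypothetical, and the staged prediction of source, action, and destination is collapsed into a single probability per event triple for brevity.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source, action, destination)


def recommend_until(probs: Dict[Triple, float], theta: float) -> List[Tuple[Triple, float]]:
    """Pick the most probable events until Sum(Pro) > theta."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    picked, total = [], 0.0
    for event, p in ranked:
        picked.append((event, p))
        total += p
        if total > theta:
            break
    return picked


# Hypothetical predicted probabilities for the next malicious event.
predicted = {
    ("process", "read", "file"): 0.55,
    ("process", "write", "file"): 0.30,
    ("process", "connect", "connection"): 0.10,
    ("process", "delete", "file"): 0.05,
}
picks = recommend_until(predicted, theta=0.8)
print(len(picks), "events recommended")
```

With θ = 0.8 the first two events are recommended because their probabilities sum to 0.85; a stricter θ = 0.9 would require a third event.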
In fact, this presentation was far more valuable than predicting only the single most likely malicious event: even if the most probable malicious events did not occur at the next event point, there was a high probability that they would occur afterward, unless there was a strict order of precedence among them.
5.5. Sequence Length Evaluation
Understanding how deep learning models work is generally challenging, as they are often viewed as black boxes. In more complex working scenarios, time series prediction models, including RNNs, rely on both long-term and short-term memory, especially when filtering out noise.
The objective of this study was to determine the relative influence of long-term memory and short-term memory in decision making. Short-term memory refers to a system that relies on only a few elements of the sequence to make a decision, specifically, the elements closest to the system’s prediction target. On the other hand, long-term memory refers to a system that utilizes the entire sequence or a significant portion of it to predict the next security event.
Intuitively, an increase in the number of observed events is not expected to improve the performance of the model if short-term memory dominates. To determine the type of influence that may be more significant to the model, the analysis of neural network weights was avoided, and instead, the impact of different historical event sequence lengths on the final prediction accuracy was examined.
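The evaluation procedure described above can be illustrated with a minimal sketch: each sequence is truncated to its k most recent events before prediction, and accuracy is recorded for each k. The stand-in predictor `most_frequent_event` is a toy assumption used only to keep the example self-contained; it is not the CCR, TE, or stacked model, and the sequences are invented.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Sequence


def accuracy_by_history_length(
    sequences: Sequence[Sequence[str]],
    predict: Callable[[Sequence[str]], str],
    lengths: Iterable[int],
) -> Dict[int, float]:
    """Accuracy of `predict` when it sees only the last k events of each history.

    The final event of every sequence is treated as the prediction target.
    """
    results: Dict[int, float] = {}
    for k in lengths:
        if k < 1:
            raise ValueError("history length must be at least 1")
        hits = 0
        total = 0
        for seq in sequences:
            if len(seq) < 2:
                continue  # no history available for this sequence
            history, target = seq[:-1], seq[-1]
            window = history[-k:]  # keep only the k most recent events
            hits += int(predict(window) == target)
            total += 1
        results[k] = hits / total if total else 0.0
    return results


def most_frequent_event(window: Sequence[str]) -> str:
    """Toy predictor (assumption): return the most common event in the window."""
    return Counter(window).most_common(1)[0][0]


# Invented event sequences, for illustration only.
toy_sequences: List[List[str]] = [
    ["a", "a", "b", "a"],
    ["c", "c", "c", "c"],
]

curve = accuracy_by_history_length(toy_sequences, most_frequent_event, [1, 3])
```

If accuracy stops improving beyond small k, short-term memory dominates; a continued rise with larger k suggests that the model draws on longer histories.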
As shown in
Figure 7 and
Figure 8, the prediction accuracy for different prediction targets is given for historical event lengths of 1–40 on the two test sets. Overall prediction accuracy was higher on the APT-M-2 test set. For both test sets, a sufficiently long event history significantly improved prediction performance in most cases.
Similarly, sequence length evaluation can also reveal the specific reasons for the prediction performance of different models in different datasets. As shown in
Table 5, for the APT-S-1 test set, the event source often could not be predicted well when malicious events were predicted from the malicious event sequences extracted from the neighborhood graph. In
Figure 7, it can be seen that the CCR, TE, and proposed models made erroneous predictions on this target under the influence of both short-term and long-term memory. Taking the causal convolution residual (CCR) structure as an example, a longer historical event length generally improved the prediction accuracy up to a length of 35, beyond which the accuracy declined sharply. Therefore, tuning the historical event length can often substantially improve model performance.
5.6. Case Study for APT Predictive Analytics
Finally, the proposed model was demonstrated through a case study that depicted its operation using system-level causal graphs and predictive attack analytics. The case study refers to the M-5-Pony campaign attack discussed in
Section 5.2, where log data from the initial stages of the attack were analyzed. The native graph database facilitated the extraction of attack scenario graphs for the current Pony campaign attack from the overall causal graph using the method outlined in
Section 4.2.
As depicted in
Figure 9, the malicious file (ID: 41459768; Name: c:/users/aalsahee/downloads/msf.doc) was discovered at the latest time point. The original attack scenario graph was constructed using the attack evolving graph, as shown in
Figure 9a. Note that the back-trace from the current time node covered only the preceding 20 events. After lemmatization, the simplified scenario was obtained, as shown in
Figure 9b.
The analysis of the back-tracing process began with the events directly linked to the msf.doc file, such as <firefox.exe write(18) msf.doc> and <explorer.exe read(20) msf.doc>. These events illustrated how the malicious file msf.doc compromised other processes on the machine. The reasoning drawn from this attack scenario was that the compromise was ongoing and would continue, which is consistent with the highest-probability event predicted by the model, <programfiles_process write-or-read user_file\msf.doc>. Indeed, in a subsequent event, <c:/programfiles/*/winword.exe_35 write ~$msf.doc>, the malicious file went on to compromise the Word process on the victim machine.
Figure 10 illustrates the long-term perspective of the Pony campaign attack using log data spanning a longer time scale than the case study. This approach provided a more comprehensive understanding of the attack campaign. It depicted the transition from the local Firefox browser establishing a connection with the remote server (event <192.168.223.3 connected_remote_ip(2) c:/programfiles/mozillafirefox/firefox.exe_3776>) to the C&C server communicating with the local mshta.exe program (event <192.168.223.3 connected_remote_ip(20) c:/windows/system32/mshta.exe_264>).
This included the main part of the compromise, from acquiring the malicious msf.doc file (event <0xalsaheel.com web_request(7) 0xalsaheel.com/msf.doc>) to that document compromising other processes on the victim’s machine. Based on that scenario, the two most likely new events were <IP_Address connected_session session> and <system32_process connect connection>. The former correctly predicted the upcoming event (event <192.168.223.3 connected_session(21) session_192.168.223.130_51794>), while the latter, coincidentally, corresponded to a later event (event <c:/windows/system32/mshta.exe_264 connect(23) connection_192.168.223.130_192.168.223.3>). Although the neighborhood graph contained only part of the earlier malicious campaign, it revealed enough information for this analysis.
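The relationship between the concrete log events and the abstract predicted events in this case study can be pictured with a small lemmatization sketch. The rules below are assumptions inferred only from the example tokens that appear in this section (IP_Address, session, connection, system32_process, programfiles_process); they do not reproduce the paper's actual lemmatization procedure, and the function names are hypothetical.

```python
import re
from typing import Tuple

# Assumed patterns, inferred from the events quoted in the case study.
_IP_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")
_ACTION_INDEX = re.compile(r"\(\d+\)$")  # trailing event index, e.g. "(21)"


def lemmatize_entity(entity: str) -> str:
    """Map a concrete system entity to an abstract token (illustrative rules only)."""
    lowered = entity.lower()
    if _IP_PATTERN.match(lowered):
        return "IP_Address"
    if lowered.startswith("session_"):
        return "session"
    if lowered.startswith("connection_"):
        return "connection"
    if lowered.startswith("c:/windows/system32/"):
        return "system32_process"
    if lowered.startswith("c:/programfiles/"):
        return "programfiles_process"
    return entity  # unknown entities are left unchanged in this sketch


def lemmatize_event(source: str, action: str, destination: str) -> Tuple[str, str, str]:
    """Abstract a (source, action, destination) triple and drop the event index."""
    return (
        lemmatize_entity(source),
        _ACTION_INDEX.sub("", action),
        lemmatize_entity(destination),
    )


session_event = lemmatize_event(
    "192.168.223.3",
    "connected_session(21)",
    "session_192.168.223.130_51794",
)
connect_event = lemmatize_event(
    "c:/windows/system32/mshta.exe_264",
    "connect(23)",
    "connection_192.168.223.130_192.168.223.3",
)
```

Under these assumed rules, the two concrete events reduce to the same abstract triples that the model recommended, which is why a prediction made over lemmatized events can be matched against later log entries. Mapping an abstract prediction back to a specific system entity is the reverse step and is not covered by this sketch.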
In contrast to the domain graph, the evolving graph did not need additional known malicious entities as a basis for predictive analytics: it could start from a single malicious event, trace back the whole chain of events leading to that event, and use this chain as the dependency for inference. However, lacking a global perspective on the whole APT attack campaign, it could not readily depict how the overall attack activity developed, and short-term attack back-tracing could only explain the local working mechanism. The relative merits of the two graphs cannot be easily judged for short-term predictive analytics, but long-term prediction of a whole attack campaign requires richer and more refined prior knowledge, for which attack scenarios built on a collection of multiple malicious entities would be a better choice.
6. Discussion
Predictive analysis and detection analysis of network attacks are important technologies in the field of network security, but their purposes and advantages are different. In general, predictive analysis predicts potential security issues in the future by analyzing and modeling existing data. Compared with attack detection, it can help security teams detect potential threats before attacks occur, greatly improve security response speed, enhance security defense capabilities, provide sufficient time for security teams to respond to threats, and take corresponding measures to prevent them. However, compared with the relatively mature attack detection and prevention measures, how to effectively study opponents’ attack strategies and make effective predictions is still a major challenge in attack prevention.
Although attack patterns have been studied extensively in the literature, the task is so generic that the proposed approaches share few common elements. For example, the attack graph mentioned earlier in
Section 2 is more concerned with static configurations, HMM is concerned with alert correlation, and the causal graph is often constructed through log events, making it impossible to compare their performance through conventional methods. In this regard, this paper compared and analyzed the predictive research on network attacks in recent years mentioned earlier in
Table 9 and summarized the application backgrounds, specific attack types, methods, and test datasets of the different studies. In addition, some papers use similar methods for different research purposes. For example, in [
14,
15], the same method is used for different purposes, with the latter focusing on detection at the attack stage and the former on predicting alerts.
Complex multi-step attacks, such as APTs, pose a significant threat to network security due to their diversity and complexity. Attackers develop specialized attack strategies and plans based on the characteristics and vulnerabilities of the target system to ensure the success and persistence of the attack. Moreover, in many case studies, these behaviors last for several months or even years, as intruders repeat the exfiltration process many times. This persistence also provides network analysts with information for attack analysis. Because of this complexity, detecting and predicting APT attacks has long been a central concern of security prevention work.
In this study, attack scenarios were learned through a simple deep learning model, and in
Section 5, different deep learning structures were further compared in terms of attack pattern learning and prediction performance. The study found that, by modestly adjusting the number of candidate options for predicting malicious events, the accuracy of event prediction could be kept in a high range. Further sequence length evaluation revealed that, with sufficient historical data, prediction accuracy could be greatly improved.
However, due to the characteristics of current deep models and the structure of the attack scenario graph used here, it is difficult to guarantee long-term prediction of malicious events and even more difficult to generate the entire attack scenario globally. In addition, restoring the lemmatized malicious events to the actual system entities still requires manual intervention. Furthermore, to represent the causal relationships of the attack process intuitively and consistently, it remains necessary to determine how to better construct the attack scenario graph; how to effectively extract the events most directly related to the attack activity; and how to exclude irrelevant, duplicate, and redundant events. For example, in the neighborhood graph, all events directly related to malicious entities are considered malicious events, but in fact, only events that actually cause damage to the system, directly or indirectly, are needed.