1. Introduction
Due to the constant development of cyber threats, defense solutions need to be continuously improved. In addition to developing prevention systems, it is also necessary to focus on detection systems that help to obtain information about threats and attacks. The detection of malicious actions is one of the most critical cybersecurity issues. Intrusion detection refers to the detection of specific patterns or anomalous observations. Nowadays, however, we need to anticipate upcoming harmful activities so that we can react to them and stop an attack in time, before it causes damage.
Attack prediction is not studied as widely as detection. It is therefore worth exploring, because it benefits the entire field of cybersecurity. To predict attacks, it is necessary to examine how they proceed and what steps are taken. These data can be used to continually improve the systems that detect each phase of an attack. In this way, it is possible to detect the earlier stages of attacks and predict how they will proceed.
Early detection and prediction of cybersecurity incidents, such as attacks, is a challenging task. The threat landscape is continuously evolving, and even with the usage of intrusion detection systems, advanced attackers can spend more than 100 days in a system before being discovered [
1]. After the detection of a security incident, we need to determine how the attack will proceed. This is essential because if we can stop the attacker in time, they cannot do as much damage.
It is important to learn from existing attacks so that we can develop tools to find out whether such attacks have been repeated. Attack modeling is an intrusion-based methodology that allows one to focus on the different stages of an attack. By identifying attacks at different stages and by implementing tools to disarm them at those stages, one can take preventive measures to ensure that similar attacks will be detected. A layered model is important because, if one of the defense systems is bypassed, another line of defense still protects the organization’s assets. That is why we need to establish a multi-layered model of cyber attacks.
In recent years, it has not been sufficient to only be alerted of a security incident. Prevention of the attack altogether has become a necessity. The highest priority in computer security is to prevent an attack and stop the attacker from doing damage. If the path of an attack can be predicted, one has the ability to avoid attacks at every phase. By looking at a survey of the technology, from the host to the network level, one will have an opportunity to study tools or solutions that can be used in protecting against these threats. There are numerous existing prevention methods that are able to stop attacks in progress.
Recognizing an attack’s steps is the goal of many cybersecurity analysts. The authors in [
2] categorized prediction methods into three categories. An overview can be seen in
Table 1.
This research focuses on early-stage detection and is based on attack prediction, especially attack projection. This area deals with the prognosis of the future steps of an attack. The projection of the future stages of an active cyber attack is essential in the context of Cyber Situational Awareness. Attacks often occur over an extended period of time. They involve many steps and use multiple techniques for reconnaissance, exploitation, and obfuscation to achieve the attacker’s goal. Therefore, it is not sufficient to just detect new or ongoing threats. The projection of future attack steps is deduced from already detected malicious activities. The estimates of current attack tactics may be used to assess imminent threats to critical assets [
3].
This paper is based on the previous research of [
7] and further develops research conducted by Ramaki et al. [
8]. Based on the above-mentioned considerations, we state the following research sub-goals:
To propose a multi-stage model suitable for attack projection and early-stage detection, and
to design a model for early-stage detection of a cyber attack.
This paper is divided into seven sections. In
Section 2, which is focused on related work and existing methods, the analysis of the current approaches of cyber attack prediction is provided.
Section 3 presents the drawbacks of existing models and describes the suitable cyber attack model in detail. Subsequently, in
Section 4, we propose the approach for early-stage detection of cyber attacks. This includes all of the necessary steps for data processing, alert aggregation, and causal relationship discovery. This section also covers the definition of Bayesian networks. After that, the model for the construction of the Bayesian network and prediction of cyber security alerts is proposed.
Section 5 focuses on preprocessing and analysis of the data collection, including the creation of cyber alerts. The example cases of methods for aggregation, causal relationship discovery, and Bayesian network construction are shown.
Section 6 presents and discusses the results of the presented methods. Concurrently, it describes groups of alerts and some of the attack paths. In the last section, the conclusion is provided.
2. Related Works
A large number of cyber attack prediction methods use discrete models and graph models, such as attack graphs, Bayesian networks, or Markov models.
In 1998, an attack graph was introduced by Swiler and Phillips [
9]. It is a graphical representation of an attack scenario, and it has become a popular method for the formal description of attacks. It is also a foundation for other approaches, e.g., methods using Bayesian networks, Markov models, and game-theoretical methods. Their goal was to create a tool for qualitative and quantitative assessment of vulnerabilities. The approach was successful because it examined the network security state from the system perspective.
Cao et al. [
10,
11] proposed another variant of the attack graph, the factor graph. It is a probabilistic model that consists of random variables and factor functions. In their paper, it is compared to Bayesian networks and Markov random fields. They used the factor graph to predict attacks with an accuracy of 75% on a dataset of actual security incidents (several years of reports).
The RTECA (Real-Time Episode Correlation Algorithm) was proposed in 2014 by Ramaki et al. [
12]. It can be used to detect and predict multi-step attack scenarios. They explain the theoretical and functional implications of the creation of such a tool. Although they propose leveraging the attack graph, the authors have widely used causal correlations in their method.
The authors in [
13] developed a method for correlating the intrusion alerts. It produces correlation graphs, which they use for creating attack strategy graphs. They presented techniques for automatically learning attack strategies from alerts raised by intrusion detection systems. These methods extracted attributes relevant to determining an attack strategy, which is represented as a directed acyclic graph, which they called an attack strategy graph. The nodes are known attacks, and the edges between them represent the order of attacks and relationships between them. They also developed a method for easier computer and network forensic analysis. It measures the similarity between sequences of alerts based on their strategies. Their research showed that the proposed methods can successfully extract invariant strategies from alert sequences and can also determine the likeness of those sequences. It can be widely used in identifying attacks that could have been missed by detection systems.
In [
14], Li et al. presented another approach based on attack graphs. They described the generation of attack graphs based on a data mining approach. The proposed algorithm uses association rule mining to extract multi-step attack scenarios from an intrusion detection system (IDS) alert database. After that, the attack graph is created. The method is also used for calculating the predictability of the attack scenario. It is used for ranking real-time detections and can help with intrusion prediction.
Liu and Peng [
15] developed a game-theoretic framework used for attack prediction. The proposed method can quantitatively predict the probability of attack actions. It can also predict the strategic behavior of the attacker. Thus, it can optimize the precision of correlation-based prediction. This paper presents the first complex framework for motive-based modeling and inference of attackers’ intents. In conclusion, the goal of this method is modeling and inference of attack intents, objectives, and strategies.
Wu et al. [
16] used another attack prediction method using Bayesian networks. These methods are related to approaches based on attack graphs because a Bayesian network is built from an attack graph. The distinct characteristic of Bayesian networks is the conditional variables and probabilities that are considered in the model.
A Bayesian network is a probabilistic graphical model that describes the variables and the relationships between them. The network is a directed acyclic graph (DAG), where nodes represent discrete or continuous random variables and edges depict the relationships between them. Each variable has a finite set of mutually exclusive states. The variables and the directed edges form a DAG. To each variable $A$ with parents $B_1, \ldots, B_n$, there is attached a conditional probability table $P(A \mid B_1, \ldots, B_n)$ [2].
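As an illustration of this definition, the joint distribution of a Bayesian network factorizes into the conditional probability tables of its variables. The sketch below uses generic variables $X_1, \ldots, X_n$ and the notation $\mathrm{pa}(X_i)$ for the parents of $X_i$; the three-alert chain $S \rightarrow E \rightarrow M$ is a hypothetical example and is not taken from the evaluated dataset.

```latex
% Chain-rule factorization of a Bayesian network over X_1, ..., X_n
P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P\bigl(X_i \mid \mathrm{pa}(X_i)\bigr)

% Hypothetical chain S -> E -> M (scan alert, exploit alert, malicious-task alert)
P(S, E, M) = P(S)\, P(E \mid S)\, P(M \mid E)

% Probability of the final alert given an observed scan alert
P(M \mid S) = \sum_{e} P(E = e \mid S)\, P(M \mid E = e)
```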
Ishida et al. [
17] proposed forecast techniques for fluctuation of attacks. They used Bayesian inference for calculating the probability of increase or decrease of the attacks. Two algorithms were considered in this paper—focusing on the attack cycle and the fluctuation range of the number of events. Because the event counts of some attacks change frequently, the proposed algorithms based on Bayesian inference were used for predicting the probability, since it can calculate event counts directly. Subsequently, they implemented the forecasting system and tested it on real IDS events.
A real-time alert correlation and prediction framework was introduced by Ramaki et al. [
8]. The system includes an online and offline mode. In online mode, the attacker’s next move is predicted by the Bayesian attack graph. In the offline mode, the Bayesian attack graph is constructed of low-level alerts. The authors used the DARPA 2000 dataset for research. The prediction accuracy was found to increase with the duration of the scenario for the attack. Thus, accuracy ranged from 92.3% when processing the first attack step to 99.2% when processing the fifth attack step.
Okutan et al. [
18] used signals unrelated to the target network in their Bayesian-network-based attack prediction process. The signals include mentions of attacks on Twitter or the total number of attacks reported on Hackmageddon [
19]. The results showed prediction accuracy ranging from 63% to 99%, making it a promising method.
Since probabilistic graphical models are very powerful modeling and reasoning tools, Tabia et al. [
20] proposed an efficient approach based on Bayesian networks. It allows the modeling of local influence relationships. It is dedicated to two main problems in alert correlation. Firstly, an approach based on Bayesian multi-nets was designed, which considered the local influence relationships to improve the prediction. The second problem occurs when multiple intrusion detection systems are in use in the network. In this case, too many of the raised alerts are redundant. Therefore, they proposed an approach for handling IDSs’ reliability to reduce the number of false alerts. They based this approach on Pearl’s virtual evidence [
21].
Another widely used approach to predicting attacks is using Markov models. These methods were implemented along with approaches focused on attack graphs and Bayesian networks at the end of 2000. Farhadi et al. [
22] proposed a complex system for alert correlation and prediction. Sequential pattern mining was used to collect the attack scenarios, which were then represented using the hidden Markov model, which was used to identify the attack strategy. Markov models perform well in the presence of unobservable states and transitions. They are not reliant on the possession of complete knowledge. This allowed a successful attack prediction, even though some of the attack stages were undetected or absent.
Using hidden Markov models, Sendi et al. [
23] proposed a real-time intrusion prediction system. Multi-step attacks were the main interest in this paper. An empirical review showed how their method could anticipate multi-step attacks, which is especially useful in preventing the attacker from taking control of a huge number of hosts in the computer network.
In 2013, Shin et al. [
24] introduced a probabilistic approach for the network-based intrusion detection system APAN, which uses a Markov chain for modeling unusual events in the network traffic to predict intrusion. Unlike other Markov-based methods, this method detects network anomalies and does not aim to predict the next step of an attack as different model-checking approaches do.
Holgado et al. [
25] proposed a novel method based on a hidden Markov model for multi-step attack prediction using IDS alerts. They considered hidden states as a particular type of attack. First, a preliminary training phase based on IDS alert information must be carried out. These observations are acquired by pairing the IDS alert information with a previously built database. Both unsupervised and supervised learning methods are used to train the model. The prediction module can compute the best state sequence using the Viterbi and forward–backward algorithms. The method was shown to successfully detect the stages of distributed denial of service (DDoS) attacks, which remain a major problem for detection systems.
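To make the role of the Viterbi algorithm concrete, the following is a minimal sketch of how the most likely sequence of hidden attack stages can be recovered from a sequence of observed alerts. It is not the implementation of Holgado et al.; the state names, alert names, and all probabilities are hypothetical values chosen only for illustration.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path and its probability."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda item: item[1],
            )
            best[t][s] = score * emit_p[s][observations[t]]
            back[t][s] = prev
    # Backtrack from the most probable final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-stage model: hidden stages and observable alert types
states = ["scan", "exploit"]
start_p = {"scan": 0.8, "exploit": 0.2}
trans_p = {
    "scan": {"scan": 0.6, "exploit": 0.4},
    "exploit": {"scan": 0.1, "exploit": 0.9},
}
emit_p = {
    "scan": {"portscan": 0.9, "shellcode": 0.1},
    "exploit": {"portscan": 0.2, "shellcode": 0.8},
}

path, prob = viterbi(["portscan", "portscan", "shellcode"],
                     states, start_p, trans_p, emit_p)
print(path, prob)
```

With these assumed parameters, the decoded path stays in the scanning stage for the first two alerts and moves to the exploitation stage on the third.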
Table 2 shows the approaches in the cyber attack prediction methods. The first proposed method that has become popular involves prediction using an attack graph. It is the most transparent and easy-to-understand model for attack step representation. It has become beneficial in predicting the next steps in an attack. One of the lesser-known approaches is game theory. Nevertheless, it can be very useful in detecting DDoS attacks, which are very hard to predict. More commonly used methods include machine learning models. The first of them, the Bayesian network, has excellent accuracy results. However, it is tough to create this model from actual network traffic because the attackers can create loops in security alert data during attack implementation. Less intuitive approaches, but with great results, are the Markov chains and the hidden Markov model. These can be handy in predicting multi-step attacks.
On the other hand, Markov chains and the hidden Markov model need specific information. Due to the lack of information provided from the specific type of dataset, it is not possible to determine the values of the observation probability matrix. It is not certain what the probability of an attack is based on an observable alert. Therefore, we have decided to use a Bayesian network to create a method for cyber attack prediction.
6. Results and Discussion
In this section, the evaluation of the individual methods is presented. The idea for this paper originally came from the research by Ramaki et al. [
8]. However, several problems occurred when we tried to follow the steps they took in their approach. The issues within that paper are presented in this section, as well as the modified approach that was developed in order to avoid those problems.
The authors in the mentioned paper tried to create a Bayesian network to predict cyber attack steps. After preprocessing our data from Snort, a similar method of aggregation was used to reduce the number of alerts. Next, the algorithm for causal relationship discovery was evaluated. It was also inspired by the method in the mentioned paper.
At this point, a problem arose in their proposed method: the occurrence of cycles in the resulting graph was not addressed in the given paper. We solved this problem by stating two conditions in the causal relationship discovery phase, under which the attacker can only move forward across the stages of the cyber attack.
The creation of these conditions was based on the presented model of attack steps. The first condition resulted in eliminating the double-sided edges between the vertices. For example, many vertices from the SCAN phase would be interconnected. The rule said that there should be no edge between two hyper-alerts belonging to the same phase. The second rule resulted in the overall elimination of cycles. The rule was defined so that an oriented edge can only go from an alert that is in a lower phase to an alert in the higher phase. As a result, there are no cycles or bidirectional edges in the resulting graph and the Bayesian network.
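The two rules can be sketched as a simple edge filter. The snippet below is an illustrative sketch only: the hyper-alert identifiers (`H1` to `H4`), the numeric phase ranks, and the candidate edges are hypothetical, and the acyclicity check (Kahn's algorithm) is added here merely to show the effect of the rules.

```python
from collections import defaultdict


def filter_edges(candidate_edges, phase_rank):
    """Keep only edges that go from a strictly lower phase to a higher phase.

    This covers both rules: edges within the same phase are dropped (rule 1),
    and edges from a higher phase back to a lower one are dropped (rule 2).
    """
    return [(src, dst) for src, dst in candidate_edges
            if phase_rank[src] < phase_rank[dst]]


def is_acyclic(nodes, edges):
    """Check acyclicity with Kahn's algorithm (repeatedly remove sources)."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1
    queue = [n for n in nodes if indegree[n] == 0]
    visited = 0
    while queue:
        node = queue.pop()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


# Hypothetical hyper-alerts with assumed phase ranks (1 = scanning phase)
phase_rank = {"H1": 1, "H2": 1, "H3": 2, "H4": 4}
candidate_edges = [
    ("H1", "H2"), ("H2", "H1"),  # same phase, bidirectional
    ("H1", "H3"), ("H3", "H4"),  # forward across phases
    ("H4", "H1"),                # backward, closes a cycle
]

kept = filter_edges(candidate_edges, phase_rank)
print(kept)
print(is_acyclic(list(phase_rank), candidate_edges), is_acyclic(list(phase_rank), kept))
```

Because every kept edge strictly increases the phase rank, no path can return to its starting vertex, which is why the filtered graph is usable as the structure of a Bayesian network.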
After applying the rules, a Bayesian network was created. The following sections display an example of usage of the methods presented in the previous section. The data from Thursday were used to introduce the results of the individual implemented methods. This approach aims to create a Bayesian network designed to predict cyber attacks. The data from this day are described in detail. After that, the aggregation, causal relationship discovery, and Bayesian network construction methods are shown, along with other auxiliary methods.
After the Bayesian network was constructed and written into a file, the last step of the algorithm followed. The Python library pgmpy was used to load the network from the file in BIF format. After that, Bayesian inference could be performed and the probabilities of attack steps computed. Our Bayesian network tells us how likely it is that an alert from the final stages will occur, given that an alert from the first phase, i.e., the scanning phase, has been detected. The results are presented in
Table 9. In our case, there are two alerts in the Bayesian network belonging to the final phase. These two belong to phase number 4b, which symbolizes some malicious tasks.
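The kind of query described above can be illustrated without any library by exact inference through enumeration. The sketch below does not use pgmpy and does not reproduce the network from this paper: the three-node chain, the variable names, and all probability values are hypothetical and serve only to show how the probability of a final-phase alert is conditioned on an observed scanning-phase alert.

```python
from itertools import product

# Hypothetical chain: scan -> exploit -> malicious (all values are made up)
PARENTS = {"scan": [], "exploit": ["scan"], "malicious": ["exploit"]}

# CPT[var][parent values] = P(var = True | parents)
CPT = {
    "scan": {(): 0.3},
    "exploit": {(True,): 0.6, (False,): 0.05},
    "malicious": {(True,): 0.7, (False,): 0.1},
}


def joint(assignment):
    """Joint probability of a full assignment via the chain-rule factorization."""
    p = 1.0
    for var, parents in PARENTS.items():
        p_true = CPT[var][tuple(assignment[q] for q in parents)]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence), summing over all consistent assignments."""
    variables = list(PARENTS)
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(assignment)
        denominator += p
        if assignment[target]:
            numerator += p
    return numerator / denominator


p_given_scan = query("malicious", {"scan": True})
p_prior = query("malicious", {})
print(round(p_given_scan, 4), round(p_prior, 4))
```

In this toy network, observing the scan alert raises the probability of the final-phase alert from its prior value, which mirrors the type of result reported for the real network. Enumeration is exponential in the number of variables, so a dedicated inference library is preferable for networks of realistic size.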
The first of these alerts, shown in the graph under ID A3, is GPL NETBIOS SMB-DS IPC$ share access. This alert indicates the establishment of a connection using the SMB (Server Message Block) protocol. It is meant to detect share access from outside the network. In this case, for example, it may indicate an EternalBlue infection. This exploit uses a vulnerability in Microsoft’s implementation of the SMB protocol for remote code execution. However, this type of network activity is often a false positive. It can be a null session attack on Samba functionality, which enables anonymous access to hidden administrative shares on a system. On the other hand, it may be legitimate network traffic, for example, traffic to a domain controller. This alert is raised when the corresponding rule defined in the Snort detection system is matched.
The second alert that belongs to the final phase is NF - Bad TLD domain - click DNS query - Check domains. It appears under ID A24 in the graph. This alert means that the device has accessed a domain whose TLD is marked as malicious. A top-level domain can be labelled as bad when there are indications that it is tied to spam or malware dissemination. Websites using the new top-level domains, such as .men, .work, or .click, are some of the riskiest. The rule that was made for this detection is presented next.
The mentioned alerts and all others emerged from the rules defined in Snort IDS. As was mentioned before, two sets of rules were used—NF and ET.
Next, the types of paths in the graph will be described based on a certain grouping of alerts. Based on the groups of alerts from the first phase according to the type of attack, paths were created in the graph. Their probabilities were calculated using the Bayesian network. Three groups of alerts belonging to the SCAN phase were created.
7. Conclusions
Prompt response to security incidents means minimizing the damage caused by the security incidents to the organization. The main goal of the organization is maximum preparation for handling security incidents, as well as their prevention through proactive activities. The transition from reactive activities to proactive activities is currently a challenge in cybersecurity research.
The attack—or the steps of the attackers—can be divided into several stages. Consequently, one of the ways to move from reactive activities to proactive activities is to identify the initial stages of the attack. To be able to make such an identification, it is necessary to predict the next steps of the attacker, which is called the projection of attacks in the current research.
Within this paper, we focused on the projection of attacks, and for this purpose, we defined four stages of attacks. The aim of the paper was the prediction of the final stages (the third and fourth stages), provided that we knew the first two stages of the attack. For this purpose, we chose Bayesian networks.
The aim was to design a model that, based on the projection of the attack, would identify the early stages of the attack. The proposed model includes not only the aggregation of alerts, but also their correlation. We used the proposed attack model for the prediction itself. Our research showed that the paper [
8], on which our research is based and which inspired the use of the Bayesian network, does not take into account the situation where an attacker can return within a particular stage. This situation creates a cyclic graph, which is a problem, since a Bayesian network requires an acyclic graph. For this reason, it was necessary to add a condition to the model under which the attacker does not return within the individual stages.
We tested the proposed model on the publicly available Intrusion Detection Evaluation Dataset CICIDS2017. In the paper, we showed the real application of the proposed model on a prepared dataset, including the creation of alerts, their aggregation and correlation, and the subsequent detection of early stages based on the attack projection.
This research can be expanded in the future, and several challenges remain for future work. One of them is the processing and prediction of events even if there are cycles in the attack graph. This case is problematic because many computational models that assume acyclic graphs cannot be used. Therefore, it could be appropriate to try another method, for example, attack graph modeling or a hidden Markov model. The second challenge is creating a complex and comprehensive dataset. To the best of our knowledge, no suitable datasets exist that emphasize the attacks. Therefore, it is important to work with a dataset that contains attacks covering all detectable stages of an attack.