Attacker Behaviour Forecasting Using Methods of Intelligent Data Analysis: A Comparative Review and Prospects

Abstract: Early detection of security incidents and correct forecasting of attack development are the basis for an efficient and timely response to cyber threats. The development of an attack depends on the future steps available to the attackers, their goals, and their motivation, that is, on the attacker "profile" that defines the malefactor's behaviour in the system. Usually, the "attacker profile" is a set of the attacker's attributes, both internal, such as motives and skills, and external, such as existing financial support and tools used. Defining the attacker's profile allows determining the type of the malefactor and the complexity of the countermeasures, and may significantly simplify the attacker attribution process when investigating security incidents. The goal of the paper is to analyze existing techniques for describing the attacker's behaviour, the attacker's profile specifications, and their application to forecasting future attack steps. The analysis allowed outlining the main advantages and limitations of the approaches to attack forecasting and attacker profile construction, existing challenges, and prospects in the area. An approach to attack forecasting implementation is suggested that specifies further research steps and is the basis for the development of an attacker behaviour forecasting technique.


Introduction
The attacker model plays an important role in attack modelling, forecasting, and risk analysis. Existing approaches consider different attacker characteristics when modelling attacks. Some of them use the high-level goals of the malefactor [1]: hackers, spies, terrorists, corporate raiders, professional criminals, vandals, and voyeurs.
Other approaches analyze the location of the attacker (internal or external) [2] and the complexity of the vulnerabilities they exploit [3]: script kiddies, hackers, and botnet owners.
In [2], a classification of attackers based on several attributes is suggested. The analyzed parameters include the number of malefactors, their motives, and their goals, which allows the authors to define three types of attackers: individuals, organized groups, and intelligence agencies.
The Federal Service for Technical and Expert Control (FSTEC) of the Russian Federation classifies attackers according to their skills and location in the system: internal attackers with low, medium, or high skills, and external attackers with low, medium, or high skills.

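The techniques considered in this review can be divided into the following groups:
1. Techniques based on attack graph analysis.
2. Techniques based on hidden Markov models.
3. Techniques based on fuzzy inference.
4. Techniques based on attributing cyber attacks using intelligent data mining techniques including neural networks, statistics, and so on.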
However, it is possible to highlight several limitations of these techniques. One of them is the lack of a unified and validated approach to the attacker model description. According to these approaches, different attackers' attributes result in different attackers' profiles, and these approaches as a rule do not consider the latest paradigm shifts and novel attack vectors that appear owing to the development of the Internet of Things (IoT), cyber-physical systems, software defined networking (SDN), 5G mobile networks, and so on. Another significant problem in the attacker profiling process is the lack of consistent labeled datasets for model training.
There are currently a number of surveys in the area of attack forecasting and prediction. For example, in 2016, Gheyas and Abdallah [4] surveyed the detection and prediction of insider threats. In [5], the authors investigated attack projection, prediction, and forecasting methods in cyber security. They distinguish between attack projection, which relates to the next adversary steps [6]; attack intention recognition, which deals with detection of the final malefactor goal [7]; attack/intrusion prediction, which determines which type of attack will take place, as well as when and where it will arise [8]; and, finally, network situation forecasting, which is connected with the assessment of possible cyber security risks and their evolution. The authors outlined four classes of approaches based on the type of mathematical model used: discrete models (attack graphs, Bayesian networks, Markov models), continuous models (time series analysis, Grey models), machine learning and data mining techniques, and other approaches (similarity-based, among others). They also focused on the problem of the source data used for predictions, as different approaches operate on different levels of abstraction and require different types of data. They showed that the following types of input data can be used: (1) raw data, such as network traffic and system logs; and (2) abstract data, such as alerts from intrusion detection/protection systems and/or a numerical representation of the network security state. The authors discussed the advantages and limitations of each approach and showed its current status, that is, proof of concept or live tool. However, these surveys did not address the issues of attacker profile definition or attacker attribution and their influence on the attack forecasting process.
Another interesting review of attacker models and profiles for cyber-physical systems (CPSs) is provided in [9]. The authors focused on the related work on the following: (1) attacks against CPS and ad-hoc attacker models, (2) profiling attackers for CPS, and (3) generic attacker models for CPS. They reviewed works that discuss attackers who target or leverage the physical layer in their attacks (mechanical, electrical interactions). The authors gave the main definitions concerning the attacker and attacker's profile. For example, they define an attacker as a person(s) aimed to achieve some malicious goal in the system, and an attacker profile as a template listing possible actions, motivations, or capabilities of the attacker. They note that an attacker model (together with compatible system models) should represent all possible interactions between the attacker and the system. Besides, they also include the constraints for the attacker model such as finite computational resources and no access to shared keys.
The authors reviewed 19 related works and came to the following conclusions:
1. Seven works explicitly use different attacker profiles, seventeen define dimensions, and the vast majority use actions to characterize the attacker. Just two works define a system model and perform risk analysis without explicitly considering an attacker model.
2. All the papers share the same actions or the same intuitions on the attackers, but they apply those actions to different definitions of attacker models.
3. Different works propose different attacker profiles. The boundaries between the different attacker profiles are not well defined, thus it is hard to classify a specific attacker as one specific profile. The authors outline the following six types of attackers based on related research: (1) a basic user [10,11] (also known as script kiddie, unstructured hacker, hobbyist, or cracker) uses already established and potentially automated techniques to attack a system, and has average access to hardware, software, and Internet connectivity; (2) an insider [11][12][13][14] (disgruntled employees or social engineering victims) can cause damage to the target depending on the employment position or the system privileges he/she owns (e.g., user, supervisor, administrator), and this type is of high importance for systems that are mainly protected through air-gaps between the system network and the outside world (often used in CPS); (3) a hacktivist [10][11][12] aims to promote a political agenda, often related to freedom of information (e.g., Anonymous); (4) a terrorist [11][12][13], also known as cyber-terrorist, is a politically motivated attacker who uses information technology to cause severe disruption or widespread fear [15,16]; (5) a cybercriminal [10][11][12][13][14] (sometimes called black hat hacker or structured hacker) is an attacker with extensive security knowledge and skills who takes advantage of known vulnerabilities and potentially has the knowledge and intention of finding new zero-day vulnerabilities, with goals ranging from blackmailing to espionage (industrial, foreign) or sabotage; (6) a nation-state [10][11][12][13] is an attacker sponsored by a nation/state, and his/her targets are usually public infrastructure systems, mass transit, power or water systems, and general intelligence.
4. Finally, the authors outlined nine common parameters that are used to generate metrics. Examples of metrics are as follows:
   a. tools (resources) available, also known as attacklets, or actions in the abstract definition of the attacker model: these define which types of tools are available to the attacker;
   b. camouflage or preference to stay hidden: expresses the aim and/or the ability of the attacker to not be tracked down after or while performing an attack;
   c. distance to the CPS: an attacker can be located in another country, within WiFi range, or possibly have direct access to the system.
The authors also introduced the multilevel framework of metrics that is aimed to correlate low level events with high level events in order to determine the attacker profile. The limitation of the approach is that it does not establish techniques and methods linking low level events with high level events. For example, the financial support metric (which can take values of low, medium, or high) expresses what budget the attacker has in order to perform an attack. However, it is not clear how the budget can be calculated on the basis of the security events registered in the system.
To conclude, modern monitoring tools and data analysis systems give new possibilities in the area of the attacker's profile construction and prediction based on the traces that the attacker leaves in the system. We argue that an approach to attack forecasting that uses relations between features in the raw security-related data, attacker attributes that represent his/her behaviour, and attack development is promising for timely and efficient counteraction of cyberattacks. In this paper, we start by reviewing studies that take such relations into account, since they are not considered in detail in the aforementioned surveys. We analyze the latest research in this area, existing challenges, and possible solutions, and conclude with a general description of the approach that can be used for forecasting the attacker's goals.
Thus, the main contributions of this paper are as follows:
• Comparative analysis and classification of existing techniques for attackers' behaviour forecasting and the attacker characteristics they use.
• Existing challenges and solutions in the considered area.
• A common approach to attack forecasting task implementation that specifies further research steps and is the basis for the development of an attacker behaviour forecasting technique.
The paper is structured as follows. Section 2 gives the comparative analysis of the existing approaches to the attacker's profile specification, the characteristics used to describe the attacker's profile, and the attack forecasting that uses it. Section 3 outlines existing challenges and solutions in the considered area, and presents a common approach to attack forecasting implementation that specifies further research steps and is the basis for the development of the attacker behaviour forecasting technique. The paper ends with the conclusion and future work prospects.

The Comparative Analysis of the Approaches to the Attacker's Profile Specification and Attack Forecasting
The review of the existing approaches to the attacker's profile definition and attack forecasting showed that it is possible to highlight two general approaches: (1) the results of the attack prediction depend strongly on the attacker's model, and it is required to define the attacker's model explicitly; (2) the attack forecasting is based on data analysis without explicit attacker's model specification, and the attacker's behaviour is constructed implicitly on the basis of the sequence of the security events.
The second group of approaches consists of techniques that implement attack attribution using machine learning techniques including neural networks, statistics, and some others [38][39][40][41].
In the subsections below, these approaches are described in more detail. The summarized information on these techniques, their advantages, and their limitations is given in Table 1.
It should be noted that different researchers use not only different techniques to specify the attacker's profile, but also different concepts and terms to describe the attacker's behaviour, for example, "threat model", "attacker's profile", and "attacker's behaviour".

Attacker Behaviour Prediction Based on Attack Graphs
The construction and application of attack graphs for attack modeling and prediction is one of the most widely used approaches. First proposed in [17], this concept was developed in many other research papers [18][19][20][21][22][23][24][25][26][27][28]. In the general case, an attack graph is a set of linked nodes that represents the attacker's aims and actions. The construction of the attack graph is usually based on analysis of the network topology, vulnerability analysis, and software and hardware configuration analysis, and as the result, it shows dependencies between vulnerabilities and the overall security state of the target network.
In most cases, the attacker's model is defined via two important characteristics: his/her skills and location. For example, in the literature [19,21,27], these attributes are used to implement attack reachability analysis depending on the location (internal or external) and skills of the attacker (low, medium, or high). In fact, the level of the attacker's skills defines a list of vulnerabilities that could be exploited by the given attacker. In [27], the attacker's skills are correlated with the values of the "attacker skills" or "knowledge required" parameters of the attack patterns defined in the Common Attack Pattern Enumeration and Classification (https://capec.mitre.org/) database and weaknesses from the Common Weakness Enumeration (https://cwe.mitre.org/) database. This allows the authors to link existing vulnerabilities to high-level malefactor activity such as "host discovery", "active operating system (OS) fingerprinting", and so on.
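The reachability analysis described above can be illustrated with a minimal sketch. It assumes a toy network in which every exploit step is annotated with the minimum skill it requires; the host names, the `EXPLOITS` table, and the `ENTRY` mapping are hypothetical and are not taken from the cited works.

```python
from collections import deque

# Ordinal encoding of the three skill levels mentioned in the text.
SKILL = {"low": 1, "medium": 2, "high": 3}

# Hypothetical toy network: (source host, target host, minimum skill required).
EXPLOITS = [
    ("internet", "web", "low"),
    ("web", "db", "medium"),
    ("lan", "workstation", "low"),
    ("workstation", "db", "low"),
    ("db", "scada", "high"),
]

# Assumed entry point for each attacker location.
ENTRY = {"external": "internet", "internal": "lan"}


def reachable_hosts(location, skill):
    """Breadth-first search over exploit steps the attacker is skilled enough to use."""
    start = ENTRY[location]
    seen = {start}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        for src, dst, needed in EXPLOITS:
            if src == host and dst not in seen and SKILL[needed] <= SKILL[skill]:
                seen.add(dst)
                queue.append(dst)
    seen.discard(start)  # the entry point itself is not a compromised host
    return seen
```

In this sketch, a low-skilled external attacker reaches only the web server, whereas a low-skilled internal attacker already reaches the database, which reflects how location and skills jointly prune the attack graph.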
Wang et al. 2008 [26] assigned to each malefactor action a score that reflected the probability of its implementation. This score implicitly defines the attacker's skills, and in the approach, it was determined on the basis of the expert's knowledge regarding the vulnerability being exploited. Kheir et al. [20] enhanced the attack graph model by adding the service-dependency graph, which presents a network model for the relationships between users and services, showing how they perform their activities using the available services in order to increase the efficiency of the attack modeling.
In [25], the authors introduced the concept of the uncertainty-aware attack graph, which is used to handle the uncertainty of attack probability. This uncertainty arises from the difficulty of measuring the probability of vulnerability exploitation. In fact, it is difficult to find precise probabilities for all attack graph nodes, and the authors suggest assigning the node probability in the form of interval values or constraints. However, both probability intervals and constraints are set by the experts. For example, a constraint may be described as follows [25]: "The probability of attack on workstation is greater than the probability of attack on webserver plus 0.05".
The experiments showed that introducing uncertainty into attack graph modeling and forecasting, on the one hand, gives extra flexibility to the security administrator and may significantly reduce the attack graph, making it easier to comprehend. On the other hand, the definition of the probabilities and constraints is a complicated process and requires great expertise from the security administrator.
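To make the idea of interval probabilities and expert constraints more concrete, the following minimal sketch narrows two probability intervals under a constraint of the form quoted above. It is a simple bound-propagation illustration and not the algorithm of [25]; the function name `tighten` and the treatment of the strict inequality as a non-strict one are assumptions.

```python
def tighten(ws, web, margin=0.05):
    """Narrow the intervals so that P(workstation) >= P(webserver) + margin can hold.

    ws and web are (lower, upper) probability bounds set by an expert.
    """
    ws_lo, ws_hi = ws
    web_lo, web_hi = web
    # The workstation probability cannot be below the webserver minimum plus the margin.
    new_ws = (max(ws_lo, web_lo + margin), ws_hi)
    # The webserver probability cannot exceed the workstation maximum minus the margin.
    new_web = (web_lo, min(web_hi, ws_hi - margin))
    if new_ws[0] > new_ws[1] or new_web[0] > new_web[1]:
        raise ValueError("constraint cannot be satisfied by these intervals")
    return new_ws, new_web
```

For instance, with expert intervals of (0.2, 0.6) for the workstation and (0.3, 0.9) for the webserver, the constraint raises the workstation lower bound and lowers the webserver upper bound, which shows how constraints can reduce the space of scenarios to be analyzed.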
A set of European research projects devoted to the attacker's behaviour prediction as well as risk assessment utilized the approach based on analysis of the attack graphs, including TREsPASS (https://cordis.europa.eu/project/id/318003) (Technology-supported Risk Estimation by Predictive Assessment of Socio-technical Security) and MASSIF (MAnagement of Security information and events in Service InFrastructures) [47].
The TREsPASS project is interesting in that, when constructing an attack graph, the authors consider not only software exploits and configuration weaknesses, but also physical entities that could be used to gain access to the information resources. As a result, they developed a special attack navigator map tool, which unites computer network entities and physical objects of the critical infrastructure, highlighting the fact that an attack may be implemented on both the networking level and the level of the physical objects. The forecasting of the malefactor's actions considers the attacker's profiles presented in [48]. These profiles, known as threat agents, are based on eight attributes: intent, access, outcome, limits, resource, skill level, objective, and visibility.
To summarize, attack graphs show every possible path that an attacker can use to gain further privileges; the path to be selected is determined by the attacker's skills as well as goals and motivation. In the general case, the attack graph complexity is $O(scn^2)$ for $n$ machines in the attack graph, where $s$ is the average number of exploits per machine and $c$ is the average number of security conditions per machine. The survey of the graph-based techniques showed that, in most cases, the attacker's model uses only two dimensions: skills, which could be defined explicitly or implicitly, and location. Understanding the attacker's motivation and goal could significantly reduce the complexity of the attack graph and, as a result, increase the efficiency of the attack forecasting.

Attacker Behaviour Prediction Based on Hidden Markov Model
The Markov-based methods are very close to the attack tree models. In general, they are constructed on the basis of system states and transitions between them caused by events. Each transition is characterized by a probability that is independent of the past and depends only on the two states involved; that is, the behaviour of a process at a given point in time depends only on the state of the process at the previous point in time. The hidden Markov models (HMMs) for modeling normal behaviour to detect cyber attacks were first proposed in [29]. The authors used them to describe the normal behaviour of the users as a sequence of events and then applied them to detect insider threats. Since then, a significant amount of research has been done to enhance the HMM and its learning algorithm for detecting and predicting cyber attacks [30,31,33,34]. These works vary in the structure of the HMM, the datasets used, and the particular tasks solved.
Thus, the authors managed to link different types of events in one model that is able to reveal trends in attack implementation and is able to detect abnormal attack sequences.
In [31], the authors applied a set of HMMs called the fusion hidden Markov model. They construct k HMMs on k different low-correlated partitions of data and make a prediction using a nonlinear weight function. The latter is implemented by a neural network that is trained on the HMMs' predictions of the next state. The application of k HMMs imposes rather strict requirements on the HMMs: they have to be diverse and low correlated. To fulfill this requirement, the authors use a dissimilarity function to divide the data into k different subsets, such that each subset contains a particular temporal pattern of the data. The input data are real attack logs collected by the Cowrie honeypot [32], which is a medium-interaction SSH and telnet honeypot. The authors divided them into 19 groups corresponding to different activities, and these groups were modeled as states of the HMMs.
In [33], the continuous time Markov chain is used to make a prediction of the attack propagation.
It is clearly seen that this group of approaches does not use the attacker's model explicitly. The result of the prediction by the HMM strongly depends on the input dataset and the distribution of the events. The prediction of the attack goal is done on the basis of the most probable transition for the current system state, that is, the most frequently met sequence of the events. The skills of the attackers as well as motivation, available tools, and financial support are not considered.
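The dependence of the prediction on the most frequently met sequence of events can be shown with a minimal sketch. For simplicity, it assumes a plain (observable) first-order Markov chain rather than a hidden Markov model, and the event names in the usage example are hypothetical.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate transition probabilities from observed event sequences by counting."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    return {
        state: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for state, followers in counts.items()
    }


def predict_next(transitions, state):
    """Return the most probable next event, or None if the state was never followed."""
    followers = transitions.get(state)
    if not followers:
        return None
    return max(followers, key=followers.get)


# Hypothetical training sequences of attack steps.
logs = [
    ["scan", "enumerate", "access"],
    ["scan", "enumerate", "malware"],
    ["scan", "enumerate", "access"],
    ["scan", "access"],
]
model = fit_transitions(logs)
```

Here `predict_next(model, "enumerate")` returns the step that most often followed enumeration in the training data, regardless of who the attacker is, which illustrates why the result is determined by the dataset and not by an attacker model.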
In [34], the authors specify the attacker behaviour based on their goals, intention, and level of expertise, and outline eight profiles of attackers: criminal groups, insiders, terrorists, hackers, phishers, nations, spyware/malware authors, and bot-net operators. However, the definition of the HMM presented in their approach does not consider the attacker's profile. The HMM is described as λ = (A, B, π, N), where N corresponds to five different types of malicious behaviour (scanning, enumeration, access attempt, malware attempt, exploitation by denial of service), π is the initial state probabilities, A is the transition probabilities, and B is the observation probabilities.
Interestingly, the authors used the attacker's profiles for generating different training sets containing five types of malicious behaviour.
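How the parameters A, B, and π of an HMM are used can be illustrated with the standard forward algorithm, which computes the likelihood of an observation sequence under the model. The sketch below is a generic textbook illustration and not the implementation of [34]; the two-state model and its numbers are made up for the example, whereas the model in [34] has five states.

```python
def forward_likelihood(A, B, pi, observations):
    """Return P(observations | model) using the forward algorithm.

    A[i][j]  - probability of moving from hidden state i to state j
    B[i][o]  - probability of emitting observation symbol o in state i
    pi[i]    - probability of starting in state i
    observations must be a non-empty list of symbol indices.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n)]
    for obs in observations[1:]:
        alpha = [
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
            for j in range(n)
        ]
    return sum(alpha)


# Illustrative two-state model (values are assumptions, not data from [34]).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
```

Training separate models on datasets generated for different attacker profiles and comparing such likelihoods for a new event sequence would be one possible way to bring the profile into the prediction, although this is only a suggestion and is not described in the source.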

Attacker Behaviour Pattern Discovery Using Fuzzy Inference
The benefit of fuzzy logic approaches lies in their ability to operate with uncertainty. We consider several works devoted to intrusion detection based on fuzzy logic [45,49,50,51]. In most cases, fuzzy logic is applied to produce an averaged description of the parameters used to describe either normal or malicious activities. For example, in [50], the fuzzification process is applied to the metrics describing the TCP service channel between two IP end-points: count, uniqueness, and variance. The authors defined five fuzzy sets for each metric (LOW, MEDIUM-LOW, MEDIUM, MEDIUM-HIGH, and HIGH) and defined the fuzzy set distributions using historical data. The authors applied fuzzy rules constructed as a combination of these parameters to determine the type of malicious activity, such as port scanning. In [51], fuzzy rules are constructed based on the results obtained by association rule mining. In [45], the authors applied leader-based k-means clustering to preprocess data before application of the fuzzification process. Thus, the existing approaches differ in the preprocessing steps and the data attributes used to construct fuzzy rules for classifying the types of malicious activities.
In [37], the authors solve the problem of constructing profiles of normal user behaviour based on the analysis of log events such as keystroke sequences, characteristic data sequences retrieved from a pointing device, chosen options, requested network resources, and so on. They apply fuzzy logic to the qualitative attributes of these events to describe a set of fuzzy profiles and identify masquerade attacks.
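The fuzzification step with five fuzzy sets can be sketched as follows. The sketch assumes a metric normalized to the range [0, 1] and evenly spaced triangular membership functions; in [50], the set distributions are derived from historical data, so the breakpoints below are purely illustrative.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with feet a and c and peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Assumed evenly spaced sets over a metric normalized to [0, 1].
FUZZY_SETS = {
    "LOW": (-0.25, 0.0, 0.25),
    "MEDIUM-LOW": (0.0, 0.25, 0.5),
    "MEDIUM": (0.25, 0.5, 0.75),
    "MEDIUM-HIGH": (0.5, 0.75, 1.0),
    "HIGH": (0.75, 1.0, 1.25),
}


def fuzzify(value):
    """Map a normalized metric value to its membership degree in every fuzzy set."""
    return {name: triangular(value, *abc) for name, abc in FUZZY_SETS.items()}


def dominant_set(value):
    """Return the name of the fuzzy set with the highest membership degree."""
    degrees = fuzzify(value)
    return max(degrees, key=degrees.get)
```

A normalized connection count of 0.6, for example, belongs partly to MEDIUM and partly to MEDIUM-HIGH, and fuzzy rules can then combine such degrees across several metrics.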
In [46], an approach combining attack graphs and fuzzy logic to predict the attacker's behaviour was suggested. The attack graph is constructed in a traditional manner as a sequence of possible malefactor steps. Four parameters characterizing the attacker are assigned to each step: "(i) the required knowledge for performing the attack action; (ii) the required access for conducting the attack action (the attack step may need physical access or it can be performed remotely); (iii) the required user interaction level for successful preformation of the attack (such as social engineering attacks against employees or the attacks targeting human-machine interface operators); and (iv) the required skill for conducting the attack" [46]. These parameters take the following values: low importance, moderate importance, importance, high importance, and very high importance. The fuzzy sets are described by a triangular function. The complexity of the attack step depends on the values of these four variables. Apart from assessing the complexity of each attack step, the authors rate the alternatives existing for each attack step. This rating reflects the attractiveness of each step for the attacker and is evaluated on the basis of expert assessments. It is also a fuzzy variable that takes the following values: very low, moderate, high, and very high. To make a prediction of the attack deployment, the authors apply the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS), which is a multi-criteria decision-making method suggested for a fuzzy environment [52]. It allows the analyst to compare alternatives described by fuzzy variables. The general scheme of the approach is given in Figure 1 [46].
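The ranking of alternative attack steps can be illustrated with a minimal sketch of the classical (crisp) TOPSIS procedure. It assumes that the fuzzy assessments have already been defuzzified into numbers, which is a simplification of the fuzzy variant used in [46]; the criteria, weights, and scores in the usage example are hypothetical.

```python
import math


def topsis(matrix, weights, benefit):
    """Return the closeness coefficient of every alternative (higher is better).

    matrix[i][j] - score of alternative i on criterion j
    weights[j]   - importance of criterion j
    benefit[j]   - True if larger values are better, False for cost criteria
    Assumes no criterion column is all zeros and the alternatives are not all identical.
    """
    n_crit = len(weights)
    # Vector-normalize each criterion column, then apply the weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    weighted = [
        [weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix
    ]
    columns = list(zip(*weighted))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in weighted:
        d_best = math.dist(row, ideal)   # distance to the ideal solution
        d_worst = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical attack steps scored on (complexity, attractiveness).
steps = [[2, 8], [5, 9], [8, 3]]
scores = topsis(steps, weights=[0.5, 0.5], benefit=[False, True])
```

In this example, complexity is treated as a cost criterion and attractiveness as a benefit criterion, so the first step, which is simple and attractive, obtains the highest closeness coefficient and would be predicted as the attacker's most likely choice.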
Pricop and Mihalache [35] apply a fuzzy approach to model the impact of cyber attacks. As in [46], the authors describe the attacker's profile as a combination of parameters, in this case the following three: knowledge, technical resources, and motivation; that is, the profile is a function of three inputs and one output. They define six types of attackers, as follows: script kiddie, hacker, disgruntled employee, terrorist, industrial spy, and cyber warrior. The script kiddie is an inexperienced and unskilled attacker who uses known exploits and whose motivation is usually to gain reputation, while the cyber warrior has the highest levels of knowledge, resources, and motivation. The cyber warrior is the most dangerous type of attacker, targeting the critical infrastructure.
The variables describing the attacker profile are linguistic variables that take values from five fuzzy sets (very small, small, medium, big, and very big), which are represented by triangular curves. The highest score is assigned to the industrial spy; the cyber warrior, terrorist, and disgruntled employee have a medium score; the hacker's score is small; and the script kiddie has a very small score.
The attacker's profile, that is, the score [35], is then used to estimate the attack success rate. The impact of the attack is also a fuzzy function of four linguistic variables: the attacker profile (score), protection level, vulnerabilities, and restore cost. In the approach, these variables are described by a membership function of triangular form, defined for three fuzzy values: small, medium, and big. The attack success rate allows the analyst to understand how these parameters influence the overall security state of the information system.
In [36], the authors try to link attack steps to produce the attacker's profile. They developed a fuzzy inference system that takes as input the following linguistic variables: scanning/reconnaissance, enumeration, exploit by access attempt, exploit by denial of service, and exploit by malware attempt, and outputs the attacker category. The possible attacker categories are as follows: criminals, insiders, terrorists, hackers, phishers, nations, spyware/malware authors, bot-net operators, and amateur/script kids.
The linguistic variables used to determine the attacker's category may take the following fuzzy values: none, low, and high, which are described by a triangular form.
Thus, it is possible to conclude that there are two broad groups of approaches based on fuzzy logic to predict the attacker's behaviour.
The first group of techniques uses fuzzy inference to detect the type of malicious activity, and the fuzzy rules describe generalized (fuzzy) dependencies between security event attributes. It is worth noting that, in most cases, the authors apply fuzzy inference to detect attacks that have rather specific characteristics, such as DoS attacks and port scanning. This could be explained by the fact that the most widely used datasets are NSL-KDD CUP 1999 and CAIDA UCSD "DDoS Attack 2007". These datasets do not contain complicated long-term attacks. They also do not consider attacks targeting IoT-based infrastructures, cyber-physical systems, "smart" homes, and so on.
The second group of techniques mostly focuses on risk assessment and uses the attacker's profile explicitly as an input variable that defines the success rate of the attack. The advantage of applying fuzzy logic is the ability to describe such fuzzy parameters as the motivation or knowledge of the malefactor. However, the major limitation of this group is the inability to link low-level events to the attributes used to characterize the malefactor profile. A possible solution is to implement a step-by-step mapping of low-level events to middle-level activities, and then determine the high-level attributes of the attacker such as skills, resources, and motivation.

Attributing Cyber Attacks
In [38], the concept of attack attribution is used, that is, the determination of the attack author based on behavioural indicators. Behavioural indicators are combinations of actions and other indicators of malicious activity. These indicators can be atomic or computed. Atomic indicators are discrete pieces of data that cannot be broken down into their components without losing their forensic value. Atomic indicators include IP addresses, email addresses, domain names, and small pieces of text. Computed indicators are similarly discrete pieces of data, but they involve some element of computation. An example is a 'hash', a unique signature derived from input data, for instance, a password or a program. Hashes of programs running on a network's computers may match hashes of programs known to be malicious.
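The difference between atomic and computed indicators can be sketched as follows. The blocklists, the sample payload, and the function names are hypothetical; the IP address comes from a range reserved for documentation, and SHA-256 is chosen only as a common example of a hash function.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Computed indicator: a digest derived from the program's bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical threat intelligence, built here only for illustration.
KNOWN_MALICIOUS_IPS = {"203.0.113.7"}  # atomic indicators
KNOWN_MALICIOUS_HASHES = {sha256_hex(b"hypothetical malicious binary")}  # computed


def matches_atomic(source_ip: str) -> bool:
    """Atomic indicators are compared as-is, without any computation."""
    return source_ip in KNOWN_MALICIOUS_IPS


def matches_computed(program_bytes: bytes) -> bool:
    """Computed indicators require hashing the observed data first."""
    return sha256_hex(program_bytes) in KNOWN_MALICIOUS_HASHES


print(matches_atomic("203.0.113.7"))
print(matches_computed(b"hypothetical malicious binary"))
print(matches_computed(b"benign program"))
```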
In some cases, behavioural indicators point to a specific adversary who has employed similar behaviours in the past. It might be repeated social engineering attempts of a specific style via email against low-level employees to gain a foothold in the network, followed by unauthorized remote desktop connections to other computers on the network delivering specific malware.
The authors point out that, although details are critical for attacker attribution, they are useful only if the information flows from the technical to the operational and strategic layers are synthesized correctly.
In [39], the authors build a cyber attacker model profile (CAMP) that can be used to characterize and predict cyber attacks. The authors define two types of variables: dependent and independent. They denote the frequency and distribution of attacks as well as money earned from cybercrime as dependent variables (DVs), while unemployment rate, level of education, and corruption are independent variables. The authors constructed an attack prediction model linking both types of variables and showed how much variation in the DVs can be explained for given values of the independent variables.
In [40], the attribution of honeypot data is considered. The authors define an attacker via a unique tuple (source IP address, operating system, user-agent (protocol), {cookies}). They assume that knowledge of the operating system, user agent, and set of cookies allows more accurate classification than the source IP address alone. Honeypot data (HD) are used to calculate skill, resources, motivation, and intention. Further, they integrate skill (S) and resources (R) into the capability rating, and motivation (M) and intention (I) into the threat rating. Their combination is used to calculate the total threat score. S, R, M, and I are determined by weighted accumulation of all affecting features $f_i$:

$$V = \sum_{i=1}^{n} a_i f_i, \quad V \in \{S, R, M, I\},$$

where $n$ is the total number of features $f_i$; $a_i$ is the weight for the $i$-th feature, $\sum_{i=1}^{n} a_i = 1$. The features $f_i$ are derived from the observed HD features $v_i$ and take values $0, \dots, \gamma \in \mathbb{Q}$. The maximal value of S, R, M, and I is $\gamma$. The dimension and boundaries for $v_i$ vary between the parameter and sensor resolution. Part of the sample feature set provided by the authors is represented in Table 2.

Table 2. Part of the sample feature set for attackers' classification suggested in [40].
Features: Port; IP address; User agent; URL; Domain; E-mail; User ID; OS; Inter-Arrival Time; Protocol; Service; Autonomous system; Country; Standard deviation; Average.

For example, to calculate R, the assumption can be made that fast inter-arrival times are related to a higher degree of automation (higher attackers' resources). The motivation attribute can be estimated by the time and effort an attacker invests into a particular attack. Quantifying an attacker's intention is the most complex task. The authors define intention as the degree or potential of attacker's maliciousness.
The values for different classes are calculated using $V \in \{S, R, I, M\}$, which are ordered as $V_{c_i} < V_{c_j} < \dots < V_{c_n}$, $\forall c \in C$, and then transformed to $0, \dots, \gamma \in \mathbb{Q}$ by assigning 1 to the first class and iterating over all classes while incrementing the value by 1 for each less-than operator. Then, all values are normalized with $\gamma = 10$. In Figure 2, the heat map proposed by the authors to represent attackers' classes is provided (the capitalized abbreviation marks the appropriate class).

Figure 2. Attackers' classes [40].
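The weighted accumulation and the rank-based transformation described for [40] can be sketched as follows. The feature values, weights, and class names are invented for illustration, and the final linear scaling (the highest rank maps to γ) is only one possible reading of "normalized with γ = 10"; the functions `weighted_score` and `rank_normalise` are hypothetical names, not taken from [40].

```python
GAMMA = 10


def weighted_score(features, weights):
    """Weighted accumulation of feature values; the weights must sum to 1."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(a * f for a, f in zip(weights, features))


def rank_normalise(class_values, gamma=GAMMA):
    """Order classes by raw value, start at 1, add 1 per strict increase,
    then scale linearly so that the highest rank equals gamma (assumption)."""
    ordered = sorted(class_values.items(), key=lambda item: item[1])
    ranks, rank, previous = {}, 0, None
    for name, value in ordered:
        if previous is None or value > previous:
            rank += 1
        ranks[name] = rank
        previous = value
    return {name: gamma * r / rank for name, r in ranks.items()}


# Invented example: three features already scaled to 0..GAMMA.
skill = weighted_score([10, 0, 5], [0.5, 0.25, 0.25])
print(skill)

# Invented raw values for four hypothetical attacker classes.
print(rank_normalise({"class_a": 0.2, "class_b": 0.7, "class_c": 0.7, "class_d": 0.9}))
```

Classes with equal raw values share a rank in this sketch, because the rank is incremented only for a strict less-than relation.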
Although the method is introduced for IT security in Industry 4.0, the specific features of CPS are not considered.
In [41], the authors propose a method for predicting the behaviour of cyberattacks using recurrent neural networks (RNNs). They use the dataset obtained from the 2017 Collegiate Penetration Testing Competition (CPTC) to train long short-term memory (LSTM) models. The dataset includes Suricata alerts collected while ten student teams attempted to penetrate the virtualized network and exploit vulnerabilities. The authors trained two sets of models: the first set determines the team that caused the alert, and the second predicts the next alert. The features used are as follows: destination port, alert signature, alert category, alert severity, proto, source port, and host. The authors achieved an accuracy of 55% for team classification and 80% for next alert prediction.
Finally, while the most recent works on attacker behaviour forecasting using machine learning attempt to overcome the challenge of linking raw data with valuable attacker metrics, the feature set is still not specified, the set of metrics that forms the attacker profile is not unified, the techniques for calculating metrics on the basis of the extracted features should be enhanced, and the training dataset problem still exists.

Challenges, Possible Solutions, and Common Approaches
The analysis conducted allowed us to identify the main challenges that currently exist in attack goal forecasting:

1. Lack of uniformity in the classification of attackers, distinguished metrics, and attributes, as well as the definition of the same classes and metrics.
2. The gap between the raw data (such as network traffic and events logs), attacker profile, and forecasting of the attacker behaviour, as well as the methods for the determination of relationships between them.
3. Lack of datasets suitable for research of the relationships between attacker steps and his/her goals.
4. Absence of the research that demonstrates if there is an influence of the attacker profiling and attributing on the attack forecasting.
The lack of uniformity indicates insufficient elaboration of the problem under research. Besides, it prevents efficient countermeasure selection for different classes of attackers, understanding of the current state of research, comparative quantitative analysis of the various developed techniques, and elaboration of the existing results. An attempt to overcome this challenge was made in [9]. However, the authors do not describe how to link low level events with high level events, that is, they did not propose a solution to the second problem.
Considering the second challenge, an attempt to link low level events with security metrics was made in the Structured Threat Information Expression (STIX) project [53]. STIX is a structured language for the specification of various threats and their automated analysis. The idea behind the development of this language is to link low level events with high level concepts. The following components of the language are specified [54]: observables; indicators (observation patterns and their meanings); incidents (attack action instances); adversary tactics, techniques, and procedures (TTPs; methods that are used by an attacker, including attack patterns, malware, exploits, and so on); exploit targets (e.g., vulnerabilities, weaknesses, and configurations); courses of action (response actions to prevent an attack); campaigns (sets of incidents and/or TTPs with a single goal); threat actors (attacker identification); and reports.
For each component, the set of properties is specified. For example, for threat actors, the following properties are used: name, description, aliases, roles, goals, sophistication, resource_level, primary_motivation, secondary_motivations, and personal_motivations.
All properties are of a nominal type (i.e., values are selected from a list). For example, the possible values for threat actor labels are as follows: activist, competitor, crime-syndicate, criminal, hacker, insider-accidental, insider-disgruntled, nation-state, sensationalist, spy, and terrorist. For the threat actor sophistication (which captures the skill level of a threat actor and ranges from "none", which describes a complete novice, to "strategic", which describes an attacker who is able to influence supply chains to introduce vulnerabilities), the values are as follows: none, minimal, intermediate, advanced, expert, innovator, and strategic. However, this project also does not describe how to determine the values of these properties automatically from the raw data. STIX should be actively used by security companies in order to reveal and then automate the process of linking low level events and high level attack concepts; however, there is not much activity in this field.
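The nominal nature of these properties can be illustrated with a small sketch that builds a threat actor record and checks its values against the two vocabularies quoted above. The helper names and the example actor are hypothetical, and the record is deliberately partial: it is not a complete, schema-valid STIX object (for instance, timestamps are omitted), and the `id` layout is an assumption based on the common `type--UUID` convention.

```python
import json
import uuid

# Vocabularies as listed in the text above.
THREAT_ACTOR_LABELS = {
    "activist", "competitor", "crime-syndicate", "criminal", "hacker",
    "insider-accidental", "insider-disgruntled", "nation-state",
    "sensationalist", "spy", "terrorist",
}
# Ordered from a complete novice to the most capable attacker.
SOPHISTICATION_LEVELS = [
    "none", "minimal", "intermediate", "advanced",
    "expert", "innovator", "strategic",
]


def make_threat_actor(name, labels, sophistication):
    """Build a partial threat actor record, rejecting unknown nominal values."""
    unknown = set(labels) - THREAT_ACTOR_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    if sophistication not in SOPHISTICATION_LEVELS:
        raise ValueError(f"unknown sophistication: {sophistication}")
    return {
        "type": "threat-actor",
        "id": f"threat-actor--{uuid.uuid4()}",  # assumed identifier layout
        "name": name,
        "labels": sorted(labels),
        "sophistication": sophistication,
    }


def sophistication_rank(level):
    """Position in the ordered vocabulary, usable for ordinal comparison."""
    return SOPHISTICATION_LEVELS.index(level)


actor = make_threat_actor("Example Actor", ["criminal"], "advanced")
print(json.dumps(actor, indent=2))
```

The sketch only validates values that an analyst supplies by hand, which mirrors the gap noted above: nothing here derives the values from raw events.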
The second challenge is connected with the third challenge, that is, the absence of datasets for analysis aimed at revealing existing interrelations and features characterizing attackers and their goals. The following approaches are used to overcome it:

• Use existing datasets with specific attacks' data.
• Use honeypots to generate real data.
• Use normal data and add data on attacks intentionally (use attack generators).
The first approach is used for the detection of specific types of attacks based on training using the datasets. However, the most widely used datasets are outdated and do not represent the latest trends in attacks or the paradigm of modern information systems.
In line with the second approach, the authors of [30] used honeypot technology. A detailed description of the logged attack features and of the dataset obtained with honeypots is provided in [55]. The analysis is based on the following assumption: the data are grouped by session ID, assuming that the attacker attempts to implement some malicious scenario within one session; that is, different session IDs are treated as independent of individual attacker characteristics. This allows the authors to group event sequences by session to create a training sample. However, this approach does not account for the possibility that a single attacker uses several sessions to implement a complex multistep attack.
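The session-based grouping and its limitation can be sketched as follows. The event fields (`session_id`, `timestamp`, `action`) and the sample events are hypothetical and do not reproduce the log format of [30] or [55].

```python
from collections import defaultdict


def group_by_session(events):
    """Turn a flat event log into one time-ordered action sequence per session."""
    sessions = defaultdict(list)
    for event in sorted(events, key=lambda e: e["timestamp"]):
        sessions[event["session_id"]].append(event["action"])
    return dict(sessions)


# Invented honeypot events; assume s1 and s3 belong to the same attacker.
events = [
    {"session_id": "s1", "timestamp": 1, "action": "login_attempt"},
    {"session_id": "s2", "timestamp": 2, "action": "port_scan"},
    {"session_id": "s1", "timestamp": 3, "action": "download_file"},
    {"session_id": "s3", "timestamp": 9, "action": "execute_file"},
]

samples = group_by_session(events)
print(samples)
```

Each session becomes a separate training sample, so the two stages of the multistep attack spread over `s1` and `s3` are never seen together, which is exactly the limitation noted above.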
Researchers usually create and use their own datasets [56]. Unfortunately, in most of these approaches, the data are not annotated with information about the attackers, that is, their skills, knowledge, and other characteristics that form their profiles. In fact, datasets contain only attacks of different types, and there are no labeled datasets characterizing attackers' skills. This is explained by the fact that the techniques used to detect attacks analyze the event sequences, their frequencies, and attributes. Until there is research proving that attacker attribution may enhance the efficiency of attack detection, there will be no datasets linking raw security events with attacker profile concepts such as attacker motivation, goal, and so on. However, having such a dataset may be extremely useful in detecting targeted attacks and attacks distributed in time. Unfortunately, the absence of datasets is a common problem that can be solved with their targeted generation.
We therefore argue that there is a need for research that demonstrates whether attacker profiling and attribution influence attack forecasting. This fourth challenge is one of our future research directions. However, it is necessary to overcome the first three challenges first. In particular, we are planning to start with the generation of a specific dataset. We consider the approach presented in [57] to be a promising one for generating datasets for attack attribution. It is based on mixed traffic generation, including attacks and normal traffic.
To conclude, we propose the following approach to attacker behaviour forecasting:

1. First of all, we propose to outline possible raw data sources. There are two types of sources: structured data and unspecified data. In [58], we outlined the following open sources of structured data considering objects of information security assessments: vulnerability databases, attack patterns databases, weaknesses databases, software and hardware databases, and so on. For accurate attack forecasting in real time, it is necessary to add another type of source data, namely network traffic and event logs (which are unspecified). Of the analyzed events datasets, the most interesting is the one provided in [56]. The dataset should contain data on various attacks with different goals implemented by attackers of different classes. From our point of view, the most complete classification of those reviewed was proposed in [40]. It incorporates the following classes: guest, external employee, internal employee, activists, state-sponsored, ethical hacker, criminals, cracker, and hobby hacker.
2. Extract features from the events dataset that can characterize different classes of attackers with different goals. While there are rather detailed sets of features from the network traffic (such as source IP address, operating system, user-agent (protocol), and {cookies} in [40]), the events features should be researched in more detail. In [41], the following set is proposed: destination port, alert signature, alert category, alert severity, proto, source port, and host. We can use this as the basis for future research.
3. Then, we propose to outline and classify high level metrics that form the attacker profile, on the basis of the following metrics, proposed in [59]: attacker skill level, attacker knowledge, tools complexity, attack steps complexity, steps success rate, trace coverage rate, and so on.
4. Then, we propose to find out structural and semantic relations between data sources, objects of the attacker behaviour forecasting subject area, and metrics (from features extracted from the raw data to high level metrics of attackers and attacks). To implement this, we plan to extend the ontology provided in [59] and determine transitional metrics.
5. Then, we propose to use the outlined characteristics and relationships to do the following:
a. develop algorithms for metrics calculation;
b. train a neuro-fuzzy network for attackers' behaviour forecasting.
We state that steps 1-4 are the necessary basis for step 5, while overcoming challenges 1-4 is the basis for the successful implementation of our research task.
Thus, at this stage, we developed the common approach to forecasting attacker goals and considered the future work scope on the basis of comparative analysis of the related research and existing challenges in the area.

Conclusions
In the paper, we reviewed the research in the area of attacker behaviour forecasting. Compared with the closely related survey described in [4], our research is focused on issues of the attacker profile definition or attacker attribution and its influence on the attack forecasting process. In [9], an interesting study that highlights the main challenges in the area of attacker behaviour forecasting is provided and a multilevel system of metrics is introduced. Our goal in this research, however, is to determine how to link low level events with high level events. Besides, unlike the aforementioned papers, the main goal of the research outlined in this paper is the development of a novel approach. Although this goal is not limited to reviewing the state of the art, such a review is necessary for developing the approach. In the scope of our research, we considered four classes of approaches to attacker behaviour forecasting: attack graph based approaches, hidden Markov models (HMMs), fuzzy inference, and approaches based on intelligent data processing. The analysis shows that there is a lack of formalization and systematic representation of the attacker profile and of the definition of his/her characteristics that can be used for his/her specification. From our point of view, the most promising are approaches based on intelligent data analysis, since they allow linking raw data and metrics describing an attacker.
The conducted analysis allowed us to outline the key challenges in the area. On the basis of these challenges and our task, we have selected the approach to the task implementation. The proposed approach specifies our further research steps and is the basis for the technique of attacker behaviour and goals forecasting under development.
The approach incorporates the following steps: (1) outline raw data sources, both structured and unspecified; (2) extract features from the events dataset that characterize different classes of attackers with different goals; (3) outline and classify high level metrics that form the attacker profile; (4) find out structural and semantic relations between data sources, objects of the attacker behaviour forecasting subject area, and metrics (from features extracted from the raw data to high level metrics of attackers and attacks); and (5) use the outlined characteristics and relationships to develop algorithms for metrics calculation and to train a neuro-fuzzy network for attackers' behaviour forecasting. Compared with the other approaches summarized in this paper, our approach is focused on the accurate determination of relations between raw data and attacker behaviour characteristics. Each step of the proposed approach will be discussed in detail in subsequent research. Moreover, in the scope of our future research, we will analyze whether attacker profiling and attribution influence attack forecasting.