Online Intrusion Scenario Discovery and Prediction Based on Hierarchical Temporal Memory (HTM)

: With the development of intrusion detection, a number of the intelligence algorithms (e.g., artiﬁcial neural networks) are introduced to enhance the performance of the intrusion detection systems. However, many intelligence algorithms should be trained before being used, and retrained regularly, which is not applicable for continuous online learning and analyzing. In this paper, a new online intrusion scenario discovery framework is proposed and the intelligence algorithm HTM (Hierarchical Temporal Memory) is employed to improve the performance of the online learning ability of the system. The proposed framework can discover and model intrusion scenarios, and the constructed model keeps evolving with the variance of the data. Additionally, a series of data preprocessing methods are introduced to enhance its adaptability to the noisy and twisted data. The experimental results show that the framework is e ﬀ ective in intrusion scenario discovery, and the discovered scenario is more concise and accurate than our previous work.


Introduction
Since the beginning of this century, cyberattacks have threatened users and organizations. They have become more complex and sophisticated with the development of computer networks. Nowadays, multistep attacks mostly aim at stealing important data from organizations and leads to serious data security problems.
A multistep attack, also called a multistage attack, intrusion scenario, refers to the ensemble of steps taken by one or several attackers with a single specific objective inside the network, containing at least two distinct actions [1]. The multistep attack consists of multiple single-step attacks which can be detected by the IDS (intrusion detection system), however, the multistep attack is not easy to discover, having some difficulties: (1) Although the IDS is efficient in detecting the single-step attack, it can hardly reveal the logical relations among these single-step attacks. (2) The real attack is a small probability event, and the most dangerous attacks happen only rarely, which mean that in each dataset we have only a few examples of attacks. This problem in intrusion detection research, called the "rare data problem" [2], seems even worse when multistep attacks are considered.
(3) IDS sensors could miss some steps of a multistep attack because these steps may appear normal to them [3]. Thus, we cannot assume that the important alerts related to the real attack will always be there. (4) High data redundancy and volumes will slow down the analysis process, which is a significant challenge for the development of systems working in real time.
In this paper, an intrusion scenario discovery framework is proposed to tackle these difficulties by data preprocessing and online correlation analysis. The proposed framework was designed based on the following considerations: First, the framework is a totally online system and has no offline mining or learning part; it is different from other multistep attack detection systems [4,5] that need offline processing. The HTM algorithm can learn the association patterns of intrusion actions in real time and the learned patterns will evolve with the incoming data over time, which is a good characteristic for online learning because it provides a continuous learning mechanism for our framework. However, the "short-term memory" has its negative influence on the prediction accuracy, so, an auxiliary method is proposed to mitigate the influence.
Second, the framework is designed based on the processing of streaming data and does not limit the boundary or the quantity of data, which means the model should be constructed cumulatively.
Third, the framework needs to predict the intrusion at an early stage without waiting for the model to be completely built. Given any intrusion action, the proposed framework will output the predicted action sequence that may happen with a higher probability in the future. This is the first time to our best knowledge that the HTM algorithm is combined with intrusion scenario discovery. HTM has some attractive features, such as strong noise resistance, continuous learning without retraining, and an unsupervised prediction-driven learning mechanism. The proposed framework can benefit from these features.
In this section, four categories of existing approaches for constructing the intrusion scenarios are discussed and, after that, some HTM use cases are introduced.
(1) Some early prediction models based on prior knowledge are created manually by security experts.
The domain knowledge, such as known cyber threats or intrusion scenarios, are leveraged in the modeling process. These models can highly represent real intrusion scenarios, and provide a high accuracy of prediction with no ambiguity, which is the reason why they are popular in the industry. However, it is heavy work for the experts to keep the models always available for attack strategies that are changing all the time. Early studies used the approach proposed by Qin and Lee [6], which can identify attack plans based on isolated attack scenarios, but appreciable adjustments should be made to predict the attacks that are beyond the predefined attack plans. (2) Two types of graph models, HMM (hidden Markov models) and Bayesian networks, are commonly used in intrusion predictions. Essentially, they are attack graphs that can be constructed automatically. Statistical methods are utilized to obtain the transition probabilities or probability distributions based on which the prediction can be made. Ramaki et al. [7] proposed an attack prediction model based on the previous work [4], which is a real-time IDS alert correlation framework. In their model, Bayes networks and a Bayes attack graph (BAG) are employed to increase the accuracy of the prediction. Unlike the models based on Bayes networks, the HMM-based methods are insensitive to the missed states and transitions, which means it can work well without complete information of the intrusion scenarios [8].
Farhadi et al. [9] proposed the alert correlation and prediction framework utilizing the HMM model as the prediction component, which does not require the information about the network topology, system vulnerabilities, and system configurations, and is robust against over-fitting. However, the model cannot predict patterns of unknown attacks. Additionally, both of the models are sensitive to outliers and the lack of scalability for the various intrusion scenarios.
(3) Data mining is another prevalent technology for discovering correlations within the massive amount of security data before constructing the intrusion prediction models. Husák et al. [10] proposed an attack prediction framework based on data mining to identify the attack patterns from a million alerts. Sequential rule mining is employed to derive rules for prediction. Kim et al. [11] proposed an approach to create an attack graph based on data mining techniques for intrusion prediction. Although data mining can discover unknown intrusion patterns, redundant, small, frequent items would affect the accuracy of the prediction. The most important is that the data mining models cannot discover the critical data items with lower frequency. (4) Since the ANN (artificial neural network)-related approaches are proposed to meet the intrusion prediction, many studies show that neural networks have a better result in predicting attacks than others. However, neural networks need to be trained offline and retrained in a regular time interval to meet the constantly changing data. Additionally, feature selection and dimensional reduction are two essential and complex tasks to conduct, and the parameters should be learned from the data. In other words, neural networks are heavy models and not optimal for online prediction in real-time. Subba et al. [12] proposed an intrusion detection model which can reduce the computational resources in the training phase and is suitable for real-time deployment. As described above, it needs feature selection and daunting parameter tuning to gain better performance.
The HTM (Hierarchical Temporal Memory) algorithm is a new type of intelligent learning system inspired by the structures and mechanism of data processing of the mammalian neocortex. Although the HTM is still evolving with the latest discoveries in neurobiology, many applications of it are proposed for its outstanding characteristics. The existing use cases based on HTM theory are discussed in detail.
Cui et al. [13] analyzed the properties of HTM sequence memory in the sequence learning and prediction tasks and compared the HTM sequence memory with other sequence learning algorithms. The model has better performance in continuous online learning, robustness to noise, fault tolerance, and no parameter tunings.
El-Ganainy et al. [14] proposed a method to predict medical streams in real-time; the paper employs two machine learning algorithms: HTM (Hierarchical Temporal Memory) and LSTM (Long Short-Term Memory) to model the same medical data stream, and the experimental results show that the HTM algorithm is more effective than the LSTM algorithm in three aspects: no training process, no domain knowledge, and the ability to adapt to variations in the data.
Li et al. [15] proposed a framework to detect the anomaly in the stream of air traffic data broadcasted between a flight and the air traffic control center. The HTM is used to encode the data and learns the patterns from the sequential data for the anomaly prediction, and then conducts the deviation analysis to identify the attack based on the former predictions.
Wang et al. [16] proposed a distributed vehicle network anomaly detection framework with the abilities of continuous online learning and exception scoring. The framework can learn the network data sequence and detect the abnormal states of the vehicle networks.
Shah et al. [17] proposed a framework to categorize documents using the SP (Spatial Pooling) algorithm of the HTM theory, and LSI (Latent Semantic Indexing) technology is used for extracting the top features from the input and converting them into binary format. The assessment shows that the HTM model is accurate compared to the conventional technologies used in text classification.
Shen et al. [18] have proposed a gait pattern recognition approach using the extended HTM algorithm. In the paper, the authors decomposed the gait image into sequences of movements of each body parts, and learns the features through three levels of neurons. A Markov chain is used to discover the feature groups in which the mutual transition probabilities are above the threshold, then the HTM algorithm, which is extended with a dynamic programming algorithm, is used to infer the complex sequences. The HTM algorithm proves the effectiveness of visual problems.
The main contributions of this paper are: (1) to introduce a series of data preprocessing methods to refine the streaming data; (2) the HTM algorithm is introduced to enhance the ability of online learning; and (3) an auxiliary method is proposed to improve the accuracy of intrusion scenario discovery.

Materials and Methods
In this section, the basic concepts and the theory of the HTM are introduced, and then the intrusion action clustering and the session reconstruction of the proposed intrusion scenario discovery framework is described in detail.

Preliminaries of HTM
The formal name of the HTM is HTM cortical learning algorithms, which is a machine learning technology that aims to capture the structural and algorithmic properties of the neocortex. The HTM models neurons (cells), which are arranged in columns, in layers, in regions, and in a hierarchy [19]. The HTM learns and memorizes the patterns by building and destroying the synapse connections around the neuron bodies. The HTM theory and basic concepts are explained in this section.
The software implementation of the HTM algorithm was published in 2011, which may be the real birth of the HTM. The theory of the HTM is developing quickly and the related applications are increasing every year for its distinct properties in online learning and prediction, especially in anomaly detection. More applications are being explored by academics and industry.
The HTM model is a multi-layer structure that each layer contains a number of columns and each column possesses several neuron cells. The model extracts and represents the features of the outside world by checking the activated column locations within the whole layer. The process of converting the outside data into the inner representation (sparse distributed representations, SDR for short) is called SP (spatial pooling), which is the first phase of the whole learning process. The column can be activated as long as one of the cells composing the column is activated. Under two situations, the neuron cells will get fired into an active state. First, the specified cell is activated in the bursting phase. Second, if the cell is in the predictive state, it will be active in the next time step (for more details, please see [19]). The structure of the HTM is shown in Figure 1.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 4 of 17 In this section, the basic concepts and the theory of the HTM are introduced, and then the intrusion action clustering and the session reconstruction of the proposed intrusion scenario discovery framework is described in detail.

Preliminaries of HTM
The formal name of the HTM is HTM cortical learning algorithms, which is a machine learning technology that aims to capture the structural and algorithmic properties of the neocortex. The HTM models neurons (cells), which are arranged in columns, in layers, in regions, and in a hierarchy [19]. The HTM learns and memorizes the patterns by building and destroying the synapse connections around the neuron bodies. The HTM theory and basic concepts are explained in this section.
The software implementation of the HTM algorithm was published in 2011, which may be the real birth of the HTM. The theory of the HTM is developing quickly and the related applications are increasing every year for its distinct properties in online learning and prediction, especially in anomaly detection. More applications are being explored by academics and industry.
The HTM model is a multi-layer structure that each layer contains a number of columns and each column possesses several neuron cells. The model extracts and represents the features of the outside world by checking the activated column locations within the whole layer. The process of converting the outside data into the inner representation (sparse distributed representations, SDR for short) is called SP (spatial pooling), which is the first phase of the whole learning process. The column can be activated as long as one of the cells composing the column is activated. Under two situations, the neuron cells will get fired into an active state. First, the specified cell is activated in the bursting phase. Second, if the cell is in the predictive state, it will be active in the next time step (for more details, please see [19]). The structure of the HTM is shown in Figure 1. Each layer not only represents the different data features but also accumulates the feature transitions over time through constructing or destroying the connections among cells of different columns in the same layer. The permanence value of the specified connection is increased when the connected two cells are successively activated in two adjacent time steps, otherwise, it decreased. This is the second process of HTM called TM (temporal memory), which can store and predict the most probable features based on strong connections. Each lower layer in the hierarchy is the input representation of the higher layer, as a result, the highly abstracted patterns of the original data are obtained by the top layer.
In a word, the learning process contains two phases: the SP (spatial pooling) and TM (Temporal memory). In the SP phase, the original data are first converted into bit array and then abstracted into an inner representation called SDR. In the TM phase, the temporal correlation of SDRs are Each layer not only represents the different data features but also accumulates the feature transitions over time through constructing or destroying the connections among cells of different columns in the same layer. The permanence value of the specified connection is increased when the connected two cells are successively activated in two adjacent time steps, otherwise, it decreased. This is the second process of HTM called TM (temporal memory), which can store and predict the most probable features based on strong connections. Each lower layer in the hierarchy is the input representation of the higher layer, as a result, the highly abstracted patterns of the original data are obtained by the top layer.
In a word, the learning process contains two phases: the SP (spatial pooling) and TM (Temporal memory). In the SP phase, the original data are first converted into bit array and then abstracted into an inner representation called SDR. In the TM phase, the temporal correlation of SDRs are accumulated over time. Finally, the SDRs that are predicted by TM are classified by the classifier in order to map the inner representation to the outer presentation. The learning phases are illustrated in Figure 2.
Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 17 order to map the inner representation to the outer presentation. The learning phases are illustrated in Figure 2. In this paper, we will not discuss the details of the HTM model itself. For more information about the concepts, functions, and mathematical definitions used in the HTM, please see [13,[20][21][22][23][24].

The Intrusion Scenario Discovery and Prediction Framework
The main components (phases) of the proposed framework is illustrated in Figure 3: 1. Normalizing: Raw alerts are converted into a unified data structure for future analyzing; 2. Intrusion action extraction: A group of normalized alerts can correlated into a hyper-alert, which is called intrusion action in this paper, and should meet three requirements: first, they have the same source IP address and destination IP address; second, they have the same destination port or the same intrusion type; and last, they occur continuously in time order. The massive redundant raw alerts are aggregated in this process; 3. Online action clustering: Many extracted actions possess similar semantics, which will be merged based on the similarity calculating; 4. Intrusion session reconstruction: The time-ordered action sequence will be split into many subsequences by calculating the variance of the time gaps between actions, each subsequence is named the intrusion session. Note that all the actions of a particular intrusion session are extracted from the same pair of physical devices. The extracted session will be refined in order to reduce more noise actions; 5. Intrusion session encoding: The intrusion session should be encoded before feed to the HTM algorithm. The session can be first converted into the action ID sequence, and the ID will be In this paper, we will not discuss the details of the HTM model itself. For more information about the concepts, functions, and mathematical definitions used in the HTM, please see [13,[20][21][22][23][24].

The Intrusion Scenario Discovery and Prediction Framework
The main components (phases) of the proposed framework is illustrated in Figure 3: Appl. Sci. 2020, 10, x FOR PEER REVIEW 5 of 17 order to map the inner representation to the outer presentation. The learning phases are illustrated in Figure 2. In this paper, we will not discuss the details of the HTM model itself. For more information about the concepts, functions, and mathematical definitions used in the HTM, please see [13,[20][21][22][23][24].

The Intrusion Scenario Discovery and Prediction Framework
The main components (phases) of the proposed framework is illustrated in Figure 3: 1. Normalizing: Raw alerts are converted into a unified data structure for future analyzing; 2. Intrusion action extraction: A group of normalized alerts can correlated into a hyper-alert, which is called intrusion action in this paper, and should meet three requirements: first, they have the same source IP address and destination IP address; second, they have the same destination port or the same intrusion type; and last, they occur continuously in time order. The massive redundant raw alerts are aggregated in this process; 3. Online action clustering: Many extracted actions possess similar semantics, which will be merged based on the similarity calculating; 4. Intrusion session reconstruction: The time-ordered action sequence will be split into many subsequences by calculating the variance of the time gaps between actions, each subsequence is named the intrusion session. Note that all the actions of a particular intrusion session are extracted from the same pair of physical devices. The extracted session will be refined in order to reduce more noise actions; 5. Intrusion session encoding: The intrusion session should be encoded before feed to the HTM algorithm. The session can be first converted into the action ID sequence, and the ID will be The descriptions of each component are listed as follows: 1.
Normalizing: Raw alerts are converted into a unified data structure for future analyzing; 2.
Intrusion action extraction: A group of normalized alerts can correlated into a hyper-alert, which is called intrusion action in this paper, and should meet three requirements: first, they have the same source IP address and destination IP address; second, they have the same destination port or the same intrusion type; and last, they occur continuously in time order. The massive redundant raw alerts are aggregated in this process; 3.
Online action clustering: Many extracted actions possess similar semantics, which will be merged based on the similarity calculating; 4.
Intrusion session reconstruction: The time-ordered action sequence will be split into many subsequences by calculating the variance of the time gaps between actions, each subsequence is named the intrusion session. Note that all the actions of a particular intrusion session are extracted from the same pair of physical devices. The extracted session will be refined in order to reduce more noise actions; 5.
Intrusion session encoding: The intrusion session should be encoded before feed to the HTM algorithm. The session can be first converted into the action ID sequence, and the ID will be encoded into a bit array which is the HTM inner representation of that action. The learning phase of the HTM starts with the encoding of the ID sequences; 6.
HTM online learning: The HTM learns the patterns of the action sequences and predicts the next step according to the current input, and the anomaly score which indicates the discrepancy between the predicted value and the input value. The predicted values and the anomaly scores will be updated in a particular matrix; 7.
Intrusion scenario discovery: Based on the analyzing of historical predicted actions and the anomaly scores in the correlation matrix, the correlation strengths of intrusion actions are calculated by the auxiliary method; and 8.
Intrusion prediction: The system will search through the matrix to determine the attacking paths that may happen in the future.
The system is proposed based on our previous work [24], a few concepts, such as intrusion action, intrusion session, correlation graph, and methods, such as normalizing and intrusion action extraction, are inherited. In this paper, the data preprocessing is not discussed in detail.

Intrusion Action Clustering
To simplify the associations between actions, the online action clustering is introduced, which can reduce the classes of actions. The definition of intrusion action is modified for the consequential processes.
Intrusion action, which referrs to a group of attack types derived from the raw IDS alerts, can be denoted as a tuple A: <time, types, cid, srcIp, dstIp>. The time field indicates the creating time, the types field contains a group of alert types expressing the same malicious behavior, the cid field indicates the class id, and the fields srcIp and dstIp indicate the source IP address and the destination IP address, respectively. The actions are extracted by the intrusion action extracting process.
The intrusion actions can be compared with each other based on the similarity of the types field. In this paper, the edit distance of two strings is used to calculate the similarity of the actions. Consider two intrusion actions A and A ' , the types field can be denoted as T(X) = { t n |t n ∈ X, n = 1, 2, . . . , |X|}, the similarity of two attack types which are strings can be calculated with Equation (1): where LD is the Levenshtein Distance function that returns the distance of two attack types: t x and t y . The parameter µ is a factor that determines the capacity of a potential cluster. Thus, the similarity of action A and A' is calculated with Equation (2): where A(t k ) indicates the kth attack-type of A, and p = max (|A|, |A'|), if sim (A, A') = 1, A is the same as A'. Based on the sim function, online clustering can be achieved. The cid field will be set to the cluster id for each action in the same cluster.
For a cluster C = {A k |k ∈ Z, 1 ≤ k ≤|C|}, the average similarity between an action X and cluster C can be calculated with Equation (3): where C[k] denotes the kth action in cluster C. If the value of avgsim (X, C) is higher than the threshold, then X will be put into cluster C, otherwise, a new cluster will be created. Based on the equations described above, the actions with similar semantics can be merged into one class, and the number of actions will be reduced significantly.

Intrusion Session Reconstruction
An intrusion session is a sequence with all the unique actions extracted from the communication of a pair of devices in it. As described before, the time-ordered action sequence will be split into many subsequences by calculating the variance of the time gaps between actions. Each subsequence is named intrusion session in this paper.
In practice, the intrusion session always contains long, repeated, and disordered actions, which lower the performance of the sequence learning system. In most situations, the types of actions involved in a particular intrusion are more important than the order of them; in other words, the order of actions can be altered due to variety of situations, such as the network delay, the system error, even the intended disorders made by the attacker. Thus, the session reconstruction is proposed to refine the sessions.
The extracted session in Figure 4 is the ordinary state after applied the pruning algorithm proposed in [24]. If the session is converted into a more precise and clearer one, the performance of the system can be highly enhanced.

Intrusion Session Reconstruction
An intrusion session is a sequence with all the unique actions extracted from the communication of a pair of devices in it. As described before, the time-ordered action sequence will be split into many subsequences by calculating the variance of the time gaps between actions. Each subsequence is named intrusion session in this paper.
In practice, the intrusion session always contains long, repeated, and disordered actions, which lower the performance of the sequence learning system. In most situations, the types of actions involved in a particular intrusion are more important than the order of them; in other words, the order of actions can be altered due to variety of situations, such as the network delay, the system error, even the intended disorders made by the attacker. Thus, the session reconstruction is proposed to refine the sessions.
The extracted session in Figure 4 is the ordinary state after applied the pruning algorithm proposed in [24]. If the session is converted into a more precise and clearer one, the performance of the system can be highly enhanced. In order to discover the potential skeleton of the intrusion, a session refining method is proposed in this paper to highlight the real information hiding in the sessions. It removes the repeated actions from intrusion sessions by calculating the similarity of actions. When reconstructing the session, the new arrived action will be compared with the existing actions of the converted session, and it will discard the actions if they have been in the converted session. The reconstruction process keeps all the action information and the basic orders of actions, and it is more effective than our pruning algorithm proposed in our previous work [24]. Figure 5 shows how the method works. Although the session refining method is simple, it is effective in discovering intrusion patterns. The most important is that it can stabilize the association patterns of actions in the session, and provide a clean sequence for the HTM algorithm to learn. In order to discover the potential skeleton of the intrusion, a session refining method is proposed in this paper to highlight the real information hiding in the sessions. It removes the repeated actions from intrusion sessions by calculating the similarity of actions. When reconstructing the session, the new arrived action will be compared with the existing actions of the converted session, and it will discard the actions if they have been in the converted session. The reconstruction process keeps all the action information and the basic orders of actions, and it is more effective than our pruning algorithm proposed in our previous work [24]. Figure 5 shows how the method works.

Online Learning and Prediction
Appl. Sci. 2020, 10, x FOR PEER REVIEW 7 of 17

Intrusion Session Reconstruction
An intrusion session is a sequence with all the unique actions extracted from the communication of a pair of devices in it. As described before, the time-ordered action sequence will be split into many subsequences by calculating the variance of the time gaps between actions. Each subsequence is named intrusion session in this paper.
In practice, the intrusion session always contains long, repeated, and disordered actions, which lower the performance of the sequence learning system. In most situations, the types of actions involved in a particular intrusion are more important than the order of them; in other words, the order of actions can be altered due to variety of situations, such as the network delay, the system error, even the intended disorders made by the attacker. Thus, the session reconstruction is proposed to refine the sessions.
The extracted session in Figure 4 is the ordinary state after applied the pruning algorithm proposed in [24]. If the session is converted into a more precise and clearer one, the performance of the system can be highly enhanced. In order to discover the potential skeleton of the intrusion, a session refining method is proposed in this paper to highlight the real information hiding in the sessions. It removes the repeated actions from intrusion sessions by calculating the similarity of actions. When reconstructing the session, the new arrived action will be compared with the existing actions of the converted session, and it will discard the actions if they have been in the converted session. The reconstruction process keeps all the action information and the basic orders of actions, and it is more effective than our pruning algorithm proposed in our previous work [24]. Figure 5 shows how the method works. Although the session refining method is simple, it is effective in discovering intrusion patterns. The most important is that it can stabilize the association patterns of actions in the session, and provide a clean sequence for the HTM algorithm to learn.

Online Learning and Prediction
The HTM algorithm can provide some advantages for the proposed framework: (1) It learns the Although the session refining method is simple, it is effective in discovering intrusion patterns. The most important is that it can stabilize the association patterns of actions in the session, and provide a clean sequence for the HTM algorithm to learn.

Online Learning and Prediction
The HTM algorithm can provide some advantages for the proposed framework: (1) It learns the patterns and trends based on the variance of the features over time, which makes it suitable for sequence learning, and it has a better performance than other sequence learning algorithm [13]; (2) it further reduces the influence of the noise due to the special representations of the data; and (3) it can be used to reduce the negative influences of data problems, such as sudden changes in the pattern, and it highlights the real patterns that should be focused on.
The association relation of actions is the main objective to be learned by HTM algorithm. The correlation between two actions A and B can be measured by their temporal orders in the sequence, and the TM (Temporal Memory) learning process of HTM which is described in Section 2.1 can discover and accumulate the temporal association strength between A and B. Thus, the intrusion sessions are first converted into the action ID sequences and the IDs are fed to the HTM by their occurrence order.
The backend system will record and analyze the output of the HTM in real-time. There are three main output values are recorded for analyzing: The current input value, the predicted value, and the anomaly score calculated with the two values. The anomaly score assesses the differences between the predicted SDR and the input SDR.
Considering P' t−1 as the predicted SDR at time step t − 1, and P t as the input SDR at time step t. The anomaly score at time step t can be calculated as: where |P t | denotes the number of one bit in P t , the symbol • computes the number of one bit of the SDR that is different from the other one. ω t will be 0 if the predicted SDR is totally different from the input SDR. The prediction made by HTM can be denoted as a triple: <p, p', ω> where p indicates the current input value, p' indicates the predicted value at the previous time step, and ω indicates the anomaly score. The records with ω ≥ 0.6 will be selected to update the correlation matrix.
Although the introduction of the HTM algorithm enhances the framework in the ability of online learning and analyzing, the learned patterns will evolve with the data variance in the long-term continuous learning process. In other words, the learned patterns should be kept stable in a way. Thus, the correlation matrix (CM) is proposed.
For given actions A 1 , A 2 , A 3 , . . . , A n , the correlation matrix can be defined as an n × n matrix where A i indicates the input action, and A j indicates the predicted action. The value of M(A i , A j ) is a tuple, A i → A j denotes that the number of A i → A j has been predicted, and ω 0 is a single exponential smoothed anomaly score. ω 0 can be calculated by Equation (5): where ω n indicates the anomaly score of the latest prediction. α denotes the smooth factor. In the prediction phase, the correlation strengths (CS values) between actions are calculated by the historical predictions as: where |A i → * | denotes the total number of the predictions with A i as the input action.
For a particular input action A, the actions predicted by HTM can be searched in the matrix and the CS values between A and the predicted actions are calculated, the action B will be selected if CS (A, B) is the max value among the CS values. Then, the framework iteratively finds the action C that makes CS (B, C) the max value among CS values calculated with B as the input action, and so on. Thus, the prediction sequence is constructed. In order to avoid the system falling into an endless loop, the searching progress will be ended if the newly searched action has appeared in the prediction sequence.

Results
In this section, the experimental evaluations are performed. Two main aspects of the framework will be evaluated: (1) The accuracy of intrusion pattern learning; and (2) the ability of multistep attack scenario discovery.

Experiment Setup
The proposed system is implemented with Java programming language and runs on a server with 8 GB RAM and 2.6 GHz Intel CPU, on which Windows 10 operating system is installed. In addition, the standard Java implementation of the HTM framework is employed and the free open source IDS software Snort (Sourcefire Columbia, Maryland, USA) is used in reading, detecting network traffic data, and generating raw alerts. The initial parameters of the system are summarized in Table 1.

Evaluations on CICIDS2017 Dataset
CICIDS2017 intrusion detection evaluation dataset [25] was published by the Canadian Institute for Cybersecurity in 2017 which covers a variety of the most up-to-date well-known attacks and it is free to download for studying. We adopt the dataset for the reasons listed below:

1.
The dataset resembles real-world network traffic data that contains realistic and naturalistic benign background traffic that simulates everyday interaction behaviors of 25 users; 2.
The data was continuously captured for five days, from Monday to Friday in July 2017. About 10 GB of data was created per day and was labeled for training. It needs to noted that the Monday dataset only contains benign interactions of daily work; 3.
The network traffic was captured in a completely configured network, including modem, firewall, switches, routers, and PCs, with a variety of operating systems, such as Windows, Ubuntu, and Mac OS X, and most of the traffic is based on the HTTP, HTTPS, FTP, SSH, and email protocols.
To evaluate the proposed system, we used the data captured on Thursday, containing two attacking processes which can be seen as two scenarios. First, in the morning, the attacker executed three web attacks: brute force, XSS (Cross Site Scripting), and SQL injection. Second, in the afternoon, the attacker performed a series of infiltrations.
The evaluation process is divided into two parts: first, we will assess the abilities of the intrusion sequences extracting and reconstructing for the learning process. In the second part, we will evaluate the predictions of the system.

Data Preprocessing Results
We split the Thursday data file into four parts for snort to read and detect in a continuous way. More than 64,000 alerts generated by snort are recorded in a file. About 5876 intrusion actions are extracted and 39 classes of intrusion actions remain after clustering. After that, 780 intrusion sessions are extracted and only 28 sessions have more than two actions. After data preprocessing, 28 distinct intrusion sessions consisting of 39 classes of actions are extracted. The statistical results are listed in Table 2. In Table 2, about 2040 raw alerts (3.2% of total alerts) are used to construct the actions and sessions, the rest of alerts are merged or discarded by system. The system reduces the number of alerts by two ways: (1) intrusion action extraction, the similar alerts (with small edit distance of the alert type) will be merged to express the same action; (2) action clustering, 5876 actions are clustered into 39 classes, which can significantly reduce the number of alerts; and (3) intrusion session reconstruction, which can reduce the actions in a session. For the statistical requirements of the experiments, each intermediate process' results are saved in the files. The data related to the real intrusion can be very sparse compared to the whole dataset. The extracted intrusion actions are listed in Table 3, and some actions consist of multiple alerts. Compared with the previous work [24], there are two improvements in the data processing: (1) The total number of the extracted intrusion actions decreased significantly due to the action clustering which can reduce the similar actions in a session and make the semantic information more clearly, while, at the same time, it saves many computing resources; and (2) the rebuilt sessions are more accurate in expressing the intrusion strategies. This is because the complicated binary relations are simplified. Most importantly, it highlights the semantics hiding in the sessions.

Data Learning Results
The accuracy of the framework in discovering potential intrusion patterns and the ability of discovering the intrusion scenarios will be evaluated in this section. First, the framework will be tested on an automatically generated dataset by programming. Second, the system will be tested on a series of intrusion sessions extracted in the data preprocessing evaluation, and these intrusion sessions contain the real intrusion scenarios.
In order to test the core algorithm of the system in a flexible way, the automatically generated dataset are used by programming. Five intrusion patterns are generated with about 10 actions in each pattern. Some irrelevant actions are inserted into these patterns randomly to mislead the system. The target of the system is to discover these unknown patterns in an unsupervised way. After training, some actions are provided to the system to make predictions. The predicted actions will be compared with the five patterns, and all the predicted actions that are not match with the patterns will be recorded for statistics.
The details about the dataset are listed in Table 4. The whole evaluation contains 10 tests, and each test is conducted separately with the testing data regenerated. The test data contains three parts: (1) The intrusion patterns to be discovered by the system; (2) the background traffic; and (3) the gradually increased noise actions inserted into each pattern. Table 4. Intrusion patterns for testing. The system makes predictions for input actions.

Intrusion Patterns
Input Action System Prediction 1-2-3-4-5-6-7-8-9-10 1 2-3-4-5-6-7-8-9-10 11-12-13-14-11-12-13-14-15-16 12 13-14-11 (13-14-15- The generated data are shown in Figure 6, the number denotes the action IDs, and the actions with ID more than 100 are irrelevant actions. For the system, the frequency of one or more actions appearing together in the sequence will be learned by the HTM algorithm through the prediction and revising process. The testing program will create the data file for each test and load the data into the framework. Each data file contains about 2000 actions and 20 randomly-selected intrusion patterns with noise. The noise actions will increase with testing.
For each test, the framework will correlate the actions based on their high CS value. If two actions appear together only once, the anomaly score of the HTM for this prediction will be very low and the CS value too, and if one particular action frequently appears with many other different actions, the HTM will keep a low prediction accuracy for this action because it cannot decide which action will appear after it, Additionally, the confidence will decrease. Thus, only stable patterns will be discovered. In the test, the system will start searching with the given action and only select the action with the highest CS value among the candidate values which are greater than 0.6.
The prediction errors for each pattern in each test are listed in Table 5. The prediction accuracy for each pattern is an average value of 10 tests. The prediction errors will increase with the noise level. Table 5. Intrusion pattern discovering results.
For each test, the framework will correlate the actions based on their high CS value. If two actions appear together only once, the anomaly score of the HTM for this prediction will be very low and the CS value too, and if one particular action frequently appears with many other different actions, the HTM will keep a low prediction accuracy for this action because it cannot decide which action will appear after it, Additionally, the confidence will decrease. Thus, only stable patterns will be discovered. In the test, the system will start searching with the given action and only select the action with the highest CS value among the candidate values which are greater than 0.6.
The prediction errors for each pattern in each test are listed in Table 5. The prediction accuracy for each pattern is an average value of 10 tests. The prediction errors will increase with the noise level.  The false positive of the framework can be calculated as FP = #prediction errors/#all actions predicted × 100%. Figure 7 shows the false positives at different noise levels. If each pattern contains less than five noise actions (50% of noise) the false positives can be lower than 3%; if the noise goes up to 80%, the false positives will increase significantly. Thus, the framework can work at a noise level lower than 80%. In the tests, the anomaly score threshold is set to 0.6, 30% of the data are filtered out and not recorded in the correlation matrix, which reduces the size of the matrix and the searching time. When actions 1-10 are correlated by the system, the CS values are shown in Figure 8. With the complex intrusion scenario, the framework tends to correlate multiple intrusion patterns together based on the common actions they are sharing. For instance, the three patterns (17~29, 21~30, 25~24) listed in Table 5 will be correlated because they share the actions 24, 28, and 29. Figure 9 illustrates the correlate graph. In the tests, the anomaly score threshold is set to 0.6, 30% of the data are filtered out and not recorded in the correlation matrix, which reduces the size of the matrix and the searching time. When actions 1-10 are correlated by the system, the CS values are shown in Figure 8. In the tests, the anomaly score threshold is set to 0.6, 30% of the data are filtered out and not recorded in the correlation matrix, which reduces the size of the matrix and the searching time. When actions 1-10 are correlated by the system, the CS values are shown in Figure 8. With the complex intrusion scenario, the framework tends to correlate multiple intrusion patterns together based on the common actions they are sharing. For instance, the three patterns (17~29, 21~30, 25~24) listed in Table 5 will be correlated because they share the actions 24, 28, and 29. Figure 9 illustrates the correlate graph. With the complex intrusion scenario, the framework tends to correlate multiple intrusion patterns together based on the common actions they are sharing. For instance, the three patterns (17~29, 21~30, 25~24) listed in Table 5 will be correlated because they share the actions 24, 28, and 29. Figure 9 illustrates the correlate graph. To evaluate the whole system based on a real intrusion scenario, the infiltration attack scenario of the CICIDS2017 dataset is used. The scenario is about a Dropbox download attack, which leveraged the technologies of overflow, port scanning, and Nmap. There are 13,700 IDS alerts generated by snort, 1647 intrusion actions, 163 sessions are extracted, and 22 sessions have more than two actions. All the extracted sessions are loaded into the framework. A part of the sessions is listed in the Table 6. The discovered intrusion scenario is shown in Figure 10. Compared with our previous work, the number of nodes in the correlation graph is reduced and the semantic of the correlation graph is closer to the real scenario. However, the samples related to the real scenario are very limited, although there are more than 13,700 alerts, only 720 alerts (5%) are used to construct the actions. To evaluate the whole system based on a real intrusion scenario, the infiltration attack scenario of the CICIDS2017 dataset is used. The scenario is about a Dropbox download attack, which leveraged the technologies of overflow, port scanning, and Nmap. There are 13,700 IDS alerts generated by snort, 1647 intrusion actions, 163 sessions are extracted, and 22 sessions have more than two actions. All the extracted sessions are loaded into the framework. A part of the sessions is listed in the Table 6. Table 6. A part of the extracted intrusion sessions of an infiltration attack.

Sessions
Sessions Sessions The discovered intrusion scenario is shown in Figure 10. Compared with our previous work, the number of nodes in the correlation graph is reduced and the semantic of the correlation graph is closer to the real scenario. However, the samples related to the real scenario are very limited, although there are more than 13,700 alerts, only 720 alerts (5%) are used to construct the actions. To evaluate the whole system based on a real intrusion scenario, the infiltration attack scenario of the CICIDS2017 dataset is used. The scenario is about a Dropbox download attack, which leveraged the technologies of overflow, port scanning, and Nmap. There are 13,700 IDS alerts generated by snort, 1647 intrusion actions, 163 sessions are extracted, and 22 sessions have more than two actions. All the extracted sessions are loaded into the framework. A part of the sessions is listed in the Table 6. The discovered intrusion scenario is shown in Figure 10. Compared with our previous work, the number of nodes in the correlation graph is reduced and the semantic of the correlation graph is closer to the real scenario. However, the samples related to the real scenario are very limited, although there are more than 13,700 alerts, only 720 alerts (5%) are used to construct the actions. Compared with our former work, the new system gains four new better characters: (1) there are no more complex branches in the correlation graph because all the sessions are reconstructed to remove the ambiguous relations; (2) the new system is focused on predicting the future attack trends rather than the particular next attack. The action nodes in the graph are the abstraction of a group of Compared with our former work, the new system gains four new better characters: (1) there are no more complex branches in the correlation graph because all the sessions are reconstructed to remove the ambiguous relations; (2) the new system is focused on predicting the future attack trends rather than the particular next attack. The action nodes in the graph are the abstraction of a group of malicious behaviors with the same purpose; (3) the correlation graph is constructed based on the prediction results of HTM, which means the graph is representing one of the intrusion scenarios previously learned by HTM; and (4) the correlation graph is updated over time, which will be beneficial for learning new patterns and predicting unknown attacks.

Discussion
The discussions of the experimental results are given in Section 3.2. In this section, our works are concluded. The unsupervised online sequence learning and prediction system is proposed. Several measures have been introduced to enhance the performance of the system, such as intrusion action clustering, intrusion session reconstruction, and the optimizations of data processing. These measures can effectively reduce the data noises, redundancies, and false positives, making it easier to discover the potential intrusion behaviors and the correlations between them. The reduction of intrusion action types and unified intrusion sessions have greatly reduced the ambiguous correlations and further simplified the correlation graphs. We introduced the HTM cortical learning algorithms to improve the online sequence learning, based on the predictions of HTM, the correlation graph can be constructed in real-time and keeps updating over time. Offline training is not needed anymore, which makes it lightweight for online work.
Many aspects of the system should be studied and enhanced in the future: (1) the orders of actions in a session are important for the system; if all actions randomly change their relative positions frequently, the association patterns between actions are difficult to discover for the system, thus, a pattern stabilization method (maybe based on the statistics) which can correct the disordered actions before learning is appropriate for the system; (2) if multiple intrusion scenarios occurred in parallel and share some common intrusion actions, the framework will correlate them into one large scenario, which leads to the complex model, so the studies of parallel intrusion scenario construction is a direction; and (3) although the HTM algorithm can learn the unknown patterns of the sequence, it can only predict the next time step, rather than predicting a sequence of future actions. The parameters of the HTM algorithm should be carefully set, though it is easier to set than other machine learning algorithms.