A Graph Representation Learning Algorithm for Low-Order Proximity Feature Extraction to Enhance Unsupervised IDS Preprocessing

Featured Application: We use the proposed packet2vec learning algorithm for IDS preprocessing. The basic steps of the IDS are as follows. First, the originally collected traffic is split into packets, which are truncated to a fixed length. Next, the packet2vec learning algorithm is used to obtain the local proximity structure features of each packet. Then, the original features of the packet are combined with the local proximity features as the input of a deep auto-encoder for IDS. Finally, the accuracy is evaluated with the detection rate of the IDS. In addition, the proposed model can be deployed at an enterprise gateway to dynamically monitor network activities and connect with the firewall to protect the enterprise's network from attacks. It can be deployed in a cloud computing environment or a software-defined network to classify traffic and to monitor network behavior and alerts in real time. It can also be deployed in a network security situational awareness system for prediction and visualization through spatial feature extraction.

Abstract: Most existing studies on preprocessing for unsupervised intrusion detection systems (IDSs) ignore the relationship among packets. According to the homophily hypothesis, packets that share a local proximity structure in the similarity relational graph have similar embeddings after preprocessing. To improve the performance of IDS by building a relationship among packets, we propose a packet2vec learning algorithm that extracts accurate local proximity features based on graph representation by adding a penalty to node2vec. In this algorithm, we construct a relational graph G′ by using each packet as a node, calculate the cosine similarity between packets as edge weights, and then explore the low-order proximity of each packet via a penalty-based random walk in G′. We use this algorithm as a preprocessing method to enhance the accuracy of unsupervised IDS by retaining the local proximity features of packets as fully as possible. The original features of the packet are combined with the local proximity features as the input of a deep auto-encoder for IDS. Experiments based on ISCX2012 show that the proposal outperforms the state-of-the-art algorithms by 11.6% with respect to the accuracy of unsupervised IDS. This is the first time that graph representation learning has been introduced for packet-embedding preprocessing in the field of IDS.


Introduction
Intrusion detection systems (IDSs) [1][2][3] have been introduced to monitor the network status, assess the security status, and take appropriate precautionary measures before an attack causes serious consequences. IDSs are classified into signature-based detection [4] and anomaly-based detection [5]. Signature-based detection can detect known attack behavior. Anomaly-based detection can detect both known and unknown attack behavior by analyzing the features of network packets, which requires finding features that accurately characterize those packets. However, existing network packet preprocessing methods extract such features poorly [6].
One of the main reasons for this situation is that the features extracted by the existing unsupervised algorithms [7][8][9] are not accurate enough for preprocessing network packets. These algorithms treat network packets as independent and ignore the similarity relationship between them. According to the homophily hypothesis [10], network packets with strong similarity belong to the same class and are closer in the similarity relational graph. In addition, network packets with strong similarity have similar local proximity embeddings [11]. Skip-gram [12] can be used to embed the local proximity features of the similarity relationships between network packets. Therefore, analyzing the similarity relationship between network packets makes it possible to determine more flexibly and accurately whether a network packet is intrusive. Moreover, effective preprocessing is the basis for improving the performance of the entire IDS, because the network packet features obtained by preprocessing directly affect its final performance. Graph representation approaches have good potential for representing the similarity relationship between network packets effectively. Therefore, this paper proposes the packet2vec learning algorithm, an unsupervised preprocessing algorithm for network packets based on graph representation learning. Packet2vec preprocessing obtains local proximity features that describe the similarity relationship between network packets. These local proximity features are combined with the original features of the network packet to describe the packet. The additional effective information gained in preprocessing improves the performance of the IDS.
The motivation for introducing graph representation into network packet preprocessing is as follows. Graph representation learning has achieved good results in the approximate semantic classification of words [12]. By analogy, we treat a network packet as a word. Then the problem of distinguishing the class of a network packet according to the similarity relationship between network packets can be regarded as an approximate semantic classification problem of words. In that problem, similar words tend to appear among similar word neighbors [10]. Likewise, network packets belonging to the same class have similar local neighbors, where the classes are DDoS, HTTP DoS, normal, brute force SSH, and infiltrating [13][14][15]. That is, the presence of similar local neighbors characterizes the similarity between network packets. Therefore, we believe that features obtained by preprocessing network packets with a graph representation learning algorithm can improve IDS performance.
The main contributions of this paper are as follows.
(1) To improve the performance of existing unsupervised intrusion detection algorithms, this paper proposes the packet2vec learning algorithm to extract the local proximity features of network packets. The proposed algorithm is based on graph representation and considers the relationship between network packets; a deep auto-encoder is then used for intrusion detection. The resulting system is named packet2vec-AE. Compared with existing preprocessing algorithms for extracting network packet features, packet2vec combines the original features of the network packet with the local proximity features that characterize the similarity between network packets. Therefore, the accuracy of unsupervised intrusion detection is improved.
(2) In this algorithm, we construct a relational graph G′ by using each packet as a node, calculate the cosine similarity between packets as edge weights, and then explore the low-order proximity of each packet via a penalty-based random walk in G′. We use this algorithm as a preprocessing method to enhance the accuracy of unsupervised IDS by retaining the local proximity features of packets as fully as possible. This is the first time that a graph representation learning algorithm has been used to extract features in intrusion detection.
(3) A penalty is added to the existing graph representation algorithm node2vec [10]. This increases the probability of selecting nodes within the local proximity: the closer a node is to the source network packet, the greater its probability of being selected. Therefore, this method extracts the local proximity features of the network packet more accurately, and thus describes the similarity relationship between network packets more accurately.
(4) Local proximity features that accurately characterize the similarity relationship between network packets are obtained from the similarity relational graph. The penalty term limits the random walk range of node2vec [10] to the k-order proximity, where k is a user-defined positive integer. This approach accurately characterizes the neighbor structure of network packets. Experiments show that packet2vec-AE with penalty achieves better performance than packet2vec-AE without penalty. In the best case, the accuracy, detection rate, precision, and F1 of packet2vec-AE reach 94.7%, 90.9%, 94.3%, and 92.6%, respectively; the accuracy, precision, and F1 exceed those of the other state-of-the-art algorithms by 11.6%, 11.9%, and 8.7%, respectively. In the worst case, the accuracy, detection rate, precision, and F1 of packet2vec-AE reach 87.4%, 81.1%, 87.3%, and 84.1%, respectively; the accuracy, precision, and F1 still exceed those of the other state-of-the-art algorithms by 4.3%, 4.9%, and 0.2%, respectively.
(5) An empirical formula, ε ≈ mean(W) − depth × η, where ε is the pruning threshold, mean(W) is the mean of the edge weights in G′, and η is the penalty value, is designed to calculate an approximately optimal penalty value. The experimental results show that the penalty value calculated by this formula yields better intrusion detection performance.
Section 2 describes related work. Section 3 introduces the IDS based on packet preprocessing using packet2vec-AE. Section 4 introduces the experiment. Section 5 discusses the results. Section 6 concludes the paper.

Unsupervised Intrusion Detection Techniques
Intrusion detection mainly includes signature-based intrusion detection and anomaly-based intrusion detection. Signature-based intrusion detection, also known as rule-based intrusion detection, has a high detection rate for known attacks but cannot detect attack behaviors that are not in the rule base [4]. Anomaly-based intrusion detection, also known as behavior-based intrusion detection, can detect unknown attack behavior [5]. Its performance depends on feature extraction from accurately preprocessed network packets. At present, it is easy to collect a large amount of unlabeled data in intrusion detection but difficult to obtain a large amount of labeled data [16][17][18]. Therefore, unsupervised algorithms for intrusion detection have been introduced.
In recent years, there have been several studies on unsupervised intrusion detection techniques. In 2007, Liu et al. used a hierarchical PCA model to detect intrusion behavior on the KDD99 dataset [8]. In 2017, Gouveia et al. used RBM to detect intrusion behavior on the ISCX2012 dataset [7]. In 2018, Farahnakian et al. used a deep auto-encoder to detect intrusion behavior on the KDD99 dataset [9]. What these studies [7][8][9] have in common is that their preprocessing ignores the similarity between network packets. Network packets with a strong similarity relationship are closer in the similarity relational graph and have similar local proximity embeddings [10]. Ignoring this relationship results in low intrusion detection accuracy. Therefore, we introduce a graph representation learning algorithm to preprocess network packets. In the proposed algorithm, the original features of the network packet are combined with the local proximity features describing the similarity relationship between network packets as the input of the neural network. The proposed algorithm thus adds effective features by considering the similarities between network packets.

Graph Representation Learning
Graph representation learning refers to the automatic extraction of proximity features of nodes in a graph [1]. It has been applied effectively to the approximate semantic classification of words [12], handwritten character classification [19], and other tasks. The main existing graph representation learning algorithms include Node2vec [10], Deepwalk [20], and LINE [21]. Deepwalk [20] uses a simple unbiased random walk, which does not control the direction of the walk. Therefore, Deepwalk easily samples into the higher-order proximity range, which makes it impossible to accurately describe the local proximity features of the current node. LINE [21] cannot sample first-order proximity and second-order proximity simultaneously, so the algorithm has limitations. Node2vec [10] uses a random walk based on breadth-first search (BFS) or depth-first search (DFS) to explore the neighbors of a node. However, random walks based on BFS or DFS are blind searches [22]. In other words, BFS lacks constraints on the range of random walks. Therefore, the random walk may sample the source node's higher-order proximity, which makes it impossible to accurately extract the local proximity features of the source node that describe the similarity relationship between network packets. Therefore, we propose node2vec with penalty for unsupervised automatic preprocessing. The proposed algorithm limits the range of random walks so that sampling only occurs within the low-order proximity of the source node. As a result, the algorithm obtains local proximity features that accurately characterize the similarity of network packets.
In the literature that we surveyed, unsupervised automatic preprocessing based on a graph representation learning algorithm with penalty has not been used in the field of intrusion detection. In the proposed algorithm, the original features of the network packet are combined with the local proximity features describing the similarity relationship between network packets as the input of the neural network. The proposed algorithm adds effective features to obtain better intrusion detection performance.

IDS multi-classification: It refers to the problem of determining which of multiple types the current network packet belongs to, including normal, DDoS, HTTP DoS, brute force SSH, and infiltrating attacks [23].
Relational graph on packets similarity: Each packet is treated as a node v_i ∈ V, and the cosine similarity between network packets v_i and v_j is calculated as the weight w_ij ∈ W. The vectors used to calculate the cosine similarity are the original features of the network packets, which are described in detail in Section 3.3. An undirected complete graph G is constructed in this way. To reduce memory usage, the edges whose weights are lower than the threshold ε are pruned. G after pruning is called the relational graph on packets similarity, expressed as G′(V, E, W).
Pruning threshold: It refers to the value ε below which edge weights in the graph G are pruned.
Source network packet (alias: source node): It refers to the starting network packet of a random walk.
Packet embedding: Given a relational graph on packets similarity G′(V, E, W), it refers to a mapping f : V → R^d, where f is the mapping function from nodes to feature representations.
Automatic preprocessing: It refers to the automatic extraction of features from the original packet and eliminates manual intervention [6].
K-order proximity: It refers to a kind of feature that captures the k-hop relationship between each pair of vertices.
First-order proximity: It refers to the local pairwise proximity between two connected vertices, which captures the direct neighbor relationship between the vertices. For each vertex pair (v_i, v_j), if (v_i, v_j) ∈ E, the first-order proximity between v_i and v_j is w_ij; otherwise, it is 0 [4]. First-order proximity corresponds to k = 1 in the k-order proximity. In Figure 1a, nodes v_1, v_2, v_3, v_4, v_5 are in the first-order proximity of v_i.
Second-order proximity: It refers to capturing a two-hop relationship between each pair of vertices, which describes the proximity of the pair's neighbor structure [5]. Second-order proximity corresponds to k = 2 in the k-order proximity. In Figure 1a, nodes v_6, v_7, v_8, v_9, v_10, v_11, v_12 are in the second-order proximity of v_i.
Low-order proximity: It refers to a kind of feature that captures the k-hop relationship between each pair of vertices, where k is lower than 3. It contains first-order proximity and second-order proximity.
High-order proximity: It refers to capturing the k-hop relationship between each pair of vertices, where k is greater than or equal to 3, which captures a more global structure [5]. In Figure 1a, nodes v_13, v_14, v_15, v_16, v_17 are in the high-order proximity of v_i.
Proximity sampling strategy: It refers to sampling the network packets in the neighborhood of the source network packet v_i by random walk [10], expressed as S. In Figure 1b, the proximity sampling strategy of the existing algorithm [10] is used to obtain a proximity sample list of the source network packet. In Figure 1c, the proximity sampling strategy of the proposed algorithm is used to obtain a local proximity sample list of the source network packet.
Local proximity sample list: It refers to the result of sampling with the proximity sampling strategy S in the low-order proximity of the source network packet v_i, expressed as N_S(v_i). In G′, the closer two network packets are, the more similar their local proximity sample lists are [10]. Therefore, the local proximity sample list accurately describes the similarity relationships between network packets.
Local proximity features: It refers to optimizing the local proximity sample list N_S(v_i) with Skip-gram [12] to obtain continuous d-dimensional features. The local proximity sample list N_S(v_i) accurately characterizes the similarity between network packets. Therefore, local proximity features also accurately characterize relationships between network packets.

Figure 1. [...] The number on each gray arrow indicates the order in which the sampling strategy is used for sampling. (c) A local proximity sampling strategy is used to perform a range-constrained random walk to obtain a local proximity sample list of the source network packet.
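The proximity definitions above can be illustrated with a minimal sketch that groups the nodes of a graph by their hop distance from a source node. The adjacency-dictionary layout, the function names, and the toy graph are assumptions made for illustration and are not taken from the paper.

```python
from collections import deque


def hop_distances(adjacency, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def proximity_orders(adjacency, source):
    """Group nodes into first-order (1 hop), second-order (2 hops), high-order (>= 3 hops)."""
    groups = {"first": set(), "second": set(), "high": set()}
    for node, d in hop_distances(adjacency, source).items():
        if d == 1:
            groups["first"].add(node)
        elif d == 2:
            groups["second"].add(node)
        elif d >= 3:
            groups["high"].add(node)
    return groups


# Hypothetical toy graph: a path vi - a - b - c, plus a second direct neighbor d of vi.
toy = {"vi": ["a", "d"], "a": ["vi", "b"], "b": ["a", "c"], "c": ["b"], "d": ["vi"]}
orders = proximity_orders(toy, "vi")
```

In this toy graph, the low-order proximity of vi is the union of the "first" and "second" groups, which is the range that the proposed random walk is meant to stay within.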

Flowchart
Figure 2 is a flowchart of the algorithm for intrusion detection. Intrusion detection is divided into three parts. In the first part, the originally collected traffic is split into packets. The second part is the automatic preprocessing that obtains the features of each network packet; here, we combine the original features of the network packet with the local proximity features obtained by packet2vec preprocessing. In the third part, the features obtained by preprocessing are used as the input of the deep auto-encoder for intrusion detection.
Data preprocessing is the basis for improving the performance of the entire IDS, because the network packet features obtained by data preprocessing directly affect the final performance of the IDS [6]. In the preprocessing of existing unsupervised intrusion detection algorithms, network packets are generally considered to be independent, and the similarity relationship between them is ignored. Therefore, existing algorithms suffer from low intrusion detection accuracy. The higher the similarity of the network packets, the more similar the local proximity features sampled by packet2vec in the relational graph on packets similarity.

Figure 2. Flowchart of the intrusion detection system. In particular, packet2vec is added in the preprocessing step to extract the low-order proximity features of network packets and enhance the performance of the IDS.

Unsupervised Preprocessing Based on Packet2vec Learning with Graph Representation
As a preparation step for preprocessing network packets, we convert the original network traffic into network packet features. Network packets are the basic unit used to determine whether a network traffic flow is intrusive. Therefore, we use SplitCap [6] to group interacting source IPs and destination IPs in network traffic into the same network packet.
The preprocessing steps for network packets are as follows. First, a relational graph on packets similarity G′(V, E, W) is constructed. The local proximity features in G′ obtained by packet2vec are called v_i_packet2vec; this step is detailed in Section 3.4. In addition, we extract the original features v_i_original of the network packet. Finally, the feature I describing the network packet is obtained by combining v_i_original and v_i_packet2vec. For example, if the original features of the network packet are v_i_original = [0a 5c] and the local proximity features obtained by packet2vec are v_i_packet2vec = [de 87], then the features of the network packet obtained after preprocessing are I = [0a 5c de 87].

The original feature extraction algorithm intercepts the first r bytes of the network packet, and each of these r bytes, with a value within [0, 255], corresponds to one feature [6]. As shown in Figure 3, the original traffic is split into packets that are truncated to a fixed length of r bytes. This algorithm for extracting the original features of the network packet is the same as the feature extraction algorithm in [6,27]. The original features extracted by this algorithm characterize network packets. Experiments have shown that the algorithm in [6,27], used here for original feature extraction, achieves better results than most existing algorithms. Algorithm 1 describes the flow of unsupervised preprocessing based on packet2vec learning with graph representation.

Algorithm 1 Unsupervised preprocessing based on packet2vec learning with graph representation
Input: network traffic flow (f_i).
Output: Vec, consisting of q packets of network traffic, each with packet vector (vec_i1, vec_i2, ..., vec_ir).
1: Step 1: Split the network traffic flow
2: Use SplitCap to split the network traffic flow into q packets (p_1, p_2, ..., p_q).
3: Step 2: Construct the relational graph on packets similarity G′
4: Treat each network packet as a node v_i. Each of the r bytes obtained from the network packet v_i corresponds to a feature, giving the vector v_i = v_i_original = (vec_original_i1, vec_original_i2, ..., vec_original_ir).
5: Calculate the cosine similarity between any two packets (v_i, v_j) as the weight w_ij between the two packets. Construct an undirected complete graph G.
6: Prune the edges with similarities below the threshold ε. The graph G after pruning is called the relational graph on packets similarity, denoted as G′(V, E, W).
7: Step 3: Extract the local proximity features by using packet2vec learning
8: Use packet2vec preprocessing to obtain the local proximity feature of each network packet in G′, called v_i_packet2vec. This part is described in detail in Algorithm 2.
9: Step 4: Construct the vector used as the input of the deep auto-encoder
10: while the information of the q packets is not fully extracted do
11: Combine v_i_original and v_i_packet2vec to get I = [v_i_original, v_i_packet2vec], where v_i_original is the original feature of the i-th network packet, v_i_packet2vec is the local proximity feature of the i-th network packet preprocessed by packet2vec, and I is the feature of the i-th network packet.
12: end while
13: Smoothing.
14: Vec consists of q packets, each with packet vector I.
15: return vector Vec.
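Steps 2 and 4 of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the paper: the function names are hypothetical, short packets are zero-padded to r bytes (the text does not say how they are handled), and the combination of the two feature vectors is assumed to be a concatenation.

```python
import math


def original_features(packet_bytes, r):
    """First r bytes as integer features in [0, 255]; zero-padding is an assumption."""
    head = packet_bytes[:r]
    return list(head) + [0] * (r - len(head))


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_relational_graph(vectors, epsilon):
    """Return {i: {j: w_ij}}, keeping only edges whose weight is at least epsilon."""
    graph = {i: {} for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            w = cosine_similarity(vectors[i], vectors[j])
            if w >= epsilon:  # prune edges with weight below the threshold
                graph[i][j] = w
                graph[j][i] = w
    return graph


def combine_features(v_original, v_packet2vec):
    """Assumed combination: concatenate original and local proximity features."""
    return list(v_original) + list(v_packet2vec)


# Hypothetical toy packets, truncated or padded to r = 3 bytes.
raw_packets = [bytes([1, 0, 0, 9]), bytes([2, 0]), bytes([0, 3, 0])]
vectors = [original_features(p, r=3) for p in raw_packets]
graph = build_relational_graph(vectors, epsilon=0.5)
combined = combine_features([0x0A, 0x5C], [0xDE, 0x87])
```

The pairwise loop is quadratic in the number of packets, which is why pruning by ε matters for memory; a practical implementation would likely use a vectorized or approximate similarity search instead.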

Overview
The proximity sampling strategy S refers to sampling the network packets in the proximity of the source network packet v_i by random walk. Each network packet is numbered, so the result of sampling with S is a list N_S of network packet numbers obtained by random walk. The local proximity sample list N_S(v_i) of a source network packet v_i should contain high-similarity packets sampled with S within the low-order proximity of v_i. Therefore, we want to perform random walks only within the low-order proximity of the source network packet v_i, as shown in Figure 1c. However, the existing graph representation learning algorithm cannot control the random walk range, so the walk easily reaches high-order proximity farther away from the source network packet v_i, as shown in Figure 1b. As a result, existing algorithms produce an inaccurate local proximity sample list N_S(v_i). Because N_S(v_i) is the basis of the local proximity features of v_i, the local proximity features obtained by existing preprocessing algorithms cannot accurately describe the similarity relationship between network packets.
To keep the random walk of node2vec within the k-order proximity, we introduce Astar, which adds a penalty constraint to the random walk of node2vec. The penalty term constrains the proximity sampling range of the source network packet v_i so that the random walk stays within the k-order proximity, where k is a user-defined positive integer. In this paper, k is 2; that is, the random walk range of the source network packet v_i is within the low-order proximity. Accordingly, the proposed algorithm that uses Astar to add the penalty constraint to the random walk range of node2vec is called the packet2vec learning algorithm. The target of the packet2vec learning algorithm is a mapping f : V → R^d that maps each network packet to a feature vector, where d is the dimension of the features and f is the mapping function. As a preliminary step of the packet2vec learning algorithm, the relational graph on packets similarity G′ is constructed, in which each network packet v_i ∈ V is treated as a node. The two basic steps of the packet2vec learning algorithm are as follows.
First, the proximity sampling strategy S is used to simulate a random walk of length l in the neighborhood of v_i to obtain a local proximity sample list N_S(v_i). The specific process is as follows.
(1) Starting from the source network packet v_i ∈ V, the proximity sampling strategy S is used to perform a penalty-based random walk in the proximity of the source network packet.
(2) At each step of the penalty-based random walk, the current weight is updated according to the penalty-based weight update method. The updated weight is the new transition probability n_vx.
(3) The packet for the next step of the penalty-based random walk is selected according to the updated transition probability n_vx, using alias sampling with time complexity O(1) [10].
(4) Steps (2) and (3) are repeated until a local proximity sample list N_S(v_i) of length l is obtained.
The local proximity sample list of v_i obtained in this way is N_S(v_i), also known as a walk. Here, N_S(v_i) ⊂ V, and the sampling range of the penalty-based random walk is not limited to direct neighbors: the proximity sampling strategy S can sample anywhere within the low-order proximity of the source network packet v_i.
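The alias sampling mentioned in step (3) can be sketched as follows. This is a generic version of the alias method (in Vose's formulation), not code from the paper; the function names are hypothetical, and the input is assumed to be the non-negative weights of the candidate next packets, which are normalized inside the function.

```python
import random


def build_alias_table(weights):
    """Build alias tables for non-negative weights (normalized internally). O(n) setup."""
    n = len(weights)
    total = float(sum(weights))
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        g = large.pop()
        prob[s] = scaled[s]      # chance of keeping column s itself
        alias[s] = g             # otherwise fall through to column g
        scaled[g] = scaled[g] + scaled[s] - 1.0
        (small if scaled[g] < 1.0 else large).append(g)
    for i in large + small:      # leftovers are (numerically) exactly full columns
        prob[i] = 1.0
    return prob, alias


def alias_draw(prob, alias, rng):
    """Draw one index in O(1): pick a column uniformly, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Hypothetical updated weights n_vx of three candidate next packets.
prob, alias = build_alias_table([5.0, 3.0, 2.0])
rng = random.Random(0)
next_index = alias_draw(prob, alias, rng)
```

The table costs O(n) to build per node, after which each draw is O(1). If the penalty-based weights change at every step of the walk, the table has to be rebuilt accordingly, so the O(1) figure applies to the draw itself.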
Next, we use Skip-gram [28] to optimize the proximity sample list N_S(v_i) to obtain continuous d-dimensional features. The specific process is as follows. We seek to optimize the following objective function, which maximizes the log-probability of observing a network proximity N_S(v_i) for a network packet v_i conditioned on its feature representation, given by f [10]:

max_f Σ_{v_i ∈ V} log Pr(N_S(v_i) | f(v_i))    (1)

In particular, Skip-gram [12] aims to learn continuous feature representations for the source network packet v_i by optimizing a proximity-preserving likelihood objective. The network packet feature representations are learned by optimizing the objective function using SGD with negative sampling [28]. Finally, following the node2vec algorithm in [10], we obtain from the local proximity sample list N_S(v_i) a continuous d-dimensional local proximity feature that accurately characterizes the similarity between network packets.
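To make the Skip-gram step more concrete, the following minimal sketch shows how a sampled walk can be turned into the (center, context) training pairs that a Skip-gram model consumes. The window size and the function name are assumptions made for illustration; the paper does not specify them, and the SGD training with negative sampling is not shown.

```python
def skipgram_pairs(walk, window):
    """List (center, context) pairs for every position in the walk within the window."""
    pairs = []
    for i, center in enumerate(walk):
        lo = max(0, i - window)
        hi = min(len(walk), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, walk[j]))
    return pairs


# Hypothetical walk of packet numbers, i.e., one local proximity sample list.
walk = [3, 7, 1, 4]
pairs = skipgram_pairs(walk, window=1)
```

Packets that co-occur in many walks produce many shared pairs, so their learned d-dimensional features end up close to each other, which is how the sample lists carry over into the local proximity features.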
Figure 4 is a flowchart of the packet2vec learning algorithm. Algorithm 2 describes in detail how the local proximity feature of the source network packet is obtained by packet2vec preprocessing. Several key operations involved in Algorithm 2 are described below.

Algorithm 2 Packet2vec learning algorithm
Input: Relational graph on packets similarity G′(V, E, W), walk length of proximity sampling l, probability of returning to the previous node p, probability of moving away from the source node q, penalty value η.
Output: Local proximity features of the i-th network packet v_i_packet2vec, which are d-dimensional.

Astar: Penalty-Based Weight Update Method
The traditional method uses a BFS random walk to obtain network packets with high similarity to the source network packet. BFS cannot limit the range of the random walk, so the walk easily reaches high-order proximity, as shown in Figure 1b. Therefore, a penalty-based weight update method, called Astar, is introduced. This method is equivalent to adding a penalty constraint to BFS, so that the random walk stays within the k-order proximity of the source network packet. The specific process is as follows. First, we calculate the penalty value η according to the empirical formula ε ≈ mean(W) − depth × η, where ε is the pruning threshold and mean(W) is the mean of the edge weights, i.e., the sum of the weights of all edges in the relational graph on packets similarity G′ divided by the total number of edges. Because we limit the random walk to the k-order proximity of the source network packet, depth is equal to k; depth is set to 2 in our experiments. Next, at each step of the random walk, we calculate the penalty term g_vx of the edge according to (5). The sum of the penalty term g_vx and the biased weight h_vx of the edge, calculated according to (4), is called the penalty-based weight n_vx. The weight of the edge in G′ is then updated to n_vx, and the graph after the weight update is recorded as G′_cur_weight. Finally, edges whose penalty-based weights are below the threshold ε are pruned. In this way, the method keeps the random walk range within the k-order proximity of the source network packet.
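The empirical formula can be rearranged to give the penalty value directly, η ≈ (mean(W) − ε) / depth. The sketch below computes it for a small graph. The dictionary-of-dictionaries graph layout and the function names are assumptions made for illustration, and the penalty term g_vx of Equation (5) is deliberately not modeled because its form is not reproduced in this section.

```python
def mean_edge_weight(graph):
    """Mean weight over undirected edges of {i: {j: w_ij}}, counting each edge once."""
    weights = [w for i, nbrs in graph.items() for j, w in nbrs.items() if i < j]
    return sum(weights) / len(weights)


def penalty_value(mean_weight, epsilon, depth=2):
    """Solve epsilon ~= mean_weight - depth * eta for eta."""
    return (mean_weight - epsilon) / depth


# Hypothetical pruned graph with two edges of weight 0.9 and 0.7.
graph = {0: {1: 0.9, 2: 0.7}, 1: {0: 0.9}, 2: {0: 0.7}}
eta = penalty_value(mean_edge_weight(graph), epsilon=0.6, depth=2)
```

Here the mean edge weight is 0.8, so with ε = 0.6 and depth = 2 the penalty value is about 0.1. One way to read the formula is that an edge of average weight, penalized by η once per hop, drops to the pruning threshold after depth hops.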
Several key operations involved in the penalty-based weight update method are described below.
Biased weight h_vx: The biased weight h_vx is calculated as shown in (6), where w_vx is the weight of the edge and the search bias α_pq(v_pre, x) roughly controls the direction of the random walk. The biased weight is calculated in the same way as the transition probability from the current node to the next node in [10].
Search bias α_pq(v_pre, x): The search bias roughly controls the direction of the random walk, for example towards an approximate DFS or an approximate BFS. This allows us to account for the network structure and guide the search procedure to explore different types of network proximities [10]. This paper mainly samples the local proximity of the source network packet, so an approximate-BFS random walk is used. The search bias α_pq(v_pre, x) is calculated as shown in (7), where x is the next network packet, v_pre is the previous network packet, and d(v_pre, x) is the shortest path length between the nodes v_pre and x. Therefore, d(v_pre, x) takes a value in {0, 1, 2}. Figure 5 illustrates how the search bias roughly controls the direction of the random walk. The search bias uses two parameters, p and q, to guide the direction of the random walk. Intuitively, p and q control how fast the walk explores and leaves the proximity of the starting network packet v_i [10].
Figure 5. Search bias α_pq(v_pre, x). The walk has just transitioned from v_pre to v and is now evaluating its next step out of node v. The nodes v_pre, v_1, v_2, and v_3 may be reached from node v in the next step, and the edge labels indicate the search biases α. Here v is the current network packet. v_pre is the previous network packet, so its search bias is α = 1/p. v_1 is connected to both v and v_pre, so its search bias is α = 1. v_2 and v_3 are connected to v but not to v_pre, so their search bias is α = 1/q.
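The three cases described for the search bias, and the biased weight built from it, can be sketched as follows. This is a minimal illustration that assumes the node2vec form cited from [10] (bias 1/p, 1, or 1/q depending on the distance between v_pre and x, multiplied by the edge weight); the function names `search_bias` and `biased_weight` are hypothetical and do not come from the paper.

```python
def search_bias(dist_prev_to_next, p, q):
    """Return the search bias for a candidate next packet x.

    dist_prev_to_next is the shortest-path distance d(v_pre, x), which can
    only be 0, 1, or 2 when x is a neighbor of the current packet v.
    """
    if dist_prev_to_next == 0:   # x is v_pre itself: the walk steps back
        return 1.0 / p
    if dist_prev_to_next == 1:   # x is adjacent to both v and v_pre
        return 1.0
    if dist_prev_to_next == 2:   # x is adjacent to v but not to v_pre
        return 1.0 / q
    raise ValueError("d(v_pre, x) must be 0, 1, or 2")


def biased_weight(w_vx, dist_prev_to_next, p, q):
    """Biased weight h_vx: search bias times the edge weight w_vx (assumed form)."""
    return search_bias(dist_prev_to_next, p, q) * w_vx
```

With p = 2 and q = 0.5, for example, an edge of weight 0.9 leading away from v_pre receives the biased weight 2 × 0.9 = 1.8, while the edge back to v_pre is scaled by 0.5.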
Parameter p: The parameter p controls the likelihood of immediately revisiting the previous network packet during the random walk [10]. This article mainly uses an approximate-BFS random walk, so the value of p is usually (p = max(p, 1)).
Parameter q: The parameter q controls the tendency of the random walk to visit network packets farther away from the network packet v_pre. This article mainly uses an approximate-BFS random walk, so the value of q is usually (q = min(q, 1)).
If the p value is too large, the random walk may often return to the previous network packet and easily falls into a local loop. When the p value is too small or the q value is too large, the random walk easily samples the high-order proximity of the source network packet. From the above analysis, it can be concluded that a random walk using only the biased weight h_vx cannot accurately characterize the local proximity features of the source network packet. Therefore, we use the penalty g_vx to constrain the range of the random walk and obtain local proximity features that accurately characterize the similarity of network packets.
Penalty g_vx: g_vx is the penalty for the next network packet and is calculated as shown in (5), where depth is the shortest path length from the source network packet to the next network packet and η is the penalty value, with η ≤ 0. The role of g_vx is to increase the penalty when the next network packet is far away from the source network packet during the random walk. The farther the next network packet is from the source network packet, the larger the penalty g_vx. The penalty g_vx therefore controls the range of the random walk.
Penalty-based weight n_vx: n_vx is the penalty-based weight, also known as the transition probability from the current network packet to the next network packet. It is calculated as shown in (4). Consider a random walk that has just traversed the edge (v_pre, v) and now resides at network packet v [10]. When the random walk has to select a network packet for the next step, the penalty-based weight n_vx on the edge (v, x) needs to be evaluated. The penalty-based weight n_vx is the sum of the penalty g_vx and the biased weight h_vx, where v is the current network packet. If the penalty term g_vx is 0, the packet2vec learning algorithm reduces to the node2vec [10] learning algorithm.
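The weight update and pruning described above can be sketched in a few lines. Equation (5) is not reproduced in this section, so the sketch assumes the penalty has the form g_vx = depth × η, which matches the "(depth * penalty value η)" term mentioned in the Discussion; the helper names and the toy values are hypothetical.

```python
def penalty_based_weight(h_vx, depth, eta):
    """n_vx = h_vx + g_vx, with g_vx assumed to be depth * eta (eta <= 0)."""
    g_vx = depth * eta
    return h_vx + g_vx


def prune_edges(weights, epsilon):
    """Drop every edge whose penalty-based weight is below the threshold epsilon."""
    return {edge: n for edge, n in weights.items() if n >= epsilon}


# Toy candidates out of the current packet "v": (biased weight, depth from source).
candidates = {("v", "a"): (0.93, 1), ("v", "b"): (0.93, 2), ("v", "c"): (0.93, 3)}
eta, epsilon = -0.03, 0.86
weights = {e: penalty_based_weight(h, d, eta) for e, (h, d) in candidates.items()}
kept = prune_edges(weights, epsilon)  # the depth-3 edge falls below epsilon
```

In this toy case the edges at depth 1 and 2 survive (weights of about 0.90 and 0.87), while the edge at depth 3 (about 0.84) is pruned, so the walk stays inside the second-order proximity. Setting `eta = 0` leaves every biased weight unchanged, which corresponds to the node2vec case.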
Node2vec with penalty constraints can obtain the local proximity features of each network packet more accurately. In the local proximity representation of each network packet, a random walk is used to capture the relationship between network packets. The relational graph on packet similarity G′ is transformed into a set of network packet lists by random walk. The frequency with which a pair of network packets occurs in this set measures the structural distance between the two packets [10]. In detail, the closer two network packets are, the higher their similarity. Algorithm 3 details the weight update strategy of the similarity relational graph. We treat sampling the proximity of the source network packet from the similarity relational graph G′ as a local search problem. We propose a flexible penalty-based proximity sampling strategy for the random walk, which controls the range of the random walk. The proposed algorithm uses a random walk similar to BFS.
Random walk: The source network packet is v_i, and the length of the random walk we need to simulate is l. Our goal is to generate a local proximity sample set N_S(v_i) of the source network packet v_i. Assume that the random walk starts from the source network packet c_0 = v_i, and that the m-th network packet in the random walk is c_m. In detail, the network packet c_m is generated by the following distribution [10]:
$$P(c_m = x \mid c_{m-1} = v) = \begin{cases} n_{vx} / Z & \text{if } (v, x) \in E \\ 0 & \text{otherwise} \end{cases}$$
where n_vx is the penalty-based weight between the nodes v and x after the weight update, Z is the normalizing constant [10], and E is the set of edges. Algorithm 4 describes the random walk with penalty for the packet2vec learning algorithm. In Algorithm 4, the penalty-based weights are updated by the step cur_weight, G′_cur_weight = PBWeight(G′, p, q, η, cur, l, dijkstra_path_length, v_cur, ε).
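The sampling step can be illustrated with a short sketch that draws the next packet with probability n_vx / Z. It is a simplification under stated assumptions: the penalty-based weights are taken as already computed, pruned, and positive, and they are held fixed during the walk, whereas Algorithm 4 recomputes them at every step. The names `sample_next` and `penalty_walk` and the toy graph layout are hypothetical.

```python
import random


def sample_next(neighbor_weights, rng=random):
    """Draw the next packet x with probability n_vx / Z over the neighbors of v."""
    nodes = list(neighbor_weights)
    weights = [neighbor_weights[x] for x in nodes]
    z = sum(weights)  # normalizing constant Z
    if not nodes or z <= 0:
        return None  # no admissible edge is left after pruning
    return rng.choices(nodes, weights=weights, k=1)[0]


def penalty_walk(graph, source, length, seed=0):
    """Simulate a walk of at most `length` steps starting at `source`.

    graph[v] maps each neighbor x of v to its pruned penalty-based weight n_vx.
    """
    rng = random.Random(seed)
    walk = [source]
    for _ in range(length):
        nxt = sample_next(graph.get(walk[-1], {}), rng)
        if nxt is None:
            break
        walk.append(nxt)
    return walk
```

A call such as `penalty_walk({"s": {"a": 0.9, "b": 0.87}, "a": {"s": 0.9}, "b": {"s": 0.87}}, "s", 5)` returns a list of six packets that starts at `"s"` and only follows edges that survived pruning.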

Auto-Encoder for Intrusion Detection
We use the preprocessed network packet features as the input to the deep auto-encoder for intrusion detection. The deep auto-encoder consists of two parts, an encoder and a decoder. The encoder compresses the raw data into a low-dimensional representation. The decoder reconstructs the data from the low-dimensional representation produced by the encoder. Algorithm 5 describes the flow of the auto-encoder based on packet2vec learning for intrusion detection.
Encoder: The encoder maps the n-dimensional input vector to an m-dimensional output vector. It is worth noting that n is larger than m. The encoder uses tanh as the activation function,
where W represents the encoder weight matrix with size m × n and b is a bias vector of dimensionality m.

Decoder: The decoder function maps the hidden representation X to a reconstruction Y = [y_1, y_2, ..., y_n]. The decoder uses tanh as the activation function,
where W represents the decoder weight matrix with size n × m and b is a bias vector of dimensionality n.

Objective function: The training goal of the deep auto-encoder is to minimize the error between the input X and the output Y.
In addition, we find the optimal parameter θ through the objective function.
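The encoder, decoder, and objective described above can be sketched as a single forward pass. The equations themselves are not reproduced in this section, so this is an illustration under assumptions: one tanh layer on each side, a mean squared reconstruction error as the objective, and randomly initialized weights; the helper names and the sizes n = 8 and m = 3 are hypothetical.

```python
import math
import random


def affine_tanh(weights, bias, vec):
    """Compute tanh(W vec + b) for a weight matrix W stored as a list of rows."""
    return [
        math.tanh(sum(w * v for w, v in zip(row, vec)) + b)
        for row, b in zip(weights, bias)
    ]


def reconstruction_error(x, y):
    """Mean squared error between input x and reconstruction y (assumed loss)."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


rng = random.Random(0)
n, m = 8, 3  # n > m, so the encoder compresses the input
x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
w_enc = [[rng.gauss(0.0, 0.1) for _ in range(n)] for _ in range(m)]  # m x n
w_dec = [[rng.gauss(0.0, 0.1) for _ in range(m)] for _ in range(n)]  # n x m
hidden = affine_tanh(w_enc, [0.0] * m, x)      # encoder output, length m
recon = affine_tanh(w_dec, [0.0] * n, hidden)  # decoder output, length n
loss = reconstruction_error(x, recon)          # quantity minimized in training
```

Training would then adjust the weights and biases to reduce `loss`; the paper uses the Adadelta optimizer for that step, which is not reproduced here.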
Algorithm 5 Auto-encoder based on packet2vec learning for intrusion detection
Input: vector Vec, which contains q network packets in the network traffic; each network packet is a vector (vec_i1, vec_i2, ...
while training dataset is not empty do
Update weights and biases using the Adadelta gradient descent optimization algorithm.

Evaluation
In this section, the performance of the packet2vec-AE network intrusion detection algorithm is evaluated on the ISCX 2012 intrusion detection data set. The purpose of the experiment is as follows.

Dataset
Most existing network intrusion detection data sets, such as NSL-KDD [17], KDD CUP 1999 [29], and Kyoto2009 [30], rely on manual experience to extract network packet features [23]. The existing data sets of raw network packets are ISCX2012 [27] and DARPA1998 [31][32][33]. The attacks in ISCX2012 are relatively new [6]. Therefore, we used the ISCX2012 data set for the experiments. Table 1 describes the ISCX2012 data set. The data set contains 7 days of traffic data, including normal traffic and four types of attack traffic: brute force SSH, DDoS, Http DoS, and infiltrating. Table 2 shows the statistics of ISCX 2012. It can be seen from Table 2 that ISCX 2012 contains only a small amount of attack data, so the data set is unbalanced. To solve this problem, we resampled [6] the data set. That is, we undersampled [6] the normal data and oversampled [6] the data of the four types of attacks. Table 3 shows the data set after resampling. It can be seen from Table 3 that the data set after resampling is balanced.
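The balancing step can be illustrated with plain random resampling. The exact procedure follows [6] and is not detailed here, so the sketch below is an assumption: the majority class is undersampled without replacement and a minority class is oversampled by drawing duplicates with replacement. The function name `resample` and the class sizes are hypothetical.

```python
import random


def resample(samples, target, rng):
    """Return exactly `target` items from one class.

    A class larger than `target` is undersampled without replacement;
    a smaller class keeps all items and is padded with random duplicates.
    """
    if len(samples) >= target:
        return rng.sample(samples, target)
    extra = rng.choices(samples, k=target - len(samples))
    return list(samples) + extra


rng = random.Random(0)
normal_packets = list(range(1000))  # stand-in for the large normal class
attack_packets = list(range(20))    # stand-in for one small attack class
balanced_normal = resample(normal_packets, 100, rng)
balanced_attack = resample(attack_packets, 100, rng)
```

After this step both classes contain 100 items, which mirrors the balanced class sizes reported for the resampled data set.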

Evaluation Metrics
This paper uses accuracy (ACC), detection rate (DR), precision (P), and F1 as evaluation indicators [34]. The accuracy describes how often the intrusion detection algorithm correctly decides whether there is an intrusion. The detection rate measures the detection performance of the intrusion detection system. The precision is the ratio of the number of samples correctly predicted as positive to the total number of samples predicted as positive by the intrusion detection system. The F1 is the harmonic mean of the precision and the detection rate. The formulas are as follows. The meanings of True Positive (TP), False Positive (FP), False Negative (FN), and True Negative (TN) [35,36] are shown in Table 4.
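The four indicators can be computed from the confusion-matrix counts as sketched below. The formulas are the standard definitions and are assumed to match the ones used in the paper, whose equations are not reproduced in this section; the function names are hypothetical.

```python
def f1_score(precision, detection_rate):
    """Harmonic mean of precision and detection rate."""
    return 2 * precision * detection_rate / (precision + detection_rate)


def ids_metrics(tp, fp, fn, tn):
    """Return accuracy, detection rate, precision, and F1 from the four counts."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    dr = tp / (tp + fn)        # detection rate (recall)
    precision = tp / (tp + fp)
    return {"ACC": acc, "DR": dr, "P": precision, "F1": f1_score(precision, dr)}
```

As a consistency check, the best-case precision of 94.3% and detection rate of 90.9% reported later for packet2vec-AE give `f1_score(0.943, 0.909)` ≈ 0.926, in line with the reported F1 of 92.6%.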

Two Ways to Preprocess Network Packets
There are two ways to preprocess network packets. In method (a), the original features v_i_original of the i-th network packet are combined with the local proximity features v_i_node2vec of the i-th network packet obtained by packet2vec without penalty. In detail, the local proximity features obtained by packet2vec without penalty, i.e., g(n) = 0 in (2), are equivalent to the local proximity features obtained by node2vec. The features obtained by preprocessing the i-th network packet are I = v_i_original + v_i_node2vec. In method (b), the original features v_i_original of the i-th network packet are combined with the local proximity features v_i_packet2vec of the i-th network packet obtained by packet2vec with penalty. The features obtained by preprocessing the i-th network packet are I = v_i_original + v_i_packet2vec.
From Table 5, it can be concluded that the optimal accuracy and detection rate of method (b) are 7.3% and 9.8% higher than those of method (a), respectively. As can be seen from Table 5, when the length of the random walk is 1000, the accuracy and detection rate of method (a) are greatly reduced, while the accuracy of method (b) is still as high as 90.1%. One possible explanation is that, without penalty constraints, method (a) cannot control the random walk range during local proximity sampling. Therefore, method (a) lets the random walk reach high-order proximity that is farther away from the source network packet. Eventually, the local proximity sampling of the source network packet is inaccurate, and the features obtained by the preprocessing cannot accurately describe the network packet. Therefore, in the following we use method (b) for data preprocessing. In addition, the comparison and selection of the length of the random walk is detailed in Part E of Section 4.

The Effectiveness of Penalty in Packet2vec Preprocessing by Autoencoder
This section discusses the impact of the penalty value η on the performance of IDS in Packet2vec-AE. Different penalty values η may affect the random walk range of the source network packet, and ultimately affect the accuracy of the local proximity features of the source network packet. Therefore, it is necessary to evaluate experimentally the impact of different penalty values η in Packet2vec-AE on IDS performance. The IDS performance of Packet2vec-AE was evaluated for penalty values of −0.01, −0.03, −0.05, and −0.1. Table 6 gives the statistics of the similarity relationship between network packets in G′; the average of the weights in G′ is 0.93. Table 7 shows the IDS performance with different penalty values in Packet2vec-AE. It can be concluded from Table 7 that the packet2vec-AE based IDS achieves optimal performance when the penalty value is −0.03. When the penalty values are −0.01 and −0.1, the performance of the IDS is relatively poor. Figure 6 shows the Packet2vec-AE visualization results for different penalty values. It can be concluded from Figure 6 that, when the penalty values are −0.01 and −0.1, the IDS can accurately distinguish DDoS attacks but has difficulty distinguishing the other four types of network packets. When the penalty value is −0.05, the IDS distinguishes normal packets, Brute Force SSH attacks, and DDoS attacks more accurately. When the penalty value is −0.03, the IDS distinguishes all five classes of network packets more accurately. One possible explanation is that the penalty value η determines the range of local proximity samples for network packets. The experiment shows that the penalty value calculated by the empirical formula, i.e., (the pruning threshold ε ≈ the mean of weights in G′ − depth * penalty value η), achieves the optimal effect. In this experiment, the value of the pruning threshold ε is 0.86. The random walk range of the source network packet is within the low-order proximity, that is, the maximum value of depth is 2. The penalty value is approximated from the empirical formula, and the experimental results in Table 7 and Figure 6 confirm that the Packet2vec-AE algorithm achieves optimal performance when the penalty value is −0.03.
In addition, when the penalty value is −0.1, the penalty on the weight is too large, which constrains the local proximity sampling range too much. An excessive penalty value therefore keeps the random walk within the first-order proximity of the source network packet when sampling its local proximity features. The sampling range is too small, so the obtained local proximity features cannot accurately describe the similarity relationship between network packets. When the penalty value is −0.01, the penalty on the weight is too small, which constrains the local proximity sampling range too little. When sampling the local proximity features of the network packet, the random walk is then likely to reach high-order proximity farther from the source network packet. The sampling range is too large, so the obtained local proximity features again cannot accurately describe the similarity relationship between network packets.
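The empirical formula can be checked against the reported numbers. The sketch below assumes that the penalty enters the formula through its magnitude, i.e., ε ≈ mean − depth × |η|; this reading is an interpretation, chosen because it is the one that brings the mean weight of 0.93, depth 2, and η = −0.03 close to the stated threshold of 0.86. The function names are hypothetical.

```python
def pruning_threshold(mean_weight, depth, eta):
    """epsilon ~= mean weight - depth * |eta| (magnitude reading, an assumption)."""
    return mean_weight - depth * abs(eta)


def penalty_from_threshold(mean_weight, depth, epsilon):
    """Solve the same formula for eta, returned as a non-positive value."""
    return -(mean_weight - epsilon) / depth


epsilon_est = pruning_threshold(0.93, 2, -0.03)   # about 0.87
eta_est = penalty_from_threshold(0.93, 2, 0.86)   # about -0.035
tested = [-0.01, -0.03, -0.05, -0.1]
closest = min(tested, key=lambda c: abs(c - eta_est))
```

Under this reading, the threshold estimated from η = −0.03 is about 0.87, and the penalty estimated from ε = 0.86 is about −0.035, which is closest to −0.03 among the four tested values.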

Influence of Packet2vec-AE Hyper-Parameters
Different penalty values η in the similarity relationship of network packets have different effects on the performance of IDS. Other parameters, such as the optimizer of the auto-encoder, the pruning threshold, and the length of the random walk, also affect the performance of IDS and are therefore adjustable as well. Table 8 shows the performance of the auto-encoder (AE) in packet2vec-AE with different optimizers. From this table, we find that using Adadelta as the optimizer for the AE yields better IDS performance. Table 9 shows the effect of different random walk lengths of packet2vec on the performance of IDS. From Table 9, we conclude that the packet2vec-AE algorithm achieves optimal IDS performance when the length of the random walk is 60 or 80. When the random walk length is 10 or 100, the performance of the packet2vec-AE algorithm is relatively poor. One possible explanation is that, when the random walk length is 10, the random walk samples only ten proximity packets of the source network packet. The number of sampled network packets is small, so the features obtained by the random walk cannot accurately describe the local proximity of the source network packet, and the performance of the IDS is not good. When the random walk length is 100, the proximity features obtained by random walk sampling are too redundant. When the random walk length is 60 or 80, the features of the local proximity of the source network packet can be accurately characterized. Table 10 shows the architectural parameters of the auto-encoder in the packet2vec-AE algorithm. Table 11 lists the parameters for packet2vec in packet2vec-AE. Table 12 lists the auto-encoder hyper-parameters and its optimizer in packet2vec-AE. The parameter values given in Tables 11 and 12 are the optimal results obtained by experiments. The hardware used in the experiments is presented below. The configuration of the experimental environment is shown in Table 13. In addition, we used a 12 GB NVIDIA Tesla K40m GPU as an accelerator.

Comparison with the Latest Techniques
Researchers have proposed several unsupervised intrusion detection algorithms, such as RBM [7], PCA [8], and deep auto-encoder [9]. Those algorithms usually ignore the similarity relationship between network packets when extracting features during preprocessing. This lack of information results in relatively low accuracy. The experiment compares the performance of the proposed algorithm with the existing algorithms. Table 14 compares the performance of the various algorithms. Deep auto-encoder performs better than PCA and RBM in Table 14, so we use the deep auto-encoder to implement the proposal for further improvement. It can be seen from Table 14 that the proposed packet2vec-AE algorithm achieves the best accuracy, exceeding that of the other state-of-the-art algorithms by 11.6%. It also achieves the best precision, exceeding that of the other state-of-the-art algorithms by 11.9%. Its detection rate is lower only than that of the best algorithm and ranks second among all five algorithms. We used the harmonic mean F1 to comprehensively assess the detection rate and precision. The proposed packet2vec-AE algorithm achieves the best F1, exceeding that of the other state-of-the-art algorithms by 8.7%. The proposed node2vec-AE algorithm also performs well, exceeding the other state-of-the-art algorithms in accuracy, precision, and F1 by 4.3%, 4.9%, and 0.2%, respectively. We consider node2vec-AE as packet2vec-AE without penalty, so node2vec-AE is considered to be the worst case of the packet2vec-AE algorithm. Figure 7 visualizes the unsupervised intrusion detection algorithms. It can be seen from Figure 7 that the performance of the PCA algorithm is the worst: it cannot accurately distinguish between the five different types of network packets. Deep auto-encoder can accurately distinguish DDoS attacks, but has difficulty distinguishing the other four types of network packets. The proposed node2vec-AE algorithm can accurately distinguish several types of network packets, including normal, Brute Force SSH attacks, and DDoS attacks. The proposed packet2vec-AE algorithm can accurately distinguish all five types of network packets. One possible explanation is that existing unsupervised intrusion detection algorithms only use the original features of the network packet as input. The features obtained by packet2vec-AE and node2vec-AE preprocessing include the original features of the network packet as well as the local proximity features that characterize the similarity between network packets. Because the preprocessed data contain more effective information, the accuracy of the proposed algorithm is higher than that of the three existing intrusion detection algorithms. From Table 14 and Figure 7, we can see that the accuracy and detection rate of node2vec-AE are slightly lower than those of packet2vec-AE. One possible explanation is that the node2vec-AE algorithm does not constrain the local proximity sampling range of the random walk. Therefore, the random walk in node2vec-AE has a high probability of reaching high-order proximity far from the source network packet, and the resulting proximity sample cannot accurately characterize the local proximity. However, the accuracy of the node2vec-AE algorithm is higher than that of RBM, PCA, and deep auto-encoder. This shows that combining the original features of the network packet with the local proximity features that describe the similarity between network packets achieves better intrusion detection performance. It also supports the effectiveness of our proposed algorithm.
Table 14 also contains some supervised algorithms, such as SVM-IDS 2017 [37], J48-IDS 2017 [38], and C4.5-IDS 2016 [39]. The performance of these three algorithms is listed for reference only. Our proposed algorithm is unsupervised, whereas these three algorithms are supervised, so a direct performance comparison would be unfair and is not required. In intrusion detection, labels are difficult to obtain. The advantage of an unsupervised algorithm is that it can perform intrusion detection on all network traffic without being restricted by labels.
For training and testing time, all of our experiments were able to run within 24 h under the server configuration shown in Table 13. The experiment of obtaining the local proximity features of the network packets using the Packet2vec algorithm can be completed with 16 GB of memory within 12 h. Intrusion detection using the extracted network packet features as input to the deep auto-encoder takes less than 1 h. The overall running time is within an acceptable range. The ISCX 2012 dataset appeared relatively recently. Therefore, we could not find enough literature on training time, testing time, and memory size, and we were not able to compare these costs [6].

Discussion
In this paper, the packet2vec learning algorithm is used to obtain the local proximity features that accurately describe the similarity relationship of network packets. Next, the features extracted by the packet2vec learning algorithm are combined with the original features of the network packet and used as the input of the deep auto-encoder for intrusion detection. This article focuses on the impact of similarity relationships between network packets on intrusion detection performance. Future work will further explore the features of network packets obtained by preprocessing. Network packets will be analyzed from multiple dimensions such as timing [40][41][42] and protocol type [43][44][45].
This paper only uses the network packets in the training set to construct the relational graph on packet similarity G′. When extracting the local proximity features of a network packet in the test set, it is not necessary to add that packet as a node to G′. It is only necessary to calculate the similarity between the current network packet and each network packet in G′. Then the proposed packet2vec learning algorithm is used to extract the local proximity features of the current network packet. When this algorithm is used to detect a new network packet, there is no need to update G′ or to re-train. Therefore, the algorithm saves time and memory. In addition, our proposed algorithm generalizes to network packets outside the training set, which increases the scalability of the proposed model with respect to the number of test set samples.
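The similarity step for a test packet can be sketched as follows. The relational graph uses cosine similarity between packets as edge weights, so the sketch computes the cosine similarity between one new packet vector and every training packet without modifying the graph. The function names and the toy byte vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def similarities_to_training_set(packet, train_packets):
    """Map each training node to its similarity with the new packet."""
    return {node: cosine_similarity(packet, vec) for node, vec in train_packets.items()}


train_packets = {"v1": [1.0, 0.0, 2.0], "v2": [0.0, 3.0, 0.0]}
new_packet = [2.0, 0.0, 4.0]
sims = similarities_to_training_set(new_packet, train_packets)
```

Here `sims["v1"]` is close to 1 because the two vectors are parallel, and `sims["v2"]` is 0 because they are orthogonal; the training graph itself is left unchanged.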
In this paper, we use penalty terms to limit the range of random walks, so that random walks sample only network packets in the k-order proximity of the source network packet. The advantage of this method is that the value of k can be customized as needed. The empirical formula, i.e., the pruning threshold ε ≈ the mean of weights in G′ − depth * penalty value η, is designed to calculate the approximately optimal penalty value. The maximum value that depth can take in this empirical formula is k. In addition, we use (depth * penalty value η) as the weight penalty in the empirical formula. Under this weight penalty, the closer a network packet is to the source network packet, the smaller its penalty. That is to say, the method preferentially samples, with high probability, network packets with high similarity to the source network packet.
In addition, the local proximity sample list of the source network packet obtained by random walk captures the structural relationship and the distance relationship between network packets. The local proximity sample list can measure the distance between network packet pairs based on the frequency of those pairs [10]. If the local proximity sample list of the source network packet is obtained in other ways (for example, by directly using all first-order and second-order proximity nodes of the source network packet to form a local proximity list), the structure and distance information between the network packets is lost. Therefore, it is meaningful to capture the local proximity nodes of the source network packet with a random walk.
In addition, this article focuses on the relationship between network packets and extracts the features of network packets. In future work, network flows and network packets will be analyzed hierarchically [46][47][48][49].
We analyzed the sensitivity of the proposed algorithm. From Table 14, we can draw the following conclusions. In the best case, the accuracy, the detection rate, the precision, and the F1 of packet2vec-AE reach 94.7%, 90.9%, 94.3%, and 92.6%, respectively. In the worst case, they reach 87.4%, 81.1%, 87.3%, and 84.1%, respectively. Even in the worst case, the proposed algorithm exceeds the other state-of-the-art algorithms in accuracy, precision, and F1. Therefore, the proposed algorithm is still superior to the latest available algorithms even if the selected parameters are not suitable. From the experimental results in Tables 8 and 9, it can be concluded that the accuracy, the detection rate, the precision, and the F1 fluctuate only slightly when these parameters are changed. In other words, the experimental results are not sensitive to the parameters. Therefore, we have reason to believe that the proposed algorithm has the potential to be extended to other data sets. In future work, we will apply this algorithm to other data sets.
In order to highlight the novelty and contribution of this paper, we compare the similarities and differences between our algorithm and existing algorithms in Tables 15 and 16. Table 15 compares the latest three unsupervised IDS algorithms with our proposed algorithm. Most existing studies on unsupervised IDS preprocessing ignore the relationship among packets. As a result, the performance of existing unsupervised IDS is not high. According to the homophily hypothesis, the local proximity structure in the similarity relational graph has similar embedding after preprocessing. The preprocessing in our proposed algorithm combines the local proximity features of the network packet with the original features of the network packet as input to the deep auto-encoder. In effect, our proposed algorithm uses the local proximity features of network packets to enhance their original features. From Table 15, we can conclude that the performance of the existing PCA-IDS 2017 [8] and RBM-IDS 2017 [7] depends on features extracted by manual experience. Our proposed algorithm uses graph representation learning for automatic preprocessing. Therefore, our proposed algorithm is suitable for raw network traffic data. Table 16 compares the latest graph representation learning algorithms with our proposed algorithm. Deepwalk [20] and Node2vec [10] easily sample into the higher-order proximity range, which makes it impossible to accurately describe the low-order proximity features of the current node. LINE [21] cannot sample first-order proximity and second-order proximity simultaneously, so the algorithm has limitations. Our proposed algorithm can extract first-order proximity, second-order proximity, and low-order proximity. This is the first time that a graph representation learning algorithm has been used to extract features in intrusion detection.
We also analyzed the effect of the random walk length on the running time of the algorithm. Intuitively, the runtime overhead of the proposed algorithm is proportional to the length of the random walk. The length of a random walk is the number of times the next node needs to be selected during the walk, and the transition probability needs to be calculated each time the next node is selected.
Future research will mainly consider two aspects: intrusion detection [50,51] with a small number of network packet labels [52][53][54] and intrusion detection for unknown malicious traffic [55]. The purpose of intrusion detection with only a small number of network packet labels is to build a security baseline that makes full use of the small amount of existing label data. Intrusion detection of unknown malicious traffic is very important in practical applications. We will continue to research the application of deep neural networks in the IDS field with the hope of further improving IDS performance.

Conclusions
In this paper, the packet2vec learning algorithm is used to preprocess network packets to obtain local proximity features that describe the similarity relationship between network packets. The local proximity features of the network packets are combined with the original features as the input of the deep auto-encoder for intrusion detection. The experiment shows that our proposed algorithms achieve higher accuracy than three of the state-of-the-art algorithms. In addition, it can be concluded from the experiment that the empirical formula, i.e., the pruning threshold ε ≈ the mean of weights in G′ − depth * penalty value η, can be used to calculate the approximately optimal penalty value η. The optimal penalty value η is used to constrain the random walk range to extract features that accurately describe the local proximity of the network packet. Finally, these features are used to achieve optimal performance of IDS. In the best case, the accuracy, the detection rate, the precision, and the F1 of packet2vec-AE reach 94.7%, 90.9%, 94.3%, and 92.6%, respectively. In the worst case, they reach 87.4%, 81.1%, 87.3%, and 84.1%, respectively. Even in the worst case, the proposed algorithm still achieves higher accuracy than three of the state-of-the-art algorithms.

Figure 1. Proximity and comparison based on graph representation. The proximity sampling strategy cannot accurately characterize the local proximity features of the network packet with unlimited proximity. The local proximity sampling strategy can accurately characterize the local proximity features of the network packet when the sampling range of the source network packet is limited to a low-order proximity. Section 3.4 will introduce the detailed algorithm of graph representation learning for low-order proximity feature extraction. (a) Proximity nodes of source network packet v_i. The red node represents the source network packet v_i. (b) A proximity sampling strategy is used for random walk to obtain a proximity sample list of the source network packet. The number on each gray arrow indicates the order in which the sampling strategy samples the nodes. (c) A local proximity sampling strategy is used to perform a range-constrained random walk to obtain a local proximity sample list of the source network packet.

Figure 2. Flowchart of the intrusion detection system. In particular, packet2vec is added to the preprocessing step to extract low-order proximity features of network packets and enhance the performance of the IDS.

Figure 3. Splitting the original traffic into packets. The original traffic is split into packets, which are truncated to a fixed length.

Figure 4. Algorithmic flowchart of the packet2vec learner based on graph representation with penalty.

Algorithm 4 1 :
Packet2vecWalk: Random walk with penalty for packet2vec learning
Input: Relational graph on packet similarity G(V, E, W), start node v_i, penalty value η, walk length of proximity sampling l.
Output: Local proximity features of the source network packet v_i obtained by the packet2vec walk.
1: Initialize walk to [v_i].
2: dijkstra_path_length = Dijkstra(G).
3: for walk_iter = 1 to l do
4:
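Only the first steps of the listing are available here, so the following is a minimal sketch of one possible penalty-constrained random walk, not the authors' procedure. It assumes that each candidate edge weight is shifted by η times the hop depth of the candidate node from the start node, and that candidates whose penalized weight does not exceed a pruning threshold ε are discarded. The names `hop_depths` and `penalized_walk`, the `epsilon` parameter, and the toy graph are hypothetical; hop depth is computed with breadth-first search, which matches Dijkstra with unit edge costs.

```python
import random
from collections import deque


def hop_depths(graph, source):
    """Hop count from source to every reachable node (BFS)."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


def penalized_walk(graph, start, eta, walk_length, epsilon=0.0, rng=None):
    """Random walk from `start` that prunes candidates by penalized weight.

    graph: dict mapping node -> {neighbor: similarity weight}, symmetric.
    Assumed rule: penalized = weight + eta * depth(neighbor); a neighbor is
    kept only if penalized > epsilon, then sampled in proportion to it.
    """
    rng = rng or random.Random()
    depth = hop_depths(graph, start)
    walk = [start]
    for _ in range(walk_length):
        current = walk[-1]
        candidates, weights = [], []
        for neighbor, weight in graph[current].items():
            penalized = weight + eta * depth[neighbor]
            if penalized > epsilon:
                candidates.append(neighbor)
                weights.append(penalized)
        if not candidates:
            break  # every neighbor was pruned, so the walk stops early
        walk.append(rng.choices(candidates, weights=weights, k=1)[0])
    return walk


# Toy path graph a - b - c - d with equal similarities (hypothetical data).
toy_graph = {
    "a": {"b": 0.9},
    "b": {"a": 0.9, "c": 0.9},
    "c": {"b": 0.9, "d": 0.9},
    "d": {"c": 0.9},
}
walk = penalized_walk(toy_graph, "a", eta=-0.4, walk_length=50,
                      rng=random.Random(0))
print(walk[:10])
```

With η = −0.4 in this toy graph, node d sits three hops from the start and its penalized weight falls below zero, so the walk stays within the second-order neighborhood of a.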

Figure 6. Packet2vec-AE visualization results with different penalty values. Gray points are normal network packets; yellow points are network packets containing the DDoS attack; blue points are network packets containing the Brute Force SSH attack; purple points are network packets containing the HTTP DoS attack; green points are network packets containing the Infiltrating attack. (a) Penalty value η = −0.01. (b) Penalty value η = −0.03. (c) Penalty value η = −0.05. (d) Penalty value η = −0.1.

Table 5. Comparison of IDS performance with Node2vec and Packet2vec preprocessing under different random walk lengths (%).

Table 6. Statistical values of the weights in G.

Table 7. Comparison of the impact of different penalty values on the performance of Packet2vec-AE (%).

Table 8. Comparison of the impact of the optimizer on the performance of Packet2vec-AE (%).

Table 9. Comparison of the effects of random walks of different lengths on the performance of Packet2vec-AE (%).

Table 10. Architecture parameters of the auto-encoder in the Packet2vec-AE algorithm.

Table 11. List of parameters for Packet2vec in Packet2vec-AE.

Table 12. List of auto-encoder hyper-parameters and its optimizer in Packet2vec-AE.

Table 16. Comparison with other published graph representation learning algorithms (%).