1. Introduction
With the rapid development of the Internet and Internet of Things (IoT) technologies, networks have penetrated almost every aspect of daily life, and connecting to a network has become cheap and simple. However, this convenience is accompanied by a growing range of attack tactics and increasingly complex cyber-security issues. A single vulnerability or threat can affect an entire network [1]. The diverse categories of cyber-attacks can severely harm individuals and cause property damage to society and the country, so cyber-security is an issue that cannot be ignored. Effective means of defense are needed to limit the impact of cyber-attacks as far as possible. Many methods are currently used to prevent hacking, such as firewalls, anti-virus software, and access control mechanisms. However, most of these techniques are passive defense strategies that rely heavily on historical traffic databases, so more proactive countermeasures are required in the face of ever-changing network attacks. Network intrusion detection, a crucial technique in cybersecurity, can actively identify security vulnerabilities in the network and respond promptly to achieve adequate network protection [2]. It usually involves feature extraction and network traffic classification. First, features are extracted from the dataset to reduce data complexity and construct meaningful information; second, benign and attack traffic are classified using this feature information. In this way, network intrusions can be detected and prevented. Network intrusion detection has therefore become an active research direction [3].
As artificial intelligence technology advances rapidly, machine learning (ML) has achieved outstanding results in many fields [4], and ML techniques have become prevalent in network intrusion detection. However, traditional ML methods are shallow learning approaches: they usually require manual intervention for feature selection, tend to ignore the correlation between features, and do not efficiently classify massive volumes of traffic, which leads to low accuracy in identifying network attacks. Later, with advances in computer hardware, deep learning (DL) was proposed and became widely used in tasks such as image classification and emotion recognition. DL can learn feature information from traffic data autonomously, which removes the need for manual feature selection, and it performs well on large amounts of data. Applying these advantages of DL to network intrusion detection can enable better network traffic classification.
Because traffic samples collected in adjacent periods are closely related, the temporal relationship of network traffic should be considered in the traffic classification task. Temporal Convolutional Network (TCN) and Bidirectional Gated Recurrent Unit (BiGRU) are DL models for processing time series data, and they have different structures and advantages. However, a model built on a single structure can focus only on either the local or the global temporal features of network traffic. In addition, neither TCN nor BiGRU weights the features by importance when extracting them. Based on the above analysis, the primary contributions of this paper are as follows:
This paper proposes a model based on TCN, BiGRU, and an attention mechanism (TGA). The temporal features of traffic sequences are extracted by TCN and BiGRU simultaneously, which exploits the respective advantages of the two models for sequence modeling. Their outputs are then fused, and a self-attention mechanism captures the correlation between different positions in the sequence to enhance the model’s expressiveness.
The proposed method was evaluated on the CSE-CIC-IDS2018 dataset and achieved 97.83% accuracy, outperforming the existing methods it was compared with.
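As a rough illustration of the fuse-then-attend idea in the first contribution, the sketch below concatenates two per-time-step feature sequences (toy stand-ins for the TCN and BiGRU outputs) and applies scaled dot-product self-attention. The function names, the toy values, and the use of identity query/key/value projections are assumptions made for illustration; this section does not give the actual layer configuration of the TGA model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(tcn_out, bigru_out):
    """Concatenate the two feature vectors at each time step."""
    return [a + b for a, b in zip(tcn_out, bigru_out)]

def self_attention(seq):
    """Scaled dot-product self-attention with identity projections (assumption).

    seq: list of T feature vectors, each of length d.
    Returns (outputs, weights); weights[i][j] is how much step i attends to step j.
    """
    d = len(seq[0])
    outputs, weights = [], []
    for q in seq:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all time steps, computed per feature channel.
        outputs.append([sum(w[j] * seq[j][c] for j in range(len(seq))) for c in range(d)])
    return outputs, weights

# Hypothetical feature sequences over 3 time steps, 2 features each.
tcn_features = [[0.1, 0.3], [0.5, 0.2], [0.4, 0.9]]
bigru_features = [[0.7, 0.1], [0.2, 0.6], [0.3, 0.8]]

fused = fuse(tcn_features, bigru_features)         # 3 steps x 4 features
attended, attn_weights = self_attention(fused)     # same shape as fused
```

Each row of `attn_weights` sums to 1, so every output step is a weighted average of all fused steps; a trained model would additionally learn the projections that produce the queries, keys, and values.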
The remainder of the paper is organized as follows. Section 2 introduces related work on network intrusion detection. Section 3 presents a novel network intrusion detection model with TCN, BiGRU, and the self-attention mechanism, and describes the dataset and the data pre-processing. Section 4 presents the experimental process and performance results, including the experimental environment, hyperparameter settings, and the analysis of experimental results. Finally, Section 5 discusses the conclusions and directions for future work.
2. Related Work
Network intrusion detection technology is a network security mechanism that monitors network traffic and system activity, then identifies and responds to unauthorized access, malicious behavior, or attacks [5]. It protects computers from network intrusion attacks and is one of the essential tools for network security. ML techniques have found extensive application in network intrusion detection over the last two decades. Tavallaee et al. [6] evaluated the capabilities of several ML algorithms, including decision tree (DT), Naive Bayes (NB), and support vector machine (SVM). Hamed et al. [7] used principal component analysis and LDA to obtain four-dimensional data. In the classification stage, the four-dimensional samples are first fed to a Bayesian classifier, and the samples predicted to be normal then undergo secondary detection using CFKNN. Mahfouz et al. [8] comprehensively evaluated contemporary ML classifiers for detecting network traffic intrusions, assessing them along several dimensions, such as feature selection, sensitivity to hyperparameter selection, and the challenge of class imbalance in intrusion detection. Zwane et al. [9] analyzed seven ML classifiers using Weka: multilayer perceptron, SVM, Bayesian network, random forest (RF), AdaBoost, Bootstrap aggregation, and DT.
Applying ML techniques to network traffic classification tasks can achieve high accuracy. However, these ML methods still have limitations, such as the need to select features manually, and they do not produce satisfactory results on high volumes of data. With the rapid development of artificial intelligence technology, DL has become prevalent in many fields. DL techniques can learn feature information from traffic data autonomously, which removes the need for manual feature selection, and they perform well on large volumes of data. DL techniques are therefore receiving significant attention in research on network intrusion detection.
Vinayakumar R et al. [10] presented a CNN-based network intrusion detection model. Their paper evaluates the effectiveness of various DL models, including MLP, CNN, and CNN-RNN, on the KDDCup99 dataset, and the results indicate that CNN can automatically extract helpful feature information from the data. However, they did not consider that network traffic has contextual relationships. Yan et al. [11] analyzed the performance of four DL models on the NSL-KDD and UNSW-NB15 datasets: restricted Boltzmann machine (RBM), multilayer perceptron (MLP), sparse autoencoder (SAE), and MLP with feature embedding. However, their experiments were not evaluated on newer intrusion datasets. Liu et al. [4] presented a network intrusion detection model that combines a CNN with CBAM and used the newer CSE-CIC-IDS2018 dataset. The model was compared with DNN and CNN to demonstrate its validity. Usama et al. [12] introduced a generative adversarial network (GAN) that can be used for intrusion detection with strong resistance to adversarial attacks. However, the method can be difficult to parameterize and suffers from training instability.
Because network traffic has contextual relationships, the temporal relationship should be considered when extracting features. TCN and RNN frameworks are widely used to process time series data. Kong et al. [13] proposed an integrated deep generative model, which captures time-series dependencies by combining a generative adversarial network built from a BiLSTM with an attention mechanism. M S et al. [14] presented a network intrusion warning prediction approach based on GRU. This model learns dependencies in a series of security alerts and outputs possible future alerts based on the history of alerts from attack sources. Zhang et al. [15] introduced a new detection approach built on TCN. The TCN is used as a predictor that performs series-to-series mapping and projects a user’s future behavior from their current actions. This paper presents a network intrusion detection algorithm that combines TCN, BiGRU, and a self-attention mechanism.
4. Experimental Method
We present a traffic classification model using TCN, BiGRU, and a self-attention mechanism to improve network intrusion detection accuracy. To validate the effectiveness of the proposed model, this section outlines the experimental procedure we followed for network intrusion detection on the CSE-CIC-IDS2018 dataset and reports the results obtained.
4.1. Experimental Environment
The experiments were conducted with Python version 3.8 on an Ubuntu 20.04 operating system. The hardware is a single 3080 GPU with 12 GB of memory. TensorFlow version 2.9.0 and Keras version 2.9.0 were used for all experiments. The other major training settings are as follows: the batch size is 512, the number of epochs is 30, and Adam is used as the optimizer for parameter learning. The model uses a cross-entropy loss function; minimizing this loss drives the predicted results closer to the actual labels.
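To make the role of the cross-entropy loss concrete, the following minimal sketch computes it for a single sample after a softmax. The logit values and the three-class setup are hypothetical and are not taken from the experiments.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_index):
    """Categorical cross-entropy for one sample with a one-hot label."""
    return -math.log(probs[true_index])

# Two hypothetical predictions for a sample whose true class is index 0.
confident = softmax([4.0, 0.5, 0.1])   # most probability mass on the true class
unsure = softmax([1.0, 0.9, 0.8])      # probability spread almost evenly

loss_confident = cross_entropy(confident, 0)
loss_unsure = cross_entropy(unsure, 0)
```

The loss is small when the model assigns high probability to the true class and grows as that probability shrinks, which is why minimizing it pulls predictions toward the labels.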
4.2. Evaluation Indicator
To evaluate the effectiveness of our proposed model, we summarize the detection outcomes with a confusion matrix, which is given in Table 3. TP is the number of correctly classified benign samples, while FP is the number of attack samples incorrectly classified as benign. TN is the number of correctly classified attack samples, and FN is the number of benign samples incorrectly classified as attacks. We also employ four evaluation metrics: accuracy, precision, recall, and F1-score [31].
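The four metrics follow from the confusion-matrix counts using their standard definitions. The sketch below assumes those standard formulas, with benign treated as the positive class as defined above, and uses hypothetical counts rather than values from Table 3.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts, for illustration only.
acc, prec, rec, f1 = binary_metrics(tp=90, fp=10, tn=80, fn=20)
```

In the multi-class setting these quantities are computed per class and then averaged; the averaging scheme used for the reported figures is not stated in this section.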
4.3. Effect of Hyperparameters
4.3.1. Learning Rate
In our experiments, the learning rate is a critical hyperparameter that controls the step size of the updates to the model parameters in each iteration. A suitable learning rate can accelerate the model training process and improve the model’s performance. Finding a suitable learning rate usually requires multiple sets of controlled trials for hyperparameter tuning.
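The effect of the step size can be seen on a toy problem. The sketch below runs plain gradient descent on a one-dimensional quadratic; the objective, the starting point, and the three learning rates are assumptions chosen for illustration and say nothing about which value suits the TGA model.

```python
def gradient_descent(lr, steps=50, w0=0.0, target=3.0):
    """Minimize (w - target)^2 with a fixed learning rate."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * (w - target)   # derivative of (w - target)^2
        w -= lr * grad              # the learning rate scales the size of each update
    return w

# Distance from the optimum after 50 steps for three hypothetical learning rates.
errors = {lr: abs(gradient_descent(lr) - 3.0) for lr in (1.1, 0.1, 0.001)}
```

A rate that is too large (1.1) overshoots and diverges, a rate that is too small (0.001) has barely moved after 50 steps, and the intermediate rate converges; this is the trade-off that a controlled sweep over learning rates is meant to resolve.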
We compared the traffic classification results obtained with four learning rates: 0.1, 0.01, 0.001, and 0.0001. As shown in Figure 6, the model’s performance improves as the learning rate decreases from 0.1 to 0.001. With a learning rate of 0.001, precision, recall, and F1-score reach their highest values of 97.85%, 97.83%, and 97.57%, respectively, which are 0.37, 0.57, and 0.7 percentage points higher than with a learning rate of 0.1. With a learning rate of 0.01, precision, recall, and F1-score are 97.58%, 97.44%, and 97.08%, respectively. When the learning rate decreases further to 0.0001, the model’s performance drops, and the three metrics are 97.66%, 97.67%, and 97.39%, respectively. Therefore, the learning rate is set to 0.001.
4.3.2. Nb_Stacks
The Nb_stacks parameter specifies the number of stacks in the TCN. Figure 7 displays the performance of the proposed model for different Nb_stacks values. The model performs best when Nb_stacks is 2, with a slight improvement in precision, recall, and F1-score over the other two settings. Therefore, we set Nb_stacks to 2 in our experiments.
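One way to see what Nb_stacks changes is through the receptive field, that is, how many past time steps can influence one output. The sketch below assumes the common TCN design in which every residual block contains two causal convolutions with the same dilation; the kernel size and dilation list are hypothetical, since this section does not report them.

```python
def tcn_receptive_field(kernel_size, nb_stacks, dilations):
    """Receptive field of a TCN under the stated assumption.

    Each causal convolution with dilation d extends the receptive field by
    (kernel_size - 1) * d; a residual block is assumed to hold two of them.
    """
    return 1 + 2 * (kernel_size - 1) * nb_stacks * sum(dilations)

# Hypothetical configuration: kernel size 3, dilations 1, 2, 4, 8.
fields = {n: tcn_receptive_field(3, n, (1, 2, 4, 8)) for n in (1, 2, 3)}
```

Under these assumptions the receptive field grows linearly with the number of stacks (61, 121, and 181 steps here), so a larger Nb_stacks lets the TCN see further back at the cost of more parameters.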
4.4. Result and Analysis
This section compares the proposed algorithm with TCN alone, BiGRU alone, and the combined TCN-BiGRU model. A single TCN can extract local temporal features, but it ignores global temporal features. By combining TCN with BiGRU, we can extract both local and global temporal features of the traffic, which improves the model’s performance. Finally, adding self-attention lets the model focus on the more essential feature data, and the model’s performance improves further.
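The local nature of the TCN features can be illustrated with a single causal dilated convolution, the basic operation of a TCN. The kernel weights, dilation, and input signal below are hypothetical; the sketch only shows that each output depends on a fixed window of current and past inputs.

```python
def causal_dilated_conv(x, kernel, dilation):
    """1-D causal convolution: out[t] uses only x[t], x[t - d], x[t - 2d], ..."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            idx = t - i * dilation
            if idx >= 0:              # positions before the start act as zero padding
                acc += w * x[idx]
        out.append(acc)
    return out

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = causal_dilated_conv(signal, kernel=[0.5, 0.5], dilation=2)

# Changing later inputs leaves earlier outputs untouched (causality).
changed = signal[:3] + [100.0, 5.0, 6.0]
y_changed = causal_dilated_conv(changed, kernel=[0.5, 0.5], dilation=2)
```

Here `y[t]` averages `x[t]` and `x[t - 2]`, so it never sees the rest of the sequence; a bidirectional recurrent layer, in contrast, passes over the whole sequence in both directions, which is the complementary view that the combined model relies on.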
As shown in Figure 8, the combined TCN-BiGRU model achieves precision, recall, and F1-score of 97.71%, 97.68%, and 97.40%, respectively, which are higher than those of TCN or BiGRU alone. Adding self-attention improves the four evaluation metrics by 0.14, 0.14, 0.14, and 0.15 percentage points, respectively. The experiments demonstrate that the proposed model combining TCN, BiGRU, and self-attention effectively improves multi-class traffic classification performance.
To better demonstrate the capability of the proposed algorithm in network intrusion detection, we compare the TGA model with recently developed algorithms that use various DL approaches: LSTM + AM [32], DNN [33], CNN [34], DAE + DNN [35], CNN + CBAM [4], and ID-RDRL [20]. The experiments indicate that the proposed algorithm is superior to these baseline models in accuracy, precision, recall, and F1-score.
As shown in Table 4, the accuracy, precision, recall, and F1-score of LSTM + AM [32] are 96.2%, 96%, 96%, and 93%, respectively; our method improves these four metrics by 1.63, 1.85, 1.83, and 4.57 percentage points. In addition, the accuracy of the proposed algorithm is 0.55 percentage points higher than that of DNN [33]. ID-RDRL [20] reported only accuracy and F1-score, which are 96.2% and 94.9%; the TGA model improves accuracy by 1.63 and F1-score by 2.67 percentage points. The above analysis shows that the proposed approach yields better results on all four evaluation metrics than the compared methods, which validates its effectiveness in intrusion detection tasks.
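The stated improvements can be reproduced as simple differences in percentage points. The sketch below uses the TGA figures reported in this paper (97.83% accuracy, and the precision, recall, and F1-score obtained with the selected learning rate) together with the baseline values quoted above; the dictionary layout and function name are illustrative.

```python
# Values in percent, as quoted in the text.
tga = {"accuracy": 97.83, "precision": 97.85, "recall": 97.83, "f1": 97.57}
lstm_am = {"accuracy": 96.2, "precision": 96.0, "recall": 96.0, "f1": 93.0}
id_rdrl = {"accuracy": 96.2, "f1": 94.9}   # only these two metrics were reported

def gains(ours, baseline):
    """Difference in percentage points for every metric the baseline reports."""
    return {name: round(ours[name] - baseline[name], 2) for name in baseline}

gain_vs_lstm_am = gains(tga, lstm_am)
gain_vs_id_rdrl = gains(tga, id_rdrl)
```

The differences match the figures in the text: 1.63, 1.85, 1.83, and 4.57 against LSTM + AM, and 1.63 and 2.67 against ID-RDRL.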
Figure 9 shows the confusion matrix, and Table 5 gives the classification report of the proposed model. Taken together, Table 5 and Figure 9 show the model’s detection capability in the multi-class case in more detail. For the benign category, precision is 98.73%, and recall and F1-score are both over 99%. The precision, recall, and F1-score for the DDoS and Bot attack categories exceed 99%, indicating that the model detects these two attack types very well. The detection results for the DoS and Bruteforce attack categories show that some samples are difficult to distinguish, which may be due to the similar attack principles of these two types. In addition, the Web-attack samples make up only a small minority of the total samples. Because such classes contribute a tiny proportion of the training data, the model is biased toward the majority class, namely the benign samples. As a result, the classification of infiltration samples is not effective, because the imbalance in the dataset causes them to be misclassified as benign. This indicates that sample imbalance is still the primary problem faced in network intrusion detection, and we will focus on addressing it in future research, for example with oversampling methods such as random oversampling or SMOTE.
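The link between a confusion matrix and a per-class report can be sketched as follows. The three class names are taken from the discussion above, but the counts are hypothetical and are not the values in Figure 9 or Table 5; they are chosen only to show how a minority class absorbed by the benign class ends up with low recall.

```python
labels = ["Benign", "DoS", "Infiltration"]
# Rows are true classes, columns are predicted classes (hypothetical counts).
cm = [
    [950, 10, 40],   # true Benign
    [20, 180, 0],    # true DoS
    [45, 0, 5],      # true Infiltration
]

def per_class_report(cm, labels):
    """Precision, recall, and F1 for each class of a square confusion matrix."""
    report = {}
    n = len(labels)
    for i, name in enumerate(labels):
        tp = cm[i][i]
        predicted = sum(cm[r][i] for r in range(n))   # column sum
        actual = sum(cm[i])                           # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[name] = (precision, recall, f1)
    return report

report = per_class_report(cm, labels)
```

In this toy matrix 45 of the 50 Infiltration samples are predicted as Benign, so Infiltration recall is only 0.1 while Benign recall stays at 0.95, which mirrors the imbalance effect described above.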
5. Conclusions
In this work, we present a model that combines TCN, BiGRU, and a self-attention mechanism to improve the performance of traffic classification. The model uses TCN and BiGRU to extract local and global temporal features of network traffic simultaneously. First, these two sets of features are fused to broaden the feature information. Second, a self-attention mechanism assigns different weights to the features, so that the most prominent features are preserved before being fed into the fully connected layer. Finally, a softmax layer serves as the classifier that identifies the category to which the traffic belongs. The experimental results indicate that the proposed model performs well for multi-class classification on the CSE-CIC-IDS2018 dataset. Accuracy reached 97.83%, exceeding that of TCN or BiGRU used individually. However, the model also has limitations: the dataset is imbalanced between benign and malicious samples, which leads to less effective classification of the minority classes. In future research, we will address sample imbalance, for example by introducing random oversampling or SMOTE to balance the dataset. Furthermore, we consider that a more lightweight intrusion detection system is needed for deployment in real network environments.
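As a pointer to the future work mentioned above, the following is a minimal sketch of random oversampling, which duplicates randomly chosen minority-class samples until every class matches the largest one. The function name, the toy data, and the fixed seed are assumptions for illustration; SMOTE, which synthesizes new samples by interpolation instead of duplicating them, is not implemented here.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical imbalanced training set: five benign samples, one infiltration sample.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = ["Benign"] * 5 + ["Infiltration"]

X_bal, y_bal = random_oversample(X, y)
counts = Counter(y_bal)
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the data used for evaluation.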