Multi-Layered Filtration Framework for Efficient Detection of Network Attacks Using Machine Learning

The advancement of and growing reliance on digital data necessitate dependence on information technology. The growing amount of digital data and its availability over the Internet have given rise to the problem of information security. With the increase in connectivity among devices and networks, maintaining the information security of an asset has now become essential for an organization. Intrusion detection systems (IDS) are widely used in networks for protection against different network attacks. Several machine-learning-based techniques have been used by researchers for the implementation of anomaly-based IDS (AIDS). In the past, the focus primarily remained on the improvement of the accuracy of the system. Efficiency with respect to time is an important aspect of an IDS, which most of the research has thus far overlooked. For this purpose, we propose a multi-layered filtration framework (MLFF) for feature reduction using a statistical approach. The proposed framework helps reduce the detection time without affecting the accuracy. We use the CIC-IDS2017 dataset for experiments. The proposed framework contains three filters connected in sequential order. The accuracy, precision, recall and F1 score are calculated against the selected machine learning models. In addition, the training time and the detection time are also calculated because these parameters are considered important in measuring the performance of a detection system. Generally, decision tree models, random forest methods, and artificial neural networks show better results in the detection of network attacks with minimum detection time.


Introduction
In today's digital world, cybersecurity is becoming an essential need for military and government organizations, as well as for small enterprises and even individuals. Threat prevention is the epitome of digital security, which requires threat detection and threat management capabilities [1]. Security information and event management (SIEM) is being implemented by a large number of organizations and becoming a standardized approach to handle information security issues [2]. Due to the recent rise in cyberattacks and the strict security regulations required by governments, organizations have been investing in the security domain [3]. The core of any SIEM solution is the detection capability of the system. Information security experts have developed multiple network intrusion detection tools and techniques for the detection and prevention of evolving network attacks [4].
A computer network is a set of computers connected with each other for resource sharing. Any unauthorized action on the hardware or software of the systems connected with the network is called a network attack. In other words, a network attack is any action [...]
• It provides a multi-layered filtration framework for feature reduction to systematically reduce the number of features using statistical methods.
• It provides a mechanism to effectively reduce the detection time without compromising the accuracy of the detection system.
• It shows the accuracy, precision, recall, F1 score and detection time against selected machine learning models for CIC-IDS2017.
The rest of the paper is organized as follows. Section 2 covers the related work and provides information about available datasets and the evaluation metrics for the detection system. Section 3 provides the methodology and the proposed framework. Section 4 deals with the results and provides a discussion; these aspects are performed through experiments and comparative analysis. Finally, Section 5 covers the conclusions and recommendations for future work.

Related Work
Research on the detection of network attacks has been conducted using different publicly available datasets. These datasets play a vital role in the validation of the detection approach and are used as benchmarks [7]. The initial work of creating a dataset for an IDS was carried out by DARPA (Defense Advanced Research Projects Agency), which generated the KDD98 (Knowledge Discovery and Data Mining) dataset in 1998. This was created by modeling a small US Air Force base network connected to the Internet. It had 41 features that were categorized as normal or abnormal [8]. This dataset made an important contribution to IDS research. However, in [12], the author criticized the accuracy of KDD98 and its capability to reflect realistic environments. Although KDD98 had multiple reported problems, it continued to be used by the research community [13,14].
Ref. [15] identified numerous issues in the KDD98 dataset, due to which a new dataset NSL-KDD was published in 2009 [16]. The dataset was created by eliminating duplicate records to overcome issues of bias in machine learning models.
Several other IDS datasets have been created. In 2007, a dataset named CAIDA was proposed [17]. This dataset contains network traffic of distributed denial-of-service (DDoS) attacks. This dataset lacks attack diversity. A labeled dataset for flow-based intrusion detection was also proposed in 2009 [18]. The dataset was based on a honeypot deployed over the Internet to maximize exposure to attacks. In 2015, Moustafa and Slay proposed a dataset called UNSW-NB15, which addresses the issue of the unavailability of a network benchmark dataset [19]. The dataset was generated by simulating network attacks and suggesting nine different attack families.
Multiple techniques were combined by Yulianto et al. [28] to improve the performance of IDS using the CICIDS-2017 dataset. The hybrid feature selection method was used by Tama et al. [29] and reduced the number of features to 37 with an accuracy of 96.46%.
Gupta et al. [30] suggested that the class imbalance problem can be tackled with the help of ensemble algorithms. Deep neural network, eXtreme gradient boosting, and random forest algorithms were used in three different stages and achieved an accuracy of 92% for the CICIDS-2017 dataset. Doaa et al. [31] also worked on feature reduction along with ensemble learning techniques. The results show 99% accuracy with 30 features from the CICIDS-2017 dataset.
A context-aware feature extraction method was proposed by Shams et al. [32] for convolutional neural networks (CNN); these authors concluded that CNNs showed better results as compared to an ordinary neural network. Birnur et al. [33] proposed an approach using optimal feature selection and finding multivariate outliers for the improvement of the performance of an IDS. The NSL-KDD dataset was used for experiments.
In [34], the author proposed a hybrid optimization scheme to improve the rate of precision in the detection of an intrusion. Qureshi et al. [35] proposed a transfer learning technique to train deep neural networks. The original and extracted features were combined to improve the performance of an intrusion detection system.
Venkatesan et al. [36] suggested a model for intrusion detection and worked on feature selection using a modeling approach. Similarly, in [37,38], the authors worked on reducing the number of network parameters, resulting in decreased time and cost. In [39], dimensionality reduction was carried out using PCA, and subsequently, SVM was employed for the detection of DDoS attacks in SDN. Ref. [40] also worked on SDN and proposed an allocation-based approach using a multi-criteria decision-making (MCDM) strategy for a multi-domain SDN-enabled IoT network.
In [41], the authors proposed a detection framework based on 16 features for DDoS attack detections. However, this method was not designed to cater to imbalanced data. Yi et al. [42] proposed a multi-objective evolutionary convolutional neural network for an IDS. However, these authors' results show that the detection performance was affected by the use of very few neurons and layers.
Although much work has been conducted using the available datasets for intrusion detection systems, the major focus remains on the improvement of the accuracy of the detection. In addition to the accuracy, the detection time is also an important factor, especially in the case of AIDS deployed in an inline mode. Our main focus is to propose a framework for feature reduction to effectively reduce the detection time without affecting the accuracy of the system.

Evaluation Metrics
A confusion matrix is used to evaluate the performance of an intrusion detection system. Table 1 shows a confusion matrix for a binary class classifier. True positives (TP) represent the number of attacks that are correctly predicted as attacks, and true negatives (TN) represent the number of normal events or instances that are correctly predicted as normal behavior. False positives (FP) represent the number of normal instances that are incorrectly predicted as attacks, and false negatives (FN) represent the number of attacks that are incorrectly classified as a normal instance. In a good detection system, TP and TN should remain high, and FP and FN should remain low [43].
The accuracy, precision, recall, and F1-score are the performance metrics used to evaluate intrusion detection systems. They are derived from the information given in the confusion matrix and calculated as per the following formulas [7]:
Accuracy measures how accurate the detection system is at detecting normal and attack traffic. It is the percentage of all correctly predicted instances against all instances.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Precision is the accuracy of positive predictions.
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall is the measure of the true-positive rate and is also called the detection rate or sensitivity.
$$\text{Recall} = \frac{TP}{TP + FN}$$
F1 score is the harmonic mean of precision and recall.
$$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
In addition to these performance metrics, we also calculated the training time and the detection time and compared these values against different ML and DL models. In the case of an intrusion detection system, the training time is not considered a very important factor as the model is trained only once. However, the detection time is of prime importance with respect to an intrusion detection system. Both the training time and the detection time are calculated using the formulas below. Table 2 shows a list of notations and their explanations.
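The four confusion-matrix metrics described above can be computed directly from the TP, TN, FP, and FN counts. The following is a minimal sketch; the function name `classification_metrics` and the example counts are illustrative assumptions and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (hypothetical, not taken from the paper).
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```

The guards against zero denominators matter for rare attack classes, where a model may make no positive predictions at all.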

Methodology
The dataset CICIDS2017 was selected as it has been widely used among the research community and is also publicly available. The dataset was generated on a real network that contained an attacker network and victim network. On the attacker side, four machines were connected, having Kali and Windows 8.1 operating systems. The victim side was protected with a firewall and contained multiple machines with Windows, Linux, and Macintosh operating systems. Multiple services, including domain controller and domain name system (DNS), were also running on the servers so that an attacker could perform real attacks [21]. Almost 50 GB of captured traffic in PCAP files were provided. Along with PCAP files, 8 CSV files were also provided, along with a set of 84 features and a label. The dataset consisted of a total of 15 classes. One of the classes was related to "benign" or normal traffic, and the other 14 corresponded to different "attack" classes. These attack classes were DDoS, Portscan, Bot, Infiltration, Web Attack Brute Force, Web Attack XSS, Web Attack SqlInjection, FTP-Patator, SSH-Patator, DoS Slowloris, DoS SlowHttp, DoS Hulk, DoS Goldeneye, and Heartbleed.

Proposed Framework Architecture
The proposed multi-layered filtration framework (MLFF) for attack detection is demonstrated in Figure 1. It consists of multiple phases of preprocessing, three layers of feature reduction, the creation of the final dataset, training models on a training dataset, and validating the results on a test dataset.

Merge
As a first step, the 8 CSV files were merged together in order to create a single file that contained the traffic from all classes. This combined dataset was generated from 8 files that were collected from traffic captured from Monday to Friday. A total of 2,520,798 instances were created. Table 3 shows the complete distribution of data among the classes. After that, the dataset was cleaned of NaN (missing) and infinity values; duplicate rows and stray white spaces were also checked for and removed.
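The merge and cleaning step can be sketched with pandas as follows. This is an illustration under stated assumptions rather than the authors' script: the helper name `merge_and_clean` is hypothetical, and the two small in-memory tables stand in for the eight daily CSV files.

```python
import io

import numpy as np
import pandas as pd


def merge_and_clean(csv_sources):
    """Concatenate CSV files, then remove inf/NaN rows, stray spaces, and duplicates."""
    frames = [pd.read_csv(src) for src in csv_sources]
    df = pd.concat(frames, ignore_index=True)
    # Column headers may carry leading or trailing white space.
    df.columns = df.columns.str.strip()
    # Treat +/- infinity as missing, then drop every row with a missing value.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    # Remove exact duplicate rows.
    return df.drop_duplicates().reset_index(drop=True)


# Tiny in-memory stand-ins for the daily CSV files (hypothetical values).
day1 = io.StringIO(
    " Flow Duration,Flow Bytes/s, Label\n"
    "10,100.0,BENIGN\n"
    "10,100.0,BENIGN\n"
    "20,inf,DDoS\n"
)
day2 = io.StringIO(
    " Flow Duration,Flow Bytes/s, Label\n"
    "30,,PortScan\n"
    "40,250.5,PortScan\n"
)
cleaned = merge_and_clean([day1, day2])
print(cleaned)
```

With the real files, `csv_sources` would be a list of file paths; the cleaning order (infinity to NaN, drop missing, drop duplicates) is one reasonable choice and is not specified in the text.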

Reducing the Imbalance Problem
The class distribution clearly shows that the dataset is large and highly imbalanced. Applying a machine learning model directly may lead to inefficiencies. To overcome the problem of imbalanced data, instances from 4 classes (i.e., Benign, DoS Hulk, DDoS, and Portscan) were removed using the Pandas dataframe.sample method, which returns random samples according to the specified percentage. A total of 31,213 rows were removed because of duplication. After that, a total of 207,908 instances were retained in the dataset. The distribution of these instances among 15 classes is shown in Figure 2.
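A minimal sketch of class-wise downsampling with `DataFrame.sample` is given below. The per-class keep fractions are not reported in the text, so the fraction used here, the helper name `downsample_classes`, and the toy data are all assumptions for illustration.

```python
import pandas as pd


def downsample_classes(df, keep_fraction, label_col="Label", seed=42):
    """Randomly keep only a fraction of the rows of selected (majority) classes."""
    parts = []
    for label, group in df.groupby(label_col, sort=False):
        frac = keep_fraction.get(label, 1.0)  # classes not listed are kept whole
        if frac < 1.0:
            group = group.sample(frac=frac, random_state=seed)
        parts.append(group)
    return pd.concat(parts).sort_index()


# Toy data: 100 benign rows and 20 rows of a minority class.
toy = pd.DataFrame({
    "Flow Duration": range(120),
    "Label": ["BENIGN"] * 100 + ["Bot"] * 20,
})
# Hypothetical fraction: keep 10% of the majority class.
balanced = downsample_classes(toy, {"BENIGN": 0.1})
print(balanced["Label"].value_counts())
```

Fixing `random_state` makes the reduced dataset reproducible, which helps when timings are compared across runs.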

Filter-1
The first filter was a manual filter in which features were dropped on the basis of domain knowledge. Some features are purely environment-dependent; for example, an IP address can be changed according to the network configurations. Similarly, port numbers can also vary in different scenarios. When the sender wants to communicate with the receiver, it uses a random port number as a source port. Therefore, it is necessary to eliminate such features. If we train our model without removing these features, the model may perform well on test data; however, it will not achieve comparable results in real-world networks.
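The manual filter amounts to dropping a fixed list of environment-dependent columns. A minimal sketch follows, assuming the data sit in a pandas DataFrame and that the six identifier columns carry the names used in this section; the actual CSV headers may be spelled differently, and the function name `apply_filter_1` is hypothetical.

```python
import pandas as pd

# Environment-dependent identifier features named in this section (spelling assumed).
IDENTIFIER_FEATURES = [
    "FlowID", "Source IP", "Source Port",
    "Destination IP", "Destination Port", "Timestamp",
]


def apply_filter_1(df, to_drop=IDENTIFIER_FEATURES):
    """Drop identifier columns that depend on the capture environment."""
    present = [col for col in to_drop if col in df.columns]
    return df.drop(columns=present)


# Toy frame with two identifier columns and one traffic feature (hypothetical values).
toy = pd.DataFrame({
    "Source IP": ["10.0.0.1", "10.0.0.2"],
    "Source Port": [51234, 40211],
    "Flow Duration": [120, 340],
    "Label": ["BENIGN", "DDoS"],
})
filtered = apply_filter_1(toy)
print(list(filtered.columns))
```

Only columns that are actually present are dropped, so the same function can be applied to files with slightly different headers.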
A total of 83 features and a label are present in the dataset. Of these, six features (FlowID, Source IP, Source Port, Destination IP, Destination Port, and Timestamp) were totally dependent on the architecture and the time of the test performed. Therefore, these features were removed from the dataset, and 77 features were left. Table 4 shows the names of the features dropped in the first layer.

Filter-2
The main objective of our study is to minimize the detection time while maintaining accuracy so that the framework can be used in real networks. In this layer, we first identified the insignificant features and then dropped them from the dataset. To identify the insignificant features, a statistical approach of testing the significance using the p-value method was adopted [44]. That is, we wished to test the following hypotheses, where $\rho$ denotes the population correlation coefficient:
Null Hypothesis $H_0: \rho = 0$
Alternate Hypothesis $H_a: \rho \neq 0$
The significance of correlation coefficients can be checked with the help of a t-test [45]. In general, we tested the degree of the deviation of the correlation coefficient from zero:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}$$
Here, r is the correlation coefficient calculated from the sample, and n is the sample size. With t and the sample size, we can calculate the p value. If the p value is greater than or equal to the significance level, which was set to an alpha of 0.025 in our case, we retain the null hypothesis and conclude that the variable is insignificant. If the p value is less than an alpha of 0.025, the null hypothesis is rejected, and we conclude that the variable is significant. Thus, we will not drop it [46].
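The significance test can be sketched with SciPy, whose `spearmanr` function returns both a rank correlation coefficient and the corresponding p value. The text does not state which variable each feature is correlated against; this sketch assumes the correlation is taken between each feature and a numerically encoded label, and the function name `apply_filter_2` and the toy data are hypothetical.

```python
import pandas as pd
from scipy.stats import spearmanr


def apply_filter_2(df, label_col="Label", alpha=0.025):
    """Split features into significant (p < alpha) and insignificant ones."""
    # Assumption: the label is encoded as integer category codes.
    y = df[label_col].astype("category").cat.codes
    kept, dropped = [], []
    for col in df.columns.drop(label_col):
        r, p = spearmanr(df[col], y)
        # p >= alpha: retain the null hypothesis, so the feature is insignificant.
        (kept if p < alpha else dropped).append(col)
    return kept, dropped


n = 200
toy = pd.DataFrame({
    # Separates the two classes strongly.
    "informative": [(10 if i % 2 else 0) + (i % 7) for i in range(n)],
    # Balanced across both classes, so its rank correlation with the label is zero.
    "uninformative": [(i // 2) % 2 for i in range(n)],
    "Label": ["BENIGN" if i % 2 == 0 else "ATTACK" for i in range(n)],
})
kept, dropped = apply_filter_2(toy)
print(kept, dropped)
```

For a multi-class label, integer encoding imposes an arbitrary order on the classes, so the choice of target variable is a design decision that should be checked against the original experiments.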
Because, in our case, the relationship among the features is non-linear, Spearman's rank correlation coefficient method [47] was used to find the value of r:
$$r = 1 - \frac{6\sum_{i=1}^{n} d_i^{2}}{n(n^{2}-1)}$$
where $d_i$ is the difference between the two ranks of the i-th observation and n is the sample size. The results show that 13 features are insignificant and can be dropped because they do not contribute to the learning of the model. After this test, the number of features was reduced to 64. Table 5 shows the names of the features dropped in the second layer.

Filter-3
The main objective of this filter is to test whether independent variables in a model are correlated among themselves. During this test, we found and removed those independent variables that were highly correlated to avoid the problem of overfitting the model. We selected the variance inflation factor (VIF) method for the detection of multicollinearity [48,49] among independent variables and also calculated the tolerance rate:
$$\text{VIF} = \frac{1}{1-R^{2}}, \qquad \text{Tolerance} = 1 - R^{2} = \frac{1}{\text{VIF}}$$
Here, $R^{2}$ is the coefficient of determination, which indicates the proportion of the variation in the dependent variable that is explained by the independent variables; for the VIF, it is obtained by regressing one feature on the other features.
In this case, we selected variables one by one, calculated the VIF against the variables already included in the model, ran tests for multicollinearity, and kept the variables that had a VIF of less than 5. After completing the test iteration, 38 variables were dropped because of their high collinearity with others, leaving 26 features.
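One way to read the procedure above is as a forward pass: each candidate feature is regressed on the features kept so far, and it is retained only if the resulting VIF is below 5. The sketch below implements that reading with NumPy least squares; the function names and toy data are hypothetical, and the authors' exact iteration order is not given in the text.

```python
import numpy as np
import pandas as pd


def vif_against(x_kept, x_new):
    """VIF of x_new when regressed (with an intercept) on the already-kept columns."""
    design = np.column_stack([np.ones(len(x_new)), x_kept])
    coef, *_ = np.linalg.lstsq(design, x_new, rcond=None)
    residual = x_new - design @ coef
    ss_res = float(residual @ residual)
    ss_tot = float(((x_new - x_new.mean()) ** 2).sum())
    if ss_tot == 0.0:
        return np.inf  # a constant column carries no usable information
    r2 = 1.0 - ss_res / ss_tot
    return np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)


def apply_filter_3(df, threshold=5.0):
    """Visit features in column order and keep those whose VIF stays below the threshold."""
    kept = []
    for col in df.columns:
        vif = vif_against(df[kept].to_numpy(dtype=float),
                          df[col].to_numpy(dtype=float))
        if vif < threshold:
            kept.append(col)
    return kept


toy = pd.DataFrame({
    "a": [float(i) for i in range(50)],
    "b": [2.0 * i + 1.0 for i in range(50)],        # exact linear function of "a"
    "c": [float((i * 7) % 13) for i in range(50)],  # only weakly related to "a"
})
selected = apply_filter_3(toy)
print(selected)
```

The tolerance of a feature is simply the reciprocal of its VIF, so a threshold of 5 corresponds to a tolerance above 0.2. Because the pass is order-dependent, a different column order can keep a different member of a collinear group.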

Dataset after Filtration
After passing the parameters through the filtration framework, a new dataset was created with only 26 selected parameters. The dataset was split into training and test data with a ratio of 70/30. The models were trained with training data and evaluated on test data. The training time and the detection time were also recorded against the selected ML models.
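The holdout evaluation can be sketched with scikit-learn as follows. The timing formulas of the paper are not reproduced here; this sketch assumes that training time and detection time are the wall-clock durations of fitting and of predicting on the test split. The synthetic data, the stratified split, and the choice of a decision tree are illustrative assumptions.

```python
import time

import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 26-feature dataset (hypothetical data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 26))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 70/30 holdout split; stratification is an assumption, not stated in the text.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=0, stratify=y
)

model = DecisionTreeClassifier(random_state=0)

start = time.perf_counter()
model.fit(X_train, y_train)
training_time = time.perf_counter() - start  # seconds spent fitting

start = time.perf_counter()
y_pred = model.predict(X_test)
detection_time = time.perf_counter() - start  # seconds spent predicting

accuracy = accuracy_score(y_test, y_pred)
print(f"accuracy={accuracy:.4f} train={training_time:.4f}s detect={detection_time:.4f}s")
```

Measured times depend on hardware and system load, so repeated runs and averaged timings give a fairer comparison between models.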

Results and Discussion
This section describes the results obtained after the implementation of a multi-layered feature reduction framework. All the implementation was carried out in Python on Jupyter Notebook (Anaconda3). A Dell OptiPlex 7060 PC with an Intel Core i7-7800 CPU@ 3.40 GHz with 32 GB RAM was used to conduct the experiment.

Selected Parameters
The final selected parameters, variance inflation factor (VIF), and tolerance against each selected feature are shown in Table 6. Among the 26 selected parameters, packet length is identified as an important parameter because attack traffic usually consists of irregular packet lengths. Flow bytes/s is the number of bytes transferred per second in a flow. The time between two packets in a flow in the forward and backward directions is also considered an important parameter. Moreover, the number of packets per second in both directions is also selected.

Table 6. Names of the selected features (columns: S. No, Feature Name, VIF, Tolerance).

There are different types of flags in a packet header. Each flag has a distinct role in communication. Attackers manipulate the value of these flags to launch attack traffic. The SYN flag is used to initiate a TCP connection. ACK is used to acknowledge the successful delivery of a packet. The URG flag is used to prioritize the packet. The FIN flag specifies the end of a TCP session. ECE is used to send the congestion indication. The value of these flags and the total number of bytes in both the forward and backward directions sent in an initial window are also selected for the model.
The active mean is the mean time for which the flow remains active, and active std is the standard deviation of the time a flow remains active before becoming idle. Idle Min is the minimum time a flow remains idle before going active. All these parameters contributed to the efficient detection of network attacks.

Comparison of Results
The main focus of our research is to develop an efficient attack detection system that maintains a high detection rate with a minimum detection time. In this paper, we have calculated the accuracy, precision, recall and F1-score along with the training time and detection time against the selected machine learning models. We compared the results with the findings presented by [24]. The results are also compared with the findings of other researchers and are shown in Table 7.
Figure 3 shows the accuracy and detection time of the proposed framework against selected models. As our main focus is on the reduction of detection time, the results show that in the case of the decision tree (DT) model, the detection time is 0.02 s, as compared to 1.12 s in [24]. Additionally, we still manage to maintain an accuracy of 99.27% as compared to 99.49% in [24]. In addition, random forest (RF) also produces 99.42% accuracy as compared to 99.30% in [24]. The detection time of random forest is calculated as 0.78 s as compared to 6.76 s in [24]. The results of the artificial neural network (ANN) show 98.15% accuracy, which is a little less as compared to 99.28% in [24]. However, the detection time has been significantly reduced, from 48.03 s [24] to 0.03 s. The results of a K-nearest neighbor (K-NN) classifier show an accuracy of 97.95%, but the detection time is 91.3 s, which makes it impractical for use in an intrusion detection system. Similarly, a support-vector machine (SVM) architecture has a detection time of 201 s, which is also considerably high. The detection time in the case of logistic regression (LR) is very low, at only 0.01 s, but the accuracy is 81.84%. The naïve Bayes (NB) classifier shows an accuracy of only 36.14% with a detection time of 0.21 s.
The results clearly show that in our approach, the random forest classifier gives better accuracy with less detection time, and the decision tree classifier gives almost the same accuracy with significantly less detection time.
As the RF model outperforms all others, the confusion matrix for the random forest model is shown in Figure 4. We can observe that Web Attack Brute-Force and Web Attack XSS have many false positives and false negatives in common. This is due to the high degree of similarity between these attacks. We have also calculated the accuracy over 2, 5, 7, and 10 folds of cross-validation for our top 3 models, i.e., decision tree, random forest, and ANN. The results of the average accuracy against each fold are shown in Figure 5. The results show that the decision tree and random forest models perform better with 10-fold cross-validation, and the ANN performs best with 2-fold cross-validation. However, after the analysis of the results of both holdout and cross-validation, it is clear that holdout validation gives better results. The CIC-IDS2017 dataset used in our experiments is a relatively large dataset with a substantial number of samples. This allowed us to split the dataset into separate training and testing sets, ensuring a sufficient amount of data for model training and evaluation. Additionally, our focus was primarily on reducing detection time without compromising accuracy, and holdout validation provided a straightforward and efficient way to assess these metrics. This allowed us to directly compare our results with existing benchmarks and demonstrate the effectiveness of our proposed multi-layered filtration framework (MLFF).
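The cross-validation comparison over 2, 5, 7, and 10 folds can be sketched with scikit-learn's `cross_val_score`. The synthetic data and the decision tree used below are assumptions for illustration; the numbers it prints are not the results reported in Figure 5.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (hypothetical); replace with the filtered feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

mean_accuracy = {}
for k in (2, 5, 7, 10):
    model = DecisionTreeClassifier(random_state=0)
    # For classifiers, an integer cv uses stratified k-fold splitting.
    scores = cross_val_score(model, X, y, cv=k, scoring="accuracy")
    mean_accuracy[k] = float(scores.mean())

for k, acc in mean_accuracy.items():
    print(f"{k}-fold mean accuracy: {acc:.4f}")
```

Each fold count retrains the model k times, so cross-validation costs roughly k times the training time of a single holdout run.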

Ablation Study
Our proposed multi-layered filtration framework consists of three essential filters: filter 1 (F1), filter 2 (F2), and filter 3 (F3). In this subsection, we carry out an ablation study to validate their effectiveness by removing or changing the order of these filters. F1 is a mandatory filter and cannot be removed because it is based on environmental parameters, as discussed earlier. However, F2 and F3 can be reordered or removed one by one. Table 8 shows a comparison of the results obtained from this process. For this ablation study, we only selected the random forest ML algorithm because the results in Table 7 clearly show that RF outperforms all other models with respect to accuracy and time. From Figure 6, we can see that each proposed filter plays an important part in reducing the number of features and improving the performance in terms of accuracy and detection time. Furthermore, it can be clearly observed that the best performance is achieved when all three filters are used together in the sequence proposed in the multi-layered filtration framework (MLFF).
Using the proposed framework, we managed to maintain a high detection rate while ensuring the minimum detection time by reducing the number of selected parameters. However, this approach has its own limitations; for example, in the case of high traffic volume, the system cannot operate at an adequate speed. Additionally, this study only focuses on the network attacks present in the CICIDS2017 dataset.

Conclusions
This paper proposed a multi-layered filtration framework (MLFF) for the efficient detection of network attacks. Using this framework, the number of features was systematically reduced without compromising the performance of the intrusion detection system. The main aim of this paper is to minimize the detection time without affecting the accuracy of the system. The proposed framework contains three filters that are connected in such a way that the output of the first filter becomes the input of the second filter and the output of the second filter becomes the input of the third filter. A total of 26 features were selected out of 83 features. The model was trained using only the selected features, and detection was performed on the test data. The accuracy, precision, recall, and F1-score, along with the training time and detection time, were calculated against the selected machine learning models. The results were compared with the available benchmarks and demonstrated a significant improvement after the implementation of the proposed framework.
Random forest (RF) produced 99.42% accuracy, and the detection time was calculated as 0.78 s. The results clearly showed that random forest outperformed all other models. In addition, the decision tree model and artificial neural networks also performed well. The experiment showed that the detection time was reduced significantly without compromising the accuracy of the system. Because we have managed to reduce the detection time, the proposed framework can be deployed in intrusion detection systems running in real networks.
To further enhance the accuracy and timing of the proposed multi-layered filtration framework (MLFF) for the efficient detection of network attacks, future research can focus on leveraging advanced datasets, e.g., CSE-CIC-IDS2018, to optimize the model. Firstly, expanding the scope of the attack spectrum by incorporating additional attacks from multiple layers would be a valuable direction. By including a wider range of attack types and techniques, the MLFF can improve its ability to detect sophisticated and multifaceted attacks.
Secondly, future work should explore the utilization of increased computing resources to boost the performance of the MLFF. More powerful hardware, such as high-performance servers or specialized processing units, can significantly accelerate the training and detection processes. This would result in reduced detection times and enable near-real-time analysis of network traffic data. Additionally, the integration of multiple detection frameworks into a centralized security information and event management system will improve the security ecosystem.

Data Availability Statement:
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: IDS