A Consolidated Decision Tree-Based Intrusion Detection System for Binary and Multiclass Imbalanced Datasets

The widespread adoption and growth of the Internet and mobile technologies have revolutionized our lives. At the same time, the world is witnessing, and suffering from, technologically aided crime. These threats, including but not limited to hacking and intrusions, are a principal concern for security experts, and the challenges of building effective intrusion detection methods remain closely tied to researchers' interests. This paper's main contribution is a host-based intrusion detection system that places a C4.5-based detector on top of the popular Consolidated Tree Construction (CTC) algorithm, which works efficiently in the presence of class-imbalanced data. An improved version of the random sampling mechanism, called Supervised Relative Random Sampling (SRRS), is proposed to generate a balanced sample from a highly class-imbalanced dataset at the detector's preprocessing stage. Moreover, an improved multi-class feature selection mechanism has been designed and developed as a filter component to extract the most informative features of the IDS datasets for efficient intrusion detection. The proposed IDS has been validated against state-of-the-art intrusion detection systems. The results show accuracies of 99.96% on the NSL-KDD dataset and 99.95% on the CICIDS2017 dataset using 34 features.


Introduction
Due to the extensive proliferation of network and communication devices in data-centric environments, managing security has become an utmost challenge for security experts. The challenge lies in the emergence of novel network threats that sneak into computing environments to compromise security policies and privacy, and even to lock down systems indefinitely. An Intrusion Detection System (IDS) plays a crucial role in countering incoming network threats before their harmful behavior begins. Intrusion detection consists of identifying malevolent activities in a host, which eventually propagate to other hosts over the network. The harmful behavior of these activities becomes visible once it starts affecting the target hosts. An efficient IDS acts as a second line of defense and comes into action when a firewall fails to detect a threat. The objective of an IDS is to analyze and detect such threats, for which an efficient detection model is essential. A new multi-objective optimization approach [16] plays a crucial role in efficient intrusion detection: bagging and boosting of multiple detection models, on top of features selected through Naïve Bayes (NB), detects intrusions with a detection rate of 92.7%. Similarly, an unsupervised machine-learning-based IDS [17] categorizes network traffic into normal and suspicious profiles without prior knowledge of attack events. The unsupervised approach is adaptive and has a distributed structure for intrusion detection, which is appealing compared to a centralized detection model.
Apart from NB and unsupervised learning, the decision tree is also widely used for designing IDSs. A Snort-based intrusion detection approach combined with a decision tree [18] has been designed for high-speed networks. The Snort detection model was trained and tested on three features of the ISCXIDS2012 dataset and revealed a detection accuracy of 99%. A C4.5 decision tree and a Multilayer Perceptron (MLP) were combined to form a hybrid detection model [19], which demonstrated 99.50% accuracy with a low false alarm rate of 0.03%. This performance is associated with the discernibility-function-based feature selection that the authors employed during the preprocessing stage. High-speed big-data networks have also driven researchers to design parallel machine-learning-based intrusion detection systems. XGBoost, a cutting-edge machine learning technique designed for big data, acts as an IDS [20] in a parallel computing environment. The XGBoost IDS achieves a detection rate of 99.60% and an accuracy of 99.65%, with a low false alarm rate of 0.302%. However, the system should be validated on other datasets to establish the true capability of the XGBoost-based IDS.
Several other binary intrusion detection models have been proposed. A Bayesian-network-based IDS using flow-based validation to detect network worms and brute-force attacks is proposed in [21]. The authors of [22] present a multilayer feedforward neural network in collaboration with a decision tree to detect P2P botnets. A bigram technique on top of Recursive Feature Addition (RFA) feature selection to detect stealthy, low-profile attacks is presented in [23].

Multiclass Detection Engines
A multi-class intrusion detection model provides more detailed attack information than a binary IDS: rather than merely flagging an instance as attack or benign, it identifies the specific attack category. Numerous authors have proposed variations of multi-class IDS. A multi-class IDS using an ensemble of Support Vector Machines (SVM) [24] detects four categories of attacks: R2L, U2R, DoS, and Probe. The SVM ensemble IDS shows a detection rate of 93.40% on the NSL-KDD dataset; however, despite this impressive detection rate, it suffers from a substantial false alarm rate of 14%. SVM has also been hybridized with a Genetic Algorithm (GA) [25] and with Multiple Criteria Linear Programming (MCLP) [26] for intrusion detection, where GA and MCLP extracted suitable features from the CICIDS2017 and NSL-KDD intrusion datasets, respectively. Both datasets are highly imbalanced, and CICIDS2017 contains a huge instance set representing up-to-date attack features; an appropriate sampling technique should therefore have been deployed to generate a suitable balanced sample, which is not made clear in [25]. Similarly, an updated version of SVM called Ramp Loss K-Support Vector Classification-Regression (Ramp-KSVCR) [27] has been proposed as an intrusion detector; it proved robust and intelligently handles imbalanced and skewed attack distributions, with the ramp loss function absorbing the noise present in the intrusion dataset. The Ramp-KSVCR detection model, however, is silent about any feature selection mechanism; adopting one could further improve its detection rate and accuracy. Another variation of SVM, the Least Squares Support Vector Machine (LSSVM) [28], acts as the detection engine and reveals an accuracy of 99.94% on features selected through a mutual-information-based feature selection mechanism.
The NB classifier also plays an imperative role in intrusion detection. NB-based IDS has been proposed to tackle HTTP attacks [29], where NB acts as both feature selector and intrusion detector. The NB detection model successfully achieved a 99.38% detection rate, 1% false-positive rate, and 0.25% false-negative rate on the NSL-KDD dataset.
Similar to supervised learning, unsupervised learning principles have been used extensively to design cutting-edge IDSs. Growing Hierarchical Self-Organizing Maps (GHSOMs), used as an unsupervised intrusion detection scheme [30], employ a multi-objective approach for extracting suitable features. The detector makes it possible to differentiate not only between normal and anomalous traffic but also among different anomalies. The GHSOM approach with multi-objective feature selection shows detection rates of up to 99.8% and 99.6% for normal and anomalous traffic, respectively, and accuracy values of up to 99.12%. Furthermore, an IDS approach is proposed [31] using a Modified Optimum-Path Forest (MOPF) together with K-means unsupervised learning. The K-means algorithm produces homogeneous training subsets from the original heterogeneous training samples, while the pruning module of MOPF uses the centrality and prestige concepts of social network analysis to find attack instances. The experiment was conducted on the NSL-KDD dataset, and the results reveal that the method is superior in terms of detection rate and false alarm rate.
Supervised and unsupervised techniques have also been combined to design intrusion detection engines. For instance, Non-symmetric Deep AutoEncoder (NDAE) and Random Forest classifiers [32] have been stacked on top of NDAE-based unsupervised feature learning. The stacked classifiers were implemented in Graphics Processing Unit (GPU)-enabled TensorFlow and evaluated using the benchmark KDD Cup '99 and NSL-KDD datasets. The proposed NDAE architecture [32] demonstrated high accuracy, precision, and recall, with reduced training time. Though the approach appears stable and accurate, the authors acknowledge that it is not perfect and that there is further room for improvement.

Materials and Methods
The proposed approach includes three broad logical modules: preprocessing, feature ranking and selection, and decision making. The issue of class imbalance is reduced in three stages across these modules [33]. Figure 1 presents the block diagram of the proposed framework.
Data preprocessing starts with removing duplicate instances and instances with missing values from the dataset on which the system will be trained. Once these are removed, related attack labels are merged into new class labels; forming these new attack labels reduces the class imbalance significantly. A supervised sampling approach is then proposed to generate class-wise samples, further improving the class imbalance of the IDS datasets. Finally, a suitable normalizer is applied to scale the dataset values into the range [0, 1].
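The preprocessing stage described above can be sketched as follows. This is a minimal illustration, not the paper's exact pipeline: the label column name and the use of pandas are assumptions, and min-max scaling stands in for whatever normalizer the authors used.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, label_col: str = "Label") -> pd.DataFrame:
    """Stage-1 preprocessing sketch: deduplicate, drop missing values,
    then min-max normalize every numeric feature into [0, 1]."""
    df = df.drop_duplicates().dropna()
    features = df.columns.drop(label_col)
    numeric = df[features].select_dtypes("number").columns
    lo, hi = df[numeric].min(), df[numeric].max()
    # Guard against constant columns (hi == lo) to avoid division by zero.
    df[numeric] = (df[numeric] - lo) / (hi - lo).replace(0, 1)
    return df
```

Attack-label merging and class-wise sampling would follow on the frame this returns.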
In the feature selection phase, a suitable feature selector is deployed to retrieve the essential features by eliminating redundant features of the dataset. In the final stage, an intelligent C4.5 classifier is deployed, which consolidates the training samples using CTC. The detailed procedure from dataset selection to intrusion detection is described as follows.


IDS Datasets
The preparation of data is critical for the training and testing of the IDS model. The candidate datasets NSLKDD [34], ISCXIDS2012 [35], and CICIDS2017 [36], provided by the Canadian Institute for Cybersecurity, are the basis of the proposed IDS. On the one hand, the NSLKDD and CICIDS2017 datasets are multiclass and contain benign and multi-attack instances. On the other hand, ISCXIDS2012 is a binary IDS dataset containing a mixture of benign and attack instances. These datasets contain normal traffic and the most recent frequent attacks, resembling a real-world network environment. They also hold a considerable number of instances and features, large enough to challenge any IDS. Therefore, these datasets can be considered reliable candidates for evaluating the actual performance of the proposed IDS architecture.
The system has been designed to select a required number of features with a reasonably small number of samples from these datasets for training and testing purposes. Before sampling and feature selection, duplicate instances were removed using Weka's unsupervised RemoveDuplicates filter, and only unique instances were considered for feature selection and sampling. Furthermore, a detector becomes biased toward majority classes if the dataset is highly class-imbalanced, and a reliable IDS detector must be prepared for such an adverse situation. The three datasets considered here are prone to high class imbalance.
The prevalence ratio of normal labels to attack labels is 51.882% to 48.118% for the NSLKDD dataset. Though this ratio seems convincing when normal instances are simply kept on one side and attack instances on the other, the picture is discouraging when individual attack labels are observed. There is a considerable gap between the majority class label (Normal) and the minority class labels (spy, udpstorm, worm, SQL attack). This prevalence gap between attack labels makes the dataset imbalanced. Combining a few attack labels into a newly formed label can mitigate the imbalance issue.
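The label-merging step can be expressed as a simple lookup table. The entries below are an illustrative partial mapping following the conventional DoS/Probe/R2L/U2R grouping of KDD-family attack names; they are not the paper's exact mapping from Table 1.

```python
# Illustrative (partial) relabeling map for NSL-KDD attack names.
RELABEL = {
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "portsweep": "Probe", "nmap": "Probe", "satan": "Probe",
    "guess_passwd": "R2L", "ftp_write": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
    "normal": "Normal",
}

def relabel(label: str) -> str:
    # Labels not covered by the map keep their original name.
    return RELABEL.get(label, label)
```

Applying such a map collapses dozens of rare attack names into a handful of broader classes, which is what improves the majority-to-minority prevalence ratio.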
In the ISCXIDS2012 dataset, normal and malicious instances are scattered across seven XML files. The data from those XML files were merged into a single CSV file for analyzing the characteristics of the whole dataset. One XML file, "TestbedThuJun17-1Flows.xml", was found to be corrupted at the source during the extraction process, so it was dropped from the analysis. The remaining data files of the ISCXIDS2012 dataset are so large that excluding "TestbedThuJun17-1Flows.xml" makes a negligible difference to the entire dataset and hence does not affect the detection process. ISCXIDS2012 is a highly class-imbalanced dataset: the majority class (Normal) has a 96.98% prevalence rate. Using the dataset directly may therefore bias the detection model toward the majority class, and an efficient sampling technique is needed to generate a balanced sample from this unbalanced dataset.
Finally, the most recent dataset, CICIDS2017, is considered. The dataset contains a mixture of the most up-to-date attacks and normal data, and it claims to fulfill all 11 criteria of an IDS dataset described by Gharib et al. [37]. By these design criteria, CICIDS2017 appears to be the most prominent dataset for evaluating the proposed IDS. Physical inspection shows that the dataset contains 3,119,345 records, of which 288,602 instances have missing class labels and 203 instances have missing values. These incomplete records were removed before conducting any further experiments, reducing the dataset to 2,830,540 distinct records. Furthermore, the dataset contains 15 attack labels and 83 features. A considerable class imbalance is also observed between the majority class and the other classes. If a detection model were created from the CICIDS2017 dataset directly, a false alarm might be generated for any incoming instance of the attack classes Heartbleed or Infiltration. Therefore, the dataset must be sampled in a balanced manner before training the IDS detector.
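The record counts above are easy to sanity-check; the cleanup arithmetic is:

```python
total = 3_119_345          # raw CICIDS2017 records reported in the text
missing_labels = 288_602   # instances without a class label
missing_values = 203       # instances with missing values
remaining = total - missing_labels - missing_values
print(remaining)  # 2830540 distinct records retained
```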
All the datasets NSLKDD, ISCXIDS2012, and CICIDS2017 are highly class imbalanced. Therefore, the challenge is to design a sampling model and detector, which can work efficiently on these imbalanced datasets.

Attack Relabeling
The class imbalance problem is widely cited in [11,38,39], and its countermeasures have been addressed elaborately in [40]. The problem lies more with multiclass intrusion datasets: numerous attack labels are found in such a dataset that need to be relabeled by merging two or more similar kinds of attacks, whether in terms of characteristics, features, or behaviors. Therefore, the minor class labels of the NSLKDD and CICIDS2017 multiclass intrusion datasets have been merged to form new class information.
The NSLKDD dataset contains 39 types of attack and benign instances. The normal labels have more than 51% occurrence, whereas many attacks have a very low prevalence rate of 0.001%. Various similar attack labels of the NSLKDD dataset have been merged to generate new attack labels to reduce such imbalances. The selection of new attack labels has been considered per the guideline provided in [41,42]. The newly formed attack labels are presented as follows.

•	Denial of Service (DoS): an attack in which the attacker makes a computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine. The various NSLKDD attacks that fall under this category have been merged into the DoS label.

Once the new attack labels are identified, the old labels are mapped onto them. The characteristics of the new attack labels in the NSLKDD dataset, with their prevalence rates, are presented in Table 1. The imbalance ratio of the newly created attack labels has improved significantly compared to the old attack labels: the prevalence ratio of the majority to the minority class becomes 51.88:0.17, far better than the earlier 51.88:0.001. Moreover, comparing the majority benign label (Normal) with the other attack labels, the imbalance ratio has also improved to a great extent.
The multiclass dataset CICIDS2017 has 15 different types of attack information. The normal label (Benign) has more than 83% occurrence, whereas many attacks have a very low prevalence rate of 0.00039%. To reduce such imbalances, various similar attack labels of this dataset have been merged to generate new attack labels. The selection of new attack labels follows the guideline provided by the publisher of the CICIDS2017 dataset. The newly formed attack labels and their characteristics are presented in Table 2.
The imbalance ratio of the newly created attack labels has improved significantly compared to the old attack labels of the CICIDS2017 dataset. The prevalence ratio of the majority to the minority class becomes 83.34%:0.001%, far better than the earlier 83.34%:0.00039%. Moreover, comparing the majority label (Normal) with the other attack labels, the imbalance ratio has also improved to a great extent.

Supervised Relative Random Sampling (SRRS)
A random sampling procedure is either probability sampling or nonprobability sampling in nature. In probability sampling, the probability of an object being included in the sample is defined by the researcher; in nonprobability sampling, there is no tactic for estimating the probability of an item being included in the sample. If the interest is to infer that a sample is in line with the findings of the original data, probability sampling is the better approach. Random sampling is popularly known as a probability sampling mechanism [43].
Random sampling ensures that each item of the original item set stands a chance of being selected in the sample. The n samples are selected tuple-by-tuple from an original dataset of size N through random numbers between 1 and N. Denoting the dataset of N tuples as F_in (the focusing input) and the desired sample as F_out (the focusing output), the random sampling procedure is represented in Algorithm 1. In this algorithm, sampling is done with replacement, i.e., each tuple has the same chance at each draw regardless of whether it has already been sampled. However, this kind of simple random sampling is purely unsupervised: for a highly class-imbalanced dataset, it does not guarantee that tuples of a specific class label will fall in the sample set. Observing the datasets considered here, especially CICIDS2017, the minority class contains only 36 tuples, whereas the majority class contains a vast volume of 2,359,087 tuples. In such a scenario, merely drawing a random sample will not retrieve a balanced sample consisting of instances of all the class labels. Therefore, a specialized sampling mechanism needs to be developed that guarantees all class labels an equal chance to participate in the sample space.
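The simple with-replacement procedure of Algorithm 1 can be sketched as follows (function and parameter names are illustrative, not the paper's notation):

```python
import random

def random_sample(f_in, n, seed=None):
    """Algorithm 1 sketch: draw n tuples from F_in with replacement;
    every tuple is equally likely at every draw."""
    rng = random.Random(seed)
    big_n = len(f_in)
    # Indices are drawn uniformly from the range 1..N described in the text
    # (0-based here), regardless of what has already been sampled.
    return [f_in[rng.randrange(big_n)] for _ in range(n)]
```

As the text notes, nothing here looks at class labels, which is exactly why a rare class can be missed entirely.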
Keeping this requirement in view, a supervised sampling technique has been designed that generates random samples for each class label of the dataset, so that each instance of each class label has an equal priority and probability of participating in the sample space. The proposed sampling algorithm generates a sample of each class by assigning a weight to each class label based on its frequency. The number of random samples of a class label generated at each iteration follows the allocated weight, and iteration continues until a sample of the specified size is generated. The allocated weight is relative and depends on the frequency of the class label in the current sample set: the higher the frequency, the lower the weight allocated. This strategy deliberately gives more weight to classes with low frequency. The detailed steps of SRRS are presented in Algorithm 2.
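A simplified interpretation of the SRRS idea is sketched below. This is not the paper's Algorithm 2: the weight formula and step sizes here are assumptions that only reproduce the stated principle that a class's draw weight shrinks as its share of the current sample grows.

```python
import random
from collections import Counter

def srrs(instances, labels, k, seed=0):
    """SRRS sketch: iteratively sample class-wise, giving larger draw
    weights to classes still under-represented in the current sample."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(instances, labels):
        by_class.setdefault(y, []).append(x)
    sample, counts = [], Counter()
    while len(sample) < k:
        total = len(sample) + 1
        for c, pool in by_class.items():
            # Relative weight: the more frequent a class already is,
            # the smaller its share of the next round of draws.
            w = 1.0 - counts[c] / total
            draws = max(1, int(w * max(1, (k - len(sample)) // len(by_class))))
            for _ in range(min(draws, k - len(sample))):
                sample.append((rng.choice(pool), c))
                counts[c] += 1
            if len(sample) >= k:
                break
    return sample
```

Even with a 99:1 class imbalance in the input, every class is guaranteed to appear in the output sample.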
The main logic behind sample generation is to generate class-wise random samples. The class-wise sample size is governed by a relative weight W_C[p], the desired sample weight for class number p, computed against stepS_c, the stepwise total number of instances over all classes. Once the desired weight is in hand, the random sampling algorithm (Algorithm 1) is called to obtain the required sample from each attack class's instances. It should be noted that sample generation holds to the principle k ≤ |F_out|. The proposed Supervised Relative Random Sampling (SRRS) has been validated on the NSLKDD, ISCXIDS2012, and CICIDS2017 datasets through the margin of sampling error.
The class imbalance of a class is measured as the ratio of the number of instances of that class to the total number of instances in the dataset. The margin of sampling error, on the other hand, is calculated through Yamane's formula, n = N / (1 + N·e²), where n is the required sample size, N is the total number of instances in the dataset, and e is the margin of error. Solving for e gives e = √((N − n) / (n·N)). The output of the SRRS algorithm is presented in Tables 3-5.
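The rearranged formula is easy to check numerically. Assuming, for illustration, a population of 125,973 NSL-KDD records (this population size is an assumption, not stated in the text), the sample sizes reported for SRRS reproduce the quoted error margins:

```python
import math

def yamane_margin_of_error(n, N):
    """Margin of error e recovered from Yamane's formula n = N / (1 + N*e^2)."""
    return math.sqrt((N - n) / (n * N))

# Hypothetical population size N = 125,973 for NSL-KDD (assumption).
for n in (19_080, 56_032, 87_312):
    print(round(yamane_margin_of_error(n, 125_973), 3))  # 0.007, 0.003, 0.002
```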

Algorithm 2 Supervised Relative Random Sampling (SRRS)
The SRRS algorithm performs consistently for all three datasets over varying sampling thresholds. The thresholds considered here are 20,000, 60,000, and 100,000. For the NSLKDD dataset, SRRS generates 19,080, 56,032, and 87,312 samples, respectively, leading to very low sampling errors of 0.007, 0.003, and 0.002. A similar performance outcome is found for the ISCXIDS2012 and CICIDS2017 datasets.

Furthermore, considering class prevalence, it is found that SRRS maintains a consistent prevalence ratio for all the attack labels. The improvement of prevalence (%) for all three datasets is summarized in Table 6.

Feature Ranking and Selection using IIFS-MC
Feature selection principles fall into three types [44]: wrapper-based, embedded, and filter-based. In wrapper-based feature selection, classifiers are used to generate feature subsets. In embedded methods, feature selection is built into the classifier itself, while filter methods analyze the properties of instances to rank features, followed by feature subset selection. In the ranking phase, the reputation of each feature is evaluated through weight allocation [45]. In the subset selection phase, only those ranked features are selected for which a classifier shows the highest accuracy [46-51], although features can also be chosen ignoring ranks [52]. In most cases, the subset selection procedure is supervised in nature.
Several variations of filter-based feature selection mechanisms are found in the literature, each with its own outcomes and limitations. IFS is one of the recent unsupervised filter-based feature selection schemes and has proved an excellent feature selector compared with traditional popular schemes such as the Fisher score [52], Relief [53], Mutual Information (MI) [49,54], and the Laplacian Score (LS) [55]. As a filter-based algorithm, the feature selection process in IFS [12] takes place in two steps: first, each feature of the underlying dataset is ranked in an unsupervised manner, and then the best m ranked features are selected through a cross-validation strategy. The distinguishing characteristic of IFS over peer FS schemes is that all the features participate in estimating each feature's weight. The idea is to construct an affinity graph over the feature set, where a subset of features is realized as a path connecting them. The detailed steps of IFS are outlined in Algorithm 3.

Begin
Step 1: Feature score initialization
Step 2: Building the graph
Step 3: Feature score matrix
Step 4: Calculate feature weights

Let F = {f_1, f_2, ..., f_c} denote the feature set, and let x represent a random set of samples from the instance set R, i.e., x ∈ R (where |x| = t). The target is to construct a fully connected graph G = (V, E), where V is the set of vertices, one per feature of sample x. The graph G is realized as an adjacency matrix A, whose weighted edges E encode the pairwise relations of the feature distributions. Each element a_ij of A (1 ≤ i, j ≤ t) represents a pairwise energy term and can be expressed as a weighted linear combination of the two features f_i and f_j, where σ_i and σ_j are the standard deviations of f_i and f_j, respectively.
Once the matrix A has been determined, the score of each feature can be estimated using the spectral radius ρ(A). The authors found considerable scope for improvement in the IFS algorithm. Equation (4) is the heart of IFS, where the correlation matrix C_ij is generated in an unsupervised manner. It should be noted that the correlations between features of intra-class instances are close to each other, whereas correlations between features across classes deviate hugely. Therefore, analyzing features with a per-class correlation matrix provides better insight than a single correlation matrix over all instances. Algorithm 3 can be applied to each class of the sample, and a weight matrix prepared containing the feature weights of all the classes, where the rows represent the classes and the columns the features. As a final step, the real weight of each feature is realized by averaging each column of the weight matrix. The improved version of IFS, named IIFS-MC, is presented in Algorithm 4. The idea behind IIFS-MC is to calculate feature weights based on the class information of the instances; these class-wise feature weights improve classification accuracy to an impressive level.
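The class-wise extension can be sketched in a few lines. Here `score_fn` is a placeholder for any per-class, unsupervised feature scorer (such as the IFS step); the name and shape conventions are illustrative, not the paper's Algorithm 4:

```python
import numpy as np

def iifs_mc(X, y, score_fn):
    """IIFS-MC sketch: run the unsupervised feature scorer once per class,
    stack the per-class weight rows into a (C, n) matrix, and average each
    column so every feature's final weight reflects all classes."""
    classes = np.unique(y)
    W = np.vstack([score_fn(X[y == c]) for c in classes])  # rows = classes
    return W.mean(axis=0)  # one averaged weight per feature
```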
As the feature weights are calculated class-wise, the complexity of this algorithm is on the order of C times that of IFS, i.e., O(C·(T·n² + n³)), where T is the number of samples, n is the number of initial features, and C is the number of classes.
The analysis of the proposed IIFS-MC follows the guideline provided in [12], where the mechanisms were analyzed over a variety of datasets. Unfortunately, the analysis in [12] missed standard intrusion detection datasets such as NSLKDD and CICIDS2017. Therefore, it was decided to analyze the FS mechanisms on the widely used NSLKDD, ISCXIDS2012, and CICIDS2017 datasets. In this regard, 5000 random samples of the NSLKDD dataset were generated using the proposed Supervised Relative Random Sampling (SRRS), consisting of a mixture of normal and intrusion instances.
Furthermore, six popular supervised classifiers, namely SVM, NB, Neural Network, Logistic Regression, C4.5, and Random Forest, have been analyzed to judge the performance of the FS mechanisms discussed in this section, along with the improved infinite multiclass feature selection scheme. The classification accuracy of these classifiers has been observed over varying feature sizes. Table 7 reflects the performance of SVM: its accuracy improves as the feature size changes.

Begin
Step 1: Feature score initialization
Step 2: Retrieve unique classes
Step 3: Calculate the feature weight matrix for all the classes
for i := 1 to c do
Step 3.1: Class-wise feature graph generation

Using five features of NSLKDD, SVM shows its highest accuracy of 88.237% when the features are selected using IIFS-MC. As the feature size increases, IFS markedly improves the classifier's accuracy, reaching 92.844%. Nevertheless, IIFS-MC consistently shows significantly better accuracy than all the other feature selection schemes across varying feature subsets.
A similar outcome is observed for IIFS-MC when classification is conducted with NB: adequate class information and class-wise feature weight calculation enable IIFS-MC to boost the accuracy of NB (Table 8). For Neural Network classification (Table 9), IIFS-MC again performs better than the other FS schemes, showing a distinct improvement over LSFS for almost all feature sizes; its accuracy is distinctly better only between 10 and 20 features, while for all other feature sizes IFS and IIFS-MC produce similar accuracy. The Logistic Regression results for all the FS schemes are presented in Table 10. Though IIFS-MC shows better accuracy than the other peer schemes, IFS achieves equivalent classification accuracy alongside it. Logistic Regression suffers with the original five features selected through MIFS, but the situation improves as the feature size increases, and with 30 features in hand MIFS brings Logistic Regression's performance on par with the other FS schemes. Similarly, with all the ranked features in hand, IFS, ReliefF, and IIFS-MC show better accuracy than the Fisher, MIFS, and LSFS schemes.
All the feature selection schemes show close accuracy rates for Naïve Bayes and function-based classifiers. However, the decision tree shows a distinct result and outperforms the other classifiers (Table 11). According to Table 11, IIFS-MC shows better accuracy for small feature subsets; as the number of features increases, the accuracy of C4.5 becomes close for all the feature selectors.
The Random Forest also reveals a similar accuracy rate for all the FS schemes except the Fisher score method. Random Forest's accuracy improves with the Fisher score, which was not visible earlier in the case of other decision trees (Table 12). Furthermore, up to the 20th feature, a close accuracy is observed between the IFS and IIFS-MC approaches. From the 20th to the 30th feature, Random Forest's accuracy shifts to a better position with IIFS-MC. However, all the feature selectors show equivalent results once the 37th feature of the NSL-KDD dataset is reached.
While analyzing the accuracy of supervised classifiers with various feature selection schemes, the following broad inferences have been observed.
(i) The improvised version of the IFS scheme ranks the features in a way that boosts the supervised classifiers' accuracy to the maximum extent possible. (ii) Moreover, from the 20th feature onwards, the supervised classifiers show accuracy similar to that achieved with the whole set of features. Therefore, 20 features of the NSL-KDD dataset are sufficient to achieve an accuracy level similar to the original feature set.
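Inference (ii), choosing the smallest feature count whose accuracy matches the full feature set, can be sketched as follows. The accuracy curve below is purely illustrative and is not taken from the paper's tables:

```python
def smallest_plateau_size(accuracies, tolerance=0.005):
    """Return the smallest feature count whose accuracy is within
    `tolerance` of the accuracy obtained with all ranked features."""
    full = accuracies[max(accuracies)]  # accuracy with every feature
    for k in sorted(accuracies):
        if full - accuracies[k] <= tolerance:
            return k
    return max(accuracies)

# Hypothetical accuracy-vs-feature-count curve (illustrative values only)
curve = {5: 0.882, 10: 0.921, 15: 0.942, 20: 0.958, 30: 0.960, 37: 0.961}
print(smallest_plateau_size(curve))  # 20: first size within 0.5% of the full set
```

With such a curve, 20 features would be selected as the subset size, mirroring the observation made for the NSL-KDD dataset.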
In this way, it has been observed that the 20 top-ranked features of the NSL-KDD dataset provide optimum detection results for a variety of supervised classifiers. Therefore, out of all the ranked features of NSL-KDD, the top 20 features are considered as the feature subset. All the ranked features of the NSL-KDD dataset are outlined in Table 13. A similar analysis was also conducted on the ISCXIDS2012 and CICIDS2017 datasets, and the feature ranks for these two datasets are outlined in Tables 14 and 15, respectively. Likewise, by observing the drift in accuracy of the various classifiers in line with inference (ii), feature subsets of the NSL-KDD, ISCXIDS2012, and CICIDS2017 datasets were generated, which are taken into account to improve the performance of the IDS detector in the subsequent stages of detection. The ideal feature subsets of the IDS datasets are presented in Tables 16-18. It should be noted that, before the feature ranking and subset selection process, all identification attributes, such as source and destination IP address, protocol name, and system name, were removed from the datasets, because the feature selection technique used here is designed to work on numerical features only. Once the required number of features was selected, the training and testing data were extracted from the samples. To achieve an unbiased experiment, both train and test data were selected from the samples randomly in such a way that Tr ∩ Ts = ∅, where Tr represents the training instances and Ts represents the testing instances. In this case, 66% of the sample has been used for training and 34% for testing [56,57] the proposed detection model. The generated training and test samples used to train and test the IDS detection engine are presented in Table 19.
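The disjoint 66%/34% split described above (Tr ∩ Ts = ∅) can be sketched with a minimal stdlib implementation; the function name and seed are illustrative, not from the paper:

```python
import random

def train_test_split(instances, train_fraction=0.66, seed=42):
    """Randomly partition instances into disjoint train/test sets
    so that Tr and Ts share no instance, as required for an unbiased test."""
    rng = random.Random(seed)
    indices = list(range(len(instances)))
    rng.shuffle(indices)
    cut = int(len(instances) * train_fraction)
    train = [instances[i] for i in indices[:cut]]
    test = [instances[i] for i in indices[cut:]]
    return train, test

sample = list(range(100))
train, test = train_test_split(sample)
print(len(train), len(test))             # 66 34
print(set(train).isdisjoint(set(test)))  # True
```

Because the partition is drawn from shuffled indices, every instance lands in exactly one of the two sets, which is precisely the Tr ∩ Ts = ∅ condition.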

IDS Detector
J48Consolidated is a C4.5 supervised classifier based on the CTC [14,15,58] algorithm, designed to counter the class imbalance problem. Instead of building one classifier per subsample, CTC builds a single decision tree from several subsamples [15]. The CTC procedure used in J48Consolidated is described in Algorithm 5.

Algorithm 5 CTC of J48Consolidated
The algorithm attracts researchers for its inherent ability to be trained on class-imbalanced datasets. Initially, the CTC-based classifier was used in car insurance fraud detection [58]. From an architectural point of view, the technique of J48Consolidated is fundamentally different from boosting and bagging. Only one tree is built, and agreement is achieved at each step of the tree-building process; the different subsamples are used to select the feature that ultimately splits the current node. The information gain ratio criterion, the Gini index, or χ² (CHAID) is used as the split function during the tree-building process. The splitting decision of the tree is achieved through a node-by-node voting process. The resampling methodology [15] undertaken by the CTC classifier helps to achieve the notion of coverage, that is, the class-wise lowest number of sample instances from training data with a different class distribution is considered to identify the number of subsamples required. Therefore, the class distribution, the type of subsample, and the chosen coverage value jointly determine the number of subsamples to be selected. The number of subsamples to be generated is directly proportional to the degree of class imbalance in the dataset. Subsequently, a consolidated tree is built following the same principle as a C4.5 decision tree.
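The information gain ratio named above as one of CTC's split functions can be sketched for a nominal feature with a short stdlib implementation; this is a generic C4.5-style criterion for illustration, not the J48Consolidated source code:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """C4.5 split criterion: information gain normalised by split information."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(part) / n * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    split_info = entropy(values)  # entropy of the partition sizes
    return gain / split_info if split_info > 0 else 0.0

# Toy example: a binary feature that separates the two classes perfectly
feature = ["a", "a", "b", "b"]
labels = ["attack", "attack", "normal", "normal"]
print(gain_ratio(feature, labels))  # 1.0
```

Normalising by the split information penalises features with many distinct values, which is why C4.5 prefers the gain ratio over raw information gain.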
J48Consolidated is built upon the CTC algorithm described in Algorithm 5 and employs the C4.5 classification algorithm to classify test instances. The CTC algorithm resamples the data into a balanced form and classifies it using the C4.5 decision tree, so the detection mechanism remains stable in the presence of highly class-imbalanced training data. This unique feature makes J48Consolidated best suited as the base detector in the proposed IDS scenario.
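The coverage-driven choice of the number of balanced subsamples can be illustrated with a small sketch. The formula used here (how many minority-sized draws are needed so that the requested fraction of the majority class appears in at least one subsample) is a simplifying assumption for illustration, not the exact CTC rule:

```python
import math
import random
from collections import defaultdict

def balanced_subsamples(instances, labels, coverage=0.99, seed=7):
    """Draw class-balanced subsamples sized by the minority class until the
    requested fraction of the majority class has likely been covered.
    Simplified sketch of CTC-style resampling (assumed formula)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(instances, labels):
        by_class[y].append(x)
    minority = min(len(v) for v in by_class.values())
    majority = max(len(v) for v in by_class.values())
    # More imbalance -> more subsamples needed to cover the majority class:
    # (1 - minority/majority)^n <= 1 - coverage, solved for n.
    n_samples = math.ceil(math.log(1 - coverage) / math.log(1 - minority / majority))
    subsamples = []
    for _ in range(n_samples):
        sample = []
        for y, pool in by_class.items():
            sample += [(x, y) for x in rng.sample(pool, minority)]
        subsamples.append(sample)
    return subsamples

X = list(range(110))
y = ["attack"] * 10 + ["normal"] * 100  # 10:1 class imbalance
subs = balanced_subsamples(X, y)
print(len(subs), len(subs[0]))  # many subsamples, each perfectly balanced
```

The sketch makes the proportionality stated above concrete: a 10:1 imbalance already demands dozens of balanced subsamples, and the count grows as the imbalance worsens.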

Results and Discussion
In Section 3, the proposed SRRS algorithm has been used to generate class-wise true random samples from the NSL-KDD, ISCXIDS2012, and CICIDS2017 datasets. Furthermore, IIFS-MC has been used with the samples to rank features and to generate feature subsets. In this section, to validate the proposed model, both the feature subset and all the features (as per the ranking given by IIFS-MC) have been considered separately for each dataset. The outcome of the proposed system is described in the following sections.

Performance of Proposed IDS on NSL-KDD Dataset
When the proposed IDS model is validated on the NSL-KDD dataset separately using the feature subset (20 features) and all the ranked features generated by IIFS-MC, it reveals decent detection output. For the best 20 features obtained from the NSL-KDD dataset, the proposed CTC detector's overall performance remains as consistent as the performance of the same detector on all features. The performance of the proposed model combining CTC, IIFS, and SRRS is outlined in Table 20, and the detection output is depicted in Figures 2 and 3. From the overall performance outcomes in Table 20, it can be seen that the IDS detection engine has an impressive accuracy and detection rate of 99.9562% with a low misclassification rate of 0.0438%. Out of 29,686 testing instances, the proposed model cannot detect the attack labels of 13 instances correctly, which is considered very low in the field of intrusion detection. The model also consumes very little training and testing time, 11.8 and 0.25 s, respectively, because of the smaller number of features. Similarly, the model reveals a very low FPR and FNR of 0.0004. Extending the validation process on the NSL-KDD dataset, the entire set of features of the NSL-KDD sample, arranged according to the rank given by IIFS-MC, has been used for training and testing. In this case, a slightly better overall accuracy of 99.9629% has been achieved, but at the cost of a higher model build time of 19.41 s. It should be noted that the average testing time for each instance is 0.07 s due to the additional feature information. Again, the proposed model achieves a significantly low misclassification rate of 0.0371%.
Comparing the performance of the proposed model for both the 20 IIFS-MC features and all features, the detection accuracy of the detector was almost the same, at approximately 99.96%. The false-positive and false-negative rates also remain the same in both cases. This shows that the detector remains stable even with only 20 features. On the other hand, the detector takes a reasonable amount of testing time per instance when all the features ranked by IIFS-MC are used for training. Similarly, visualizing the detection output of the model on the NSL-KDD dataset separately for the 20 prominent features and all the ranked features, the classification and misclassification output appears promising. In both cases, the detector swiftly detects the intrusion events. However, in very few cases the model struggles to detect the intrusion, which is the main reason behind the FPR and FNR of 0.0004. Out of all incoming attacks, the probe attacks are detected brilliantly by the model.
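The overall measures quoted above follow directly from the binary confusion counts. As a sketch (the tp/tn/fp/fn split of the 13 errors below is assumed for illustration; the paper reports only the totals):

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard IDS measures computed from binary confusion counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "detection_rate": tp / (tp + fn),  # a.k.a. recall / TPR
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
    }

# Illustrative counts consistent with 13 errors out of 29,686 test instances
m = detection_metrics(tp=14836, tn=14837, fp=6, fn=7)
print(round(m["misclassification_rate"] * 100, 4))  # 0.0438 (percent)
```

This reproduces the 0.0438% misclassification rate reported for the 20-feature NSL-KDD setting from its raw error count.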

Performance of Proposed IDS on ISCXIDS2012 Dataset
Following the same guidelines as for the NSL-KDD dataset, the proposed IDS model has also been validated on the ISCXIDS2012 dataset, separately using the feature subset (3 features) and the features ranked according to the weights generated by IIFS-MC. The performance outcomes for this dataset are recorded in Table 21, whereas the detection output is depicted in Figure 4 (for the best 3 features) and Figure 5 (for all the ranked features). It should be noted that, for the ISCXIDS2012 dataset, the proposed SRRS algorithm randomly generates 87,906 instances as training and testing instances; the ratio of training to testing instances remains the same at 66% and 34%, respectively. Only the three features provided by IIFS-MC have been selected to build the detection model. For 29,888 testing instances, a total of 162 misclassified instances is generated, producing a false positive rate and misclassification rate of 0.0054 and 0.5420%, respectively. Furthermore, the model's training time lies at 6.06 s, and the testing time is 0.06 s. The overall accuracy and detection rate of the system are consistent at 99.4580%. It should be noted that the proposed system can detect the underlying attacks with such an appealing detection rate while considering only three features (Table 21 and Figure 4). The MAE and RMSE generated by the system are 0.0083 and 0.0719, respectively; the RAE and RRSE of the proposed model are 1.6552 and 14.3441, respectively. While considering all the features, it is observed that the performance of the detection model improves significantly: the model generates only 19 false positives and 19 false negatives, with an improved accuracy of 99.9364% and a low misclassification rate of 0.0636%.
Even with the additional features, the training time remains low at 5.08 s, and the testing time per instance was recorded as 0.05 s (Figure 5). One unique observation in the case of the ISCXIDS2012 dataset is that the model shows distinguished detection results with a higher number of features. In other words, the model shows superior results when all the features, ranked as per IIFS-MC, are used for the ISCXIDS2012 dataset. This suggests that feature subsets based on IIFS-MC feature selection are not advantageous in the binary detection scenario. The visualization of the CTC IDS model shows output in line with Table 21: the detected and undetected attacks and normal instances are shown in Figures 4 and 5. It can be seen that, with all the ranked features in the binary attack environment, the detector identifies almost all the attacks, leaving few false alarms.
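The error measures reported above (MAE, RMSE, RAE, RRSE) are the standard Weka-style evaluation statistics, where the relative errors compare the model against a mean-predicting baseline. A sketch with illustrative values, not the paper's data:

```python
import math

def regression_style_errors(actual, predicted):
    """MAE, RMSE, and relative errors (RAE, RRSE in percent) against a
    mean-predicting baseline, as reported by Weka-style evaluators."""
    n = len(actual)
    mean = sum(actual) / n
    abs_err = sum(abs(a - p) for a, p in zip(actual, predicted))
    sq_err = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    abs_base = sum(abs(a - mean) for a in actual)
    sq_base = sum((a - mean) ** 2 for a in actual)
    return {
        "MAE": abs_err / n,
        "RMSE": math.sqrt(sq_err / n),
        "RAE": 100 * abs_err / abs_base,
        "RRSE": 100 * math.sqrt(sq_err / sq_base),
    }

# Toy 0/1 class labels with predicted probabilities (illustrative only)
actual = [1, 1, 0, 0, 1, 0]
predicted = [0.9, 1.0, 0.1, 0.0, 0.8, 0.2]
errs = regression_style_errors(actual, predicted)
print(round(errs["MAE"], 3), round(errs["RMSE"], 3))  # 0.1 0.129
```

Low RAE/RRSE values, such as the 1.6552 and 14.3441 reported for ISCXIDS2012, indicate the detector's errors are far smaller than those of the trivial baseline.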

Performance of the Proposed IDS on CICIDS2017 Dataset
In this section, the recent CICIDS2017 dataset is taken into consideration for validating the proposed model. It is interesting to see the proposed model's performance, as this dataset is highly class-imbalanced in nature compared to the datasets considered previously. The evaluation procedure followed for NSL-KDD and ISCXIDS2012 is also followed for the CICIDS2017 dataset. This dataset's features have been ranked, and 34 optimum features having no similarity with each other have been retrieved. When the proposed IDS model is validated on the CICIDS2017 dataset, separately using the feature subset (34 features) and the ranking of all the features generated by IIFS-MC, the performance outcomes observed are listed in Table 22 and visualized in Figures 6 and 7, respectively.
By observing the proposed detector's overall performance, it is realized that the IDS detection engine has an attractive accuracy and detection rate of 99.9552% with a low misclassification rate of 0.0448%. Out of 31,222 testing instances, the proposed model cannot detect the attack labels of 14 instances correctly, which again proves to be very low. The model also consumes a low testing time of 0.41 s with 34 features. It is clearly observed that the model's performance boosts quickly even with a small number of features under the adverse class-imbalance condition. The proposed model also generates an MAE of 0.003. Graphically, the detected and undetected instances of the CICIDS2017 testing sample can be seen in Figure 6. The figure shows that almost all attack instances are detected correctly, leaving only 14 instances, which leads to a small misclassification rate of 0.0448%. Extending the validation process on samples of the CICIDS2017 dataset using all the features ordered as per their ranks, it is observed that the performance of the model decreases slightly. The overall accuracy was found to be 99.9488%, with a misclassification rate of 0.0512%.


Analysis of the Proposed Model with Existing IDSs
The proposed IDS model performs to a great extent on all three datasets. However, the model alone cannot claim to be a good IDS model unless it is compared with existing detection models in the literature. Therefore, the proposed intrusion detection approach is compared with the existing intrusion detectors described in the literature review section. As the proposed IDS model has been validated across three datasets, it is essential to compare and analyze the model with existing works based on those datasets. Furthermore, different researchers evaluated their models using a variety of performance measures; only those parameters that are most commonly used by existing IDSs are considered for comparison.
The output of the proposed model is compared with 12 existing IDS models for the NSL-KDD dataset. The performance measures used for comparison are detection rate, false-positive rate, and accuracy (Table 23).
Several inferences have been deduced while comparing the proposed model on samples of the NSL-KDD dataset: (i) the proposed model leads the IDS model pool with the highest accuracy and detection rate of 99.9629%; (ii) the proposed model proves to be the best by revealing the lowest false alarm rate of 0.004%; (iii) the DLANID+FAL model performs very poorly in the IDS pool, with a low detection rate and accuracy of 85.42%, while generating false alarms at a rate of 14.58%; (iv) the reason behind the poor performance of DLANID+FAL is that the model is based on 13 attack labels, where the class imbalance ratio is very poor. At the second stage of the analysis, the proposed IDS is compared with 11 existing state-of-the-art intrusion detection models. The models taken for comparison are recent and well validated on ISCXIDS2012. The performance outcomes of these models, along with the proposed IDS, are tabulated in Table 24. The inferences observed through the comparison are as follows. Finally, on the CICIDS2017 dataset, the proposed IDS is compared with three existing cutting-edge intrusion detection models. These models are based on the CICIDS2017 dataset; hence, they are good candidates for comparison with the proposed IDS. As the CICIDS2017 dataset is very recent, the detection models taken for comparison were also developed recently; they are the only three intrusion detection systems published at the time of writing this paper. The reported outcomes of those models are silent about the detection rate; therefore, the False Negative Rate (FNR) is considered in place of the detection rate when comparing the proposed detector. The performance outcomes of these detection models and the proposed IDS are tabulated in Table 25. In this case as well, the proposed work performs well ahead of the GA + SVM, MI + SVM, and SVM intrusion detection models.
The proposed detection model successfully achieves the highest accuracy and the lowest equal amount of false-positive and false-negative rates. By just considering 34 features, the proposed model detects the underlying threats more efficiently than using all the features.
We compared our IDS approach with many other supervised and unsupervised approaches, including decision-tree and Bayes-oriented approaches, and found that the proposed approach shows significantly better detection results. For instance, the proposed approach shows 0.5% better detection accuracy than the Logitboost + RF [59] decision tree approach on the NSL-KDD dataset, and 0.93% and 0.7% better than the DT + SNORT [18] and AMNN + CART [22] decision tree approaches, respectively, on the ISCXIDS2012 dataset. As observed earlier, the class-imbalance issue affects both the NSL-KDD and ISCXIDS2012 datasets, which is the main reason the Logitboost + RF [59] decision tree approach slightly lags in detecting attacks. In our approach, on the other hand, the class-imbalance issue is addressed in two stages. First, it is addressed through the SRRS down-sampling scheme, where attack-wise random samples are arranged. Second, the J48Consolidated scheme generates synthetic samples for attacks, keeping in view the majority attack instances. In this way, the proposed IDS obtains balanced samples for training the model, which improves the overall detection result. Another aspect is that the Logitboost + RF [59] approach used all the features of the NSL-KDD dataset, compared to the 20 features of our proposed approach. This makes our proposed approach the better choice when it comes to handling intrusions. Moreover, the proposed IDS also outperformed the other state-of-the-art approaches presented in this decade. Therefore, in both multi-attack and binary attack scenarios, the proposed approach shows reasonably better detection results than other intrusion detection approaches.

Analysis of the Proposed Model across Datasets
The proposed IDS model, considering the feature subset and the feature ranking suggested by IIFS-MC, performs consistently well on all three high-class-imbalance datasets. Furthermore, a comparison of the proposed IDS with existing models has also been conducted, and in that comparison the proposed IDS again performs consistently well over the existing models. In this section, the proposed IDS is analyzed to come across the best setting specific to each dataset, with more emphasis on the errors generated by the detector along with the training and testing time. Figure 8 shows the errors of the proposed model for the three datasets. It is observed that the model generates a very low amount of error on the CICIDS2017 dataset; it is advisable to use 34 features to detect all the attacks most precisely, as this setting reveals the least error. Regarding training and testing time, the following inferences are drawn: (i) the model works best with the ISCXIDS2012 dataset, on which the system is quickly trained and detects the attacks; (ii) the system will be fast if deployed considering all the features of the ISCXIDS2012 dataset, both for training and detecting; (iii) the system is fast in a binary attack scenario.
Finally, the proposed system has been tested in terms of overall accuracy and false-positive rate. The outcome is depicted in Figure 10, and the following inferences are outlined: (i) NSL-KDD is the ideal dataset for building the intrusion detection model, as it exhibits the significantly highest accuracy; (ii) if the NSL-KDD dataset is used, the system should be trained considering all the features; (iii) on the other hand, if the CICIDS2017 dataset is used, the system should be trained considering the 34 features generated by IIFS-MC, because the proposed system shows its highest accuracy under this feature set; (iv) the proposed model works brilliantly with the multiclass datasets (NSL-KDD, CICIDS2017).

Analysis of the Proposed Model Specific to Attacks in Datasets
The proposed model is suitable for the NSL-KDD multiclass dataset. In this subsection, an attack-specific comparison is presented: the proposed IDS performance outcomes for various attacks are analyzed to identify the specific attacks for which the system works best, so that future researchers can design attack-specific detection engines. It should be noted that both NSL-KDD and CICIDS2017 are multiclass datasets containing a variety of attacks; it is therefore relevant to consider these two datasets for an attack-specific comparison, while ISCXIDS2012, being a binary dataset, is ignored in this analysis. The attack-wise results of the proposed model are shown in Tables 26 and 27. Considering the 20-feature NSL-KDD dataset, it is observed that the proposed model works well for R2L attacks, with 100% accuracy, detection rate, and precision. The Probe attacks are also detected considerably well, with an accuracy and detection rate of 99.9899% and 100%, respectively. Traditional performance measures such as accuracy, detection rate, and precision are not enough to understand the real performance of a detection model built on a highly class-imbalanced dataset. Therefore, the ROC curves of the NSL-KDD dataset's attacks have been analyzed to observe the performance of the proposed IDS. The AUC value of the ROC curve for R2L attacks proves that the R2L attacks are detected well considering 20 features of the NSL-KDD dataset.
Similarly, when all the features are used to build the detection model, it is observed that the U2R attacks are nicely detected with 100% accuracy, detection rate, and precision. The ROC curve of U2R attacks also supports the claim. The AUC value of U2R lies at 1, indicating the IDS detector is a perfect detector for U2R attacks.
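The AUC interpretation used here can be made concrete with a short sketch: given ROC points as (FPR, TPR) pairs, the area under the curve via the trapezoidal rule reaches 1 for a perfect detector, as observed for the U2R attacks:

```python
def auc_trapezoid(roc_points):
    """Area under a ROC curve given (FPR, TPR) points, trapezoidal rule.
    An AUC of 1 indicates a perfect detector."""
    pts = sorted(roc_points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# A perfect detector's ROC: straight up to TPR = 1 at FPR = 0, then across
perfect = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(auc_trapezoid(perfect))  # 1.0
```

By contrast, a random detector's ROC is the diagonal from (0, 0) to (1, 1), giving an AUC of 0.5, which is why AUC values near 1 are strong evidence of detection quality on imbalanced data.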
In the CICIDS2017 dataset, when the proposed model is built upon 34 features, it correctly detects attacks such as BruteForce, Infiltration, BotnetARES, and WebAttack. The model on 34 features also detects other attacks, such as DoS/DDoS and PortScan, brilliantly, with over 99% accuracy. In a nutshell, if the target is to detect BruteForce, Infiltration, BotnetARES, and WebAttack attacks, the proposed IDS model is ideally suited and hence can be trained on 34 features.
The AUC values for BruteForce, Infiltration, BotnetARES, and WebAttack obtained by the proposed model using 34 features of the CICIDS2017 dataset also justify the inference about the model for these attacks. Considering all the features of CICIDS2017, the model is not as convincing as with 34 features, because BruteForce and WebAttack are detected with lower accuracy using all the features. Overall, although the model seems efficient for the CICIDS2017 dataset considering all the dataset's features, it is advisable to consider only the 34 stated features to achieve better accuracy for the maximum number of attacks.
The detection of new cyberattacks and the discovery of system intrusions can be automated to predict future intrusion patterns based on machine learning methods that can be tested in available historical datasets [61]. Future cyber-security research must focus on the development of novel automated methods of cyber-attack detection. Furthermore, machine learning methods must be used to automatically classify malicious trends and predict future cyber-attacks for enhanced cyber defense systems. These systems can support police officers' decision-making and enable prompt response to cyber-attacks, and, consequently, provide an enhanced response to cyber-crimes.

Conclusions
This paper validates the proposed IDS on the NSL-KDD, ISCXIDS2012, and CICIDS2017 datasets. A C4.5-based algorithm built on top of the CTC algorithm has been deployed to detect attacks quickly and efficiently. The model has been validated separately on the selected feature subset and on all the features, ordered by the rank generated by IIFS-MC. The highest accuracy of 99.96% has been achieved on the NSL-KDD dataset with all the features, and 99.95% on the CICIDS2017 dataset with only 34 features. The proposed model is best suited to binary-class datasets; however, it also shows promising detection and classification accuracy in a multiclass environment. The work further provides insight into choosing the best dataset for the model, and the NSL-KDD dataset has been identified as the best fit for the proposed model.
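The core idea summarized above, learning C4.5-style (entropy-based) trees from several balanced subsamples of an imbalanced dataset, can be approximated as follows. True CTC consolidates split decisions node by node, which scikit-learn does not expose, so this sketch falls back to majority voting over trees; the balanced-subsample step is only a rough analogue of SRRS. All names and data are illustrative.

```python
# Hedged sketch: entropy-based trees trained on balanced subsamples of an
# imbalanced dataset, combined by majority vote. This approximates the
# CTC-on-SRRS-samples idea; it is NOT the paper's exact algorithm.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def balanced_subsample(X, y, rng):
    """Draw an equal number of instances from each class (SRRS-like step)."""
    labels, counts = np.unique(y, return_counts=True)
    n = counts.min()
    idx = np.concatenate([rng.choice(np.flatnonzero(y == c), n, replace=False)
                          for c in labels])
    return X[idx], y[idx]

def fit_vote_ensemble(X, y, n_trees=5, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        Xs, ys = balanced_subsample(X, y, rng)
        # criterion="entropy" mirrors C4.5's information-gain splitting.
        trees.append(DecisionTreeClassifier(criterion="entropy",
                                            random_state=seed).fit(Xs, ys))
    return trees

def predict_vote(trees, X):
    votes = np.stack([t.predict(X) for t in trees])
    # Majority vote across trees for each instance.
    return np.array([np.bincount(col).argmax() for col in votes.T])

# Toy imbalanced binary data (900 benign vs. 100 attack instances).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (900, 5)), rng.normal(3, 1, (100, 5))])
y = np.array([0] * 900 + [1] * 100)
trees = fit_vote_ensemble(X, y)
print("accuracy:", (predict_vote(trees, X) == y).mean())
```

Voting over trees built on distinct balanced subsamples keeps the minority (attack) class visible to every tree, which is the property that makes the consolidated approach robust to class imbalance.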
A detailed per-attack performance analysis of the proposed IDS reveals that an attack-specific IDS provides a better detection rate and classification accuracy than an IDS trained on all attack instances. The proposed model was also compared and validated against recent state-of-the-art intrusion detection systems, separately for each dataset. In these comparisons, the proposed IDS achieves the highest detection rate and accuracy.
The proposed method has limitations that can be addressed to further improve the detection process. The proposed IDS lacks a feedback mechanism, which could be incorporated to make the system more dynamic; such feedback would help the administrative host isolate a malicious host from the main network. Moreover, the proposed system is a standalone signature-based system, which could be combined with an anomaly detection engine to improve the detection rate. Furthermore, attack correlation strategies could be implemented to understand the severity of attacks, helping security managers take preventive steps. It should be noted that the proposed system has been trained and tested on samples of the two multiclass IDS datasets, where each sample contains a mixture of normal and various attack instances.
However, it is observed that the SRRS sampling algorithm generates a perfectly balanced sample for the binary ISCXIDS2012 dataset. Therefore, instead of generating one sample mixing all attack types with benign instances, the sampling can be applied to mixtures of benign and specific attack instances, generating a binary sample set for each attack class. A corresponding IDS engine can then be built for each sample set of benign and specific attack instances. Incoming test instances are passed through all these engines, so that an attack is detected by at least one detector somewhere in the detection process, which is expected to reduce the detection time to a certain level.
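The one-engine-per-attack-class direction proposed above can be sketched as follows: one benign-vs-attack binary detector is trained per attack class, and a test instance is flagged as soon as any engine fires. The function names, the stand-in classifier, and the toy data are assumptions for illustration only.

```python
# Hedged sketch of the proposed future direction: one binary detector per
# attack class (benign vs. that attack), with a test instance reported as
# the first attack class whose engine fires. Names are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def build_binary_engines(X, y, benign_label=0, seed=0):
    """Train one benign-vs-attack detector per attack class."""
    engines = {}
    benign = y == benign_label
    for attack in np.unique(y[~benign]):
        mask = benign | (y == attack)
        yb = (y[mask] == attack).astype(int)   # 1 = this attack, 0 = benign
        engines[attack] = DecisionTreeClassifier(random_state=seed).fit(X[mask], yb)
    return engines

def detect(engines, x):
    """Pass an instance through every engine; report the first detection."""
    for attack, clf in engines.items():
        if clf.predict(x.reshape(1, -1))[0] == 1:
            return attack
    return None                                # no engine fired: benign

# Toy data: benign (label 0) plus two well-separated attack classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (100, 4)),
               rng.normal(5, 0.1, (100, 4)),
               rng.normal(-5, 0.1, (100, 4))])
y = np.array([0] * 100 + [1] * 100 + [2] * 100)
engines = build_binary_engines(X, y)
print(detect(engines, X[150]), detect(engines, X[0]))
```

Because each engine sees only benign traffic and one attack class, each training set is naturally binary and can be balanced directly, which is the property of SRRS that motivates this design. The engines are also independent, so in principle they can be evaluated in parallel rather than sequentially.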