Detection of Unknown DDoS Attacks with Deep Learning and Gaussian Mixture Model

: DDoS (Distributed Denial of Service) attacks have become a pressing threat to the security and integrity of computer networks and information systems, which are indispensable infrastructures of modern times. The detection of DDoS attacks is a challenging issue before any mitigation measures can be taken. ML/DL (Machine Learning/Deep Learning) has been applied to the detection of DDoS attacks with satisfactory achievement. However, full-scale success is still beyond reach due to an inherent problem with ML/DL-based systems—the so-called Open Set Recognition (OSR) problem. This is a problem where an ML/DL-based system fails to deal with new instances not drawn from the distribution model of the training data. This problem is particularly profound in detecting DDoS attacks since DDoS attacks’ technology keeps evolving and has changing trafﬁc characteristics. This study investigates the impact of the OSR problem on the detection of DDoS attacks. In response to this problem, we propose a new DDoS detection framework featuring Bi-Directional Long Short-Term Memory (BI-LSTM), a Gaussian Mixture Model (GMM), and incremental learning. Unknown trafﬁc captured by the GMM are subject to discrimination and labeling by trafﬁc engineers, and then fed back to the framework as additional training samples. Using the data sets CIC-IDS2017 and CIC-DDoS2019 for training, testing, and evaluation, experiment results show that the proposed BI-LSTM-GMM can achieve recall, precision, and accuracy up to 94%. Experiments reveal that the proposed framework can be a promising solution to the detection of unknown DDoS attacks.


Introduction
By flooding malicious traffic, DoS (Denial of Service) attacks deplete the network bandwidth and computing resources of a targeted system, preventing the target system from offering regular services to legitimate users. DDoS (Distributed Denial of Service) [1] goes even further on a much larger scale. DDoS attacks take over the control of a large number of comprised systems, called a botnet, and launch coordinated attacks on the victim system, as illustrated in Figure 1. Along with the emergence and advancement of disruptive Internet technologies, DDoS attacks are evolving and proliferating in scale, frequency, and sophistication. Organizations face potential threats to their network environment that may cause severe impacts to their operations, such as business downtime, data breaches, or even ransom demands from hackers [2].
Upon the occurrence of DDoS attacks, actions for DDoS mitigation should be taken, as suggested in [3]. The detection of DDoS attacks is essential before any mitigation approaches can be taken. In the early era, the alarm of DDoS attacks was triggered by rules programmed by traffic engineers. This approach apparently failed to catch up with the dynamic and evolving natures of DDoS attacks. As ML/DL (Machine Learning/Deep Learning) unleashes their great potential in different fields, academics and industries are exploring the possibility of applying ML/DL to DDoS detection. Traditional manual methods suffer from low accuracy and long latency problems in risk identification. With ML approaches, such as Naive Bayesian, KNN, and Random Forest, threats can be captured more quickly and more accurately [4]. In ML, features for classification must be selected by human experts or by certain feature selection schemes. On the other hand, feature selection is an integral part of DL. Deep Learning models such as CNN and RNN are built based on a series of nonlinear processing layers to learn many levels of data representation from a large volume of labeled samples. Therefore, DL can serve as a powerful tool for DDoS detection [5]. Some successful stories of ML/DL use for DDoS detection will be reviewed in Section 2. After exploring several alternatives, we employ the Bi-Directional Long Short-Term Memory (BI-LSTM) [6] DL architecture in this study. BI-LSTM is capable of capturing essential characteristics of DDoS traffic, in particular, the time domain correlation. Experiments show that it serves well for the intended purpose. Upon the occurrence of DDoS attacks, actions for DDoS mitigation should be taken, as suggested in [3]. The detection of DDoS attacks is essential before any mitigation approaches can be taken. In the early era, the alarm of DDoS attacks was triggered by rules programmed by traffic engineers. This approach apparently failed to catch up with the dynamic and evolving natures of DDoS attacks. As ML/DL (Machine Learning/Deep Learning) unleashes their great potential in different fields, academics and industries are exploring the possibility of applying ML/DL to DDoS detection. Traditional manual methods suffer from low accuracy and long latency problems in risk identification. With ML approaches, such as Naive Bayesian, KNN, and Random Forest, threats can be captured more quickly and more accurately [4]. In ML, features for classification must be selected by human experts or by certain feature selection schemes. On the other hand, feature selection is an integral part of DL. Deep Learning models such as CNN and RNN are built based on a series of nonlinear processing layers to learn many levels of data representation from a large volume of labeled samples. Therefore, DL can serve as a powerful tool for DDoS detection [5]. Some successful stories of ML/DL use for DDoS detection will be reviewed in Section II. After exploring several alternatives, we employ the Bi-Directional Long Short-Term Memory (BI-LSTM) [6] DL architecture in this study. BI-LSTM is capable of capturing essential characteristics of DDoS traffic, in particular, the time domain correlation. Experiments show that it serves well for the intended purpose.
ML and DL have proven themselves effective solutions to the detection of DDoS attacks. However, they are trained to recognize only instances drawn from the distribution models constructed from the training set. Thus, they could fail in cases they have never learned. To know what one doesn't know is the problem called the Open Set Recognition (OSR) problem [7]. This problem has a severe impact on the detection of DDoS attacks ML and DL have proven themselves effective solutions to the detection of DDoS attacks. However, they are trained to recognize only instances drawn from the distribution models constructed from the training set. Thus, they could fail in cases they have never learned. To know what one doesn't know is the problem called the Open Set Recognition (OSR) problem [7]. This problem has a severe impact on the detection of DDoS attacks since DDoS attacks' technology keeps evolving and results in changing traffic characteristics. A Gaussian Mixture Model (GMM) [8] is incorporated into our framework to differentiate trained samples and novel instances. Unknown traffics captured by the GMM are then subject to discrimination and labeling by traffic engineers, and then fed back to the framework as additional training samples in the incremental learning phase. The main contributions of this paper are summarized as follows:

•
We identify the detection of unknown DDoS attacks as an Open Set Recognition problem and demonstrate its impact on conventional detection approaches.

•
We propose a new BI-LSTM-GMM model to detect the unknown network attack. The proposed framework can successfully differentiate novel instances from samples drawn from trained models.

•
Using the data sets CIC-IDS2017 and CIC-DDoS2019 for training, testing, and evaluation, experiment results show that the proposed BI-LSTM-GMM can achieve recall, precision, and accuracy up to 94%. The rest of this paper is organized as follows. Previous works and related technologies are briefly reviewed in Section 2. The framework of the proposed approach is presented in Section 3. Then, experiment results are reported in Section 4. Finally, some conclusions are drawn in Section 5.

Related Works
Various ML technologies have been employed, mainly as classifiers, in the detection of DDoS attacks. These include Support Vector Machines (SVM), k-Nearest Neighbors (KNN), the Naïve Bayes Classifier, Random Forest (RF), Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and Artificial Neural Networks (ANNs), to name a few. With SVM, based on labeled training data, a hyperplane is constructed in the transform domain to classify unseen data. Cheng has built an IAI (IP Address Interaction Feature) model that can effectively discriminate normal from abnormal flows in traffic streams and helps to identify attack flows quickly and precisely [9]. In KNN, k nearest neighbors of incoming data are located. A majority of these k neighbors decide the classification of the incoming data. Vu has applied KNN [10] to classify the state of networks according to each stage of DDoS attack and obtained optimific results. A Naïve Bayes classifier is a classification technique based on Bayes' theorem assuming independence among predictors. A Naïve Bayes classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature. Fadil used the NB method [11] to predict the existence of DDoS attacks based on the mean and standard deviation of network packets and achieved precise results. An RF is a collection of decision trees. The majority of the outcomes of individual decision trees determine the classification. In [12], Wang et al. proved that with well-computed key features in DDoS data, experimental results show an RF algorithm can attain valid classification performance and an optimal feature subset. DBSCN finds core samples of high density and expands clusters from them. It particularly suits data that contains clusters of similar density. Dincalp used a DBSCAN clustering algorithm [13] to deal with diversity in attack vectors. The proposed system worked well with chosen attributes in their experiments. ANNs emulate biological neural networks. Given label data, ANNs learn the mapping function using the back-propagation algorithm. Ahanger proposed an ANN-based DDoS detection method that provided 99.8% detection accuracy in recognizing DDoS attacks [14].
There are also successful stories for DL in DDoS detection. Li and Lu [15] combined Long Short-Term Memory (LSTM) and Bayesian methods to detect DDoS attacks. LSTM is suitable for events with long intervals and delays in the time domain. In other words, LSTM can control the value of the indefinite length of time and decide whether the information should be retained or removed. The author used LSTM to identify the confidence index of DDoS attacks and further used the Bayesian method to make a second judgment to improve detection accuracy. Yang et al. [16] adopted the autoencoder for the detection of DDoS attacks. Autoencoder is a multi-layer neural network with an unsupervised training algorithm. It removes less relevant information and noise during the training process and retains essential information. Doriguzzi-Corin et al. [17] employed CNN in their detection system, named LUCID, featuring a dataset-agnostic preprocessing mechanism and an activation analysis. The proposed LUCID has 40 times of reduction in the processing time, rendering it well-suited in resource constrained environments. Yong et al. [18] applied machine learning models to detect webshell to build secure solutions for IoT networks. Ensemble methods, including random forest (RF), extremely randomized trees (ET), and Voting, are used to improve the performances of these machine learning models. Their findings show that RF and ET are suitable for lightweight IoT scenarios, and the Voting method is effective for heavyweight IoT scenarios. Hemalatha et al. [19] propose an efficient malware detection system based on deep learning. The system uses a reweighted classbalanced loss function in the final classification layer of the DenseNet model to achieve significant performance improvements in classifying malware by handling imbalanced data issues. Comprehensive experiments performed on four benchmark malware datasets showed that the proposed scheme has superior performance with a higher detection rate and lower computational cost.
Despite the success attained by the ML/DL-based schemes, a critical issue, the OSR problem, is overlooked. Bendale et al. [20] and Sabeel et al. [21] point out that, without proper measures, ML/DL could forcibly classify instances from new sample space into the wrong category. The correct action is to differentiate novel instances from training samples, which is the strategy adopted by [20,22]. They examined the hyper distance from incoming data to known classes and marked it as a new instance if the distance exceeded a certain threshold. In [20], a new model layer, OpenMax, is introduced, which estimates the probability of an input being from an unknown class. In [22], the Extreme Value Machine (EVM) is proposed, which uses the Weibull function for model construction from the training data set. Given a new sample, EVM will estimate the inclusion probability of each existing class for classification purposes. A new instance, distant from all classes, is considered belonging to a new class. This characteristic renders EVM a good candidate for OSR problems, including the detection of known DDoS attacks.
We follow the same philosophy but with different technology. To be of practical significance, this study deals with the OSR problem by incorporating the GMM into the proposed DDoS detection framework. The OSR problem can then be adequately addressed, as shown later.

Proposed Framework-BI-LSTM-GMM
As a solution to the OSR problem in DDoS attack detection, this study proposes a framework integrating Bi-Directional Long Short-Term Memory (BI-LSTM), a Gaussian Mixture Model (GMM), and incremental learning. The functional block diagram of the proposed BI-LSTM-GMM framework is given in Figure 2.
malware datasets showed that the proposed scheme has superior perfo higher detection rate and lower computational cost.
Despite the success attained by the ML/DL-based schemes, a critical problem, is overlooked. Bendale et al. [20] and Sabeel et al. [21] point ou proper measures, ML/DL could forcibly classify instances from new sampl wrong category. The correct action is to differentiate novel instances from ples, which is the strategy adopted by [20,22]. They examined the hyper incoming data to known classes and marked it as a new instance if the dis a certain threshold. In [20], a new model layer, OpenMax, is introduced, w the probability of an input being from an unknown class. In [22], the Extre chine (EVM) is proposed, which uses the Weibull function for model con the training data set. Given a new sample, EVM will estimate the inclusion each existing class for classification purposes. A new instance, distant from considered belonging to a new class. This characteristic renders EVM a g for OSR problems, including the detection of known DDoS attacks.
We follow the same philosophy but with different technology. To be o nificance, this study deals with the OSR problem by incorporating the GM posed DDoS detection framework. The OSR problem can then be adequa as shown later.

Proposed Framework-BI-LSTM-GMM
As a solution to the OSR problem in DDoS attack detection, this stu framework integrating Bi-Directional Long Short-Term Memory (BI-LSTM Mixture Model (GMM), and incremental learning. The functional block proposed BI-LSTM-GMM framework is given in Figure 2.  With supervised learning, the BI-LSTM is employed for the discrimination of regular traffic and DDoS attacks. Based on a concept similar to unsupervised learning, the GMM is adopted to construct the training data's distribution model. With GMM, new attacks or new legitimate traffic can be captured if they fall beyond a specific range of the constructed distribution model. Captured new traffics are subject to the identification and labeling of the traffic engineer and then used to update the BI-LSTM and the GMM in the incremental learning phase. To a certain extent, the GMM can be regarded as a classifier of differentiating learned samples and novel instances and the BI-LSTM as a classifier for the discrimination of good and bad traffics.
Our framework uses historical traffic data for deep learning training to generate models that can be used to detect traffic. We also introduce the Gaussian mixture module to make the model capable of identifying unseen attacks. Finally, unknown attacks or traffic classified by the Gaussian mixture model will be identified and labeled by experts, and the deep learning model will be updated through incremental learning.

BI-LSTM Module
The LSTM is a Recurrent Neural Network (RNN) architecture with a memory cell and gates for the explicit control of input, output, and forget, as shown in Figure 3. With its unique design, LSTM is particularly suited for applications related to data sequences. The operation of an LSTM is governed by (1), as follows where t is the index of LSTM stage; x t is the input; h t is the output; W is the weight; b is the bias, C t and C t are the current and the last states, respectively; and σ(·) is the sigmoid function. With supervised learning, the BI-LSTM is employed for the discrimination of regular traffic and DDoS attacks. Based on a concept similar to unsupervised learning, the GMM is adopted to construct the training data's distribution model. With GMM, new attacks or new legitimate traffic can be captured if they fall beyond a specific range of the constructed distribution model. Captured new traffics are subject to the identification and labeling of the traffic engineer and then used to update the BI-LSTM and the GMM in the incremental learning phase. To a certain extent, the GMM can be regarded as a classifier of differentiating learned samples and novel instances and the BI-LSTM as a classifier for the discrimination of good and bad traffics.
Our framework uses historical traffic data for deep learning training to generate models that can be used to detect traffic. We also introduce the Gaussian mixture module to make the model capable of identifying unseen attacks. Finally, unknown attacks or traffic classified by the Gaussian mixture model will be identified and labeled by experts, and the deep learning model will be updated through incremental learning.

BI-LSTM Module
The LSTM is a Recurrent Neural Network (RNN) architecture with a memory cell and gates for the explicit control of input, output, and forget, as shown in Figure 3. With its unique design, LSTM is particularly suited for applications related to data sequences. The operation of an LSTM is governed by (1), as follows = , ℎ , where is the index of LSTM stage; is the input; ℎ is the output; is the weight; is the bias, and are the current and the last states, respectively; and • is the sigmoid function. A BI-LSTM is a DL architecture with dual LSTMs, one for the forward direction and the other for the reverse direction, as shown in Figure 4. A BI-LSTM becomes a non-causal system in the sense of data sequences. It is capable of gripping correlations on both sides along the time axis, which is a desired attribute for the detection of DDoS attacks. A BI-LSTM is a DL architecture with dual LSTMs, one for the forward direction and the other for the reverse direction, as shown in Figure 4. A BI-LSTM becomes a non-causal system in the sense of data sequences. It is capable of gripping correlations on both sides along the time axis, which is a desired attribute for the detection of DDoS attacks. Appl. Sci. 2021, 11, x FOR PEER REVIEW 6 of 13 The output of the BI-LSTM is directed to a fully connected layer serving as a classifier for the discernation of normal and malicious traffic. This classifier is gated by the outcome of the GMM. The classifier's output is counted only when the GMM reports the given instance falling within the training data distribution.

GMM Module
It is critical for an ML/DL-based system to know what it does not know. To endow the DDoS attack detection system with the ability to pick up instances not drawing from the distribution model of the training set, we adopt the GMM in our design, which is an extension of the single Gaussian model. With GMM, multiple Gaussian probability density functions are employed to model clustered data, which is a typical scenario in DDoS attack detection. Different types of legitimate traffic or DDoS attacks bear similar characteristics and form individual clusters in the feature space. With unsupervised learning, GMM can establish a model matching the distribution of the training data set. A GMM is a statistic model consisting of Gaussian distributions. Each kernel can be specified by its weighting factor , mean , and covariance matrix , as follows The likelihood of a given instance belonging to a GMM can be evaluated according to (3), as follows where The output of the BI-LSTM is directed to a fully connected layer serving as a classifier for the discernation of normal and malicious traffic. This classifier is gated by the outcome of the GMM. The classifier's output is counted only when the GMM reports the given instance falling within the training data distribution.

GMM Module
It is critical for an ML/DL-based system to know what it does not know. To endow the DDoS attack detection system with the ability to pick up instances not drawing from the distribution model of the training set, we adopt the GMM in our design, which is an extension of the single Gaussian model. With GMM, multiple Gaussian probability density functions are employed to model clustered data, which is a typical scenario in DDoS attack detection. Different types of legitimate traffic or DDoS attacks bear similar characteristics and form individual clusters in the feature space. With unsupervised learning, GMM can establish a model matching the distribution of the training data set. A GMM is a statistic model consisting of K Gaussian distributions. Each kernel can be specified by its weighting factor w k , mean → µ k , and covariance matrix Σ k , as follows The likelihood of a given instance → x belonging to a GMM can be evaluated according to (3), as follows Appl. Sci. 2021, 11, 5213 Given a training data set and K, the expectation-maximization algorithm is the technique most commonly used to estimate the mixture model's parameters. Once the model is constructed, the GMM can be used to reject unseen instances. According to common practice in statistics, an instance falling within three times of the standard deviation, i.e., σ , is considered an instance drawing from the training data distribution. In this case, the instance will be accepted, and the output of the BI-LSTM will be enabled. Otherwise, the instance will be rejected, and the BI-LSTM will be disabled. A rejected instance will be logged and subject to identification and labeling by data engineers. The control flow of the GMM in the proposed framework is presented in Figure 5.
Given a training data set and , the expectation-maximization algorithm is the technique most commonly used to estimate the mixture model's parameters. Once the model is constructed, the GMM can be used to reject unseen instances. According to common practice in statistics, an instance falling within three times of the standard deviation, i.e., − 3 ≤ ≤ + 3 , is considered an instance drawing from the training data distribution. In this case, the instance will be accepted, and the output of the BI-LSTM will be enabled. Otherwise, the instance will be rejected, and the BI-LSTM will be disabled. A rejected instance will be logged and subject to identification and labeling by data engineers. The control flow of the GMM in the proposed framework is presented in Figure 5.

Incremental Learning
An entity can keep improving its ability if it can keep learning. Given the existing training set and newly obtained training data, retraining the system from scratch is a possible yet highly time-consuming process. On the contrary, incremental learning is a more plausible and more efficient alternative. Incremental learning is the procedure that keeps refining an existing model with newly obtained data. This capability is essential for DDoS attack detection since the attacking technology and resulting traffic keeps evolving.
In the proposed framework, traffic rejected by the GMM module will be logged and labeled by traffic engineers. The labeled traffic will be fed to the BI-LSTM and the GMM, serving as the training data in the incremental learning phase. The control flow for the incremental learning procedure is given in Figure 6.

Incremental Learning
An entity can keep improving its ability if it can keep learning. Given the existing training set and newly obtained training data, retraining the system from scratch is a possible yet highly time-consuming process. On the contrary, incremental learning is a more plausible and more efficient alternative. Incremental learning is the procedure that keeps refining an existing model with newly obtained data. This capability is essential for DDoS attack detection since the attacking technology and resulting traffic keeps evolving.
In the proposed framework, traffic rejected by the GMM module will be logged and labeled by traffic engineers. The labeled traffic will be fed to the BI-LSTM and the GMM, serving as the training data in the incremental learning phase. The control flow for the incremental learning procedure is given in Figure 6. Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 13 Figure 6. The control flow of the incremental learning procedure.

Experiments
A series of experiments were conducted to validate the feasibility and effectiveness of the proposed framework. The experiments were implemented using Python language on TensorFlow 2.0 and Keras platforms and executed on Intel i5-9500 CPU with 32 GB RAM.

Data Set
As shown in Table 1, two well-recognized datasets, CIC-IDS2017 [23] and CIC-DDoS2019 [24], are adopted in our experiments. The datasets were collected by the Canadian Institute for Cybersecurity with Wireshark in emulated environments. They are generated using two types of usage profiles and multistage attacks, such as Heartbleed, and a variety of DoS and DDoS attacks. Collected traffic is then pre-processed with the CICFlowMeter [25]. It has 80 network traffic features, aiming to generate the varied DoS and DDoS traffic data. The resulting data set contains records of traffic features in CSV format. To fit in the numeric nature of the proposed framework, the non-numeric fields are transformed using One-Hot encoding. All fields are then normalized for the rescaling of their dynamic ranges.

Experiments
A series of experiments were conducted to validate the feasibility and effectiveness of the proposed framework. The experiments were implemented using Python language on TensorFlow 2.0 and Keras platforms and executed on Intel i5-9500 CPU with 32 GB RAM.

Data Set
As shown in Table 1, two well-recognized datasets, CIC-IDS2017 [23] and CIC-DDoS2019 [24], are adopted in our experiments. The datasets were collected by the Canadian Institute for Cybersecurity with Wireshark in emulated environments. They are generated using two types of usage profiles and multistage attacks, such as Heartbleed, and a variety of DoS and DDoS attacks. Collected traffic is then pre-processed with the CICFlowMeter [25]. It has 80 network traffic features, aiming to generate the varied DoS and DDoS traffic data. The resulting data set contains records of traffic features in CSV format. To fit in the numeric nature of the proposed framework, the non-numeric fields are transformed using One-Hot encoding. All fields are then normalized for the rescaling of their dynamic ranges. Performance indices include the confusion matrix, as shown in Table 2, and the Accuracy, Precision, and Recall, as defined in (5)

BI-LSTM Module
With certain efforts of investigation, we arrived at a BI-LSTM architecture with configuration in Figure 7 and parameter settings in Table 3. The kernel, bias, and activation are constrained by L2 regularization. We adopted a 10-fold cross-validation procedure in the training and testing. The dropout mechanism was invoked to avoid the problem of over-fitting. Performance indices include the confusion matrix, as shown in Table 2, and the Accuracy, Precision, and Recall, as defined in (5). Precision attempts to answer the question as to what proportion of positive identifications are actually correct. Recall concerns what proportion of actual positives are identified correctly. Precision measures the percentage of identified instances that are correctly classified.

BI-LSTM Module
With certain efforts of investigation, we arrived at a BI-LSTM architecture with configuration in Figure 7 and parameter settings in Table 3. The kernel, bias, and activation are constrained by L2 regularization. We adopted a 10-fold cross-validation procedure in the training and testing. The dropout mechanism was invoked to avoid the problem of over-fitting.   We first examined the effectiveness of the BI-LSTM in serving as a DDoS detection module. "CIC-IDS2017/Wednesday" was used as the training data set to train the BI-LSTM. After training, the same data set, "CIC-IDS2017/Wednesday" was used to test the BI-LSTM. As indicated in the first row of Table 4, the BI-LSTM was well-trained, well-behaved, as intended. The BI-LSTM is highly capable of discrimination between malicious traffics and legitimate ones from the training data set. We now turn our attention to the impact of the OSR problem. That is, we investigated how a well-trained BI-LSTM reacts to novel instances never learned beforehand. "CIC-IDS2017/Friday" was used to test the BI-LSTM trained with "CIC-IDS2017/Wednesday". As shown in the second row of Table 4, the performance, recall and accuracy dropped dramatically to approximately half, which is essentially a random guess. The reason behind this phenomenon is that the BI-LSTM has no built-in mechanism for knowing what it does not know. For a novel instance not drawn from the distribution model of the training set, the BI-LSTM compulsorily determines the category based on a mismatched model, and in general, this results in incorrect classification.
Notably, the precision does not significantly degrade in Table 4. Because of that, the legitimate traffic in "CIC-IDS2017/Friday" and "CIC-IDS2017/Wednesday" share a similar distribution pattern in the feature space.

GMM Module
GMM is our choice of solution to the OSR problem. With GMM, K Gaussian probability density functions are weighted to model the distribution of a given training data set. The number of Gaussian distributions, K, is a critical system parameter. There is an inevitable tradeoff between the model accuracy and the computational cost for the K value. Based on information entropy, the AIC (Akaike Information Criterion), as defined in (6), is commonly used for the evaluation of the fitting accuracy. The lower the AIC is, the better the accuracy is.
AIC 2K − 2 ln(L) As we constructed the GMM model with "CIC-IDS2017/Wednesday", the change in AIC for different K values is presented in Figure 8. It is apparent that a larger K value leads to better accuracy. However, on the other hand, a larger K value implies a higher computational cost. Figure 8

BI-LSTM-GMM with Incremental Learning
With GMM, traffics never learned can be picked up and logged. In the proposed framework, an instance three times of deviation away from the mean, i.e., ≤ − 3 or

BI-LSTM-GMM with Incremental Learning
With GMM, traffics never learned can be picked up and logged. In the proposed framework, an instance three times of deviation away from the mean, i.e., x ≤ µ − 3ρ or x ≥ µ + 3ρ, is considered a novel instance. It will be logged and labeled by traffic engineers. The intervention of data engineers is required since a novel instance can be malicious traffic, or equally likely, a new type of legitimate traffic. Collected, labeled new instances are fed back to the BI-LSTM and the GMM in the incremental learning phase. Table 5 reports the results after the completion of the loop. Compared with Table 4, the detection rate of unknown traffic, the "CIC-IDS2017/Friday" is greatly improved, both in its recall and accuracy. Notice that the detection of the old attack, CIC-IDS2017/Friday, slightly degrades. This is a side effect of incremental learning. An existing model will be biased by the incremental training data set and move in the feature space.
We also let data sets in "CIC-DDoS2019" play the role of unknown attacks to examine the general behavior of the proposed framework. The results are summarized in Table 6. Unknown traffic is the primary source of mis-classification. As shown in Table 5, the pure BI-LSTM is unable to address unknown attacks correctly. The recall indicator declines dramatically due to unknown traffic. There is a consistent tendency that the BI-LSTM alone failed to handle unknown traffics adequately. However, the GMM is capable of filtering new instances that were not learn beforehand. As indicated in Table 6, the integration of BI-LSTM and GMM can be an effective solution to the Open Set Recognition (OSR) problem in detecting unknown attacks. With the help of traffic engineers, labeled new instances are fed back to the BI-LSTM and the GMM for incremental learning. The updated model can then deal with both the old traffic and the new traffic correctly and gracefully. Table 6. Performance of the BI-LSTM and BI-LSTM-GMM with "CIC-IDS2017/Wednesday" as the old traffic and "CIC-DDoS2019" as the new traffic.

Model
Test Referring to Table 6, the performance degradation of CIC-DDoS2019/NetBIOS is not so profound. It could be that its traffic pattern resembles that of CIC-IDS2017/Wednes-day. The performance of CIC-DDoS2019/NTP and CIC-DDoS2019/LDAP is much more evident. However, with the help of the proposed BI-LSTM-GMM framework together with the incremental learning scheme, all performance indices return to satisfactory levels.
The proposed framework has demonstrated optimistic results in the detection of unknown DDoS attacks. However, several issues demand further investigation for its universal applicability. The first issue relates to the test datasets. Validation on more datasets is required before a solid conclusion can be reached. The second issue is that, for the current design, the BI-LSTM configuration and the number of kernels in the Gaussian Mixture model are subject to a certain amount of trial-and-error. Automatic setting of these essential system parameters will be a fruitful research direction. The third problem is that the intervention of traffic engineers is still demanded. A possible solution to this issue is a global center for the collection of ever evolving attack traffic.

Conclusions
This study is a proof of concept for the detection of DDoS attacks with Deep Learning (DL) and a Gaussian Mixture Model (GMM). As a solution to the Open Set Recognition (OSR) problem in DDoS detection, the proposed framework consists of Bi-Directional Long Short-Term Memory (BI-LSTM), GMM, and incremental learning. The Bi-LSTM has demonstrated itself as a practical approach for the discrimination of malicious and legitimate traffic sampled from the distribution of the training data set. The GMM has shown to be an effective measure for the differentiation of novel instances and trained samples. Unknown traffic can be captured by the GMM and labeled by data engineers, and then fed back to the BI-LSTM and the GMM for incremental learning. Both the new traffic and the old traffic can be handled by the updated model correctly and gracefully. The feasibility and effectiveness of the proposed framework has been validated by a series of experiments on data sets CIC-IDS2017 and CIC-DDoS2019.
BI-LSTM is fully capable of performing what it has been trained to do, such as detecting known DDoS attacks. However, when confronted with novel attacks, the system performance degrades severely. The recall drops from 99.8% to 41.2% for Dataset CIC-IDS2017/Wednesday and Friday. Unknown traffic can be captured by GMM and then labeled by traffic engineers. After that, incremental learning regains detection rates up to 95.3% and 99.8% for Dataset CIC-IDS2017/Wednesday and Friday.
To explore the full potential of the proposed BI-LSTM-GMM framework there are some research directions that deserve further investigation: validation on more datasets, autoconfiguration of BI-LSTM and GMM, and the elimination of intervention traffic engineers.