1. Introduction
Machine learning models, and deep neural networks in particular, have proven to be capable tools in a wide range of challenging domains, such as computer vision, healthcare, industry, and finance. For instance, machine learning is leveraged for network traffic analysis and, in particular, botnet [1] detection in the ever-growing number of Internet of Things (IoT) devices (28.1 billion devices in 2020, expected to reach trillions by 2025 [2]). Traditional machine learning-based botnet detection approaches, including statistical and behavioral techniques, rely on extracting patterns from empirical data using a set of selected features. However, these techniques require expert domain knowledge for effective feature engineering. Recently, researchers have proposed representation learning, that is, the automatic learning of representative features from raw data, as a technique to overcome the problems of traditional machine learning methods. Convolutional Neural Networks (CNNs) are one of the well-established methods for representation learning, where network traffic data can be fed to the model in either numeric or image format [3]. The latter is known as the visualization-based botnet detection approach and is also used for understanding data characteristics and visualizing embedded features [3,4]. These deep learning models not only learn representative properties of the network traffic data but can also predict the labels of future network traffic in an automated manner.
With the wide adoption of neural networks in sensitive domains, the reliability of their results is a necessity. However, this widespread usage of deep learning models has become an incentive for adversarial entities to manipulate the inference of these models so that their output is incorrect [5]. In recent years, it has been shown that deep learning models are vulnerable when making inferences from manipulated input data [6,7]. When presented with inputs containing minor perturbations, that is, adversarial examples, deep learning networks can be forced to misclassify the input and be fooled into giving wrong results [6,7,8,9]. As deep learning models learn the inherent pattern of the input dataset, introducing a small yet carefully designed perturbation into the data adversely impacts the output of the model. In adversarial machine learning, the generated adversarial examples are very similar to the original ones, even though they are not necessarily drawn from the input data distribution. Thus, detecting adversarial examples is a challenging task. Several algorithms exist to systematically generate adversarial examples [10]. Although some algorithms are based on evolutionary optimization methods [11], most adversarial example generators are based on analyzing the gradient of the model output with respect to the input. For instance, Goodfellow et al. proposed the Fast Gradient Sign Method (FGSM) [12], which generates adversarial examples based only on the sign of the gradient. Kurakin et al. proposed the Basic Iterative Method (BIM) [13], which essentially iterates FGSM for several steps. DeepFool [8] was proposed by Moosavi-Dezfooli et al. to find the closest distance between a given input and the decision boundary. Papernot et al. presented the Jacobian-based Saliency Map Attack (JSMA) [6], which generates an adversarial attack by perturbing a limited number of input features. Although adversarial attacks were initially developed in the image domain, they are actively applied in other sensitive domains as well. For example, Grosse et al. [14] extended existing adversarial example generation algorithms to create a highly effective attack that uses adversarial examples against malware detection models. Osadchy et al. [15] used adversarial examples to build a secure CAPTCHA scheme.
In response to the vulnerability of deep learning networks to adversarial examples, there has been great interest in building defensive systems that improve the robustness of deep learning models [12,16,17,18]. However, recent work has shown that nearly all of the proposed defenses against adversaries can be circumvented, with retraining being a notable exception [19]. Note that any defense mechanism should not only be robust against existing adversarial attacks but also perform well against future, unseen adversarial attacks within the specified threat model. Thus, any defense proposal should be adaptive as well.
In this work, we propose a novel defense system that leverages the unique power of Generative Adversarial Networks (GANs) [20] to generate adversarial examples for retraining. We attack the deep learning network in an automated way, using an external GAN, namely the Pix2Pix [21] conditional GAN, to learn the transformations between adversarial examples and clean data and to automatically generate unseen adversarial examples. After attacking the neural network, we generate a plethora of adversarial examples in an iterative manner and use them to automate the defense against adversaries. By doing so, we develop a defense mechanism that is practical for real-world applications with pre-trained deep learning networks. Since the vulnerability to adversaries is inherent to neural networks, using an iterative offensive approach to generate new attacks and strengthen the network is an effective defense.
In particular, the main contributions of this work are as follows:
Developing a novel (attacking) method for adversarial example generation, which learns the distribution of common adversarial examples and shifts it to fool a pre-trained deep learning model.
Creating new adversarial examples that can pass undetected by models trained on the initial common adversarial examples.
Attaching a pre-trained CNN to a Pix2Pix GAN and training the generator with the goal of fooling the attached network, a technical feat which has not been done before.
Implementing a novel iterative pipeline that attacks and defends in alternation to strengthen the model. After each iteration, the attacker generates stronger adversarial examples, while the robustness of the model increases through retraining and weight updates.
Conducting extensive experiments to evaluate the performance of the proposed method and demonstrating an application to visualization-based botnet detection systems.
The rest of the paper is organized as follows: Background information and preliminaries are discussed in
Section 2. Our developed robust defensive system against adversarial attacks leveraging the power of generative adversarial networks is presented in
Section 3. We evaluate the performance of the proposed method and discuss the results in
Section 4. Finally, conclusions and future work are presented in
Section 5.
3. Methodology
The details of our proposed attack and defense system are presented in this section. As shown in
Figure 1, the proposed method is composed of four main components: a DL-based botnet detector, gradient-based adversarial attacks, GAN-based adversarial attacks, and a defensive mechanism.
3.1. Victim Model
Convolutional neural networks are one of the well-known representation learning techniques and are widely employed in sensitive domains ranging from computer vision to visualization-based botnet detection. Thus, in this study we employ a convolutional neural network as our baseline botnet detection model. We refer to this baseline model as the victim model, as we assume that it is under adversarial attack. Note that, as explained in more detail in the following sections, the main goal of this work is to develop a defensive system that improves the robustness of this model against adversarial attacks by leveraging the unique power of generative adversarial networks.
To build our victim model, network traffic data is converted into gray-scale images. These images contain representative information about the nature of the traffic, whether it is normal or related to malicious activity. Our botnet detector is composed of two consecutive convolutional layers as the feature extractor and a fully connected layer as the classifier.
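A minimal PyTorch sketch of such a detector is given below; the framework choice, the input size (28 × 28), and the channel and kernel sizes are our own assumptions, since the text only specifies two convolutional layers followed by a fully connected classifier.

```python
import torch
import torch.nn as nn

class BotnetCNN(nn.Module):
    """Two convolutional layers as feature extractor plus one fully connected
    classifier. Image size, channel counts, and kernel sizes are illustrative
    assumptions; only the overall layout is taken from the text."""

    def __init__(self, num_classes: int = 2, img_size: int = 28):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        flat_dim = 64 * (img_size // 4) * (img_size // 4)
        self.classifier = nn.Linear(flat_dim, num_classes)  # normal vs. botnet

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(start_dim=1))
```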
3.2. Attack Engines
In this section, we describe the attack engines employed to generate adversarial examples against our victim model. First, we use generic gradient-based attack engines to generate adversarial examples. These attack methods are well established and include the fast gradient sign method [12], DeepFool [8], and projected gradient descent [31]; they are discussed in Section 3.2.1. Second, we employ a custom generative adversarial network that is able to generate an unlimited number of adversarial examples, described in Section 3.2.2.
3.2.1. Gradient-Based Attack Engine
In the last couple of years, researchers have introduced several adversarial attack methods against deep learning models. Most of these attacks rely on the gradients of the model. In this study, we employ the three most common attack engines, namely FGSM, DeepFool, and PGD, to conduct adversarial attacks on the DL-based botnet detection system. The general structure of our gradient-based adversarial attacks on the DL-based botnet detection system is shown in
Figure 2. In the following, we describe each of these attack engines briefly and refer interested readers to the original papers for more information.
FGSM: FGSM is a fast method introduced by Goodfellow et al. in 2015 [12], which perturbs each feature in the direction of the sign of the gradient. This perturbation can be created after a single back-propagation pass. Being fast makes this attack predominant in real-world scenarios. FGSM can be formulated as in (2):

$X_{adv} = X + \epsilon \cdot \mathrm{sign}\left(\nabla_{X} J(X, Y)\right)$ (2)

Here, $X$ is the clean sample, $X_{adv}$ is the adversarial example, $J$ is the classification loss, $Y$ is the label of the clean sample, and $\epsilon$ is a tunable parameter that controls the magnitude of the perturbation. Note that in FGSM only the direction of the gradient matters, not its magnitude.
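A minimal PyTorch sketch of this one-step update follows; the cross-entropy loss and the [0, 1] clamping range are assumptions about how the victim model is trained and how its inputs are normalized.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=0.1):
    """One-step FGSM as in (2): move each input feature by eps in the
    direction of the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # assumes inputs normalized to [0, 1]
```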
DeepFool: This attack was introduced by Moosavi-Dezfooli et al. [8] as an untargeted iterative attack based on the $\ell_2$ distance metric. The attack finds the closest distance from the clean input to the decision boundary. Decision boundaries are the boundaries that divide the different classes in the hyper-plane created by the classifier. Perturbations are created in a manner that pushes the adversarial example outside the boundary, causing it to be misclassified as another class, as demonstrated in Algorithm 1.
Algorithm 1: The process of generating adversarial examples based on the DeepFool method [8].
1: input: image $x$, classifier $f$
2: output: perturbation $\hat{r}$
3: Initialize $x_0 \leftarrow x$, $i \leftarrow 0$
4: while $\mathrm{sign}(f(x_i)) = \mathrm{sign}(f(x_0))$ do
5:   $r_i \leftarrow -\dfrac{f(x_i)}{\lVert \nabla f(x_i) \rVert_2^{2}} \nabla f(x_i)$
6:   $x_{i+1} \leftarrow x_i + r_i$
7:   $i \leftarrow i + 1$
8: end while
9: return $\hat{r} = \sum_i r_i$
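The sketch below illustrates Algorithm 1 in PyTorch for a binary classifier that outputs a single logit (a simplification: the victim model in this paper has two output classes, and the batch is assumed to contain a single sample).

```python
import torch

def deepfool_binary(model, x, max_iter=50, overshoot=0.02):
    """Sketch of Algorithm 1 for a single-logit binary classifier:
    iteratively step toward the nearest decision boundary until the
    predicted class flips."""
    x_adv = x.clone().detach()
    orig_sign = torch.sign(model(x_adv).squeeze()).item()
    r_total = torch.zeros_like(x_adv)
    for _ in range(max_iter):
        x_adv.requires_grad_(True)
        f = model(x_adv).squeeze()
        if torch.sign(f).item() != orig_sign:
            break  # boundary crossed: x_adv is now misclassified
        grad = torch.autograd.grad(f, x_adv)[0]
        r_i = -(f.detach() / grad.norm() ** 2) * grad         # step to the boundary
        r_total = r_total + r_i.detach()
        x_adv = (x.detach() + (1 + overshoot) * r_total)      # overshoot slightly
    return x_adv, r_total
```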
PGD: This attack was proposed by Madry et al. [31] as an iterative adversarial attack that applies an FGSM-like step repeatedly, starting from a data point $X^{0}$ obtained by adding a random perturbation (of magnitude at most $\epsilon$) to the original input $x$. After each step, the perturbed output is projected back onto a valid constrained space. The projection is performed by finding the closest point to the current point within the feasible region. This attack can be formulated based on the following equation:

$X^{t+1} = \Pi_{x+S}\left(X^{t} + \alpha \cdot \mathrm{sign}\left(\nabla_{X} J(X^{t}, Y)\right)\right)$

where $X^{t}$ is the perturbed input at iteration $t$, $\alpha$ is the step size, $\Pi_{x+S}$ denotes projection onto the feasible set, and $S$ denotes the set of feasible perturbations for $x$.
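A minimal PyTorch sketch of this procedure is given below; the values eps = 0.15 and n_iter = 10 follow the parameters reported in Section 4.2, while the step size alpha and the [0, 1] clamping range are assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=0.15, alpha=0.01, n_iter=10):
    """PGD sketch: random start inside the eps-ball, then iterated FGSM-style
    steps, each projected back onto the eps-ball around x."""
    x = x.detach()
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0.0, 1.0)
    for _ in range(n_iter):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()       # FGSM-style step
        x_adv = x + (x_adv - x).clamp(-eps, eps)           # project onto eps-ball
        x_adv = x_adv.clamp(0.0, 1.0)                      # keep valid pixel range
    return x_adv.detach()
```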
3.2.2. GAN-Based Attack Engine
Generative adversarial networks have the capability to produce synthetic images that look real to human beings [28], compose music [29], and even generate new molecules [30]. In this work, we utilize a conditional generative adversarial network specifically designed for the image domain, namely Pix2Pix [21]. In conditional GANs, additional information, such as a source image, is provided. Thus, the loss function of a conditional GAN can be written as (4):

$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\left[\log D(x, y)\right] + \mathbb{E}_{x,z}\left[\log\left(1 - D(x, G(x, z))\right)\right]$ (4)

This makes Pix2Pix a perfect fit for the image classification and botnet detection domain, where it can be used to learn the transformations between normal and malicious data and to generate unlimited amounts of new adversarial examples. In Pix2Pix, the generator is built on the U-Net structure, while the discriminator is built on the PatchGAN architecture. We refer interested readers to the original paper for more information about the Pix2Pix structure [21].
Pix2Pix is a type of conditional GAN that employs an additional reconstruction loss to train the mapping from input image to output image. This loss, shown in (5), forces the generator to produce an output close to the ground truth associated with the conditioning variable $x$:

$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\left[\lVert y - G(x, z) \rVert_{1}\right]$ (5)

Finally, this loss is added to (4), so the overall objective used in this work is (6):

$G^{*} = \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G)$ (6)

where $\lambda$ is a hyper-parameter that controls the weight of the reconstruction term.
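The generator side of this combined objective can be sketched as follows in PyTorch; the weight lam = 100 follows the original Pix2Pix paper and is an assumption about the setting used here.

```python
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial term of Eq. (4)
l1 = nn.L1Loss()              # reconstruction term of Eq. (5)

def pix2pix_generator_loss(disc, x, y_real, y_fake, lam=100.0):
    """Generator objective of Eq. (6): fool the discriminator on the pair
    (x, y_fake) while keeping y_fake close to the ground-truth y_real."""
    d_out = disc(x, y_fake)                    # PatchGAN logits on (condition, output)
    adv = bce(d_out, torch.ones_like(d_out))   # generator wants D to output "real"
    rec = l1(y_fake, y_real)
    return adv + lam * rec
```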
The general architecture of our Pix2Pix-based adversarial example generator is shown in
Figure 3. After the transformations are found, any given clean sample from the dataset can be transformed into a malicious one. In other words, Pix2Pix learns how to add the perturbation to the data in an automated manner. This automated attack is then leveraged to generate a huge corpus of new attack data. To enable Pix2Pix to generate better attacks, each generated image is fed to the victim model and tested to see whether it fools the model. This is formulated by adding a loss term for the attack on the victim model, which guides the training of Pix2Pix. By doing so, Pix2Pix autonomously attacks the model using its understanding of the perturbation distribution and is able to generate a huge corpus of unseen adversarial examples.
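One plausible instantiation of this extended generator objective is sketched below; the exact form of the fooling term and the loss weights are assumptions, since the paper does not spell them out, and the victim's parameters are kept frozen.

```python
import torch
import torch.nn.functional as F

def attack_generator_loss(disc, victim, x_clean, x_adv_ref, x_gen, y_true,
                          lam_rec=100.0, lam_fool=1.0):
    """Pix2Pix generator terms plus a fooling term that rewards the generator
    when the frozen victim model misclassifies the generated image."""
    d_out = disc(x_clean, x_gen)
    adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    rec = F.l1_loss(x_gen, x_adv_ref)               # match reference adversarial examples
    # Fooling term: maximize the victim's classification loss on the true
    # label, i.e. minimize its negative (victim parameters are not updated).
    fool = -F.cross_entropy(victim(x_gen), y_true)
    return adv + lam_rec * rec + lam_fool * fool
```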
3.3. Defense Mechanism
Having generated a substantial number of adversarial examples in an automated manner, we can defend against them using a retraining approach. Retraining consists of feeding the adversarial data back to the victim model and training it again. By doing so, we convey to the neural network that it is making mistakes on certain inputs. The neural network then learns how to avoid these mistakes by improving its decision boundaries, as shown in
Figure 4. This method has been shown to withstand adversarial attacks when other defenses, such as changes in structure, have failed.
The general structure of our proposed defensive method is shown in
Figure 5. It consists of the following steps: Step 1—Fine-tune the neural network using the generated adversarial examples, improving the model. Step 2—Generate more examples by applying the formulated automated attack to the improved model. Step 3—Iterate by going back to Step 1 until the desired performance metric is reached. A minimal sketch of this loop is given below.
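The sketch assumes hypothetical callables generate_attacks, retrain, and fooling_rate standing in for the components described above; the stopping threshold is likewise an assumption.

```python
def iterative_defense(victim, generate_attacks, retrain, fooling_rate,
                      n_rounds=5, target_rate=0.05):
    """Iterative attack-and-defense loop corresponding to Steps 1-3."""
    for r in range(n_rounds):
        adv_corpus = generate_attacks(victim)      # Step 2: attack the current model
        retrain(victim, adv_corpus)                # Step 1: fine-tune on the new corpus
        rate = fooling_rate(victim, adv_corpus)    # Step 3: check the stop criterion
        print(f"round {r + 1}: fooling rate = {rate:.3f}")
        if rate <= target_rate:
            break
    return victim
```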
This iterative approach causes the neural network to evolve over time. The impact of this defense can be significant for neural networks in the security domain. With the example of malware detection in mind, this technique allows new adversarial examples to be generated at each iteration for a specific pre-trained neural network. The model carrying out the malware detection task in the real world is then retrained on the new attack data and becomes more robust to it. This process iterates, and the model evolves until a certain accuracy is reached.
4. Evaluation and Discussion
We evaluate the performance of the proposed method, both for generating new adversarial examples and for defending against them, through comprehensive experiments. In the following, we first briefly describe the dataset, experimental setup, and evaluation metrics in
Section 4.1,
Section 4.2 and
Section 4.3, respectively. The obtained results are then discussed in
Section 4.4.
4.1. Dataset
To evaluate the performance of the proposed approach, we use the CTU-13 dataset [32], a dataset of botnet, normal, and background traffic with corresponding labels. In this study, we use only the normal and botnet traffic data.
Table 1 shows the distribution of the normal and botnet samples. To conduct our experiments, we randomly selected a smaller subset of the data while maintaining the original class distribution, totalling 50,000 training samples and 10,000 test samples. All train and test data are converted into gray-scale images based on the technique described in
Section 2.1.
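For illustration only, one common way such traffic-to-image conversions are done is sketched below; the paper's actual procedure is the one in its Section 2.1 (not reproduced here), and the 28 × 28 size is an assumption.

```python
import numpy as np

def bytes_to_grayscale(payload: bytes, size: int = 28) -> np.ndarray:
    """Take the first size*size bytes of a flow (zero-padded if shorter) and
    reshape them into a square gray-scale image with intensities in [0, 255]."""
    n = size * size
    buf = np.frombuffer(payload[:n].ljust(n, b"\x00"), dtype=np.uint8)
    return buf.reshape(size, size)
```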
4.2. Experimental Setup
For reproducibility of our proposed method, in the following we provide detailed information about the experimental setup we employed in this study.
Implementation To implement our attack algorithms, we used CleverHans [33], a Python library for benchmarking machine learning systems' vulnerability to adversarial examples. It provides reference implementations for constructing attacks, building defenses, and benchmarking both [33].
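A short usage sketch is given below; the module paths follow the PyTorch interface of CleverHans 4.x (earlier releases exposed a different API), the eps and nb_iter values follow the parameters below, and the victim model reuses the hypothetical BotnetCNN sketch from Section 3.1.

```python
import numpy as np
import torch
from cleverhans.torch.attacks.fast_gradient_method import fast_gradient_method
from cleverhans.torch.attacks.projected_gradient_descent import projected_gradient_descent

model = BotnetCNN()               # victim model sketched in Section 3.1 (assumption)
x = torch.rand(8, 1, 28, 28)      # a batch of gray-scale traffic images

x_fgsm = fast_gradient_method(model, x, eps=0.15, norm=np.inf)
x_pgd = projected_gradient_descent(model, x, eps=0.15, eps_iter=0.01,
                                   nb_iter=10, norm=np.inf)
```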
Parameters The proposed method has several parameters for each attack and defense scenario. Several experiments were conducted with different values for each parameter to achieve the desired behavior, that is, a higher misclassification rate in the attack phase and a lower fooling rate in the defense phase. For instance, PGD has two parameters to tune, the perturbation magnitude $\epsilon$ and the number of iterations, which were set to 0.15 and 10, respectively. With this setting we were able to achieve a fooling rate of 99.98%.
Evaluation System All of the experiments were conducted on a Lambda Quad deep learning workstation with Ubuntu 18.04, an Intel Xeon E5-1650 v4 CPU, 64 GB DDR4 RAM, a 2 TB SSD, a 4 TB HDD, and four NVIDIA Titan V Graphics Processing Units (GPUs).
4.3. Evaluation Metrics
In order to determine how well the attacks and defense mechanisms perform, a set of suitable performance metrics needs to be defined. In classification tasks these metrics can be the accuracy or F1 score, as defined in (7) and (8), respectively. Although the accuracy score is a mandatory metric in the adversarial example generation domain, we need additional metrics to measure the quality of the synthesized examples and their fooling rate. The quality of generated adversarial examples is examined by calculating the average $\ell_2$-norm distance between the normalized clean data and the synthesized data. Furthermore, the fooling rate is the number of samples that are not classified correctly.

$\mathrm{Accuracy} = \dfrac{TP + TN}{TP + TN + FP + FN}$ (7)

$F1 = \dfrac{2\,TP}{2\,TP + FP + FN}$ (8)

where True Positive (TP) and True Negative (TN) are correctly identified botnet and normal samples, respectively, False Positive (FP) denotes normal samples that are classified as botnet, and False Negative (FN) denotes botnet samples that are classified as normal. These metrics are used to create the confusion matrix, as shown in
Table 2.
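The metrics above can be computed as in the following sketch; the label convention (1 = botnet) and the PyTorch-based implementation are assumptions.

```python
import torch

def evaluation_metrics(model, x_clean, x_adv, y_true):
    """Accuracy (7), F1 (8), fooling count, and average L2 distortion
    between normalized clean and synthesized samples."""
    pred = model(x_adv).argmax(dim=1)
    tp = ((pred == 1) & (y_true == 1)).sum().item()   # botnet correctly detected
    tn = ((pred == 0) & (y_true == 0)).sum().item()   # normal correctly passed
    fp = ((pred == 1) & (y_true == 0)).sum().item()
    fn = ((pred == 0) & (y_true == 1)).sum().item()
    accuracy = (tp + tn) / max(tp + tn + fp + fn, 1)           # Eq. (7)
    f1 = 2 * tp / max(2 * tp + fp + fn, 1)                     # Eq. (8)
    fooled = (pred != y_true).sum().item()                     # fooling count
    distortion = (x_adv - x_clean).flatten(1).norm(dim=1).mean().item()
    return accuracy, f1, fooled, distortion
```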
4.4. Results
In this section, we present the results of our experiments with the systematic approaches described above, assessing the proposed approach's capabilities in generating new adversarial examples and defending against them. All experiments are conducted on the CTU-13 dataset [32], which provides labels for botnet, normal, and background traffic; we use only the botnet and normal traffic data, transformed into gray-scale images using the technique described in
Section 2.1.
Victim Model Performance The generated images are used to build our victim model based on the convolutional neural network architecture described in
Section 3.1. The evaluation results of these experiments are shown in
Table 3. As can be seen from this table, our model achieves an accuracy of 99.99% and an F1 score of 99.98%. Note that this model is used as the victim (baseline) model for generating adversarial examples and defending against them.
Gradient-based Adversarial Attacks To examine the robustness of our victim model against gradient-based adversarial attacks, we used three well-established adversarial methods in our experiments. The results demonstrate that FGSM, DeepFool, and PGD are all able to fool the classifier with a high success rate. The results of these experiments are presented in
Table 4. For instance, PGD achieves a fooling rate of 99.98% with an average distortion rate of 18.93%, outperforming the other two attacks. We attribute the lower fooling rate of DeepFool to its inherent assumptions, namely that classifiers behave linearly, with hyperplanes separating the classes from one another.
GAN-based Iterative Attack and Defense Having generated a plethora of adversarial examples using gradient-based attack methods, the main goal of the GAN-based iterative attack and defense is to generate more powerful adversarial examples and, at the same time, improve the robustness of the model. We evaluated the performance of this attack and defense strategy against all three gradient-based adversarial attacks. In a nutshell, we synthesize adversarial examples similar to those of the gradient-based approaches, retrain the victim model on them, and then test the robustness of the victim model against the original adversarial examples, which the victim model has never seen. This means that we improve the robustness of the victim model against gradient-based adversarial examples without ever showing them to the victim model.
FGSM Figure 6 demonstrates the performance of the proposed method in generating stronger adversarial examples, which are used to improve the victim model's defense against FGSM-based adversarial examples. As can be seen, after each iteration the generator was able to synthesize more samples that fool the victim model. Retraining the model on these synthesized examples, on the other hand, improves the decision boundaries of the victim model as expected, such that the fooling rate drops from 673 to 237 samples after only five iterations.
DeepFool The results of the iterative attack and defense based on DeepFool are shown in
Figure 7. Although the fooling rate of the DeepFool algorithm was only 67.73%, the GAN-based algorithm is able to generate a similar number of successful synthesized adversarial examples as with FGSM. This is promising, as it demonstrates that our GAN-based approach can generate new, strong adversarial examples even from a smaller number of seed samples. Similarly, the robustness of the retrained victim model improves.
PGD The results of the iterative attack and defense based on PGD are shown in
Figure 8. They demonstrate a trend similar to that of both the FGSM and DeepFool methods.
To better understand the results, we present them in a normalized format in
Figure 9. As can be seen from this figure, the FGSM and PGD methods follow very similar trends in both the attack and defense phases. The DeepFool method performs relatively better in the attack phase; however, it shows less success in the defense phase. In addition, the trend shows that although the number of new fooling samples increases after each iteration, the defensive power of the system saturates after a few iterations. This is because the distribution of the synthesized adversarial samples shifts after each iteration; therefore, the victim model starts learning a new distribution that differs from the original one.
5. Conclusions
Thanks to their powerful learning capabilities, neural networks are actively used in a wide range of domains, including sensitive ones. However, it has been shown that adversarial entities can manipulate the inference of a model through adversarial examples. Although several defensive methods exist, most of them have been shown to be circumventable. Thus, in this work, we proposed a novel defense system that leverages the unique power of GANs to generate adversarial examples for retraining. We attack the neural network in an automated way, using an external GAN, namely the Pix2Pix conditional GAN, to learn the transformations between adversarial examples and clean data and to generate unseen adversarial examples automatically. After attacking the neural network, we create a plethora of adversarial examples in an iterative manner and use them to automate the defense against adversaries. By doing so, we develop a defense mechanism that is practical for real-world applications with pre-trained neural networks. We evaluated the performance of our method on visualization-based botnet detection systems. Our results demonstrate the success of the proposed method; for example, the number of fooling adversarial examples generated by the PGD method decreased from 824 to 226 samples after only five iterations.